Updated on 2025/05/10

Takafumi Koshinaka
 
Organization
Graduate School of Data Science, Department of Data Science, Professor
School of Data Science, Department of Data Science
Title
Professor
Profile

Graduated from the Department of Aeronautics, Faculty of Engineering, Kyoto University in 1991, and completed the master's program at the Graduate School of Engineering, Kyoto University in 1993. Completed the doctoral program at the Graduate School of Information Science and Engineering, Tokyo Institute of Technology in 2013; Doctor of Engineering. Joined NEC Corporation in 1993, becoming Principal Researcher in 2006 and Senior Principal Researcher in 2013. Served as a board member of the Japanese Society for Artificial Intelligence from 2017 and as a part-time lecturer at the Graduate School of Informatics, Kyoto University from 2019. Professor at the School of Data Science, Yokohama City University since 2020. His research interests include pattern recognition, signal processing, and machine learning.

Degree

  • Doctor of Engineering ( 2013.3   Tokyo Institute of Technology )

Research Interests

  • Natural language processing

  • Automatic speech recognition

  • Deep learning

  • Signal processing

  • Artificial intelligence

  • Pattern recognition

  • Machine learning

Research Areas

  • Informatics / Intelligent robotics

  • Informatics / Perceptual information processing

  • Informatics / Intelligent informatics

Education

  • Tokyo Institute of Technology   Graduate School of Information Science and Engineering   Department of Computer Science

    2009.10 - 2013.3

      More details

    Country: Japan

    researchmap

  • Kyoto University   Graduate School of Engineering   Department of Aeronautics

    1991.4 - 1993.3

      More details

    Country: Japan

    researchmap

  • Kyoto University   Faculty of Engineering   Department of Aeronautics

    1987.4 - 1991.3

      More details

    Country: Japan

    researchmap

Research History

  • Yokohama City University   School of Data Science   Professor

    2020.9

      More details

    Country:Japan

    researchmap

  • NEC Corporation   Biometrics Research Laboratories   Senior Principal Researcher

    2018.3 - 2020.8

      More details

    Country:Japan

    researchmap

  • NEC Corporation   Data Science Research Laboratories   Senior Principal Researcher

    2016.4 - 2018.3

      More details

    Country:Japan

    researchmap

  • NEC Corporation   Information and Media Processing Laboratories   Senior Principal Researcher

    2015.4 - 2018.3

      More details

    Country:Japan

    researchmap

  • NEC Corporation   Information and Media Processing Laboratories   Principal Researcher

    2010.4 - 2013.3

      More details

    Country:Japan

    researchmap

  • NEC Corporation   Common Platform Software Laboratories   Principal Researcher

    2007.4 - 2010.3

      More details

    Country:Japan

    researchmap

  • NEC Corporation   Media and Information Research Laboratories   Principal Researcher

    2006.4 - 2007.3

      More details

    Country:Japan

    researchmap


Professional Memberships

  • The Association for Natural Language Processing

    2021.2

      More details

  • The Japanese Society for Artificial Intelligence

    2017.10

      More details

  • IEEE

    2013.3

      More details

  • Acoustical Society of Japan

    2004.12

      More details

  • The Institute of Electronics, Information and Communication Engineers (IEICE)

    1993.6

      More details

Committee Memberships

  • ISO/IEC JTC1/SC29 WG1 JP   Expert  

    2021.5   

      More details

    Committee type:Academic society

    researchmap

  • IEEE BigData2022 Organizing Committee   Local Arrangement Co-chair  

    2020.12 - 2022.12   

      More details

    Committee type:Academic society

    researchmap

  • The Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA 2021)   Sponsorship Co-chair  

    2019.12 - 2021.12   

      More details

    Committee type:Academic society

    researchmap

  • The Japanese Society for Artificial Intelligence   Representative  

    2019.6   

      More details

    Committee type:Academic society

    researchmap

  • The Speaker and Language Recognition Workshop (Odyssey 2020)   General Co-chair  

    2018.6 - 2020.11   

      More details

    Committee type:Academic society

    researchmap

  • The Japanese Society for Artificial Intelligence   Board Member  

    2017.6 - 2019.6   

      More details

    Committee type:Academic society

    researchmap

  • Industrial Membership Committee, Asia-Pacific Signal and Information Processing Association (APSIPA)   Committee Member  

    2016.6 - 2018.6   

      More details

    Committee type:Academic society

    researchmap

  • IEICE Technical Committee on Speech   Technical Committee Member  

    2013.5 - 2017.4   

      More details

    Committee type:Academic society

    researchmap


Papers

  • An Experimental Study on Text-independent Speaker Verification for Forensic Applications

    Shigeki Ozawa, Akira Gotoh, Yuko Saito, Hiroki Matsuura, Takafumi Koshinaka

    124 ( 391 )   34 - 39   2025.3

     More details

    Authorship:Last author, Corresponding author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • 検索エンジンを指向したLLMのアラインメント

    益子怜, 木村賢, 越仲孝文

    言語処理学会第31回年次大会   2025.3

     More details

    Authorship:Last author, Corresponding author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Reading is Believing: Revisiting Language Bottleneck Models for Image Classification Reviewed

    Honori Udo, Takafumi Koshinaka

    2024 IEEE International Conference on Image Processing (ICIP)   943 - 949   2024.10

     More details

    Authorship:Last author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/icip51287.2024.10648091

    DOI: 10.60864/n50t-ax16

    researchmap

  • Editable Virtual Try-On Using Text Prompts

    Kosuke Takemoto, Takafumi Koshinaka

    2024.5

     More details

    Authorship:Last author, Corresponding author   Publishing type:Research paper (conference, symposium, etc.)  

    DOI: 10.11517/pjsai.JSAI2024.0_2C1GS702

    researchmap

  • LLM生成コンテンツのSEO観点での品質評価

    益子怜, 木村賢, 越仲孝文

    言語処理学会年次大会発表論文集(Web)   30th   2024

     More details

    Authorship:Last author, Corresponding author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    J-GLOBAL

    researchmap

  • Generalized Domain Adaptation Framework for Parametric Back-End in Speaker Recognition Reviewed

    Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka

    IEEE Transactions on Information Forensics and Security   18   3936 - 3947   2023.6

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (scientific journal)   Publisher:Institute of Electrical and Electronics Engineers (IEEE)  

    DOI: 10.1109/tifs.2023.3287733

    researchmap

  • Image Captioners Tell More Than Images Given to Them

    有働帆乃璃, 越仲孝文

    人工知能学会全国大会論文集(Web)   37th   2023.6

     More details

    Authorship:Last author, Corresponding author   Language:Japanese  

    J-GLOBAL

    researchmap

  • Response Generation to Low-Rated Reviews Combined with Sentiment Analysis

    益子怜, 越仲孝文

    人工知能学会全国大会論文集(Web)   37th   2023.6

     More details

    Authorship:Last author, Corresponding author   Language:Japanese  

    J-GLOBAL

    researchmap

  • Analysis of Consumers’ Feedback on a Japanese EC Site Focusing on the Relation between Review Text and Rating

    小林義幸, 越仲孝文

    人工知能学会全国大会論文集(Web)   36th   2022.6

     More details

    Authorship:Last author, Corresponding author   Language:Japanese  

    DOI: 10.11517/pjsai.JSAI2022.0_1P5GS602

    J-GLOBAL

    researchmap

  • Task-aware Warping Factors in Mask-based Speech Enhancement Reviewed

    Qiongqiong Wang, Kong Aik Lee, Takafumi Koshinaka, Koji Okabe, Hitoshi Yamamoto

    European Signal Processing Conference (EUSIPCO 2021)   2021.8

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • Xi-Vector Embedding for Speaker Recognition Reviewed

    Kong Aik Lee, Qiongqiong Wang, Takafumi Koshinaka

    IEEE Signal Processing Letters   28   1385 - 1389   2021.7

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (scientific journal)   Publisher:Institute of Electrical and Electronics Engineers (IEEE)  

    DOI: 10.1109/LSP.2021.3091932

    researchmap

  • Using Multi-Resolution Feature Maps with Convolutional Neural Networks for Anti-Spoofing in ASV Reviewed

    Qiongqiong Wang, Kong Aik Lee, Takafumi Koshinaka

    Odyssey 2020 The Speaker and Language Recognition Workshop   2020.5

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    DOI: 10.21437/odyssey.2020-20

    researchmap

  • NEC-TT System for Mixed-Bandwidth and Multi-Domain Speaker Recognition. Reviewed

    Kong Aik Lee, Hitoshi Yamamoto, Koji Okabe, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda

    Computer Speech and Language   61   101033 - 101033   2020.5

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1016/j.csl.2019.101033

    researchmap

  • A Generalized Framework for Domain Adaptation of PLDA in Speaker Recognition Reviewed

    Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka

    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)   2020.5

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/icassp40776.2020.9054113

    researchmap

  • Study on comparison of individuality of ear canal shape

    Riki Kimura, Shohei Yano, Rui Fujitsuka, Naoki Wakui, Takayuki Arakawa, Takafumi Koshinaka

    148th Audio Engineering Society International Convention   2020

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Audio Engineering Society  

    Ear acoustic authentication, a type of biometric authentication, uses the acoustic characteristics of the ear canal as a feature. Because ear acoustic authentication acquires features using earphones, the process of authentication is easy, and the method has attracted much attention recently. However, the mechanism of the acoustic characteristics of the ear canal has not been sufficiently studied. In this study, we verified two methods, the image matching method and the Slicing method. In conclusion, the Slicing method was found to outperform the image matching method, based on the results of this study.

    Scopus

    researchmap

  • NEC-TT speaker verification system for SRE'19 CTS challenge

    Kong Aik Lee, Koji Okabe, Hitoshi Yamamoto, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Keisuke Ishikawa, Koichi Shinoda

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH   2020-   2227 - 2231   2020

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:International Speech Communication Association  

    The series of speaker recognition evaluations (SREs) organized by the National Institute of Standards and Technology (NIST) is widely accepted as the de facto benchmark for speaker recognition technology. This paper describes the NEC-TT speaker verification system developed for the recent SRE'19 CTS Challenge. Our system is based on an x-vector embedding front-end followed by a thin scoring back-end. We trained a very-deep neural network for x-vector extraction by incorporating residual connections, squeeze-and-excitation networks, and angular-margin softmax at the output layer. We enhanced the back-end with a tandem approach leveraging the benefit of supervised and unsupervised domain adaptation. We obtained over 30% relative reduction in error rate with each of these enhancements at the front-end and back-end, respectively.

    DOI: 10.21437/Interspeech.2020-1132

    Scopus

    researchmap
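
The abstract above outlines an x-vector-style front-end: a deep network over frame-level features, a pooling layer that turns them into a fixed-length utterance embedding, and a classification head. The following PyTorch sketch illustrates only the generic statistics-pooling step of such embeddings; the layer sizes are arbitrary, and the residual connections, squeeze-and-excitation blocks, and angular-margin softmax mentioned in the abstract are omitted, so this is not the NEC-TT configuration.

```python
# Minimal sketch of x-vector-style statistics pooling (illustrative, not the paper's model).
import torch
import torch.nn as nn

class StatsPoolingEmbedder(nn.Module):
    """Frame-level encoder + statistics pooling -> fixed-length speaker embedding."""

    def __init__(self, feat_dim=40, hidden_dim=512, embed_dim=256):
        super().__init__()
        # Stand-in for the deep frame-level network (TDNN/ResNet in real systems).
        self.frame_net = nn.Sequential(
            nn.Conv1d(feat_dim, hidden_dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Mean and standard deviation are concatenated, hence 2 * hidden_dim.
        self.embed = nn.Linear(2 * hidden_dim, embed_dim)

    def forward(self, feats):            # feats: (batch, feat_dim, num_frames)
        h = self.frame_net(feats)        # (batch, hidden_dim, num_frames)
        mean = h.mean(dim=2)             # statistics pooling over time
        std = h.std(dim=2)
        return self.embed(torch.cat([mean, std], dim=1))  # (batch, embed_dim)

x = torch.randn(8, 40, 300)              # 8 utterances, 40-dim features, 300 frames
emb = StatsPoolingEmbedder()(x)          # -> (8, 256) utterance-level embeddings
```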

  • Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding Reviewed

    Hitoshi Yamamoto, Kong Aik Lee, Koji Okabe, Takafumi Koshinaka

    Interspeech 2019   2019.9

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    DOI: 10.21437/interspeech.2019-1508

    researchmap

  • The NEC-TT 2018 Speaker Verification System Reviewed

    Kong Aik Lee, Hitoshi Yamamoto, Koji Okabe, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda

    Interspeech 2019   2019.9

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    DOI: 10.21437/interspeech.2019-1517

    researchmap

  • Unleashing the Unused Potential of i-Vectors Enabled by GPU Acceleration Reviewed

    Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka

    Interspeech 2019   2019.9

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    DOI: 10.21437/interspeech.2019-1955

    researchmap

  • The CORAL+ Algorithm for Unsupervised Domain Adaptation of PLDA Reviewed

    Kong Aik Lee, Qiongqiong Wang, Takafumi Koshinaka

    ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)   2019.5

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/icassp.2019.8682852

    researchmap

  • Feature selection and its evaluation in binaural ear acoustic authentication

    Masaki Yasuhara, Shohei Yano, Takayuki Arakawa, Takafumi Koshinaka

    AES 146th International Convention   2019

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Audio Engineering Society  

    A type of biometric authentication is ear acoustic authentication, which uses the ear canal transfer characteristic, showing the acoustic characteristics of the ear canal. In ear acoustic authentication, biological information can be acquired from both ears. However, extant literature on accuracy improvement methods using binaural features is inadequate. In this study, we experimentally determine a feature that represents the difference between each user to perform highly accurate authentication. Feature selection was performed by changing the combination of binaural features, and it was evaluated using the ratio of between-class variance to within-class variance and the Equal Error Rate (EER). As a result, a method that concatenates the features of both ears has the highest performance.

    Scopus

    researchmap
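
The abstract above evaluates feature combinations using the equal error rate (EER). As a reference, the sketch below shows one common way to estimate EER from verification scores with scikit-learn's ROC utilities; it is a generic illustration, not code from the paper.

```python
# Generic EER estimation from verification scores (illustrative sketch).
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """labels: 1 for genuine trials, 0 for impostor trials; scores: higher = more genuine."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))      # operating point where FAR ~= FRR
    return (fpr[idx] + fnr[idx]) / 2.0

labels = np.array([1, 1, 1, 0, 0, 0, 0, 1])
scores = np.array([0.9, 0.8, 0.4, 0.3, 0.5, 0.1, 0.2, 0.7])
print(f"EER = {equal_error_rate(labels, scores):.3f}")
```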

  • Attention Mechanism in Speaker Recognition: What Does it Learn in Deep Speaker Embedding? Reviewed

    Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Hitoshi Yamamoto, Takafumi Koshinaka

    2018 IEEE Spoken Language Technology Workshop (SLT)   2018.12

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/slt.2018.8639586

    researchmap

  • Ear Acoustic Biometrics Using Inaudible Signals and Its Application to Continuous User Authentication Reviewed

    Shivangi Mahto, Takayuki Arakawa, Takafumi Koshinaka

    2018 26th European Signal Processing Conference (EUSIPCO)   2018.9

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.23919/eusipco.2018.8553015

    researchmap

  • Attentive Statistics Pooling for Deep Speaker Embedding Reviewed

    Koji Okabe, Takafumi Koshinaka, Koichi Shinoda

    Interspeech 2018   2018.9

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    DOI: 10.21437/interspeech.2018-993

    researchmap

  • DNN Based Speaker Embedding Using Content Information for Text-Dependent Speaker Verification Reviewed

    Subhadeep Dey, Takafumi Koshinaka, Petr Motlicek, Srikanth Madikeri

    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)   2018.4

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/icassp.2018.8461389

    researchmap

  • Robust i-vector extraction tightly coupled with voice activity detection using deep neural networks Reviewed

    Hitoshi Yamamoto, Koji Okabe, Takafumi Koshinaka

    2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)   2017.12

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipa.2017.8282114

    researchmap

  • Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification Reviewed

    Qiongqiong Wang, Takafumi Koshinaka

    Interspeech 2017   2017.8

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    DOI: 10.21437/interspeech.2017-727

    researchmap

  • i-Vector Transformation Using a Novel Discriminative Denoising Autoencoder for Noise-Robust Speaker Recognition Reviewed

    Shivangi Mahto, Hitoshi Yamamoto, Takafumi Koshinaka

    Interspeech 2017   2017.8

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    DOI: 10.21437/interspeech.2017-731

    researchmap

  • 誤差の周波数拡散と加算平均処理による耳音紋認証の精度向上 Reviewed

    矢野 昌平, 荒川 隆行, 越仲 孝文, 今岡 仁, 入澤 英毅

    信学論A   J100-A ( 4 )   161 - 168   2017.4

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • Fast and accurate personal authentication using ear acoustics Reviewed

    Takayuki Arakawa, Takafumi Koshinaka, Shohei Yano, Hideki Irisawa, Ryoji Miyahara, Hitoshi Imaoka

    2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)   2016.12

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipa.2016.7820886

    researchmap

  • Domain adaptation using maximum likelihood linear transformation for PLDA-based speaker verification Reviewed

    Qiongqiong Wang, Hitoshi Yamamoto, Takafumi Koshinaka

    2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)   2016.3

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/icassp.2016.7472651

    researchmap

  • Denoising autoencoder-based speaker feature restoration for utterances of short duration Reviewed

    Hitoshi Yamamoto, Takafumi Koshinaka

    Interspeech 2015   2015.9

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    researchmap

  • Speech/acoustic analysis technology - Its application in support of public solutions

    Takafumi Koshinaka, Osamu Hoshuyama, Yoshifumi Onishi, Ryosuke Isotani, Masahiro Tani

    NEC Technical Journal   9 ( 1 )   82 - 85   2015.1

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:NEC Mediaproducts  

    The advent of the age of big data has further raised interest in the need to extract useful information from the huge amount of data that accumulates in the course of our everyday lives. This may be facilitated by high speed and low cost data analysis solutions. These technologies that process the speech/acoustic information that forms the critical component of real-world information are also becoming more important for understanding the context of the analyzed data. They are expected to be employed for public solutions that will support the safety, security, efficiency and equality of society. This paper introduces an innovative technology designed to extract meaningful information from speech/acoustic media and goes on to discuss its application in public solutions.

    Scopus

    researchmap

  • Anomaly detection of motors with feature emphasis using only normal sounds Reviewed

    Yumi Ono, Yoshifumi Onishi, Takafumi Koshinaka, Soichiro Takata, Osamu Hoshuyama

    2013 IEEE International Conference on Acoustics, Speech and Signal Processing   2013.5

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/icassp.2013.6638167

    researchmap

  • A study on semantic indexing for spoken document retrieval Reviewed

    Takafumi Koshinaka

    Tokyo Institute of Technology   ( 甲第9187号 )   2013.3

     More details

    Authorship:Lead author   Language:Japanese   Publishing type:Doctoral thesis  

    researchmap

  • A noise-robust speech recognition method composed of weak noise suppression and weak Vector Taylor Series Adaptation Reviewed

    Shuji Komeiji, Takayuki Arakawa, Takafumi Koshinaka

    2012 IEEE Spoken Language Technology Workshop (SLT)   2012.12

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/slt.2012.6424205

    researchmap

  • Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model Reviewed

    Takafumi KOSHINAKA, Kentaro NAGATOMO, Koichi SHINODA

    IEICE Transactions on Information and Systems   E95.D ( 10 )   2469 - 2478   2012.10

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (scientific journal)   Publisher:Institute of Electronics, Information and Communications Engineers (IEICE)  

    A novel online speaker clustering method based on a generative model is proposed. It employs an incremental variant of variational Bayesian learning and provides probabilistic (non-deterministic) decisions for each input utterance, on the basis of the history of preceding utterances. It can be expected to be robust against errors in cluster estimation and the classification of utterances, and hence to be applicable to many real-time applications. Experimental results show that it produces 50% fewer classification errors than does a conventional online method. They also show that it is possible to reduce the number of speech recognition errors by combining the method with unsupervised speaker adaptation.

    DOI: 10.1587/transinf.e95.d.2469

    CiNii Books

    researchmap

  • Efficient Estimation Method of Scaling Factors among Probabilistic Models in Speech Recognition Reviewed

    ONISHI Yoshifumi, EMORI Tadashi, KOSHINAKA Takafumi, SHINODA Koichi

    The IEICE transactions on information and systems (Japanese edition)   J95-D ( 5 )   1276 - 1285   2012.5

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:The Institute of Electronics, Information and Communication Engineers  

    CiNii Books

    researchmap

  • Committee-Based Active Learning for Speech Recognition Reviewed

    Yuzo HAMANAKA, Koichi SHINODA, Takuya TSUTAOKA, Sadaoki FURUI, Tadashi EMORI, Takafumi KOSHINAKA

    IEICE Transactions on Information and Systems   E94-D ( 10 )   2015 - 2023   2011.11

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Institute of Electronics, Information and Communications Engineers (IEICE)  

    We propose a committee-based method of active learning for large vocabulary continuous speech recognition. Multiple recognizers are trained in this approach, and the recognition results obtained from these are used for selecting utterances. Those utterances whose recognition results differ the most among recognizers are selected and transcribed. Progressive alignment and voting entropy are used to measure the degree of disagreement among recognizers on the recognition result. Our method was evaluated by using 191-hour speech data in the Corpus of Spontaneous Japanese. It proved to be significantly better than random selection. It only required 63h of data to achieve a word accuracy of 74%, while standard training (i.e., random selection) required 103h of data. It also proved to be significantly better than conventional uncertainty sampling using word posterior probabilities.

    DOI: 10.1587/transinf.e94.d.2015

    CiNii Books

    researchmap
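
The abstract above selects utterances for transcription where committee members disagree most, measured by vote entropy over their recognition results. The sketch below illustrates the idea at the whole-utterance level only; the paper's progressive alignment and word-level voting are not reproduced, and the recognizer interface is a hypothetical placeholder.

```python
# Utterance-level vote-entropy selection (simplified illustration of committee-based active learning).
import math
from collections import Counter

def vote_entropy(hypotheses):
    """hypotheses: list of transcripts, one per committee member, for a single utterance."""
    counts = Counter(hypotheses)
    k = len(hypotheses)
    return -sum((c / k) * math.log(c / k) for c in counts.values())

def select_for_transcription(utterances, committee, budget):
    """Pick the `budget` utterances the committee disagrees on most.

    utterances: list of audio items; committee: list of recognizers, each a
    callable mapping audio -> transcript (hypothetical interface)."""
    scored = [(vote_entropy([rec(u) for rec in committee]), i)
              for i, u in enumerate(utterances)]
    scored.sort(reverse=True)                      # highest disagreement first
    return [i for _, i in scored[:budget]]
```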

  • Speech modeling based on committee-based active learning Reviewed

    Yuzo Hamanaka, Koichi Shinoda, Sadaoki Furui, Tadashi Emori, Takafumi Koshinaka

    2010 IEEE International Conference on Acoustics, Speech and Signal Processing   2010.3

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/icassp.2010.5495650

    researchmap

  • Online speaker clustering using incremental learning of an ergodic hidden Markov model Reviewed

    Takafumi Koshinaka, Kentaro Nagatomo, Koichi Shinoda

    2009 IEEE International Conference on Acoustics, Speech and Signal Processing   2009.4

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/icassp.2009.4960528

    researchmap

  • Open-vocabulary spoken-document retrieval based on query expansion using related web documents Reviewed

    Makoto Terao, Takafumi Koshinaka, Shinichi Ando, Ryosuke Isotani, Akitoshi Okumura

    Interspeech 2008   2008.9

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA  

    researchmap

  • HMM-based text segmentation using variational Bayes learning and its application to audio-visual indexing

    Takafumi Koshinaka, Akitoshi Okumura, Ryosuke Isotani

    Electronics and Communications in Japan (Part II: Electronics)   90 ( 12 )   1 - 11   2007.12

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (scientific journal)   Publisher:Wiley  

    DOI: 10.1002/ecjb.20421

    researchmap

  • An HMM-Based Text Segmentation Method Using Variational Bayes Inference and Its Application to Audio-Visual Indexing Reviewed

    KOSHINAKA Takafumi, OKUMURA Akitoshi, ISOTANI Ryosuke

    The IEICE transactions on information and systems   J89-D ( 9 )   2113 - 2122   2006.9

     More details

    Authorship:Lead author   Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:The Institute of Electronics, Information and Communication Engineers  

    CiNii Books

    researchmap

  • An HMM-based Text Segmentation Method Using Variational Bayes Approach and Its Application to LVCSR for Broadcast News Reviewed

    Takafumi Koshinaka, Ken-ichi Iso, Akitoshi Okumura

    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05)   2005.3

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/icassp.2005.1415156

    researchmap

  • A Stochastic Model for Handwritten Word Recognition Using Context Dependency Between Character Patterns Reviewed

    Takafumi Koshinaka, Daisuke Nishiwaki, Keiji Yamada

    The 6th International Conference on Document Analysis and Recognition (ICDAR 2001)   2001.9

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • Pressure waves in a separated gas-liquid layer in a horizontal duct with a step Reviewed

    Takafumi Koshinaka, Shigeki Morioka

    Fluid Dynamics Research   12 ( 6 )   323 - 333   1993.12

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (scientific journal)   Publisher:IOP Publishing  

    DOI: 10.1016/0169-5983(93)90034-8

    researchmap


MISC

  • Reading Is Believing: Revisiting Language Bottleneck Models for Image Classification

    Honori Udo, Takafumi Koshinaka

    2024.6

     More details

    We revisit language bottleneck models as an approach to ensuring the
    explainability of deep learning models for image classification. Because of
    inevitable information loss incurred in the step of converting images into
    language, the accuracy of language bottleneck models is considered to be
    inferior to that of standard black-box models. Recent image captioners based on
    large-scale foundation models of Vision and Language, however, have the ability
    to accurately describe images in verbal detail to a degree that was previously
    believed to not be realistically possible. In a task of disaster image
    classification, we experimentally show that a language bottleneck model that
    combines a modern image captioner with a pre-trained language model can achieve
    image classification accuracy that exceeds that of black-box models. We also
    demonstrate that a language bottleneck model and a black-box model may be
    thought to extract different features from images and that fusing the two can
    create a synergistic effect, resulting in even higher classification accuracy.

    arXiv

    researchmap

    Other Link: http://arxiv.org/pdf/2406.15816v1
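
The abstract above describes a language bottleneck pipeline: images are first converted to text by a captioner, and a text classifier then predicts the class from the captions alone. Below is a minimal sketch of that two-stage idea; the Hugging Face checkpoint name and the TF-IDF/logistic-regression text classifier are assumptions for illustration, not the models used in the paper.

```python
# Minimal language-bottleneck pipeline: image -> caption -> text classifier (illustrative sketch).
from transformers import pipeline                      # assumed: an image-to-text captioner
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Any pre-trained captioner works here; this checkpoint name is an assumption.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def captions_of(image_paths):
    """Convert each image into a single descriptive sentence."""
    return [captioner(path)[0]["generated_text"] for path in image_paths]

# train_paths / train_labels / test_paths are placeholders for a labeled image dataset.
def train_and_predict(train_paths, train_labels, test_paths):
    text_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    text_clf.fit(captions_of(train_paths), train_labels)   # classify from text only
    return text_clf.predict(captions_of(test_paths))
```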

  • Generalized domain adaptation framework for parametric back-end in speaker recognition

    Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka

    2023.5

     More details

    State-of-the-art speaker recognition systems comprise a speaker embedding
    front-end followed by a probabilistic linear discriminant analysis (PLDA)
    back-end. The effectiveness of these components relies on the availability of a
    large amount of labeled training data. In practice, it is common for domains
    (e.g., language, channel, demographic) in which a system is deployed to differ
    from that in which a system has been trained. To close the resulting gap,
    domain adaptation is often essential for PLDA models. Among two of its variants
    are Heavy-tailed PLDA (HT-PLDA) and Gaussian PLDA (G-PLDA). Though the former
    better fits real feature spaces than does the latter, its popularity has been
    severely limited by its computational complexity and, especially, by the
    difficulty it presents in domain adaptation, which results from its
    non-Gaussian property. Various domain adaptation methods have been proposed for
    G-PLDA. This paper proposes a generalized framework for domain adaptation that
    can be applied to both of the above variants of PLDA for speaker recognition.
    It not only includes several existing supervised and unsupervised domain
    adaptation methods but also makes possible more flexible usage of available
    data in different domains. In particular, we introduce here two new techniques:
    (1) correlation-alignment in the model level, and (2) covariance
    regularization. To the best of our knowledge, this is the first proposed
    application of such techniques for domain adaptation w.r.t. HT-PLDA. The
    efficacy of the proposed techniques has been experimentally validated on NIST
    2016, 2018, and 2019 Speaker Recognition Evaluation (SRE'16, SRE'18, and
    SRE'19) datasets.

    arXiv

    researchmap

    Other Link: http://arxiv.org/pdf/2305.15567v1
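
The abstract above mentions correlation alignment and covariance regularization for domain adaptation. The numpy sketch below shows plain feature-level correlation alignment (CORAL): out-of-domain embeddings are whitened with their own covariance and re-colored (and re-centered) with in-domain statistics. It is a generic illustration of the underlying idea, not the paper's model-level HT-PLDA/G-PLDA adaptation; the ridge constant is an assumption.

```python
# Generic feature-level correlation alignment (CORAL); illustrative, not the paper's model-level method.
import numpy as np
from scipy.linalg import sqrtm

def coral_align(source_emb, target_emb, eps=1e-3):
    """Whiten source embeddings with their covariance, re-color with target covariance,
    and shift the source mean to the target mean.

    source_emb: (n_s, d) out-of-domain embeddings; target_emb: (n_t, d) in-domain
    embeddings. eps adds a small ridge to both covariances (a simple regularizer)."""
    d = source_emb.shape[1]
    cs = np.cov(source_emb, rowvar=False) + eps * np.eye(d)
    ct = np.cov(target_emb, rowvar=False) + eps * np.eye(d)
    whiten = np.linalg.inv(sqrtm(cs).real)      # source whitening transform
    recolor = sqrtm(ct).real                    # target coloring transform
    mean_s = source_emb.mean(axis=0)
    return (source_emb - mean_s) @ whiten @ recolor + target_emb.mean(axis=0)
```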

  • Image Captioners Sometimes Tell More Than Images They See

    Honori Udo, Takafumi Koshinaka

    2023.5

     More details

    Image captioning, a.k.a. "image-to-text," which generates descriptive text
    from given images, has been rapidly developing throughout the era of deep
    learning. To what extent is the information in the original image preserved in
    the descriptive text generated by an image captioner? To answer that question,
    we have performed experiments involving the classification of images from
    descriptive text alone, without referring to the images at all, and compared
    results with those from standard image-based classifiers. We have evaluated
    several image captioning models with respect to a disaster image classification
    task, CrisisNLP, and show that descriptive text classifiers can sometimes
    achieve higher accuracy than standard image-based classifiers. Further, we show
    that fusing an image-based classifier with a descriptive text classifier can
    provide improvement in accuracy.

    arXiv

    researchmap

    Other Link: http://arxiv.org/pdf/2305.02932v2

  • 国際会議 Odyssey 2020 開催報告 Invited

    越仲 孝文, リー コンエイク, 篠田 浩一

    電子情報通信学会 情報・システムソサイエティ誌   26 ( 2 )   23 - 24   2021.8

     More details

    Authorship:Lead author   Language:Japanese   Publishing type:Meeting report  

    researchmap

  • Linear Discriminant Analysis Considering Worst-case Variance Ratio and Its Application to Ear Acoustic Authentication

    伊藤良峻, 越仲孝文

    日本音響学会研究発表会講演論文集(CD-ROM)   2020   2020

  • I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

    Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Chenglin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans

    2019.4

     More details

    The I4U consortium was established to facilitate a joint entry to NIST
    speaker recognition evaluations (SRE). The latest edition of such joint
    submission was in SRE 2018, in which the I4U submission was among the
    best-performing systems. SRE'18 also marks the 10-year anniversary of I4U
    consortium into NIST SRE series of evaluation. The primary objective of the
    current paper is to summarize the results and lessons learned based on the
    twelve sub-systems and their fusion submitted to SRE'18. It is also our
    intention to present a shared view on the advancements, progresses, and major
    paradigm shifts that we have witnessed as an SRE participant in the past decade
    from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm
    shift from supervector representation to deep speaker embedding, and a switch
    of research challenge from channel compensation to domain adaptation.

    arXiv

    researchmap

    Other Link: http://arxiv.org/pdf/1904.07386v1

  • 声認証技術がもたらす安全・安心で便利な社会 (バイオメトリクスを用いた社会価値創造特集) Invited

    越仲 孝文, リー コンエイク

    NEC技報   71 ( 2 )   2019.3

     More details

    Authorship:Lead author, Corresponding author   Language:Japanese   Publishing type:Internal/External technical report, pre-print, etc.  

    researchmap

  • 人間の耳には聴こえない音で個人を識別する耳音響認証技術 Invited

    荒川 隆行, 越仲 孝文

    月刊自動認識   2019.3

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Article, review, commentary, editorial, etc. (trade magazine, newspaper, online media)  

    researchmap

  • A study of an observation fluctuation reduction method for ear acoustic authentication

    安原雅貴, 荒川隆行, 越仲孝文, 矢野昌平

    人工知能学会全国大会論文集(Web)   33rd   2019

  • 話者クラスタリングを用いた話者照合手法のNIST SRE18における比較評価

    GUO Ling, 山本仁, 岡部浩司, 越仲孝文

    日本音響学会研究発表会講演論文集(CD-ROM)   2019   2019

  • 単一話者検出に最適化した話者クラスタリングを用いる話者照合

    GUO Ling, 山本仁, 越仲孝文

    日本音響学会研究発表会講演論文集(CD-ROM)   2019   2019

  • 複数の話者が混在する環境下のスコア統合に基づく話者照合

    GUO Ling, 山本仁, LEE Kong Aik, 越仲孝文

    日本音響学会研究発表会講演論文集(CD-ROM)   2018   2018

  • PROMISING TECHNOLOGY : Technique of ear acoustic authentication Invited

    37 ( 439 )   18 - 22   2017.10

     More details

    Language:Japanese   Publishing type:Article, review, commentary, editorial, etc. (trade magazine, newspaper, online media)  

    CiNii Books

    researchmap

  • ヒアラブル技術によるヒューマン系IoTソリューションの取り組みと展望 (デジタルビジネスを支えるIoT特集) Invited

    古谷 聡, 越仲 孝文, 大杉 孝司

    NEC技報   70 ( 1 )   47 - 51   2017.9

     More details

    Language:Japanese   Publishing type:Internal/External technical report, pre-print, etc.   Publisher:日本電気  

    CiNii Books

    researchmap

  • 外耳道音響特性を用いた高精度個人認証

    荒川隆行, 矢野昌平, 越仲孝文, 入澤英毅, 今岡仁

    日本音響学会研究発表会講演論文集(CD-ROM)   2016   2016

  • i-vectorの重み付き次元圧縮と区分回帰による年齢推定

    児島一郁, 山本仁, 越仲孝文

    日本音響学会研究発表会講演論文集(CD-ROM)   2016   2016

  • 音声・音響分析技術とパブリックソリューションへの応用 (社会の安全・安心を支えるパブリックソリューション特集) Invited

    越仲 孝文, 宝珠山 治, 大西 祥史, 磯谷 亮介, 谷 真宏

    NEC技報   67 ( 1 )   86 - 89   2014.11

     More details

    Authorship:Lead author, Corresponding author   Language:Japanese   Publishing type:Internal/External technical report, pre-print, etc.   Publisher:日本電気  

    CiNii Books

    researchmap

  • 正常音スペクトルモデルに基づく機器異常検知方式における特徴量強調の効果

    小野友督, 宝珠山治, 大西祥史, 越仲孝文

    日本音響学会研究発表会講演論文集(CD-ROM)   2014   2014

  • 話者認識の国際動向 (小特集: 話者認識に関する研究の動向) Invited Reviewed

    越仲 孝文, 篠田 浩一

    日本音響学会誌   69 ( 7 )   2013.7

     More details

    Authorship:Lead author   Language:Japanese   Publishing type:Article, review, commentary, editorial, etc. (scientific journal)  

    researchmap

  • GMM-SVMによるテキスト非依存話者識別

    谷真宏, 大西祥史, 越仲孝文

    日本音響学会研究発表会講演論文集(CD-ROM)   2013   2013

  • Current situations and issues of speaker recognition technologies

    網野加苗, 石原俊一, 小川哲司, 長内隆, 黒岩眞吾, 越仲孝文, 篠田浩一, 柘植覚, 西田昌史, 松井知子, WANG Longbiao

    電子情報通信学会技術研究報告   112 ( 450(SP2012 115-131) )   2013

  • 正常音の知識のみを利用した機器の異常検知

    小野友督, 大西祥史, 越仲孝文, 高田宗一朗

    日本音響学会研究発表会講演論文集(CD-ROM)   2012   2012

  • 音声・映像情報の構造化と検索 (小特集: 音声・映像認識連携への取り組み) Invited Reviewed

    越仲 孝文, 大網 亮磨, 細見 格, 今岡 仁

    情報処理   52 ( 1 )   2011.10

     More details

    Authorship:Lead author, Corresponding author   Language:Japanese   Publishing type:Article, review, commentary, editorial, etc. (scientific journal)  

    researchmap

  • 雑音抑圧法とモデル適応法の重み付き組み合わせに基づく耐雑音音声認識手法

    古明地秀治, 荒川隆行, 越仲孝文

    日本音響学会研究発表会講演論文集(CD-ROM)   2011   2011

  • 複数マイクロフォンを用いた音声区間検出

    大西祥史, 越仲孝文, 篠田浩一

    日本音響学会研究発表会講演論文集(CD-ROM)   2011   2011

  • A hybrid method of noise suppression and model adaptation for robust speech recognition

    IEICE technical report   110 ( 356 )   49 - 54   2010.12

     More details

    Language:Japanese  

    researchmap

  • A Hybrid method of Noise Suppression and Model Adaptation for Robust Speech Recognition

    古明地秀治, 荒川隆行, 越仲孝文

    電子情報通信学会技術研究報告   110 ( 357(SP2010 88-102) )   49 - 54   2010.12

     More details

    Language:Japanese  

    J-GLOBAL

    researchmap

  • A Hybrid method of Noise Suppression and Model Adaptation for Robust Speech Recognition

    2010 ( 9 )   1 - 6   2010.12

     More details

  • 裁判員裁判向け音声認識システム (音声認識ソリューション・製品特集) Invited

    越仲 孝文, 江森 正, 大西 祥史

    NEC技報   63 ( 1 )   41 - 90   2010.2

     More details

    Authorship:Lead author, Corresponding author   Language:Japanese   Publishing type:Internal/External technical report, pre-print, etc.   Publisher:日本電気  

    CiNii Books

    researchmap

  • オンライン話者クラスタリング技術と議事録作成支援への応用 (音声認識ソリューション・製品特集) Invited

    越仲 孝文, 長友 健太郎

    NEC技報   63 ( 1 )   84 - 87   2010.2

     More details

    Authorship:Lead author, Corresponding author   Language:Japanese   Publishing type:Internal/External technical report, pre-print, etc.   Publisher:日本電気  

    CiNii Books

    researchmap

  • 法廷における音声認識システムの開発-音響モデル及び言語モデル-

    谷真宏, 北出祐, 江森正, 大西祥史, 越仲孝文, 佐藤研治

    日本音響学会研究発表会講演論文集(CD-ROM)   2010   2010

  • 法廷における音声認識システムの開発-システム概要-

    越仲孝文, 江森正, 大西祥史, 北出祐, 谷真宏, 佐藤研治

    日本音響学会研究発表会講演論文集(CD-ROM)   2010   2010

  • 法廷における音声認識システムの開発-オンライン話者適応の構成-

    大西祥史, 江森正, 谷真宏, 北出祐, 長友健太郎, 越仲孝文, 佐藤研治

    日本音響学会研究発表会講演論文集(CD-ROM)   2010   2010

  • 法廷における音声認識システムの開発-閲覧性向上のための諸技術の開発-

    北出祐, 大西祥史, 江森正, 谷真宏, 越仲孝文, 佐藤研治

    日本音響学会研究発表会講演論文集(CD-ROM)   2010   2010

  • 法廷における音声認識システムの開発-複数マイクロフォンを用いた音声検出-

    江森正, 辻川剛範, 大西祥史, 越仲孝文, 谷真宏, 北出祐, 佐藤研治

    日本音響学会研究発表会講演論文集(CD-ROM)   2010   2010

  • Active learning using multiple recognizers for speech recognition

    濱中悠三, 江森正, 越仲孝文, 篠田浩一, 古井貞煕

    電子情報通信学会技術研究報告   109 ( 355(NLC2009 12-32) )   19 - 23   2009.12

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    J-GLOBAL

    researchmap

  • Active learning using multiple recognizers for speech recognition

    HAMANAKA YUZO, EMORI TADASHI, KOSHINAKA TAKAFUMI, SHINODA KOICHI, FURUI SADAOKI

    2009 ( 4 )   1 - 5   2009.12

     More details

  • Online speaker clustering using an ergodic HMM and its application to meeting minute generation

    越仲孝文, 長友健太郎, 佐藤研治

    電子情報通信学会技術研究報告   3rd ( 376(MVE2009 79-129) )   53 - 58   2009

     More details

  • 音声認識のためのコミッティを用いた能動学習

    濱中悠三, 江森正, 越仲孝文, 篠田浩一, 古井貞熙

    日本音響学会研究発表会講演論文集(CD-ROM)   2009   2009

  • エルゴードHMMのインクリメンタル学習によるオンライン話者クラスタリング

    越仲孝文, 長友健太郎, 佐藤研治

    日本音響学会研究発表会講演論文集(CD-ROM)   2008   2008

  • Speaker Selection for Unsupervised Speaker Adaptation based on HMM Sufficient Statistics

    谷真宏, 江森正, 大西祥史, 越仲孝文, 篠田浩一

    情報処理学会研究報告   2007 ( 129(SLP-69) )   85 - 89   2007.12

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    We propose a new speaker selection method for the unsupervised speaker adaptation based on HMM sufficient statistics. The adaptation technique of using HMM sufficient statistics has been proposed as one of the rapid unsupervised speaker adaptation techniques in speech recognition. The procedure is as follows:First the training speakers acoustically close to the test speaker are selected. Then, the acoustic model is trained using the HMM sufficient statistics of these selected training speakers. In this technique, the number of selected training speakers is always constant.In our proposed speaker selection method, the number of speakers is determined by the distances between the test speaker and each training speaker. In our recognition experiments using spoken dialogue data, the proposed method improved word accuracy by 0.74 points. It was confirmed that the proposed method particularly effective when there are not many training speakers around the test speaker in acoustic space.

    CiNii Books

    J-GLOBAL

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00056768/

  • Speaker Selection for Unsupervised Speaker Adaptation based on HMM Sufficient Statistics

    TANI Masahiro, EMORI Tadashi, OHNISHI Yoshifumi, KOSHINAKA Takafumi, SHINODA Koichi

    IEICE technical report   107 ( 406 )   85 - 89   2007.12

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    We propose a new speaker selection method for the unsupervised speaker adaptation based on HMM sufficient statistics. The adaptation technique of using HMM sufficient statistics has been proposed as one of the rapid unsupervised speaker adaptation techniques in speech recognition. The procedure is as follows: First the training speakers acoustically close to the test speaker are selected. Then, the acoustic model is trained using the HMM sufficient statistics of these selected training speakers. In this technique, the number of selected training speakers is always constant. In our proposed speaker selection method, the number of speakers is determined by the distances between the test speaker and each training speaker. In our recognition experiments using spoken dialogue data, the proposed method improved word accuracy by 0.74 points. It was confirmed that the proposed method particularly effective when there are not many training speakers around the test speaker in acoustic space.

    CiNii Books

    researchmap
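
The two reports above select adaptation speakers by their acoustic distance to the test speaker, letting the number of selected speakers vary with those distances instead of fixing it. The sketch below is a generic version of that selection rule; the representative vectors and Euclidean distance are simplifying assumptions, not the sufficient-statistics-based distance used in the reports.

```python
# Distance-based training-speaker selection (simplified illustration; real systems use
# distances computed from GMM/HMM sufficient statistics rather than raw mean vectors).
import numpy as np

def select_training_speakers(test_vector, train_vectors, threshold):
    """train_vectors: dict speaker_id -> representative vector for that training speaker.

    Unlike fixed top-N selection, the number of selected speakers varies with how many
    training speakers lie within `threshold` of the test speaker."""
    distances = {spk: float(np.linalg.norm(test_vector - vec))
                 for spk, vec in train_vectors.items()}
    return sorted((spk for spk, d in distances.items() if d <= threshold),
                  key=distances.get)
```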

  • WEB文書を活用したニュース映像検索システム

    寺尾真, 越仲孝文, 安藤真一, 磯谷亮輔, 奥村明俊

    音声ドキュメント処理ワークショップ講演論文集   1st   2007

  • G_010 An audio-visual information retrieval system using related text documents

    TERAO Makoto, KOSHINAKA Takafumi, ANDO Shinichi, ISOTANI Ryosuke, OKUMURA Akitoshi

    FIT 2006 ( 2 )   373 - 374   2006.8

     More details

    Language:Japanese   Publisher:Forum on Information Technology  

    CiNii Books

    J-GLOBAL

    researchmap

  • 話し言葉における発話速度を隠れ変数にもつ継続時間長モデル

    越仲孝文

    日本音響学会研究発表会講演論文集   2005   2005

  • An HMM - based text segmentation method using variational Bayes approach

    KOSHINAKA Takafumi, ISO Ken-ichi, OKUMURA Akitoshi

    IPSJ SIG Notes   2004 ( 57 )   49 - 54   2004.5

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    This paper presents a new text segmentation method based on stochastic modeling. When supposing a generative model of a text document to be a discrete left-to-right hidden Markov model (HMM), a transition between topics in the text document corresponds to a state transition in the HMM, and text segmentation can be formulated as model parameter estimation using the text document. Compared to the traditional maximum likelihood approach, advantage of the Bayes approach (Variational Bayes) is shown by some experiments, which evaluate segmentation accuracy in segmenting Japanese broadcast news programs into each news article. Comparison between the proposed method and a conventional method, well-known Hearst's method, is also presented in this paper. The comparison shows the proposed method to be encouraging.

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1001/00057136/

  • An HMM-based text segmentation method using variational Bayes approach

    越仲孝文, 磯健一, 奥村明俊

    電子情報通信学会技術研究報告   104 ( 87(SP2004 15-18) )   19 - 24   2004.5

     More details

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    This paper presents a new text segmentation method based on stochastic modeling. When supposing a generative model of a text document to be a discrete left-to-right hidden Markov model (HMM), a transition between topics in the text document corresponds to a state transition in the HMM, and text segmentation can be formulated as model parameter estimation using the text document. Compared to the traditional maximum likelihood approach, advantage of the Bayes approach (Variational Bayes) is shown by some experiments, which evaluate segmentation accuracy in segmenting Japanese broadcast news programs into each news article. Comparison between the proposed method and a conventional method, well-known Hearst's method, is also presented in this paper. The comparison shows the proposed method to be encouraging.

    CiNii Books

    J-GLOBAL

    researchmap

  • HMMの変分ベイズ学習によるテキストの話題分割法の検討

    越仲孝文, 磯健一

    日本音響学会研究発表会講演論文集   2004   2004

  • A Handwritten Word Recognition Method Using Context Dependency with Continuous HMM.

    越仲孝文, 西脇大輔, 山田敬嗣

    電子情報通信学会技術研究報告   99 ( 649(PRMU99 231-245) )   2000

  • 文字パタン間の依存性を考慮した文字列の学習と認識

    越仲孝文, 西脇大輔, 山田敬嗣

    電子情報通信学会大会講演論文集   1999   1999

  • A Slant Correction Method for Character Strings Based on Certainty Measure to Slant Estimation.

    越仲孝文, 西脇大輔, 山田敬嗣

    電子情報通信学会大会講演論文集   1997   1997

  • Handwritten Kana Recognition using Inverse Recall Neuralnets.

    越仲孝文, 西脇大輔, 山田敬嗣

    電子情報通信学会大会講演論文集   1996 ( Society D )   1996

  • A Segmentation and Recognition Method for Specific Chinese Numerics and Symbols.

    越仲孝文, 西脇大輔, 山田敬嗣

    電子情報通信学会大会講演論文集   1995 ( Sogo Pt 7 )   1995


Presentations

  • 機械学習を用いた胸部X線画像左右反転防止システム開発の検討

    岡田圭伍, 越仲孝文, 平野高望, 本寺哲一, 安田光慶, 加藤京一

    第39回日本診療放射線技師学術大会  2023.10 

     More details

    Event date: 2023.9 - 2023.10

    Language:Japanese   Presentation type:Oral presentation (general)  

    researchmap

  • NECシンガポール研究所と音声・音響解析への取組み Invited

    谷 真宏, 仙田 裕三, 近藤 玲史, 越仲 孝文

    情報処理学会音声言語処理研究会(SIG-SLP)  2015.10 

     More details

    Language:Japanese   Presentation type:Oral presentation (invited, special)  

    researchmap

  • 音で耳を測る,新しい個人認証技術 Invited

    越仲 孝文

    センシング技術応用研究会 第201回研究例会  2017.11 

     More details

    Language:Japanese   Presentation type:Oral presentation (invited, special)  

    researchmap

  • インダストリーセッション Invited

    庄境 誠, 西村 雅史, 大淵 康成, 河村 聡典, 越仲 孝文

    情報処理学会音声言語情報処理研究会(SIG-SLP)  2014.3 

     More details

    Language:Japanese   Presentation type:Symposium, workshop panel (nominated)  

    researchmap

  • 話者認識技術の現状と課題 Invited

    小川 哲司, 長内 隆, 黒岩 眞吾, 越仲 孝文, 篠田 浩一, 西田 昌史

    電子情報通信学会音声研究会(SP)  2013.3 

     More details

    Language:Japanese   Presentation type:Symposium, workshop panel (nominated)  

    researchmap

  • 音で耳を測る,新しい個人認証技術 Invited

    越仲 孝文, 矢野 昌平

    第6回バイオメトリクスと認識・認証シンポジウム (SBRA2016)  2016.11 

     More details

    Language:Japanese   Presentation type:Oral presentation (invited, special)  

    researchmap


Awards

  • 学術奨励賞 (IEICE Young Researcher's Award)

    2000.3   The Institute of Electronics, Information and Communication Engineers (IEICE)  

     More details

Research Projects

  • 音声に内在する個人性の言語的側面に関する研究 (Research on linguistic aspects of the individuality inherent in speech)

    Grant number:21K11967  2021.4 - 2024.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (C)

    KOSHINAKA Takafumi

      More details

    Grant amount: ¥4,160,000 ( Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000 )

    This project investigates a largely unexplored aspect of the individuality contained in speech: linguistic individuality, that is, the characteristics of a writer that appear in textual information. The results are expected to be useful, for example, for preventing crimes such as impersonation in voice calls and online posts.
    In the first year, we focused on building a baseline system for a document classification problem in which the author of a text is predicted from the text itself. Specifically, we implemented a program that extracts features from text and a program that classifies those features into predefined author classes. The former extracts TF-IDF features based on the occurrence frequencies of tokens, the basic units of text; the latter is a classifier based on logistic regression or a multilayer perceptron (MLP). We also built end-to-end systems based on deep neural networks that integrate feature extraction and classification, including a bidirectional recurrent neural network with long short-term memory (LSTM) units and a Transformer with an attention mechanism. The end-to-end systems can also yield distributed representations (embedding vectors) of an input text from the hidden layers of the network. (A minimal illustrative sketch of such a TF-IDF baseline appears after this entry.)
    From the public Aozora Bunko dataset, we selected ten well-known authors with large numbers of works and carried out paragraph-level classification experiments on their Japanese works, using roughly 33,000 paragraphs in total. The deep-neural-network-based system achieved the highest classification accuracy, 65%, well above the 52% of the conventional system using TF-IDF features. Related results are scheduled for presentation at the Annual Conference of the Japanese Society for Artificial Intelligence (JSAI2022).
    To speed up the experiments, we purchased one GPU server equipped with an NVIDIA RTX A6000. We also obtained spoken-language data from the Linguistic Data Consortium (LDC) in preparation for future publications at international conferences and in journals.

    researchmap
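
A minimal sketch of the kind of TF-IDF baseline described in the project summary above, assuming scikit-learn: token (here, character n-gram) TF-IDF features feed a logistic-regression author classifier. All data and names are placeholders; this is not the project's actual code.

```python
# Minimal authorship-classification baseline: TF-IDF features + logistic regression (illustrative sketch).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder data: paragraphs and their author labels (e.g., drawn from Aozora Bunko).
paragraphs = ["...paragraph by author A...", "...paragraph by author B...",
              "...another paragraph by author A...", "...another paragraph by author B..."]
authors = ["author_A", "author_B", "author_A", "author_B"]

baseline = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 3)),  # character n-grams suit Japanese text
    LogisticRegression(max_iter=1000),
)
baseline.fit(paragraphs, authors)                          # train the author classifier
print(baseline.predict(["...unseen paragraph..."]))        # predict the author of new text
```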

  • Improvement of likelihood ratio measurement in a forensic speaker identification based on Bayesian statistics

    Grant number:21510185  2009 - 2012

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (C)

    OSANAI Takashi, KAMADA Toshiaki, MAKINAE Hisanori, AMINO Kanae, ROSE Phil

      More details

    Grant amount: ¥4,290,000 ( Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000 )

    In forensic science, in order to support sound decisions by judges, it is important to show the degree of possibility that a suspect is the offender. The likelihood ratio based on Bayesian statistics is widely used to express this possibility, and in recent years it has also been applied to forensic speaker recognition. Conventional methods, however, use only part of the given speech data. In this study, we proposed a likelihood-ratio measurement that makes full use of the given speech data and confirmed its effectiveness.

    researchmap

Teaching Experience

  • Data Mining

    2021.4 Institution:Yokohama City University

     More details

  • Automatic Speech Recognition

    2020.12 Institution:Takushoku University

     More details

  • Statistics and Probability Theory

    2020.9 Institution:Yokohama City University

     More details

  • Advanced Natural Language Processing

    2020.9 Institution:Yokohama City University

     More details

  • Speech Information Processing

    2019.12 Institution:Hosei University

     More details

  • Advanced Artificial Intelligence

    2019.11 - 2020.11 Institution:Kyoto University

     More details

  • Advanced Data Science

    2017.11 - 2020.11 Institution:Kobe University

     More details


Academic Activities

  • International Joint Conference on Neural Networks (IJCNN)

    Role(s): Peer review

    IEEE  2025.3

     More details

  • ACM Transactions on Multimedia Computing Communications and Applications

    Role(s): Peer review

    Association for Computing Machinery (ACM)  2023.5

     More details

    Type:Peer review 

    researchmap

  • IEEE BigData2022 Local Arrangement Co-chair

    Role(s): Planning, management, etc.

    IEEE Computer Society  2022.12

     More details

    Type:Academic society, research group, etc. 

    researchmap

  • ICASSP 2022 Session Chair

    Role(s): Panel moderator, session chair, etc.

    IEEE Signal Processing Society  2022.5

     More details

    Type:Academic society, research group, etc. 

    researchmap

  • APSIPA ASC 2021 Sponsorship Co-chair

    Role(s): Planning, management, etc.

    Asia-Pacific Signal and Information Processing Association (APSIPA)  2021.12

     More details

    Type:Academic society, research group, etc. 

    researchmap

  • ICASSP 2021 Session Chair

    Role(s): Panel moderator, session chair, etc.

    IEEE Signal Processing Society  2021.6

     More details

    Type:Academic society, research group, etc. 

    researchmap

  • ICASSP 2020 Session Chair

    Role(s): Panel moderator, session chair, etc.

    IEEE Signal Processing Society  2020.5

     More details

    Type:Academic society, research group, etc. 

    researchmap

  • Computer Speech and Language

    Role(s): Peer review

    International Speech Communication Association (ISCA)  2019.5

     More details

    Type:Peer review 

    researchmap

  • Signal Processing Letters

    Role(s): Peer review

    IEEE Signal Processing Society  2019.4

     More details

    Type:Peer review 

    researchmap

  • Automatic Speech Recognition and Understanding Workshop (ASRU)

    Role(s): Peer review

    IEEE Signal Processing Society  2017.6

     More details

    Type:Peer review 

    researchmap

  • Spoken Language Technology Workshop (SLT)

    Role(s): Peer review

    IEEE Signal Processing Society  2016.6

     More details

    Type:Peer review 

    researchmap

  • IPSJ Journal (情報処理学会論文誌), Reviewer

    Role(s): Peer review

    Information Processing Society of Japan (IPSJ)  2016.5

     More details

    Type:Peer review 

    researchmap

  • International Conference on Audio, Speech, and Signal Processing (ICASSP)

    Role(s): Peer review

    IEEE Signal Processing Society  2015.9

     More details

    Type:Peer review 

    researchmap

  • Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

    Role(s): Peer review

    Asia-Pacific Signal and Information Processing Association (APSIPA)  2015.7

     More details

    Type:Peer review 

    researchmap

  • IEICE Transactions on Information and Systems

    Role(s): Peer review

    The Institute of Electronics, Information and Communication Engineers (IEICE)  2014.6

     More details

    Type:Peer review 

    researchmap

  • Speech Communication

    Role(s): Peer review

    International Speech Communication Association (ISCA)  2013.4

     More details

    Type:Peer review 

    researchmap

  • The Annual Conference of the International Speech Communication Association (INTERSPEECH)

    Role(s): Peer review

    International Speech Communication Association (ISCA)  2010.5

     More details

    Type:Peer review 

    researchmap
