Researcher Profile - Takafumi Koshinaka

All information, except for affiliations, is reprinted from the information registered on researchmap.

Takafumi Koshinaka

Organization

Graduate School of Data Science Department of Data Science Professor
School of Data Science Department of Data Science

Homepage

https://sites.google.com/view/koshinak-lab/

Profile

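Graduated from the Department of Aeronautics, Faculty of Engineering, Kyoto University in 1991; completed the master's program at the Graduate School of Engineering, Kyoto University in 1993; completed the doctoral program at the Graduate School of Information Science and Engineering, Tokyo Institute of Technology in 2013, receiving a Doctor of Engineering degree. Joined NEC in 1993, becoming Principal Researcher in 2006 and Senior Principal Researcher in 2013. Board Member of the Japanese Society for Artificial Intelligence in 2017 and part-time lecturer at the Graduate School of Informatics, Kyoto University in 2019. Professor at the School of Data Science, Yokohama City University since 2020. Research interests include pattern recognition, signal processing, and machine learning.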

Degree

Doctor of Engineering ( 2013.3 Tokyo Institute of Technology )

Research Interests

Pattern recognition
Machine learning
Deep learning
Natural language processing
Automatic speech recognition
Artificial intelligence
Signal processing

Research Areas

Informatics / Intelligent robotics
Informatics / Perceptual information processing
Informatics / Intelligent informatics

Education

Tokyo Institute of Technology Graduate School of Information Science and Engineering Department of Computer Science

2009.10 - 2013.3

Country： Japan

Kyoto University Graduate School of Engineering Department of Aeronautics

1991.4 - 1993.3

Country： Japan

Kyoto University Faculty of Engineering Department of Aeronautics

1987.4 - 1991.3

Country： Japan

Ishikawa Prefectural Kanazawa Izumigaoka High School

1984.4 - 1987.3

Research History

Yokohama City University School of Data Science Deputy Dean

2025.4

Yokohama City University School of Data Science Professor

2020.9

Country：Japan

NEC Corporation Biometrics Research Laboratories Senior Principal Researcher

2018.3 - 2020.8

Country：Japan

NEC Corporation Data Science Research Laboratories Senior Principal Researcher

2016.4 - 2018.3

Country：Japan

NEC Corporation Information and Media Processing Laboratories Senior Principal Researcher

2015.4 - 2018.3

Country：Japan

NEC Corporation Information and Media Processing Laboratories Principal Researcher

2010.4 - 2013.3

Country：Japan

NEC Corporation Common Platform Software Laboratories Principal Researcher

2007.4 - 2010.3

Country：Japan

NEC Corporation Media and Information Research Laboratories Principal Researcher

2006.4 - 2007.3

Country：Japan

Professional Memberships

The Association for Natural Language Processing

2021.2

The Japanese Society for Artificial Intelligence

2017.10

IEEE

2013.3

The Acoustical Society of Japan

2004.12

The Institute of Electronics, Information and Communication Engineers (IEICE)

1993.6

Committee Memberships

ISO/IEC JTC1/SC29 WG1 JP Expert

2021.5

Committee type：Academic society

IEEE BigData2022 Organizing Committee Local Arrangement Co-chair

2020.12 - 2022.12

Committee type：Academic society

The Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA 2021) Sponsorship Co-chair

2019.12 - 2021.12

Committee type：Academic society

The Japanese Society for Artificial Intelligence Representative

2019.6

Committee type：Academic society

The Speaker and Language Recognition Workshop (Odyssey 2020) General Co-chair

2018.6 - 2020.11

Committee type：Academic society

The Japanese Society for Artificial Intelligence Board Member

2017.6 - 2019.6

Committee type：Academic society

Industrial Membership Committee, Asia-Pacific Signal and Information Processing Association (APSIPA) Committee Member

2016.6 - 2018.6

Committee type：Academic society

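The Institute of Electronics, Information and Communication Engineers (IEICE) Technical Committee on Speech Committee Member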

2013.5 - 2017.4

Committee type：Academic society

Papers

An Experimental Study on Text-independent Speaker Verification for Forensic Applications

Shigeki Ozawa, Akira Gotoh, Yuko Saito, Hiroki Matsuura, Takafumi Koshinaka

124 ( 391 ) 34 - 39 2025.3

Authorship：Last author,　Corresponding author Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

Aligning LLMs for Search Engines [検索エンジンを指向したLLMのアラインメント]

益子怜, 木村賢, 越仲孝文

The 31st Annual Meeting of the Association for Natural Language Processing 2025.3

Authorship：Last author,　Corresponding author Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

Reading is Believing: Revisiting Language Bottleneck Models for Image Classification Reviewed

Honori Udo, Takafumi Koshinaka

2024 IEEE International Conference on Image Processing (ICIP) 943 - 949 2024.10

Authorship：Last author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/icip51287.2024.10648091

DOI： 10.60864/n50t-ax16

Editable Virtual Try-On Using Text Prompts

Kosuke Takemoto, Takafumi Koshinaka

2024.5

Authorship：Last author,　Corresponding author Publishing type：Research paper (conference, symposium, etc.)

DOI： 10.11517/pjsai.JSAI2024.0_2C1GS702

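Quality Evaluation of LLM-Generated Content from an SEO Perspective [LLM生成コンテンツのSEO観点での品質評価]

益子怜, 木村賢, 越仲孝文

Proceedings of the Annual Meeting of the Association for Natural Language Processing (Web) 30th 2024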

Authorship：Last author,　Corresponding author Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

Image Captioners Tell More Than Images Given to Them

有働帆乃璃, 越仲孝文

Proceedings of the Annual Conference of JSAI (Web) 37th 2023.6

Authorship：Last author,　Corresponding author Language：Japanese

Generalized Domain Adaptation Framework for Parametric Back-End in Speaker Recognition Reviewed

Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka

IEEE Transactions on Information Forensics and Security 18 3936 - 3947 2023.6

Authorship：Last author Language：English Publishing type：Research paper (scientific journal) Publisher：Institute of Electrical and Electronics Engineers (IEEE)

DOI： 10.1109/tifs.2023.3287733

Response Generation to Low-Rated Reviews Combined with Sentiment Analysis

益子怜, 越仲孝文

Proceedings of the Annual Conference of JSAI (Web) 37th 2023.6

Authorship：Last author,　Corresponding author Language：Japanese

Analysis of Consumers’ Feedback on a Japanese EC Site Focusing on the Relation between Review Text and Rating

小林義幸, 越仲孝文

Proceedings of the Annual Conference of JSAI (Web) 36th 2022.6

Authorship：Last author,　Corresponding author Language：Japanese

DOI： 10.11517/pjsai.JSAI2022.0_1P5GS602

Task-aware Warping Factors in Mask-based Speech Enhancement Reviewed

Qiongqiong Wang, Kong Aik Lee, Takafumi Koshinaka, Koji Okabe, Hitoshi Yamamoto

European Signal Processing Conference (EUSIPCO 2021) 2021.8

Language：English Publishing type：Research paper (international conference proceedings)

Xi-Vector Embedding for Speaker Recognition Reviewed

Kong Aik Lee, Qiongqiong Wang, Takafumi Koshinaka

IEEE Signal Processing Letters 28 1385 - 1389 2021.7

Authorship：Last author Language：English Publishing type：Research paper (scientific journal) Publisher：Institute of Electrical and Electronics Engineers (IEEE)

DOI： 10.1109/LSP.2021.3091932

Using Multi-Resolution Feature Maps with Convolutional Neural Networks for Anti-Spoofing in ASV Reviewed

Qiongqiong Wang, Kong Aik Lee, Takafumi Koshinaka

Odyssey 2020 The Speaker and Language Recognition Workshop 2020.5

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/odyssey.2020-20

A Generalized Framework for Domain Adaptation of PLDA in Speaker Recognition Reviewed

Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020.5

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/icassp40776.2020.9054113

NEC-TT System for Mixed-Bandwidth and Multi-Domain Speaker Recognition Reviewed

Kong Aik Lee, Hitoshi Yamamoto, Koji Okabe, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda

Computer Speech and Language 61 101033 - 101033 2020.5

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.csl.2019.101033

NEC-TT speaker verification system for SRE'19 CTS challenge

Kong Aik Lee, Koji Okabe, Hitoshi Yamamoto, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Keisuke Ishikawa, Koichi Shinoda

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020- 2227 - 2231 2020

Language：English Publishing type：Research paper (international conference proceedings) Publisher：International Speech Communication Association

DOI： 10.21437/Interspeech.2020-1132

Study on comparison of individuality of ear canal shape

Riki Kimura, Shohei Yano, Rui Fujitsuka, Naoki Wakui, Takayuki Arakawa, Takafumi Koshinaka

148th Audio Engineering Society International Convention 2020

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Audio Engineering Society

The NEC-TT 2018 Speaker Verification System Reviewed

Kong Aik Lee, Hitoshi Yamamoto, Koji Okabe, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda

Interspeech 2019 2019.9

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/interspeech.2019-1517

Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding Reviewed

Hitoshi Yamamoto, Kong Aik Lee, Koji Okabe, Takafumi Koshinaka

Interspeech 2019 2019.9

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/interspeech.2019-1508

Unleashing the Unused Potential of i-Vectors Enabled by GPU Acceleration Reviewed

Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka

Interspeech 2019 2019.9

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/interspeech.2019-1955

The CORAL+ Algorithm for Unsupervised Domain Adaptation of PLDA Reviewed

Kong Aik Lee, Qiongqiong Wang, Takafumi Koshinaka

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019.5

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/icassp.2019.8682852

Feature selection and its evaluation in binaural ear acoustic authentication

Masaki Yasuhara, Shohei Yano, Takayuki Arakawa, Takafumi Koshinaka

AES 146th International Convention 2019

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Audio Engineering Society

Attention Mechanism in Speaker Recognition: What Does it Learn in Deep Speaker Embedding? Reviewed

Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Hitoshi Yamamoto, Takafumi Koshinaka

2018 IEEE Spoken Language Technology Workshop (SLT) 2018.12

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/slt.2018.8639586

Attentive Statistics Pooling for Deep Speaker Embedding Reviewed

Koji Okabe, Takafumi Koshinaka, Koichi Shinoda

Interspeech 2018 2018.9

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/interspeech.2018-993

Ear Acoustic Biometrics Using Inaudible Signals and Its Application to Continuous User Authentication Reviewed

Shivangi Mahto, Takayuki Arakawa, Takafumi Koshinaka

2018 26th European Signal Processing Conference (EUSIPCO) 2018.9

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.23919/eusipco.2018.8553015

DNN Based Speaker Embedding Using Content Information for Text-Dependent Speaker Verification Reviewed

Subhadeep Dey, Takafumi Koshinaka, Petr Motlicek, Srikanth Madikeri

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018.4

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/icassp.2018.8461389

Robust i-vector extraction tightly coupled with voice activity detection using deep neural networks Reviewed

Hitoshi Yamamoto, Koji Okabe, Takafumi Koshinaka

2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2017.12

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/apsipa.2017.8282114

i-Vector Transformation Using a Novel Discriminative Denoising Autoencoder for Noise-Robust Speaker Recognition Reviewed

Shivangi Mahto, Hitoshi Yamamoto, Takafumi Koshinaka

Interspeech 2017 2017.8

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/interspeech.2017-731

Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification Reviewed

Qiongqiong Wang, Takafumi Koshinaka

Interspeech 2017 2017.8

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/interspeech.2017-727

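Improving the Accuracy of Ear Acoustic Authentication by Frequency Spreading of Errors and Averaging [誤差の周波数拡散と加算平均処理による耳音紋認証の精度向上] Reviewed

矢野昌平, 荒川隆行, 越仲孝文, 今岡仁, 入澤英毅

IEICE Transactions on Fundamentals (Japanese Edition) J100-A ( 4 ) 161 - 168 2017.4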

Language：Japanese Publishing type：Research paper (scientific journal)

Fast and accurate personal authentication using ear acoustics Reviewed

Takayuki Arakawa, Takafumi Koshinaka, Shohei Yano, Hideki Irisawa, Ryoji Miyahara, Hitoshi Imaoka

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) 2016.12

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/apsipa.2016.7820886

Domain adaptation using maximum likelihood linear transformation for PLDA-based speaker verification Reviewed

Qiongqiong Wang, Hitoshi Yamamoto, Takafumi Koshinaka

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016.3

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/icassp.2016.7472651

Denoising autoencoder-based speaker feature restoration for utterances of short duration Reviewed

Hitoshi Yamamoto, Takafumi Koshinaka

Interspeech 2015 2015.9

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/interspeech.2017-731

Speech/acoustic analysis technology - Its application in support of public solutions

Takafumi Koshinaka, Osamu Hoshuyama, Yoshifumi Onishi, Ryosuke Isotani, Masahiro Tani

NEC Technical Journal 9 ( 1 ) 82 - 85 2015.1

Language：English Publishing type：Research paper (scientific journal) Publisher：NEC Mediaproducts

Anomaly detection of motors with feature emphasis using only normal sounds Reviewed

Yumi Ono, Yoshifumi Onishi, Takafumi Koshinaka, Soichiro Takata, Osamu Hoshuyama

2013 IEEE International Conference on Acoustics, Speech and Signal Processing 2013.5

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/icassp.2013.6638167

A study on semantic indexing for spoken document retrieval Reviewed

Takafumi Koshinaka

Tokyo Institute of Technology ( 甲第9187号 ) 2013.3

Authorship：Lead author Language：Japanese Publishing type：Doctoral thesis

A noise-robust speech recognition method composed of weak noise suppression and weak Vector Taylor Series Adaptation Reviewed

Shuji Komeiji, Takayuki Arakawa, Takafumi Koshinaka

2012 IEEE Spoken Language Technology Workshop (SLT) 2012.12

Authorship：Last author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/slt.2012.6424205

Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model Reviewed

Takafumi KOSHINAKA, Kentaro NAGATOMO, Koichi SHINODA

IEICE Transactions on Information and Systems E95.D ( 10 ) 2469 - 2478 2012.10

Authorship：Lead author Language：English Publishing type：Research paper (scientific journal) Publisher：Institute of Electronics, Information and Communications Engineers (IEICE)

A novel online speaker clustering method based on a generative model is proposed. It employs an incremental variant of variational Bayesian learning and provides probabilistic (non-deterministic) decisions for each input utterance, on the basis of the history of preceding utterances. It can be expected to be robust against errors in cluster estimation and the classification of utterances, and hence to be applicable to many real-time applications. Experimental results show that it produces 50% fewer classification errors than does a conventional online method. They also show that it is possible to reduce the number of speech recognition errors by combining the method with unsupervised speaker adaptation.

DOI： 10.1587/transinf.e95.d.2469

Efficient Estimation Method of Scaling Factors among Probabilistic Models in Speech Recognition Reviewed

ONISHI Yoshifumi, EMORI Tadashi, KOSHINAKA Takafumi, SHINODA Koichi

The IEICE transactions on information and systems (Japanese edition) J95-D ( 5 ) 1276 - 1285 2012.5

Language：Japanese Publishing type：Research paper (scientific journal) Publisher：The Institute of Electronics, Information and Communication Engineers

Committee-Based Active Learning for Speech Recognition Reviewed

Yuzo HAMANAKA, Koichi SHINODA, Takuya TSUTAOKA, Sadaoki FURUI, Tadashi EMORI, Takafumi KOSHINAKA

IEICE Transactions on Information and Systems E94-D ( 10 ) 2015 - 2023 2011.11

Language：English Publishing type：Research paper (scientific journal) Publisher：Institute of Electronics, Information and Communications Engineers (IEICE)

We propose a committee-based method of active learning for large vocabulary continuous speech recognition. Multiple recognizers are trained in this approach, and the recognition results obtained from these are used for selecting utterances. Those utterances whose recognition results differ the most among recognizers are selected and transcribed. Progressive alignment and voting entropy are used to measure the degree of disagreement among recognizers on the recognition result. Our method was evaluated by using 191-hour speech data in the Corpus of Spontaneous Japanese. It proved to be significantly better than random selection. It only required 63h of data to achieve a word accuracy of 74%, while standard training (i.e., random selection) required 103h of data. It also proved to be significantly better than conventional uncertainty sampling using word posterior probabilities.

DOI： 10.1587/transinf.e94.d.2015

Speech modeling based on committee-based active learning Reviewed

Yuzo Hamanaka, Koichi Shinoda, Sadaoki Furui, Tadashi Emori, Takafumi Koshinaka

2010 IEEE International Conference on Acoustics, Speech and Signal Processing 2010.3

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/icassp.2010.5495650

Online speaker clustering using incremental learning of an ergodic hidden Markov model Reviewed

Takafumi Koshinaka, Kentaro Nagatomo, Koichi Shinoda

2009 IEEE International Conference on Acoustics, Speech and Signal Processing 2009.4

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/icassp.2009.4960528

Open-vocabulary spoken-document retrieval based on query expansion using related web documents Reviewed

Makoto Terao, Takafumi Koshinaka, Shinichi Ando, Ryosuke Isotani, Akitoshi Okumura

Interspeech 2008 2008.9

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ISCA

DOI： 10.21437/interspeech.2017-727

HMM-based text segmentation using variational Bayes learning and its application to audio-visual indexing

Takafumi Koshinaka, Akitoshi Okumura, Ryosuke Isotani

Electronics and Communications in Japan (Part II: Electronics) 90 ( 12 ) 1 - 11 2007.12

Authorship：Lead author Language：English Publishing type：Research paper (scientific journal) Publisher：Wiley

DOI： 10.1002/ecjb.20421

An HMM-Based Text Segmentation Method Using Variational Bayes Inference and Its Application to Audio-Visual Indexing Reviewed

KOSHINAKA Takafumi, OKUMURA Akitoshi, ISOTANI Ryosuke

The IEICE transactions on information and systems J89-D ( 9 ) 2113 - 2122 2006.9

Authorship：Lead author Language：Japanese Publishing type：Research paper (scientific journal) Publisher：The Institute of Electronics, Information and Communication Engineers

An HMM-based Text Segmentation Method Using Variational Bayes Approach and Its Application to LVCSR for Broadcast News Reviewed

Takafumi Koshinaka, Ken-ichi Iso, Akitoshi Okumura

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05) 2005.3

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/icassp.2005.1415156

A Stochastic Model for Handwritten Word Recognition Using Context Dependency Between Character Patterns Reviewed

Takafumi Koshinaka, Daisuke Nishiwaki, Keiji Yamada

The 6th International Conference on Document Analysis and Recognition (ICDAR 2001) 2001.9

Authorship：Lead author Language：English Publishing type：Research paper (international conference proceedings)

Pressure waves in a separated gas-liquid layer in a horizontal duct with a step Reviewed

Takafumi Koshinaka, Shigeki Morioka

Fluid Dynamics Research 12 ( 6 ) 323 - 333 1993.12

Authorship：Lead author Language：English Publishing type：Research paper (scientific journal) Publisher：IOP Publishing

DOI： 10.1016/0169-5983(93)90034-8

MISC

Reading Is Believing: Revisiting Language Bottleneck Models for Image Classification

Honori Udo, Takafumi Koshinaka

2024.6

We revisit language bottleneck models as an approach to ensuring the
explainability of deep learning models for image classification. Because of
inevitable information loss incurred in the step of converting images into
language, the accuracy of language bottleneck models is considered to be
inferior to that of standard black-box models. Recent image captioners based on
large-scale foundation models of Vision and Language, however, have the ability
to accurately describe images in verbal detail to a degree that was previously
believed to not be realistically possible. In a task of disaster image
classification, we experimentally show that a language bottleneck model that
combines a modern image captioner with a pre-trained language model can achieve
image classification accuracy that exceeds that of black-box models. We also
demonstrate that a language bottleneck model and a black-box model may be
thought to extract different features from images and that fusing the two can
create a synergistic effect, resulting in even higher classification accuracy.

Other Link： http://arxiv.org/pdf/2406.15816v1
Generalized domain adaptation framework for parametric back-end in speaker recognition

Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka

2023.5

State-of-the-art speaker recognition systems comprise a speaker embedding
front-end followed by a probabilistic linear discriminant analysis (PLDA)
back-end. The effectiveness of these components relies on the availability of a
large amount of labeled training data. In practice, it is common for the domain
(e.g., language, channel, demographic) in which a system is deployed to differ
from that in which the system has been trained. To close the resulting gap,
domain adaptation is often essential for PLDA models. Two variants of PLDA
are heavy-tailed PLDA (HT-PLDA) and Gaussian PLDA (G-PLDA). Though the former
better fits real feature spaces than does the latter, its popularity has been
severely limited by its computational complexity and, especially, by the
difficulty it presents in domain adaptation, which results from its
non-Gaussian property. Various domain adaptation methods have been proposed for
G-PLDA. This paper proposes a generalized framework for domain adaptation that
can be applied to both of the above variants of PLDA for speaker recognition.
It not only includes several existing supervised and unsupervised domain
adaptation methods but also makes possible more flexible usage of available
data in different domains. In particular, we introduce here two new techniques:
(1) correlation-alignment at the model level, and (2) covariance
regularization. To the best of our knowledge, this is the first proposed
application of such techniques for domain adaptation w.r.t. HT-PLDA. The
efficacy of the proposed techniques has been experimentally validated on NIST
2016, 2018, and 2019 Speaker Recognition Evaluation (SRE'16, SRE'18, and
SRE'19) datasets.

Other Link： http://arxiv.org/pdf/2305.15567v1
Image Captioners Sometimes Tell More Than Images They See

Honori Udo, Takafumi Koshinaka

2023.5

Image captioning, a.k.a. "image-to-text," which generates descriptive text
from given images, has been rapidly developing throughout the era of deep
learning. To what extent is the information in the original image preserved in
the descriptive text generated by an image captioner? To answer that question,
we have performed experiments involving the classification of images from
descriptive text alone, without referring to the images at all, and compared
results with those from standard image-based classifiers. We have evaluated
several image captioning models with respect to a disaster image classification
task, CrisisNLP, and show that descriptive text classifiers can sometimes
achieve higher accuracy than standard image-based classifiers. Further, we show
that fusing an image-based classifier with a descriptive text classifier can
provide improvement in accuracy.

Other Link： http://arxiv.org/pdf/2305.02932v2
Report on the International Conference Odyssey 2020 [国際会議 Odyssey 2020 開催報告] Invited

越仲孝文, リーコンエイク, 篠田浩一

IEICE Information and Systems Society Journal 26 ( 2 ) 23 - 24 2021.8

Authorship：Lead author Language：Japanese Publishing type：Meeting report

Linear Discriminant Analysis Considering Worst-case Variance Ratio and Its Application to Ear Acoustic Authentication

伊藤良峻, 越仲孝文

Proceedings of the Meeting of the Acoustical Society of Japan (CD-ROM) 2020 2020

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Chenglin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans

2019.4

The I4U consortium was established to facilitate a joint entry to NIST
speaker recognition evaluations (SRE). The latest edition of such joint
submission was in SRE 2018, in which the I4U submission was among the
best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U
consortium's participation in the NIST SRE series of evaluations. The primary
objective of the current paper is to summarize the results and lessons learned
based on the twelve sub-systems and their fusion submitted to SRE'18. It is also
our intention to present a shared view on the advancements, progress, and major
paradigm shifts that we have witnessed as an SRE participant in the past decade
from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm
shift from supervector representation to deep speaker embedding, and a switch
of research challenge from channel compensation to domain adaptation.

Other Link： http://arxiv.org/pdf/1904.07386v1
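Ear Acoustic Authentication Technology That Identifies Individuals Using Sounds Inaudible to the Human Ear [人間の耳には聴こえない音で個人を識別する耳音響認証技術] Invited

荒川隆行, 越仲孝文

Monthly Automatic Recognition (Gekkan Jido Ninshiki) 2019.3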

Authorship：Last author Language：Japanese Publishing type：Article, review, commentary, editorial, etc. (trade magazine, newspaper, online media)

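A Safe, Secure, and Convenient Society Enabled by Voice Authentication Technology (Special Issue on Social Value Creation Using Biometrics) [声認証技術がもたらす安全・安心で便利な社会 (バイオメトリクスを用いた社会価値創造特集)] Invited

越仲孝文, リーコンエイク

NEC Giho (NEC Technical Journal, Japanese edition) 71 ( 2 ) 2019.3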

Authorship：Lead author,　Corresponding author Language：Japanese Publishing type：Internal/External technical report, pre-print, etc.

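Comparative Evaluation of Speaker Verification Methods Using Speaker Clustering on NIST SRE18 [話者クラスタリングを用いた話者照合手法のNIST SRE18における比較評価]

GUO Ling, 山本仁, 岡部浩司, 越仲孝文

Proceedings of the Meeting of the Acoustical Society of Japan (CD-ROM) 2019 2019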

A study of observation fluctuation reduction method for ear acoustic authentication

安原雅貴, 荒川隆行, 越仲孝文, 矢野昌平

Proceedings of the Annual Conference of JSAI (Web) 33rd 2019

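Speaker Verification Using Speaker Clustering Optimized for Single-Speaker Detection [単一話者検出に最適化した話者クラスタリングを用いる話者照合]

GUO Ling, 山本仁, 越仲孝文

Proceedings of the Meeting of the Acoustical Society of Japan (CD-ROM) 2019 2019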

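Speaker Verification Based on Score Fusion in Multi-Speaker Environments [複数の話者が混在する環境下のスコア統合に基づく話者照合]

GUO Ling, 山本仁, LEE Kong Aik, 越仲孝文

Proceedings of the Meeting of the Acoustical Society of Japan (CD-ROM) 2018 2018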

PROMISING TECHNOLOGY: Technique of ear acoustic authentication Invited

37 ( 439 ) 18 - 22 2017.10

Language：Japanese Publishing type：Article, review, commentary, editorial, etc. (trade magazine, newspaper, online media)

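Initiatives and Outlook for Human-Oriented IoT Solutions Based on Hearable Technology (Special Issue on IoT Supporting Digital Business) [ヒアラブル技術によるヒューマン系IoTソリューションの取り組みと展望 (デジタルビジネスを支えるIoT特集)] Invited

古谷聡, 越仲孝文, 大杉孝司

NEC Giho (NEC Technical Journal, Japanese edition) 70 ( 1 ) 47 - 51 2017.9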

Language：Japanese Publishing type：Internal/External technical report, pre-print, etc. Publisher：NEC Corporation

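Age Estimation Using Weighted Dimensionality Reduction of i-Vectors and Piecewise Regression [i-vectorの重み付き次元圧縮と区分回帰による年齢推定]

児島一郁, 山本仁, 越仲孝文

Proceedings of the Meeting of the Acoustical Society of Japan (CD-ROM) 2016 2016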

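High-Accuracy Personal Authentication Using Ear Canal Acoustic Characteristics [外耳道音響特性を用いた高精度個人認証]

荒川隆行, 矢野昌平, 越仲孝文, 入澤英毅, 今岡仁

Proceedings of the Meeting of the Acoustical Society of Japan (CD-ROM) 2016 2016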

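Speech and Acoustic Analysis Technology and Its Application to Public Solutions (Special Issue on Public Solutions Supporting Social Safety and Security) [音声・音響分析技術とパブリックソリューションへの応用 (社会の安全・安心を支えるパブリックソリューション特集)] Invited

越仲孝文, 宝珠山治, 大西祥史, 磯谷亮介, 谷真宏

NEC Giho (NEC Technical Journal, Japanese edition) 67 ( 1 ) 86 - 89 2014.11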

Authorship：Lead author,　Corresponding author Language：Japanese Publishing type：Internal/External technical report, pre-print, etc. Publisher：NEC Corporation

CiNii Books

researchmap
正常音スペクトルモデルに基づく機器異常検知方式における特徴量強調の効果

小野友督, 宝珠山治, 大西祥史, 越仲孝文

日本音響学会研究発表会講演論文集(CD-ROM) 2014 2014

　More details

J-GLOBAL

researchmap
話者認識の国際動向 (小特集: 話者認識に関する研究の動向) Invited Reviewed

越仲孝文, 篠田浩一

日本音響学会誌 69 ( 7 ) 2013.7

　More details

Authorship：Lead author Language：Japanese Publishing type：Article, review, commentary, editorial, etc. (scientific journal)

researchmap
Current situations and issues of speaker recognition technologies

網野加苗, 石原俊一, 小川哲司, 長内隆, 黒岩眞吾, 越仲孝文, 篠田浩一, 柘植覚, 西田昌史, 松井知子, WANG Longbiao

電子情報通信学会技術研究報告 112 ( 450(SP2012 115-131) ) 2013

　More details

J-GLOBAL

researchmap
GMM-SVMによるテキスト非依存話者識別

谷真宏, 大西祥史, 越仲孝文

日本音響学会研究発表会講演論文集(CD-ROM) 2013 2013

　More details

J-GLOBAL

researchmap
正常音の知識のみを利用した機器の異常検知

小野友督, 大西祥史, 越仲孝文, 高田宗一朗

日本音響学会研究発表会講演論文集(CD-ROM) 2012 2012

　More details

J-GLOBAL

researchmap
音声・映像情報の構造化と検索 (小特集: 音声・映像認識連携への取り組み) Invited Reviewed

越仲孝文, 大網亮磨, 細見格, 今岡仁

情報処理 52 ( 1 ) 2011.10

　More details

Authorship：Lead author,　Corresponding author Language：Japanese Publishing type：Article, review, commentary, editorial, etc. (scientific journal)

researchmap
雑音抑圧法とモデル適応法の重み付き組み合わせに基づく耐雑音音声認識手法

古明地秀治, 荒川隆行, 越仲孝文

日本音響学会研究発表会講演論文集(CD-ROM) 2011 2011

　More details

J-GLOBAL

researchmap
複数マイクロフォンを用いた音声区間検出

大西祥史, 越仲孝文, 篠田浩一

日本音響学会研究発表会講演論文集(CD-ROM) 2011 2011

　More details

J-GLOBAL

researchmap
A hybrid method of noise suppression and model adaptation for robust speech recognition

IEICE technical report 110 ( 356 ) 49 - 54 2010.12

　More details

Language：Japanese

researchmap
A Hybrid method of Noise Suppression and Model Adaptation for Robust Speech Recognition

古明地秀治, 荒川隆行, 越仲孝文

電子情報通信学会技術研究報告 110 ( 357(SP2010 88-102) ) 49 - 54 2010.12

　More details

Language：Japanese

J-GLOBAL

researchmap
A Hybrid method of Noise Suppression and Model Adaptation for Robust Speech Recognition

2010 ( 9 ) 1 - 6 2010.12

　More details

Language：Japanese

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00071573/
オンライン話者クラスタリング技術と議事録作成支援への応用 (音声認識ソリューション・製品特集) Invited

越仲孝文, 長友健太郎

NEC技報 63 ( 1 ) 84 - 87 2010.2

　More details

Authorship：Lead author,　Corresponding author Language：Japanese Publishing type：Internal/External technical report, pre-print, etc. Publisher：日本電気

CiNii Books

researchmap
裁判員裁判向け音声認識システム (音声認識ソリューション・製品特集) Invited

越仲孝文, 江森正, 大西祥史

NEC技報 63 ( 1 ) 41 - 90 2010.2

　More details

Authorship：Lead author,　Corresponding author Language：Japanese Publishing type：Internal/External technical report, pre-print, etc. Publisher：日本電気

CiNii Books

researchmap
法廷における音声認識システムの開発-システム概要-

越仲孝文, 江森正, 大西祥史, 北出祐, 谷真宏, 佐藤研治

日本音響学会研究発表会講演論文集(CD-ROM) 2010 2010

　More details

J-GLOBAL

researchmap
法廷における音声認識システムの開発-音響モデル及び言語モデル-

谷真宏, 北出祐, 江森正, 大西祥史, 越仲孝文, 佐藤研治

日本音響学会研究発表会講演論文集(CD-ROM) 2010 2010

　More details

J-GLOBAL

researchmap
法廷における音声認識システムの開発-複数マイクロフォンを用いた音声検出-

江森正, 辻川剛範, 大西祥史, 越仲孝文, 谷真宏, 北出祐, 佐藤研治

日本音響学会研究発表会講演論文集(CD-ROM) 2010 2010

　More details

J-GLOBAL

researchmap
法廷における音声認識システムの開発-閲覧性向上のための諸技術の開発-

北出祐, 大西祥史, 江森正, 谷真宏, 越仲孝文, 佐藤研治

日本音響学会研究発表会講演論文集(CD-ROM) 2010 2010

　More details

J-GLOBAL

researchmap
法廷における音声認識システムの開発-オンライン話者適応の構成-

大西祥史, 江森正, 谷真宏, 北出祐, 長友健太郎, 越仲孝文, 佐藤研治

日本音響学会研究発表会講演論文集(CD-ROM) 2010 2010

　More details

J-GLOBAL

researchmap
Active learning using multiple recognizers for speech recognition

濱中悠三, 江森正, 越仲孝文, 篠田浩一, 古井貞煕

電子情報通信学会技術研究報告 109 ( 355(NLC2009 12-32) ) 19 - 23 2009.12

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

J-GLOBAL

researchmap
Active learning using multiple recognizers for speech recognition

HAMANAKA YUZO, EMORI TADASHI, KOSHINAKA TAKAFUMI, SHINODA KOICHI, FURUI SADAOKI

2009 ( 4 ) 1 - 5 2009.12

　More details

Language：Japanese

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00067046/
音声認識のためのコミッティを用いた能動学習

濱中悠三, 江森正, 越仲孝文, 篠田浩一, 古井貞熙

日本音響学会研究発表会講演論文集(CD-ROM) 2009 2009

　More details

J-GLOBAL

researchmap
Online speaker clustering using an ergodic HMM and its application to meeting minute generation

越仲孝文, 長友健太郎, 佐藤研治

電子情報通信学会技術研究報告 3rd ( 376(MVE2009 79-129) ) 53 - 58 2009

　More details

Language：Japanese

CiNii Books

J-GLOBAL

researchmap
エルゴードHMMのインクリメンタル学習によるオンライン話者クラスタリング

越仲孝文, 長友健太郎, 佐藤研治

日本音響学会研究発表会講演論文集(CD-ROM) 2008 2008

　More details

J-GLOBAL

researchmap
Speaker Selection for Unsupervised Speaker Adaptation based on HMM Sufficient Statistics

谷真宏, 江森正, 大西祥史, 越仲孝文, 篠田浩一

情報処理学会研究報告 2007 ( 129(SLP-69) ) 85 - 89 2007.12

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

We propose a new speaker selection method for unsupervised speaker adaptation based on HMM sufficient statistics. Adaptation using HMM sufficient statistics has been proposed as one of the rapid unsupervised speaker adaptation techniques in speech recognition. The procedure is as follows: First, the training speakers acoustically close to the test speaker are selected. Then, the acoustic model is trained using the HMM sufficient statistics of these selected training speakers. In this technique, the number of selected training speakers is always constant. In our proposed speaker selection method, the number of speakers is determined by the distances between the test speaker and each training speaker. In our recognition experiments using spoken dialogue data, the proposed method improved word accuracy by 0.74 points. It was confirmed that the proposed method is particularly effective when there are not many training speakers around the test speaker in acoustic space.

CiNii Books

J-GLOBAL

researchmap

Other Link： http://id.nii.ac.jp/1001/00056768/
Speaker Selection for Unsupervised Speaker Adaptation based on HMM Sufficient Statistics

TANI Masahiro, EMORI Tadashi, OHNISHI Yoshifumi, KOSHINAKA Takafumi, SHINODA Koichi

IEICE technical report 107 ( 406 ) 85 - 89 2007.12

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

We propose a new speaker selection method for unsupervised speaker adaptation based on HMM sufficient statistics. Adaptation using HMM sufficient statistics has been proposed as one of the rapid unsupervised speaker adaptation techniques in speech recognition. The procedure is as follows: First, the training speakers acoustically close to the test speaker are selected. Then, the acoustic model is trained using the HMM sufficient statistics of these selected training speakers. In this technique, the number of selected training speakers is always constant. In our proposed speaker selection method, the number of speakers is determined by the distances between the test speaker and each training speaker. In our recognition experiments using spoken dialogue data, the proposed method improved word accuracy by 0.74 points. It was confirmed that the proposed method is particularly effective when there are not many training speakers around the test speaker in acoustic space.
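The contrast between a fixed-size selection and a distance-driven selection can be sketched as follows. This is only an illustration under stated assumptions: speakers are represented by hypothetical low-dimensional vectors, Euclidean distance stands in for the acoustic distance, and a simple threshold (`max_dist`) stands in for whatever distance-based rule the paper actually uses. None of these names come from the abstract.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_fixed(test_vec, train_vecs, n):
    """Conventional selection: always return the n closest training speakers."""
    ranked = sorted(train_vecs, key=lambda s: euclidean(test_vec, train_vecs[s]))
    return ranked[:n]


def select_by_distance(test_vec, train_vecs, max_dist):
    """Distance-driven selection (assumed rule): keep speakers within max_dist."""
    dists = {s: euclidean(test_vec, v) for s, v in train_vecs.items()}
    return sorted((s for s, d in dists.items() if d <= max_dist), key=dists.get)


# Hypothetical speaker vectors: only two training speakers lie near the test speaker.
train_speakers = {
    "spk1": (0.0, 0.1),
    "spk2": (0.2, 0.0),
    "spk3": (3.0, 3.0),
    "spk4": (4.0, 4.0),
}
test_speaker = (0.0, 0.0)

fixed = select_fixed(test_speaker, train_speakers, n=3)
adaptive = select_by_distance(test_speaker, train_speakers, max_dist=1.0)
print(fixed)     # includes the distant spk3
print(adaptive)  # only the two nearby speakers
```

In this toy setting the fixed-size rule is forced to include a distant speaker, while the distance-driven rule returns fewer speakers, which is the situation the abstract describes as the one where the proposed method helps most.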

CiNii Books

researchmap
WEB文書を活用したニュース映像検索システム

寺尾真, 越仲孝文, 安藤真一, 磯谷亮輔, 奥村明俊

音声ドキュメント処理ワークショップ講演論文集 1st 2007

　More details

J-GLOBAL

researchmap
G_010 An audio-visual information retrieval system using related text documents

TERAO Makoto, KOSHINAKA Takafumi, ANDO Shinichi, ISOTANI Ryosuke, OKUMURA Akitoshi

FIT 2006 ( 2 ) 373 - 374 2006.8

　More details

Language：Japanese Publisher：Forum on Information Technology

CiNii Books

J-GLOBAL

researchmap
話し言葉における発話速度を隠れ変数にもつ継続時間長モデル

越仲孝文

日本音響学会研究発表会講演論文集 2005 2005

　More details

J-GLOBAL

researchmap
An HMM-based text segmentation method using variational Bayes approach

KOSHINAKA Takafumi, ISO Ken-ichi, OKUMURA Akitoshi

IPSJ SIG Notes 2004 ( 57 ) 49 - 54 2004.5

　More details

Language：Japanese Publisher：Information Processing Society of Japan (IPSJ)

This paper presents a new text segmentation method based on stochastic modeling. When the generative model of a text document is assumed to be a discrete left-to-right hidden Markov model (HMM), a transition between topics in the document corresponds to a state transition in the HMM, and text segmentation can be formulated as model parameter estimation using the text document. Experiments that evaluate accuracy in segmenting Japanese broadcast news programs into individual news articles show the advantage of the Bayes approach (variational Bayes) over the traditional maximum likelihood approach. A comparison between the proposed method and a conventional method, the well-known Hearst's method, is also presented. The comparison shows the proposed method to be promising.
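The correspondence between topic changes and state transitions in a left-to-right HMM can be illustrated with a small decoding sketch. It assumes the per-topic word distributions are already known and uses plain Viterbi decoding; the paper instead estimates the model parameters from the document itself with variational Bayes, which this sketch does not attempt. The function name, the `stay` probability, and the `floor` value are illustrative assumptions.

```python
import math


def segment(words, topic_probs, stay=0.9, floor=1e-6):
    """Viterbi decoding of a left-to-right HMM; returns one topic index per word."""
    n_words, n_topics = len(words), len(topic_probs)
    if n_words < n_topics:
        raise ValueError("need at least one word per topic")
    neg_inf = float("-inf")

    def emit(k, w):
        # Log-probability of word w under topic k; unseen words get a small floor.
        return math.log(topic_probs[k].get(w, floor))

    score = [[neg_inf] * n_topics for _ in range(n_words)]
    back = [[0] * n_topics for _ in range(n_words)]
    score[0][0] = emit(0, words[0])  # the path must start in the first topic
    for t in range(1, n_words):
        for k in range(n_topics):
            # Left-to-right: either stay in topic k or arrive from topic k - 1.
            stay_score = score[t - 1][k] + math.log(stay)
            move_score = score[t - 1][k - 1] + math.log(1 - stay) if k > 0 else neg_inf
            if move_score > stay_score:
                score[t][k], back[t][k] = move_score + emit(k, words[t]), k - 1
            else:
                score[t][k], back[t][k] = stay_score + emit(k, words[t]), k
    # The path must end in the last topic; follow back-pointers to recover it.
    path = [n_topics - 1]
    for t in range(n_words - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical two-topic example (weather news followed by sports news).
topics = [{"rain": 0.5, "storm": 0.5}, {"goal": 0.5, "match": 0.5}]
words = "rain storm rain goal match goal".split()
path = segment(words, topics)
boundaries = [t for t in range(1, len(path)) if path[t] != path[t - 1]]
print(path, boundaries)
```

Each position where the decoded state index increases marks a topic boundary, so the segmentation falls out of the state sequence directly.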

CiNii Books

researchmap

Other Link： http://id.nii.ac.jp/1001/00057136/
An HMM-based text segmentation method using variational Bayes approach

越仲孝文, 磯健一, 奥村明俊

電子情報通信学会技術研究報告 104 ( 87(SP2004 15-18) ) 19 - 24 2004.5

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

This paper presents a new text segmentation method based on stochastic modeling. When the generative model of a text document is assumed to be a discrete left-to-right hidden Markov model (HMM), a transition between topics in the document corresponds to a state transition in the HMM, and text segmentation can be formulated as model parameter estimation using the text document. Experiments that evaluate accuracy in segmenting Japanese broadcast news programs into individual news articles show the advantage of the Bayes approach (variational Bayes) over the traditional maximum likelihood approach. A comparison between the proposed method and a conventional method, the well-known Hearst's method, is also presented. The comparison shows the proposed method to be promising.

CiNii Books

J-GLOBAL

researchmap
HMMの変分ベイズ学習によるテキストの話題分割法の検討

越仲孝文, 磯健一

日本音響学会研究発表会講演論文集 2004 2004

　More details

J-GLOBAL

researchmap
A Handwritten Word Recognition Method Using Context Dependency with Continuous HMM.

越仲孝文, 西脇大輔, 山田敬嗣

電子情報通信学会技術研究報告 99 ( 649(PRMU99 231-245) ) 2000

　More details

J-GLOBAL

researchmap
文字パタン間の依存性を考慮した文字列の学習と認識

越仲孝文, 西脇大輔, 山田敬嗣

電子情報通信学会大会講演論文集 1999 1999

　More details

J-GLOBAL

researchmap
A Slant Correction Method for Character Strings Based on Certainty Measure to Slant Estimation.

越仲孝文, 西脇大輔, 山田敬嗣

電子情報通信学会大会講演論文集 1997 1997

　More details

J-GLOBAL

researchmap
Handwritten Kana Recognition using Inverse Recall Neuralnets.

越仲孝文, 西脇大輔, 山田敬嗣

電子情報通信学会大会講演論文集 1996 ( Society D ) 1996

　More details

J-GLOBAL

researchmap
A Segmentation and Recognition Method for Specific Chinese Numerics and Symbols.

越仲孝文, 西脇大輔, 山田敬嗣

電子情報通信学会大会講演論文集 1995 ( Sogo Pt 7 ) 1995

　More details

J-GLOBAL

researchmap


Presentations

機械学習を用いた胸部X線画像左右反転防止システム開発の検討

岡田圭伍, 越仲孝文, 平野高望, 本寺哲一, 安田光慶, 加藤京一

第39回日本診療放射線技師学術大会 2023.10

　More details

Event date： 2023.9 - 2023.10

Language：Japanese Presentation type：Oral presentation (general)

researchmap
NECシンガポール研究所と音声・音響解析への取組み Invited

谷真宏, 仙田裕三, 近藤玲史, 越仲孝文

情報処理学会音声言語処理研究会(SIG-SLP) 2015.10

　More details

Language：Japanese Presentation type：Oral presentation (invited, special)

researchmap
音で耳を測る，新しい個人認証技術 Invited

越仲孝文

センシング技術応用研究会第201回研究例会 2017.11

　More details

Language：Japanese Presentation type：Oral presentation (invited, special)

researchmap
インダストリーセッション Invited

庄境誠, 西村雅史, 大淵康成, 河村聡典, 越仲孝文

情報処理学会音声言語情報処理研究会(SIG-SLP) 2014.3

　More details

Language：Japanese Presentation type：Symposium, workshop panel (nominated)

researchmap
話者認識技術の現状と課題 Invited

小川哲司, 長内隆, 黒岩眞吾, 越仲孝文, 篠田浩一, 西田昌史

電子情報通信学会音声研究会(SP) 2013.3

　More details

Language：Japanese Presentation type：Symposium, workshop panel (nominated)

researchmap
音で耳を測る，新しい個人認証技術 Invited

越仲孝文, 矢野昌平

第6回バイオメトリクスと認識・認証シンポジウム (SBRA2016) 2016.11

　More details

Language：Japanese Presentation type：Oral presentation (invited, special)

researchmap


Awards

学術奨励賞

2000.3 電子情報通信学会

　More details

researchmap

Research Projects

On Visualizing the Text Generation Process of Image Captioners

Grant number：24K15012 2024.4 - 2027.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

　 More details

Grant amount：¥4550000 （ Direct Cost: ¥3500000 、 Indirect Cost：¥1050000 ）

researchmap
音声に内在する個人性の言語的側面に関する研究

Grant number：21K11967 2021.4 - 2024.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

越仲孝文

　 More details

Grant amount：¥4160000 （ Direct Cost: ¥3200000 、 Indirect Cost：¥960000 ）

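This study clarifies the linguistic side of the individuality contained in speech, which has received little research attention so far: the characteristics of the author that appear in text information. The results are useful for, among other things, preventing crimes such as impersonation in voice calls and online posts.
In the first year, we framed the task as a document classification problem of predicting the author from a text and focused on building baseline systems. Specifically, we wrote programs that extract features from text and classify those features into predefined author classes. The former extracts TF-IDF features based on the occurrence frequencies of tokens, the basic units. The latter is a classifier based on logistic regression or a multilayer perceptron (MLP). We also built end-to-end systems based on deep neural networks that integrate feature extraction and classification. These include models such as a bidirectional recurrent neural network (bidirectional RNN) with long short-term memory (LSTM) units and a Transformer with an attention mechanism. In the end-to-end systems, a distributed representation (embedding vector) of the input text can also be obtained from a hidden layer of the neural network.
From the public dataset Aozora Bunko, we selected 10 well-known authors with many works and ran classification experiments on Japanese works at the paragraph level. The total number of paragraphs is about 33,000. The system based on deep neural networks achieved the highest classification accuracy, 65%, well above the 52% of the conventional system using TF-IDF features. Related results are scheduled to be presented at the Annual Conference of the Japanese Society for Artificial Intelligence (JSAI2022).
To make the experiments more efficient, we purchased one GPU server equipped with an NVIDIA RTX A6000. In preparation for future publications at international conferences and in journals, we also obtained speech and language data from the Linguistic Data Consortium (LDC).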
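The baseline pipeline in this report (TF-IDF features followed by a classifier over author classes) can be sketched with the standard library alone. This is a minimal illustration, not the project's code: a nearest-centroid rule with cosine similarity is swapped in for the logistic regression and MLP classifiers named in the report, whitespace-split English tokens stand in for Japanese tokenization, and the author labels and texts are invented toy data.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Smoothed inverse document frequency for each token in the training docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def vectorize(doc, idf):
    """L2-normalized sparse TF-IDF vector; tokens unseen in training are dropped."""
    tf = Counter(tok for tok in doc if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(tok, 0.0) for tok, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(labeled_docs):
    """Return (idf, centroids): one summed TF-IDF vector per author."""
    idf = fit_idf([doc for _, doc in labeled_docs])
    centroids = defaultdict(lambda: defaultdict(float))
    for author, doc in labeled_docs:
        for tok, v in vectorize(doc, idf).items():
            centroids[author][tok] += v
    return idf, centroids


def predict(doc, idf, centroids):
    """Assign the author whose centroid is most similar to the document."""
    vec = vectorize(doc, idf)
    return max(centroids, key=lambda author: cosine(vec, centroids[author]))


# Invented toy corpus with two hypothetical authors.
toy = [
    ("author_a", "the sea was calm and the sea was grey".split()),
    ("author_a", "grey waves on the calm sea".split()),
    ("author_b", "the engine roared and the city woke".split()),
    ("author_b", "city lights and engine noise".split()),
]
idf, centroids = train(toy)
print(predict("calm grey sea".split(), idf, centroids))
print(predict("engine in the city".split(), idf, centroids))
```

Replacing `predict` with a trained logistic regression or MLP over the same TF-IDF vectors would bring the sketch closer to the baseline the report describes.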

researchmap
Improvement of likelihood ratio measurement in a forensic speaker identification based on Bayesian statistics

Grant number：21510185 2009 - 2012

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

OSANAI Takashi, KAMADA Toshiaki, MAKINAE Hisanori, AMINO Kanae, PHIL Rose

　 More details

Grant amount：¥4290000 （ Direct Cost: ¥3300000 、 Indirect Cost：¥990000 ）

In forensic science, it is important to show the degree of possibility that a suspect is the perpetrator, in order to help judges reach an appropriate decision. The likelihood ratio based on Bayesian statistics is widely used to express this possibility. In recent years, research has been carried out on applying this likelihood ratio to forensic speaker recognition. However, conventional methods use only part of the given speech data. In this study, I proposed a likelihood ratio measurement that makes use of all of the given speech data without waste, and confirmed its effectiveness.
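For reference, the likelihood ratio mentioned in this summary is conventionally written as below. The symbols are assumptions introduced here for illustration, not the project's own notation: $E$ is the speech evidence, $H_{\mathrm{same}}$ the hypothesis that the suspect is the speaker, and $H_{\mathrm{diff}}$ the hypothesis that someone else is.

```latex
\mathrm{LR} = \frac{p(E \mid H_{\mathrm{same}})}{p(E \mid H_{\mathrm{diff}})},
\qquad
\frac{P(H_{\mathrm{same}} \mid E)}{P(H_{\mathrm{diff}} \mid E)}
  = \mathrm{LR} \times \frac{P(H_{\mathrm{same}})}{P(H_{\mathrm{diff}})}
```

The second identity is Bayes' theorem in odds form: the likelihood ratio is the factor by which the evidence updates the prior odds, so a value above 1 supports the same-speaker hypothesis and a value below 1 supports the different-speaker hypothesis.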

researchmap

Teaching Experience

Data Mining

2021.4 Institution：Yokohama City University

　More details

researchmap
Automatic Speech Recognition

2020.12 Institution：Takushoku University

　More details

researchmap
Statistics and Probability Theory

2020.9 Institution：Yokohama City University

　More details

researchmap
Advanced Natural Language Processing

2020.9 Institution：Yokohama City University

　More details

researchmap
Speech Information Processing

2019.12 Institution：Hosei University

　More details

researchmap
Advanced Artificial Intelligence

2019.11 - 2020.11 Institution：Kyoto University

　More details

researchmap
Advanced Data Science

2017.11 - 2020.11 Institution：Kobe University

　More details

researchmap


Academic Activities

International Joint Conference on Neural Networks (IJCNN)

Role(s)： Peer review

IEEE 2025.3

　More details

researchmap
ACM Transactions on Multimedia Computing Communications and Applications

Role(s)： Peer review

Association for Computing Machinery (ACM) 2023.5

　More details

Type：Peer review

researchmap
IEEE BigData2022 Local Arrangement Co-chair

Role(s)： Planning, management, etc.

IEEE Computer Society 2022.12

　More details

Type：Academic society, research group, etc.

researchmap
ICASSP 2022 Session Chair

Role(s)： Panel moderator, session chair, etc.

IEEE Signal Processing Society 2022.5

　More details

Type：Academic society, research group, etc.

researchmap
APSIPA ASC 2021 Sponsorship Co-chair

Role(s)： Planning, management, etc.

Asia-Pacific Signal and Information Processing Association (APSIPA) 2021.12

　More details

Type：Academic society, research group, etc.

researchmap
ICASSP 2021 Session Chair

Role(s)： Panel moderator, session chair, etc.

IEEE Signal Processing Society 2021.6

　More details

Type：Academic society, research group, etc.

researchmap
ICASSP 2020 Session Chair

Role(s)： Panel moderator, session chair, etc.

IEEE Signal Processing Society 2020.5

　More details

Type：Academic society, research group, etc.

researchmap
Computer Speech and Language

Role(s)： Peer review

International Speech Communication Association (ISCA) 2019.5

　More details

Type：Peer review

researchmap
Signal Processing Letters

Role(s)： Peer review

IEEE Signal Processing Society 2019.4

　More details

Type：Peer review

researchmap
Automatic Speech Recognition and Understanding Workshop (ASRU)

Role(s)： Peer review

IEEE Signal Processing Society 2017.6

　More details

Type：Peer review

researchmap
Spoken Language Technology Workshop (SLT)

Role(s)： Peer review

IEEE Signal Processing Society 2016.6

　More details

Type：Peer review

researchmap
情報処理学会論文誌査読委員

Role(s)： Peer review

情報処理学会 2016.5

　More details

Type：Peer review

researchmap
International Conference on Audio, Speech, and Signal Processing (ICASSP)

Role(s)： Peer review

IEEE Signal Processing Society 2015.9

　More details

Type：Peer review

researchmap
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

Role(s)： Peer review

Asia-Pacific Signal and Information Processing Association (APSIPA) 2015.7

　More details

Type：Peer review

researchmap
電子情報通信学会英文論文誌D (IEICE Trans. on Inf. & Syst.)

Role(s)： Peer review

電子情報通信学会 2014.6

　More details

Type：Peer review

researchmap
Speech Communication

Role(s)： Peer review

International Speech Communication Association (ISCA) 2013.4

　More details

Type：Peer review

researchmap
The Annual Conference of the International Speech Communication Association (INTERSPEECH)

Role(s)： Peer review

International Speech Communication Association (ISCA) 2010.5

　More details

Type：Peer review

researchmap
