Conference Paper A Unified Network for Multi-Speaker Speech Recognition with Multi-Channel Recordings

Liu, Conggui  ,  Inoue, Nakamasa  ,  Shinoda, Koichi

pp.1304 - 1307 , 2017-12
Despite the recent progress in speech recognition, meeting speech recognition is still a challenging task, since it is often difficult to separate one speaker's voice from the others in meetings. In this paper, we propose a joint training framework of speaker separation and speech recognition with multi-channel recordings for this purpose. The location of each speaker is first estimated and then used to recover her/his original speech in a delay-and-subtraction (DAS) algorithm. The two components, speaker separation and speech recognition, are represented by one deep net, which is optimized as a whole using training data. We evaluated our method using simulated data generated from the WSJCAM0 database. Compared with the independent training of the two components, our proposed method improved word accuracy by 15.2% when the locations of speakers are known, and by 53.6% when they are unknown.
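The core idea of delay-and-subtraction can be illustrated with a minimal two-channel sketch: if the interfering speaker's inter-channel delay is known (or estimated from the speaker's location, as in the abstract), delaying one channel aligns the interferer across channels, and subtracting then cancels it. The function name and the two-channel, single-interferer setup below are simplifying assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def delay_and_subtract(ch1, ch2, delay_samples):
    """Cancel an interfering source from ch1 using a second channel.

    Assumes the interferer reaches ch1 `delay_samples` later than ch2,
    so delaying ch2 aligns the interferer across the two channels and
    subtraction removes it, leaving the target speech on ch1.
    """
    shifted = np.roll(ch2, delay_samples)
    shifted[:delay_samples] = 0.0  # discard samples wrapped by np.roll
    return ch1 - shifted
```

In the actual system, the delay is derived from the estimated speaker locations, and the separation stage is trained jointly with the recognizer rather than applied as a fixed filter.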
