Conference paper: Action Sequence Recognition in Videos by Combining a CTC Network with a Statistical Language Model

Lin, Mengxi ; Inoue, Nakamasa (井上 中順) ; Shinoda, Koichi (篠田 浩一)

117 (no. 362), pp. 1-6, 2017-12, The Institute of Electronics, Information and Communication Engineers (電子情報通信学会, IEICE)
Description
Action sequence recognition aims to recognize which actions occur in a video and in what temporal order. In this paper, we propose combining an LSTM network trained with Connectionist Temporal Classification (CTC) with a statistical language model for action sequence recognition. The statistical language model captures the relations between action instances, which the CTC network by itself hardly learns. Our experiments on the Breakfast dataset show that the statistical language model significantly boosts the recognition accuracy of the CTC network, from 37.0% to 43.4%.
Read the full text

http://t2r2.star.titech.ac.jp/rrws/file/CTT100757838/ATD100000413/20171210_PRMU17_final.pdf


Other information