Departmental bulletin paper: Category Classification of Web News Data Using Word Embeddings Learned from Unsupervised Data and Pre-training

加藤, 諒磨

Vol. 57, pp. 1-4, 2016-03-24, Graduate Schools of Science and Engineering / Engineering, Hosei University
ISSN: 2187-9923
Abstract
In this research, we investigate a text classification method for a small amount of training data. In the field of text classification, supervised learning methods based on Naive Bayes or support vector machines are frequently used. The accuracy of supervised learning is high if a large amount of training data is available; on the other hand, it tends to be low if only a small amount of training data is available. Since preparing a large amount of training data is costly, we propose to reduce the amount of training data necessary to achieve high accuracy. This is achieved by using self-training data in a pre-training phase. Moreover, we propose to utilize Word2Vec to quantify texts, because the dimension of the data produced by bag-of-words, which is often used for quantifying texts, is too high to compute with a neural network. Through several numerical examinations, we found that the accuracy of the proposed method is relatively high even with a small amount of training data.

Key Words: Data mining, Text mining, Machine learning, Unsupervised data, Deep Learning
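The Word2Vec-based quantification described above can be sketched minimally as follows. This is not the paper's implementation; it assumes pretrained word vectors are already available (in practice learned by Word2Vec from a large unlabeled corpus), and the toy 4-dimensional vectors and vocabulary below are illustrative assumptions. It shows the key point of the abstract: each document becomes a fixed, low-dimensional vector (the embedding size) rather than a vocabulary-sized bag-of-words vector.

```python
import numpy as np

# Hypothetical pretrained word vectors. In the paper's setting these would
# come from Word2Vec trained on unlabeled text; here they are toy values.
embeddings = {
    "stock":  np.array([0.9, 0.1, 0.0, 0.2]),
    "market": np.array([0.8, 0.2, 0.1, 0.1]),
    "goal":   np.array([0.1, 0.9, 0.3, 0.0]),
    "match":  np.array([0.0, 0.8, 0.2, 0.1]),
}
EMB_DIM = 4  # embedding dimension (assumed)

def doc_vector(tokens, emb, dim=EMB_DIM):
    """Quantify a text as the mean of its word vectors.

    The resulting dimension is fixed at `dim`, unlike bag-of-words,
    whose dimension grows with the vocabulary size.
    """
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:  # no known words: fall back to the zero vector
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

# A "finance" document and a "sports" document map to 4-d vectors
# that a small neural network could take as input.
finance_doc = doc_vector(["stock", "market"], embeddings)
sports_doc = doc_vector(["goal", "match"], embeddings)
print(finance_doc.shape)  # (4,)
```

The averaged vectors could then feed a classifier whose weights are first pre-trained on self-labeled (unsupervised) documents, which is the other half of the proposed method.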
Read the full text:

http://repo.lib.hosei.ac.jp/bitstream/10114/12720/1/14R6204%e5%8a%a0%e8%97%a4%e8%ab%92%e7%a3%a8.pdf

