Journal Article 機械学習を利用した構文情報に基づく自動生成ファイルの特定
Identifying Auto-Generated Files by Using Machine Learning Techniques Based on Syntactic Information

下仲, 健斗  ,  鷲見, 創一  ,  肥後, 芳樹  ,  楠本, 真二

58(4), pp. 861-870, 2017-04-15
Source code analysis is now actively studied because it is used in both practice and research, for example in mining source code repositories. Target repositories often contain auto-generated files, which are removed before analysis because they can act as noise. Such files can be found by searching for the characteristic comments that appear in auto-generated files. However, this approach cannot identify auto-generated files automatically once those comments have been deleted, and identifying them manually takes too much time. In this study, we therefore propose a method that identifies auto-generated files automatically by using machine learning techniques. The method learns syntactic information from source code and then determines whether each source file is auto-generated. To evaluate the proposed method, we conducted experiments on source files generated by four kinds of code generators. The results confirmed that the proposed method identifies auto-generated files with high accuracy.
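The comment-search baseline mentioned in the abstract, and the limitation that motivates the proposed method, can be illustrated with a minimal sketch. This is not the authors' implementation: the marker phrases, the function name `looks_auto_generated`, the `header_lines` cutoff, and the sample inputs are all assumptions chosen for illustration.

```python
import re

# Hypothetical marker phrases; real code generators use their own wording.
GENERATED_MARKERS = re.compile(
    r"generated by|auto-?generated|do not edit",
    re.IGNORECASE,
)


def looks_auto_generated(source_text: str, header_lines: int = 20) -> bool:
    """Return True if a marker phrase appears in the first `header_lines` lines.

    The search runs over raw header text, which is where generators
    usually place their banner comment.
    """
    header = "\n".join(source_text.splitlines()[:header_lines])
    return GENERATED_MARKERS.search(header) is not None


# Illustrative inputs (not taken from the paper's data set).
with_comment = "/* Generated by ExampleGen. DO NOT EDIT. */\nclass Parser {}\n"
without_comment = "class Parser {}\n"

print(looks_auto_generated(with_comment))     # True: banner comment is present
print(looks_auto_generated(without_comment))  # False: banner was deleted
```

The second call shows the gap described in the abstract: once the banner comment is removed, a text search has nothing to match, so the file must be recognized from other evidence such as its syntactic structure.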
