Extracting key words from English physical therapy articles using Random Forests: with the Corpus of Contemporary American English Full-Text version as the reference corpus
Key word analysis of English physical therapy articles using Random Forests
114, 2015-07-26, Kyoto, 同志社女子大学英語英文学会
The purpose of this study was to extract key words from English physical therapy articles using Random Forests. For the analysis, the author compiled a corpus of English physical therapy articles (PT). The Full-Text version of the Corpus of Contemporary American English (COCA), specifically its "Academic Medicine" sub-corpus (CM), served as the reference corpus. Random Forests (RF), an ensemble classifier originally developed by Breiman (2001), was used to extract the key words. Tabata (2012-a) used RF to spotlight lexical items that Charles Dickens used consistently. In that study, Tabata pointed out that the measures used for key word analysis in previous studies, such as log-likelihood and chi-square tests, tend to treat words that appear frequently in only a single text as key words of the whole corpus, and proposed RF as an alternative measure. Previous studies of physical therapy English have not used RF for key word analysis, nor have they used corpora from other medical fields as references; the author therefore hypothesized that RF would extract key words more consistently. The extracted key words included rehabilitation, motor, and mobility, which are important in the field of physical therapy, and an experienced physical therapist confirmed the validity of the key words. These results confirmed that Random Forests can extract key words that are used consistently throughout a corpus.
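The weakness that Tabata attributes to pooled-frequency measures can be illustrated with a small sketch. The code below is not the author's procedure and does not implement Random Forests; it only computes the standard two-corpus log-likelihood statistic on invented toy texts, and counts how many target texts contain each word. All texts, the function name `log_likelihood`, and the variable names are assumptions made for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a word occurring a times in a target
    corpus of c tokens and b times in a reference corpus of d tokens."""
    total = a + b
    expected_target = c * total / (c + d)
    expected_reference = d * total / (c + d)
    ll = 0.0
    if a > 0:  # a zero count contributes nothing to the sum
        ll += a * math.log(a / expected_target)
    if b > 0:
        ll += b * math.log(b / expected_reference)
    return 2 * ll


# Toy data, invented for illustration only (not from the PT or CM corpora).
# "ankle" is frequent but confined to one text; "mobility" is in every text.
target_texts = [
    "ankle ankle ankle ankle ankle ankle mobility patient".split(),
    "mobility patient gait study".split(),
    "mobility patient motor study".split(),
    "mobility patient gait motor".split(),
]
reference_texts = [
    "patient study drug dose".split(),
    "patient study drug trial".split(),
    "patient dose trial study".split(),
    "patient drug dose trial".split(),
]

# Pool all texts of each corpus, as a frequency-based keyness measure does.
target_counts = Counter(w for text in target_texts for w in text)
reference_counts = Counter(w for text in reference_texts for w in text)
target_size = sum(target_counts.values())
reference_size = sum(reference_counts.values())

scores = {
    w: log_likelihood(target_counts[w], reference_counts[w],
                      target_size, reference_size)
    for w in target_counts
}
# Number of target texts in which each word occurs at least once.
text_range = {w: sum(w in text for text in target_texts) for w in target_counts}

ranked = sorted(scores, key=scores.get, reverse=True)
for w in ranked:
    print(f"{w:10s} LL={scores[w]:6.2f}  texts={text_range[w]}/{len(target_texts)}")
```

In this toy setting the pooled statistic ranks the word confined to one text above the word that appears in every text, which is the behaviour a per-text classifier such as RF is proposed to avoid.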