Departmental Bulletin Paper
変動係数を用いた語彙の豊富さ指標の比較評価
Evaluate lexical richness measures using coefficient of variation

鄭 弯弯 (Zheng, Wanwan), 金 明哲 (Jin, Mingzhe)

58(4), pp. 230-241, 2018-01-31, Harris Science Research Institute of Doshisha University (同志社大学ハリス理化学研究所)
Although numerous lexical richness measures have been proposed, no evaluation method has been established for selecting measures that are independent of text length. The common existing approach is to inspect the transition curves of a measure's original or standardized data. However, this approach relies mostly on visual judgment and cannot sufficiently capture how a measure changes, so it does not allow lexical richness measures to be compared and evaluated directly. This paper proposes the coefficient of variation (CV) as an evaluation statistic for lexical richness measures. CV overcomes this drawback of previous research and makes it possible to compare the stability of measures by visual observation. A total of 11 measures (TTR, K, R, S, Uber, C, s, LN, k, M, and m) are compared and evaluated using CV. Japanese, Chinese, and English corpora are used to avoid possible language-specific effects. The results indicate that s is the measure least influenced by text length and language.
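As a rough illustration of the idea in the abstract, the sketch below computes a measure on growing prefixes of a token list and summarizes its stability with CV (standard deviation divided by mean). It is not the authors' procedure: the prefix sampling scheme, the `step` parameter, the use of the population standard deviation, and all function names are assumptions made for this example. The abstract does not define R, so reading it as Guiraud's R (types divided by the square root of tokens) is also an assumption.

```python
import math
from statistics import mean, pstdev


def ttr(tokens):
    """Type-token ratio: number of distinct types / number of tokens."""
    return len(set(tokens)) / len(tokens)


def guiraud_r(tokens):
    """Guiraud's R (assumed reading of 'R'): types / sqrt(tokens)."""
    return len(set(tokens)) / math.sqrt(len(tokens))


def coefficient_of_variation(values):
    """CV = standard deviation / mean (population SD, an assumption here)."""
    return pstdev(values) / mean(values)


def cv_over_lengths(tokens, measure, step):
    """Evaluate `measure` on prefixes of length step, 2*step, ... and
    return the CV of the resulting values (hypothetical sampling scheme)."""
    values = [measure(tokens[:n]) for n in range(step, len(tokens) + 1, step)]
    return coefficient_of_variation(values)


if __name__ == "__main__":
    # Toy token list; a real evaluation would use a tokenized corpus.
    tokens = "the cat sat on the mat and the dog sat on the log".split()
    for name, measure in [("TTR", ttr), ("R", guiraud_r)]:
        print(name, cv_over_lengths(tokens, measure, step=2))
```

Under this reading, a smaller CV means the measure varies less as the text grows, which is the sense in which one measure can be called more stable than another.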
