Departmental Bulletin Paper 「人情本コーパス」の設計と構築
Design and Construction of the Ninjobon Corpus

藤本, 灯  ,  北﨑, 勇帆  ,  市村, 太郎  ,  岡部, 嘉幸  ,  小木曽, 智信  ,  高田, 智和  ,  Akari, FUJIMOTO  ,  Yuho, KITAZAKI  ,  Taro, ICHIMURA  ,  Yoshiyuki, OKABE  ,  Toshinobu, OGISO  ,  Tomokazu, TAKADA

(12), pp. 1-12, 2017-01, 国立国語研究所
ISSN: 2186-134X (print), 2186-1358 (online)
The Ninjobon Corpus is currently under construction as part of the Edo Period Collection of the Corpus of Historical Japanese. In October 2015, a trial version of the Ninjobon Corpus (a full-text search system in the Himawari edition), centered on Hiyokurenri Hana no Shimadai, was publicly released. Work on the corpus is currently at the stage of (1) faithfully transcribing the original printed book into text, and (2) creating "Himawari" XML texts with minimal revisions to (1). The tag set for the XML texts is fundamentally based on that of the Sharebon Corpus, with additional tags for ligatures and revisions prepared for the Ninjobon. A morphological analysis of the first volume of Hana no Shimadai yielded an analytical precision of approximately 87%. This low precision is caused by the large number of characteristically irregular readings in the Ninjobon. One challenge in constructing a corpus annotated with morphological information is how to handle the ruby glosses attached to kanji characters with irregular native Japanese readings.
