Library and Information Service (图书情报工作) ›› 2015, Vol. 59 ›› Issue (12): 135-140. DOI: 10.13266/j.issn.0252-3116.2015.12.020

• Knowledge Organization •

Research on Constructing Automatic Recognition Model for Ancient Chinese Place Names Based on Pre-Qin Corpus

Huang Shuiqing (黄水清), Wang Dongbo (王东波), He Lin (何琳)

  1. College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095
  • Received: 2015-05-23 Revised: 2015-06-05 Online: 2015-06-20 Published: 2015-06-20
  • About the authors: Huang Shuiqing (ORCID: 0000-0002-1646-9300), dean, professor, doctoral supervisor, sqhuang@njau.edu.cn; Wang Dongbo (ORCID: 0000-0002-9894-9550), associate professor, master's supervisor; He Lin (ORCID: 0000-0002-4207-3588), associate dean, professor, master's supervisor.

Abstract:

[Purpose/significance] Amid the broader trend toward digital humanities research, an automatic recognition model for ancient Chinese place names is constructed based on a pre-Qin ancient Chinese corpus and the conditional random field model. [Method/process] The internal and external features of place names in the Zuo Commentary (《春秋左氏传》) are analyzed statistically, and the feature templates of the model are constructed from them. On training and test corpora totaling 187,901 words, the place name recognition performance of the conditional random field model is compared with that of the maximum entropy model. The best-performing conditional random field model, whose F-score (the harmonic mean of precision and recall) is given as 90.94% in the Chinese abstract and as 91.52% in the English abstract, is adopted as the model constructed in this paper and is validated on the Guo Yu (《国语》) corpus. [Result/conclusion] The conditional random field model outperforms the maximum entropy model in recognizing ancient Chinese place names. An automatic recognition model based on conditional random fields and trained on a manually annotated corpus achieves good recognition performance.

Key words: ancient Chinese place name, conditional random field, lexical feature, pre-Qin corpus
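
The F-score cited in the abstract is the harmonic mean of precision and recall. A minimal sketch of that calculation follows, assuming entity-level counts of correctly recognized, spuriously recognized, and missed place names. The function name `f_score` and the counts in the usage line are hypothetical illustrations and are not taken from the paper.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from entity-level counts.

    F1 is the harmonic mean of precision and recall.
    """
    predicted = true_positives + false_positives  # place names the model output
    gold = true_positives + false_negatives       # place names in the annotation
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / gold if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the paper).
p, r, f = f_score(true_positives=90, false_positives=10, false_negatives=10)
print(f"precision={p:.4f} recall={r:.4f} F1={f:.4f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high F-score by trading away recall for precision or the reverse.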

CLC number: