Library and Information Service
Research on Constructing Automatic Recognition Model for Ancient Chinese Place Names Based on Pre-Qin Corpus
Received date: 2015-05-23
Revised date: 2015-06-05
Online published: 2015-06-20
[Purpose/significance] In the context of digital humanities research, this study constructs an automatic recognition model for ancient Chinese place names based on a Pre-Qin corpus and conditional random fields. [Method/process] The internal and external features of ancient Chinese place names in the Zuo Commentary are analyzed, and a feature template for the model is constructed from them. Models are trained and tested on a corpus of 187,901 words. By comparing the recognition results of the conditional random field and maximum entropy models, the model with an F-score of 91.52% is selected as the best place name recognition model and is then applied to recognize place names in Guo Yu. [Result/conclusion] The conditional random field model outperforms the maximum entropy model in recognizing ancient Chinese place names. The conditional random field model trained on the annotated corpus performs well.
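The F-score reported above combines precision and recall. As a minimal sketch of how such a score could be computed for place name recognition, the Python below compares gold and predicted tag sequences. It assumes a simple B/I/O tagging scheme and exact-match scoring at the entity level; neither the tag set nor the scoring granularity is stated in the abstract, and the function names `extract_spans` and `f_score` are hypothetical.

```python
from typing import List, Set, Tuple


def extract_spans(tags: List[str]) -> Set[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for tagged place names.

    Assumes a hypothetical B/I/O scheme: "B" begins a place name,
    "I" continues it, and "O" marks everything else.
    """
    spans = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.add((start, i))  # close the previous span
            start = i
        elif tag == "I":
            if start is None:
                start = i  # stray "I" without a "B": treat as a new span
        else:
            if start is not None:
                spans.add((start, i))
                start = None
    if start is not None:
        spans.add((start, len(tags)))  # span running to the end of the sequence
    return spans


def f_score(gold: List[str], pred: List[str]) -> float:
    """Exact-match F-score: a predicted span counts only if both ends agree."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative tag sequences only; these are not data from the study.
gold_tags = ["B", "I", "O", "B", "O"]
pred_tags = ["B", "I", "O", "O", "B"]
print(f_score(gold_tags, pred_tags))
```

In this illustrative case one of the two predicted spans matches one of the two gold spans, so precision and recall are both 0.5.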
Huang Shuiqing, Wang Dongbo, He Lin. Research on Constructing Automatic Recognition Model for Ancient Chinese Place Names Based on Pre-Qin Corpus[J]. Library and Information Service, 2015, 59(12): 135-140. DOI: 10.13266/j.issn.0252-3116.2015.12.020