[Purpose/significance] Building a semantic ontology of ancient Chinese books is valuable for text mining and text analysis in Chinese history research. However, ancient and modern Chinese differ substantially in syntactic structure, which makes ontology building for ancient Chinese books difficult. [Method/process] This paper focuses on methods for building ontologies of ancient Chinese books using natural language processing (NLP) techniques. We design the ontology model on the basis of CIDOC CRM, an international standard for describing cultural heritage. We then propose a way to extract ontology instances automatically: a hybrid method that combines rule-based extraction with conditional random field (CRF) recognition, both grounded in the syntactic structure of ancient Chinese books. Finally, we evaluate the method on one ancient Chinese book, the Zuo Zhuan. [Result/conclusion] The experimental results show that the method improves the precision of ontology instance extraction, which in turn makes ontology construction from ancient Chinese books more efficient. The rule-based method achieves an F-score of 93%, and the CRF method achieves an F-score of 82.51% with the best feature template. The results also indicate that word position and part-of-speech features are important for improving the extraction of ontology instances.
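The abstract names word position and part-of-speech as the features that matter most for CRF-based instance extraction, but does not give the feature template itself. The following is a minimal sketch of what such per-token features could look like, not the authors' template. The function names (`token_features`, `sentence_features`), the window size, the `<PAD>` marker, and the part-of-speech tags in the example sentence are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); the tag set is hypothetical


def token_features(sent: List[Token], i: int, window: int = 1) -> Dict[str, str]:
    """Build a feature dict for token i from its word, POS tag, and position."""
    word, pos = sent[i]
    if i == 0:
        position = "first"
    elif i == len(sent) - 1:
        position = "last"
    else:
        position = "middle"
    feats = {"word": word, "pos": pos, "position": position}

    # Add the words and POS tags of neighbours inside the context window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        in_range = 0 <= j < len(sent)
        feats[f"word[{offset:+d}]"] = sent[j][0] if in_range else "<PAD>"
        feats[f"pos[{offset:+d}]"] = sent[j][1] if in_range else "<PAD>"
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, str]]:
    """Feature dicts for every token, in the shape most CRF toolkits accept."""
    return [token_features(sent, i) for i in range(len(sent))]


# A segmented sentence with illustrative (assumed) POS tags.
example: List[Token] = [
    ("郑伯", "nr"), ("克", "v"), ("段", "nr"), ("于", "p"), ("鄢", "ns"),
]

if __name__ == "__main__":
    for feats in sentence_features(example):
        print(feats)
```

A sequence labeler trained on such dicts, paired with per-token labels for the ontology instance types, would play the role of the CRF component; the rule-based component of the hybrid method is not sketched here.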
He Lin, Chen Yaling, Sun Kedi. Research on Ontology Building Methods of Chinese Ancient Books[J]. Library and Information Service, 2020, 64(7): 13-19.
DOI: 10.13266/j.issn.0252-3116.2020.07.002