Library and Information Service, 2020, Vol. 64, Issue (7): 13-19. DOI: 10.13266/j.issn.0252-3116.2020.07.002

• Special Topic: Semantic Organization and Mining of Pre-Qin Classics •

Research on Ontology Building Methods of Chinese Ancient Books

He Lin, Chen Yaling, Sun Kedi

  1. Department of Information Management, College of Information Science & Technology, Nanjing Agricultural University, Nanjing 210095
  • Received: 2019-07-10 Revised: 2019-11-24 Online: 2020-04-05 Published: 2020-04-05
  • About the authors: He Lin (ORCID: 0000-0002-4207-3588), professor, PhD, doctoral supervisor, E-mail: helin@njau.edu.cn; Chen Yaling (ORCID: 0000-0002-7515-4843), master's student; Sun Kedi (ORCID: 0000-0003-0193-1117), master's student.
  • Supported by:
    This work is one of the research outcomes of the Fundamental Research Funds for the Central Universities project "Research on a Classical Chinese Ontology Based on the Sinological Index Series (《汉学引得丛刊》)" (Grant No. SKCX2017004).

Abstract: [Purpose/significance] Building a semantic ontology for ancient Chinese texts supports text mining and analysis of these works. However, the grammar of ancient texts differs considerably from that of modern Chinese, which makes ontology construction for them difficult. [Method/process] This paper applies natural language processing (NLP) techniques to ontology construction for pre-Qin classics. The ontology model is designed on the framework of CIDOC CRM, an international standard widely used in the cultural heritage domain. Based on the content characteristics and syntactic features of the classics, rule-based extraction is combined with conditional random fields (CRFs) to acquire ontology instances automatically, and the approach is tested with Zuo Zhuan as the experimental corpus. [Result/conclusion] The experiments show that the proposed instance extraction technique can improve the efficiency of ontology construction from ancient texts. Rule-based instance extraction achieves an F-score of about 93%, and CRF-based instance extraction achieves an F-score of 82.51% with the best feature template. Part-of-speech and position information play an important role in acquiring ontology instances.

Key words: pre-Qin classics, Zuo Zhuan, ontology construction, conditional random fields (CRFs), rule matching

CLC number:
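
The abstract reports that part-of-speech and position information are important for CRF-based instance extraction. The following is a minimal sketch of what token-level features of that kind could look like. It is not the paper's feature template, which is not given on this page: the function names, the feature names, the POS tags, and the segmented example sentence are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (word, part-of-speech tag); the tag set used here is hypothetical
Token = Tuple[str, str]


def token_features(sent: List[Token], i: int, window: int = 1) -> Dict[str, str]:
    """Build a feature dict for token i from its word, POS tag, and position,
    plus the words and POS tags of neighbours within `window` tokens."""
    word, pos = sent[i]
    feats = {
        "word": word,
        "pos": pos,
        "index": str(i),                      # absolute position in the sentence
        "is_first": str(i == 0),
        "is_last": str(i == len(sent) - 1),
    }
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(sent):
            feats[f"word[{offset:+d}]"] = sent[j][0]
            feats[f"pos[{offset:+d}]"] = sent[j][1]
        else:
            # Mark positions that fall outside the sentence boundaries
            feats[f"pad[{offset:+d}]"] = "BOS" if j < 0 else "EOS"
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, str]]:
    """Feature dicts for every token in one segmented, POS-tagged sentence."""
    return [token_features(sent, i) for i in range(len(sent))]


# Hypothetical segmented and tagged clause, for illustration only
example = [("晋侯", "n"), ("伐", "v"), ("秦", "ns")]
features = sentence_features(example)
```

Each token becomes one dictionary of string-valued features, a shape that sequence-labelling toolkits commonly accept. Widening `window` adds more context at the cost of sparser features.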