图书情报工作 (Library and Information Service) ›› 2020, Vol. 64 ›› Issue (11): 116-124. DOI: 10.13266/j.issn.0252-3116.2020.11.013

• Knowledge Organization •

《史记》历史事件自动抽取与事理图谱构建研究

刘忠宝1,2, 党建飞2, 张志剑2   

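  1 Key Laboratory of Cloud Computing and Internet-of-Things Technology, Fujian Province University (Quanzhou University of Information Engineering), Quanzhou 362000;
  2 School of Software, North University of China, Taiyuan 030051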
  • Received: 2019-12-04 Revised: 2020-02-06 Online: 2020-06-05 Published: 2020-06-05
  • About the authors: Liu Zhongbao (ORCID: 0000-0002-0038-2462), professor, PhD, E-mail: liu_zhongbao@hotmail.com; Dang Jianfei (ORCID: 0000-0002-7419-0455), master's student; Zhang Zhijian (ORCID: 0000-0002-7758-9277), master's student.
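  • Funding:
    This paper is one of the research outcomes of the National Social Science Fund of China general project "Research on Cross-Media Knowledge Services for Library Resources in the Big Data Environment" (Grant No. 19BTQ012).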

Research on Automatic Extraction of Historical Events and Construction of Event Graph Based on Historical Records

Liu Zhongbao1,2, Dang Jianfei2, Zhang Zhijian2   

  1 Key Laboratory of Cloud Computing and Internet-of-Things Technology (Quanzhou University of Information Engineering), Fujian Province University, Quanzhou 362000;
  2 School of Software, North University of China, Taiyuan 030051
  • Received: 2019-12-04 Revised: 2020-02-06 Online: 2020-06-05 Published: 2020-06-05

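Abstract: [Purpose/significance] The Shiji (Historical Records) is China's first history written in the annals-biography style. It covers almost all major historical events across more than 3,000 years, from the era of the Yellow Emperor to the first year of the Yuanshou reign of Emperor Wu of Han. Finding these historical events and the connections among them quickly and accurately is important for seeing through historical phenomena, revealing the substance of history, and discovering historical patterns. [Method/process] Building on the BERT model and the LSTM-CRF model, this paper proposes a method for extracting historical events and their constituent elements from the Shiji, and constructs an event graph of the Shiji on that basis. [Result/conclusion] Experimental results show that the proposed method reaches F1 values of 0.823 for historical event extraction and 0.760 for event element extraction. The event graph makes it possible to uncover little-known knowledge contained in the Shiji, which provides necessary source material for experts in philology, history, sociology, and other fields.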

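Keywords: Shiji (Historical Records), historical event extraction, event graph, BERT model, bidirectional long short-term memory network, conditional random field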

Abstract: [Purpose/significance] Historical Records is the first annals-biography style history book in China. It contains almost all the significant historical events of the more than 3,000 years between the Yellow Emperor and Emperor Wu of Han. Efficiently extracting these historical events and their relationships is important for looking past historical appearances, revealing the essence of history, and discovering historical laws. [Method/process] Based on the BERT model and the LSTM-CRF model, this paper proposes a method for extracting historical events and their components from Historical Records, and constructs a historical event graph from the results. [Result/conclusion] The experimental results show that the F1 values for extracting historical events and their components are 0.823 and 0.760, respectively. The event graph helps uncover little-known knowledge contained in Historical Records, which provides essential source material for researchers in fields such as philology, history, and sociology.

Key words: Historical Records, extraction of historical events, event graph, bidirectional encoder representations from transformers (BERT), bidirectional long short-term memory (BiLSTM), conditional random field (CRF)

CLC number: