Library and Information Service (图书情报工作) ›› 2021, Vol. 65 ›› Issue (22): 134-142. DOI: 10.13266/j.issn.0252-3116.2021.22.014

• Knowledge Organization •

基于词和实体标注的古籍数字人文知识库的构建与应用——以《资治通鉴·周秦汉纪》为例

常博林1, 万晨2, 李斌1, 陈欣雨1, 冯敏萱1, 王东波3   

  1. 南京师范大学文学院 南京 210097;
    2. 复旦大学中国语言文学系 上海 200433;
    3. 南京农业大学信息管理学院 南京 210095
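  • Received: 2021-05-27 Revised: 2021-09-18 Publication date: 2021-11-20 Release date: 2021-12-01
  • Corresponding author: Li Bin (李斌), associate professor, PhD, E-mail: libin.njnu@gmail.com
  • About the authors: Chang Bolin (常博林), undergraduate student; Wan Chen (万晨), master's student; Chen Xinyu (陈欣雨), undergraduate student; Feng Minxuan (冯敏萱), associate professor, PhD; Wang Dongbo (王东波), professor, doctoral supervisor.
  • Funding:
    This work is one of the research outcomes of the Jiangsu Provincial Social Science Foundation project "Research on Artificial Intelligence-Assisted Traditional Culture Education for Adolescents" (Grant No. 20JYB004), the National Social Science Foundation project "Research on the Construction and Automatic Analysis of a Chinese Abstract Semantic Corpus" (Grant No. 18BYY127), and the National Social Science Foundation major project "Construction of a Classical Texts Knowledge Base and Humanities Computing Research Based on the Sinological Index Series (《汉学引得丛刊》)" (Grant No. 15ZDB127).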

The Construction and Application for Digital Humanities Knowledge Base of Ancient Books Based on Word and Entity Annotation: A Case Study on Zhou Qin Han Annals of Zizhitongjian

Chang Bolin1, Wan Chen2, Li Bin1, Chen Xinyu1, Feng Minxuan1, Wang Dongbo3   

  1 School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097;
    2 Department of Chinese Language and Literature, Fudan University, Shanghai 200433;
    3 College of Information Management, Nanjing Agricultural University, Nanjing 210095
  • Received: 2021-05-27 Revised: 2021-09-18 Online: 2021-11-20 Published: 2021-12-01

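Abstract: [Purpose/significance] To explore a method of building a humanities knowledge base that enables retrieval and knowledge mining at the level of words and entities. [Method/process] Using the Zhou, Qin and Han Annals of the Zizhitongjian (《资治通鉴·周秦汉纪》) as an example, the 68 volumes and 600,000 characters of text are first segmented and part-of-speech tagged automatically; entities such as persons, places (with GIS information) and times are then annotated by hand, yielding a full-text retrieval system and a map retrieval system based on words and entities. Co-occurrence information is used to compile statistics on relationships between persons and on their travels. TF-IDF combined with time series analysis is then used to mine results such as eventful periods, influential figures and pivotal places. [Result/conclusion] Deep annotation of words and entities solves the retrieval difficulties caused by the lack of word boundaries, by the same name denoting different referents, and by different names denoting the same referent, and further provides basic support for multi-angle knowledge discovery and knowledge services for ancient books.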

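Key words: Zizhitongjian (《资治通鉴》), digital humanities, knowledge mining, ancient book retrieval, classical Chinese information processing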

Abstract: [Purpose/significance] This paper explores a method for constructing a humanities knowledge base that supports word- and entity-based retrieval and knowledge mining. [Method/process] Taking the Zhou Qin Han Annals of the Zizhitongjian as a case study, this paper applied automatic word segmentation and part-of-speech tagging to the 68-volume, 600,000-character text, manually annotated entity information such as persons, locations with GIS coordinates, and times, and built a full-text retrieval and map retrieval system based on words and entities. Co-occurrence information was used to derive the relationships and travel records of the historical figures. With TF-IDF and time series analysis, the key periods, people and locations in history were automatically extracted and illustrated. [Result/conclusion] In-depth annotation based on words and entities resolves the retrieval problems of missing word boundaries, of one name referring to different people, and of different names referring to the same person, and it provides a solid basis for multi-perspective knowledge mining and knowledge services for ancient books.
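The two mining steps named in the abstract, co-occurrence statistics and TF-IDF over time periods, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the `passages` records, the entity IDs `A` to `D`, the period labels, and the particular TF-IDF variant (relative frequency times `log(N / df)`, with each period treated as one "document"). The abstract does not specify the exact formulation the authors used.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: one record per annotated passage. "persons" holds
# entity IDs rather than surface names, so that one name used for different
# people, or several names for one person, have already been disambiguated.
passages = [
    {"period": "P1", "persons": ["A", "B"]},
    {"period": "P1", "persons": ["A", "B", "C"]},
    {"period": "P2", "persons": ["A", "C"]},
    {"period": "P2", "persons": ["D", "D", "C"]},
    {"period": "P3", "persons": ["A", "D"]},
]


def cooccurrence_counts(records):
    """Count in how many passages each unordered pair of persons co-occurs."""
    counts = Counter()
    for record in records:
        # Deduplicate within a passage and sort so pairs have a stable order.
        for pair in combinations(sorted(set(record["persons"])), 2):
            counts[pair] += 1
    return counts


def tfidf_by_period(records):
    """Score each person per period, treating each period as one document."""
    term_freq = {}
    for record in records:
        term_freq.setdefault(record["period"], Counter()).update(record["persons"])

    n_periods = len(term_freq)
    doc_freq = Counter()
    for counter in term_freq.values():
        doc_freq.update(counter.keys())  # number of periods mentioning each person

    scores = {}
    for period, counter in term_freq.items():
        total = sum(counter.values())
        scores[period] = {
            person: (count / total) * math.log(n_periods / doc_freq[person])
            for person, count in counter.items()
        }
    return scores


if __name__ == "__main__":
    print(cooccurrence_counts(passages).most_common(3))
    for period, person_scores in tfidf_by_period(passages).items():
        print(period, max(person_scores, key=person_scores.get))
```

In this toy data, person `A` appears in every period and therefore scores zero everywhere, which is the intended effect of the inverse document frequency term: a figure who is always present does not characterize any one period. The same scoring could be applied to place entities in the same way.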

Key words: Zizhitongjian, digital humanities, knowledge mining, ancient book retrieval, ancient Chinese language processing

CLC number: