Library and Information Service (图书情报工作), 2022, Vol. 66, Issue 18: 105-113. DOI: 10.13266/j.issn.0252-3116.2022.18.010

• Knowledge Organization •

An Entity Relation Extraction Method for Scientific and Technological Documents in the Field of Traditional Chinese Medicine

Dong Mei1,2, Chang Zhijun1,2

  1. National Science Library, Chinese Academy of Sciences, Beijing 100190;
  2. Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190
  • Received: 2022-05-06  Revised: 2022-08-17  Online: 2022-09-20  Published: 2022-09-29
  • Corresponding author: Chang Zhijun (常志军), associate research librarian, master's supervisor, E-mail: changzj@mail.las.ac.cn
  • About the author: Dong Mei (董美), master's student.
  • Funding:
    This work is one of the research outcomes of the Youth Talent Frontier Innovation Team Project of the National Science Library, Chinese Academy of Sciences, "Research on Automated Construction Methods for General Domain Knowledge Graphs Based on Deep Learning: Taking the Field of Traditional Chinese Medicine as an Example" (Grant No. E0290904), and of the National Social Science Fund of China General Project "Research on Entity Relation Recognition Methods for Domain Literature Oriented to Evidence-Based Medicine" (Grant No. 21BTQ106).

Abstract: [Purpose/Significance] Existing traditional Chinese medicine (TCM) knowledge graphs contain relatively little knowledge derived from scientific and technological literature. To address this gap, this paper proposes an entity relation extraction solution for TCM scientific and technological documents, which supplements the TCM clinical research knowledge base and provides a data foundation for constructing domain knowledge graphs. [Method/Process] A domain entity relation representation model is designed for TCM scientific and technological documents. Because the domain data are multi-label and overlapping, the entity relation extraction task is decomposed into two subtasks, relation classification and entity recognition, and the relation classification results are fed into the entity recognition task. On this basis, a cascaded entity relation extraction model is designed on top of the pre-trained model BERT. [Result/Conclusion] Experiments on a self-built information extraction dataset of TCM scientific and technological documents (TCM-STD-IE) yield micro-averaged F1 scores (F1-micro) of 92.74% for relation classification and 93.58% for entity recognition.

Key words: entity relation extraction, pre-trained model, traditional Chinese medicine scientific and technological documents, domain knowledge graph

CLC number:
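
The cascade described in the abstract (multi-label relation classification first, then entity recognition conditioned on each predicted relation) and the F1-micro metric can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the function names, the relation labels `treats` and `contains`, and the keyword rules are hypothetical stand-ins for the two BERT-based stages, whose actual architecture, label set, and input encoding are not given in this record.

```python
from typing import Callable, List, Set, Tuple


def extract_triples(
    sentence: str,
    classify_relations: Callable[[str], Set[str]],
    tag_entities: Callable[[str, str], List[Tuple[str, str]]],
) -> List[Tuple[str, str, str]]:
    """Cascade: predict a set of relations for the sentence, then run one
    relation-conditioned entity recognition pass per predicted relation."""
    triples = []
    for relation in sorted(classify_relations(sentence)):
        for head, tail in tag_entities(sentence, relation):
            triples.append((head, relation, tail))
    return triples


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over all examples and labels before computing precision and recall."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy stand-ins (hypothetical keyword rules, NOT the paper's BERT models).
def toy_classifier(sentence: str) -> Set[str]:
    """Return every relation label whose cue word occurs in the sentence."""
    cues = {"treats": "treats", "contains": "contains"}
    return {label for cue, label in cues.items() if cue in sentence}


def toy_tagger(sentence: str, relation: str) -> List[Tuple[str, str]]:
    """Split the sentence around the relation word to get (head, tail)."""
    head, _, tail = sentence.partition(f" {relation} ")
    return [(head.strip(), tail.strip(" ."))] if tail else []


triples = extract_triples("Xiaoyao San treats depression.",
                          toy_classifier, toy_tagger)
score = micro_f1(gold=[{"treats"}, {"treats", "contains"}],
                 pred=[{"treats"}, {"treats"}])
```

Passing the predicted relation into the tagger is what lets a sentence with several relations, or with entities shared between relations, be handled one relation at a time, which is the motivation the abstract gives for the decomposition.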