Knowledge Organization

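An Entity Relation Extraction Method for Scientific and Technological Documents in the Field of Traditional Chinese Medicine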

  • Dong Mei,
  • Chang Zhijun
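  • 1. National Science Library, Chinese Academy of Sciences, Beijing 100190;
    2. Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190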
Dong Mei, master's student.

Received date: 2022-05-06

  Revised date: 2022-08-17

  Online published: 2022-09-29

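Funding

This paper is one of the research outputs of the Youth Talent Frontier Innovation Team project of the National Science Library, Chinese Academy of Sciences, "Research on Automated Construction Methods for General-Domain Knowledge Graphs Based on Deep Learning: A Case Study of Traditional Chinese Medicine" (Project No. E0290904), and of the National Social Science Fund of China general project "Research on Entity Relation Recognition Methods for Domain Literature Oriented to Evidence-Based Medicine" (Project No. 21BTQ106).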

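Abstract

[Purpose/Significance] Existing knowledge graphs in the traditional Chinese medicine (TCM) domain contain relatively little knowledge drawn from scientific and technological literature. To address this gap, this paper proposes a solution for extracting entities and relations from TCM scientific and technological literature, supplementing the TCM clinical research knowledge base and providing a data foundation for building domain knowledge graphs. [Method/Process] An entity-relation representation model was designed for TCM scientific and technological literature. Given the multi-label and overlapping nature of the domain data, the entity relation extraction task was decomposed into two subtasks, relation classification and entity recognition, and the relation classification results were fed into the entity recognition task. On this basis, a cascaded entity relation extraction model built on the pre-trained model BERT was designed. [Result/Conclusion] Experiments on the self-built TCM scientific and technological document information extraction dataset (TCM-STD-IE) yielded micro-averaged F1 scores (F1-micro) of 92.74% for relation classification and 93.58% for entity recognition.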
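The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two BERT-based stages are replaced by hypothetical toy functions (`toy_classifier`, `toy_tagger`), and the function names `cascade_extract` and `micro_f1`, the relation labels "treats" and "improves", the 0.5 threshold, and the example sentence are all assumptions made for illustration. The sketch shows the data flow (multi-label relation classification first, then entity recognition conditioned on each accepted relation) and how a micro-averaged F1 pools true positives over all instances before computing precision and recall.

```python
# Hypothetical sketch of a cascaded "relation first, entities second" pipeline.
# The BERT-based stages of the original work are replaced by toy functions.
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def cascade_extract(
    sentence: str,
    classify_relations: Callable[[str], Dict[str, float]],
    tag_entities: Callable[[str, str], List[Tuple[str, str]]],
    threshold: float = 0.5,
) -> Set[Triple]:
    """Stage 1: multi-label relation classification.
    Stage 2: entity recognition conditioned on each accepted relation."""
    triples: Set[Triple] = set()
    for relation, score in classify_relations(sentence).items():
        # Multi-label: each relation is accepted or rejected independently.
        if score < threshold:
            continue
        # The relation label is an input to the tagger, so one entity can
        # take part in several triples (overlapping relations).
        for head, tail in tag_entities(sentence, relation):
            triples.add((head, relation, tail))
    return triples


def micro_f1(gold: Sequence[Set[Hashable]], pred: Sequence[Set[Hashable]]) -> float:
    """Micro-F1: pool counts over all instances, then compute P, R and F1."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# --- Hypothetical toy stand-ins for the trained models ---
TOY_PAIRS: Dict[str, List[Tuple[str, str]]] = {
    "treats": [("Huangqi decoction", "chronic heart failure")],
    "improves": [("Huangqi decoction", "cardiac function")],
}


def toy_classifier(sentence: str) -> Dict[str, float]:
    # Keyword match standing in for a relation classifier.
    return {rel: (0.9 if rel in sentence else 0.1) for rel in TOY_PAIRS}


def toy_tagger(sentence: str, relation: str) -> List[Tuple[str, str]]:
    # Dictionary lookup standing in for a relation-conditioned entity tagger.
    return [(h, t) for h, t in TOY_PAIRS[relation] if h in sentence and t in sentence]


if __name__ == "__main__":
    text = "Huangqi decoction treats chronic heart failure and improves cardiac function."
    predicted = cascade_extract(text, toy_classifier, toy_tagger)
    for triple in sorted(predicted):
        print(triple)
    print(micro_f1([predicted], [predicted]))
```

Because the relation label is passed into the second stage, the shared head entity "Huangqi decoction" is recovered once per relation, which is one way a cascade can cope with overlapping triples. In the model described above, BERT-based components would presumably take the place of both toy functions; the same `micro_f1` helper can score relation labels or entity spans, since it only compares sets of hashable items.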

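Cite this article

DONG Mei, CHANG Zhijun. An entity relation extraction method for scientific and technological documents in the field of traditional Chinese medicine[J]. Library and Information Service, 2022, 66(18): 105-113. DOI: 10.13266/j.issn.0252-3116.2022.18.010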
