Knowledge Organization

Research on Relation Extraction of Academic Full-Text Based on Self-Owned Knowledge Enhancement

  • Zhuo Keqiu ,
  • Shen Si ,
  • Wang Dongbo
  • 1. School of Information Management, Nanjing Agricultural University, Nanjing 210095;
    2. School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094
Zhuo Keqiu is a doctoral candidate; Shen Si is an associate professor and doctoral supervisor.

Received date: 2021-11-24

  Revised date: 2022-01-19

  Online published: 2022-04-15

Funding

This work is a research output of the Natural Science Foundation of Jiangsu Province Youth Project "Research on Deep Learning-Based Temporal Semantic Knowledge Annotation and Retrieval Model Construction for Academic Full Texts" (Grant No. BK20190450) and the National Natural Science Foundation of China General Program "Research on Deep Learning-Based Knowledge Graph Construction and Retrieval for Academic Full Texts" (Grant No. 71974094).



Cite this article

ZHUO Keqiu, SHEN Si, WANG Dongbo. Research on relation extraction of academic full-text based on self-owned knowledge enhancement[J]. Library and Information Service, 2022, 66(7): 120-131. DOI: 10.13266/j.issn.0252-3116.2022.07.012

Abstract

[Purpose/Significance] Relation extraction over academic full texts is a key technology for constructing academic full-text knowledge graphs. Such graphs make the literature structured and knowledge-rich, help researchers retrieve and analyze documents and track research trends more efficiently, and support implicit knowledge discovery through cognitive reasoning over the graph. [Method/Process] Enhancing relation extraction with external knowledge has proved effective in many studies, but relation extraction in specific domains often lacks suitable external knowledge. This paper finds that high-confidence knowledge already contained in the full text itself can also assist full-text relation extraction. Inspired by the dual-system theory of cognitive processes (System 1 is intuitive cognition, System 2 is reasoning cognition), it designs a sentence-level model to acquire knowledge, obtains high-confidence knowledge through distant supervision, and integrates that knowledge into the final classification layer of a document-level deep learning model. [Result/Conclusion] On the biomedical academic full-text dataset (CDR-revised), the proposed model improves F1 by 11.13% over the current state-of-the-art model.
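The pipeline summarized above — distantly supervising a sentence-level model, keeping only its high-confidence predictions as "self-owned" knowledge, and fusing that knowledge into the last classification layer of the document-level model — can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's actual architecture: the relation set, all function names (`distant_labels`, `knowledge_vector`, `fuse_and_classify`), and the vector dimensions are hypothetical.

```python
# Illustrative sketch only -- NOT the authors' implementation. The toy
# relation set, function names, and dimensions are assumptions.
import numpy as np

RELATIONS = ["none", "chemical_induces_disease"]

def distant_labels(mentions, known_pairs):
    """Distant supervision: a sentence mentioning both entities of a known
    (chemical, disease) pair inherits that pair's relation as its label."""
    return [
        (sent, pair,
         "chemical_induces_disease" if pair in known_pairs else "none")
        for sent, pair in mentions
    ]

def knowledge_vector(pair, high_conf):
    """Encode high-confidence sentence-level predictions ("System 1"
    knowledge) for an entity pair as a multi-hot relation indicator."""
    vec = np.zeros(len(RELATIONS))
    for rel in high_conf.get(pair, ()):
        vec[RELATIONS.index(rel)] = 1.0
    return vec

def fuse_and_classify(doc_repr, pair, high_conf, W, b):
    """Final classification layer ("System 2"): concatenate the document-
    level representation with the knowledge vector, then linear + softmax."""
    x = np.concatenate([doc_repr, knowledge_vector(pair, high_conf)])
    logits = W @ x + b
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()
```

For a document representation of dimension 4 and the two toy relations above, `W` would have shape `(2, 6)`: the knowledge indicator simply extends the feature vector seen by the final layer, so the classifier can learn how much weight to give the sentence-level evidence.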
