A Research Entity Recognition Algorithm Based on Dependency Parsing

  • Zhao Huaming
  • Qian Li
  • Yu Li
  • 1 National Science Library, Chinese Academy of Sciences, Beijing 100190;
    2 Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190

Received date: 2019-09-24

Revised date: 2020-02-02

Online published: 2020-06-05

Abstract

[Purpose/significance] This paper explores the recognition and extraction of research entities and their relationships, aiming to improve recognition in complex cases such as long sentences and to provide a reference for further applications. [Method/process] Based on an analysis of dependency syntactic features, a method for recognizing and extracting research entity relations is proposed. It comprises three steps: (1) POS tagging of the target text with the Stanford Tagger; (2) based on the tagging results, dividing the target text into structurally regular semantic segments around the core predicate and the SAO (subject-action-object) structure; (3) using dependency parsing to find the subject and object related to the core predicate and form an (entity, relationship, entity) triple. [Result/conclusion] The method is compared with the mainstream algorithms OLLIE and ReVerb. Experiments show that it can effectively improve the accuracy of scientific entity recognition.
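The final step of the method, locating the subject and object attached to the core predicate and emitting a triple, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the sentence has already been tagged and parsed, and it uses a hand-annotated toy parse with Universal Dependencies-style labels (`nsubj`, `obj`, `compound`, `amod`). The `Token` class and the `phrase` and `extract_sao` helpers are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    text: str
    pos: str     # coarse POS tag
    head: int    # index of the governing token, 0 for the root
    deprel: str  # dependency label (UD-style, assumed)


def phrase(tokens: List[Token], head_idx: int) -> str:
    """Join a head word with its direct compound/adjectival modifiers, in sentence order."""
    keep = ("compound", "amod")
    members = [
        t for t in tokens
        if t.idx == head_idx or (t.head == head_idx and t.deprel in keep)
    ]
    members.sort(key=lambda t: t.idx)
    return " ".join(t.text for t in members)


def extract_sao(tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Return (subject, action, object) around the core predicate, or None if incomplete."""
    # The core predicate is taken to be the verbal root of the parse.
    root = next((t for t in tokens if t.head == 0 and t.pos == "VERB"), None)
    if root is None:
        return None
    subj = next((t for t in tokens if t.head == root.idx and t.deprel == "nsubj"), None)
    obj = next((t for t in tokens if t.head == root.idx and t.deprel in ("obj", "dobj")), None)
    if subj is None or obj is None:
        return None
    return (phrase(tokens, subj.idx), root.text, phrase(tokens, obj.idx))


# Hand-annotated toy parse (an assumption, not parser output).
SENTENCE = [
    Token(1, "The", "DET", 3, "det"),
    Token(2, "proposed", "ADJ", 3, "amod"),
    Token(3, "method", "NOUN", 4, "nsubj"),
    Token(4, "improves", "VERB", 0, "root"),
    Token(5, "recognition", "NOUN", 6, "compound"),
    Token(6, "accuracy", "NOUN", 4, "obj"),
]

# A sentence with no object attached to the predicate yields no triple.
NO_OBJECT = [
    Token(1, "Results", "NOUN", 2, "nsubj"),
    Token(2, "improved", "VERB", 0, "root"),
]

if __name__ == "__main__":
    print(extract_sao(SENTENCE))
    print(extract_sao(NO_OBJECT))
```

In a full pipeline the token list would come from a tagger and dependency parser rather than being written by hand, and long sentences would first be split into segments around each core predicate so that `extract_sao` runs once per segment.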

Cite this article

Zhao Huaming, Qian Li, Yu Li. A Research Entity Recognition Algorithm Based on Dependency Parsing[J]. Library and Information Service, 2020, 64(11): 108-115. DOI: 10.13266/j.issn.0252-3116.2020.11.012
