A Review of Problem and Method Recognition and Relation Extraction in Academic Papers

  • Zhang Yingyi
  • Zhang Chengzhi
  • Daqing He
1. Department of Information Management, School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094
2. School of Computing and Information, University of Pittsburgh, Pittsburgh 15260

Received date: 2021-11-22
Revised date: 2022-03-31
Online published: 2022-06-25

Abstract

[Purpose/Significance] Problems and methods are important components of academic papers. Effectively organizing the problems and methods scattered across academic papers, for example through problem and method recognition and the extraction of the relations between them, can mine the tacit knowledge in these papers and support the construction of a discipline's method system and problem system. Reviewing previous studies on problem and method recognition and relation extraction in academic papers helps to grasp the development trend of this research, identify its shortcomings, and provide guidance for future work. [Method/Process] Recent research on mining problems and methods in academic papers has centered on four research points: the definition of problems, methods, and their relations; the construction of datasets of problems, methods, and their relations; methods for problem and method recognition and relation extraction; and the application of problems, methods, and their relations. This paper reviews each of these four research points and summarizes the current state of knowledge mining of problems and methods in academic papers. [Result/Conclusion] The analysis finds the following. In defining problems and methods, existing studies seldom take into account theories such as problemology from the philosophy of science. In constructing problem and method datasets, annotation work is often repeated; moreover, most open-source datasets cover the natural sciences and are generally English corpora, while Chinese open-source corpora are scarce. In problem and method recognition and relation extraction, the performance of existing extraction models is still low. Finally, the mining of problems and methods should not stop at concept recognition and relation extraction; in-depth analysis and application of the extracted knowledge are required.

Cite this article

Zhang Yingyi, Zhang Chengzhi, Daqing He. A Review of Problem and Method Recognition and Relation Extraction in Academic Papers[J]. Library and Information Service, 2022, 66(12): 125-138. DOI: 10.13266/j.issn.0252-3116.2022.12.012
