[Purpose/significance] Contribution sentences are the elements of an academic paper that reflect its novelty and academic value. This study uses the full text of academic papers and MeSH terms as data sources and applies natural language processing and deep learning techniques to recognize contribution sentences automatically. The work lays a foundation for fine-grained mining of the innovative content of academic texts, which is important for evaluating academic papers on the basis of cognitive computing. [Method/process] First, full-text PubMed papers were used as the data source for analyzing the elements of contribution sentences and extracting their features. Second, the data were annotated with a semi-automatic approach. Finally, contribution sentences were recognized automatically with a model based on ALBERT, a deep learning model. [Result/conclusion] A data consistency test supports the reliability of the labeled training data, and the experimental results show that the recognition model trained in this study identifies contribution sentences in academic papers more effectively than the other deep learning models compared.
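As an illustration of the semi-automatic annotation and data consistency steps described above, the following is a minimal sketch only. The abstract does not state which cue phrases or which consistency statistic were used, so the cue-phrase list, the helper names (`pre_label`, `cohen_kappa`), and the choice of Cohen's kappa are all assumptions made for this example, not the authors' method. Cue phrases propose candidate contribution sentences for a human annotator to confirm, and kappa measures agreement between two annotators.

```python
import re
from collections import Counter

# Hypothetical cue phrases; the paper's actual feature list is not given in the abstract.
CUE_PATTERNS = [
    r"\bwe (propose|present|introduce|develop)\b",
    r"\b(this|our) (study|paper|work) (proposes|presents|introduces|demonstrates|shows)\b",
    r"\bfor the first time\b",
    r"\bto (the best of )?our knowledge\b",
]
CUE_RE = re.compile("|".join(CUE_PATTERNS), re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pre_label(text):
    """Return (sentence, label) pairs; label 1 marks a candidate for manual review."""
    return [(s, 1 if CUE_RE.search(s) else 0) for s in split_sentences(text)]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items (assumed statistic)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)
```

In such a workflow, the candidates flagged by `pre_label` would still be checked by hand, and a low kappa between annotators would suggest that the labeling guidelines need revision before a classifier is trained.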
Luo Zhuoran, Cai Le, Qian Jiajia, Lu Wei. Research on the Recognition of Innovative Contribution Sentences of Academic Papers[J]. Library and Information Service, 2021, 65(12): 93-100.
DOI: 10.13266/j.issn.0252-3116.2021.12.009