Research Paper

Hierarchical Method for Predicting the Academic Impact of Papers Based on the Integration of Academic Contributions and Innovative Value
(融合学术贡献与创新价值的论文学术影响力预测分级方法)

  • Zhang Biao (张彪),
  • Chen Yunwei (陈云伟),
  • Gao Daobin (高道斌)
  • 1 National Science Library (Chengdu), Chinese Academy of Sciences (中国科学院成都文献情报中心), Chengdu 610299;
    2 Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences (中国科学院大学经济与管理学院信息资源管理系), Beijing 100190;
    3 Faculty of Humanities and Social Sciences, Dalian University of Technology (大连理工大学科学学与科技管理研究所), Dalian 116024
Zhang Biao, PhD candidate; Chen Yunwei, research fellow, PhD, doctoral supervisor, corresponding author, E-mail: chenyw@clas.ac.cn; Gao Daobin, PhD candidate.

Received date: 2024-03-18

  Revised date: 2024-06-24

  Online published: 2025-01-25

Supported by

This work is one of the research outputs of the Innovation Business Platform project “Science and Technology Innovation Evaluation Research Center” (Grant No. E3Z0000101), funded by the 2022 Innovation Fund of the Chengdu Library and Information Center, Chinese Academy of Sciences.

Cite this article

Zhang Biao, Chen Yunwei, Gao Daobin. Hierarchical Method for Predicting the Academic Impact of Papers Based on the Integration of Academic Contributions and Innovative Value[J]. Library and Information Service, 2025, 69(2): 108-120. DOI: 10.13266/j.issn.0252-3116.2025.02.010

Abstract

[Purpose/Significance] Drawing on the knowledge information carried by papers themselves, this study proposes a hierarchical method for predicting and grading the academic impact of papers that integrates their academic contributions and innovative value, offering a new approach to predicting academic impact more comprehensively and scientifically. [Method/Process] Taking the perspective of evaluating papers by their own content, the study first selected conventional bibliometric indicators from the papers themselves, used prompt-based fine-tuning of a large model to generate academic contributions from paper abstracts, and constructed four indicators to represent innovative value: knowledge heterogeneity, knowledge outlier, knowledge frontier, and knowledge growth. Second, the vectorized academic contributions were concatenated with the bibliometric indicators and innovative value indicators as input to machine learning algorithms; the best-performing algorithm was used to predict the probability that a paper belongs to the high-impact category, and papers were divided into ten grades (A-J) according to this probability. Finally, the SHAP (Shapley additive explanations) framework was used to analyze how strongly, and in which direction, academic contributions, innovative value, and bibliometric indicators affect academic impact. [Result/Conclusion] Innovative value is the most important factor in predicting academic impact, followed by academic contributions, while bibliometric indicators are the least important. The proposed method substantially improves the accuracy of academic impact prediction.
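The fusion and grading steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fuse_features` and `grade_from_probability` are hypothetical, and the abstract does not state the probability cut-offs, so the sketch assumes ten equal-width bins with A as the highest grade. The probability itself would come from whichever trained classifier performs best, which is outside this sketch.

```python
from typing import List, Sequence

# Assumed ordering: "A" is the highest predicted impact, "J" the lowest.
GRADES = "ABCDEFGHIJ"


def fuse_features(contribution_vec: Sequence[float],
                  bibliometric: Sequence[float],
                  innovation: Sequence[float]) -> List[float]:
    """Concatenate the three feature groups into one classifier input vector."""
    return [*contribution_vec, *bibliometric, *innovation]


def grade_from_probability(p: float) -> str:
    """Map P(high impact) to a grade A-J.

    Assumption: ten equal-width bins of 0.1; the paper's actual
    cut-offs are not given in the abstract.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Bin index counted from the bottom (0..9); p == 1.0 is clamped
    # into the top bin so it does not fall off the end.
    bin_from_bottom = min(int(p * 10), 9)
    return GRADES[9 - bin_from_bottom]


if __name__ == "__main__":
    # Toy values only: a 2-dimensional contribution embedding, one
    # bibliometric indicator, and two innovative value indicators.
    x = fuse_features([0.1, 0.2], [3.0], [0.5, 0.7])
    print(len(x), grade_from_probability(0.95), grade_from_probability(0.05))
```

Under these assumptions a paper with a predicted probability of 0.95 lands in grade A and one with 0.05 lands in grade J. Quantile-based cut-offs would be an equally plausible reading of the abstract and would only change the body of `grade_from_probability`.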
