Library and Information Service (图书情报工作), 2018, Vol. 62, Issue (14): 29-40. DOI: 10.13266/j.issn.0252-3116.2018.14.004

• Theoretical Research •

Citation Prediction and Influencing Factors Analysis on Academic Papers

Geng Qian (耿骞), Jing Ran (景然), Jin Jian (靳健), Luo Qingyang (罗清扬)

  1. School of Government, Beijing Normal University, Beijing 100875
  • Received: 2018-01-09 Revised: 2018-05-22 Online: 2018-07-20 Published: 2018-07-20
  • Corresponding author: Jin Jian (ORCID: 0000-0002-3239-2294), associate professor, PhD, E-mail: jinjian.jay@bnu.edu.cn
  • About the authors: Geng Qian (ORCID: 0000-0001-5064-4996), professor, PhD; Jing Ran (ORCID: 0000-0002-4191-6221), master's student; Luo Qingyang (ORCID: 0000-0003-0750-4028), master's student.
  • Funding:
    This paper is one of the research outcomes of the Fundamental Research Funds for the Central Universities project "Construction of an Intelligent Expert Selection and Recommendation Platform Based on Social Network Relationships" (Grant No. SKZZB2014037) and the Ministry of Education Humanities and Social Sciences Research Youth Fund project "Research on Interest Change Mining and Avoidance Mechanism Generation for Paper Reviewer Recommendation" (Grant No. 16YJC870006).

Abstract: [Purpose/significance] In citation analysis, the future citations of a paper can be predicted from a set of its attribute features, and the predictions can in turn be used to evaluate papers, their authors, the authors' institutions, and publications. [Method/process] Factors that may affect how often a paper is cited are studied from three aspects: the publication, the authors, and the paper itself. SCI-indexed papers in the field of library and information science serve as the analysis and validation data. Logistic regression, GBDT, XGBoost, AdaBoost, and random forest are used for prediction, several groups of evaluation metrics are used to compare the prediction methods, and GBDT is used to identify the factors with the greatest influence on citations. [Result/conclusion] The study determines how strongly factors from the three aspects influence citation prediction, builds prediction models, and predicts the citations of papers over a future period reasonably well. Extensive experiments show that GBDT, XGBoost, and random forest have the strongest predictive ability, and that performance tends to improve as the prediction period becomes longer.

Key words: academic papers, citation prediction, influencing factors

CLC number:
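
The abstract names GBDT as the tool for identifying the most influential factors. The sketch below illustrates that idea only and is not the authors' implementation: it fits gradient-boosted decision stumps under squared loss (a simplification, since GBDT classifiers usually use deeper trees and a logistic loss) and ranks features by their share of the total error reduction. The feature names, the synthetic data, and the hyperparameters are all assumptions made for illustration.

```python
import random

# Hypothetical feature names, one per aspect named in the abstract
# (publication, author, paper); the paper's real feature set is not shown here.
FEATURES = ["journal_impact", "author_h_index", "page_count"]


def fit_stump(X, residuals):
    """Return the one-feature threshold split that most reduces squared error."""
    n, total = len(X), sum(residuals)
    best = None  # (gain, feature index, threshold, left mean, right mean)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for pos in range(1, n):
            left_sum += residuals[order[pos - 1]]
            lo, hi = X[order[pos - 1]][j], X[order[pos]][j]
            if lo == hi:  # cannot split between identical values
                continue
            right_sum = total - left_sum
            gain = (left_sum ** 2 / pos + right_sum ** 2 / (n - pos)
                    - total ** 2 / n)
            if best is None or gain > best[0]:
                best = (gain, j, lo, left_sum / pos, right_sum / (n - pos))
    return best


def fit_boosted_stumps(X, y, n_rounds=30, learning_rate=0.3):
    """Gradient boosting with squared loss: each stump fits current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps, gains = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, j, thr, left, right = stump
        left, right = learning_rate * left, learning_rate * right
        stumps.append((j, thr, left, right))
        gains[j] += gain  # importance = total error reduction per feature
        preds = [p + (left if x[j] <= thr else right) for p, x in zip(preds, X)]
    total_gain = sum(gains) or 1.0
    return base, stumps, [g / total_gain for g in gains]


def predict_score(base, stumps, x):
    """Sum the base rate and every stump's contribution for one sample."""
    return base + sum(left if x[j] <= thr else right
                      for j, thr, left, right in stumps)


# Synthetic data (an assumption, not the paper's SCI data set): the label is
# driven mostly by the first feature, weakly by the second, not by the third.
rng = random.Random(0)
X = [[rng.random() for _ in FEATURES] for _ in range(300)]
y = [1 if 3.0 * a + 1.0 * b + rng.gauss(0, 0.3) > 2.0 else 0 for a, b, _ in X]

base, stumps, importance = fit_boosted_stumps(X, y)
hits = sum((predict_score(base, stumps, x) > 0.5) == bool(label)
           for x, label in zip(X, y))
accuracy = hits / len(y)

for name, share in sorted(zip(FEATURES, importance), key=lambda p: -p[1]):
    print(f"{name}: {share:.2f}")
print(f"training accuracy: {accuracy:.2f}")
```

For real data, a library implementation is the more sensible choice. For example, scikit-learn's `GradientBoostingClassifier` exposes a comparable ranking through its `feature_importances_` attribute, and held-out data rather than training accuracy should be used to compare models.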