Citation Prediction and Influencing Factors Analysis on Academic Papers

  • Geng Qian,
  • Jing Ran,
  • Jin Jian,
  • Luo Qingyang
  • School of Government, Beijing Normal University, Beijing 100875

Received date: 2018-01-09

Revised date: 2018-05-22

  Online published: 2018-07-20

Abstract

[Purpose/significance] This study predicts the future citations of a paper from a set of features, with the aim of evaluating the academic influence of scholars, papers and publications. [Method/process] Potential factors for citation prediction are drawn from three aspects: publications, authors and papers. SCI-indexed papers in the field of library and information science are used as a concrete example to evaluate the validity of these factors. Several algorithms, including logistic regression, GBDT, XGBoost, AdaBoost and Random Forest, are benchmarked on different evaluation metrics, and GBDT is applied to identify the influential factors. [Result/conclusion] Influential factors from the three aspects are analyzed, and the different approaches are evaluated for predicting the citations of papers in the near future. Several categories of experiments show that GBDT, XGBoost and Random Forest perform best. In addition, prediction performance tends to be better for papers that have been published for a relatively longer time.
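The abstract does not give the features, data or hyperparameters used, so the following is only a minimal sketch of how gradient boosting can rank influential factors. It is not the authors' implementation. It assumes a simplified variant: squared-loss regression with depth-1 trees (stumps) on synthetic data. The feature names journal_impact, author_h_index and noise are hypothetical, and importance is taken as each feature's share of the total split gain, which is one common definition.

```python
import random


def fit_stump(X, residuals):
    """Return the single-feature split that most reduces squared error."""
    n, d = len(X), len(X[0])
    total = sum(residuals)
    best = None  # (gain, feature, threshold, left_value, right_value)
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between equal feature values
            right_sum = total - left_sum
            # Squared-error reduction of a two-mean fit versus a one-mean fit.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best


def fit_gbdt(X, y, n_rounds=50, learning_rate=0.1):
    """Fit boosted stumps on residuals; return the model and feature importance."""
    n, d = len(X), len(X[0])
    init = sum(y) / n
    pred = [init] * n
    stumps = []
    gains = [0.0] * d
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        best = fit_stump(X, residuals)
        if best is None:
            break
        gain, j, thr, left, right = best
        stumps.append((j, thr, left, right))
        gains[j] += gain
        pred = [p + learning_rate * (left if x[j] <= thr else right)
                for p, x in zip(pred, X)]
    total_gain = sum(gains) or 1.0
    model = (init, learning_rate, stumps)
    return model, [g / total_gain for g in gains]


def predict(model, x):
    """Sum the initial mean and the shrunken stump outputs for one sample."""
    init, learning_rate, stumps = model
    return init + sum(learning_rate * (left if x[j] <= thr else right)
                      for j, thr, left, right in stumps)


# Synthetic data (an assumption): the target depends strongly on the first
# feature, weakly on the second, and not at all on the third.
rng = random.Random(0)
feature_names = ["journal_impact", "author_h_index", "noise"]
X = [[rng.random() for _ in feature_names] for _ in range(200)]
y = [3.0 * x[0] + 1.0 * x[1] + rng.gauss(0, 0.1) for x in X]

model, importance = fit_gbdt(X, y)

mean_y = sum(y) / len(y)
baseline_mse = sum((yi - mean_y) ** 2 for yi in y) / len(y)
train_mse = sum((yi - predict(model, x)) ** 2 for x, yi in zip(X, y)) / len(y)

for name, score in sorted(zip(feature_names, importance), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
print(f"baseline MSE {baseline_mse:.3f}, training MSE {train_mse:.3f}")
```

In practice a library implementation with deeper trees and a classification loss would likely be used, and the benchmark would be scored on held-out papers rather than on training error.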

Cite this article

Geng Qian, Jing Ran, Jin Jian, Luo Qingyang. Citation Prediction and Influencing Factors Analysis on Academic Papers[J]. Library and Information Service, 2018, 62(14): 29-40. DOI: 10.13266/j.issn.0252-3116.2018.14.004
