Information Research

Research on Citation Diffusion Pattern Prediction Based on Multidimensional Features

  • Han Xu,
  • Min Chao,
  • Zhang Jingwen
  • School of Information Management, Nanjing University, Nanjing 210023
Han Xu, master's student; Zhang Jingwen, doctoral student.

Received date: 2021-10-29

  Revised date: 2022-01-27

  Online published: 2022-05-12

Funding

This article is one of the research outcomes of the Ministry of Education Humanities and Social Sciences Fund project "Research on Scientific Output Evaluation Methods from the Perspective of Citing Groups" (Project No. 19YJC870017).

Cite this article

Han Xu, Min Chao, Zhang Jingwen. Research on Citation Diffusion Pattern Prediction Based on Multidimensional Features[J]. 图书情报工作 (Library and Information Service), 2022, 66(9): 82-92. DOI: 10.13266/j.issn.0252-3116.2022.09.009

Abstract

[Purpose/Significance] Accurately predicting how a paper's citation diffusion pattern will evolve, based on features observed soon after publication, has potential value for evaluating scientific output and for the early discovery of scientific breakthroughs. [Method/Process] This study summarizes nine evolution patterns of citation diffusion and, using early temporal, structural, and literature features observed after a paper's publication, builds models to predict the evolution pattern within a given future citation window. The citation dataset of the American Physical Society (APS) is used for an empirical study of how well the evolution pattern can be predicted under different feature combinations. [Result/Conclusion] The temporal features contribute the most to the prediction models, while the structural and literature features also play an important role. When all three feature types are combined, the accuracy of every prediction model exceeds 80%, which demonstrates the effectiveness of the selected features.
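
The comparison across feature combinations mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the individual features or the classifiers, so the group names, the toy paper record, and both helper functions below are hypothetical. The code only shows one way the seven non-empty combinations of three feature groups might be enumerated and turned into input vectors before any model is trained.

```python
from itertools import combinations

# Hypothetical feature-group names; the abstract names three groups
# but does not list the individual features inside each one.
FEATURE_GROUPS = ("temporal", "structural", "literature")


def feature_group_combinations(groups=FEATURE_GROUPS):
    """Return every non-empty combination of feature groups, smallest first."""
    return [
        combo
        for size in range(1, len(groups) + 1)
        for combo in combinations(groups, size)
    ]


def build_feature_vector(paper, combo):
    """Concatenate a paper's feature values for the selected groups."""
    vector = []
    for group in combo:
        vector.extend(paper[group])
    return vector


# Toy paper record with made-up numbers, for illustration only.
paper = {
    "temporal": [3, 5, 8],    # e.g. citation counts in early years
    "structural": [2.4],      # e.g. one structural measure
    "literature": [4, 12],    # e.g. two bibliographic attributes
}

for combo in feature_group_combinations():
    print(combo, build_feature_vector(paper, combo))
```

Each resulting vector would then be passed to whichever classifier is under evaluation, so that accuracy can be compared across the seven combinations.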
