Information Research

Co-authorship Prediction in the Author-keyword Bipartite Networks

  • Zhang Jinzhu,
  • Han Tao,
  • Wang Xiaomei
  • 1. Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094;
    2. National Science Library, Chinese Academy of Sciences, Beijing 100190
Zhang Jinzhu (ORCID: 0000-0001-7581-1850), lecturer, PhD, E-mail: zhangjinzhu@njust.edu.cn; Han Tao, associate research fellow, PhD; Wang Xiaomei, research fellow.

Received date: 2016-07-28

  Revised date: 2016-10-09

  Online published: 2016-11-05

Funding

This paper is one of the research outcomes of the National Natural Science Foundation of China Youth Fund project "Dynamic identification of breakthrough innovation based on mutations in cited scientific knowledge and its formation mechanism" (Grant No. 71503125) and the Ministry of Education Humanities and Social Sciences Research Youth Fund project "Dynamic identification of topic mutations in heterogeneous knowledge networks" (Grant No. 14YJC870025).

Cite this article

Zhang Jinzhu, Han Tao, Wang Xiaomei. Co-authorship Prediction in the Author-keyword Bipartite Networks[J]. Library and Information Service (图书情报工作), 2016, 60(21): 74-80. DOI: 10.13266/j.issn.0252-3116.2016.21.010

Abstract

[Purpose/significance] This paper clarifies how topical relationships formed by keywords influence co-authorship prediction, and develops co-authorship predictors and methods for the author-keyword bipartite network, in order to improve both prediction accuracy and the interpretability of the results. [Method/process] First, multiple paths were extracted from the author-keyword bipartite network to represent relationships between authors; combined with measures of relationship strength, these paths formed several co-authorship predictors. Next, logistic regression was applied to learn the contribution of each predictor to co-authorship prediction, yielding a path-combination predictor for the bipartite network. Finally, the predictors were quantitatively evaluated with link prediction methods. [Result/conclusion] Experiments in the field of library and information science confirm that the path-combination predictor in the author-keyword bipartite network achieves the highest accuracy, substantially higher than each of the four single-path predictors. Multiple paths influence co-authorship prediction, and the path "author-keyword-author" (AKA) contributes markedly more than the path "author-keyword-author-keyword-author" (AKAKA). In addition, the keywords that connect authors indicate their shared research topics and interests, which makes the predicted co-authorships easier to interpret. As next steps, more paths will be added to the model and the generality of the method will be validated in other fields.
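The pipeline outlined in the abstract (path-based predictors, a logistic regression that combines them, and a link prediction style evaluation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy authors and keywords, the `future_coauthors` labels, the use of raw path counts as the strength measure, the exclusion of the two endpoint authors as intermediates in AKAKA, and the hand-written gradient descent. For brevity the sketch scores the same pairs it was trained on, whereas a real evaluation would hold out test links.

```python
import math
from itertools import combinations

# Hypothetical toy data (not from the paper): each author's keyword set.
author_keywords = {
    "a1": {"link prediction", "bipartite network"},
    "a2": {"link prediction", "co-authorship"},
    "a3": {"co-authorship", "bibliometrics"},
    "a4": {"information retrieval"},
}
# Hypothetical co-authorships observed in a later period, used as labels.
future_coauthors = {frozenset(("a1", "a2")), frozenset(("a2", "a3"))}


def aka_count(a, b):
    """Number of author-keyword-author (AKA) paths, i.e. shared keywords."""
    return len(author_keywords[a] & author_keywords[b])


def akaka_count(a, b):
    """Number of AKAKA paths via an intermediate author c (assumed c != a, b)."""
    return sum(
        aka_count(a, c) * aka_count(c, b)
        for c in author_keywords
        if c not in (a, b)
    )


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the logistic loss (no regularization)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# One feature vector [AKA count, AKAKA count] per candidate author pair.
pairs = list(combinations(sorted(author_keywords), 2))
X = [[aka_count(a, b), akaka_count(a, b)] for a, b in pairs]
y = [1 if frozenset(p) in future_coauthors else 0 for p in pairs]

# The learned weights play the role of per-path contributions.
weights, bias = fit_logistic(X, y)
scores = [sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) for xi in X]
auc_value = auc(scores, y)
print(dict(zip(["AKA", "AKAKA"], weights)), auc_value)
```

The learned weights indicate how much each path type contributes to the combined score, and the shared keywords behind a high-scoring pair can be listed to explain the prediction. The weights obtained from this toy data say nothing about the paper's empirical findings.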

Author contributions: Zhang Jinzhu proposed the research idea, designed the research plan, conducted the experiments, and wrote the first draft; Han Tao performed data analysis and revised the paper; Wang Xiaomei performed data analysis and was responsible for revising the final version.
