[Purpose/significance] This paper aims to uncover the tacit knowledge hidden in a large body of literature by constructing a two-mode complex network model. [Method/process] Using the NetworkX complex network toolkit, a two-mode complex network model was built from the co-occurrence relationships between any two nodes. The co-occurrence relationships were weighted, the topological properties of the network were calculated, and affinity propagation (AP) clustering was applied to extract the direct relationships between nodes. Four link prediction algorithms, AA (Adamic-Adar), JC (Jaccard coefficient), and their weighted variants wAA and wJC, were evaluated with the AUC metric, and the best-performing algorithm was then used to predict the hidden relationships in the network. [Result/conclusion] An empirical study on mining potential drug targets showed that wAA was the best of the four link prediction algorithms, and that the two-mode complex network model, together with its indicators and methods, was reasonably effective for drug target mining in the Chemical Abstracts Service database. Future work will apply the model to other databases and other research fields to further verify its generality and effectiveness.
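The evaluation step described in the abstract can be illustrated with a minimal sketch. It scores a held-out edge against non-existent edges with AA, JC, wAA and wJC and compares them by AUC. Several things here are assumptions rather than the paper's own definitions: the abstract does not give the wAA and wJC formulas, so one common weighted form from the link prediction literature is used; the graph, node names and weights are a hypothetical toy example, not data from the study; and plain dictionaries stand in for NetworkX so that the block is self-contained. AP clustering is not covered.

```python
import math
from itertools import combinations


def build_graph(nodes, weighted_edges):
    """Undirected weighted graph: adj[u][v] = co-occurrence weight (assumed > 0)."""
    adj = {n: {} for n in nodes}
    for u, v, w in weighted_edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def strength(adj, z):
    """Sum of the weights of all edges incident to z."""
    return sum(adj[z].values())


def aa(adj, x, y):
    """Adamic-Adar: common neighbours weighted by 1 / log(degree)."""
    common = adj[x].keys() & adj[y].keys()
    return sum(1.0 / math.log(len(adj[z])) for z in common)


def jc(adj, x, y):
    """Jaccard coefficient: shared neighbours over all neighbours."""
    union = adj[x].keys() | adj[y].keys()
    return len(adj[x].keys() & adj[y].keys()) / len(union) if union else 0.0


def waa(adj, x, y):
    """Assumed weighted AA: (w_xz + w_yz) / log(1 + strength(z)) per common z."""
    common = adj[x].keys() & adj[y].keys()
    return sum((adj[x][z] + adj[y][z]) / math.log(1.0 + strength(adj, z))
               for z in common)


def wjc(adj, x, y):
    """Assumed weighted JC: weight through common neighbours over total strength."""
    common = adj[x].keys() & adj[y].keys()
    total = strength(adj, x) + strength(adj, y)
    return sum(adj[x][z] + adj[y][z] for z in common) / total if total else 0.0


def auc(score, train_adj, probe_edges):
    """Share of (held-out, non-existent) pairs in which the held-out edge
    scores higher; ties count as 0.5. All pairs are compared exhaustively."""
    probe = {frozenset(e) for e in probe_edges}
    missing = [score(train_adj, u, v) for u, v in probe_edges]
    nonexistent = [score(train_adj, u, v)
                   for u, v in combinations(sorted(train_adj), 2)
                   if v not in train_adj[u] and frozenset((u, v)) not in probe]
    wins = sum(1.0 if m > n else 0.5 if m == n else 0.0
               for m in missing for n in nonexistent)
    return wins / (len(missing) * len(nonexistent))


# Hypothetical toy network: weights are co-occurrence counts.
nodes = ["a", "b", "c", "d", "e"]
train = build_graph(nodes, [("a", "c", 3), ("b", "c", 2), ("a", "d", 1),
                            ("b", "d", 1), ("c", "e", 1)])
probe = [("a", "b")]  # edge hidden from the training graph

results = {name: auc(fn, train, probe)
           for name, fn in [("AA", aa), ("JC", jc), ("wAA", waa), ("wJC", wjc)]}
for name, value in results.items():
    print(f"{name}: AUC = {value:.3f}")
```

On a network this small the AUC values say nothing about which index is better in general; the sketch only shows the mechanics of the comparison. With NetworkX available, the unweighted indices could instead be taken from `networkx.adamic_adar_index` and `networkx.jaccard_coefficient`.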