Library and Information Service (图书情报工作), 2020, Vol. 64, Issue (21): 120-129. DOI: 10.13266/j.issn.0252-3116.2020.21.015

• Knowledge Organization •

Research on Tacit Knowledge Discovery Based on a Two-mode Complex Network: Taking Potential Drug Target Mining as an Example

Li Dongqiao1, Chen Fang1, Han Tao1, Yang Yanping1, Wang Xuezhao1, Wang Yanpeng1, Cynthia Liu2, Yingzhu Li2

  1. National Science Library, Chinese Academy of Sciences, Beijing 100190;
  2. Chemical Abstracts Service, Columbus, OH 43202, USA
  • Received: 2020-02-22  Revised: 2020-07-09  Online: 2020-11-05  Published: 2020-11-05
  • About the authors: Li Dongqiao (ORCID: 0000-0002-7447-2436), associate researcher, PhD, E-mail: lidq@mail.las.ac.cn; Chen Fang (ORCID: 0000-0003-2517-5299), associate researcher, master's degree; Han Tao (ORCID: 0000-0001-5955-7813), researcher, PhD; Yang Yanping (ORCID: 0000-0003-0428-4939), associate researcher, PhD; Wang Yanpeng (ORCID: 0000-0002-2583-9895), assistant researcher, master's degree; Wang Xuezhao (ORCID: 0000-0001-8496-3354), associate researcher, PhD; Cynthia Liu (ORCID: 0000-0003-3858-1501), scientific information manager, PhD; Yingzhu Li (ORCID: 0000-0002-4946-7272), information scientist, PhD.
  • Supported by:
    This paper is one of the research outcomes of the Young Talent Frontier Project of the National Science Library, Chinese Academy of Sciences, "Research on Tacit Knowledge Discovery Based on a Two-mode Complex Network: Taking Potential Drugs as an Example" (Grant No. G180181001).

Abstract: [Purpose/significance] This paper aims to reveal the tacit knowledge hidden in large volumes of literature by constructing a two-mode complex network model. [Method/process] Using the NetworkX complex network toolkit, a two-mode complex network model was constructed from the co-occurrence relationships between any two nodes. The co-occurrence relationships in the model were weighted, the topological properties of the network were calculated, and AP clustering was applied to extract the direct relationships between nodes. Four link prediction algorithms (AA, JC, and their weighted variants wAA and wJC) were evaluated with the AUC method to select the most suitable one, which was then used to predict the implicit relationships in the network. [Result/conclusion] An empirical study on potential drug target mining showed that wAA was the best of the four link prediction algorithms, and that the two-mode complex network model, indicators, and method system were effective to a certain extent for drug target mining in the Chemical Abstracts Service database. The next step is to apply the approach to other databases or other research fields to further verify the generality and effectiveness of the model.

Key words: tacit knowledge, link prediction, complex network, drug target, diseases
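
The workflow summarized in the abstract (weighted co-occurrence network, link prediction scores, AUC evaluation) can be illustrated with a minimal sketch. It is not the authors' implementation. The toy corpus, the entity names, the held-out edges, and the helper functions `build_network`, `aa`, `waa`, and `auc` are all hypothetical. The sketch also assumes that AA denotes the Adamic-Adar index and that wAA follows one common weighted formulation (summed edge weights divided by the logarithm of one plus the common neighbor's strength); the abstract does not give the exact definitions. JC, wJC, and the AP clustering step are not covered.

```python
import itertools
import math

import networkx as nx

# Hypothetical toy corpus: each record lists the entities co-mentioned in one
# document. "T*" stands for a drug target, "D*" for a disease (illustrative only).
DOCS = [
    ["T1", "D1"], ["T1", "D1"], ["T1", "D2"], ["T2", "D1"],
    ["T2", "D2"], ["T2", "D3"], ["T3", "D3"], ["T1", "T2"],
    ["D1", "D2"], ["T3", "D2", "D3"],
]


def build_network(docs):
    """Build a co-occurrence graph; edge weight = number of shared documents."""
    g = nx.Graph()
    for entities in docs:
        for name in entities:
            kind = "target" if name.startswith("T") else "disease"
            g.add_node(name, kind=kind)
        for u, v in itertools.combinations(sorted(set(entities)), 2):
            if g.has_edge(u, v):
                g[u][v]["weight"] += 1
            else:
                g.add_edge(u, v, weight=1)
    return g


def aa(g, u, v):
    """Unweighted Adamic-Adar score (assumed meaning of 'AA')."""
    return sum(1.0 / math.log(g.degree(z)) for z in nx.common_neighbors(g, u, v))


def waa(g, u, v):
    """Weighted Adamic-Adar score; one common formulation, assumed here."""
    score = 0.0
    for z in nx.common_neighbors(g, u, v):
        strength = g.degree(z, weight="weight")  # sum of edge weights at z
        score += (g[u][z]["weight"] + g[z][v]["weight"]) / math.log(1 + strength)
    return score


def auc(train, probe_edges, non_edges, score):
    """Probability that a held-out edge outscores a non-edge (ties count 0.5)."""
    wins = 0.0
    for (a, b), (c, d) in itertools.product(probe_edges, non_edges):
        s_probe, s_non = score(train, a, b), score(train, c, d)
        if s_probe > s_non:
            wins += 1.0
        elif s_probe == s_non:
            wins += 0.5
    return wins / (len(probe_edges) * len(non_edges))


full = build_network(DOCS)

# Hold out two observed target-disease edges as the probe set (arbitrary choice).
probe = [("T1", "D2"), ("T2", "D3")]
train = full.copy()
train.remove_edges_from(probe)

# Candidate "tacit" links: target-disease pairs never observed together.
non_edges = [
    (u, v) for u, v in nx.non_edges(full)
    if full.nodes[u]["kind"] != full.nodes[v]["kind"]
]

results = {name: auc(train, probe, non_edges, fn) for name, fn in [("AA", aa), ("wAA", waa)]}
for name, value in results.items():
    print(f"{name}: AUC = {value:.3f}")
```

On a real corpus the probe set would normally be a random sample of the observed edges, and the AUC would be estimated by repeated random comparisons rather than the exhaustive product used here, which is only practical for very small graphs.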
