Library and Information Service ›› 2020, Vol. 64 ›› Issue (13): 120-132. DOI: 10.13266/j.issn.0252-3116.2020.13.016

• Knowledge Organization •

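Knowledge Mining of Alzheimer's Disease Gene-Disease Associations

Wang Xue1,2, Wu Junwei3, Chen Guanqun4, Li Yanqiong2, Ma Lu1

  1 Medical Humanities School, Capital Medical University, Beijing 100069;
  2 Department of Library, Xuanwu Hospital, Capital Medical University, Beijing 100053;
  3 Medical Information Section, Chinese PLA General Hospital, Beijing 100853;
  4 Department of Neurology, Xuanwu Hospital, Capital Medical University, Beijing 100053
  • Received: 2020-01-03 Revised: 2020-02-28 Online: 2020-07-05 Published: 2020-07-05
  • Corresponding author: Ma Lu (ORCID: 0000-0001-9147-5746), professor, PhD, doctoral supervisor, E-mail: malulib@ccmu.edu.cn
  • About the authors: Wang Xue (ORCID: 0000-0002-5675-6726), assistant librarian, master's degree; Wu Junwei (ORCID: 0000-0002-0806-8160), information engineer, assistant engineer, master's degree; Chen Guanqun (ORCID: 0000-0002-8133-834X), doctoral student; Li Yanqiong (ORCID: 0000-0002-1481-3593), library director, associate research librarian, bachelor's degree.
  • Funding:
    This article is one of the research outcomes of two hospital-level projects of Xuanwu Hospital, Capital Medical University: the management project "Analysis of the Influence of Key Hospital Disciplines Based on Science and Technology Influence Rankings" (project No. XWGL-2019003) and the teaching project "Research on Teaching Paths for Medical Students' Information Literacy Based on Metaliteracy Theory" (project No. 2019XWJXGG-10).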

Abstract: [Purpose/significance] To mine gene-disease associations for Alzheimer's disease (AD) in order to identify promising research directions. [Method/process] An open knowledge discovery framework was constructed based on literature-based discovery (LBD) theory. Drawing on medical terminologies and omics data such as the MeSH thesaurus and DisGeNET, knowledge mining was carried out on the AD literature in PubMed. Association rules and ranking algorithms were used to screen two result sets: strongly associated co-occurring diseases (identified from subject terms) that partially share genes with AD, and priority candidate genes. The results were verified by time slicing and by comparison with other LBD tools. [Result/conclusion] Genes and diseases were identified in 88,334 AD articles and matched against 2,120 AD genes. From the XYZ analysis perspective, the 992 co-occurring diseases and 11,899 candidate genes identified were ranked by association. Ten strongly associated diseases and 25 preferred candidate genes were refined and discussed in light of literature reports. Mining the potential associations among the target disease, co-occurring diseases, and genes through LBD can quickly identify promising research directions, narrow the scope of gene sequencing, and provide important guidance for generating new research hypotheses.

Keywords: literature-based discovery, genomics, Alzheimer's disease, entity recognition, data mining, ranking algorithm, temporal analysis

CLC number:
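
The open discovery idea summarized in the abstract (target disease X, co-occurring diseases Y, candidate genes Z) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data below are placeholders (Y1, G1 and so on are hypothetical names, not real diseases or genes), the function names are invented for illustration, and the co-occurrence-weighted score is an assumed stand-in for the paper's association rules and ranking algorithms, which the abstract does not specify.

```python
from collections import defaultdict

# Placeholder inputs. In practice, disease terms would come from MeSH-indexed
# PubMed records and gene sets from a resource such as DisGeNET.
TARGET = "AD"

# Each entry is the set of disease terms indexed for one article.
DOCS = [
    {"AD", "Y1"},
    {"AD", "Y1"},
    {"AD", "Y2"},
    {"AD", "Y3"},
    {"Y2", "Y3"},
]

# Known gene associations per disease (hypothetical).
DISEASE_GENES = {
    "AD": {"G1", "G2"},
    "Y1": {"G1", "G3", "G4"},
    "Y2": {"G2", "G4", "G5"},
    "Y3": {"G6"},
}


def cooccurring_diseases(docs, target):
    """Count how often each other disease is indexed together with the target."""
    counts = defaultdict(int)
    for terms in docs:
        if target in terms:
            for term in terms - {target}:
                counts[term] += 1
    return dict(counts)


def rank_candidates(docs, disease_genes, target):
    """Rank genes linked to co-occurring diseases but not yet to the target.

    Only co-occurring diseases that share at least one gene with the target
    are used (the "partial gene overlap" condition). Each candidate gene is
    scored by the summed co-occurrence counts of the diseases that carry it;
    this scoring rule is an assumption made for the sketch.
    """
    known = disease_genes.get(target, set())
    scores = defaultdict(int)
    for disease, count in cooccurring_diseases(docs, target).items():
        genes = disease_genes.get(disease, set())
        if not genes & known:
            continue  # no shared gene with the target: skip this disease
        for gene in genes - known:
            scores[gene] += count
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for gene, score in rank_candidates(DOCS, DISEASE_GENES, TARGET):
        print(gene, score)
```

In this toy data, Y3 co-occurs with the target but shares no gene with it, so its gene G6 never becomes a candidate. A time-slicing check of the kind the abstract mentions could reuse the same functions by building `DOCS` from articles before a cutoff year and testing whether top-ranked candidates appear in later literature.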