Knowledge Organization

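Knowledge Mining of Alzheimer's Disease Gene-Disease Associations

  • Wang Xue,
  • Wu Junwei,
  • Chen Guanqun,
  • Li Yanqiong,
  • Ma Lu
  • 1 School of Medical Humanities, Capital Medical University, Beijing 100069;
    2 Library, Xuanwu Hospital, Capital Medical University, Beijing 100053;
    3 Medical Information Office, Chinese PLA General Hospital, Beijing 100853;
    4 Department of Neurology, Xuanwu Hospital, Capital Medical University, Beijing 100053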
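Wang Xue (ORCID: 0000-0002-5675-6726), assistant librarian, postgraduate (master's); Wu Junwei (ORCID: 0000-0002-0806-8160), information engineer, assistant engineer, postgraduate (master's); Chen Guanqun (ORCID: 0000-0002-8133-834X), postgraduate (doctoral); Li Yanqiong (ORCID: 0000-0002-1481-3593), library director, associate research librarian, undergraduate.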

Received: 2020-01-03

Revised: 2020-02-28

Published online: 2020-07-05

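Funding

This work is one of the research outcomes of the Xuanwu Hospital, Capital Medical University hospital-level management project "Analysis of the Influence of Key Hospital Disciplines Based on Science and Technology Influence Rankings" (project no. XWGL-2019003) and the Xuanwu Hospital, Capital Medical University hospital-level teaching project "Research on Information Literacy Teaching Pathways for Medical Students Based on Metaliteracy Theory" (project no. 2019XWJXGG-10).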


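Abstract

[Purpose/Significance] To mine gene-disease associations for Alzheimer's disease (AD) in order to identify promising research directions. [Method/Process] An open knowledge discovery framework was built on literature-based discovery (LBD) theory. Drawing on medical terminology and omics data resources such as the MeSH thesaurus and DisGeNET, knowledge mining was performed on the AD literature in PubMed. Association rules and algorithmic ranking were used to screen strongly associated topic co-occurring diseases that partially share genes with AD, along with priority candidate genes. The results were validated with time slicing and by comparison with other LBD tools. [Result/Conclusion] Genes and diseases were identified in 88,334 AD articles and matched against 2,120 AD genes. From the XYZ analysis perspective, the 992 topic co-occurring diseases and 11,899 candidate genes identified were ranked by association. Ten strongly associated diseases and 25 preferred candidate genes were then selected and discussed in light of published reports. Using LBD to mine potential associations among the target disease, co-occurring diseases, and genes can quickly identify promising research directions, narrow the scope of gene sequencing, and provide important guidance for generating new research hypotheses.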
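The open-discovery chain summarized in the abstract (target disease X, co-occurring diseases Y, candidate genes Z) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, thresholds, scoring rule (summing confidences), and toy data below are all assumptions chosen for illustration. It assumes each article is represented as a set of disease terms and that a disease-to-gene mapping (in the spirit of a DisGeNET export) is already available.

```python
from collections import Counter, defaultdict


def rank_co_occurring_diseases(target, articles, min_support=2, min_confidence=0.3):
    """Score each disease Y that co-occurs with the target disease X.

    support    = number of articles indexed with both X and Y
    confidence = support / number of articles indexed with X
    Thresholds are hypothetical defaults, not values from the paper.
    """
    target_articles = [terms for terms in articles if target in terms]
    support = Counter(y for terms in target_articles for y in terms if y != target)
    n_target = len(target_articles)
    strong = {}
    for disease, count in support.items():
        confidence = count / n_target
        if count >= min_support and confidence >= min_confidence:
            strong[disease] = confidence
    return strong


def rank_candidate_genes(target, strong_diseases, disease_genes):
    """Rank genes linked to strong Y diseases but not yet linked to X.

    A Y disease contributes only if it shares at least one gene with X
    (the "partial gene overlap" condition). A gene's score is the sum of
    the confidences of the Y diseases that link it to X (an assumed rule).
    """
    known = disease_genes.get(target, set())
    scores = defaultdict(float)
    for disease, confidence in strong_diseases.items():
        genes = disease_genes.get(disease, set())
        if not genes & known:
            continue  # no shared gene with X, so skip this disease
        for gene in genes - known:
            scores[gene] += confidence
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data with placeholder disease and gene names (purely illustrative).
articles = [
    {"Alzheimer Disease", "Y1"},
    {"Alzheimer Disease", "Y1", "Y2"},
    {"Alzheimer Disease", "Y2"},
    {"Alzheimer Disease", "Y1"},
    {"Alzheimer Disease", "Y3"},
    {"Y3"},
]
disease_genes = {
    "Alzheimer Disease": {"g1", "g2"},
    "Y1": {"g1", "g3", "g4"},
    "Y2": {"g2", "g4"},
    "Y3": {"g5"},
}

strong = rank_co_occurring_diseases("Alzheimer Disease", articles)
candidates = rank_candidate_genes("Alzheimer Disease", strong, disease_genes)
print(strong)
print(candidates)
```

In this toy run, Y3 falls below the support threshold, so only Y1 and Y2 pass on candidate genes, and a gene reachable through both diseases ranks above one reachable through a single disease. A real pipeline would replace the toy inputs with MeSH-indexed PubMed records and a curated gene-disease resource.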

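Cite this article

Wang Xue, Wu Junwei, Chen Guanqun, Li Yanqiong, Ma Lu. Knowledge Mining of Alzheimer's Disease Gene-Disease Associations[J]. Library and Information Service (图书情报工作), 2020, 64(13): 120-132. DOI: 10.13266/j.issn.0252-3116.2020.13.016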

