Research on Semantic Similarity of Diseases Based on Multidimensional

  • Zhang Junliang
  • 1. School of Management, Xinxiang Medical University, Xinxiang 453003;
    2. Center for Health Information Resources, Xinxiang Medical University, Xinxiang 453003;
    3. Institutes of Health Central Plain, Xinxiang 453003

Received date: 2019-05-13

Revised date: 2019-08-22

Online published: 2020-06-20

Abstract

[Purpose/significance] Because disease knowledge is expressed in different forms, this paper proposes a comprehensive semantic similarity calculation scheme that integrates multiple dimensions of disease knowledge. [Method/process] Drawing on the characteristics of both a disease ontology and a medical encyclopedia, the scheme builds a comprehensive semantic similarity from two components: a similarity based on the disease ontology and a similarity based on the medical encyclopedia. The encyclopedia-based similarity is calculated using LDA, set theory, and the vector space model. [Result/conclusion] The results show that the proposed method can effectively reflect the semantic similarity of diseases. The comprehensive scheme offers a useful reference for further research.
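
As a rough illustration of how the components named in the abstract might fit together, the following Python sketch combines a set-based score (Jaccard), a vector-space score (cosine over topic vectors such as LDA could produce), and an ontology-based score. The abstract does not state how the components are combined, so the linear weighting, the weights `alpha` and `beta`, all function names, and the sample inputs are assumptions made for illustration only and are not the paper's actual formulas.

```python
import math


def jaccard(a, b):
    """Set-theoretic similarity: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Vector space similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def encyclopedia_similarity(terms_a, terms_b, topics_a, topics_b, beta=0.5):
    """Hypothetical blend of set-based and topic-vector similarity.

    topics_a / topics_b stand in for per-disease topic distributions
    (for example, from an LDA model); beta is an assumed weight.
    """
    return beta * jaccard(terms_a, terms_b) + (1 - beta) * cosine(topics_a, topics_b)


def comprehensive_similarity(ontology_sim, encyclopedia_sim, alpha=0.5):
    """Hypothetical linear combination of the two similarity components."""
    return alpha * ontology_sim + (1 - alpha) * encyclopedia_sim


if __name__ == "__main__":
    # Made-up inputs: symptom terms and 3-topic distributions for two diseases.
    ency = encyclopedia_similarity(
        {"fever", "cough"}, {"cough", "fatigue"},
        [0.7, 0.2, 0.1], [0.6, 0.3, 0.1],
    )
    # 0.8 is a placeholder for an ontology-based similarity score.
    print(comprehensive_similarity(0.8, ency))
```

In practice the weights would need to be tuned or justified against expert judgments; equal weighting is used here only to keep the sketch simple.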

Cite this article

Zhang Junliang. Research on Semantic Similarity of Diseases Based on Multidimensional[J]. Library and Information Service, 2020, 64(12): 127-135. DOI: 10.13266/j.issn.0252-3116.2020.12.014
