图书情报工作 (Library and Information Service) ›› 2017, Vol. 61 ›› Issue (8): 96-105. DOI: 10.13266/j.issn.0252-3116.2017.08.012

• Knowledge Organization •

医学文献主题语义相似度计算方法研究
The Study on Method for Topic Semantic Similarity Based on Medical Literature

Fan Shaoping (范少萍)1, An Xinying (安新颖)1, Lu Wanhui (逯万辉)2

  1. Institute of Medical Information & Library, Chinese Academy of Medical Sciences, Beijing 100020;
  2. Center of Chinese Social Science Evaluation, Chinese Academy of Social Sciences, Beijing 100732
  • Received: 2017-02-03 Revised: 2017-04-06 Online: 2017-04-20 Published: 2017-04-20
  • About the authors: Fan Shaoping (ORCID: 0000-0002-6675-5460), assistant researcher, PhD, E-mail: fan.shaoping@imicams.ac.cn; An Xinying (ORCID: 0000-0002-9870-7009), associate researcher, PhD; Lu Wanhui (ORCID: 0000-0002-0130-2276), assistant researcher, master's degree.
  • Funding:
    National Natural Science Foundation of China project "Semantics-based discovery of frontier knowledge in the medical field and its evolution mechanism" (Grant No. 71303259) and the Fundamental Research Funds for Central Public Welfare Research Institutes project "A method for detecting topic novelty in medical literature based on statistics and semantics" (Grant No. 2016RC330004)

Abstract: [Purpose/significance] Few studies in the medical field compute semantic similarity at the level of topics, and existing work does not adequately reveal how topics relate to one another semantically. This paper proposes a method for computing semantic similarity between topics, so that relationships between topics can be judged from a semantic perspective, and provides a reference for topic novelty assessment and topic association research. [Method/process] Taking the MeSH thesaurus as the basis for semantic computation, the paper analyzes the structure of the thesaurus and existing research results, then measures semantic similarity between topics along three dimensions: entry terms, semantic distance, and annotation. An empirical study is carried out on stem cell literature in PubMed from 2011 to 2014. [Result/conclusion] The validity of the three proposed dimensions is verified with commonly used benchmark pairs of subject headings. The topic semantic similarity calculation identifies stem cell research on minors as a comparatively novel topic in the stem cell field for 2011-2014. Follow-up work needs to incorporate statistics-based topic similarity in order to reveal relationships between topics more comprehensively and to discover novel research topics at the semantic level.

Key words: semantic similarity, MeSH, topic semantic similarity
