Information Research

Detecting Topic Splitting and Merging in the Medical Field Based on the LDA Model

  • Gong Xiaocui,
  • An Xinying
  • Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020
Gong Xiaocui (ORCID: 0000-0001-6815-3546), research intern, master's degree.

Received date: 2017-06-12

  Revised date: 2017-07-24

  Online published: 2017-09-20

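Funding

This work is one of the research outputs of the National Natural Science Foundation of China project "Semantics-Based Discovery of Frontier Knowledge in the Medical Field and Its Evolution Mechanism" (grant no. 71303259) and of the Chinese Academy of Medical Sciences Central Public-Interest Basic Scientific Research Fund project "Processing Mechanisms for Multi-Source Heterogeneous Data for Medical Science and Technology Evaluation" (grant no. 2016ZX330027).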

A Research of Topic Splitting and Merging Detecting in the Medical Field Based on the LDA Model

  • Gong Xiaocui,
  • An Xinying
  • Institute of Medical Information/Medical Library, CAMS & PUMC, Beijing 100020

Received date: 2017-06-12

  Revised date: 2017-07-24

  Online published: 2017-09-20

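Abstract (Chinese version)

[Purpose/significance] As information resources grow rapidly in volume and variety and cross-disciplinary integration keeps emerging, quickly and proactively identifying and assessing how research topics develop and evolve within massive information resources is a foundation for scientific and technological innovation. [Method/process] Building on a survey of the relevant theory and taking into account the characteristics of medical-domain resources, the paper proposes an LDA-based topic evolution detection model and the corresponding workflow. The main steps are medical subject term extraction, topic identification, topic association, key topic identification, identification of the main evolution path of key topics, and identification of topic splitting and merging events on that main path, enabling in-depth, fine-grained topic evolution analysis. [Result/conclusion] Literature on breast cancer treatment research is used as the experimental case; the model is tested and the results are analyzed and validated, confirming that the proposed technical approach has a certain degree of reliability.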

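Cite this article

Gong Xiaocui, An Xinying. Detecting topic splitting and merging in the medical field based on the LDA model[J]. Library and Information Service (图书情报工作), 2017, 61(18): 76-83. DOI: 10.13266/j.issn.0252-3116.2017.18.010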

Abstract

[Purpose/significance] As medical information resources grow in volume and variety and related work becomes increasingly interdisciplinary, it has become challenging for researchers and information professionals to keep track of how research topics develop. [Method/process] Considering the prominent position of medical research among scientific subject areas, the authors developed a new method for detecting topic evolution. They proposed a model based on the LDA model for assessing topic evolution in medical research and described its operating process. The main stages of the process are medical term extraction, topic identification, topic association, key topic identification, identification of the main evolution path of key topics, and identification of splitting and merging events on that main path. [Result/conclusion] The paper uses the literature on breast neoplasm treatment as a test case for the new model. The test results are highly concordant with authoritative literature reviews in the field and are further confirmed by interviews with leading experts in the field, which supports the reliability of the proposed techniques and approaches.
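The topic association and splitting/merging stages named in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the similarity measure, the threshold, or the exact event definitions, so everything here is an assumption made for illustration: topics in adjacent time windows are compared by cosine similarity over their word weights, a link is kept when the similarity reaches a hypothetical `threshold` of 0.3, a split is an earlier topic linked to more than one later topic, and a merge is a later topic linked from more than one earlier topic. The function names and the toy word weights are likewise hypothetical and are not results from the paper.

```python
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two sparse word-weight dicts."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def link_topics(earlier, later, threshold=0.3):
    """Return (earlier_id, later_id) pairs whose similarity reaches the threshold."""
    return [
        (a, b)
        for a, words_a in earlier.items()
        for b, words_b in later.items()
        if cosine(words_a, words_b) >= threshold
    ]


def detect_events(links):
    """Group links into split events (one-to-many) and merge events (many-to-one)."""
    successors, predecessors = {}, {}
    for a, b in links:
        successors.setdefault(a, []).append(b)
        predecessors.setdefault(b, []).append(a)
    splits = {a: bs for a, bs in successors.items() if len(bs) > 1}
    merges = {b: sources for b, sources in predecessors.items() if len(sources) > 1}
    return splits, merges


# Toy topic-word weights for two adjacent time windows (illustrative only).
window_1 = {
    "T1": {"tamoxifen": 0.5, "chemotherapy": 0.5},
    "T2": {"radiotherapy": 0.6, "surgery": 0.4},
    "T3": {"surgery": 0.7, "mastectomy": 0.3},
}
window_2 = {
    "U1": {"tamoxifen": 0.7, "receptor": 0.3},
    "U2": {"chemotherapy": 0.8, "toxicity": 0.2},
    "U3": {"radiotherapy": 0.5, "surgery": 0.5},
}

links = link_topics(window_1, window_2)
splits, merges = detect_events(links)
print(splits)  # {'T1': ['U1', 'U2']}
print(merges)  # {'U3': ['T2', 'T3']}
```

In practice the word weights would come from the topic-word distributions of an LDA model fitted per time window. The threshold would need tuning against the corpus, since a low value links unrelated topics and a high value hides genuine splits and merges.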
