Library and Information Service (图书情报工作) ›› 2020, Vol. 64 ›› Issue (3): 92-99. DOI: 10.13266/j.issn.0252-3116.2020.03.010

• Information Research •

Topic Filtering of LDA Model Recognition Results Based on the Keywords Relevance Index (KRI)

Jiang Tian (蒋甜), Liu Xiaoping (刘小平), Liu Huizhou (刘会洲)

  1. National Science Library, Chinese Academy of Sciences, Beijing 100190
  • Received: 2019-04-16 Revised: 2019-07-23 Online: 2020-02-05 Published: 2020-02-05
  • About the authors: Jiang Tian (ORCID: 0000-0002-9065-1223), postdoctoral researcher, E-mail: jiangtian@mail.las.ac.cn; Liu Xiaoping (ORCID: 0000-0002-3342-8041), research fellow, master's supervisor; Liu Huizhou (ORCID: 0000-0002-7808-8570), research fellow, doctoral supervisor.
  • Funding:
    This work is one of the research outcomes of the sub-project "Strategic Intelligence Research and Decision Consulting in Basic, Interdisciplinary, and Frontier Fields" (project No. Y8C0381005-01) of the Chinese Academy of Sciences special project on literature and information capacity building, "Construction of a Strategic Intelligence Research and Decision Consulting System for Science and Technology Fields".

Abstract: [Purpose/significance] The topics identified by an LDA model usually include noise topics. This study establishes a quantitative topic filtering method that removes such noise topics, in order to ensure the accuracy of topic identification and of the subsequent topic evolution analysis. [Method/process] Based on the co-occurrence relationships between keywords, a keywords relevance index (KRI) was constructed and used to screen and filter topics quantitatively. Taking the field of single-cell research as an example, the KRI value of each topic-keyword distribution was calculated and compared with the results of manual interpretation. [Result/conclusion] The experimental results show that the method can effectively eliminate noise topics from the LDA model recognition results and improve the accuracy of topic identification. It also reduces, to some extent, the dependence of the topic identification process on manual interpretation.

Key words: topic filtering, LDA model, keywords relevance index (KRI)
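
The abstract does not state the KRI formula, so the following is only a minimal sketch of the general idea it describes: scoring a topic by how often its top keywords co-occur in the same documents, then dropping low-scoring topics. The score used here (mean pairwise Jaccard co-occurrence), the function names `cooccurrence_score` and `filter_topics`, the toy documents, and the `threshold` value are all assumptions for illustration and should not be read as the authors' definition of KRI.

```python
from itertools import combinations


def cooccurrence_score(topic_keywords, documents):
    """Mean Jaccard co-occurrence over all keyword pairs of one topic.

    This is an illustrative stand-in score, NOT the paper's KRI formula.
    """
    doc_sets = [set(doc) for doc in documents]
    # For each keyword, collect the indices of the documents that contain it.
    postings = {
        kw: {i for i, doc in enumerate(doc_sets) if kw in doc}
        for kw in topic_keywords
    }
    pairs = list(combinations(topic_keywords, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = postings[a] | postings[b]
        if union:  # a pair that appears in no document contributes 0
            total += len(postings[a] & postings[b]) / len(union)
    return total / len(pairs)


def filter_topics(topics, documents, threshold):
    """Keep topics whose score reaches the (hypothetical) threshold."""
    scores = {
        name: cooccurrence_score(keywords, documents)
        for name, keywords in topics.items()
    }
    kept = [name for name, score in scores.items() if score >= threshold]
    return kept, scores


# Toy keyword lists per document (made-up data, for illustration only).
documents = [
    ["single-cell", "rna-seq", "transcriptome"],
    ["single-cell", "rna-seq", "sequencing"],
    ["microfluidics", "droplet", "single-cell"],
    ["transcriptome", "rna-seq"],
]

# Top keywords of two hypothetical LDA topics.
topics = {
    "coherent": ["single-cell", "rna-seq", "transcriptome"],
    "noisy": ["droplet", "transcriptome", "sequencing"],
}

kept, scores = filter_topics(topics, documents, threshold=0.2)
```

In this toy data the keywords of the "noisy" topic never appear together in any document, so its score is zero and it is filtered out, while the "coherent" topic is kept. In practice the threshold would have to be calibrated against manual interpretation, as the abstract describes for the single-cell case study.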
