Library and Information Service (图书情报工作), 2022, Vol. 66, Issue 13: 138-149. DOI: 10.13266/j.issn.0252-3116.2022.13.013

• Review •

Research Progress on Topic Evolution of Scientific and Technical Literatures Based on Text Mining

Liang Shuang (梁爽)1,2, Liu Xiaoping (刘小平)1,2

  1. National Science Library, Chinese Academy of Sciences, Beijing 100190;
  2. Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190
  • Received: 2022-01-12 Revised: 2022-05-01 Online: 2022-07-05 Published: 2022-07-06
  • Corresponding author: Liu Xiaoping, researcher and master's supervisor, E-mail: liuxp@mail.las.ac.cn.
  • About the author: Liang Shuang, master's student.
  • Funding:
    This paper is one of the research outcomes of the Chinese Academy of Sciences special project on literature and information capacity building, "Strategic Assessment of Global Science and Technology Trends in Support of the Academy's Science and Technology Planning and Layout" (Project No. E1290423).

Abstract: [Purpose/Significance] This paper reviews domestic and international research on topic evolution in scientific and technical literature based on text mining, classifies and summarizes the methods used in topic evolution analysis, and identifies the shortcomings of existing research, in order to offer new ideas and a reference for future topic evolution studies. [Method/Process] Following the general workflow that scholars at home and abroad use for topic evolution research, the paper compares the models, indicators and methods used at three levels of analysis: data set selection and object analysis, topic identification, and topic evolution (topic evolution time-sequence analysis, topic intensity evolution analysis and topic content evolution analysis). It summarizes their advantages and disadvantages, points out the limitations of existing research, and gives an outlook on future development. [Result/Conclusion] Current research has reached a certain scale and has a relatively mature analytical framework, but the following shortcomings remain: data sources are relatively homogeneous; the drawbacks of LDA and its extended models still need to be overcome; other machine learning and deep learning algorithms are rarely explored or applied; and evolution analysis methods need to be combined so that they complement one another. Future work should address these problems through corresponding improvements and more in-depth study.

Key words: text mining, topic model, topic identification, topic evolution

CLC number: