图书情报工作 (Library and Information Service), 2020, Vol. 64, Issue (5): 78-88. DOI: 10.13266/j.issn.0252-3116.2020.05.009

• Information Research •

Research on Identification Methods of Scientific Research Hotspots Under Multi-source Data

Qiu Huilin1, Shao Bo1,2

  1. School of Information Management, Nanjing University, Nanjing 210046;
  2. Nanjing University Library, Nanjing 210046
  • Received: 2019-05-20 Revised: 2019-09-16 Online: 2020-03-05 Published: 2020-03-05
  • About the authors: Qiu Huilin (ORCID: 0000-0001-6516-2789), master's student, E-mail: qiuhuilinqha@163.com; Shao Bo (ORCID: 0000-0002-6528-5196), deputy library director, professor, Ph.D.

Abstract: [Purpose/significance] Identifying and mining research hotspots from scientific literature drawn from different sources helps guide subsequent research work. This study proposes a model for quickly and accurately identifying the hot topics contained in multi-source texts, in support of scientific research innovation. [Method/process] The paper proposes a method for identifying research hotspots in multi-source text based on the LDA2vec model and builds a model for that task. The method combines the strength of the LDA topic model in mining latent semantics with the strength of the Word2Vec word-vector model in capturing contextual relationships. Taking scientific literature in the field of machine learning as an example, two indicators, model perplexity and topic coherence, are used to verify the feasibility and effectiveness of LDA2vec in this domain and to compare its topic extraction with that of LDA. [Result/conclusion] The experimental results show that the proposed method is feasible for identifying and mining research hotspots from multi-source data and improves results to some extent. It can identify hot content in multi-source text relatively quickly and accurately, compensates for the shortcomings of topic analysis based on a single data source, and enriches the practical application of multi-source data fusion.
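The two evaluation indicators named in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that perplexity is the exponential of the negative mean per-token log-likelihood on held-out text, and it uses the UMass formulation of topic coherence as one common choice, since the abstract does not state which coherence variant was used. The function names, the toy corpus, and the topic word lists are all hypothetical.

```python
import math
from itertools import combinations


def perplexity(token_log_probs):
    """Return exp(-mean log-likelihood) for held-out tokens (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def umass_coherence(top_words, documents):
    """UMass coherence: sum over ranked word pairs of log((D(wi, wj) + 1) / D(wj)).

    top_words is ordered from most to least probable in the topic;
    D(...) counts the documents that contain all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # combinations yields index pairs (j, i) with j < i, so w_j outranks w_i.
    for j, i in combinations(range(len(top_words)), 2):
        w_j, w_i = top_words[j], top_words[i]
        d_j = doc_freq(w_j)
        if d_j == 0:
            continue  # skip words absent from the reference corpus
        score += math.log((doc_freq(w_i, w_j) + 1) / d_j)
    return score


# Hypothetical toy corpus of tokenized documents (assumption, not the paper's data).
docs = [
    ["neural", "network", "training", "gradient"],
    ["neural", "network", "deep", "learning"],
    ["topic", "model", "lda", "document"],
    ["topic", "model", "word", "embedding"],
]

coherent_topic = ["neural", "network", "learning"]
mixed_topic = ["neural", "topic", "lda"]

print("coherent topic:", umass_coherence(coherent_topic, docs))
print("mixed topic:   ", umass_coherence(mixed_topic, docs))

# A model that assigns probability 0.25 to every held-out token.
print("perplexity:    ", perplexity([math.log(0.25)] * 8))
```

Lower perplexity indicates a better fit to held-out text, and a higher (less negative) coherence score indicates that a topic's top words tend to occur in the same documents. In this toy corpus the topic whose words co-occur scores higher than the topic that mixes words from unrelated documents.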

Key words: topic model, LDA2vec, research hotspot, LDA, word2vec, multisource data fusion

CLC number: