Research on Identification Methods of Scientific Research Hotspots Under Multi-source Data

  • Qiu Huilin,
  • Shao Bo
  • 1. School of Information Management, Nanjing University, Nanjing 210046;
    2. Nanjing University Library, Nanjing 210046

Received date: 2019-05-20

Revised date: 2019-09-16

Online published: 2020-03-05

Abstract

[Purpose/significance] Identifying research hotspots from different sources of scientific literature helps guide subsequent research. This study aims to identify the hot topics contained in multi-source texts quickly and accurately using the proposed model, and thereby to support scientific research innovation. [Method/process] This paper proposes a method for identifying research hotspots in multi-source text based on the LDA2vec model, and builds a corresponding hotspot identification model. The method combines the strength of the LDA topic model in latent semantic mining with the contextual information captured by the Word2Vec word vector model. Taking scientific literature in the field of machine learning as an example, perplexity and topic coherence are used to compare the topic extraction performance of LDA2vec and LDA on multi-source text. [Result/conclusion] The experiments show that the proposed method is feasible and offers some improvement when applied to multi-source data. It can identify hot content in multi-source text relatively quickly and accurately, compensates for the limitations of topic detection based on a single data source, and enriches the practical application of multi-source data fusion theory.
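
The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which coherence variant was used, so the UMass formulation is assumed here, and the function names (`perplexity`, `umass_coherence`) and the toy corpus are hypothetical.

```python
import math


def perplexity(log_likelihood: float, n_tokens: int) -> float:
    """Perplexity = exp(-log-likelihood / token count); lower is better."""
    return math.exp(-log_likelihood / n_tokens)


def umass_coherence(top_words, documents):
    """UMass coherence of one topic (assumed variant, not confirmed by the paper).

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over word pairs with j < i, where
    D(...) counts the documents containing all the given words.
    Higher (closer to zero) means the top words co-occur more often.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            denom = doc_freq(top_words[j])
            if denom == 0:
                continue  # skip words that never occur in the corpus
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / denom)
    return score


# Hypothetical toy corpus: two machine-learning texts, two library-science texts.
toy_docs = [
    ["neural", "network", "training"],
    ["neural", "network", "deep"],
    ["library", "service", "reader"],
    ["library", "reader", "deep"],
]

coherent_topic = ["neural", "network"]  # words that appear together
mixed_topic = ["neural", "library"]     # words that never co-occur

print(umass_coherence(coherent_topic, toy_docs))
print(umass_coherence(mixed_topic, toy_docs))
```

In a comparison of the kind the abstract describes, both scores would be computed for the topics produced by each model on the same corpus; a lower perplexity and a higher coherence would favour that model.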

Cite this article

Qiu Huilin, Shao Bo. Research on Identification Methods of Scientific Research Hotspots Under Multi-source Data[J]. Library and Information Service, 2020, 64(5): 78-88. DOI: 10.13266/j.issn.0252-3116.2020.05.009
