Library and Information Service, 2020, Vol. 64, Issue 11: 67-76. DOI: 10.13266/j.issn.0252-3116.2020.11.008

• Information Research •

Research on Detection and Verification of Burst Words with Multiple Measures

Feng Guohe, Wu Jiajia, Mo Xingqing

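  1. Information Management Department, School of Economics & Management, South China Normal University, Guangzhou 510006
  • Received: 2019-05-08  Revised: 2019-09-21  Online: 2020-06-05  Published: 2020-06-05
  • About the authors: Feng Guohe (ORCID: 0000-0002-0774-1544), professor, PhD, E-mail: ghfeng@163.com; Wu Jiajia (ORCID: 0000-0002-7342-8388), master's student; Mo Xingqing (ORCID: 000-0001-5481-0349), master's student.
  • Funding:
    This paper is one of the research outputs of the Guangzhou Science and Technology Plan Project (Basic and Applied Basic Research) "Theory, Methods and Applications of Burst Word Detection" (Project No. 202002030384).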

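Abstract: [Purpose/significance] To effectively detect potential research hotspots in scientific and technical literature, this study examines the characteristic conditions under which keywords in the literature burst and builds a burst-word identification model, which helps researchers grasp research directions more precisely. [Method/process] Keywords and their frequencies were collected for each year to build a keyword-year matrix, and the analysis period was divided into a standard window, an observation window and a performance window. Within the observation window, a multi-measure burst-word detection model identified keywords with burst characteristics; within the performance window, LDA was used to mine topic words, which formed the hot-word set. A burst-word coverage index was designed and, together with a sliding time window method, used to compute the coverage between the burst-word set and the hot-word set in different time windows, thereby verifying the model's identification accuracy. [Result/conclusion] Over three slides of the time window, the burst-word coverage exceeded 70% each time. In a comparison experiment against the burst words produced by CiteSpace, the proposed model achieved higher coverage than CiteSpace in all three windows, indicating that the burst-word detection model performs well.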

Key words: burst word detection, sliding time window, multiple measures, LDA topic mining
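
The abstract describes the pipeline only at the level of its steps. The Python sketch below shows one way those steps might fit together; it is not the authors' implementation, and all function names are hypothetical. Its assumptions are: the keyword-year matrix is stored as per-keyword yearly counts; the standard, observation and performance windows are contiguous and slide forward by one year per step; `detect_bursts` is a hypothetical single growth-ratio test (with made-up parameters `ratio` and `min_count`) standing in for the multi-measure model, whose measures are not given here; the hot-word sets are passed in rather than mined with LDA; and coverage is assumed to be |B ∩ H| / |B|, the share of burst words B that appear among the hot words H (the reverse ratio, |B ∩ H| / |H|, is an equally plausible reading).

```python
from collections import Counter, defaultdict


def keyword_year_matrix(records):
    """Build {keyword: Counter({year: frequency})} from (year, keywords) pairs."""
    matrix = defaultdict(Counter)
    for year, keywords in records:
        for kw in keywords:
            matrix[kw][year] += 1
    return matrix


def split_windows(start, lengths):
    """Lay out contiguous year ranges, e.g. standard, observation, performance."""
    windows, year = [], start
    for n in lengths:
        windows.append(range(year, year + n))
        year += n
    return windows


def detect_bursts(matrix, standard, observation, ratio=2.0, min_count=3):
    """HYPOTHETICAL single-measure stand-in for the multi-measure model:
    flag a keyword whose mean yearly frequency in the observation window is
    at least `ratio` times its mean in the standard window, provided it
    occurs at least `min_count` times in the observation window."""
    bursts = set()
    for kw, by_year in matrix.items():
        base = sum(by_year[y] for y in standard) / len(standard)
        obs_total = sum(by_year[y] for y in observation)
        obs_mean = obs_total / len(observation)
        if obs_total >= min_count and obs_mean >= ratio * base:
            bursts.add(kw)
    return bursts


def coverage(bursts, hot_words):
    """ASSUMED definition: share of burst words that appear among the hot words."""
    if not bursts:
        return 0.0
    return len(bursts & hot_words) / len(bursts)


def sliding_evaluation(matrix, hot_by_window, start, lengths, **params):
    """Slide the three windows forward one year per step (an assumption) and
    return one coverage value per step. hot_by_window[i] is the hot-word set
    for step i, assumed to be mined elsewhere (e.g. with LDA) from the
    performance window; topic mining is not implemented here."""
    results = []
    for step, hot_words in enumerate(hot_by_window):
        standard, observation, _performance = split_windows(start + step, lengths)
        bursts = detect_bursts(matrix, standard, observation, **params)
        results.append(coverage(bursts, hot_words))
    return results


if __name__ == "__main__":
    # Toy records invented for illustration only; not the paper's data.
    records = [(2015, ["a", "b"]), (2016, ["b"]),
               (2017, ["a", "a", "a", "c"]), (2018, ["a", "b"])]
    matrix = keyword_year_matrix(records)
    print(sliding_evaluation(matrix, [{"a", "d"}], 2015, [2, 1, 1], min_count=2))  # [1.0]
```

Passing three hot-word sets to `sliding_evaluation` would yield three coverage values, mirroring the three slides reported in the abstract. The toy records and window lengths in the `__main__` block are invented and do not reflect the paper's data or settings.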
