[Purpose/significance] Studying the conditions under which keywords burst in the literature and building a burst word recognition model help detect potential research hotspots in scientific and technological literature and help researchers accurately grasp research directions. [Method/process] This paper extracted keywords and their yearly frequencies, constructed a keyword-year matrix, and divided the analysis period into a standard window, an observation window, and a performance window. A multi-measure burst word detection model identified keywords with burst characteristics in the observation window, and LDA was used to mine topic words as the hot word set in the performance window. A burst word coverage index was designed, and the sliding time window method was used to calculate the coverage between burst words and hot words in different time windows to verify the accuracy of the model's recognition. [Result/conclusion] In all three sliding time windows, the coverage of burst words exceeded 70%. In a control test against CiteSpace, the model's coverage was higher than CiteSpace's in all three windows, indicating that the designed burst word detection model performs well.
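The window-based verification described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several parts are assumptions. The multi-measure detection model is replaced by a single smoothed growth ratio (`min_ratio` is a hypothetical threshold). The hot word set is hand-written rather than mined with LDA. Coverage is assumed to be the share of burst words that reappear in the hot word set, since the abstract does not give the formula. All function names and the sample data are illustrative only.

```python
from collections import Counter


def keyword_year_matrix(records):
    """Build {keyword: Counter({year: frequency})} from (year, keywords) records."""
    matrix = {}
    for year, keywords in records:
        for kw in keywords:
            matrix.setdefault(kw, Counter())[year] += 1
    return matrix


def burst_candidates(matrix, standard_years, observation_years, min_ratio=2.0):
    """Flag keywords whose mean yearly frequency grows sharply between windows.

    Assumption: one add-one-smoothed growth ratio stands in for the
    paper's multi-measure model, whose measures the abstract does not list.
    """
    bursts = set()
    for kw, counts in matrix.items():
        base = sum(counts[y] for y in standard_years) / len(standard_years)
        obs = sum(counts[y] for y in observation_years) / len(observation_years)
        if (obs + 1) / (base + 1) >= min_ratio:
            bursts.add(kw)
    return bursts


def coverage(burst_words, hot_words):
    """Assumed coverage index: share of burst words found among the hot words."""
    if not burst_words:
        return 0.0
    return len(burst_words & hot_words) / len(burst_words)


# Hypothetical toy corpus: one (year, keywords) record per paper.
records = [
    (2014, ["ontology", "metadata"]),
    (2015, ["ontology", "metadata"]),
    (2016, ["deep learning", "metadata"]),
    (2016, ["deep learning", "knowledge graph"]),
    (2017, ["deep learning", "knowledge graph"]),
    (2017, ["knowledge graph", "blockchain"]),
    (2017, ["blockchain", "deep learning"]),
]

matrix = keyword_year_matrix(records)
bursts = burst_candidates(matrix, standard_years=[2014, 2015],
                          observation_years=[2016, 2017])

# Hypothetical hot word set for the performance window (stand-in for LDA output).
hot_words = {"deep learning", "knowledge graph", "metadata"}
print(sorted(bursts), f"{coverage(bursts, hot_words):.2f}")
```

To mimic the sliding time window check, the same three calls would be repeated with the standard, observation, and performance windows each shifted forward by one step, giving one coverage value per window position.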
Feng Guohe, Wu Jiajia, Mo Xingqing. Research on Detection and Verification of Burst Words with Multiple Measures[J]. Library and Information Service, 2020, 64(11): 67-76.
DOI: 10.13266/j.issn.0252-3116.2020.11.008