Library and Information Service, 2021, Vol. 65, Issue 16: 98-107. DOI: 10.13266/j.issn.0252-3116.2021.16.011

• Information Research •

Research on Weak Signal Recognition Based on LDA-BERT Fusion Model [Retracted for suspected serious academic misconduct]

Yang Bo1,2, Shao Wanting1,2

  1. School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013;
  2. Institute of Information Resources Management, Jiangxi University of Finance and Economics, Nanchang 330013
  • Received: 2021-04-11 Revised: 2021-06-11 Online: 2021-08-20 Published: 2021-08-20
  • About the authors: Yang Bo (ORCID: 0000-0001-6012-9007), associate professor, PhD, doctoral supervisor, E-mail: yangbo@jxufe.edu.cn; Shao Wanting (ORCID: 0000-0002-0700-0113), master's student.
  • Funding:
    This paper is one of the research outcomes of the National Natural Science Foundation of China project "Research on a Knowledge Service Model for New Venture Growth Risk Management Based on Immune Methods" (Grant No. 72064015) and the Jiangxi Provincial Social Science Planning key project "Research on Knowledge Service Mechanisms for New Venture Growth Risk Management" (Grant No. 19TQ01).

Abstract: [Purpose/significance] Research on fully automatic weak signal recognition remains underdeveloped. To address this gap, this paper proposes a fully automatic weak signal recognition method based on an LDA-BERT fusion model. [Method/process] The unsupervised LDA topic model is used to group the text data set into topics, and a two-layer filter function operating at the topic level and the term level is constructed to extract early warning signals from the topic modeling results. The weakness of each topic is evaluated with three measures: closeness centrality, topic weight, and topic autocorrelation. Weak signals are then extracted based on the normalized frequency and probability of terms within each topic. Finally, the BERT deep learning model is used to expand the context of the weak signals and their similar words at the semantic level. [Result/conclusion] Taking the resurgence of the COVID-19 epidemic in early January 2021 as a case, the model was validated on a data set of social media news from the three months before the outbreak. The experimental results show that the method can effectively detect the relevant weak signals and reveal how they gradually strengthen over time. In addition, the fusion model achieves fully automatic weak signal recognition while offering better interpretability of results than a single model.

Key words: weak signals, LDA-BERT fusion model, COVID-19 epidemic
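
The abstract names closeness centrality and topic autocorrelation among the measures used to judge how weak a topic is, but it does not give their formulas. As a rough illustration only, the sketch below uses the textbook definitions of both quantities on made-up inputs. The function names, the toy topic graph, and the weight series are all hypothetical and are not taken from the paper; how the authors combine these measures is not stated here.

```python
from collections import deque


def closeness_centrality(graph, node):
    """Textbook closeness of `node` in an unweighted, undirected graph.

    `graph` is an adjacency dict, e.g. {"t0": ["t1"], "t1": ["t0"]}.
    """
    # Breadth-first search gives shortest-path lengths from `node`.
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nxt in graph[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    total = sum(dist.values())
    # (reachable nodes - 1) / sum of distances; 0.0 for an isolated node.
    return (len(dist) - 1) / total if total else 0.0


def lag1_autocorrelation(series):
    """Lag-1 sample autocorrelation of a numeric series (e.g. topic weight per time slice)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series carries no trend information
    cov = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    return cov / var


# Hypothetical topic graph: t0 is linked to every other topic, t1..t3 only to t0.
topic_graph = {"t0": ["t1", "t2", "t3"], "t1": ["t0"], "t2": ["t0"], "t3": ["t0"]}
# Hypothetical weight of one topic over four consecutive time slices.
topic_weights = [1.0, 2.0, 3.0, 4.0]

print(closeness_centrality(topic_graph, "t0"))
print(closeness_centrality(topic_graph, "t1"))
print(lag1_autocorrelation(topic_weights))
```

In this toy graph the peripheral topics score lower on closeness than the hub topic, which is the kind of contrast a weakness measure could exploit; the thresholds and weighting that would turn these numbers into a weak-signal decision are left open.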
