INFORMATION RESEARCH

Research on Weak Signal Recognition Based on LDA-BERT Fusion Model

  • Yang Bo,
  • Shao Wanting
  • 1 School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013;
    2 Institute of Information Resources Management, Jiangxi University of Finance and Economics, Nanchang 330013

Received date: 2021-04-11

Revised date: 2021-06-11

Online published: 2021-08-20

Abstract

[Purpose/significance] Research on the automatic recognition of weak signals remains incomplete. To address this gap, this paper proposes an automatic weak signal recognition method based on an LDA-BERT fusion model. [Method/process] The unsupervised LDA topic model was used to classify the text data set by topic, and a two-layer filter function operating at the topic level and the term level was constructed to extract early warning signals from the topic classification results. The weakness of each topic was evaluated using three metrics (closeness centrality, topic weight and topic autocorrelation), and weak signals were extracted based on the normalized frequency and probability of terms within the topic. Finally, the BERT deep learning model was used to expand the context and similar words of the weak signals at the semantic level. [Result/conclusion] Taking the resurgence of the epidemic in early January 2021 as an example, the model was verified using a data set of social media news from the three months before the outbreak. The experimental results show that the method effectively detects the relevant weak signals and reveals how they gradually strengthen over time. In addition, the fusion model not only identifies weak signals automatically but also yields more interpretable results than a single model.
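
The topic-level part of the filter can be illustrated with a minimal Python sketch. The abstract names the three metrics but does not give their formulas, so everything below is an assumption: closeness centrality is computed on an unweighted topic graph by breadth-first search, topic weight is taken as the mean share of a topic across time slices, and topic autocorrelation is taken as the lag-1 sample autocorrelation of that share. The function names, the graph and the weight series are hypothetical and are not taken from the paper.

```python
from collections import deque


def mean_topic_weight(weights):
    """Average share of the corpus assigned to a topic across time slices."""
    return sum(weights) / len(weights)


def lag1_autocorrelation(series):
    """Lag-1 sample autocorrelation of a topic's weight series (assumed definition)."""
    mean = sum(series) / len(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0  # a constant series carries no trend information
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    return num / denom


def closeness_centrality(graph, node):
    """Closeness of `node` in an unweighted graph given as an adjacency dict."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search yields shortest path lengths
        current = queue.popleft()
        for neighbor in graph.get(current, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def topic_metrics(graph, topic, weights):
    """Collect the three topic-level metrics for one topic."""
    return {
        "closeness": closeness_centrality(graph, topic),
        "weight": mean_topic_weight(weights),
        "autocorrelation": lag1_autocorrelation(weights),
    }


# Hypothetical data: three topics linked in a chain, and the weight of
# topic 0 over four time slices.
topic_graph = {0: [1], 1: [0, 2], 2: [1]}
metrics = topic_metrics(topic_graph, 0, [0.01, 0.02, 0.03, 0.04])
```

How the three values are combined into a weakness score, and which thresholds separate weak topics from strong ones, is not stated in the abstract and is left open here.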

Cite this article

Yang Bo, Shao Wanting. Research on Weak Signal Recognition Based on LDA-BERT Fusion Model[J]. Library and Information Service, 2021, 65(16): 98-107. DOI: 10.13266/j.issn.0252-3116.2021.16.011
