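Intelligence Research

Research on Weak Signal Recognition Based on LDA-BERT Fusion Model [Retracted due to suspected serious academic misconduct]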

  • Yang Bo
  • Shao Wanting
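  • 1 School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013;
    2 Institute of Information Resources Management, Jiangxi University of Finance and Economics, Nanchang 330013
Yang Bo (ORCID: 0000-0001-6012-9007), associate professor, Ph.D., doctoral supervisor, E-mail: yangbo@jxufe.edu.cn; Shao Wanting (ORCID: 0000-0002-0700-0113), master's student.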

Received: 2021-04-11

Revised: 2021-06-11

Published online: 2021-08-20

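Funding

This paper is one of the research outcomes of the National Natural Science Foundation of China project "Research on a Knowledge Service Model for Growth Risk Management of New Ventures Based on Immune Methods" (Grant No. 72064015) and the Jiangxi Provincial Social Science Planning Key Project "Research on Knowledge Service Mechanisms for Growth Risk Management of New Ventures" (Grant No. 19TQ01).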


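Abstract

[Purpose/Significance] Research on fully automatic weak signal recognition is still incomplete. To address this gap, this paper proposes a fully automatic weak signal recognition method based on an LDA-BERT fusion model. [Method/Process] An unsupervised LDA topic model first classifies the text dataset by topic. A two-layer filtering function, operating on topics and on terms, then extracts early warning signals from the topic classification results. The weakness of each topic is evaluated with three metrics (closeness centrality, topic weight, and topic autocorrelation), and weak signals are extracted based on the normalized frequency and probability of the terms within each topic. Finally, a BERT deep learning model expands the weak signals' context and similar words at the semantic level. [Result/Conclusion] Taking the renewed outbreak of the epidemic in early January 2021 as a case, the system model is validated on a social media news dataset covering the three months before the outbreak. The experimental results show that the method can effectively detect relevant weak signals and reveal how these signals gradually strengthen over time. In addition, the fusion model achieves fully automatic weak signal recognition while offering stronger interpretability of results than a single model.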
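The abstract names the topic-level weakness metrics and the term-level criteria but does not give their formulas. As a rough illustration only, the Python sketch below computes common textbook versions of them: closeness centrality by breadth-first search on a hypothetical unweighted topic graph, topic weight as the mean document-topic proportion, and lag-1 autocorrelation of a topic's weight across time windows, plus a hypothetical term filter on normalized frequency and in-topic probability. The function names, thresholds, and the way the topic graph is built are all assumptions, not the authors' method.

```python
"""Illustrative sketch (NOT the authors' code) of topic-level weakness
indicators and a term filter, using standard textbook definitions."""
from collections import deque
from statistics import mean


def closeness_centrality(adj: dict[int, set[int]], node: int) -> float:
    """(reachable nodes - 1) / sum of BFS shortest-path distances from node.
    adj is a hypothetical unweighted, undirected topic graph."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def topic_weight(doc_topic: list[list[float]], k: int) -> float:
    """Mean proportion of topic k over all documents (rows of theta)."""
    return mean(row[k] for row in doc_topic)


def lag1_autocorrelation(series: list[float]) -> float:
    """Lag-1 sample autocorrelation of a topic's weight across time windows."""
    m = mean(series)
    num = sum((series[t] - m) * (series[t + 1] - m) for t in range(len(series) - 1))
    den = sum((x - m) ** 2 for x in series)
    return num / den if den else 0.0


def candidate_weak_terms(term_counts: dict[str, int],
                         topic_word_prob: dict[str, float],
                         max_norm_freq: float = 0.3,
                         min_prob: float = 0.01) -> list[str]:
    """Keep terms that are rare overall (count / max count <= max_norm_freq)
    yet still probable within the topic (p(w|topic) >= min_prob).
    Both thresholds are hypothetical."""
    top = max(term_counts.values())
    return sorted(w for w, c in term_counts.items()
                  if c / top <= max_norm_freq
                  and topic_word_prob.get(w, 0.0) >= min_prob)


if __name__ == "__main__":
    # Toy topic graph 0 - 1 - 2 - 3 (a path); the edges are made up.
    graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    print(closeness_centrality(graph, 0))  # 3 / (1 + 2 + 3) = 0.5
    print(closeness_centrality(graph, 1))  # 3 / (1 + 1 + 2) = 0.75
    print(topic_weight([[0.7, 0.3], [0.5, 0.5]], 0))
    print(lag1_autocorrelation([1, 2, 3, 4]))  # 0.25
    print(candidate_weak_terms({"a": 10, "b": 2, "c": 1},
                               {"a": 0.2, "b": 0.05, "c": 0.001}))  # ['b']
```

Breadth-first search is enough here only because the toy graph is unweighted; a weighted topic-similarity graph would call for Dijkstra-style shortest paths instead.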

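Cite this article

YANG Bo, SHAO Wanting. Research on weak signal recognition based on LDA-BERT fusion model [Retracted due to suspected serious academic misconduct][J]. Library and Information Service, 2021, 65(16): 98-107. DOI: 10.13266/j.issn.0252-3116.2021.16.011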

