[Purpose/significance] Microblogs play an important role in today's information diffusion. To predict microblog hot topics effectively and to support the guidance and control of public opinion, a real-time linear early-warning model is established. [Method/process] After missing values and outliers in the collected indicators were handled, factor analysis and stepwise regression were applied to the microblog topic hotness factors and the influence factors of verified VIP ("big V") users, and the two sets of results were compared to select the common impact factors. These factors were then weighted, and the best quantitative formula was sought under different weight-adjustment factors. At each prediction, the formula takes the data from the 3 hours preceding the current moment as input and predicts the topic words corresponding to the weighted values for the following 30 minutes; the parameters are updated every 10 minutes. [Result/conclusion] Experiments show that the prediction model greatly reduces the time needed for data acquisition, parsing, and prediction while maintaining relatively good accuracy, and that precision can be improved further by choosing an appropriate threshold.
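The rolling scheme in the abstract (3 hours of input, a 30-minute prediction horizon, a refit every 10 minutes) can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything specific here is an assumption: the blend in `weighted_score`, the weight `alpha`, the 10-minute sampling of one score series per topic, and the use of a simple least-squares trend line in place of the paper's regression on selected impact factors. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

STEP_MIN = 10                  # refit interval stated in the abstract
WINDOW = 180 // STEP_MIN       # 3 hours of history  -> 18 samples (assumed sampling)
HORIZON = 30 // STEP_MIN       # 30 minutes ahead    -> 3 steps


def weighted_score(topic_heat: float, vip_influence: float, alpha: float) -> float:
    """Hypothetical weighting: alpha blends the two factor groups."""
    return alpha * topic_heat + (1.0 - alpha) * vip_influence


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * t, t = 0..n-1."""
    n = len(ys)
    t_mean = (n - 1) / 2.0
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx if sxx else 0.0   # a single sample gives a flat line
    return y_mean - slope * t_mean, slope


def predict_ahead(history: Sequence[float],
                  window: int = WINDOW,
                  horizon: int = HORIZON) -> float:
    """Refit on the most recent window and extrapolate `horizon` steps ahead."""
    recent = list(history)[-window:]
    intercept, slope = fit_line(recent)
    return intercept + slope * (len(recent) - 1 + horizon)


def hot_topics(series_by_topic: Dict[str, Sequence[float]],
               threshold: float) -> List[str]:
    """Topics whose predicted score reaches the threshold, highest first."""
    preds = {topic: predict_ahead(series)
             for topic, series in series_by_topic.items()
             if len(series) >= 2}       # skip topics with too little history
    flagged = [topic for topic, p in preds.items() if p >= threshold]
    return sorted(flagged, key=lambda topic: -preds[topic])
```

In this sketch, calling `hot_topics` once per 10-minute tick with the latest series corresponds to the parameter update described in the abstract, since the line is refitted on every call. Raising `threshold` trades recall for precision, which is the tuning knob the conclusion refers to.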