Library and Information Service (图书情报工作), 2020, Vol. 64, Issue (21): 67-76. DOI: 10.13266/j.issn.0252-3116.2020.21.010

• Information Research •

Propagation Prediction of Police Microblog Entries Based on Heterogeneous Information Network

Sun Ran1, An Lu1,2

  1. School of Information Management, Wuhan University, Wuhan 430072;
  2. Center for Studies of Information Resources, Wuhan University, Wuhan 430072
  • Received: 2020-04-27 Revised: 2020-08-04 Online: 2020-11-05 Published: 2020-11-05
  • Corresponding author: An Lu (ORCID: 0000-0002-5408-7135), professor, doctoral supervisor, E-mail: anlu97@163.com
  • About the author: Sun Ran (ORCID: 0000-0002-3597-0096), doctoral candidate.
  • Supported by:
    This work is one of the research outcomes of the Major Project of Philosophy and Social Science Research of the Ministry of Education, "Research on Countermeasures for Improving Counter-Terrorism Intelligence and Information Work Capability" (Grant No. 17JZD034); the Major Program of the National Natural Science Foundation of China, "Comprehensive Information Integration and Analysis Methods for National Security Big Data" (Grant No. 71790612); and the Science Fund for Creative Research Groups of the National Natural Science Foundation of China, "Information Resource Management" (Grant No. 71921002).

Abstract: [Purpose/significance] This study predicts whether microblog users will retweet or comment on wanted-notice microblog entries, and evaluates the features that most affect the spread of such entries. The findings can help police microblog accounts improve their operational performance and strengthen communication and cooperation between the police and the public. [Method/process] Given the characteristics of wanted-notice microblog entries, we extracted user features, time features, and text-structure features, and additionally extracted case features, including location keywords, time keywords, the wanted-notice level, and whether a reward is offered. The XGBoost algorithm was used to compute the importance of each feature in retweet and comment prediction. Combining the structural features of the propagation network with node attributes, we built, trained, and evaluated a propagation prediction model for police microblogs based on attributed heterogeneous information network embedding. [Result/conclusion] The model achieved AUC values of 0.737 and 0.799 on the retweet and comment data sets, respectively. Because the model integrates network structure features with the attributes of different node types, it is closer to real-world heterogeneous information networks and is more accurate than traditional link prediction models. In addition, the feature importance experiments show that the proposed case keyword features are the most important of all the features affecting retweet and comment prediction.

Key words: information dissemination, public security microblog, link prediction, graph representation learning, heterogeneous information network
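
The abstract reports model quality as AUC on the retweet and comment data sets. As a minimal sketch of what that metric measures in a link prediction setting (not the authors' code), the function below computes AUC as the fraction of positive/negative pairs in which the positive example receives the higher score. The function name `auc_score` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def auc_score(labels, scores):
    """Return AUC: the probability that a randomly chosen positive example
    is scored higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = the user retweeted the entry, 0 = the user did not.
labels = [1, 0, 1, 0, 1, 0]
# Hypothetical predicted link scores for the same (user, entry) pairs.
scores = [0.9, 0.4, 0.35, 0.3, 0.8, 0.7]

auc = auc_score(labels, scores)
print(round(auc, 3))  # 7 of 9 positive/negative pairs are ranked correctly -> 0.778
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for reading the reported values of 0.737 and 0.799.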
