Library and Information Service (图书情报工作), 2020, Vol. 64, Issue 17: 114-122. DOI: 10.13266/j.issn.0252-3116.2020.17.012

• Information Research •

多源信息融合的微博查询似然模型

吴树芳1, 张雄涛2, 朱杰3   

  1. 河北大学管理学院 保定 071000;
  2. 北京科技大学东凌经济管理学院 北京 100083;
  3. 中央司法警官学院信息管理系 保定 071000
  • 收稿日期:2019-12-16 修回日期:2020-05-04 出版日期:2020-09-05 发布日期:2020-09-05
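  • Corresponding author: Zhang Xiongtao (ORCID: 0000-0002-2134-9602), doctoral candidate, E-mail: zhangxiongtao1@163.com
  • About the authors: Wu Shufang (ORCID: 0000-0001-6587-812X), professor, PhD, doctoral supervisor; Zhu Jie (ORCID: 0000-0002-5698-135X), associate professor, PhD.
  • Funding:
    This paper is one of the research outcomes of the National Social Science Fund of China project "Research on Identifying Untrustworthy Social Network Users from the Perspective of Network Information Governance" (Grant No. 17BTQ068).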

Microblog Query Likelihood Model Based on Multi-Source Information Fusion

Wu Shufang1, Zhang Xiongtao2, Zhu Jie3   

  1. School of Management, Hebei University, Baoding 071000;
  2. Dongling School of Economics and Management, University of Science and Technology Beijing, Beijing 100083;
  3. Department of Information Management, The Central Institute for Correctional Police, Baoding 071000
  • Received: 2019-12-16  Revised: 2020-05-04  Online: 2020-09-05  Published: 2020-09-05

摘要: [目的/意义] 查询似然模型存在零概率问题,融合多源信息对模型进行扩展,不仅可以解决零概率问题,还可以实现对全局信息的差异化处理,降低噪声。[方法/过程] 通过LDA主题挖掘和历史微博兴趣挖掘,分别获取初始微博的主题相关信息和兴趣相关信息,并将二者与全局信息融合,用于改进初始微博的语言模型估计,从而得到扩展的微博查询似然模型。运用网络爬虫工具从新浪微博爬取数据,并通过实证研究验证扩展模型的有效性。[结果/结论] 实验结果表明:与已有的查询似然模型扩展方法相比,新模型具有较好的检索性能。

关键词: 多源信息, 微博检索, 查询似然模型, 主题信息, 作者兴趣

Abstract: [Purpose/significance] The query likelihood model suffers from the zero-probability problem. Extending the model by fusing multi-source information not only solves this problem but also allows global information to be treated differentially, which reduces noise. [Method/process] Topic-related and interest-related information for the original microblog is obtained through LDA topic mining and interest mining over historical microblogs, respectively. Both are then fused with global information to improve the estimation of the original microblog's language model, yielding an extended microblog query likelihood model. Data were crawled from Sina Weibo with web crawler tools, and an empirical study was conducted to verify the effectiveness of the extended model. [Result/conclusion] The experimental results show that, compared with existing extensions of the query likelihood model, the new model achieves better retrieval performance.

Key words: multi-source information, microblog retrieval, query likelihood model, topic information, author interest
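
To make the zero-probability problem and the idea of fusing several information sources concrete, the following is a minimal sketch in Python. It is not the authors' formulation: it assumes a simple linear mixture of four unigram language models (the microblog itself, topic-related text, author-interest text, and the global collection). The function names, the toy token lists, and the mixture weights are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model: P(w) = count(w) / total tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def fused_log_likelihood(query, doc_lm, topic_lm, interest_lm, global_lm,
                         weights=(0.5, 0.2, 0.2, 0.1)):
    """log P(query | microblog) under a linear mixture of four unigram models.

    The weights are hypothetical mixing coefficients and must sum to 1.
    Returns -inf if any query term has zero probability under the mixture.
    """
    assert abs(sum(weights) - 1.0) < 1e-9
    models = (doc_lm, topic_lm, interest_lm, global_lm)
    score = 0.0
    for w in query:
        p = sum(lam * lm.get(w, 0.0) for lam, lm in zip(weights, models))
        if p == 0.0:
            return float("-inf")  # zero-probability problem
        score += math.log(p)
    return score


# Toy data (hypothetical): one short microblog plus three auxiliary sources.
doc_tokens = ["new", "phone", "release"]
topic_tokens = ["phone", "camera", "battery"]   # text sharing the microblog's topic
interest_tokens = ["camera", "photo"]           # the author's historical microblogs
global_tokens = doc_tokens + topic_tokens + interest_tokens + ["weather", "music"]

doc_lm = unigram_model(doc_tokens)
topic_lm = unigram_model(topic_tokens)
interest_lm = unigram_model(interest_tokens)
global_lm = unigram_model(global_tokens)

query = ["phone", "camera"]

# Using only the microblog's own model: "camera" never occurs, so the score is -inf.
unsmoothed = fused_log_likelihood(query, doc_lm, topic_lm, interest_lm, global_lm,
                                  weights=(1.0, 0.0, 0.0, 0.0))

# Mixing in the other sources gives every query term a non-zero probability.
fused = fused_log_likelihood(query, doc_lm, topic_lm, interest_lm, global_lm)

print(unsmoothed, fused)
```

In this sketch the microblog alone cannot explain the query term "camera", so its score collapses to negative infinity, while the mixture stays finite. Giving the topic and interest sources larger weights than the global collection is one simple way to treat global information differently from the more specific sources, under the assumptions stated above.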

CLC number: