Library and Information Service ›› 2023, Vol. 67 ›› Issue (14): 73-84. DOI: 10.13266/j.issn.0252-3116.2023.14.008

• Information Research •

基于文本特征融合的衍生性网络健康谣言识别模型研究

陈燕方1, 周晓英2   

  1. 中国人民大学图书馆 北京 100872;
  2. 中国人民大学信息资源管理学院 北京 100872
  • 收稿日期:2022-12-07 修回日期:2023-04-16 出版日期:2023-07-20 发布日期:2023-07-28
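  • Corresponding author: Zhou Xiaoying, professor, doctoral supervisor, E-mail: xyz-ruc@qq.com.
  • About the author: Chen Yanfang, librarian, PhD.
  • Funding:
    This paper is one of the research outcomes of the Renmin University of China Major Innovation Platform for Interdisciplinary Research on Public Health and Disease Prevention and Control, supported by the "Special Funds for Guiding the Construction of World-Class Universities (Disciplines) and Characteristic Development in Central Universities", and of the National Social Science Fund of China Key Project "Research on the Theory and Practice of Infodemiology in the All-Media Context" (Grant No. 20AZD132).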

Research on a Derivative Online Health Rumor Identification Model Based on Text Feature Fusion

Chen Yanfang1, Zhou Xiaoying2   

  1. Renmin University of China Libraries, Beijing 100872;
  2. School of Information Resource Management, Renmin University of China, Beijing 100872
  • Received: 2022-12-07 Revised: 2023-04-16 Online: 2023-07-20 Published: 2023-07-28

摘要: [目的/意义] 衍生性网络健康谣言生成门槛低,周期性强,危害影响深远,是网络健康谣言识别与治理中需要优先解决的重点问题之一,也是重要突破口。[方法/过程] 借助深度语义表征和聚合方法,探索衍生性网络健康谣言文本内容的六要素特征;通过结合网络健康谣言的分布式语义特征预训练模型,构建包括六个类别、6287个词汇的网络健康谣言文本内容要素词库;在将健康谣言标题特征、内容文本六要素特征以及主体内容文本特征进行统一的向量空间表示与融合后,构建面向多源文本特征融合的网络健康谣言识别模型。[结果/结论] 模型的实证研究表明:与已有的对照模型相比,本文所提出的文本特征融合模型使衍生性网络健康谣言识别的准确率有较好的提升,且丰富的可拓展健康谣言要素词库可为后续的研究提供较好的资源支持。

关键词: 网络健康谣言, 健康谣言识别, 文本特征, 文本挖掘

Abstract: [Purpose/Significance] Derivative online health rumors are easy to generate, recur periodically, and have far-reaching harmful effects. Addressing them is a priority in the identification and governance of online health rumors, and also an important point of breakthrough. [Method/Process] Using deep semantic representation and aggregation methods, this paper explores six element features of the text content of derivative online health rumors. Combined with a distributed semantic feature pre-trained model of online health rumors, a thesaurus of content elements of online health rumors (6 categories, 6287 words in total) is constructed. Finally, the title features, the six content element features, and the main content text features of health rumors are represented in a unified vector space and fused, and an online health rumor identification model based on multi-source text feature fusion is constructed. [Result/Conclusion] The empirical study shows that, compared with existing control models, the proposed text feature fusion model improves the accuracy of identifying derivative online health rumors, and the rich, extensible thesaurus of health rumor elements provides good resource support for subsequent research.

Key words: online health rumors, health rumor detection, text features, text mining
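
The multi-source fusion step summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the six element categories, the encoder, or the fusion operator, so the lexicon entries, the category names, the toy vectors, and the use of plain concatenation below are all assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical element lexicon: category names and words are placeholders,
# not the six categories or 6287 entries of the paper's thesaurus.
ELEMENT_LEXICON = {
    "category_a": {"cancer", "toxin"},
    "category_b": {"experts", "study"},
    "category_c": {"never", "immediately"},
}


def element_features(tokens, lexicon=ELEMENT_LEXICON):
    """Return, per lexicon category, the share of tokens found in that category."""
    counts = Counter()
    for tok in tokens:
        for cat, words in lexicon.items():
            if tok in words:
                counts[cat] += 1
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[cat] / total for cat in sorted(lexicon)]


def fuse(title_vec, element_vec, body_vec):
    """Fuse the three feature sources by simple concatenation (an assumption)."""
    return list(title_vec) + list(element_vec) + list(body_vec)


# Toy stand-ins for title and body embeddings; a real system would use a text encoder.
title_vec = [0.1, 0.3]
body_vec = [0.2, 0.0, 0.5]
tokens = "experts warn this toxin causes cancer".split()
fused = fuse(title_vec, element_features(tokens), body_vec)
```

The fused vector would then be passed to a classifier. Concatenation is only one possible fusion choice; weighted or attention-based fusion would fit the same interface.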

CLC number: