Information Research

Research on Derivative Online Health Rumors Identification Model Based on Text Feature Fusion

  • Chen Yanfang,
  • Zhou Xiaoying
  • 1 Renmin University of China Libraries, Beijing 100872;
    2 School of Information Resource Management, Renmin University of China, Beijing 100872
Chen Yanfang, librarian, PhD.

Received date: 2022-12-07

  Revised date: 2023-04-16

  Online published: 2023-07-28

Funding

This paper is one of the research outcomes of the Renmin University of China Major Innovation Platform for Public Health and Disease Prevention and Control (an interdisciplinary platform across the sciences and humanities), supported by the "Special Funds for Guiding the Construction of World-Class Universities (Disciplines) and Characteristic Development in Central Universities", and of the National Social Science Fund of China key project "Research on the Theory and Practice of Infodemiology in the Omnimedia Context" (Grant No. 20AZD132).

Cite this article

Chen Yanfang, Zhou Xiaoying. Research on Derivative Online Health Rumors Identification Model Based on Text Feature Fusion[J]. Library and Information Service (图书情报工作), 2023, 67(14): 73-84. DOI: 10.13266/j.issn.0252-3116.2023.14.008

Abstract

[Purpose/Significance] Derivative online health rumors are easy to generate, recur periodically, and have far-reaching harmful effects. Identifying them is one of the priority problems in the identification and governance of online health rumors, and an important point of breakthrough. [Method/Process] Using deep semantic representation and aggregation methods, this paper explores six element features of the text content of derivative online health rumors. Combining these with a pre-trained distributed semantic feature model of online health rumors, it builds a thesaurus of content elements of online health rumors (6 categories, 6287 words in total). Finally, after representing and fusing the title features, the six element features of the content, and the main body text features in a unified vector space, it constructs an online health rumor identification model based on multi-source text feature fusion. [Result/Conclusion] The empirical study shows that, compared with existing baseline models, the proposed text feature fusion model improves the accuracy of identifying derivative online health rumors, and that the rich, extensible thesaurus of health rumor elements can provide resource support for subsequent research.
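
The fusion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the six element categories, the embedding model, or the fusion operator, so the category names (element_1 to element_6), the sample words, the random toy embeddings standing in for a pre-trained model, and the use of simple concatenation are all assumptions made for illustration.

```python
import numpy as np

DIM = 8  # toy embedding size (assumption; the paper's dimension is not given here)


def build_toy_embeddings(vocab, dim=DIM, seed=0):
    """Random vectors standing in for a pre-trained distributed semantic model."""
    rng = np.random.default_rng(seed)
    return {word: rng.normal(size=dim) for word in vocab}


def mean_vector(tokens, emb, dim=DIM):
    """Average the vectors of known tokens; zeros if none are known."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)


def element_features(tokens, thesaurus):
    """Share of tokens matching each thesaurus category, in sorted category order."""
    n = max(len(tokens), 1)
    return np.array(
        [sum(t in thesaurus[c] for t in tokens) / n for c in sorted(thesaurus)]
    )


def fuse(title_tokens, body_tokens, emb, thesaurus):
    """Concatenate title, six-element, and body features into one vector.

    Concatenation is an assumed fusion operator, chosen for simplicity.
    """
    return np.concatenate(
        [
            mean_vector(title_tokens, emb),
            element_features(body_tokens, thesaurus),
            mean_vector(body_tokens, emb),
        ]
    )


# Hypothetical six-category thesaurus; names and entries are placeholders.
thesaurus = {
    "element_1": {"cancer"},
    "element_2": {"experts"},
    "element_3": {"never"},
    "element_4": {"garlic"},
    "element_5": {"cures"},
    "element_6": {"share"},
}

title = ["garlic", "cures", "cancer"]
body = ["experts", "say", "garlic", "cures", "cancer", "share", "now", "please"]

emb = build_toy_embeddings(sorted(set(title) | set(body)))
fused = fuse(title, body, emb, thesaurus)
print(fused.shape)  # (22,) = 8 title dims + 6 element dims + 8 body dims
```

In a full pipeline, the fused vector would be passed to a classifier that labels the article as rumor or non-rumor; which classifier the paper uses is not stated in the abstract.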
