KNOWLEDGE ORGANIZATION

Network Sensitive Information Discovery and Empirical Research Based on Punishing BiGRU Model by Fusion Weight

  • Wu Shufang,
  • Yang Qiang,
  • Zhu Jie
  • 1 School of Management, Hebei University, Baoding 071000;
    2 School of Mathematics and Information Science, Hebei University, Baoding 071000

Received date: 2023-11-30

Revised date: 2024-04-14

Online published: 2024-07-09

Supported by

This work is supported by the General Program of the 13th Five-Year Plan of National Education Science titled “Research on the Path Optimization of Higher Education Promoting Regional Innovation-driven Development Strategy” (Grant No. BIA200203).

Abstract

[Purpose/Significance] Discovering sensitive information on the network is important for purifying cyberspace and maintaining social stability. Existing research on network sensitive information discovery ignores long-distance contextual semantics, which leads to poor discovery performance. This paper proposes a discovery method that penalizes a BiGRU model using the fusion weight of sensitive terms. [Method/Process] First, the statistical weight, category weight, and sentiment weight of sensitive terms are obtained and fused into a single fusion weight. Second, the fusion weight is used to construct a weighted loss function that penalizes the BiGRU model for misidentifying texts that contain sensitive terms. Finally, network sensitive information is discovered with the penalized BiGRU model. [Result/Conclusion] Empirical results on a real dataset from Sina Weibo indicate that the proposed method yields improvements in Precision, Recall, and F1 value over existing methods.
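
The abstract does not give the fusion rule or the exact form of the loss, so the following is only a minimal sketch of the general idea under stated assumptions: the three term weights are fused by a plain arithmetic mean, a text's penalty is 1 plus the summed fusion weights of the sensitive terms it contains, and the loss is a binary cross-entropy scaled per text by that penalty. The names fusion_weight, penalty, and weighted_bce, the term names, and all numeric values are hypothetical, and the probabilities stand in for the outputs of a BiGRU classifier.

```python
import math


def fusion_weight(statistical, category, sentiment):
    """Fuse three per-term weights into one value.

    Assumption: arithmetic mean. The paper's actual fusion rule
    is not stated in the abstract.
    """
    return (statistical + category + sentiment) / 3.0


def penalty(tokens, term_weights):
    """Per-text penalty: 1 plus the fusion weights of the distinct
    sensitive terms that occur in the text (assumed form)."""
    return 1.0 + sum(term_weights.get(t, 0.0) for t in set(tokens))


def weighted_bce(probs, labels, penalties, eps=1e-12):
    """Mean binary cross-entropy with each sample scaled by its penalty."""
    total = 0.0
    for p, y, w in zip(probs, labels, penalties):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical sensitive terms with made-up component weights.
term_weights = {
    "term_a": fusion_weight(0.6, 0.9, 0.3),
    "term_b": fusion_weight(0.2, 0.4, 0.6),
}

texts = [["term_a", "other"], ["other", "words"]]
labels = [1, 0]      # 1 = sensitive, 0 = not sensitive
probs = [0.3, 0.3]   # stand-in for BiGRU output probabilities

penalties = [penalty(t, term_weights) for t in texts]
loss = weighted_bce(probs, labels, penalties)
plain = weighted_bce(probs, labels, [1.0, 1.0])
print(penalties, loss, plain)
```

In this toy setup the first text contains a sensitive term and is misclassified, so its error counts more heavily and the weighted loss exceeds the unweighted one. In a real training loop the same per-sample scaling would be applied to the loss computed on the BiGRU outputs.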

Cite this article

Wu Shufang, Yang Qiang, Zhu Jie. Network Sensitive Information Discovery and Empirical Research Based on Punishing BiGRU Model by Fusion Weight[J]. Library and Information Service, 2024, 68(13): 144-153. DOI: 10.13266/j.issn.0252-3116.2024.13.013
