Library and Information Service (图书情报工作) ›› 2022, Vol. 66 ›› Issue (16): 36-47. DOI: 10.13266/j.issn.0252-3116.2022.16.004

• Theoretical Research •

融合XGBoost与SHAP的政务新媒体公共价值共识可解释性模型——以“今日头条”十大市级政务号为例

易明1,2, 姚玉佳1, 胡敏1   

  1. 华中师范大学信息管理学院 武汉 430079;
    2. 华中师范大学中国图书馆创新发展研究中心 武汉 430079
  • 收稿日期:2022-03-07 修回日期:2022-06-20 出版日期:2022-08-20 发布日期:2022-08-19
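  • Corresponding author: Yao Yujia, master's student, E-mail: 931455035@qq.com
  • About the authors: Yi Ming, professor and doctoral supervisor; Hu Min, doctoral student.
  • Funding:
    This paper is one of the research outcomes of the National Social Science Fund of China Key Project "Research on the Knowledge Co-creation Mechanism and Guidance Mechanism of Online Health Communities" (Grant No. 21ATQ006) and the Central China Normal University 2022 Fundamental Research Funds (Natural Sciences) Outstanding Young Team Project "Research on Information Interaction Behavior and Privacy Protection" (Grant No. CCNU22QN017).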

An Interpretable Model for New Government Media Public Value Consensus Integrating XGBoost and SHAP: Taking the Top 10 Municipal Government Accounts of the Jinri Toutiao as an Example

Yi Ming1,2, Yao Yujia1, Hu Min1   

  1. School of Information Management, Central China Normal University, Wuhan 430079;
    2. China Library Innovation and Development Research Center, Central China Normal University, Wuhan 430079
  • Received: 2022-03-07 Revised: 2022-06-20 Online: 2022-08-20 Published: 2022-08-19

摘要: [目的/意义]为准确识别影响公共价值共识的重要因素及其作用方式,提升政务新媒体广泛凝聚共识的能力和水平,提出一种融合XGBoost与SHAP的政务新媒体公共价值可解释性模型。[方法/过程]以"今日头条"下500篇政务头条号文章及32 185条评论为研究对象。首先,识别文章的公共价值共识,并从内容、形式、情感3个维度提取文章特征变量,将预处理后的数据作为模型的输入。其次,构建基于XGBoost的政务新媒体公共价值共识预测模型,并与LR、SVM、LGBM等其他主流机器学习算法进行实验对比,找到综合最优模型。最后,引入SHAP解释框架,对各特征变量的重要性进行量化和归因。[结果/结论]结果发现,XGBoost模型在准确率、召回率、F1-score、AUC 4项性能指标上均优于对比模型,性能优异。此外,文章主题类型、公共价值类型、文章长度、内容形式、文章情感属性、标题情绪符数量是影响政务头条号文章共识的重要特征,它们对公共价值共识的影响方式、影响方向和影响力度各有差异。

关键词: 公共价值共识, 政务头条号, XGBoost, SHAP, 可解释性

Abstract: [Purpose/Significance] To accurately identify the important factors that affect public value consensus and the ways in which they act, and to improve the ability of new government media to build broad consensus, this paper proposes an interpretable model of public value consensus for new government media that integrates XGBoost and SHAP. [Method/Process] The study examines 500 articles published by government accounts on "Jinri Toutiao" and 32,185 comments. First, the public value consensus of each article is identified, article feature variables are extracted along three dimensions (content, form, and emotion), and the preprocessed data are used as the model input. Second, an XGBoost-based model for predicting public value consensus in new government media is built and compared experimentally with other mainstream machine learning algorithms, including LR, SVM, and LGBM, to find the best overall model. Finally, the SHAP explanation framework is introduced to quantify and attribute the importance of each feature variable. [Result/Conclusion] The XGBoost model outperforms the comparison models on all four performance metrics: accuracy, recall, F1-score, and AUC. In addition, article topic type, public value type, article length, content form, article sentiment attribute, and the number of emojis in the title are important features that affect consensus on articles from government accounts. These features differ in the manner, direction, and strength of their influence on public value consensus.

Key words: public value consensus, government headlines, XGBoost, SHAP, interpretability
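
The attribution step described in the abstract can be illustrated with a minimal sketch. SHAP is built on Shapley values, which split the difference between a model's prediction for one sample and its prediction for a baseline among the input features. The snippet below computes exact Shapley values by brute force for a toy scoring function. It does not use the `shap` or `xgboost` libraries and is not the authors' implementation; the function `toy_consensus_score`, the feature names, the coefficients, and the baseline values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for predict(x) relative to predict(baseline).

    A feature outside the coalition is set to its baseline value.
    The cost grows as 2^n, so this only suits a handful of features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for target in names:
        others = [name for name in names if name != target]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {k: (x[k] if k in coalition else baseline[k]) for k in names}
                with_target = {**without, target: x[target]}
                contribution += weight * (predict(with_target) - predict(without))
        phi[target] = contribution
    return phi


def toy_consensus_score(f):
    # Hypothetical stand-in for a trained model; coefficients are made up.
    return (0.001 * f["article_length"]
            + 0.3 * f["positive_sentiment"]
            + 0.1 * f["positive_sentiment"] * f["title_emoji_count"])


# Hypothetical article and reference point (assumptions, not data from the paper).
article = {"article_length": 800, "positive_sentiment": 1, "title_emoji_count": 2}
baseline = {"article_length": 500, "positive_sentiment": 0, "title_emoji_count": 0}

attributions = shapley_values(toy_consensus_score, article, baseline)
for name, value in attributions.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the gap between the score for `article` and the score for `baseline`, which is the additivity property that SHAP relies on. The interaction term between the two hypothetical features is split evenly between them. For tree ensembles such as XGBoost, the SHAP library uses a faster tree-specific algorithm instead of this exhaustive enumeration.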
