Library and Information Service ›› 2021, Vol. 65 ›› Issue (13): 96-107. DOI: 10.13266/j.issn.0252-3116.2021.13.010

• Knowledge Organization •

A Comparative Study on the Integration of Text Enhanced and Pre-trained Language Models in the Classification of Internet Political Messages

Shi Guoliang, Chen Yuqi

  1. Business School, Hohai University, Nanjing 211100
  • Received: 2020-10-07 Revised: 2021-03-19 Online: 2021-07-05 Published: 2021-07-10
  • About the authors: Shi Guoliang (ORCID: 0000-0001-7585-640X), associate professor, PhD, E-mail: shigl@hhu.edu.cn; Chen Yuqi (ORCID: 0000-0001-5755-5208), master's student.
  • Funding:
    This paper is one of the research outcomes of the Fundamental Research Funds for the Central Universities project "Research on Key Technologies of a Water Conservancy Knowledge Graph Based on a Graph Database" (Grant No. B200207036).

Abstract: [Purpose/significance] Government online political inquiry platforms are an important channel through which government departments learn public opinion. To improve the accuracy of classifying political inquiry messages and to cope with message data that are poor in quality and small in quantity, this study compares the classification performance of several BERT-based improved models combined with text augmentation techniques and explores the reasons for the differences. [Method/process] An integrated comparison model for classifying online political inquiry messages was designed. For text augmentation, EDA (Easy Data Augmentation) and SimBERT-based augmentation were compared; for text classification, several pre-trained language models improved from BERT (such as ALBERT and RoBERTa) were compared. [Result/conclusion] The experimental results show that the classification model combining RoBERTa with SimBERT text augmentation performed best, reaching an F1 value of 92.05% on the test set, 2.89% higher than the BERT-base model without text augmentation. In addition, the F1 value after SimBERT text augmentation was on average 0.61% higher than without augmentation. The experiments show that the model based on RoBERTa and SimBERT text augmentation can effectively improve multi-class text classification and is a useful reference for similar problems.

Key words: political inquiry platform, text classification, text augmentation, BERT model

CLC number:
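
To make the EDA technique named in the abstract more concrete, the following is a minimal sketch of two of its four operations, random swap and random deletion. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `augment` helper, the default parameters, and the sample tokens are all hypothetical, and the input is assumed to be already split into tokens (for Chinese messages this would require a word segmenter, which is not shown). EDA's other two operations, synonym replacement and random insertion, need a synonym lexicon and are omitted here, as is SimBERT-based augmentation.

```python
import random
from typing import List


def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Return a copy of tokens with n_swaps randomly chosen position pairs exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each token independently with probability p, always keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(tokens: List[str], n_aug: int = 4, p: float = 0.1, seed: int = 0) -> List[List[str]]:
    """Hypothetical helper: alternate the two operations to produce n_aug variants."""
    rng = random.Random(seed)
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            n_swaps = max(1, int(p * len(tokens)))
            variants.append(random_swap(tokens, n_swaps, rng))
        else:
            variants.append(random_deletion(tokens, p, rng))
    return variants


if __name__ == "__main__":
    # Made-up example message, already tokenized.
    message = ["street", "lights", "in", "our", "district", "have", "been", "broken", "for", "weeks"]
    for variant in augment(message):
        print(" ".join(variant))
```

Each variant keeps the label of the original message, so a small labelled set can be enlarged before fine-tuning a classifier. The perturbation strength `p` is a tuning choice; the value used here is only a placeholder.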