Library and Information Service ›› 2018, Vol. 62 ›› Issue (15): 92-101. DOI: 10.13266/j.issn.0252-3116.2018.15.011

• Knowledge Organization •

Sentiment Classification for Micro-Blogs Based on Word Embedding

Liu Kan, Yuan Yunying

  1. School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430074
  • Received: 2017-12-24 Revised: 2018-04-22 Online: 2018-08-05 Published: 2018-08-05
  • About the authors: Liu Kan (ORCID: 0000-0002-9686-9768), professor, PhD, E-mail: liukan@zuel.edu.cn; Yuan Yunying (ORCID: 0000-0003-1713-1624), master's student.
  • Funding:
    This paper is one of the research outcomes of the National Social Science Fund of China project "Research on Pre-judging Online Rumors Based on Text Mining" (project No. 14BXW033).

Abstract: [Purpose/significance] Weibo has become an important platform for the public to express emotion, and sentiment analysis of Weibo posts plays an important role in public opinion analysis, user experience research, and the discovery of business opportunities. [Method/process] The proposed sentiment orientation classification algorithm, WE_SDAE, uses word embedding to represent each Weibo post as a dense low-dimensional vector. It then turns the basic auto-encoder into a deep denoising auto-encoder by adding a regularization term and adding noise to the data, and places a classifier on the top layer to perform sentiment orientation classification. Because word usage on Weibo is flexible, the model is trained at two granularities: character level and word level. [Result/conclusion] The experimental results show that the character-level model outperforms the word-level model. In addition, comparative experiments show that WE_SDAE outperforms traditional classifiers such as SVM, Naive-Bayes, and XgBoost, and that word embedding representation outperforms the traditional vector space model representation, achieving good results in Weibo sentiment analysis.

Key words: sentiment analysis, classification, auto-encoder, Weibo
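
The pipeline described in the abstract (embedding, denoising auto-encoder with a regularization term, top-level classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' WE_SDAE implementation. It assumes a single encoder layer rather than a deep stack, masking noise, an L2 weight penalty, a logistic classifier, and randomly initialized character embeddings standing in for trained ones. The helper names `embed_post`, `train_dae`, and `train_logistic`, the toy posts, and all hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def embed_post(tokens, table, dim):
    """Average the embeddings of known tokens into one dense vector."""
    vecs = [table[t] for t in tokens if t in table]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def train_dae(X, hidden, noise=0.3, l2=1e-4, lr=0.1, epochs=200):
    """One denoising auto-encoder layer with an L2 penalty on the weights."""
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Xn = X * (rng.random(X.shape) > noise)   # masking noise on the input
        H = sigmoid(Xn @ W1 + b1)                # encoder
        err = (H @ W2 + b2) - X                  # linear decoder vs. the clean input
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1))
                      + 0.5 * l2 * (np.sum(W1 ** 2) + np.sum(W2 ** 2)))
        gR = err / n
        gZ = (gR @ W2.T) * H * (1.0 - H)         # back-propagate through the sigmoid
        W2 -= lr * (H.T @ gR + l2 * W2)
        b2 -= lr * gR.sum(axis=0)
        W1 -= lr * (Xn.T @ gZ + l2 * W1)
        b1 -= lr * gZ.sum(axis=0)
    return W1, b1, losses

def train_logistic(C, y, lr=0.5, epochs=300):
    """Logistic-regression classifier trained on the encoder's hidden codes."""
    w = np.zeros(C.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(C @ w + b)
        w -= lr * C.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Toy data (hypothetical): 1 = positive, 0 = negative.
posts = ["今天很开心", "太棒了", "真糟糕", "很难过"]
labels = np.array([1.0, 1.0, 0.0, 0.0])
dim = 8

# Character-level tokens: list(post). Word-level tokens would need a segmenter.
chars = sorted({ch for p in posts for ch in p})
table = {ch: rng.random(dim) for ch in chars}    # random stand-in for trained embeddings

X = np.stack([embed_post(list(p), table, dim) for p in posts])
W1, b1, losses = train_dae(X, hidden=4)
codes = sigmoid(X @ W1 + b1)                     # clean input at inference time
w, b = train_logistic(codes, labels)
probs = sigmoid(codes @ w + b)                   # predicted probability of "positive"
```

A deeper variant in the spirit of the abstract would presumably call `train_dae` again on `codes` to stack further layers before fitting the classifier. Switching between the two granularities changes only how each post is tokenized before `embed_post`.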

CLC number: