Sentiment Classification for Micro-Blogs Based on Word Embedding

  • Liu Kan,
  • Yuan Yunying
  • School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430074

Received date: 2017-12-24

Revised date: 2018-04-22

Online published: 2018-08-05

Abstract

[Purpose/significance] Weibo has become an important platform for public emotional expression, and sentiment analysis of Weibo posts plays an important role in public opinion analysis, user experience research, and the discovery of business opportunities. [Method/process] This paper proposes a sentiment orientation model named WE_SDAE. The model uses word embedding to transform a Weibo post into a dense low-dimensional vector, and it turns a simple auto-encoder into a deep denoising auto-encoder by appending a regularization term to its objective function and adding noise during data pre-processing. A top-level classifier then performs the final sentiment classification. Because term usage on Weibo is flexible, the model is trained at the character level and at the word level separately. [Result/conclusion] The experimental results show that the character-level model outperforms the word-level model. Comparative experiments further show that WE_SDAE outperforms traditional classifiers such as SVM, Naive Bayes, and XGBoost, and that word-embedding-based preprocessing outperforms the traditional vector space model representation.
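
The pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' WE_SDAE implementation: it assumes a post vector is the average of its token embeddings, uses masking noise, a single sigmoid encoder layer with a linear decoder, and an L2 weight penalty as the regularization term. All function names and parameters are hypothetical, and the stacking of layers and the top-level classifier are omitted.

```python
import numpy as np

def embed_post(tokens, embeddings, dim):
    """Average the embedding vectors of one post's tokens (assumed pooling).

    Tokens missing from the lookup table are skipped; an empty post maps
    to the zero vector.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

def corrupt(x, drop_prob, rng):
    """Masking noise: zero each component independently with prob. drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_loss_and_grads(x, x_noisy, W1, b1, W2, b2, weight_decay):
    """Loss and gradients of a one-layer denoising auto-encoder.

    The network encodes the corrupted input but is scored against the
    clean input; weight_decay scales the assumed L2 regularization term.
    """
    n = x.shape[0]
    h = sigmoid(x_noisy @ W1 + b1)   # hidden code from the noisy input
    x_hat = h @ W2 + b2              # linear reconstruction
    err = x_hat - x                  # compare with the clean input
    loss = 0.5 * np.sum(err ** 2) / n
    loss += 0.5 * weight_decay * (np.sum(W1 ** 2) + np.sum(W2 ** 2))

    # Backpropagation through the decoder, then the encoder.
    d_xhat = err / n
    gW2 = h.T @ d_xhat + weight_decay * W2
    gb2 = d_xhat.sum(axis=0)
    d_pre = (d_xhat @ W2.T) * h * (1.0 - h)
    gW1 = x_noisy.T @ d_pre + weight_decay * W1
    gb1 = d_pre.sum(axis=0)
    return loss, (gW1, gb1, gW2, gb2)

def sgd_step(params, grads, lr):
    """Plain gradient-descent update; returns new parameter arrays."""
    return tuple(p - lr * g for p, g in zip(params, grads))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy lookup table standing in for trained embeddings (hypothetical).
    table = {tok: rng.normal(size=4) for tok in ["a", "b", "c", "d"]}
    posts = [["a", "b"], ["c"], ["a", "d", "c"]]
    X = np.stack([embed_post(p, table, 4) for p in posts])

    params = (rng.normal(0, 0.1, (4, 3)), np.zeros(3),
              rng.normal(0, 0.1, (3, 4)), np.zeros(4))
    for _ in range(100):
        X_noisy = corrupt(X, 0.2, rng)          # fresh noise every pass
        loss, grads = dae_loss_and_grads(X, X_noisy, *params, 1e-4)
        params = sgd_step(params, grads, lr=0.1)
    print("final loss:", loss)
```

Under these assumptions, the character-level and word-level variants differ only in how `tokens` is produced: `list(post)` for characters, or the output of a word segmenter for words. The hidden code `h` would then be the input to whatever classifier sits on top.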

Cite this article

Liu Kan, Yuan Yunying. Sentiment Classification for Micro-Blogs Based on Word Embedding[J]. Library and Information Service, 2018, 62(15): 92-101. DOI: 10.13266/j.issn.0252-3116.2018.15.011
