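Intelligence Research

Twitter Text Topic Mining and Sentiment Analysis Under the Belt and Road Initiative

  • Zhao Changyu,
  • Wu Yaping,
  • Wang Jimin
  • 1. Department of Information Management, Peking University, Beijing 100871;
    2. Peking University Library, Beijing 100871
Zhao Changyu (ORCID: 0000-0001-6780-1070), master's student; Wu Yaping (ORCID: 0000-0002-4242-2434), librarian.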

Received date: 2018-12-11

Revised date: 2019-03-21

Online published: 2019-10-05

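Supported by

This paper is one of the research outcomes of the National Social Science Fund of China project "Comprehensive Evaluation of the Interconnectivity Level of Countries Along the Belt and Road" (Grant No. 16BTQ057).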




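Cite this article

Zhao Changyu, Wu Yaping, Wang Jimin. Twitter Text Topic Mining and Sentiment Analysis Under the Belt and Road Initiative[J]. Library and Information Service, 2019, 63(19): 119-127. DOI: 10.13266/j.issn.0252-3116.2019.19.012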

Abstract

[Purpose/significance] The Belt and Road Initiative has attracted widespread attention at home and abroad, and users in many countries express opinions, post comments, and discuss it with one another on Twitter, the most representative social media platform. Mining the discussion topics and sentiment orientation toward the Belt and Road Initiative from tweets can give government agencies a reference for optimizing their publicity strategies and increasing the Initiative's exposure and attention. [Method/process] This paper collected more than 60,000 tweets from 2017 related to the Belt and Road Initiative, carried out data preprocessing, data description, topic mining, and sentiment analysis separately for Chinese and English tweets, and then cross-analyzed topics and sentiment to draw conclusions. [Result/conclusion] The tweet topics in 2017 centered mainly on the Belt and Road Forum for International Cooperation held in May. Chinese tweets paid more attention to the planning and implementation of the forum, as well as to security issues and visits by leaders; their sentiment values fluctuated considerably, and negative sentiment on security issues fluctuated especially strongly. English tweets were more concerned with the fact that the forum was held and with its economic effects; their sentiment fluctuated less, and on economic topics the share of positive sentiment was clearly higher than the shares of negative and neutral sentiment.
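The cross-analysis of topics and sentiment described in the abstract can be illustrated with a minimal sketch. It assumes each tweet has already been assigned a language, a topic label, and a sentiment label by earlier steps; the records, the label names, and the function `sentiment_shares` are hypothetical and are not taken from the paper. The sketch only shows how the share of positive, neutral, and negative tweets per topic might be tabulated for each language.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (language, topic, sentiment). These labels are
# illustrative assumptions, not data or categories from the paper.
tweets = [
    ("en", "economy", "positive"),
    ("en", "economy", "positive"),
    ("en", "economy", "neutral"),
    ("en", "forum", "neutral"),
    ("zh", "security", "negative"),
    ("zh", "security", "negative"),
    ("zh", "security", "positive"),
    ("zh", "forum", "neutral"),
]

SENTIMENTS = ("positive", "neutral", "negative")


def sentiment_shares(records, language):
    """Return {topic: {sentiment: share}} for the tweets in one language."""
    counts = defaultdict(Counter)
    for lang, topic, sentiment in records:
        if lang == language:
            counts[topic][sentiment] += 1
    shares = {}
    for topic, counter in counts.items():
        total = sum(counter.values())
        # Sentiments that never occur for a topic get a share of 0.0.
        shares[topic] = {s: counter[s] / total for s in SENTIMENTS}
    return shares


if __name__ == "__main__":
    for language in ("en", "zh"):
        print(language, sentiment_shares(tweets, language))
```

Comparing the resulting tables across languages is one simple way to see whether, for example, positive sentiment dominates a given topic in one language but not in the other.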
