[Purpose/significance] To address the uneven quality of user-generated content (UGC), this paper proposes a UGC quality evaluation model based on emotion analysis. Identifying low-quality UGC promptly helps with public opinion monitoring and with keeping order online. [Method/process] First, trending topics are selected in real time, and related user data such as reposts and comments are collected and clustered by content according to the focus of discussion in each time period. Second, ROSTCM6 is used to run emotion analysis on the clustered content, capturing the emotional and quality characteristics of the UGC; the relationship between emotion value and UGC quality is then mined and a regression model between the two is built. Finally, UGC quality is evaluated on the basis of that model. [Result/conclusion] Experiments show that the model can help assess the average quality of UGC on a given topic at each stage of its life cycle and promptly locate the stage and position at which low-quality UGC appears.
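The regression step in the method can be illustrated with a minimal sketch. It assumes a simple linear relationship between a numeric emotion value and a numeric quality score, fits it by ordinary least squares, and then flags life-cycle stages whose predicted average quality falls below a threshold. The linear form, the direction of the relationship, the threshold, the stage names, and all numbers are hypothetical choices made for illustration; the abstract does not specify them, and ROSTCM6 itself is not called here.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("emotion values must not all be identical")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return a, b


def stage_quality(stage_emotions, a, b):
    """Predict average quality per stage from that stage's mean emotion value."""
    return {stage: a + b * mean(vals) for stage, vals in stage_emotions.items()}


def low_quality_stages(predicted, threshold):
    """Return the stages whose predicted average quality is below the threshold."""
    return [stage for stage, q in predicted.items() if q < threshold]


# Hypothetical training pairs: (emotion value, manually scored quality).
train_emotion = [-20.0, -10.0, 0.0, 10.0, 20.0]
train_quality = [1.0, 2.0, 3.0, 4.0, 5.0]
a, b = fit_line(train_emotion, train_quality)

# Hypothetical emotion values for content clustered into three stages.
stages = {
    "onset": [10.0, 20.0],
    "peak": [-20.0, -10.0],
    "decline": [0.0, 10.0],
}
predicted = stage_quality(stages, a, b)
flagged = low_quality_stages(predicted, threshold=2.5)
print(predicted)
print(flagged)
```

With real data, the training pairs would come from emotion scores produced by the sentiment tool and from quality labels obtained separately, and a nonlinear model may fit better than the straight line assumed here.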