Received date: 2015-09-22
Revised date: 2015-10-18
Online published: 2015-11-05
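Funding
This paper is one of the research outcomes of the National Natural Science Foundation of China General Program project "Semantic Recognition and Crisis Response of Multimedia Network Public Opinion Information in the Big Data Environment" (Grant No. 71473101).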
Extraction Method of Network Public Opinion Based on LDA Topic Model
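Chen Xiaomei (陈晓美), Gao Cheng (高铖), Guan Xinhui (关心惠). Extraction Method of Network Public Opinion Based on LDA Topic Model[J]. Library and Information Service (图书情报工作), 2015, 59(21): 21-26. DOI: 10.13266/j.issn.0252-3116.2015.21.003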
[Purpose/significance] Pervasive network public opinion information deeply affects, and can even mislead, online audiences. This paper explores ways to reveal the viewpoints expressed in network public opinion, to broaden and deepen users' understanding, and to improve the public's ability to distinguish among them. [Method/process] The differences between two methods are analyzed from a technical perspective, the cognitive processes of the masses and the audience are interpreted from a cognitive perspective, and the advantages and procedure of the LDA topic model are then described. [Result/conclusion] By combining public opinion topics with emotional factors and extracting network public opinion viewpoints with the LDA model, this paper identifies in-depth comments among mass comments and extracts the main opinions. The approach uses the wisdom of crowds to extend individual thinking and cognition, explores a new way to present audience ideas, and provides a practical basis for public opinion monitoring and guidance.
Key words: network public opinion; LDA; topic model; semantic; opinions