Received date: 2013-02-04
Revised date: 2013-03-15
Online published: 2013-05-05
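Funding
This paper is one of the research outcomes of the 2012 student affairs research project of Shandong University of Technology, "Research on the Information Behavior of College Students in the New Media Era".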
Study on a Hot Topics Analysis System based on Time Sliced Topic Model
Liao Junhua, Sun Keying, Zhong Lixia. A Web hot topic evolution analysis system based on a temporal topic model[J]. Library and Information Service, 2013, 57(09): 96-102,118. DOI: 10.7536/j.issn.0252-3116.2013.09.016
This paper proposes a Hot Topics Analysis System (HTAS) based on time-sliced network data. HTAS automatically collects, acquires, and stores data from sources of network hot topics. It integrates the open-source word segmentation system IKAnalyzer, hosted on Google Code, for batch processing of Chinese documents. It then uses the LDA model to extract topics and uses time labels to trace how hot topics on the network evolve. Experiments with the Diaoyudao hot event show that the system can effectively acquire, store, and analyze the evolution trend of this hot topic.
Key words: topic model; topic evolution; hot topics; LDA
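The time-slicing step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each document already has a publication date and a per-document topic distribution (for example, from an LDA model trained elsewhere). The function name `topic_strength_by_slice`, the monthly slice width, and the sample data are hypothetical choices for illustration and are not taken from the paper.

```python
from collections import defaultdict
from datetime import date


def topic_strength_by_slice(docs, num_topics):
    """Average each topic's proportion over the documents in each time slice.

    docs: iterable of (publication_date, topic_proportions) pairs, where
    topic_proportions is a per-document topic distribution such as the one
    an LDA model would output (assumed to be computed elsewhere).
    Returns {slice_label: [mean proportion of topic 0, topic 1, ...]}.
    """
    sums = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for published, proportions in docs:
        label = published.strftime("%Y-%m")  # assumption: monthly slices
        for k in range(num_topics):
            sums[label][k] += proportions[k]
        counts[label] += 1
    # Sort the labels so the slices come back in chronological order.
    return {
        label: [total / counts[label] for total in sums[label]]
        for label in sorted(sums)
    }


# Made-up illustration data, not results from the paper.
docs = [
    (date(2012, 8, 15), [0.8, 0.2]),
    (date(2012, 8, 20), [0.6, 0.4]),
    (date(2012, 9, 11), [0.1, 0.9]),
]
trend = topic_strength_by_slice(docs, num_topics=2)
for label, strengths in trend.items():
    print(label, [round(s, 2) for s in strengths])
```

Plotting each topic's mean proportion against the slice label gives one simple view of how a topic rises or fades over time. The paper's own strength measure may differ from this average.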