Library and Information Service
Hot News Ranking of Network News Based on New Words Detection
Received date: 2015-02-08
Revised date: 2015-03-02
Online published: 2015-03-20
[Purpose/significance] As network news spreads, the hot words associated with news stories spread with it, and new words that attract a large volume of public opinion become a basis for analyzing network public opinion. [Method/process] This article proposes an improved association-rule algorithm that mines new words from the headlines of network news by feeding in frequent string sets adjacently and in order. It also proposes a method that computes the similarity of strings with mutual information in order to form the keyword sets of hot news. The approach is tested on an actual network news corpus. [Result/conclusion] The experimental results show that the method can find both unknown words and hot words in network news, and that a new support-degree comparison method can distinguish combined new words within the word sets. News items are then ranked by computing the hot degree of their hot-word sets.
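The abstract names the steps without giving formulas, so the Python below is only a minimal sketch under our own assumptions, not the authors' algorithm. We assume that support is the number of headlines containing a string and that candidate strings are grown one character at a time, Apriori style, only when both their adjacent prefix and suffix are already frequent. We read the "support degree comparison" as discarding a string whose one-character extension has equal support, take mutual information to be the pointwise form between two adjacent strings, and use a hypothetical hot degree equal to the summed support of mined words in a headline. The function names (`mine_frequent_strings`, `drop_absorbed`, `pmi`, `rank_headlines`), the thresholds and the sample headlines are all illustrative.

```python
import math
from collections import Counter


def mine_frequent_strings(headlines, min_support=2, max_len=6):
    """Apriori-style mining of contiguous, ordered character strings (assumed scheme).

    Support = number of headlines containing the string. A (k+1)-string is
    counted only if its k-character prefix and suffix are both frequent,
    since a string never occurs in more headlines than its parts.
    """
    counts = Counter(ch for h in headlines for ch in set(h))
    level = {s: c for s, c in counts.items() if c >= min_support}
    support = dict(level)
    k = 1
    while level and k < max_len:
        counts = Counter()
        for h in headlines:
            seen = set()  # count each string at most once per headline
            for i in range(len(h) - k):
                t = h[i:i + k + 1]
                if t[:-1] in level and t[1:] in level:
                    seen.add(t)
            counts.update(seen)
        level = {s: c for s, c in counts.items() if c >= min_support}
        support.update(level)
        k += 1
    return support


def drop_absorbed(support):
    """Assumed support comparison: drop a string whose one-character extension
    has the same support, i.e. every headline containing it also contains the
    longer string, so it is only a fragment of a combined word."""
    kept = {}
    for s, c in support.items():
        longer = (t for t in support if len(t) == len(s) + 1 and s in t)
        if not any(support[t] == c for t in longer):
            kept[s] = c
    return kept


def pmi(left, right, support, n_docs):
    """Pointwise mutual information between two adjacent strings."""
    p_joint = support[left + right] / n_docs
    p_left = support[left] / n_docs
    p_right = support[right] / n_docs
    return math.log(p_joint / (p_left * p_right))


def rank_headlines(headlines, words):
    """Hypothetical hot degree: sum of the supports of mined words in a headline."""
    scored = [(sum(c for w, c in words.items() if w in h), h) for h in headlines]
    return [h for _, h in sorted(scored, key=lambda x: -x[0])]


headlines = ["经济新常态开局", "新常态下的改革", "适应新常态", "股市大涨"]
support = mine_frequent_strings(headlines)
words = drop_absorbed(support)
print(words)                                       # {'新常态': 3}
print(pmi("新常", "态", support, len(headlines)))  # log(4/3) ≈ 0.2877
print(rank_headlines(headlines, words))            # '股市大涨' ranks last
```

With these toy headlines the fragments 新, 常, 态, 新常 and 常态 all share the support of 新常态 and are discarded, leaving the combined word 新常态, while the unrelated headline ranks last. Counting support per headline rather than per occurrence keeps the Apriori pruning valid, because a headline containing a string necessarily contains all of its substrings.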
Key words: association rules; unknown words; mutual information; hot degree
Wang Xin, Wang Yu, Wang Liang. Hot News Ranking of Network News Based on New Words Detection[J]. Library and Information Service, 2015, 59(6): 68-74. DOI: 10.13266/j.issn.0252-3116.2015.06.011