Information Research

Hot News Ranking of Network News Based on New Words Detection

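  • Wang Xin,
  • Wang Yu,
  • Wang Liang
  • College of Computer Science and Technology, Hebei University, Baoding 071000
Wang Xin (ORCID: 0000-0001-5010-1609), master's student; Wang Liang (ORCID: 0000-0003-2839-0832), laboratory technician, master's degree.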

Received: 2015-02-08

  Revised: 2015-03-02

  Published online: 2015-03-20

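Funding

This article is one of the research outcomes of the National Natural Science Foundation of China project "Research on Relational Top-N Query Engines and Ranking Functions" (Grant No. 61170039).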


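Abstract

[Purpose/significance] As network news spreads widely and rapidly, identifying new network words and promptly capturing the keywords of hot news is of great importance for understanding news hot spots and for the early warning and control of public opinion. [Method/process] An improved association rule algorithm is used to mine network news headlines and output sets of frequent strings whose characters are adjacent and ordered. String similarity is computed from mutual information to form keyword sets for hot news, and experiments are run on a corpus of real network news. [Result/conclusion] The experimental results show that the proposed method not only effectively discovers new words absent from the dictionary and hot words currently popular online, but also effectively distinguishes compound new words within the word set. Network news hot spots can then be ranked by computing the hot degree of the hot-word sets.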

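Cite this article

Wang Xin, Wang Yu, Wang Liang. Hot News Ranking of Network News Based on New Words Detection[J]. Library and Information Service (图书情报工作), 2015, 59(6): 68-74. DOI: 10.13266/j.issn.0252-3116.2015.06.011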

Abstract

[Purpose/significance] As network news spreads widely, the hot words associated with it spread as well, and new words that attract a large amount of public attention become the basis for analyzing network public opinion. [Method/process] This article proposes an improved association rule algorithm that mines new words from the headlines of network news and outputs collections of frequent strings whose characters are adjacent and ordered. It also proposes a method that computes string similarity using mutual information to form keyword collections for hot news, and tests the approach on a corpus of actual network news. [Result/conclusion] The experimental results show that this method can not only find unknown words and hot words in network news, but also, through a new support degree comparison, distinguish combined new words within the collection of words. The news can then be ranked by computing the hot degrees of the hot-word collections.
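The abstract names two ingredients: frequent strings of adjacent, ordered characters mined from headlines, and a mutual-information score between strings. The sketch below is a generic illustration of those two ideas, not the authors' improved association rule algorithm, whose details are not given here. The function names `frequent_strings` and `cohesion`, the thresholds `min_support` and `max_len`, and the sample headlines are all assumptions made for illustration.

```python
import math
from collections import Counter


def frequent_strings(headlines, min_support=2, max_len=4):
    """Return contiguous character n-grams whose support meets min_support.

    Support is the number of headlines that contain the n-gram at least once
    (an assumed definition; the paper may define it differently).
    """
    support = Counter()
    for title in headlines:
        # A set ensures each n-gram is counted once per headline.
        grams = {title[i:i + n]
                 for n in range(1, max_len + 1)
                 for i in range(len(title) - n + 1)}
        support.update(grams)
    return {s: c for s, c in support.items() if c >= min_support}


def cohesion(left, right, support, n_docs):
    """Pointwise mutual information of two adjacent strings, in bits.

    Probabilities are estimated as headline-level support divided by n_docs.
    Returns -inf when any of the strings is not frequent.
    """
    joint = support.get(left + right, 0)
    if joint == 0 or left not in support or right not in support:
        return float("-inf")
    return math.log2(joint * n_docs / (support[left] * support[right]))


# Hypothetical headlines, invented for this example ("雾霾" means "smog").
headlines = ["雾霾预警升级", "北京雾霾持续", "雾霾天气出行提示", "春运车票开售"]
support = frequent_strings(headlines)
score = cohesion("雾", "霾", support, len(headlines))
print(sorted(support.items()))
print(round(score, 3))
```

On these four headlines only 雾, 霾, and 雾霾 reach the support threshold, and the positive score for the pair suggests the two characters belong together as one candidate word. The hot degree calculation mentioned in the abstract is not sketched, since its formula is not stated in this section.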

