A Compound Word Based Algorithm for Hot Event Detection and Description on the Web

  • Li Xia,
  • Wang Lianxi,
  • Lu Meixiu,
  • Liu Hanfeng,
  • Liu Junyan
  • 1 Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510006;
    2 School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006;
    3 Guangdong University of Foreign Studies Library, Guangzhou 510006

Received date: 2016-05-13

  Revised date: 2016-11-15

  Online published: 2016-12-05


[Purpose/significance] Automatically detecting hot events on the Web (in news and microblogs) and extracting words that describe them is important for monitoring Internet public opinion. [Method/process] Current methods for extracting descriptive words rely mainly on association rules or on combinations of multiple n-grams, which often produce noise words with imprecise meaning and can cause meaning drift. This paper proposes a compound word based feature extraction method and uses it to represent news texts; the texts are then clustered under a vector space model to detect hot events on the Web. [Result/conclusion] Experimental results on Tencent Internet News show that the proposed method achieves higher clustering precision and recall and produces better descriptive words.
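
The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea, not the authors' method. It assumes that texts are already segmented into tokens, that a "compound word" can be approximated by joining two adjacent tokens, that features are weighted with smoothed TF-IDF, and that events are grouped by a simple single-pass clustering over cosine similarity. The function names, the threshold value, and the sample documents are all hypothetical.

```python
import math
from collections import Counter

def compound_features(tokens, n=2):
    # Assumed scheme: join every run of n adjacent tokens into one feature.
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_vectors(docs):
    # Sparse vector space model: one dict (feature -> weight) per document.
    counts = [Counter(compound_features(d)) for d in docs]
    df = Counter(f for c in counts for f in c)  # document frequency
    n_docs = len(docs)
    return [
        {f: tf * (math.log((1 + n_docs) / (1 + df[f])) + 1) for f, tf in c.items()}
        for c in counts
    ]

def cosine(a, b):
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def single_pass_cluster(vectors, threshold=0.3):
    # Each cluster is represented by the vector of its first document.
    clusters, reps = [], []
    for i, v in enumerate(vectors):
        sims = [cosine(v, r) for r in reps]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else None
        if best is not None and sims[best] >= threshold:
            clusters[best].append(i)
        else:
            clusters.append([i])
            reps.append(v)
    return clusters

# Hypothetical, pre-tokenized sample documents.
docs = [
    ["earthquake", "hits", "city", "center"],
    ["earthquake", "hits", "city", "suburbs"],
    ["team", "wins", "cup", "final"],
]
clusters = single_pass_cluster(tfidf_vectors(docs))
print(clusters)  # [[0, 1], [2]]
```

In this sketch the two earthquake reports share the compound features `earthquake_hits` and `hits_city` and fall into one cluster, while the sports report starts a new one. The highest-weighted features of a cluster could then serve as candidate descriptive words, though how the paper actually selects them is not stated here.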

Cite this article

Li Xia, Wang Lianxi, Lu Meixiu, Liu Hanfeng, Liu Junyan. A Compound Word Based Algorithm for Hot Event Detection and Description on the Web[J]. Library and Information Service, 2016, 60(23): 128-134. DOI: 10.13266/j.issn.0252-3116.2016.23.016


