Library and Information Service
Classification Algorithm of Chinese Short Texts Based on Wikipedia
Received date: 2013-04-03
Revised date: 2013-05-19
Online published: 2013-06-05
To address the scarcity of keywords and the weak concept signals in short texts, this paper proposes a Wikipedia-based feature extension method for classifying Chinese short texts. The method extracts a set of related concepts and computes concept relatedness from Wikipedia concepts and the links between them, and it handles polysemy by combining disambiguation pages with the context extracted from the short text. It then extends the features according to the semantic relatedness between words, supplementing the semantic feature information of the text. Finally, the paper puts forward a Wikipedia-based classification algorithm for Chinese short texts and verifies it. The results show that the algorithm achieves better classification performance on Chinese short texts.
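The abstract does not give the relatedness formula or the extension rule, so the following is only a minimal sketch under stated assumptions. It assumes relatedness is computed from shared incoming links with a normalized link distance in the style of Milne and Witten, and that a short text is extended with related concepts whose score passes a threshold. The names `link_relatedness`, `extend_features`, `threshold`, and `top_k` are hypothetical and are not taken from the paper.

```python
import math


def link_relatedness(inlinks_a, inlinks_b, total_articles):
    """Relatedness of two concepts from their sets of incoming links.

    Assumption: a normalized link distance in the style of Milne and Witten,
    mapped to a similarity clipped to the range [0, 1].
    """
    common = inlinks_a & inlinks_b
    if not common:
        return 0.0
    larger = max(len(inlinks_a), len(inlinks_b))
    smaller = min(len(inlinks_a), len(inlinks_b))
    denom = math.log(total_articles) - math.log(smaller)
    if denom <= 0:
        # Degenerate case: the smaller set already covers every article.
        return 1.0
    distance = (math.log(larger) - math.log(len(common))) / denom
    return max(0.0, 1.0 - distance)


def extend_features(tokens, related_concepts, threshold=0.5, top_k=3):
    """Append related concepts to the token list of a short text.

    related_concepts maps a token to {concept: relatedness score}.
    For each token, at most top_k concepts with score >= threshold are added,
    and a concept is never added twice.
    """
    extended = list(tokens)
    seen = set(tokens)
    for token in tokens:
        candidates = related_concepts.get(token, {})
        # Highest score first; ties broken by concept name for determinism.
        ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
        added = 0
        for concept, score in ranked:
            if added >= top_k:
                break
            if score >= threshold and concept not in seen:
                extended.append(concept)
                seen.add(concept)
                added += 1
    return extended


# Toy data, invented for illustration only.
related = {
    "苹果": {"iPhone": 0.8, "水果": 0.3},
    "手机": {"iPhone": 0.9, "移动电话": 0.7},
}
features = extend_features(["苹果", "手机"], related, threshold=0.5)
```

In this sketch the extended token list would then be fed to an ordinary text classifier; the disambiguation step mentioned in the abstract is not modeled here.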
Zhao Hui, Liu Huailiang. Classification Algorithm of Chinese Short Texts Based on Wikipedia[J]. Library and Information Service, 2013, 57(11): 120-124. DOI: 10.7536/j.issn.0252-3116.2013.11.022