Library and Information Service (图书情报工作) ›› 2014, Vol. 58 ›› Issue (19): 124-128. DOI: 10.13266/j.issn.0252-3116.2014.19.019

• Knowledge Organization •

Research on Chinese Text Classification Based on Community Partitioning of Semantic Networks

Yin Liying1,2, Zhao Pengwei1

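  1. School of Economics and Management, Xidian University;
    2. School of Economics and Management, Xi'an University of Posts and Telecommunications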
  • Received: 2014-06-09 Revised: 2014-08-11 Online: 2014-10-05 Published: 2014-10-05
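  • About the authors: Yin Liying, PhD candidate at the School of Economics and Management, Xidian University, and lecturer at the School of Economics and Management, Xi'an University of Posts and Telecommunications, E-mail: yin-liying@163.com; Zhao Pengwei, professor and doctoral supervisor at the School of Economics and Management, Xidian University.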
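  • Funding:

    This work is one of the research outcomes of the National Natural Science Foundation of China project "Peer-to-Peer Network Semantic Communities and Their Knowledge Sharing Based on Knowledge Maps" (Grant No. 71103138) and the Fundamental Research Funds for the Central Universities project "Research on a Business Intelligence Model Based on User-Generated Content in the Context of Big Data" (Grant No. BDY231414).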

A Chinese Text Classification Algorithm Based on Partitioning Community in Semantic Network

Yin Liying1,2, Zhao Pengwei1   

  1. School of Economics & Management, Xidian University, Xi'an 710071;
    2. College of Economics and Management, Xi'an University of Posts & Telecommunications, Xi'an 710121
  • Received:2014-06-09 Revised:2014-08-11 Online:2014-10-05 Published:2014-10-05

Abstract:

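To reduce the impact of polysemy and of class skew in the training samples on classification performance, a Chinese text classification algorithm based on community partitioning of a semantic network is proposed. The feature words of each text are disambiguated using the Wikipedia knowledge base, and a semantic complex network over the training set is constructed to represent the semantic relations among texts. The training set is then partitioned into communities with the K-means algorithm, taking node characteristics into account, to mitigate the class-skew problem. Finally, the community closest to the text to be classified is located, and classification is carried out on that basis. Experimental results show that the proposed algorithm is feasible and achieves good classification performance.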

Key words: semantic network, word sense disambiguation, community structure, text classification

Abstract:

To reduce the influence of polysemy and of class skew in the training samples on classification performance, a Chinese text classification method based on community partitioning of a semantic network is proposed. First, feature words are disambiguated using the Wikipedia knowledge base, and a complex network of texts is built to represent the semantic relations between training texts. Then, to mitigate the class-skew problem, the training samples are partitioned by the K-means method combined with the synthetic characteristics of network nodes. Finally, the community nearest to the test text is found, and the text is classified on that basis. Experimental results show that the proposed algorithm is feasible and achieves good classification results.

Key words: semantic network, word sense disambiguation, community structure, text classification
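
The last two steps named in the abstract (partitioning the training set into communities with K-means, then classifying a new text by its nearest community) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes texts are already reduced to plain term-count vectors, so the Wikipedia-based disambiguation and the network node characteristics are left out, and it swaps in cosine similarity, a deterministic farthest-first initialization, and a majority vote inside the nearest community. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def init_centroids(vectors, k):
    """Farthest-first seeding (an assumption, chosen to keep the sketch deterministic)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Pick the vector least similar to every centroid chosen so far.
        nxt = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, iters=20):
    """Partition vectors into k communities; returns (centroids, assignments)."""
    centroids = init_centroids(vectors, k)
    assign = None
    for _ in range(iters):
        new_assign = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a community is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def classify(vec, centroids, assign, labels):
    """Label vec by majority vote within its nearest non-empty community."""
    populated = sorted(set(assign))
    nearest = max(populated, key=lambda c: cosine(vec, centroids[c]))
    votes = Counter(l for l, a in zip(labels, assign) if a == nearest)
    return votes.most_common(1)[0][0]


# Toy training set: three term counts per text (hypothetical data).
train = [[3, 0, 1], [2, 0, 0], [4, 1, 0], [0, 3, 1], [0, 2, 0], [1, 4, 0]]
labels = ["sports", "sports", "sports", "finance", "finance", "finance"]

centroids, assign = kmeans(train, k=2)
print(classify([5, 0, 1], centroids, assign, labels))
print(classify([0, 5, 1], centroids, assign, labels))
```

Because the label comes from the nearest community rather than from the whole training set, a large class cannot outvote a small one outside its own communities, which is the intuition behind using the partition against class skew.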

CLC Number: