Library and Information Service (图书情报工作) ›› 2011, Vol. 54 ›› Issue (02): 41-45.

• Information Research •

Review of Chinese Automatic Word Segmentation (国内中文自动分词技术研究综述)

Feng Guohe (奉国和) 1, Zheng Wei (郑伟) 2

  1. School of Economics & Management, South China Normal University (华南师范大学经济管理学院)
  2. College of Science, Hebei North University (河北北方学院理学院)
  • Received: 2010-08-12 Revised: 2010-09-13 Online: 2011-01-20 Published: 2011-01-20
  • Contact: Feng Guohe
  • Supported by:
    National Social Science Fund project: Research on Automatic Text Classification Technology

Abstract: Word segmentation is one of the foundational and key technologies for Chinese information processing tasks such as automatic text classification, information retrieval, information filtering, automatic document indexing, and automatic summarization. The complexity of the Chinese language and the uncertainty of its linguistic rules make Chinese word segmentation one of the hardest problems in word segmentation. This paper comprehensively reviews research on Chinese word segmentation algorithms, disambiguation methods, unknown word recognition, and automatic segmentation systems, and summarizes the current difficulties and research hotspots in Chinese word segmentation.

Key words: Chinese word segmentation, word segmentation algorithm, disambiguation method, unknown word recognition, word segmentation system