[Purpose/significance] Based on the characteristics of Chinese language expression, this paper proposes a character-granularity method for extracting word features that attaches word-segmentation tags to them. The method effectively improves the F1 value of Chinese clinical named entity recognition (CNER) and can also be applied to other Chinese sequence labeling models. [Method/process] Using the CCKS2017:Task2 corpus, this paper selected three kinds of Chinese word features, namely part-of-speech tags, keyword weights, and dependency parsing relations, to construct clinical-record training text for a character-granularity sequence labeling model. It then used the CRF algorithm to compare two methods of extracting word features at character granularity, Method 1 and Method 2, under different feature combinations. [Result/conclusion] Compared with Method 1, Method 2 improved CNER performance on each of the four word-feature combinations, raising the F1 value by an average of 0.23% in 4-fold cross-validation. The experiment shows that, given mature Chinese word segmentation technology, Method 2 obtains better word feature representations than Method 1 and improves the performance of character-granularity Chinese sequence labeling models.
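To make the idea of "word features with word-segmentation tags at character granularity" concrete, the sketch below projects each word's features (POS tag, keyword weight, dependency relation) onto its characters and suffixes them with a BMES position tag (B = begin, M = middle, E = end, S = single-character word). This is an illustrative assumption, not the paper's exact Method 2: the BMES scheme, the function name `char_features`, the feature keys, and the example tags (`n`, `SBV`, `HED`, in the style of common Chinese NLP toolkits) are all hypothetical.

```python
from typing import Dict, List, Tuple

# (word, POS tag, keyword weight, dependency relation), as produced by some
# upstream segmenter / tagger / parser (assumed, not specified by the paper).
WordInfo = Tuple[str, str, float, str]


def seg_tag(i: int, n: int) -> str:
    """BMES position of character i inside a word of length n (assumed scheme)."""
    if n == 1:
        return "S"
    if i == 0:
        return "B"
    if i == n - 1:
        return "E"
    return "M"


def char_features(words: List[WordInfo]) -> List[Dict[str, str]]:
    """Hypothetical projection of word-level features onto characters.

    Each character inherits its word's features, and every inherited feature
    is combined with the character's segmentation position, so a CRF can tell
    the first character of a noun ("n-B") from the last one ("n-E").
    """
    feats: List[Dict[str, str]] = []
    for word, pos, weight, dep in words:
        n = len(word)
        for i, ch in enumerate(word):
            seg = seg_tag(i, n)
            feats.append({
                "char": ch,
                "seg": seg,
                "pos": f"{pos}-{seg}",          # POS tag + position
                "kw": f"{weight:.1f}-{seg}",    # keyword weight (rounded) + position
                "dep": f"{dep}-{seg}",          # dependency relation + position
            })
    return feats


if __name__ == "__main__":
    sent = [("患者", "n", 0.8, "SBV"), ("糖尿病", "n", 0.9, "ATT"), ("。", "wp", 0.0, "WP")]
    for f in char_features(sent):
        print(f)
```

A plain variant that copies word features onto characters without the position suffix would lose word-boundary information; attaching the segmentation tag is one simple way to keep it, which is the kind of contrast the two methods in the abstract appear to explore.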
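Author contributions: Sun An (孙安): proposed the research idea, designed the experimental scheme, and wrote the first draft of the paper; Yu Yingxiang (于英香): designed the framework of the paper and suggested revisions; Luo Yonggang (罗永刚): provided material and guidance for the choice of research topic; Wang Qi (王祺): provided technical guidance.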