Library and Information Service, 2019, Vol. 63, Issue (7): 105-115. DOI: 10.13266/j.issn.0252-3116.2019.07.013

• Knowledge Organization

Chinese Text Hierarchical Segmentation Based on Knowledge Element

Wang Zhongyi1, Shen Xueying1, Huang Jing2

  1. School of Information Management, Central China Normal University, Wuhan 430079;
    2. Wuhan Polytechnic, Wuhan 430074
  • Received: 2018-06-22 Revised: 2018-10-11 Published: 2019-04-05 Online: 2019-04-05
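  • About the authors: Wang Zhongyi (ORCID: 0000-0001-8945-783X), associate professor, PhD, master's supervisor, E-mail: wzywzy13579@163.com; Shen Xueying (ORCID: 0000-0002-2944-4399), master's student; Huang Jing (ORCID: 0000-0003-2938-8507), associate professor.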
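  • Funding:
    This paper is one of the research outputs of the Ministry of Education Humanities and Social Sciences Research Youth Fund project "Research on Multi-granularity Hierarchical Topic Segmentation of Digital Library Collection Resources" (Project No. 16YJC870003).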

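Abstract: [Purpose/significance] This paper aims to help users retrieve complete knowledge units of an appropriate granularity and so satisfy their multi-granularity knowledge needs. [Method/process] It proposes a hierarchical text segmentation method based on knowledge elements. The method first analyzes the types of knowledge elements and their description rules. It then identifies each type of knowledge element in the entity resources according to these description rules, and treats all knowledge elements, together with the connecting sentences between them, as a single class. Finally, it applies the Fisher segmentation algorithm to bisect this class level by level until all topics have been identified, which fixes the segmentation boundaries and yields a hierarchical segmentation of the text. [Result/conclusion] On the one hand, the knowledge-element-based method extends the unit of text segmentation from the sentence to the knowledge element, which makes segmentation more efficient. On the other hand, it deepens the control unit of knowledge services from the whole document to knowledge blocks made up of knowledge elements and sets of knowledge elements, so that relevant knowledge services can be delivered to users on demand. This moves data retrieval and information retrieval toward knowledge retrieval, improves the efficiency of knowledge acquisition, and supports the transformation from information services to knowledge services.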
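The final [Method/process] step, level-by-level Fisher bisection, can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes each knowledge element or connecting sentence has already been turned into a numeric feature vector, and it uses the classic Fisher optimal-partition criterion for ordered samples: choose the cut that minimises the summed within-segment squared deviation (the segment "diameter"). The abstract does not state when splitting stops, so the `min_gain` and `min_len` stopping rule is an invented stand-in. The function names `diameter`, `best_bisection`, `segment`, and `leaves` are illustrative only.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def diameter(units: Sequence[Vector], i: int, j: int) -> float:
    """Fisher segment diameter: sum of squared Euclidean distances of
    units[i..j] (inclusive) from their mean vector."""
    seg = units[i:j + 1]
    dim = len(seg[0])
    mean = [sum(v[d] for v in seg) / len(seg) for d in range(dim)]
    return sum((v[d] - mean[d]) ** 2 for v in seg for d in range(dim))


def best_bisection(units: Sequence[Vector], i: int, j: int,
                   min_len: int) -> Tuple[int, float]:
    """Return (t, loss) for the cut splitting units[i..j] into [i..t-1] and
    [t..j] with the smallest total diameter D(i, t-1) + D(t, j)."""
    best_t, best_loss = -1, float("inf")
    # Both halves must contain at least min_len units.
    for t in range(i + min_len, j - min_len + 2):
        loss = diameter(units, i, t - 1) + diameter(units, t, j)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t, best_loss


def segment(units: Sequence[Vector], i: int = 0, j: Optional[int] = None,
            min_gain: float = 0.5, min_len: int = 1) -> Dict:
    """Recursively bisect units[i..j] into a topic tree of (i, j) spans.

    Hypothetical stopping rule (not the paper's): a segment becomes a leaf
    when it is too short to split, already homogeneous, or when the best cut
    removes less than `min_gain` of its diameter."""
    if j is None:
        j = len(units) - 1
    node: Dict = {"span": (i, j), "children": []}
    if j - i + 1 < 2 * min_len:
        return node
    total = diameter(units, i, j)
    if total <= 1e-12:
        return node
    t, loss = best_bisection(units, i, j, min_len)
    if (total - loss) / total < min_gain:
        return node
    node["children"] = [
        segment(units, i, t - 1, min_gain, min_len),
        segment(units, t, j, min_gain, min_len),
    ]
    return node


def leaves(node: Dict) -> List[Tuple[int, int]]:
    """Finest-grained segments of the tree, left to right."""
    if not node["children"]:
        return [node["span"]]
    return [span for child in node["children"] for span in leaves(child)]


# Hypothetical 1-D features for eight units; real inputs would be vectors
# derived from the knowledge elements and connecting sentences.
units = [[0.0], [0.0], [1.0], [1.0], [10.0], [10.0], [11.0], [11.0]]
tree = segment(units)
print(tree["children"][0]["span"], tree["children"][1]["span"])  # (0, 3) (4, 7)
print(leaves(tree))  # [(0, 1), (2, 3), (4, 5), (6, 7)]
```

Each level of the returned tree is one granularity: the root's children are the coarsest topics and `leaves` gives the finest. The brute-force scan recomputes every diameter from scratch, so one bisection costs O(n²·d) for n units of dimension d; keeping prefix sums of the vectors and of their squared norms would reduce each diameter to O(d), which matters for long documents.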

Key words: knowledge element recognition, clustering, hierarchical segmentation

