图书情报工作 (Library and Information Service), 2019, Vol. 63, Issue 18: 109-118. DOI: 10.13266/j.issn.0252-3116.2019.18.013

• Knowledge Organization •

Research on the Selection of Chinese Patent Candidate Term Based on Dependency Syntax Parsing (基于依存句法分析的中文专利候选术语选取研究)

Yu Yan1,2, Chen Lei1, Jiang Jinde3, Zhao Naixuan1

  1. Information Service Department, Nanjing Tech University, Nanjing 210009;
    2. Computer Science Department, Chengxian College, Southeast University, Nanjing 211816;
    3. School of Business, Nanjing Xiaozhuang University, Nanjing 211171
  • Received: 2019-01-22 Revised: 2019-04-14 Online: 2019-09-20 Published: 2019-09-20
  • About the authors: Yu Yan (俞琰, ORCID: 0000-0002-9654-8614), associate professor, PhD, E-mail: yuyanyuyan2004@126.com; Chen Lei (陈磊, ORCID: 0000-0002-5504-7493), master's student; Jiang Jinde (姜金德, ORCID: 0000-0002-5504-7493), professor, PhD; Zhao Naixuan (赵乃瑄, ORCID: 0000-0001-9072-7315), library director, professor, PhD.
  • Funding:
    This work is one of the research outcomes of the Ministry of Education Humanities and Social Sciences Planning Project "Research on Constructing a Skill Knowledge Graph in the Big Data Era" (Grant No. 16YJAZH073) and the National Social Science Fund of China General Project "Research on Multi-Dimensional, Multi-Level Patent Text Mining to Support Innovative Design in the Big Data Era" (Grant No. 17BTQ059).

Abstract: [Purpose/significance] Existing methods for selecting candidate terms from Chinese patents require separate pattern-matching rules for each data set and extract patent terms with low accuracy. To address these problems, this paper proposes a candidate term selection method for Chinese patents based on dependency syntax parsing, with the aim of improving the accuracy of Chinese patent term extraction. [Method/process] The method consists of three main steps: dependency syntax parsing, pruning, and dependency subtree generation. First, the Chinese patent text is parsed to obtain dependency trees. Next, each dependency tree is pruned by removing the dependency relations that do not meet the requirements, which yields dependency subtrees. Finally, continuous word strings are selected from the subtrees as candidate terms for Chinese patent term extraction. [Result/conclusion] Experimental results show that, compared with existing methods for selecting Chinese patent candidate terms, the proposed method effectively improves the accuracy of Chinese patent term extraction.

Key words: term extraction, dependency syntax parsing, Chinese patent candidate term selection
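
The three steps named in the abstract (parsing, pruning, subtree generation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The parse is written by hand rather than produced by a parser, and the relation labels, the `KEPT_RELATIONS` whitelist, the `min_tokens` threshold, and the function name `candidate_terms` are all hypothetical choices made for illustration, since the abstract does not state the actual pruning rules.

```python
from collections import defaultdict

# Each token: (word, head index, relation). Heads are 1-based; 0 marks the root.
# Hand-written illustrative parse, NOT real parser output.
PARSE = [
    ("本发明", 2, "nsubj"),
    ("公开", 0, "root"),
    ("了", 2, "aux"),
    ("一种", 8, "det"),
    ("锂离子", 6, "compound"),
    ("电池", 8, "compound"),
    ("负极", 8, "compound"),
    ("材料", 2, "obj"),
]

# Hypothetical whitelist: only these relations survive pruning.
KEPT_RELATIONS = {"compound", "amod"}


def candidate_terms(parse, kept=KEPT_RELATIONS, min_tokens=2):
    """Prune the dependency tree, then read contiguous word strings off each subtree."""
    # Union-find over token positions; only edges that survive pruning are merged.
    parent = list(range(len(parse)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (_, head, rel) in enumerate(parse):
        if head > 0 and rel in kept:
            parent[find(i)] = find(head - 1)

    # Group token positions by the subtree they ended up in (positions stay sorted).
    subtrees = defaultdict(list)
    for i in range(len(parse)):
        subtrees[find(i)].append(i)

    def flush(run, out):
        if len(run) >= min_tokens:
            out.append("".join(parse[q][0] for q in run))

    # Split each subtree into maximal runs of adjacent positions.
    terms = []
    for positions in subtrees.values():
        run = [positions[0]]
        for p in positions[1:]:
            if p == run[-1] + 1:
                run.append(p)
            else:
                flush(run, terms)
                run = [p]
        flush(run, terms)
    return terms


print(candidate_terms(PARSE))
```

In this toy sentence only the `compound` edges survive pruning, so the four noun tokens form one subtree and are emitted as a single candidate. A real pipeline would obtain the parse from a dependency parser and would likely combine relation filters with part-of-speech constraints.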

CLC number: