Library and Information Service, 2022, Vol. 66, Issue (6): 108-117. DOI: 10.13266/j.issn.0252-3116.2022.06.012

• Knowledge Organization •

Research on the Evaluation Method of Patent Keyword Extraction Algorithm Based on Information Gain and Similarity

Yu Yan1,2, Ju Peng1, Shang Mingjie1

  1. Institute of Information Management and Technology, Nanjing Technology University, Nanjing 210009;
  2. School of Electronics and Computer, Chengxian College, Southeast University, Nanjing 211816
  • Received:2021-07-08 Revised:2021-10-30 Online:2022-03-30 Published:2022-03-30
  • About the authors: Yu Yan, professor, PhD, E-mail: yuyanyuyan2004@126.com; Ju Peng, master's student; Shang Mingjie, master's student.
  • Supported by:
    This work is one of the research outcomes of the National Social Science Fund of China project "Research on Multi-Dimensional and Multi-Level Patent Text Mining to Support Innovative Design in the Big Data Era" (Grant No. 17BTQ059).

Abstract: [Purpose/significance] Patent keyword extraction algorithms are currently evaluated mainly by matching the extracted keywords against keywords manually annotated by experts. To address the problems with this approach, this paper proposes an evaluation model for patent keyword extraction algorithms based on information gain and similarity. [Method/process] The proposed model evaluates the accuracy of a patent keyword extraction algorithm at two levels, intrinsic and extrinsic. The intrinsic evaluation model measures the information gain of each keyword extracted by the algorithm under evaluation, in order to assess the novelty and creativity of the extracted keywords. The extrinsic evaluation model represents each patent by the keyword set extracted by the algorithm under evaluation and computes the similarity between related patents, thereby measuring how effectively the extracted keywords describe the patent's topic. [Result/conclusion] A validation experiment on the evaluation model and an empirical study applying it show that the proposed evaluation model based on information gain and similarity is feasible and effective.

Key words: patent, keyword extraction, evaluation, information gain, similarity
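
The two evaluation levels named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the intrinsic score uses the textbook information gain of a keyword's presence with respect to a class label (for instance a patent classification code), and the extrinsic score uses Jaccard similarity between the keyword sets of patent pairs assumed to be related. The function names, the class labels, and the choice of Jaccard are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(keyword, docs, labels):
    """Textbook information gain of a keyword's presence w.r.t. class labels.

    docs   -- list of keyword sets, one per patent
    labels -- class labels aligned with docs (hypothetical, e.g. classification codes)
    Assumption: this is the standard text-classification definition,
    not necessarily the formula used in the paper.
    """
    with_kw = [lab for doc, lab in zip(docs, labels) if keyword in doc]
    without_kw = [lab for doc, lab in zip(docs, labels) if keyword not in doc]
    n = len(labels)
    conditional = (len(with_kw) / n) * entropy(with_kw) \
        + (len(without_kw) / n) * entropy(without_kw)
    return entropy(labels) - conditional


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_related_similarity(docs, related_pairs):
    """Average similarity over index pairs of patents assumed to be related."""
    if not related_pairs:
        return 0.0
    total = sum(jaccard(docs[i], docs[j]) for i, j in related_pairs)
    return total / len(related_pairs)


# Toy data, invented for illustration only.
docs = [
    {"battery", "electrode"},
    {"battery", "anode"},
    {"antenna", "signal"},
    {"antenna", "device"},
]
labels = ["H01M", "H01M", "H01Q", "H01Q"]

intrinsic = {kw: information_gain(kw, docs, labels) for kw in ("battery", "device")}
extrinsic = mean_related_similarity(docs, [(0, 1), (2, 3)])
```

Under these assumptions, a keyword that separates the classes cleanly (here "battery") receives a higher intrinsic score than a generic one (here "device"), and an extraction algorithm whose keyword sets make related patents look more alike receives a higher extrinsic score.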

CLC number: