Library and Information Service, 2021, Vol. 65, Issue (8): 107-113. DOI: 10.13266/j.issn.0252-3116.2021.08.011

• Knowledge Organization •

Application of Multi Instance Multi Label Learning in Chinese Patent Automatic Classification

Bao Xiang1, Liu Guifeng1, Cui Jinghua2

  1. Institute of Science and Technology Information, Jiangsu University, Zhenjiang 212013;
  2. School of Information Management, Nanjing University, Nanjing 210093
  • Received: 2020-10-28 Revised: 2021-01-13 Online: 2021-04-20 Published: 2021-06-02
  • About the authors: Bao Xiang (ORCID: 0000-0002-2233-5739), librarian, master's degree, E-mail: bx425bob@163.com; Liu Guifeng (ORCID: 0000-0002-7209-3862), research librarian, PhD; Cui Jinghua (ORCID: 0000-0001-9723-3414), PhD candidate.
  • Funding:
    This paper is an outcome of the General Project of Philosophy and Social Science Research in Jiangsu Universities, "Research and Practice of Topic Models in Intellectual Property Information Services of University Libraries" (Grant No. 2019SJA1870), and the General Program of Natural Science Research in Jiangsu Universities, "Research on Patent Topic Classification Based on Multi-Instance Multi-Label Learning and Deep Neural Networks" (Grant No. 19KJB520005).

Abstract: [Purpose/Significance] This study aims to classify large numbers of Chinese patents quickly, meeting the needs of patent examination and intelligence analysis. [Method/Process] Given the inherent format of patent texts and the fact that a patent often carries multiple IPC classification numbers, the paper applies multi-instance multi-label learning to automatic patent classification. It first introduces the basic principles of several classical multi-instance multi-label models and then applies them to determining the IPC classification numbers of Chinese patents. [Result/Conclusion] Experiments show that multi-instance multi-label models are suitable for automatic patent classification. Measured by average precision, Hamming loss, ranking loss, one-error, coverage, and training time, the MIMLRBF model determines the IPC classification numbers of Chinese patents quickly and accurately, providing a reference for the automatic classification of large-scale patent collections.

Key words: patent, classification, IPC classification number, multi-instance multi-label
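
As an illustration of the evaluation metrics named in the abstract, the following minimal sketch implements their usual definitions from the multi-label learning literature. It is not the authors' code: the function names and the toy data are hypothetical, labels are assumed to be integer indices, and each example is assumed to have at least one relevant and one irrelevant label. Training time is a wall-clock measurement and is not covered here.

```python
from typing import List, Sequence, Set


def ranks(scores: Sequence[float]) -> List[int]:
    """Rank labels by descending score; rank 1 is the top label (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    rank = [0] * len(scores)
    for position, label in enumerate(order, start=1):
        rank[label] = position
    return rank


def hamming_loss(pred: List[Set[int]], true: List[Set[int]], n_labels: int) -> float:
    """Fraction of (example, label) pairs that are predicted wrongly."""
    return sum(len(p ^ t) for p, t in zip(pred, true)) / (len(true) * n_labels)


def one_error(scores: List[Sequence[float]], true: List[Set[int]]) -> float:
    """Fraction of examples whose top-ranked label is not a true label."""
    misses = sum(ranks(s).index(1) not in t for s, t in zip(scores, true))
    return misses / len(true)


def coverage(scores: List[Sequence[float]], true: List[Set[int]]) -> float:
    """Average depth in the ranking needed to cover every true label, minus one."""
    return sum(max(ranks(s)[j] for j in t) - 1 for s, t in zip(scores, true)) / len(true)


def ranking_loss(scores: List[Sequence[float]], true: List[Set[int]]) -> float:
    """Average fraction of (relevant, irrelevant) label pairs ordered wrongly."""
    total = 0.0
    for s, t in zip(scores, true):
        irrelevant = [j for j in range(len(s)) if j not in t]
        bad = sum(1 for a in t for b in irrelevant if s[a] <= s[b])
        total += bad / (len(t) * len(irrelevant))
    return total / len(true)


def average_precision(scores: List[Sequence[float]], true: List[Set[int]]) -> float:
    """Average, over true labels, of the share of true labels ranked at or above each."""
    total = 0.0
    for s, t in zip(scores, true):
        r = ranks(s)
        total += sum(sum(1 for k in t if r[k] <= r[j]) / r[j] for j in t) / len(t)
    return total / len(true)


if __name__ == "__main__":
    # Toy data: two patents, four candidate IPC classes (indices 0..3).
    toy_scores = [[0.9, 0.2, 0.6, 0.1], [0.3, 0.8, 0.4, 0.7]]
    toy_true = [{0, 2}, {1, 2}]
    # Hypothetical decision rule: predict every label scoring above 0.5.
    toy_pred = [{j for j, v in enumerate(s) if v > 0.5} for s in toy_scores]
    print("Hamming loss     :", hamming_loss(toy_pred, toy_true, 4))
    print("One-error        :", one_error(toy_scores, toy_true))
    print("Coverage         :", coverage(toy_scores, toy_true))
    print("Ranking loss     :", ranking_loss(toy_scores, toy_true))
    print("Average precision:", average_precision(toy_scores, toy_true))
```

For average precision, higher is better; for the other four metrics, lower is better. Only Hamming loss depends on the thresholded label sets, while the remaining metrics use the ranking of label scores.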

CLC number: