Knowledge Organization

Application of Multi-Instance Multi-Label Learning in Chinese Patent Automatic Classification

  • Bao Xiang,
  • Liu Guifeng,
  • Cui Jinghua
  • 1 Institute of Science and Technology Information, Jiangsu University, Zhenjiang 212013;
    2 School of Information Management, Nanjing University, Nanjing 210093
Bao Xiang (ORCID: 0000-0002-2233-5739), librarian, master's degree, E-mail: bx425bob@163.com; Liu Guifeng (ORCID: 0000-0002-7209-3862), research librarian, PhD; Cui Jinghua (ORCID: 0000-0001-9723-3414), doctoral candidate.

Received date: 2020-10-28

  Revised date: 2021-01-13

  Online published: 2021-06-02

Funding

This paper is one of the research outcomes of the General Project of Philosophy and Social Science Research in Jiangsu Universities, "Research and Practice of Topic Models in Intellectual Property Information Services of University Libraries" (Grant No. 2019SJA1870), and the General Program of Natural Science Research in Jiangsu Universities, "Research on Patent Topic Classification Based on Multi-Instance Multi-Label Learning and Deep Neural Networks" (Grant No. 19KJB520005).

Cite this article

Bao Xiang, Liu Guifeng, Cui Jinghua. Application of Multi-Instance Multi-Label Learning in Chinese Patent Automatic Classification[J]. Library and Information Service (图书情报工作), 2021, 65(8): 107-113. DOI: 10.13266/j.issn.0252-3116.2021.08.011

Abstract

[Purpose/significance] This study aims to classify large numbers of Chinese patents rapidly, in order to meet the needs of patent examination and intelligence analysis. [Method/process] Given the inherent format of patent text and the fact that a patent often carries multiple IPC classification numbers, this paper applies multi-instance multi-label learning to automatic patent classification. It first introduces the basic principles of several classical multi-instance multi-label models and then applies these models to determining the IPC classification numbers of Chinese patents. [Result/conclusion] Experiments show that multi-instance multi-label models are suitable for automatic patent classification. Judged by average precision, Hamming loss, ranking loss, one-error, coverage, and training time, the MIMLRBF model determines the IPC classification numbers of Chinese patents quickly and accurately, which offers a reference for the automatic classification of large-scale patent collections.
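The five ranking-based metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the definitions commonly used in the multi-label literature, not necessarily the exact evaluation code behind the reported experiments. The function names `rank_labels` and `evaluate`, the 0.5 decision threshold used for Hamming loss, and the toy scores are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def rank_labels(scores: List[float]) -> Dict[int, int]:
    """Map each label index to its 1-based rank (highest score gets rank 1).

    Ties are broken by label index, which is an assumption of this sketch.
    """
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    return {label: pos + 1 for pos, label in enumerate(order)}


def evaluate(scores: List[List[float]], truth: List[Set[int]],
             threshold: float = 0.5) -> Dict[str, float]:
    """Average five multi-label metrics over all examples.

    scores[i][j] is the model's confidence that example i has label j;
    truth[i] is the set of relevant label indices for example i.
    Each example is assumed to have at least one relevant and one
    irrelevant label, otherwise ranking loss is undefined.
    """
    n, q = len(scores), len(scores[0])
    totals = {"hamming_loss": 0.0, "one_error": 0.0, "coverage": 0.0,
              "ranking_loss": 0.0, "average_precision": 0.0}
    for s, y in zip(scores, truth):
        ranks = rank_labels(s)
        # Hamming loss: fraction of labels on which prediction and truth differ.
        predicted = {j for j in range(q) if s[j] > threshold}
        totals["hamming_loss"] += len(predicted ^ y) / q
        # One-error: 1 if the top-ranked label is not a relevant label.
        top = min(ranks, key=ranks.get)
        totals["one_error"] += 0.0 if top in y else 1.0
        # Coverage: how far down the ranking one must go to cover all relevant labels.
        totals["coverage"] += max(ranks[j] for j in y) - 1
        # Ranking loss: fraction of (relevant, irrelevant) pairs ordered wrongly.
        irrelevant = set(range(q)) - y
        bad = sum(1 for a in y for b in irrelevant if s[a] <= s[b])
        totals["ranking_loss"] += bad / (len(y) * len(irrelevant))
        # Average precision: mean precision at the rank of each relevant label.
        totals["average_precision"] += sum(
            sum(1 for k in y if ranks[k] <= ranks[j]) / ranks[j] for j in y
        ) / len(y)
    return {name: total / n for name, total in totals.items()}


# Toy data: two patents, four candidate IPC labels (values are made up).
scores = [[0.9, 0.2, 0.6, 0.1], [0.3, 0.8, 0.4, 0.7]]
truth = [{0, 2}, {1, 2}]
result = evaluate(scores, truth)
print(result)
```

For average precision, larger values are better; for the other four metrics, smaller values are better. Training time is measured separately as wall-clock time and is not part of this sketch.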
