[Purpose/significance] This paper aims to classify large numbers of Chinese patents rapidly, to meet the requirements of patent examination and intelligence analysis. [Method/process] Taking into account the inherent format of patent text and the fact that a patent can carry multiple classification numbers, this paper applies multi-instance multi-label learning to automatic patent classification. Several classical multi-instance multi-label learning methods are first introduced and then applied to determine the IPC numbers of Chinese patents. [Result/conclusion] Experiments demonstrate that multi-instance multi-label learning methods are suitable for automatic patent classification. Judged by average precision, Hamming loss, ranking loss, one-error, coverage, and training time, MIMLRBF can determine the IPC numbers of Chinese patents quickly and accurately, which provides a new perspective for classifying patents at large scale.
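The five ranking-based criteria named in the abstract can be illustrated with a minimal sketch. It follows the definitions commonly used in the multi-label learning literature, not code from the paper. The function names, the toy label matrices, and the assumption that every patent has at least one relevant and one irrelevant label are all illustrative.

```python
from typing import List, Sequence


def rank_labels(scores: Sequence[float]) -> List[int]:
    """Return 1-based ranks; the highest score gets rank 1 (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for position, label in enumerate(order, start=1):
        ranks[label] = position
    return ranks


def hamming_loss(y_true, y_pred) -> float:
    """Fraction of (patent, label) cells where the predicted label set is wrong."""
    wrong = sum(t != p for yt, yp in zip(y_true, y_pred) for t, p in zip(yt, yp))
    return wrong / (len(y_true) * len(y_true[0]))


def one_error(y_true, scores) -> float:
    """Fraction of patents whose top-ranked label is not a true label."""
    misses = 0
    for yt, s in zip(y_true, scores):
        top = max(range(len(s)), key=lambda j: s[j])
        misses += int(yt[top] == 0)
    return misses / len(y_true)


def coverage(y_true, scores) -> float:
    """Average depth in the ranking needed to reach all true labels, minus one."""
    total = 0
    for yt, s in zip(y_true, scores):
        ranks = rank_labels(s)
        total += max(r for r, t in zip(ranks, yt) if t) - 1
    return total / len(y_true)


def ranking_loss(y_true, scores) -> float:
    """Average fraction of (true, false) label pairs that are ordered wrongly."""
    total = 0.0
    for yt, s in zip(y_true, scores):
        pos = [s[j] for j, t in enumerate(yt) if t]
        neg = [s[j] for j, t in enumerate(yt) if not t]
        # Assumes both lists are non-empty for every patent.
        total += sum(p <= n for p in pos for n in neg) / (len(pos) * len(neg))
    return total / len(y_true)


def average_precision(y_true, scores) -> float:
    """Average fraction of true labels ranked at or above each true label."""
    total = 0.0
    for yt, s in zip(y_true, scores):
        ranks = rank_labels(s)
        rel = [r for r, t in zip(ranks, yt) if t]
        total += sum(sum(r2 <= r for r2 in rel) / r for r in rel) / len(rel)
    return total / len(y_true)


# Toy data (hypothetical): 2 patents, 4 candidate IPC labels.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
scores = [[0.9, 0.8, 0.4, 0.1], [0.2, 0.7, 0.6, 0.1]]
y_pred = [[int(v >= 0.5) for v in row] for row in scores]

print(hamming_loss(y_true, y_pred), one_error(y_true, scores),
      coverage(y_true, scores), ranking_loss(y_true, scores),
      average_precision(y_true, scores))
```

Higher is better for average precision; lower is better for the other four criteria.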
Bao Xiang, Liu Guifeng, Cui Jinghua. Application of Multi Instance Multi Label Learning in Chinese Patent Automatic Classification[J]. Library and Information Service, 2021, 65(8): 107-113.
DOI: 10.13266/j.issn.0252-3116.2021.08.011