Knowledge Organization

Research on the Method of Judging Reference Documents in Patent Invalidation

  • 郭诗琪
  • 贠强
  • 陈亮
  • 周杰
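  • 1. Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020;
    2. Institute of Scientific and Technical Information of China, Beijing 100038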
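郭诗琪 (ORCID: 0000-0002-9311-8088), master's student; 贠强 (ORCID: 0000-0002-9156-6063), associate research fellow, PhD, master's supervisor, E-mail: yunq@istic.ac.cn; 陈亮 (ORCID: 0000-0002-3235-9806), associate research fellow, PhD, master's supervisor.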

Received: 2020-03-10

Revised: 2020-10-12

Published online: 2021-01-20

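Funding

This paper is one of the research outputs of the National Key R&D Program of China project topic "Research and Application Demonstration of Intelligent Collection and Deep Processing Technology for Intellectual Property Information" (project No. 2017YFB1401902) and of the ISTIC key work project "Frontier Tracking and In-depth Research in Key Science and Technology Fields" (project No. ZD2020-02).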

Research on the Method of Judging Reference Document in Patent Invalidation Using GBDT

  • Guo Shiqi
  • Yun Qiang
  • Chen Liang
  • Zhou Jie
  • 1. Institute of Medical Information/Medical Library, CAMS & PUMC, Beijing 100020;
    2. Institute of Scientific and Technical Information of China, Beijing 100038



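Abstract

[Purpose/significance] Reference documents are key evidence for deciding whether a patent can be granted or should be invalidated. Traditional information retrieval methods have shortcomings, and machine learning has rarely been applied to reference-document retrieval. To address this, the paper incorporates reference-document information and builds a patent relevance determination model. [Method/process] The experiments use target patents and their reference documents, drawn from patent invalidation decisions, as the dataset. Text-similarity, co-occurring-vocabulary and co-word-count features are extracted, and a GBDT model recasts reference-document retrieval as a classification problem: deciding whether a candidate document is relevant. [Result/conclusion] Data from different patent fields contribute differently to classification performance; for the description field, precision, recall and F1 are 79%, 48% and 59% respectively. Classification with the combined features is significantly better than with text similarity alone. Finally, the misclassified cases are analyzed and directions for further research are identified.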
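As a quick consistency check on the description-field figures, and assuming the standard per-class definition, F1 is the harmonic mean of precision P and recall R. Plugging in the rounded values P = 0.79 and R = 0.48:

$$
F_1 = \frac{2PR}{P+R} = \frac{2 \times 0.79 \times 0.48}{0.79 + 0.48} = \frac{0.7584}{1.27} \approx 0.597
$$

The small gap from the reported 59% is consistent with P and R having been rounded before reporting; for instance, P = 0.785 and R = 0.475 give F1 ≈ 0.592.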

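How to cite this article

郭诗琪, 贠强, 陈亮, 周杰. 专利无效对比文件判定方法研究 [Research on the method of judging reference documents in patent invalidation][J]. 图书情报工作 (Library and Information Service), 2021, 65(2): 117-125. DOI: 10.13266/j.issn.0252-3116.2021.02.012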

Abstract

[Purpose/significance] Reference documents are essential for judging whether a patent should be granted or invalidated. Traditional information retrieval methods have shortcomings, and few studies have applied machine learning to reference-document retrieval. Drawing on reference-document information, this paper builds a model for determining patent relevance. [Method/process] Using target patents and their reference documents from patent invalidation decisions as the dataset, we extracted text-similarity, co-occurring-vocabulary and co-word-count features, and used a GBDT model to recast reference-document retrieval as a classification problem: deciding whether a document is relevant. [Result/conclusion] Different fields of the patent text contribute differently to classification performance: for the description field, F1 reaches 59%, and combining multiple features performs significantly better than text similarity alone. Finally, the paper analyzes the misclassified cases and outlines directions for future research.
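The pipeline described in the abstract (compute features for each target-patent/candidate-document pair, then let a GBDT classifier label the pair as relevant or not) can be sketched as follows. This is a hedged illustration, not the authors' code: the abstract does not define the features precisely, so cosine similarity of term counts, the number of shared terms, and the shared-term ratio are assumed stand-ins; the names `pair_features`, `best_split` and `StumpGBDT` are hypothetical; and the toy English pairs are invented for the demo. In practice a library implementation such as scikit-learn's `GradientBoostingClassifier` would normally be used; here the boosting loop is written out with depth-1 trees (stumps) so the mechanics are visible.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenisation of English text. Chinese
    # patent text would first need a word segmenter, not included here.
    return text.lower().split()


def pair_features(target, candidate):
    """Hypothetical stand-ins for the abstract's feature groups:
    [cosine similarity of term counts, number of shared terms,
     shared terms as a fraction of the target's distinct terms]."""
    a, b = Counter(tokenize(target)), Counter(tokenize(candidate))
    shared = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    cosine = dot / norm if norm else 0.0
    return [cosine, float(len(shared)), len(shared) / len(a) if a else 0.0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def best_split(X, r):
    """(feature, threshold) of the depth-1 split minimising squared error on r."""
    best_sse, best = math.inf, (0, math.inf)  # fallback: no split, all rows go left
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= thr]
            right = [ri for x, ri in zip(X, r) if x[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, (j, thr)
    return best


class StumpGBDT:
    """Gradient-boosted decision stumps for binary log loss (after Friedman, 2001)."""

    def __init__(self, n_estimators=50, learning_rate=0.1):
        self.n_estimators, self.learning_rate = n_estimators, learning_rate
        self.init, self.stumps = 0.0, []

    def _raw(self, x):
        # Additive model: initial log-odds plus shrunken stump outputs.
        return self.init + self.learning_rate * sum(
            lv if x[j] <= thr else rv for j, thr, lv, rv in self.stumps)

    def fit(self, X, y):
        rate = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.init, self.stumps = math.log(rate / (1 - rate)), []
        for _ in range(self.n_estimators):
            p = [sigmoid(self._raw(x)) for x in X]
            r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of log loss
            j, thr = best_split(X, r)
            leaf = {}
            for side in (True, False):  # True = left leaf (x[j] <= thr)
                idx = [i for i, x in enumerate(X) if (x[j] <= thr) == side]
                den = sum(p[i] * (1 - p[i]) for i in idx)
                # One Newton step per leaf, the two-class update in Friedman's paper.
                leaf[side] = sum(r[i] for i in idx) / den if den > 1e-12 else 0.0
            self.stumps.append((j, thr, leaf[True], leaf[False]))
        return self

    def predict_proba(self, x):
        return sigmoid(self._raw(x))


# Invented toy pairs: 1 = relevant reference document, 0 = irrelevant.
train = [
    ("wireless charging coil electric vehicle battery", "charging coil electric vehicle", 1),
    ("wireless charging coil electric vehicle battery", "wireless coil battery pack", 1),
    ("wireless charging coil electric vehicle battery", "coffee brewing milk foam", 0),
    ("wireless charging coil electric vehicle battery", "child seat belt buckle", 0),
    ("solar panel cleaning robot brush", "robot brush cleaning solar array", 1),
    ("solar panel cleaning robot brush", "panel cleaning brush", 1),
    ("solar panel cleaning robot brush", "chemical fertilizer granule coating", 0),
    ("solar panel cleaning robot brush", "bicycle chain lubricant", 0),
]
X = [pair_features(t, c) for t, c, _ in train]
y = [label for _, _, label in train]
model = StumpGBDT().fit(X, y)

target = "wireless charging coil electric vehicle battery"
p_rel = model.predict_proba(pair_features(target, "electric vehicle wireless charging station"))
p_irr = model.predict_proba(pair_features(target, "kitchen knife sharpening stone"))
print(f"relevant: {p_rel:.3f}  irrelevant: {p_irr:.3f}")
```

Because each new stump is fit to the residuals y − p of the current model, later stumps concentrate on pairs that are still poorly scored. With real data one would tune the number of trees, the learning rate and the tree depth, and report precision and recall separately for each patent field.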
