Review

国内外专利挖掘研究进展与前瞻

  • 陈亮 ,
  • 陈利利 ,
  • 许海云 ,
  • 魏超 ,
  • 苏娜 ,
  • 尚玮姣
  • 1 中国科学技术信息研究所 北京 100038;
    2 山东理工大学管理学院 淄博 255000;
    3 中国科学院科技战略咨询研究院 北京 100190;
    4 中国林业科学研究院林业科技信息研究所 北京 100091
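Chen Liang, associate research fellow, PhD, master's supervisor; Chen Lili, master's student; Xu Haiyun, PhD, doctoral supervisor; Wei Chao, postdoctoral researcher.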

收稿日期: 2023-03-08

  修回日期: 2023-08-29

  网络出版日期: 2024-02-02

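Funding

This paper is one of the research outcomes of the Fundamental Research Funds for Central Public-Interest Scientific Research Institutes project "Construction and Operation Management of the ISTIC Unified Data Foundation" (Grant No. ZD2023-04) and the National Natural Science Foundation of China General Program project "Research on Early Identification Methods for Transformative Scientific and Technological Innovation Topics Based on Evolution Analysis of Weak-Signal Temporal Networks" (Grant No. 72274113).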

A Global Literature Review in Recent Advancement of Patent Mining

  • Chen Liang ,
  • Chen Lili ,
  • Xu Haiyun ,
  • Wei Chao ,
  • Su Na ,
  • Shang Weijiao
  • 1 Institute of Scientific and Technical Information of China, Beijing 100038;
    2 Management School, Shandong University of Technology, Zibo 255000;
    3 Institute of Science and Development, Chinese Academy of Sciences, Beijing 100190;
    4 Research Institute of Forestry Policy and Information, Chinese Academy of Forestry, Beijing 100091

Received date: 2023-03-08

  Revised date: 2023-08-29

  Online published: 2024-02-02

摘要

[目的/意义] 专利挖掘是获取技术情报的重要途径,在近年来智能技术快速发展的驱动下,专利挖掘不仅在方法自动化、智能化和挖掘深度、精确度上取得了长足进步,而且展露出数据与算法紧密融合的发展新范式,亟需通过综述形成对其研究现状和未来发展趋势的全面认识。[方法/过程] 将文献调研活动的主要环节连成闭环“检索→筛选→梳理→查漏→拓展和再次检索”并持续更新、反复迭代,调研范围包括国内外专利挖掘的相关论文、专利、数据集、算法竞赛评测活动、专利信息服务平台乃至代码托管网站和模型托管网站,并在叙述内容中穿插专家访谈、竞赛选手交流会以及笔者学术成果评审意见中获得的相关信息,最终完成对专利挖掘的系统综述。[结果/结论] 专利基础资源的种类和数量较之前增长较快,专利挖掘方法的训练和性能评测逐步具有数据基准和统一测度标准;专利挖掘前沿方法紧跟智能技术发展步伐以实现技术升级和性能提升,而统计学习、人工规则、软件工具等传统方法也在学习成本、实践成本和方法效果的平衡中得到优化和发展;专利挖掘的研究范围实现了从数据处理、规范化到专利基础服务和技术情报分析的全面覆盖,并开启了专利智慧法律的探索。

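Cite this article

Chen Liang, Chen Lili, Xu Haiyun, Wei Chao, Su Na, Shang Weijiao. A Global Literature Review in Recent Advancement of Patent Mining[J]. Library and Information Service, 2024, 68(2): 110-133. DOI: 10.13266/j.issn.0252-3116.2024.02.010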

Abstract

[Purpose/Significance] Patent mining is an important means of obtaining technical intelligence from patent documents. Driven by recent advances in AI technologies, patent mining methods have made substantial progress in automation, intelligence, mining depth, and accuracy, and a new paradigm that tightly integrates data and algorithms has emerged. A comprehensive survey of the current state of research and future development trends is therefore urgently needed. [Method/Process] This paper links the main steps of a literature review into a closed loop, “searching→screening→sorting→gap checking→expanding and re-searching”, which is updated continuously and iterated repeatedly. The review covers papers, patents, datasets, algorithm competitions and evaluation campaigns, patent information service platforms, and code-hosting and model-hosting websites related to patent mining in China and abroad. Relevant information from expert interviews, competition participant seminars, and reviewer comments on the authors' own academic work is also woven into the narrative. [Result/Conclusion] The types and quantity of basic patent resources have grown rapidly, so that the training and performance evaluation of patent mining methods increasingly rest on benchmark data and uniform metrics. Frontier patent mining methods closely follow advances in intelligent technology to achieve technical upgrades and performance gains, while traditional approaches such as statistical learning, manual rules, and software tools have been optimized and developed by balancing learning cost, practical cost, and effectiveness. The research scope of patent mining now spans data processing and normalization, basic patent services, and technical intelligence analysis, and exploration of intelligent patent law, such as legal judgment prediction for patent lawsuits, has begun.
