[Purpose/significance] At the macro level, patent similarity detection supports national innovation strategy planning, helps identify technology hotspots in China and worldwide, and helps counter patent trolls in other countries. At the micro level, it supports patent inventors, patent examiners and patentees. [Method/process] This paper proposes a patent similarity detection method based on the Doc2Vec deep learning model, using a patent corpus that has not undergone domain-knowledge-based data cleaning. Typical patents were randomly selected for similarity detection with the new method, and the results were compared with those of traditional similarity detection models. [Result/conclusion] The experimental results show that the Doc2Vec method and the TF-IDF model produce similar results when both use a patent corpus without domain-knowledge-based data cleaning. The new method requires less domain expertise and no data cleaning step. It can serve as a new intelligent support tool for patent infringement detection and patent investigation, lowering the research threshold, reducing workload and improving service efficiency.
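As an illustration of the traditional baseline mentioned in the abstract, the following is a minimal sketch of TF-IDF cosine similarity over raw text with no domain-specific cleaning. It is not the authors' implementation: the tokenizer, the smoothed IDF formula, the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_similar`) and the three sample texts are all assumptions made for the example. The Doc2Vec side is not shown, since it depends on a trained model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumerics only; deliberately no
    # stop-word list or domain dictionary (assumed stand-in for "no cleaning").
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant) so every weight stays positive.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(query_index, docs):
    # Rank every other document by similarity to docs[query_index].
    vectors = tfidf_vectors(docs)
    scores = [
        (i, cosine(vectors[query_index], v))
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical patent-like texts, invented for this example.
patents = [
    "A battery pack with a liquid cooling plate for electric vehicles",
    "Liquid cooling plate assembly for a vehicle battery pack",
    "Method for encoding video frames using motion vectors",
]
ranking = rank_similar(0, patents)
```

Here `ranking` lists the other documents in descending order of similarity to the first text. A Doc2Vec comparison would replace `tfidf_vectors` with inferred document embeddings and keep the same cosine ranking step.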
Cao Qi, Zhao Wei, Zhang Yingjie, Zhao Shujun, Chen Liang. Comparative Study of Patent Documents Similarity Detection on Deep Learning of Doc2Vec Based Methods[J]. Library and Information Service, 2018, 62(13): 74-81.
DOI: 10.13266/j.issn.0252-3116.2018.13.010