A Similarity-based Model for Mapping Between Patent and Industrial Classifications: Mapping Between the International Patent Classification and the Industrial Classification for National Economic Activities

  • Tian Chuang,
  • Zhao Yajuan
  • National Science Library, Chinese Academy of Sciences, Beijing 100190

Received date: 2016-08-10

Revised date: 2016-10-05

Online published: 2016-10-20

Abstract

[Purpose/significance] This paper proposes a similarity-based model for mapping between patents and industries and offers a reference for further research. The model is accurate, scalable, and efficient. [Method/process] After reviewing existing methods for mapping between patent and industrial classifications, the authors describe and explore the model. They complete a partial mapping between the International Patent Classification and the Industrial Classification for National Economic Activities, and process the cosine similarity results with Z-score normalization. They then evaluate the model against the trial-version results published by SIPO. [Result/conclusion] The model takes advantage of the official annotations of the patent classification and the descriptive content of patents, and obtains the mapping between patent and industrial classifications automatically using natural language processing. Compared with existing methods, it saves considerable labor cost while maintaining accuracy. The model can readily adjust the granularity of classification and can be applied to most mappings between patent and industrial classifications. Finally, improvements to the model are described, and some future application areas are briefly discussed.
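
The scoring step named in the abstract (cosine similarity followed by Z-score normalization) can be illustrated with a minimal sketch. It is not the authors' implementation. The TF-IDF weighting, the pre-tokenized English texts, the industry names, and the cutoff of 1.0 are all assumptions chosen for illustration; the abstract does not state how texts are weighted or which threshold is used.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF dicts (term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumed weighting) keeps every weight positive.
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def z_scores(values):
    """Standardize values to zero mean and unit (population) deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    return [(x - mean) / std if std else 0.0 for x in values]

# Hypothetical, already tokenized descriptions (not data from the paper).
patent_class = ["battery", "electrode", "lithium", "cell"]
industries = {
    "battery manufacturing": ["battery", "cell", "manufacturing", "lithium"],
    "textile weaving": ["cotton", "weaving", "fabric", "yarn"],
    "dairy products": ["milk", "dairy", "processing", "cheese"],
}

names = list(industries)
vecs = tfidf_vectors([patent_class] + [industries[n] for n in names])
sims = [cosine(vecs[0], v) for v in vecs[1:]]

# Normalize the similarities across candidate industries, then keep the
# industries whose score exceeds an assumed cutoff of 1.0.
scores = dict(zip(names, z_scores(sims)))
mapped = [name for name, z in scores.items() if z > 1.0]
print(mapped)
```

Standardizing the similarities puts the scores for different patent classes on a common scale, so a single cutoff can be applied even when raw cosine values vary widely from one class to another.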

Cite this article

Tian Chuang, Zhao Yajuan. A Similarity-based Model for Mapping Between Patent and Industrial Classifications: Mapping Between the International Patent Classification and the Industrial Classification for National Economic Activities[J]. Library and Information Service, 2016, 60(20): 123-131. DOI: 10.13266/j.issn.0252-3116.2016.20.015

