[Purpose/significance] This paper uses a quantitative method that fuses multiple attributes to mine multiple technological innovation themes in a field quickly and effectively, providing a reference for determining the direction of technological innovation. [Method/process] The LDA (Latent Dirichlet Allocation) topic model is combined with patent value evaluation indicators to propose a quantitative method for mining technological innovation themes. First, TF-IDF, perplexity, and the quartile method are used together to construct an LDA topic model of the domain patents. Next, the probability distribution matrix output by LDA is combined with patent value evaluation indicators (claims and IPC) to construct a quantitative indicator system. Then, chip patents are selected for a verification experiment: the quantitative indicators are calculated and visualized as a heat map to identify the technological innovation themes. Finally, based on the mapping among patents, the LDA output matrix, innovation themes, and quantitative indicators, patents are screened and the technological innovation themes are labeled appropriately. [Result/conclusion] The experimental results were evaluated in two ways: by inviting experts in microelectronics and by reference to the latest chip technologies in China and abroad. The evaluation shows that the multi-attribute method can mine multiple technological innovation themes in a field quickly and effectively, and that in practice it can give enterprises and researchers in related fields better ideas for discovering technological innovation themes.
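The abstract does not state the formulas of the quantitative indicator system, so the following is only an illustrative sketch of how an LDA document-topic matrix might be combined with claim counts and IPC counts. The function names `topic_scores` and `flag_upper_quartile`, the min-max normalisation, the equal weighting of the two indicators, and the use of the upper quartile as a cut-off are all assumptions made for illustration, not the authors' method. The toy data are invented.

```python
from statistics import quantiles


def minmax(xs):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def topic_scores(doc_topic, claims, ipc_counts):
    """Weight each patent's topic probabilities by a simple value proxy.

    doc_topic[d][k] is the LDA probability of topic k in patent d.
    The value proxy (mean of normalised claim and IPC counts) is a
    hypothetical choice, not taken from the paper.
    """
    value = [(c + i) / 2 for c, i in zip(minmax(claims), minmax(ipc_counts))]
    n_topics = len(doc_topic[0])
    return [
        sum(row[k] * v for row, v in zip(doc_topic, value))
        for k in range(n_topics)
    ]


def flag_upper_quartile(scores):
    """Return indices of topics whose score exceeds the third quartile."""
    _, _, q3 = quantiles(scores, n=4)
    return [k for k, s in enumerate(scores) if s > q3]


# Toy input: 4 patents x 4 topics (each row sums to 1).
doc_topic = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.6, 0.2, 0.1, 0.1],
]
claims = [20, 5, 10, 20]      # number of claims per patent
ipc_counts = [4, 1, 2, 4]     # number of IPC codes per patent

scores = topic_scores(doc_topic, claims, ipc_counts)
candidates = flag_upper_quartile(scores)
print(scores, candidates)
```

In this toy example the topic that dominates the two patents with the most claims and IPC codes receives the highest score. In a real pipeline the per-topic scores could be drawn as a heat map before deciding which topics to label as innovation themes.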