Intelligence Research

Research on the WI-LDA Model for Technical Topic Analysis of Patents

  • Wu Hong,
  • Yi Huifang,
  • Ma Yongxin,
  • Li Chang
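  • Science and Technology Information Research Institute, Shandong University of Technology, Zibo 255049
Wu Hong (ORCID: 0000-0002-1708-7638), research librarian, master's degree, E-mail: wuhong0256@163.com; Yi Huifang (ORCID: 0000-0003-0094-7993), master's student; Ma Yongxin (ORCID: 0000-0002-5243-4164), master's student; Li Chang (ORCID: 0000-0002-2454-792X), master's student.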

Received date: 2018-02-08

Revised date: 2018-05-27

Online published: 2018-09-05

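Funding

This work is one of the research outcomes of the National Social Science Fund of China project "Research on the Deep Embedding of University Libraries in Patent Operations" (Grant No. 16BTQ029).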

WI-LDA: Technical Topic Analysis in Patents

  • Wu Hong,
  • Yi Huifang,
  • Ma Yongxin,
  • Li Chang
  • Science and Technology Information Research Institute, Shandong University of Technology, Zibo 255049

Received date: 2018-02-08

  Revised date: 2018-05-27

  Online published: 2018-09-05

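Abstract

[Purpose/significance] Existing LDA-based analyses of patent technology topics suffer from topics that are hard to distinguish, weak interpretability, and blurred topic boundaries. Addressing these problems is important for grasping technology hotspots and tracking the technological frontier. [Method/process] The International Patent Classification (IPC) code is introduced into LDA-based patent topic analysis as the context of technical terms. Training on a WI (Word IPC) structure of <word/phrase, classification code> pairs yields the WI-LDA model, which identifies and analyzes the topics of patent documents. [Result/conclusion] An empirical study of the graphene field in China and a comparison with the traditional LDA model show that the WI-LDA model generalizes well. In patent technology topic analysis it makes topics easier to identify, improves their interpretability, and draws clearer boundaries between text topics.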

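Cite this article

Wu Hong, Yi Huifang, Ma Yongxin, Li Chang. Research on the WI-LDA model for technical topic analysis of patents[J]. Library and Information Service, 2018, 62(17): 68-74. DOI: 10.13266/j.issn.0252-3116.2018.17.009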

Abstract

[Purpose/significance] LDA-based technical topic analysis of patents suffers from low topic recognizability, weak interpretability, and fuzzy topic boundaries. Mitigating these problems matters for grasping technical hot spots and tracking the technological frontier. [Method/process] The International Patent Classification (IPC) is introduced into LDA-based patent topic analysis and used as the context of technical terms. A WI (Word IPC) structure, consisting of <word/phrase, classification code> pairs, is used for training to construct the WI-LDA model, which identifies and analyzes the topics of patent documents. [Result/conclusion] A case study of Chinese graphene patents and a comparison with the traditional LDA model show that the WI-LDA model generalizes well. In technical topic analysis of patents it makes topics easier to identify, increases their interpretability, and makes the division between topics clearer.
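
The WI structure described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the pairs are built, so everything specific here is an assumption: the helper name `to_wi_tokens`, the toy patents, the use of a single main IPC code per patent, and the truncation of that code to its four-character subclass. The sketch only shows how pairing a term with its IPC context splits one surface word into several context-specific vocabulary items; it does not reproduce the authors' model or training procedure.

```python
from collections import Counter


def to_wi_tokens(terms, ipc, ipc_level=4):
    """Pair each term with the patent's IPC code, truncated to ipc_level characters.

    Assumption: one main IPC code per patent, cut to the subclass level (e.g. "C01B").
    """
    context = ipc.replace(" ", "")[:ipc_level]
    return [(term, context) for term in terms]


# Hypothetical toy patents: (already-segmented terms, main IPC code).
patents = [
    (["graphene", "oxide", "reduction"], "C01B 32/184"),
    (["graphene", "electrode", "lithium"], "H01M 4/62"),
    (["graphene", "oxide", "coating"], "C09D 5/08"),
]

# Each document becomes a list of <word, IPC> pairs instead of bare words.
wi_docs = [to_wi_tokens(terms, ipc) for terms, ipc in patents]

# Vocabulary statistics with and without the IPC context.
word_counts = Counter(term for terms, _ in patents for term in terms)
wi_counts = Counter(token for doc in wi_docs for token in doc)

print(len(word_counts), "distinct words;", len(wi_counts), "distinct WI pairs")
```

In this toy corpus the bare word "graphene" is a single vocabulary item, while the WI view keeps it apart as ("graphene", "C01B"), ("graphene", "H01M"), and ("graphene", "C09D"). Documents in this form could then be passed to any standard LDA implementation by treating each pair as one vocabulary entry, which is one plausible reading of the training setup rather than a confirmed detail of the paper.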
