Library and Information Service ›› 2018, Vol. 62 ›› Issue (17): 68-74. DOI: 10.13266/j.issn.0252-3116.2018.17.009

• Information Research •

Research on the WI-LDA Model for Technical Topic Analysis of Patents

吴红, 伊惠芳, 马永新, 李昌   

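  1. Science and Technology Information Research Institute, Shandong University of Technology, Zibo 255049
  • Received: 2018-02-08 Revised: 2018-05-27 Online: 2018-09-05 Published: 2018-09-05
  • About the authors: Wu Hong (ORCID: 0000-0002-1708-7638), research librarian, master's degree, E-mail: wuhong0256@163.com; Yi Huifang (ORCID: 0000-0003-0094-7993), master's student; Ma Yongxin (ORCID: 0000-0002-5243-4164), master's student; Li Chang (ORCID: 0000-0002-2454-792X), master's student.
  • Funding:
    This paper is one of the research outputs of the National Social Science Fund of China project "Research on the Deep Embedding of University Libraries in Patent Operation" (Project No. 16BTQ029).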

WI-LDA: Technical Topic Analysis in Patents

Wu Hong, Yi Huifang, Ma Yongxin, Li Chang   

  1. Science and Technology Information Research Institute, Shandong University of Technology, Zibo 255049
  • Received:2018-02-08 Revised:2018-05-27 Online:2018-09-05 Published:2018-09-05

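Abstract (translated from the Chinese): [Purpose/Significance] Existing LDA-based analysis of patent technical topics yields topics that are hard to distinguish, weakly interpretable, and vaguely delimited. Remedying these problems is important for grasping technology hot spots and tracking the technological frontier. [Method/Process] The International Patent Classification (IPC) code is introduced into LDA-based patent topic analysis as the context of technical terms. Training uses the WI (Word IPC) structure, a pair of the form <word/phrase, classification code>, to build the WI-LDA model, which identifies and analyzes the topics of patent documents. [Result/Conclusion] An empirical study of the graphene field in China and a comparison with the traditional LDA model show that WI-LDA generalizes well and, in patent technical topic analysis, effectively lowers the difficulty of identifying topics, increases their interpretability, and makes the division of text topics clearer.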

Keywords (translated from the Chinese): WI-LDA, topic model, patent technical topic, graphene

Abstract: [Purpose/significance] LDA-based analysis of technical topics in patents suffers from topics that are hard to distinguish, weakly interpretable, and fuzzily delimited. Addressing these problems matters for identifying technology hot spots and tracking the technological frontier. [Method/process] The International Patent Classification (IPC) code is introduced into LDA-based patent topic analysis and used as the context of each technical term. The model is trained on the WI (Word IPC) structure, which pairs each word or phrase with a classification code, yielding the WI-LDA model for identifying and analyzing the topics of patent documents. [Result/conclusion] A case study of Chinese graphene patents and a comparison with the traditional LDA model show that WI-LDA generalizes well, makes technical topics in patents easier to identify, increases their interpretability, and makes the division of topics clearer.
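
The WI construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how a word is matched to a code when a patent carries several IPC codes, so this sketch assumes one IPC code per patent. The function names `to_wi_tokens` and `gibbs_lda`, the hyperparameters, and the four toy patents are all hypothetical. The sampler is a generic collapsed Gibbs sampler for plain LDA that simply treats each <word, IPC> pair as one token.

```python
import random
from collections import Counter


def to_wi_tokens(words, ipc_code):
    """Pair every word/phrase with the patent's IPC code: <word, IPC>.

    Assumption: one IPC code per patent (not specified in the abstract).
    """
    return [(w, ipc_code) for w in words]


def gibbs_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over arbitrary hashable tokens."""
    rng = random.Random(seed)
    vocab_size = len({t for doc in docs for t in doc})
    doc_topic = [Counter() for _ in docs]                  # n(d, k)
    topic_token = [Counter() for _ in range(num_topics)]   # n(k, t)
    topic_total = [0] * num_topics                         # n(k)
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for t in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_token[k][t] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, t in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_token[k][t] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_token[j][t] + beta)
                    / (topic_total[j] + beta * vocab_size)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_token[k][t] += 1
                topic_total[k] += 1
    return topic_token


# Hypothetical toy corpus: (segmented words, IPC subclass) per patent.
patents = [
    (["graphene", "electrode", "battery"], "H01M"),
    (["graphene", "oxide", "preparation"], "C01B"),
    (["electrode", "lithium", "battery"], "H01M"),
    (["graphite", "oxide", "reduction"], "C01B"),
]
docs = [to_wi_tokens(words, ipc) for words, ipc in patents]
topics = gibbs_lda(docs, num_topics=2)
for k, counts in enumerate(topics):
    print(k, counts.most_common(3))
```

In this toy corpus "graphene" under H01M and "graphene" under C01B are separate tokens, which is how the IPC code acts as context for a term: the same word can contribute to different topics depending on the classification it appears under.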

Key words: WI-LDA, topic model, technical topic in patents, graphene

CLC number: