Library and Information Service ›› 2017, Vol. 61 ›› Issue (3): 86-96. DOI: 10.13266/j.issn.0252-3116.2017.03.012

• Information Research •

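  • About the authors: Yang Chao (ORCID: 0000-0002-0607-9552), PhD candidate, E-mail: yc_2009@hotmail.com; Zhu Donghua (ORCID: 0000-0001-8514-9733), research professor and doctoral supervisor; Wang Xuefeng (ORCID: 0000-0002-4857-6944), professor; Zhu Fujin (ORCID: 0000-0001-8089-4769), PhD candidate; Heng Xiaofan (ORCID: 0000-0001-5695-5234), PhD candidate
  • Funding:

    This work is one of the research outcomes of the National Natural Science Foundation of China General Program project "Research on Forecasting Innovation Pathways of Emerging Technologies Based on Semantic TRIZ" (Grant No. 71373019) and the National High Technology Research and Development Program of China project "Big Data Intelligent Service System for Government Management and Its Application Demonstration" (Grant No. 2014AA015105).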

Technical Topic Analysis in Patents: SAO-based LDA Modeling

Yang Chao1, Zhu Donghua1, Wang Xuefeng1, Zhu Fujin1,2, Heng Xiaofan1   

  1. School of Management and Economics, Beijing Institute of Technology, Beijing 100081;
  2. Centre for Quantum Computation and Intelligent Systems, University of Technology Sydney, NSW 2007
  • Received: 2016-10-07 Revised: 2016-12-12 Online: 2017-02-05 Published: 2017-02-05

Abstract:

[Purpose/significance] Existing methods for technical topic analysis of patents suffer from three problems: topics are hard to tell apart, topic words are ambiguous, and the technical "problems" and their corresponding "solutions" cannot be identified. [Method/process] We first extract subject-action-object (SAO) structures from patent text and then identify the problem and solution (P&S) patterns embodied in those structures. Based on the "bag of P&S" assumption, we build an SAO-based LDA topic model that performs technical topic analysis at the concept level. [Result/conclusion] A case study shows that the proposed method effectively identifies topic distributions and has substantial advantages over the traditional LDA model in topic distinguishability and semantic disambiguation.

Key words: SAO structures, technical topic analysis, LDA model, P&S pattern, graphene
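
The idea of modeling topics over SAO structures rather than single words can be illustrated with a minimal sketch. It is not the authors' model: it runs ordinary collapsed Gibbs sampling for LDA over tokens formed by joining each SAO triple, and it omits the P&S pattern identification step entirely. The triples, the helper names (`sao_token`, `bag_of_sao`, `gibbs_lda`), and all hyperparameters are assumptions chosen for illustration.

```python
import random
from collections import Counter

# Hypothetical, hand-written SAO triples (subject, action, object) per patent.
# A real pipeline would extract them with an NLP parser; that step is assumed.
patents = [
    [("graphene oxide", "improve", "conductivity"),
     ("graphene film", "reduce", "resistance")],
    [("graphene oxide", "improve", "conductivity"),
     ("catalyst", "accelerate", "growth")],
    [("catalyst", "accelerate", "growth"),
     ("copper substrate", "support", "deposition")],
]

def sao_token(triple):
    """Join one SAO triple into a single token so it is never split into words."""
    return "|".join(part.replace(" ", "_") for part in triple)

def bag_of_sao(doc):
    """Count SAO tokens in one document; the 'bag' ignores their order."""
    return Counter(sao_token(t) for t in doc)

def gibbs_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Standard collapsed Gibbs sampling for LDA, applied to SAO tokens.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic token counts after the final sweep.
    """
    rng = random.Random(seed)
    tokens = [[sao_token(t) for t in doc] for doc in docs]
    vocab_size = len({tok for doc in tokens for tok in doc})
    doc_topic = [[0] * n_topics for _ in tokens]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(tokens):
        zs = []
        for tok in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][tok] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(tokens):
            for i, tok in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][tok] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][tok] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][tok] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

doc_topic, topic_word = gibbs_lda(patents)
```

Because each triple is kept as one token, a word such as "growth" is only ever counted together with its subject and action, which is the intuition behind the disambiguation claim in the abstract; on a toy corpus this small the resulting topics are not meaningful.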

CLC Number: