Library and Information Service ›› 2019, Vol. 63 ›› Issue (9): 73-84. DOI: 10.13266/j.issn.0252-3116.2019.09.008

• Information Research •

Research on Core Technology Topic Identification Based on Chunk-LDAvis

Liu Ziqiang1,2, Xu Haiyun1,3, Yue Lixin4, Fang Shu1

  1. Chengdu Library of Chinese Academy of Sciences, Chengdu 610041;
  2. Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190;
  3. Institute of Scientific and Technical Information of China (ISTIC), Beijing 100038;
  4. School of Information Resource Management, Renmin University of China, Beijing 100872
  • Received: 2018-07-10 Revised: 2018-11-04 Online: 2019-05-05 Published: 2019-05-05
  • About the authors: Liu Ziqiang (ORCID: 0000-0003-1814-8655), PhD candidate, E-mail: liuziqiang@mail.las.ac.cn; Xu Haiyun (ORCID: 0000-0002-7453-3331), associate research fellow, PhD, master's supervisor; Yue Lixin (ORCID: 0000-0002-7268-7871), PhD candidate; Fang Shu (ORCID: 0000-0002-4584-7574), research fellow, doctoral supervisor.
  • Funding:
    This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Methods for Identifying Innovation Evolution Paths Based on Science-Technology Topic Association Analysis" (Grant No. 71704170) and the Youth Talent Innovation Project of the Chengdu Library of Chinese Academy of Sciences (Grant No. Y7Z0581002).

Abstract: [Purpose/significance] Identifying core technology topics from large volumes of patent documents helps to pinpoint the key technologies in a technical field and to analyze the direction of their development. It is basic intelligence work that underpins technological innovation, and it is of value to researchers, to enterprises, and at the national level. [Method/process] This paper proposes a core technology topic identification method based on Chunk-LDAvis. First, topics are identified with the classic LDA model. Next, noun chunks are used to label the initial LDA topics, yielding Chunk-LDA topic results that are easier to interpret. A topic network is then constructed using social network analysis to identify the core technology topics. Finally, the R-based LDAvis toolkit is used to draw an interactive Chunk-LDAvis association map of the core technology topics, which reveals hidden links among them and supports their identification. [Result/conclusion] An empirical study of the nano-agriculture field verifies the accuracy and feasibility of the proposed method.

Key words: Chunk-LDAvis, patent analysis, topic recognition, core technology topics, interactive visualization
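
The abstract describes building a topic network and using social network analysis to pick out core technology topics, but it does not say how the network is constructed or which centrality measure is used. The sketch below is therefore only one possible reading, not the authors' implementation: it assumes that two topics are linked when both exceed a weight threshold in the same patent document, and that core topics are those with the highest degree centrality. The function names, the threshold, and the toy document-topic matrix and chunk labels are all hypothetical and invented purely for illustration.

```python
from itertools import combinations


def build_topic_network(doc_topic, threshold=0.2):
    """Link two topics whenever both reach `threshold` in the same document.

    Assumption (not from the paper): edge weight is the number of
    documents in which the topic pair co-occurs.
    """
    edges = {}
    for weights in doc_topic:
        active = [t for t, w in enumerate(weights) if w >= threshold]
        for a, b in combinations(active, 2):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    return edges


def degree_centrality(edges, n_topics):
    """Normalized degree centrality: distinct neighbours / (n_topics - 1)."""
    neighbours = {t: set() for t in range(n_topics)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    denom = max(n_topics - 1, 1)
    return {t: len(ns) / denom for t, ns in neighbours.items()}


def core_topics(doc_topic, labels, threshold=0.2, top_k=2):
    """Rank topics by degree centrality and return the top_k (label, score) pairs."""
    n_topics = len(labels)
    edges = build_topic_network(doc_topic, threshold)
    centrality = degree_centrality(edges, n_topics)
    # Ties are broken by topic index so the ranking is deterministic.
    ranked = sorted(range(n_topics), key=lambda t: (-centrality[t], t))
    return [(labels[t], centrality[t]) for t in ranked[:top_k]]


# Hypothetical document-topic weights (rows: patents, columns: topics),
# standing in for the output of an LDA model.
doc_topic = [
    [0.5, 0.3, 0.2, 0.0],
    [0.6, 0.0, 0.0, 0.4],
    [0.1, 0.45, 0.45, 0.0],
    [0.7, 0.0, 0.3, 0.0],
]
# Hypothetical noun-chunk labels, standing in for Chunk-LDA topic labels.
labels = [
    "nano pesticide carrier",
    "controlled release",
    "soil remediation",
    "seed coating",
]

print(core_topics(doc_topic, labels))
```

In this toy network the first topic co-occurs with every other topic, so it ranks highest. Other centrality measures (betweenness, eigenvector) or a similarity-based network would slot into the same structure.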
