Library and Information Service, 2021, Vol. 65, Issue (22): 114-125. DOI: 10.13266/j.issn.0252-3116.2021.22.012

• Information Research •

The Key Technology Identification Method Based on BERT-LDA and Its Empirical Research: A Case Study of Agricultural Robots

Wang Xiuhong1,2, Gao Min1

  1. Institute of Science and Technology Information, Jiangsu University, Zhenjiang 212013;
  2. Jiangsu University Library, Zhenjiang 212013
  • Received: 2021-05-18 Revised: 2021-08-19 Online: 2021-11-20 Published: 2021-12-01
  • About the authors: Wang Xiuhong, research librarian, PhD, E-mail: xiuhongwang@ujs.edu.cn; Gao Min, master's student.
  • Funding:
    This work is one of the research outputs of the National Key R&D Program of China project "Research and Application Demonstration of Network Collaborative Manufacturing Integration Technology for Agricultural Equipment Manufacturing Industry Cluster Regions" (Project No. SQ2020YFB170242).

Abstract: [Purpose/significance] A sound key technology identification method can better support the identification, forecasting, and research and development of key technologies at every level. [Method/process] This paper proposes a key technology identification method based on the BERT-LDA model, which combines BERT with LDA to compensate for the lack of contextual semantic information when an LDA topic model is used alone, and carries out an empirical study with agricultural robots as an example. The method comprises the following steps: ① constructing BERT semantic feature vectors and LDA topic feature vectors in Python, concatenating them in a high-dimensional space, and using an autoencoder to learn a low-dimensional latent-space representation of the concatenated vectors; ② applying the K-means algorithm to the latent-space representation to perform semantic association clustering, producing a two-dimensional cluster plot and word clouds of key technology topics; ③ determining the key technologies; ④ in the field of agricultural robots, validating the method by comparing its results with patent analysis results from the Derwent TI patent software and with the list of key generic technologies for agricultural equipment in the "Made in China 2025" technology roadmap for key areas. [Result/conclusion] The results show that the BERT-LDA model improves the coherence of topic clustering and the accuracy of fine-grained classification; achieves good precision and recall in key technology identification; accommodates literature data sets from different databases and publication types, showing strong adaptability; and can be widely applied to identifying key technologies of all kinds.

Key words: key technology identification, agricultural robots, BERT-LDA model, Derwent patents

CLC number:
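
The concatenation and clustering steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors stand in for real LDA topic distributions and BERT sentence embeddings, the `gamma` weight and all function names are hypothetical, the autoencoder step is omitted for brevity, and the K-means routine is a plain standard-library version with a simple farthest-point initialisation.

```python
def concat_features(topic_vec, semantic_vec, gamma=1.0):
    """Join a topic vector and a semantic vector into one feature vector.

    gamma is a hypothetical weight on the topic part (an assumption,
    not a value taken from the paper).
    """
    return [gamma * t for t in topic_vec] + list(semantic_vec)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from every centroid chosen so far.
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for each point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


# Toy stand-ins (hypothetical data): 2-topic distributions and
# 3-dimensional "semantic" vectors for six documents in two themes.
topic_vectors = [
    [0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
]
semantic_vectors = [
    [1.00, 0.10, 0.00], [0.90, 0.20, 0.10], [0.95, 0.00, 0.05],
    [0.00, 0.10, 1.00], [0.10, 0.20, 0.90], [0.05, 0.00, 0.95],
]

points = [
    concat_features(t, s, gamma=1.0)
    for t, s in zip(topic_vectors, semantic_vectors)
]
labels, centroids = kmeans(points, k=2)
print(labels)
```

On this toy data the first three documents fall into one cluster and the last three into the other. In a fuller pipeline the concatenated vectors would first be compressed by an autoencoder, and the cluster labels would then drive the two-dimensional plot and the per-cluster word clouds.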