[Purpose/Significance] Identifying technical entities in patent texts and predicting technology at a finer granularity helps to grasp the layout and development trends of patent technology. [Method/Process] Deep learning methods were used to automatically recognize patent technical term entities, and the strengths and weaknesses of several groups of deep learning algorithms were compared through empirical analysis. New semi-supervised labeling and custom labeling schemes were also proposed to improve the efficiency of manual annotation. Finally, the best-performing trained model was applied and combined with link prediction to make fine-grained technology predictions for synthetic biology. [Result/Conclusion] The empirical results show that the RoBERTa-BiLSTM-CRF model is better suited to recognizing patent technical terms with complex semantics, reaching an F1 score of 86.8%. The recognized technologies are more fine-grained than those produced by traditional IPC-based analysis. The fine-grained technology prediction shows that synthesis methods in synthetic biology are continually being improved and innovated, and that synthesis research is moving toward synthetic fuels.
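The abstract does not state which link prediction index was used, so the following is only a minimal sketch under stated assumptions. It assumes that the technical terms recognized in each patent are linked into an undirected co-occurrence network, and it swaps in the Resource Allocation (RA) index, a standard local similarity measure, to rank term pairs that are not yet linked as candidate future technology combinations. The function names and sample terms are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(patents):
    """Hypothetical helper: terms are nodes; an edge joins two terms
    that co-occur in at least one patent (assumed network construction)."""
    graph = defaultdict(set)
    for terms in patents:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def resource_allocation(graph, u, v):
    """RA index: sum of 1 / degree(z) over common neighbours z of u and v."""
    return sum(1.0 / len(graph[z]) for z in graph[u] & graph[v])


def predict_links(graph, top_k=5):
    """Score every non-adjacent term pair; return the top_k candidates
    as (term_a, term_b, score), ties broken alphabetically."""
    scored = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already linked, so not a prediction
        score = resource_allocation(graph, u, v)
        if score > 0:
            scored.append((score, u, v))
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(u, v, round(s, 3)) for s, u, v in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical term lists, as if output by the NER model for four patents.
    patents = [
        ["metabolic engineering", "e. coli", "isoprenoid"],
        ["metabolic engineering", "yeast", "biofuel"],
        ["e. coli", "biofuel"],
        ["crispr", "yeast"],
    ]
    g = build_cooccurrence_graph(patents)
    print(predict_links(g, top_k=3))
    # → [('biofuel', 'isoprenoid', 0.583), ('e. coli', 'yeast', 0.583), ('biofuel', 'crispr', 0.333)]
```

RA weights each shared neighbour by the inverse of its degree, so a link through a niche term counts for more than one through a hub term that co-occurs with everything. Other local indices, such as common neighbours or Adamic-Adar, can be substituted in `resource_allocation` without changing the rest of the sketch.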
Hu Yamin, Wu Xiaoyan, Liao Xingbin, Qian Yangge, Chen Fang. Research on Fine-Grained Technology Prediction Based on Deep Learning and Link Prediction: Take Synthetic Biology as an Example[J]. Library and Information Service, 2022, 66(24): 92-103.
DOI: 10.13266/j.issn.0252-3116.2022.24.009