Library and Information Service, 2021, Vol. 65, Issue (5): 110-117. DOI: 10.13266/j.issn.0252-3116.2021.05.011

• Information Research •

Discussion on Using Transfer Learning to Accurately Identify Domain Information

Lu Quan1,2, Hao Zhitong1, Chen Jing3, Chen Shi1, Zhu Anqi1

  1. Center for Studies of Information Resources, Wuhan University, Wuhan 430072;
  2. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources, Shenzhen 518034;
  3. School of Information Management, Central China Normal University, Wuhan 430079
  • Received: 2020-07-03 Revised: 2020-09-19 Online: 2021-03-05 Published: 2021-04-14
  • Corresponding author: Chen Jing (ORCID: 0000-0002-6444-2962), professor, doctoral supervisor, E-mail: dancinglulu@sina.com
  • About the authors: Lu Quan (ORCID: 0000-0002-8679-9866), professor, doctoral supervisor; Hao Zhitong (ORCID: 0000-0003-1803-2441), master's student; Chen Shi (ORCID: 0000-0003-4664-7208), master's student; Zhu Anqi (ORCID: 0000-0002-7526-1761), master's student.
  • Funding:
    This paper is one of the research outcomes of the National Social Science Fund of China key project "Research on Precise Information Services in Online Health Communities from the Perspective of Mental Accounting Theory" (Grant No. 20ATQ008).

Abstract: [Purpose/significance] This study transfers what is learned without supervision from large-scale Internet data to a target domain, addressing the problem that information identification in the target domain is hard to improve when training samples are limited. [Method/process] A RoBERTa model pre-trained on Chinese Wikipedia and other data is used for transfer learning. The learned representations are mapped to the target domain and then aggregated and condensed by DPCNN, after which the model is fine-tuned on a portion of labeled data to identify domain information accurately. [Result/conclusion] The proposed model was compared in 10 domains with a model without transfer learning and with the classic TextCNN model, and it outperformed both by a clear margin in every domain. Averaged over the domains, precision rose by 4.15 and 3.43 percentage points, recall by 4.55 and 3.44 percentage points, and F1 score by 4.52 and 3.44 percentage points, respectively. This indicates that transfer learning from large-scale web data can markedly improve information identification in the target domain.

Key words: transfer learning, information recognition, RoBERTa
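
The averaged precision, recall and F1 gains quoted in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = domain information, 0 = other), equal-weight averaging of per-domain scores, and gains expressed in percentage points; the abstract does not specify these details. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Scores:
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_over_domains(scores: Sequence[Scores]) -> Scores:
    """Equal-weight mean of per-domain scores (an assumed averaging scheme)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


def absolute_gain_points(proposed: Scores, baseline: Scores) -> Scores:
    """Absolute difference between two score triples, in percentage points."""
    return tuple(100.0 * (p - b) for p, b in zip(proposed, baseline))


# Hypothetical toy data for two domains; not results from the paper.
gold: Dict[str, List[int]] = {"domain_a": [1, 1, 0, 0], "domain_b": [1, 0, 1, 0]}
proposed_pred = {"domain_a": [1, 1, 0, 0], "domain_b": [1, 0, 0, 0]}
baseline_pred = {"domain_a": [1, 0, 1, 0], "domain_b": [1, 1, 0, 0]}

proposed_avg = average_over_domains(
    [precision_recall_f1(gold[d], proposed_pred[d]) for d in gold]
)
baseline_avg = average_over_domains(
    [precision_recall_f1(gold[d], baseline_pred[d]) for d in gold]
)
print(absolute_gain_points(proposed_avg, baseline_avg))
```

Reporting gains in percentage points rather than as relative percentages matches the abstract's wording of an absolute improvement: a rise from 0.50 to 0.75 counts as 25 points, not 50%.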

CLC number: