图书情报工作 (Library and Information Service) ›› 2021, Vol. 65 ›› Issue (16): 118-129. DOI: 10.13266/j.issn.0252-3116.2021.16.013

• Knowledge Organization •



  • Corresponding author: Bai Rujiang (ORCID: 0000-0003-3822-8484), research librarian and master's supervisor, E-mail: brj@sdut.edu.cn.
  • About the authors: Zhang Yujie (ORCID: 0000-0002-6819-031X), master's student; Liu Mingyue (ORCID: 0000-0002-4335-9369), master's student; Yu Chunliang (ORCID: 0000-0002-3013-8022), associate research librarian.
  • Funding: This work is one of the outcomes of the Shandong Provincial Higher Education Youth Innovation Science and Technology Support Program project "Smart Decision Support Innovation Team Driven by Science and Technology Big Data: Identifying Emerging Research Fronts for the Conversion of Old and New Growth Drivers" (No. 2019RWG033) and the Shandong Provincial Social Science Planning project "Research on Content Annotation Models for Scientific Papers in the Digital Environment" (No. 20CSDJ65).

Research on SAO Short Text Classification in LIS Based on Semantic Association and BERT

Zhang Yujie1, Bai Rujiang1, Liu Mingyue1, Yu Chunliang2   

  1. Institute of Information Management, Shandong University of Technology, Zibo 255049;
  2. Yantai University Library, Yantai 264005
  • Received:2021-01-27 Revised:2021-05-12 Online:2021-08-20 Published:2021-08-20



Abstract: [Purpose/significance] To address the shortage of semantic features and the lack of domain knowledge in classifying short texts with SAO (Subject-Action-Object) structure, this paper proposes an SAO classification method that combines semantic association with BERT in order to improve classification performance. [Method/process] Taking SAO short texts in the library and information science field as the data source, a semantic association scheme consisting of the three steps "Expansion-Reconstruction-Noise Reduction" was first designed: semantic expansion and SAO reconstruction extend the semantic information of the SAO, while semantic noise reduction removes the noise introduced by expansion. The BERT model was then trained on the SAO short texts after semantic association, and automatic classification was finally performed in the classification component. [Result/conclusion] After comparing different association values, learning rates, and classifiers, the experiments show that classification performance is best when the association value is 10 and the learning rate is 4e-5, with an average F1 score of 0.8522; compared with SVM, LSTM, and plain BERT, the F1 score is higher by 0.1031, 0.1538, and 0.1405 respectively.
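The "Expansion-Reconstruction-Noise Reduction" scheme described above can be sketched in minimal form. Note that the association table, weights, noise threshold, and example triple below are illustrative assumptions, not the paper's actual lexicon or parameters; only the default association value k=10 follows the reported optimum.

```python
# Hypothetical sketch of the three-step semantic association scheme.
# ASSOCIATIONS stands in for whatever association resource the method
# draws on; the entries and weights here are invented for illustration.
ASSOCIATIONS = {
    "retrieval": [("search", 0.9), ("indexing", 0.7), ("cooking", 0.1)],
    "improve":   [("enhance", 0.8), ("boost", 0.6)],
}

def expand(term, k=10):
    """Expansion: attach up to k associated terms to an SAO element
    (k=10 is the association value reported as optimal)."""
    ranked = sorted(ASSOCIATIONS.get(term, []), key=lambda p: -p[1])
    return [t for t, _ in ranked[:k]]

def denoise(term, expanded, threshold=0.5):
    """Noise reduction: keep only associations whose weight clears a
    threshold (the threshold value is an assumption)."""
    weights = dict(ASSOCIATIONS.get(term, []))
    return [t for t in expanded if weights.get(t, 0.0) >= threshold]

def reconstruct(sao):
    """Reconstruction: rebuild the SAO triple as one enriched short
    text, ready to be fed to a classifier such as fine-tuned BERT."""
    parts = []
    for term in sao:
        extra = denoise(term, expand(term))
        parts.append(term if not extra else f"{term} ({' '.join(extra)})")
    return " ".join(parts)

triple = ("system", "improve", "retrieval")
print(reconstruct(triple))
```

The enriched text would then be passed to a BERT classifier fine-tuned with the reported learning rate of 4e-5; that training step is omitted here because it depends on the authors' corpus and model setup.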

Key words: SAO, short text classification, semantic association, BERT

CLC number: