Library and Information Service (图书情报工作), 2023, Vol. 67, Issue 2: 131-139. DOI: 10.13266/j.issn.0252-3116.2023.02.013

• Knowledge Organization •

基于UniLM模型的学术文摘观点自动生成研究

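An Automatic Generation Study of Academic Abstract Viewpoints Based on the UniLM Model

Zeng Jiangfeng (曾江峰), Liu Yuanyuan (刘园园), Cheng Zheng (程征), Duan Yaoqing (段尧清)

  1. School of Information Management, Central China Normal University, Wuhan 430079
  • Received: 2022-07-25 Revised: 2022-10-06 Online: 2023-02-09 Published: 2023-02-09
  • Corresponding author: Liu Yuanyuan, master's student, E-mail: liu5211314yuan2021@163.com
  • About the authors: Zeng Jiangfeng, lecturer, PhD, master's supervisor; Cheng Zheng, master's student; Duan Yaoqing, professor, PhD, doctoral supervisor.
  • Funding:
    This work is one of the research outcomes of the Ministry of Education Humanities and Social Sciences Youth Project "Research on Identification Models and Governance Strategies for Social Media Disinformation Driven by Contextual Big Data" (Grant No. 21YJC870002) and the Fundamental Research Funds for the Central Universities project "Research on Information Interaction Behavior and Privacy Protection" (Grant No. CCNU22QN017).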

Abstract: [Purpose/Significance] This study shifts viewpoint extraction from massive academic texts from manual work to machines, improving efficiency while preserving the accuracy and objectivity of the extracted viewpoints. [Method/Process] The UniLM unified pre-trained language model is fine-tuned on a manually annotated dataset. An academic abstract is treated as a text sequence of length a, from which the model generates a sentence sequence of length b (a ≥ b) that is output as the viewpoint sentence of the paper. [Result/Conclusion] The UniLM model reaches a viewpoint generation accuracy of 94.36% on normative abstracts, 77.27% on semi-normative abstracts, and 57.43% on non-normative abstracts, so normative abstracts yield the best results. Applying a machine learning model to viewpoint generation from long texts offers a new method for generating the viewpoints of academic papers. A limitation is that the model depends on the structure of the abstract and performs less well on non-normative abstracts.

Keywords: academic abstracts, automatic viewpoint generation, UniLM model, machine learning
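
The abstract frames viewpoint generation as mapping a source sequence of length a to a target sequence of length b. As a rough illustration only, the sketch below builds the self-attention mask that UniLM uses in its sequence-to-sequence mode: source positions attend to the whole source, while each target position attends to the whole source plus the target positions up to and including itself. The abstract does not describe the authors' implementation, so the function name `seq2seq_attention_mask`, the use of token counts for a and b, and the omission of special tokens are all assumptions made for this example.

```python
def seq2seq_attention_mask(a, b):
    """Build an (a+b) x (a+b) 0/1 mask for a UniLM-style seq2seq setup.

    mask[i][j] == 1 means position i may attend to position j.
    Positions 0..a-1 are the source (the abstract); positions
    a..a+b-1 are the target (the viewpoint sentence).
    Hypothetical helper: not taken from the paper.
    """
    n = a + b
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < a:
                # Every position may attend to the full source segment.
                mask[i][j] = 1
            elif i >= a and j <= i:
                # Target positions see themselves and earlier target positions.
                mask[i][j] = 1
    return mask


if __name__ == "__main__":
    # Toy sizes: a = 3 source tokens, b = 2 target tokens (a >= b).
    for row in seq2seq_attention_mask(3, 2):
        print(row)
```

With this mask a single Transformer can encode the abstract bidirectionally and still decode the viewpoint sentence left to right, which is what lets one pre-trained model be fine-tuned for generation.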

CLC number: