Library and Information Service (图书情报工作), 2019, Vol. 63, Issue 13: 95-104. DOI: 10.13266/j.issn.0252-3116.2019.13.010

• Knowledge Organization •

多层次融合的学术文本结构功能识别研究

Research on Structure Function Recognition of Academic Text Based on Multi-level Fusion

Wang Jiamin¹,², Lu Wei¹,², Liu Jiawei¹,², Cheng Qikai¹,²

  1. School of Information Management, Wuhan University, Wuhan 430072;
  2. Information Retrieval and Knowledge Mining Laboratory, Wuhan University, Wuhan 430072
  • Received: 2018-09-13  Revised: 2019-01-25  Online: 2019-07-05  Published: 2019-07-05
  • Corresponding author: Lu Wei (ORCID: 0000-0002-0929-7416), associate dean, professor, doctoral supervisor, E-mail: weilu@whu.edu.cn
  • About the authors: Wang Jiamin (ORCID: 0000-0003-3954-0381), PhD candidate; Liu Jiawei (ORCID: 0000-0002-2774-1509), master's student; Cheng Qikai (ORCID: 0000-0003-3904-8901), lecturer, PhD.
  • Funding:
    This work is one of the research outcomes of the National Natural Science Foundation of China General Program project "Research on Lexical-Function-Oriented Semantic Recognition of Academic Text and Knowledge Graph Construction" (Grant No. 71473183).

Abstract: [Purpose/significance] The structure function of an academic text summarizes the structure of the document and the function of its sections. Few existing studies fuse the multiple structural levels of academic text, and traditional methods rely on human experience to build rules or features. Building on an analysis of the hierarchical structure of academic text, this paper constructs a multi-level fusion model for structure function recognition. [Method/process] Experiments use an academic text dataset from ScienceDirect. The model first applies deep learning methods to recognize the structure function of academic text at each level, then uses a voting method to fuse the results from different levels and different models. [Result/conclusion] The fused results at every level improve to varying degrees over any single model. The overall precision, recall and F1 value of the combined results reach 86%, 84% and 84%, respectively. The deep learning algorithms also outperform the traditional machine learning algorithm SVM on the academic text classification task. Finally, the paper analyzes the misclassified cases and points out potential application fields and directions for future research.

Key words: deep learning, structure function, multi-level fusion, academic text
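
The voting step named in the abstract can be illustrated with a minimal sketch. The abstract does not say which voting scheme, which levels, or which label set the authors used, so everything here is an assumption for illustration: plain majority voting, ties broken in favor of the source listed first, and hypothetical source names and section labels.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label; on a tie, the label that appears first wins."""
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label


def fuse(per_source_predictions):
    """Fuse aligned label sequences from several sources, one vote per source per item."""
    sources = list(per_source_predictions.values())
    return [majority_vote(list(votes)) for votes in zip(*sources)]


# Hypothetical predictions for four sections of one paper.
# The source names and labels are illustrative, not taken from the article.
predictions = {
    "title_level_model": ["introduction", "method", "result", "conclusion"],
    "paragraph_level_model": ["introduction", "method", "method", "conclusion"],
    "section_level_model": ["background", "method", "result", "result"],
}

fused = fuse(predictions)
print(fused)  # ['introduction', 'method', 'result', 'conclusion']
```

In this toy case each disagreement is outvoted two to one, so the fused sequence recovers a label for every section even though no single source agrees with the others on all four.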

CLC number: