Library and Information Service ›› 2022, Vol. 66 ›› Issue (12): 125-138. DOI: 10.13266/j.issn.0252-3116.2022.12.012

• Reviews and Commentary •

A Review of Problem and Method Recognition and Relation Extraction in Academic Papers

Zhang Yingyi1, Zhang Chengzhi1, Daqing He2

  1. Department of Information Management, School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094;
  2. School of Computing and Information, University of Pittsburgh, Pittsburgh 15260
  • Received: 2021-11-22 Revised: 2022-03-31 Online: 2022-06-20 Published: 2022-06-25
  • Corresponding author: Zhang Chengzhi (章成志), professor, PhD, doctoral supervisor, E-mail: zhangcz@njust.edu.cn
  • About the authors: Zhang Yingyi (张颖怡), PhD candidate; Daqing He, professor, doctoral supervisor.

Abstract: [Purpose/Significance] Problems and methods are core components of academic papers. Organizing the problems and methods scattered across papers, for example by recognizing them and extracting the relations between them, can surface the implicit knowledge in academic papers and support the construction of a discipline's problem and method systems. Reviewing prior work on problem and method recognition and relation extraction helps identify development trends and shortcomings in this research and provides reference and guidance for future work. [Method/Process] Existing research on mining problems and methods from academic papers centers on four topics: defining problems, methods, and their relations; constructing annotated datasets of problems, methods, and their relations; designing models for problem and method recognition and relation extraction; and applying the extracted problems, methods, and relations. This paper reviews each of the four topics and summarizes the current state of problem and method knowledge mining in academic papers. [Result/Conclusion] The analysis finds that definitions of problems and methods seldom draw on theories such as problemology from the philosophy of science. In dataset construction, annotation effort is duplicated across datasets; moreover, open-source datasets are concentrated in the natural sciences and are mostly English corpora, while open-source Chinese corpora are scarce. In recognition and relation extraction, the performance of existing extraction models remains low. Finally, research on problems and methods should not stop at term recognition and relation extraction; the extracted knowledge needs in-depth analysis and application.

Key words: problem recognition, method recognition, relation extraction, information mining of academic papers
