Library and Information Service, 2020, Vol. 64, Issue (14): 94-103. DOI: 10.13266/j.issn.0252-3116.2020.14.010

• Information Research •

A Bibliometric Analysis of Machine Learning in Term Extraction Research

Qiu Keda, Ma Jianling

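  1. Lanzhou Library, Chinese Academy of Sciences, Lanzhou 730000; Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100049
  • Received: 2019-08-23 Revised: 2020-04-16 Online: 2020-07-20 Published: 2020-07-20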
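  • Corresponding author: Ma Jianling (ORCID: 0000-0003-4933-5904), deputy director of the Information Systems Department, research librarian, master's supervisor. E-mail: majl@lzb.ac.cn
  • About the author: Qiu Keda (ORCID: 0000-0002-2826-8899), master's student.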
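  • Funding:
    This paper is one of the research outcomes of the National Natural Science Foundation of China General Program project "Research on the Integrated Research Paradigm of Climate Change Scientific Achievements and Its Implementation Platform" (Grant No. 41671535) and the Chinese Academy of Sciences Literature and Information Capacity Building Special Project "Open Academic Resource System" (Grant No. Y7ZG081001).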

A Statistical Analysis of Literature on Term Extraction Based on Machine Learning

Qiu Keda, Ma Jianling   

  1. Lanzhou Library, Chinese Academy of Sciences, Lanzhou 730000; Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100049
  • Received: 2019-08-23 Revised: 2020-04-16 Online: 2020-07-20 Published: 2020-07-20

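Abstract: [Purpose/significance] This paper reviews and summarizes research on machine-learning-based automatic term extraction, to serve as a reference for those working in the field. [Method/process] Using the analysis tools of CNKI and EndNote, bibliometric methods are applied for a macro-level analysis of the topic's annual trends and core institutions, followed by a content analysis from three aspects: extraction techniques and methods, datasets and evaluation, and applications. [Result/conclusion] In recent years, term extraction research has made great progress and has become foundational work for knowledge systems, natural language processing, information analysis, and related fields. With the rapid development of natural language processing, extraction techniques have begun moving toward deep learning, but the basic theoretical framework of term extraction still needs improvement, for example in evaluation metrics, corpus selection, and methods for evaluating effectiveness.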

Keywords: term extraction, machine learning, knowledge organization, bibliometrics

Abstract: [Purpose/significance] This paper sorts out and summarizes research on automatic term extraction based on machine learning, and provides a reference for researchers and practitioners in the field. [Method/process] First, based on the analysis tools of CNKI and EndNote, the paper applies bibliometrics to conduct a macro analysis of the topic's annual trends and core institutions. It then analyzes the topic's content from 3 aspects: extraction techniques and methods, data sets and evaluation, and applications. [Result/conclusion] In recent years, term extraction research has made great progress, and it is foundational work in the fields of knowledge systems, natural language processing, and information analysis. With the rapid development of natural language processing, extraction technology has begun to move in the direction of deep learning, but the basic theoretical system of term extraction still needs to be improved, for example in evaluation indicators, corpus selection, and methods for evaluating extraction results.

Key words: term extraction, machine learning, knowledge organization, bibliometrics

CLC number: