Library and Information Service ›› 2021, Vol. 65 ›› Issue (14): 128-137. DOI: 10.13266/j.issn.0252-3116.2021.14.015

• Knowledge Organization •

Research on Fine-Grain Classification Method of Academic Literature Based on Bibliographies

Lei Bing1,2, Liu Xiao1,2, Zhong Zhen1,2

  1 School of Management, Henan University of Technology, Zhengzhou 450001;
  2 Business Intelligence and Knowledge Engineering Laboratory, Henan University of Technology, Zhengzhou 450001
  • Received: 2020-11-05 Revised: 2021-02-25 Online: 2021-07-20 Published: 2021-07-21
  • Corresponding author: Zhong Zhen (ORCID: 0000-0001-6248-2226), associate professor, PhD, E-mail: zhongzhen@haut.edu.cn.
  • About the authors: Lei Bing (ORCID: 0000-0002-1073-4724), professor, PhD; Liu Xiao (ORCID: 0000-0002-6770-7583), master's student
  • Funding:
    This paper is one of the research outputs of the National Natural Science Foundation of China project "Scientometric Research on Erroneous Citations by Authors, Journals, and Databases: Identification Methods, Generation Mechanisms, and Control Strategies" (Grant No. 71603073) and the Henan Province University Philosophy and Social Sciences Innovation Team support project "Big Data and Management Decision-Making" (Grant No. 2019-CXTD-04).

Abstract: [Purpose/significance] For academic literature in a specific domain, this paper builds a dual-label classification model based on bibliographic information that classifies each paper by "research content" and by "research method", offering a methodological reference for fine-grained classification of academic literature. [Method/process] Taking the convolutional neural network from deep learning as the base model, bibliographic fields such as title, abstract, keywords, journal name, authors, and institutions are divided into explicit features and implicit features. Explicit feature extraction and implicit feature mapping produce an array of feature words, from which a word-vector matrix is generated. The matrix is then processed by a convolutional layer, a pooling layer, and a Softmax layer to complete the classification task. [Result/conclusion] The model is validated experimentally on literature from the e-commerce field. Its macro-F1 scores for the "research content" and "research method" labels are 0.74 and 0.81, respectively, which is clearly better than traditional machine learning methods and also higher than a deep learning classifier that uses only explicit features.

Keywords: academic literature, subject classification, bibliographic information, deep learning, convolutional neural network
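
The pipeline and the evaluation metric summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give layer sizes, the pooling type, the activation function, or how implicit features are mapped to words, so the max pooling, the ReLU activation, the random weights standing in for trained parameters, and every name and value below are assumptions chosen only to make the data flow concrete. The `macro_f1` function shows the standard definition of macro-F1 (the unweighted mean of per-class F1 scores) on invented labels.

```python
import math
import random

# All sizes, words, and weights below are hypothetical toy values, not the paper's.
EMBED_DIM = 4     # word-vector dimension
KERNEL_SIZE = 2   # filter height: number of consecutive feature words per window
NUM_FILTERS = 3
NUM_CLASSES = 2   # classes for one label, e.g. "research method"

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def conv_relu(matrix, filt):
    """Slide one KERNEL_SIZE x EMBED_DIM filter down the word-vector matrix."""
    out = []
    for i in range(len(matrix) - KERNEL_SIZE + 1):
        s = sum(filt[k][d] * matrix[i + k][d]
                for k in range(KERNEL_SIZE) for d in range(EMBED_DIM))
        out.append(max(0.0, s))  # ReLU activation (assumed)
    return out


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(words, table, filters, dense):
    """Feature-word array -> word-vector matrix -> conv -> max pool -> Softmax."""
    matrix = [table[w] for w in words]  # requires len(words) >= KERNEL_SIZE
    pooled = [max(conv_relu(matrix, f)) for f in filters]  # max pooling (assumed)
    scores = [sum(w * p for w, p in zip(row, pooled)) for row in dense]
    return softmax(scores)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Explicit feature words (e.g. from title and keywords) plus one implicit
# feature word mapped from the journal name; all invented for illustration.
words = ["online", "retail", "survey", "journal_topic_3"]
table = {w: vec for w, vec in zip(words, rand_matrix(len(words), EMBED_DIM))}
filters = [rand_matrix(KERNEL_SIZE, EMBED_DIM) for _ in range(NUM_FILTERS)]
dense = rand_matrix(NUM_CLASSES, NUM_FILTERS)

probs = classify(words, table, filters, dense)
print("class probabilities:", probs)
print("macro-F1:", macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

In a dual-label setting like the one described, one plausible arrangement is to run a classifier of this shape once per label ("research content" and "research method") and report macro-F1 for each label separately. Whether the paper shares layers between the two labels is not stated in the abstract.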

CLC number: