Library and Information Service (图书情报工作), 2016, Vol. 60, Issue 2: 122-128. DOI: 10.13266/j.issn.0252-3116.2016.02.019

• Knowledge Organization

Automatic Theory Recognition in Academic Journals Based on CRF
(Original title: 基于条件随机场的学术期刊中理论的自动识别方法)

Chen Feng, Zhai Yujia, Wang Fang

  1. Department of Information Resources Management, Business School of Nankai University, Tianjin 300071;
    Research Center for Network Society Governance, Business School of Nankai University, Tianjin 300071
  • Received: 2015-10-25 Revised: 2015-12-14 Online: 2016-01-20 Published: 2016-01-20
  • Corresponding author: Wang Fang (ORCID: 0000-0002-2655-9975), director of the Research Center for Network Society Governance, professor and doctoral supervisor, E-mail: wangfangnk@nankai.edu.cn
  • About the authors: Chen Feng (ORCID: 0000-0002-0214-8100), doctoral candidate; Zhai Yujia (ORCID: 0000-0002-3231-4077), doctoral candidate.
  • Funding: This paper is one of the outcomes of the major project of the National Social Science Fund of China "Research on Network Society Governance in China" (project No. 14ZDA063).

Abstract:

[Purpose/significance] Extracting the theories that academic journal papers draw on is a prerequisite for content analysis of the literature, and automating the recognition of theory names can make that analysis more efficient. [Method/process] This paper treats theory recognition as a kind of named entity recognition (NER) problem, reviews the methods commonly used for NER, and proposes an NER method based on semantic generalization. Drawing on external knowledge such as part-of-speech tags and HowNet sememes, it applies a conditional random field (CRF) model in experiments on the titles and abstracts of 1,822 papers published in the Journal of the China Society for Scientific and Technical Information (《情报学报》). [Result/conclusion] The highest precision reached 95.38%, but recall was low. The size of the training corpus has a large influence on performance; adding semantic resources can improve performance but lowers recall, and different degrees of semantic generalization affect precision and recall in complex ways. How to select semantic features, and how to perform semantic annotation and semantic disambiguation, are new problems that remain to be solved.

Key words: theory recognition, named entity recognition (NER), citation content analysis, semantic generalization
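The method generalizes from surface words to external knowledge (part of speech and HowNet sememes) before labeling with a CRF. The Python sketch below suggests one common way to set up such features, under stated assumptions: theory names are tagged with a BIO scheme (`B-THEORY`, `I-THEORY`, `O`) over pre-segmented, POS-tagged words, and `TOY_SEMEMES` is a hypothetical three-word stand-in for HowNet, not its real data or API. The feature names and the one-word context window are illustrative choices, not the paper's feature templates.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL toy lexicon standing in for HowNet: each word maps to a single
# made-up semantic class. Real HowNet entries are richer and often ambiguous.
TOY_SEMEMES: Dict[str, str] = {
    "创新": "create",    # innovation
    "扩散": "change",    # diffusion
    "接受": "receive",   # acceptance
}

def token_features(sent: List[Tuple[str, str]], i: int) -> Dict[str, str]:
    """Features for token i of a sentence given as (word, POS) pairs."""
    word, pos = sent[i]
    feats = {
        "word": word,
        "pos": pos,
        # Semantic generalization: back off from the surface word to its class
        "sememe": TOY_SEMEMES.get(word, "NONE"),
        "last_char": word[-1],  # e.g. 论 in 理论 ("theory")
    }
    # Same word/POS/class features for the previous (-1) and next (+1) word
    for offset in (-1, 1):
        j = i + offset
        key = f"{offset:+d}:"
        if 0 <= j < len(sent):
            w, p = sent[j]
            feats[key + "word"] = w
            feats[key + "pos"] = p
            feats[key + "sememe"] = TOY_SEMEMES.get(w, "NONE")
        else:
            feats[key + "word"] = "<PAD>"  # sentence boundary
    return feats

def sentence_features(sent: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """One feature dict per token, in sentence order."""
    return [token_features(sent, i) for i in range(len(sent))]

# "基于创新扩散理论" (based on diffusion of innovations theory), pre-segmented and POS-tagged
sent = [("基于", "p"), ("创新", "vn"), ("扩散", "vn"), ("理论", "n")]
labels = ["O", "B-THEORY", "I-THEORY", "I-THEORY"]  # BIO target sequence
for feats, y in zip(sentence_features(sent), labels):
    print(y, feats["word"], feats["sememe"])
```

Because words also emit class-level `sememe` features, evidence learned from one theory name can in principle carry over to unseen names built from words of the same classes. With real HowNet data a word often has several senses, which is where the semantic disambiguation problem raised in the abstract comes in.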
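A linear-chain CRF labels a whole word sequence jointly: the score of a label sequence is the sum of the weights of features tied to each position's label plus weights for adjacent label pairs, and decoding picks the highest-scoring sequence with the Viterbi algorithm. The sketch below shows only that decoding step, with a few hand-set weights (all hypothetical) in place of trained ones and features written as strings such as `sememe=create`. Training, typically by maximizing conditional log-likelihood, is omitted, and nothing here is claimed to be the paper's configuration.

```python
from typing import Dict, List, Tuple

LABELS = ["O", "B-THEORY", "I-THEORY"]
START = "<S>"  # pseudo-label before the first token

def emission_scores(features: List[str],
                    weights: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Sum the weights of all (feature, label) pairs firing at one position."""
    return {y: sum(weights.get((f, y), 0.0) for f in features) for y in LABELS}

def viterbi(sentence: List[List[str]],
            weights: Dict[Tuple[str, str], float],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring label sequence. The CRF partition function is
    the same for every candidate sequence, so decoding does not need it."""
    if not sentence:
        return []
    emis = [emission_scores(feats, weights) for feats in sentence]
    # score[t][y]: best score of any label prefix that ends in label y at position t
    score = [{y: emis[0][y] + transitions.get((START, y), 0.0) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(sentence)):
        score.append({})
        back.append({})
        for y in LABELS:
            prev = max(LABELS, key=lambda p: score[t - 1][p] + transitions.get((p, y), 0.0))
            score[t][y] = score[t - 1][prev] + transitions.get((prev, y), 0.0) + emis[t][y]
            back[t][y] = prev
    # Trace back from the best final label
    path = [max(LABELS, key=lambda y: score[-1][y])]
    for t in range(len(sentence) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hand-set, HYPOTHETICAL weights for illustration only (not trained)
WEIGHTS = {
    ("pos=p", "O"): 2.0,
    ("sememe=create", "B-THEORY"): 1.5,
    ("sememe=change", "I-THEORY"): 1.5,
    ("word=理论", "I-THEORY"): 2.0,
}
# Penalize I-THEORY right after O or at sentence start (invalid under BIO)
TRANSITIONS = {("O", "I-THEORY"): -5.0, (START, "I-THEORY"): -5.0}

sentence = [
    ["word=基于", "pos=p"],
    ["word=创新", "pos=vn", "sememe=create"],
    ["word=扩散", "pos=vn", "sememe=change"],
    ["word=理论", "pos=n"],
]
print(viterbi(sentence, WEIGHTS, TRANSITIONS))
# ['O', 'B-THEORY', 'I-THEORY', 'I-THEORY']
```

The transition penalties are what make this a sequence model rather than three independent classifiers: without them, a position with a strong `I-THEORY` feature could be labeled as the inside of an entity that never began.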
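The results pair a precision of up to 95.38% with low recall. For NER these figures are often computed over whole entities rather than single tokens; the sketch below uses a common exact-match convention, in which a predicted theory name counts as correct only if its boundaries and type both match the gold annotation. The abstract does not state which matching rule the paper used, and the toy gold and predicted tags here are invented purely to show how high precision and low recall can coexist.

```python
from typing import List, Set, Tuple

def extract_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, type) entity spans from a BIO tag sequence."""
    spans: Set[Tuple[int, int, str]] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if continues:
            continue
        if start is not None:  # the open entity ends before position i
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith(("B-", "I-")):  # lenient: a stray I- also opens an entity
            start, etype = i, tag[2:]
    return spans

def precision_recall_f1(gold: List[List[str]],
                        pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match, entity-level precision, recall and F1 over sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = extract_spans(g), extract_spans(p)
        tp += len(gs & ps)
        fp += len(ps - gs)
        fn += len(gs - ps)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented toy data: one theory name found exactly, one missed entirely
gold = [["O", "B-THEORY", "I-THEORY", "I-THEORY"], ["B-THEORY", "I-THEORY", "O"]]
pred = [["O", "B-THEORY", "I-THEORY", "I-THEORY"], ["O", "O", "O"]]
p, r, f1 = precision_recall_f1(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=1.00 R=0.50 F1=0.67
```

A tagger that proposes few spans but gets them right behaves like this: every proposed span raises precision, while every theory name it never proposes lowers recall.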
