Automatic Theory Recognition in Academic Journals Based on CRF

  • Chen Feng
  • Zhai Yujia
  • Wang Fang
  • Department of Information Resources Management, Business School of Nankai University, Tianjin 300071

Received date: 2015-10-25

Revised date: 2015-12-14

Online published: 2016-01-20

Abstract

[Purpose/significance] Identifying the theories used in academic journal papers is a prerequisite for content analysis, so automating theory recognition can make content analysis more efficient. [Method/process] This paper treats theory recognition as a named entity recognition task, reviews existing named entity recognition methods, and proposes a theory recognition model based on semantic generalization. A series of experiments is conducted with a conditional random field (CRF) model on 1,822 academic journal papers, using part of speech, HowNet semantic information and other external knowledge as features. [Result/conclusion] Recognition precision is as high as 95.38%, but recall is low, and the size of the training set has a large influence on performance. Semantic resources can improve performance but reduce recall. How to select semantic features, and how to carry out semantic annotation and semantic disambiguation, remain open problems.
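The abstract frames theory recognition as CRF sequence labeling over part-of-speech and HowNet-derived features. The sketch below is not the authors' code. It assumes a BIO label scheme (`B-THEORY` / `I-THEORY` / `O`), a one-token context window, and a generic semantic-class string standing in for a HowNet sememe, and it leaves out CRF training itself. It shows how annotated theory mentions become per-token labels, what a feature dictionary for one token might look like, and how entity-level precision (correct predicted mentions / all predicted mentions) and recall (correct predicted mentions / all gold mentions) are computed. All function names, tags and example tokens are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical label scheme: B-THEORY opens a theory mention, I-THEORY continues it.
Span = Tuple[int, int]  # [start, end) token offsets of one theory mention


def bio_tags(n_tokens: int, spans: Sequence[Span]) -> List[str]:
    """Encode annotated theory mentions as one BIO label per token."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B-THEORY"
        for i in range(start + 1, end):
            tags[i] = "I-THEORY"
    return tags


def token_features(tokens: Sequence[str], pos: Sequence[str],
                   sem: Sequence[str], i: int) -> Dict[str, str]:
    """Features for token i: word, POS tag and semantic class of the token
    and its immediate neighbours. `sem` is an assumed stand-in for HowNet
    sememes; mapping words to shared classes is one way to generalize."""
    feats: Dict[str, str] = {}
    for offset in (-1, 0, 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"w[{offset}]"] = tokens[j]
            feats[f"pos[{offset}]"] = pos[j]
            feats[f"sem[{offset}]"] = sem[j]
        else:
            feats[f"w[{offset}]"] = "<PAD>"  # sentence boundary
    return feats


def decode_spans(tags: Sequence[str]) -> Set[Span]:
    """Recover mention spans from BIO labels; a stray I-THEORY with no open span is ignored."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        if tag in ("B-THEORY", "O") and start is not None:
            spans.add((start, i))
            start = None
        if tag == "B-THEORY":
            start = i
    return spans


def precision_recall_f1(gold: Sequence[str],
                        pred: Sequence[str]) -> Tuple[float, float, float]:
    """Entity-level scores: a predicted mention counts only on an exact span match."""
    g, p = decode_spans(gold), decode_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy English sentence; the paper's corpus is segmented Chinese text.
    tokens = ["we", "combine", "diffusion", "theory", "and",
              "technology", "acceptance", "model"]
    pos = ["r", "v", "n", "n", "c", "n", "v", "n"]  # hypothetical POS tags
    sem = ["human", "act", "event", "knowledge", "func",
           "knowledge", "act", "knowledge"]  # hypothetical semantic classes
    gold = bio_tags(len(tokens), [(2, 4), (5, 8)])
    pred = ["O"] * 5 + ["B-THEORY", "I-THEORY", "I-THEORY"]  # one mention missed
    print(token_features(tokens, pos, sem, 3))
    p, r, f = precision_recall_f1(gold, pred)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=1.00 R=0.50 F1=0.67
```

Per-token dictionaries in this shape can be passed to a dictionary-based CRF toolkit such as sklearn-crfsuite. In the toy run the "model" finds only one of two gold mentions, so precision stays at 1.0 while recall falls to 0.5: the same high-precision, low-recall pattern the abstract describes, shown here on invented data.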

Cite this article

Chen Feng, Zhai Yujia, Wang Fang. Automatic Theory Recognition in Academic Journals Based on CRF[J]. Library and Information Service, 2016, 60(2): 122-128. DOI: 10.13266/j.issn.0252-3116.2016.02.019

