Research on Automatic Recognition of Academic Citation Context

  • Lei Shengwei ,
  • Chen Haihua ,
  • Huang Yong ,
  • Lu Wei
  • 1. School of Information Management, Wuhan University, Wuhan 430072;
    2. Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072

Received date: 2016-06-16

Revised date: 2016-08-15

Online published: 2016-09-05

Abstract

[Purpose/significance] Citation content analysis helps reveal the deeper semantic influence of citation relations between publications, and citation context identification, as the basis of such analysis, is particularly important. [Method/process] This paper reviews recent research on citation context and summarizes the shortcomings of existing citation context identification methods. On this basis, it proposes five categories of features for citation context identification and conducts automatic identification experiments using both text classification and sequence labeling. [Result/conclusion] A significant improvement over the baseline method demonstrates the effectiveness of the proposed features. In addition, the text-classification-based SVM method outperforms the sequence-labeling-based CRF method.
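The two problem framings compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the five feature categories, so the three features below (distance to the citing sentence, presence of a citation mark, a sentence-initial anaphor) and all function names are assumptions chosen only for illustration. The sketch shows how the same labeled sentence window would be presented to a classifier such as an SVM (one independent example per sentence) and to a sequence labeler such as a CRF (one example per window, with a tag per sentence).

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern for explicit citation marks such as "[12]" or "(Smith, 2010)".
CITATION_MARK = re.compile(r"\[\d+\]|\([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4}\)")

# Hypothetical list of sentence-initial words that may refer back to a cited work.
ANAPHORS = {"this", "these", "they", "it", "their"}


def sentence_features(sentences: List[str], i: int, cite_idx: int) -> Dict[str, float]:
    """Illustrative (assumed) features for sentence i relative to the citing sentence."""
    words = sentences[i].split()
    first_word = words[0].lower() if words else ""
    return {
        "distance": float(abs(i - cite_idx)),
        "has_citation_mark": float(bool(CITATION_MARK.search(sentences[i]))),
        "starts_with_anaphor": float(first_word in ANAPHORS),
    }


def as_classification(
    sentences: List[str], cite_idx: int, labels: List[int]
) -> List[Tuple[Dict[str, float], int]]:
    """Text-classification framing: every sentence is an independent (features, label) pair."""
    return [
        (sentence_features(sentences, i, cite_idx), labels[i])
        for i in range(len(sentences))
    ]


def as_sequence(
    sentences: List[str], cite_idx: int, labels: List[int]
) -> Tuple[List[Dict[str, float]], List[str]]:
    """Sequence-labeling framing: the whole window is one example with one tag per sentence."""
    feats = [sentence_features(sentences, i, cite_idx) for i in range(len(sentences))]
    tags = ["CTX" if y else "O" for y in labels]
    return feats, tags


# Toy window: sentence 1 holds the explicit citation; label 1 marks citation context.
window = [
    "Parsing is hard.",
    "Smith proposed a parser [3].",
    "This method uses a chart.",
]
gold = [0, 1, 1]

examples = as_classification(window, cite_idx=1, labels=gold)
feature_seq, tag_seq = as_sequence(window, cite_idx=1, labels=gold)
```

In the first framing each sentence is judged on its own features, while in the second the labeler can also model dependencies between the tags of neighboring sentences; which works better is an empirical question, and the abstract reports that the SVM-based classifier performed better in the authors' experiments.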

Cite this article

Lei Shengwei, Chen Haihua, Huang Yong, Lu Wei. Research on Automatic Recognition of Academic Citation Context[J]. Library and Information Service, 2016, 60(17): 78-87. DOI: 10.13266/j.issn.0252-3116.2016.17.012

