Information Research

Research on Automatic Recognition of Academic Citation Context

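  • Lei Shengwei,
  • Chen Haihua,
  • Huang Yong,
  • Lu Wei
  • 1. School of Information Management, Wuhan University, Wuhan 430072;
    2. Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072
Lei Shengwei (ORCID: 0000-0002-7152-7817), master's student; Chen Haihua (ORCID: 0000-0003-2806-3938), master's student; Huang Yong (ORCID: 0000-0003-4808-6491), doctoral student.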

Received date: 2016-06-16

  Revised date: 2016-08-15

  Published online: 2016-09-05

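Funding

This article is one of the research outcomes of the National Natural Science Foundation of China General Program project "Lexical-Function-Oriented Semantic Recognition of Academic Text and Knowledge Graph Construction" (grant no. 71473183).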

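Abstract

[Purpose/Significance] Citation content analysis helps reveal the deeper semantic meaning of citation relationships between documents, and citation context identification, as the foundation of citation content analysis, is therefore especially important. [Method/Process] This paper reviews the current state of existing research on citation context, summarizes the shortcomings of current citation context identification, and on that basis distills five categories of features for citation context identification. It then runs automatic citation context identification experiments using two approaches, text classification and sequence labeling. [Result/Conclusion] The experimental results show that the proposed features clearly improve citation context identification, and that the text-classification-based SVM outperforms the sequence-labeling-based CRF.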

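Cite this article

Lei Shengwei, Chen Haihua, Huang Yong, Lu Wei. Research on Automatic Recognition of Academic Citation Context[J]. Library and Information Service (图书情报工作), 2016, 60(17): 78-87. DOI: 10.13266/j.issn.0252-3116.2016.17.012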

Abstract

[Purpose/significance] Citation content analysis helps reveal the deep semantic meaning of citation relations between documents, and citation context identification, as the basis of citation content analysis, is particularly important. [Method/process] This paper reviews existing research on citation context and summarizes the deficiencies of current citation context identification. On this basis, five categories of citation context identification features are proposed. The paper then conducts automatic identification experiments using text classification and sequence labeling. [Result/conclusion] A significant improvement over the baseline method shows the effectiveness of the proposed features. In addition, the text-classification-based SVM method performs better than the sequence-labeling-based CRF method.
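The difference between the two formulations compared in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy paragraph, the `Sentence` type, the three features, and the `C`/`O` label scheme are hypothetical, and the paper's actual five feature categories are not listed in the abstract. The sketch only shows how the same sentences might be presented to a per-sentence classifier (such as an SVM) versus a sequence labeler (such as a CRF).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Sentence:
    text: str
    has_citation_marker: bool  # True for the explicit citing sentence
    in_context: bool           # gold label: does it belong to the citation context?


# Hypothetical connective words that may signal a continuation of the citing sentence.
CONNECTIVES = {"however", "this", "it", "they"}


def features(sent: Sentence, distance: int) -> Dict[str, float]:
    """Illustrative features only; not the feature set used in the paper."""
    words = sent.text.lower().split()
    first = words[0].strip(",.;:") if words else ""
    return {
        "distance_to_citation": float(distance),
        "starts_with_connective": float(first in CONNECTIVES),
        "has_citation_marker": float(sent.has_citation_marker),
    }


def anchor_index(paragraph: List[Sentence]) -> int:
    """Position of the first sentence carrying an explicit citation marker."""
    return next(i for i, s in enumerate(paragraph) if s.has_citation_marker)


def as_classification_instances(
    paragraph: List[Sentence],
) -> List[Tuple[Dict[str, float], int]]:
    """Text classification view: each sentence is an independent (features, label) pair."""
    a = anchor_index(paragraph)
    return [(features(s, abs(i - a)), int(s.in_context)) for i, s in enumerate(paragraph)]


def as_labeled_sequence(
    paragraph: List[Sentence],
) -> Tuple[List[Dict[str, float]], List[str]]:
    """Sequence labeling view: the paragraph is one sequence whose labels are predicted jointly."""
    a = anchor_index(paragraph)
    xs = [features(s, abs(i - a)) for i, s in enumerate(paragraph)]
    ys = ["C" if s.in_context else "O" for s in paragraph]
    return xs, ys


# Toy paragraph; the cited work "[1]" is a placeholder, not a real reference.
PARAGRAPH = [
    Sentence("Citation analysis has a long history.", False, False),
    Sentence("Smith et al. [1] proposed a CRF model for this task.", True, True),
    Sentence("However, it requires large annotated corpora.", False, True),
]

instances = as_classification_instances(PARAGRAPH)
xs, ys = as_labeled_sequence(PARAGRAPH)
```

In the classification view, a model scores each entry of `instances` on its own, so any dependence between neighboring sentences has to be encoded in the features. In the sequence view, a model receives `xs` as a whole and can also learn transitions between adjacent labels in `ys`.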

