Received: 2016-06-16
Revised: 2016-08-15
Published online: 2016-09-05
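Funding
This paper is one of the research outcomes of the National Natural Science Foundation of China General Program project "Lexical-Function-Oriented Semantic Recognition of Academic Texts and Knowledge Graph Construction" (Grant No. 71473183).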
Research on Automatic Recognition of Academic Citation Context
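Lei Shengwei, Chen Haihua, Huang Yong, Lu Wei. Research on Automatic Recognition of Academic Citation Context[J]. Library and Information Service (图书情报工作), 2016, 60(17): 78-87. DOI: 10.13266/j.issn.0252-3116.2016.17.012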
[Purpose/significance] Citation content analysis helps reveal the deep semantic influence of citation relations among publications, and citation context identification, as the basis of such analysis, is particularly important. [Method/process] This paper reviews recent research on citation context and summarizes the shortcomings of existing approaches to citation context identification. On this basis, it proposes five categories of features for citation context identification and conducts automatic identification experiments using text classification and sequence labeling. [Result/conclusion] A significant improvement over the baseline method demonstrates the effectiveness of the proposed features. In addition, the text classification method based on SVM performs better than the sequence labeling method based on CRF.