[Purpose/Significance] Traditional citation analysis based on citation counts treats all citations equally and therefore cannot distinguish among them. Automatically classifying citation texts from different perspectives with machine learning and natural language processing can reveal the deeper citation relationships among scientific papers. [Method/Process] This paper first explores methods for the automatic classification of citation texts, building classifiers for citation function and citation sentiment with both traditional machine learning and deep learning techniques. On this basis, citation content analysis is carried out on two corpora: 1,738 scientific papers in computer science, and the 4,132 papers citing a single highly cited paper. [Result/Conclusion] Citation function and citation sentiment are correlated to some degree, and both show clear positional distribution patterns within the citing papers. Citations of the same paper by citing papers from different disciplines differ significantly in both function and sentiment.