[Purpose/significance] A wide variety of topic models have been developed, with steadily improved algorithms. This paper reviews the research advances, generative processes, and algorithms of citation-based topic models, and discusses their application to the text of academic articles as well as directions for future research. [Method/process] We collected articles on citation-based topic models from the Web of Science and CNKI databases. After manual review, we selected several representative articles and analyzed the generative processes, parameter estimation, and inference methods of the citation-based topic models they describe. [Result/conclusion] There are currently three main types of citation-based topic models: models that focus on the topic-citation distribution, models that mainly study the relationship between citing and cited documents, and models based on citation context. Introducing citation information into topic models allows more complete topic content to be detected. Most of the models are variants of LDA and PLSA. Future directions include incorporating citation context information into topic models, improving inference methods, and applying the models.