Topic Extraction and Evolution for Scientific Literature Based on Hierarchical Probabilistic Topic Model

  • Wang Ping
  • School of Information Management, Wuhan University, Wuhan 430072

Received date: 2014-09-01

Revised date: 2014-11-05

Online published: 2014-11-20

Abstract

Automatically mining the topics of scientific literature and tracking how they change can play an important role in helping researchers understand and follow the latest research frontiers in a given field. Considering the diversity and dynamics of scientific papers, this paper analyzes approaches to topic extraction and evolution for such papers. Based on a hierarchical probabilistic topic model, it uses Gibbs sampling to estimate the model parameters and selects high-quality topic words by means of mutual information. Finally, it applies pre-/post-discretized analysis to study topic evolution. The experimental results show that the proposed topic extraction and evolution method is feasible and effective.

Cite this article

Wang Ping. Topic Extraction and Evolution for Scientific Literature Based on Hierarchical Probabilistic Topic Model[J]. Library and Information Service, 2014, 58(22): 70-77. DOI: 10.13266/j.issn.0252-3116.2014.22.012

