Research Progress of Scientific and Technical Literature Topic Detection and Evolution Based on Topic Model in China

  • Wang Yanpeng
  • 1. National Science Library, Chinese Academy of Sciences, Beijing 100190;
    2. University of Chinese Academy of Sciences, Beijing 100049

Received date: 2015-01-04

Revised date: 2016-01-23

Online published: 2016-02-05

Abstract

[Purpose/significance] This paper reviews research progress in China on topic detection and evolution in scientific and technical literature based on topic models, to provide a reference and ideas for researchers in related fields. [Method/process] It selects CNKI and Wanfang Data as data sources, retrieves and screens related articles, extracts the analysis process of topic-model-based literature topic detection and evolution through manual interpretation, and summarizes, using the literature analysis method, the strategies and methods that Chinese researchers use in that process. [Result/conclusion] Research in this area is comparatively mature, the analysis process is comparatively complete, and the strategies, methods, and tools involved in each step of the analysis process are diverse. However, some shortcomings remain: the application of topic models to literature topic detection and evolution is not yet mature, the number of topics is held constant, and evaluation methods and standards for the effectiveness of topic model applications are lacking.

Cite this article

Wang Yanpeng. Research Progress of Scientific and Technical Literature Topic Detection and Evolution Based on Topic Model in China[J]. Library and Information Service, 2016, 60(3): 130-137. DOI: 10.13266/j.issn.0252-3116.2016.03.019
