[Purpose/significance] Topic evolution analysis plays an important role in detecting technology frontiers and deploying innovation strategies. [Method/process] This paper divides the topic evolution analysis process into three steps: topic representation, similarity correlation calculation, and intensity evolution calculation. The LDA model is used to represent topics; content, co-occurrence, and trend similarity are proposed for calculating topic correlations; and a Prophet-based pre-training and fine-tuning model is used to predict topic trends. An empirical analysis is conducted using the stem cell field as an example. [Result/conclusion] Experiments show that the logistic growth model achieves an R² score above 0.90 for every topic, indicating that the logistic growth model in Prophet is consistent with the growth trend of the topics and can fit the evolution of topic intensity. The proposed topic evolution model can serve as a reference for topic distribution and evolution analysis in specific fields.
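The fit quality reported above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the basic logistic trend form g(t) = C / (1 + exp(-k(t - m))) with a fixed capacity C, replaces Prophet's fitting procedure with a crude grid search, and uses a made-up intensity series. The names `logistic_growth`, `r2_score`, and `fit_logistic_grid` are hypothetical helpers introduced only for this illustration.

```python
import math

def logistic_growth(t, cap, k, m):
    """Basic logistic trend: cap / (1 + exp(-k * (t - m)))."""
    return cap / (1.0 + math.exp(-k * (t - m)))

def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def fit_logistic_grid(years, intensity, cap):
    """Crude grid search over growth rate k and midpoint m.

    Assumption: this stands in for Prophet's own fitting procedure,
    which is considerably more elaborate.
    """
    best = None
    for k_step in range(1, 41):      # k from 0.05 to 2.00
        k = k_step * 0.05
        for m in years:              # candidate midpoints
            pred = [logistic_growth(t, cap, k, m) for t in years]
            score = r2_score(intensity, pred)
            if best is None or score > best[0]:
                best = (score, k, m)
    return best

# Hypothetical topic-intensity series (made-up numbers, not from the paper):
# a logistic curve plus small fixed perturbations.
years = list(range(2000, 2012))
noise = [0.01, -0.01, 0.02, -0.02, 0.01, 0.0,
         -0.01, 0.02, -0.01, 0.01, 0.0, -0.02]
intensity = [logistic_growth(t, 1.0, 0.6, 2005) + n
             for t, n in zip(years, noise)]

best_r2, best_k, best_m = fit_logistic_grid(years, intensity, cap=1.0)
print(f"R2 = {best_r2:.4f}, k = {best_k:.2f}, m = {best_m}")
```

An R² close to 1 means the logistic curve explains nearly all of the variance in the intensity series. The abstract's threshold of 0.90 is applied per topic in the same way.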
Zhang Xin, Wen Yi, Xu Haiyun, Liu Zhongyu. Prophet Prediction-Correction Topic Evolution Model: A Case Study in Stem Cell Field[J]. Library and Information Service, 2020, 64(8): 78-92.
DOI: 10.13266/j.issn.0252-3116.2020.08.010