KNOWLEDGE ORGANIZATION

Analysis of Domain Topic Evolution Path Based on Multi-Source Data

  • Zhang Jing,
  • Zhu Xiangli
  • 1 National Science Library, Chinese Academy of Sciences, Beijing 100190;
    2 Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190

Received date: 2023-02-03

Revised date: 2023-04-11

Online published: 2023-07-28

Abstract

[Purpose/Significance] To grasp the development patterns and evolution trends of science and technology domain topics comprehensively, objectively, efficiently, and intuitively, this paper proposes a framework for identifying and analyzing domain topic evolution paths based on multi-source data. [Method/Process] Scientific and technological literature data are collected from different sources. A multi-dimensional ordered sample clustering method assists time slicing; an improved bag-of-words construction method enhances topic identification by the LDA model; the Louvain community detection algorithm fuses the multi-source data at the topic level; and the domain topic evolution paths are then analyzed. [Result/Conclusion] An empirical study of the terahertz research field in the United States is conducted using data from three sources: funded projects, papers, and patents. The results show that the data from the three sources are clearly divided into four distinct time windows, that the improved bag-of-words construction method represents domain information more accurately, and that simplified topic communities help extract topic evolution paths from complex multi-source evolution networks.
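
The time-slicing step can be illustrated with a minimal sketch of ordered sample clustering. The abstract does not give the exact algorithm, so this assumes the classical optimal-partition formulation: contiguous segments are chosen to minimize the total within-segment sum of squared deviations, solved by dynamic programming. The function names, the feature vectors, and the choice of k below are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def segment_cost(samples: Sequence[Sequence[float]], start: int, end: int) -> float:
    """Sum of squared deviations from the mean for samples[start:end]."""
    block = samples[start:end]
    dims = len(block[0])
    means = [sum(row[d] for row in block) / len(block) for d in range(dims)]
    return sum((row[d] - means[d]) ** 2 for row in block for d in range(dims))


def ordered_clustering(
    samples: Sequence[Sequence[float]], k: int
) -> Tuple[List[Tuple[int, int]], float]:
    """Split time-ordered samples into k contiguous segments of minimal total cost.

    Returns the segments as half-open index ranges (start, end) and the total cost.
    """
    n = len(samples)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")
    inf = float("inf")
    # best[j][m]: minimal cost of splitting the first j samples into m segments.
    best = [[inf] * (k + 1) for _ in range(n + 1)]
    # cut[j][m]: start index of the last segment in that optimal split.
    cut = [[0] * (k + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                cost = best[i][m - 1] + segment_cost(samples, i, j)
                if cost < best[j][m]:
                    best[j][m] = cost
                    cut[j][m] = i
    # Walk back through the stored cut points to recover the segments.
    segments: List[Tuple[int, int]] = []
    j = n
    for m in range(k, 0, -1):
        i = cut[j][m]
        segments.append((i, j))
        j = i
    segments.reverse()
    return segments, best[n][k]


if __name__ == "__main__":
    # Invented yearly feature vectors (for example, normalized output per source).
    yearly = [[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.8], [9.0, 9.1], [9.1, 9.0]]
    windows, total = ordered_clustering(yearly, k=3)
    print(windows, round(total, 4))
```

Each returned index range corresponds to one candidate time window. In practice the number of windows would be chosen by inspecting how the total cost falls as k increases, which this sketch leaves to the caller.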

Cite this article

Zhang Jing, Zhu Xiangli. Analysis of Domain Topic Evolution Path Based on Multi-Source Data[J]. Library and Information Service, 2023, 67(14): 94-108. DOI: 10.13266/j.issn.0252-3116.2023.14.010
