Intelligence Research

High-Value Hot Topic Identification and Evolutionary Path Analysis Based on Dynamic Semantic Network

  • Teng Jie (滕婕),
  • Liu Li (刘莉),
  • Li Shuo (李硕),
  • Hu Guangwei (胡广伟)
  • 1 School of Information Management, Nanjing University, Nanjing 210023;
    2 Government Data Resources Institution, Nanjing University, Nanjing 210023
Teng Jie, doctoral candidate; Liu Li, master's student; Li Shuo, master's student.

Received date: 2022-08-31

  Revised date: 2022-10-28

  Online published: 2023-04-15

Funding

This paper is one of the research outcomes of the National Social Science Fund of China major project "Research on the Precise Construction of an Urban and Rural Community Service System Driven by Big Data" (Grant No. 20&ZD154) and the National Social Science Fund of China major project "Research on an Evaluation System for Marketing Service Channel Effectiveness and Channel Synergy Effectiveness" (Grant No. SGJSYF00YHJS2000144).

Cite this article

Teng Jie, Liu Li, Li Shuo, Hu Guangwei. High-Value Hot Topic Identification and Evolutionary Path Analysis Based on Dynamic Semantic Network[J]. Library and Information Service (图书情报工作), 2023, 67(7): 92-106. DOI: 10.13266/j.issn.0252-3116.2023.07.009

Abstract

[Purpose/Significance] To identify social demand topics efficiently and accurately, capture the turning points at which those demands shift, and track how topics evolve, thereby supporting the harmonious and orderly development of government services and social governance. [Method/Process] This study proposes a method for high-value topic identification and evolution path analysis based on semantic networks. First, following the idea of local contextual semantic parsing, lexical co-occurrence is used to construct a dynamic semantic relation network. Second, a community detection algorithm identifies sub-communities, the RFM model divides keywords into value tiers, and topic labels are derived from the keywords in the high-value tier. Third, topic similarity between adjacent time intervals is computed to reflect topic evolution relationships. Finally, the model is validated on social demand data from Shanghai: its topic identification performance is compared with that of the K-means method and evaluated by precision, recall, and F1 score. [Result/Conclusion] The results show that the method's improvement margin exceeds 0.3 on all three metrics, indicating a clear gain. The study can build a panoramic view of the public concerns reflected in the leaders' mailbox modules of government websites, and offers new ideas for exploring other social text mining methods and for supporting big data analysis in national governance.
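Two of the steps named in the abstract, building a co-occurrence network from local context and linking topics across adjacent time intervals by similarity, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The sliding-window size, the choice of cosine similarity, the 0.5 threshold, the keyword weights, and all function and variable names below are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def cooccurrence_edges(docs, window=3):
    """Count word pairs that appear within `window` tokens of each other.

    `docs` is a list of token lists. The window size is an assumed parameter.
    """
    edges = Counter()
    for tokens in docs:
        for i, left in enumerate(tokens):
            # Pair the current token with the tokens that follow it
            # inside the local context window.
            for right in tokens[i + 1:i + window]:
                if left != right:
                    edges[tuple(sorted((left, right)))] += 1
    return edges


def cosine_similarity(topic_a, topic_b):
    """Cosine similarity between two topics given as {keyword: weight} dicts."""
    shared = set(topic_a) & set(topic_b)
    dot = sum(topic_a[w] * topic_b[w] for w in shared)
    norm_a = sqrt(sum(v * v for v in topic_a.values()))
    norm_b = sqrt(sum(v * v for v in topic_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def evolution_links(topics_t, topics_next, threshold=0.5):
    """Link topics in adjacent intervals whose similarity reaches `threshold`."""
    links = []
    for name_a, keywords_a in topics_t.items():
        for name_b, keywords_b in topics_next.items():
            sim = cosine_similarity(keywords_a, keywords_b)
            if sim >= threshold:
                links.append((name_a, name_b, round(sim, 3)))
    return links


# Toy data (hypothetical): two tokenized messages and two time intervals.
docs = [
    ["noise", "construction", "night"],
    ["noise", "night", "complaint"],
]
edges = cooccurrence_edges(docs, window=3)

topics_2021 = {"housing": {"rent": 2.0, "property": 1.0}}
topics_2022 = {
    "housing_fees": {"rent": 2.0, "property": 1.0},
    "traffic": {"parking": 1.0},
}
links = evolution_links(topics_2021, topics_2022)
print(edges.most_common(3))
print(links)
```

In a fuller pipeline, the edge counts would feed a community detection step, and each community's keywords would be weighted (for example by an RFM-style score) before the similarity comparison. Both steps are omitted here because the abstract does not specify their details.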
