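Special Topic: Semantic Organization and Mining of Pre-Qin Classics

Topic Mining and Evolution Analysis of Social Development in the Spring and Autumn Period: A Case Study of Zuo Zhuan

  • He Lin,
  • Qiao Yue,
  • Liu Xueqi
  • Department of Information Management, College of Information Science & Technology, Nanjing Agricultural University, Nanjing 210095
He Lin (ORCID: 0000-0002-4207-3588), professor, PhD, doctoral supervisor, E-mail: helin@njau.edu.cn; Qiao Yue (ORCID: 0000-0002-1968-9608), master's student; Liu Xueqi (ORCID: 0000-0002-5346-7291), master's student.

Received: 2019-07-10

  Revised: 2019-10-26

  Published online: 2020-04-05

Funding

This work is one of the research outcomes of the National Social Science Fund of China project "Automatic Construction Method of a Knowledge Representation System for Traditional Chinese Culture Based on Classics" (Grant No. 18BTQ063).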


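Abstract

[Purpose/Significance] Against the background of rapidly developing humanities computing, this paper applies text mining to cluster the text of Zuo Zhuan. The work offers a reference for quantitative analyses, such as topic mining, of social development in the Spring and Autumn Period, and for the multi-dimensional reorganization and analysis of classical texts. [Method/Process] Text clustering is used to analyze Zuo Zhuan quantitatively along several dimensions, breaking away from its linear, annalistic order of record. First, a word-matching algorithm extracts a corpus for each vassal state from the feature-word corpus of Zuo Zhuan. Next, the LDA topic model is applied first to the full feature-word corpus and then to the corpora of selected vassal states. Finally, topic strength is calculated in combination with time information. [Result/Conclusion] The experiments show that the topic-word distributions reveal what developed in each aspect of Spring and Autumn society and of the vassal states, and that the topic strength curves summarize the corresponding development trends. LDA topic clustering ultimately presents the development and change of society as a whole, and of different vassal states, in war, politics, diplomacy, and other areas.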
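The pipeline in the abstract (word matching per state, then topic strength over time) can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the strength formula, so the code assumes a common definition, the mean document-topic proportion per time period. The record layout (`period`, `tokens`, `theta`), the function names, the period labels, and the toy values are all hypothetical, and the `theta` vectors stand in for the output of an LDA model fitted elsewhere.

```python
from collections import defaultdict


def select_state_docs(docs, state_terms):
    """Word matching: keep documents whose tokens include any term for the state."""
    terms = set(state_terms)
    return [doc for doc in docs if terms.intersection(doc["tokens"])]


def topic_strength_by_period(docs):
    """Assumed definition: average each topic's proportion over a period's documents."""
    sums = {}
    counts = defaultdict(int)
    for doc in docs:
        period = doc["period"]
        theta = doc["theta"]
        if period not in sums:
            sums[period] = [0.0] * len(theta)
        for k, proportion in enumerate(theta):
            sums[period][k] += proportion
        counts[period] += 1
    return {
        period: [total / counts[period] for total in totals]
        for period, totals in sums.items()
    }


# Hypothetical toy records; theta would come from a fitted LDA model.
docs = [
    {"period": "Duke Yin", "tokens": ["郑", "伐", "卫"], "theta": [0.75, 0.25]},
    {"period": "Duke Yin", "tokens": ["鲁", "盟", "齐"], "theta": [0.25, 0.75]},
    {"period": "Duke Huan", "tokens": ["郑", "盟", "鲁"], "theta": [0.25, 0.75]},
]

zheng_docs = select_state_docs(docs, ["郑", "郑伯"])  # corpus for one state
strength = topic_strength_by_period(docs)             # one strength vector per period
```

Plotting each topic's value in `strength` across periods in chronological order would give a topic strength curve of the kind the abstract describes; running the same function on `zheng_docs` would give the curve for a single state.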

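Cite this article

He Lin, Qiao Yue, Liu Xueqi. Topic Mining and Evolution Analysis of Social Development in the Spring and Autumn Period: A Case Study of Zuo Zhuan[J]. Library and Information Service, 2020, 64(7): 30-38. DOI: 10.13266/j.issn.0252-3116.2020.07.004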

