[Purpose/significance] Studying the war events recorded in Zuo Zhuan provides an important reference for research on pre-Qin history and, more broadly, on Chinese culture. [Method/process] Based on frame theory, this paper constructs a basic frame system for war events in Zuo Zhuan and identifies war sentences by pattern matching. It then uses a conditional random field (CRF) model with feature templates to recognize and extract seven named entities, including the time of the war and the warring parties. Finally, the war events are analyzed and visualized from the resulting structured data. [Result/conclusion] The results show that the CRF model works well for extracting war events from Zuo Zhuan, and that the choice of features affects entity-recognition results. In terms of content, Jin, Chu, Qi, and Zheng were among the states that went to war most often during the Spring and Autumn Period; Jin was the principal attacker and Zheng the principal defender.
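The two extraction steps summarized above can be illustrated with a toy sketch. It assumes a small, hypothetical list of state names and war verbs (伐, 侵, 围, 袭), a regular expression standing in for the pattern-matching step, and a CRF++-style unigram/bigram window as one example of a feature template. None of these are the paper's actual patterns or templates, and the sample sentences are invented. The regex also pulls out the attacker and defender directly, which is a simplification: in the paper those entities come from the trained CRF, not from rules.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, illustrative lists; not the paper's actual lexicon.
STATES = "晋|楚|齐|郑|宋|鲁|秦|卫"
WAR_VERBS = "伐|侵|围|袭"

# Assumed pattern: <state>[optional title/army suffix]<war verb><state>, e.g. 晋人伐郑
WAR_PATTERN = re.compile(rf"({STATES})(?:侯|人|师|子)?({WAR_VERBS})({STATES})")


def is_war_sentence(sentence: str) -> bool:
    """Pattern-matching step: flag a sentence as a war sentence if the pattern fires."""
    return WAR_PATTERN.search(sentence) is not None


def extract_war_event(sentence: str) -> dict[str, str] | None:
    """Rule-based stand-in (not CRF) for two of the seven entities: attacker and defender."""
    m = WAR_PATTERN.search(sentence)
    if m is None:
        return None
    return {"attacker": m.group(1), "verb": m.group(2), "defender": m.group(3)}


def char_features(sentence: str, i: int) -> dict[str, str]:
    """Example CRF++-style template for character i: unigrams in a [-2, +2] window plus two bigrams."""
    def at(k: int) -> str:
        j = i + k
        if j < 0:
            return "<BOS>"  # padding before the sentence start
        if j >= len(sentence):
            return "<EOS>"  # padding after the sentence end
        return sentence[j]

    feats = {f"U[{k}]": at(k) for k in range(-2, 3)}
    feats["B[-1,0]"] = at(-1) + at(0)
    feats["B[0,1]"] = at(0) + at(1)
    return feats


def tally(sentences: list[str]) -> tuple[Counter, Counter]:
    """Count how often each state appears as attacker and as defender."""
    attackers: Counter = Counter()
    defenders: Counter = Counter()
    for s in sentences:
        event = extract_war_event(s)
        if event is not None:
            attackers[event["attacker"]] += 1
            defenders[event["defender"]] += 1
    return attackers, defenders


if __name__ == "__main__":
    # Invented toy sentences, not quotations from the corpus.
    toy = ["晋人伐郑", "楚师围宋", "晋侯伐郑", "齐侯会鲁侯"]
    print([is_war_sentence(s) for s in toy])  # [True, True, True, False]
    attackers, defenders = tally(toy)
    print(attackers.most_common(), defenders.most_common())
    print(char_features("晋人伐郑", 0))
```

The per-character dictionaries returned by `char_features` have the list-of-dicts shape that CRF toolkits such as sklearn-crfsuite accept as training input; a real system would pair them with BIO labels for all seven entity types rather than relying on the regex.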