[Purpose/significance] Automatic event recognition and extraction is an important topic in current research on topic mining of ancient classics. Recognizing event trigger words is a foundational task that determines the quality of event extraction. This article explores general methods for automatically recognizing and classifying event trigger words in ancient classics. [Method/process] First, we explored how to construct a classification schema for trigger verbs by applying LDA topic clustering to the ancient classics, combined with qualitative analysis. Once the classification schema was confirmed, we built a preliminary seed set of trigger words from the clustering results. We then expanded the trigger verb seed set by computing semantic similarity over the text of the ancient classics. The experiment used Zuo Zhuan, an important classic of the Spring and Autumn (Chunqiu) period, as its data source, and evaluated both the construction of the trigger verb classification and the effectiveness of expanding trigger verbs from the seed set. [Result/conclusion] The results show that the proposed method is feasible and effective, and that the event trigger word set constructed with it is highly credible. The sample size and scope of the experiment can be further expanded in future work.
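The seed-expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the category names, the toy three-dimensional word vectors, the cosine-similarity measure, and the 0.9 threshold are all assumptions chosen for illustration. In practice the vectors would come from an embedding model trained on the classical corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_seeds(seeds, vectors, threshold=0.8):
    """Assign each non-seed word to the category of its most similar seed,
    provided that similarity reaches the threshold.

    seeds:   dict mapping category name -> set of seed trigger verbs
    vectors: dict mapping word -> embedding (list of floats)
    """
    expanded = {cat: set(words) for cat, words in seeds.items()}
    known = set().union(*seeds.values())
    for word, vec in vectors.items():
        if word in known:
            continue
        best_cat, best_sim = None, threshold
        for cat, words in seeds.items():
            for seed in words:
                if seed not in vectors:
                    continue
                sim = cosine(vec, vectors[seed])
                if sim >= best_sim:
                    best_cat, best_sim = cat, sim
        if best_cat is not None:
            expanded[best_cat].add(word)
    return expanded

# Hypothetical toy embeddings for five verbs found in Zuo Zhuan.
VECTORS = {
    "伐": [0.9, 0.1, 0.0],  # attack
    "侵": [0.8, 0.2, 0.1],  # invade
    "盟": [0.1, 0.9, 0.1],  # make a covenant
    "会": [0.2, 0.8, 0.0],  # meet
    "卒": [0.0, 0.1, 0.9],  # die
}

# Hypothetical categories; the paper's actual schema is not given in the abstract.
SEEDS = {"war": {"伐"}, "diplomacy": {"盟"}}

result = expand_seeds(SEEDS, VECTORS, threshold=0.9)
for category, words in result.items():
    print(category, sorted(words))
```

With these toy vectors, 侵 joins the "war" category and 会 joins "diplomacy", while 卒 is left unassigned because it is not close enough to any seed. A stricter threshold trades coverage for precision, which is the tension an expansion experiment of this kind has to evaluate.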
He Lin, Ma Xiaowen, Yu Xuehan, Ai Yuxi, Li Zhangchao, Gao Dan. Research on Recognition of Verbs Triggered by Events in Ancient Classics: Textual Experiments Based on Zuo Zhuan[J]. Library and Information Service, 2022, 66(5): 133-141.
DOI: 10.13266/j.issn.0252-3116.2022.05.014