[Purpose/significance] Textual variants (yiwen) are common in ancient Chinese books and are an important object of study. Traditional collation relies on manually searching large numbers of ancient texts for collation material, including variants; this is time-consuming and labor-intensive, and the material found is not necessarily accurate or complete. Mining variants automatically by computer makes it possible to draw useful information from a much larger corpus. Collation supported by automatic variant mining also allows exhaustive retrieval, which is of great value for collating a text against other works (tajiao), and offers new ideas and methods for collation research on ancient books. [Method/process] Using the Spring and Autumn Annals (Chunqiu) and its three commentaries as the experimental corpus, this study introduces the idea of the parallel corpus, commonly used in machine translation, and combines it with deep learning. It compares LSTM and BERT models with the more classical SVM model, and further explores and discusses same-event variants, that is, passages in two ancient books that describe the same event in different wording. [Result/conclusion] The experiments yield a deep learning model for automatically mining same-event variants that is suited to the three commentaries on the Spring and Autumn Annals. The results demonstrate the feasibility of integrating emerging technologies such as deep learning into research such as the construction of knowledge bases for ancient books. They also show that combining deep learning with the parallel corpus idea can play a substantial role in the study of variants, providing practical support for applying digital humanities to research on Chinese language and literature.
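As a rough illustration of the parallel-corpus idea described above, the sketch below enumerates sentence pairs drawn from two aligned texts and pre-scores them with character-bigram overlap. It is not the study's method: the study trains SVM, LSTM, and BERT classifiers, whereas this is only a simple lexical baseline for shortlisting candidate pairs. The function names (`char_bigrams`, `jaccard`, `candidate_pairs`), the threshold value, and the short sample sentences are all assumptions introduced for illustration.

```python
from itertools import product


def char_bigrams(text):
    """Return the set of character bigrams, skipping punctuation and spaces."""
    chars = [c for c in text if c.isalnum()]
    return {a + b for a, b in zip(chars, chars[1:])}


def jaccard(a, b):
    """Jaccard similarity of the character-bigram sets of two sentences."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def candidate_pairs(source, target, threshold=0.3):
    """Pair every source sentence with every target sentence and keep the
    pairs whose overlap reaches the (hypothetical) threshold, best first."""
    pairs = []
    for s, t in product(source, target):
        score = jaccard(s, t)
        if score >= threshold:
            pairs.append((s, t, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Illustrative toy input: two sentences from one text, one from another.
annals = ["三月,公及邾仪父盟于蔑。", "夏五月,郑伯克段于鄢。"]
commentary = ["三月,公及邾仪父盟于蔑,邾子克也。"]

for s, t, score in candidate_pairs(annals, commentary):
    print(f"{score:.2f}\t{s}\t{t}")
```

A lexical filter of this kind can only surface pairs that share wording. Same-event variants that are phrased very differently would need a learned model such as those compared in the study.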