Knowledge Organization

Knowledge Extraction and Cultural Schema Construction of Classical Poetry Imagery from the Digital Humanities

  • Zhang Wei,
  • Wang Hao,
  • Li Xiaomin,
  • Song Min
  • 1. School of Information Management, Nanjing University, Nanjing 210023;
    2. College of Liberal Arts, Yonsei University, Seoul 03722;
    3. Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023
Zhang Wei, doctoral candidate; Li Xiaomin, doctoral candidate; Song Min, professor, PhD, doctoral supervisor.

Received date: 2022-07-01

Revised date: 2022-10-12

Online published: 2022-12-27

Funding

This work is one of the research outcomes of the General Program of the National Natural Science Foundation of China, "Semantic Parsing and Humanities Computing of Chinese Intangible Cultural Heritage Texts Driven by Linked Data" (Grant No. 72074108), and of the 2021 Jiangsu Province Postgraduate Research and Innovation Program, "Semantic Parsing and Knowledge Graph Construction of Medical Texts for Mental Health" (Grant No. KYCX21_0026).

Cite this article

Zhang Wei, Wang Hao, Li Xiaomin, Song Min. Knowledge Extraction and Cultural Schema Construction of Classical Poetry Imagery from the Digital Humanities[J]. Library and Information Service (图书情报工作), 2022, 66(24): 104-117. DOI: 10.13266/j.issn.0252-3116.2022.24.010

Abstract

[Purpose/Significance] Imagery in classical Chinese poetry typically uses object images (the source domain) as metaphors for people, atmosphere, or emotion (the target domain). Knowledge about these objects and emotions, however, remains scattered across multi-source, heterogeneous, unstructured poetry texts and has not yet been organized into cultural schemas backed by a knowledge interpretation system.

[Method/Process] This paper proposes an ontology-based model for constructing cultural schemas of classical poetry imagery, together with a technical implementation. First, two tasks are defined: object term extraction based on sequence labeling, and object-emotion relation extraction based on relation classification. Second, because no training corpus is available, a knowledge system of Chinese domain object terms is built to annotate object terms in text automatically; structure-level rule templates and content-level concept co-occurrence constraints are designed to generate imagery relations in text automatically; and deep learning is then used to extract object terms and imagery knowledge.

[Result/Conclusion] Experiments on classical poetry appreciation texts use an object knowledge system of 5 first-level and 12 second-level classes to annotate 29,765 object terms, and trigger words together with co-occurrence frequency constraints yield 8,977 structure-level and content-level imagery relations. Object term extraction with BERT-BiLSTM-CNN-CRF reaches F1 values mostly above 95%, and object-emotion relation extraction with BERT-SE-FC reaches accuracy above 94% in all cases; both models also generalize to a large number of new object terms and new imagery relations. Storing the imagery knowledge as a knowledge graph and exploring its associations shows that imagery exclusive to the "fondness" category includes <spring scenery (春光), attachment (依恋)> and <willow (杨柳), farewell (送别)>, among others, forming a cultural schema in which classical poems express fondness through spring objects. Generic objects such as "Chang’an", "woman", and "bright moon", by contrast, support multiple cultural schemas that serve as metaphors for different emotions.
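
The automatic annotation step described in the abstract (lexicon-based labeling of object terms, then trigger-word templates and co-occurrence frequency constraints for relations) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the lexicon, emotion terms, trigger words, class names, sample sentences, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical toy lexicon: object term -> class label (not the paper's taxonomy).
OBJECT_LEXICON = {"杨柳": "plant", "明月": "celestial", "春光": "season"}
# Hypothetical emotion terms and trigger words, for illustration only.
EMOTION_TERMS = {"送别", "依恋", "思乡"}
TRIGGERS = ("象征", "寄托", "表达")

# Hypothetical sentences in the style of poetry appreciation text.
SENTENCES = ["杨柳象征送别之情", "明月照人，思乡更切", "诗人望明月而思乡"]


def bio_tag(sentence, lexicon):
    """Tag each character as B-/I-<class> or O using longest-match lookup."""
    tags = ["O"] * len(sentence)
    terms = sorted(lexicon, key=len, reverse=True)  # prefer longer terms
    i = 0
    while i < len(sentence):
        match = next((t for t in terms if sentence.startswith(t, i)), None)
        if match is None:
            i += 1
            continue
        tags[i] = "B-" + lexicon[match]
        for j in range(i + 1, i + len(match)):
            tags[j] = "I-" + lexicon[match]
        i += len(match)
    return tags


def find_terms(text, vocabulary):
    """Return every vocabulary term that occurs in the text."""
    return [term for term in vocabulary if term in text]


def structural_relations(sentences):
    """Rule template: object before a trigger word, emotion after it."""
    found = set()
    for s in sentences:
        for trigger in TRIGGERS:
            pos = s.find(trigger)
            if pos < 0:
                continue
            left, right = s[:pos], s[pos + len(trigger):]
            found.update(product(find_terms(left, OBJECT_LEXICON),
                                 find_terms(right, EMOTION_TERMS)))
    return found


def cooccurrence_relations(sentences, min_count=2):
    """Keep object-emotion pairs that share a sentence at least min_count times."""
    counts = Counter()
    for s in sentences:
        counts.update(set(product(find_terms(s, OBJECT_LEXICON),
                                  find_terms(s, EMOTION_TERMS))))
    return {pair for pair, n in counts.items() if n >= min_count}


print(bio_tag(SENTENCES[0], OBJECT_LEXICON))
print(structural_relations(SENTENCES))
print(cooccurrence_relations(SENTENCES))
```

In a pipeline of this shape, the BIO tags would serve as training labels for a sequence labeling model and the generated pairs as training labels for a relation classifier; a frequency threshold trades recall for precision in the co-occurrence branch.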
