Library and Information Service, 2022, Vol. 66, Issue (24): 104-117. DOI: 10.13266/j.issn.0252-3116.2022.24.010

• Knowledge Organization •

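Knowledge Extraction and Cultural Schema Construction of Classical Poetry Imagery from the Digital Humanities

Zhang Wei1,2,3, Wang Hao1,3, Li Xiaomin1,3, Song Min2

  1. School of Information Management, Nanjing University, Nanjing 210023;
  2. College of Liberal Arts, Yonsei University, Seoul 03722;
  3. Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023
  • Received: 2022-07-01  Revised: 2022-10-12  Online: 2022-12-20  Published: 2022-12-27
  • Corresponding author: Wang Hao, professor, PhD, doctoral supervisor, E-mail: ywhaowang@nju.edu.cn
  • About the authors: Zhang Wei, PhD candidate; Li Xiaomin, PhD candidate; Song Min, professor, PhD, doctoral supervisor.
  • Funding:
    This paper is one of the outputs of the National Natural Science Foundation of China General Program project "Semantic Parsing and Humanities Computing of China's Intangible Cultural Heritage Texts Driven by Linked Data" (No. 72074108) and the 2021 Jiangsu Province Postgraduate Research and Innovation Program project "Semantic Parsing and Knowledge Graph Construction of Medical Texts for Mental Health" (No. KYCX21_0026).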

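Abstract: [Purpose/Significance] Imagery in classical Chinese poetry often uses concrete objects (the source domain) as metaphors for people or for the emotional atmosphere of a scene (the target domain). Knowledge about these objects and emotions, however, is still scattered across multi-source, heterogeneous, unstructured poetry texts and has not been organized into cultural schemas backed by a knowledge-interpretation system. [Method/Process] This paper proposes an ontology-based model for constructing cultural schemas of classical poetry imagery, together with a technical implementation. First, two tasks are defined: object-term extraction, formulated as sequence labeling, and object-emotion relation extraction, formulated as relation classification. Second, since no training corpus is available, a knowledge system of Chinese-domain object terms is built and used to annotate object terms in the texts automatically, and structure-level rule templates and content-level concept co-occurrence constraints are designed to generate imagery relations in the texts automatically. On this basis, deep learning is used to extract object terms and imagery knowledge. [Result/Conclusion] In experiments on classical poetry appreciation texts, an object knowledge system with 5 first-level and 12 second-level classes labels 29,765 object terms, and trigger-word and co-occurrence-frequency constraints yield 8,977 structure- and content-level imagery relations. Object-term extraction with BERT-BiLSTM-CNN-CRF achieves F1 scores mostly above 95%, object-emotion relation extraction with BERT-SE-FC achieves accuracy consistently above 94%, and both models generalize to a large number of new object terms and new imagery relations. Storing the imagery knowledge in a knowledge graph and linking it shows that imagery exclusive to the "fondness" (喜爱) category includes <spring scenery, attachment> (<春光,依恋>) and <willows, farewell> (<杨柳,送别>), forming a cultural schema in which classical poems express fondness through spring objects. Generic objects such as "Chang'an", "woman", and "bright moon", by contrast, can form several cultural schemas that serve as metaphors for different emotions.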
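Because no annotated corpus exists, the method generates training data automatically: object terms are labeled from a term knowledge system, and imagery relations are kept only when a structural rule (a trigger word) and a content constraint (co-occurrence frequency) are both satisfied. The Python sketch below illustrates that idea under stated assumptions. The lexicon, class labels, trigger words, emotion list, and `min_count` threshold are all hypothetical; forward maximum matching is one common way to project a lexicon onto text as BIO tags, not necessarily the authors' choice; and treating an object linked to a single emotion as "exclusive" and one linked to several as "generic" is a simplification made here. The resulting labels would only be silver data for models such as BERT-BiLSTM-CNN-CRF and BERT-SE-FC, which are not reproduced.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for the object-term knowledge system;
# the class labels are illustrative, not the paper's 5/12-class scheme.
OBJECT_LEXICON = {"杨柳": "PLANT", "明月": "CELESTIAL", "春光": "SEASON", "长安": "PLACE"}

# Hypothetical trigger words and emotion terms for appreciation-text sentences.
TRIGGERS = ("寄托", "表达", "抒发")
EMOTIONS = ("送别", "思乡", "依恋")


def bio_tag(text, lexicon):
    """Character-level BIO labels via forward maximum matching against the lexicon."""
    tags = ["O"] * len(text)
    max_len = max(len(term) for term in lexicon)
    i = 0
    while i < len(text):
        # Try the longest possible lexicon match starting at position i first.
        for size in range(min(max_len, len(text) - i), 0, -1):
            label = lexicon.get(text[i:i + size])
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + size):
                    tags[j] = "I-" + label
                i += size
                break
        else:
            i += 1  # no term starts here; move one character forward
    return tags


def candidate_relations(sentences, lexicon, emotions=EMOTIONS, triggers=TRIGGERS, min_count=2):
    """(object, emotion) pairs co-occurring in trigger-bearing sentences at least min_count times."""
    counts = Counter()
    for sent in sentences:
        if not any(t in sent for t in triggers):
            continue  # structural constraint: require a trigger word
        objects = {term for term in lexicon if term in sent}
        found = {e for e in emotions if e in sent}
        for obj in objects:
            for emo in found:
                counts[(obj, emo)] += 1
    # Content constraint: keep only pairs with enough co-occurrence support.
    return {pair for pair, n in counts.items() if n >= min_count}


def split_exclusive_generic(relations):
    """Objects tied to one emotion count as exclusive; objects tied to several as generic."""
    by_object = defaultdict(set)
    for obj, emo in relations:
        by_object[obj].add(emo)
    exclusive = {o for o, emos in by_object.items() if len(emos) == 1}
    generic = {o for o, emos in by_object.items() if len(emos) > 1}
    return exclusive, generic


if __name__ == "__main__":
    print(bio_tag("杨柳依依", OBJECT_LEXICON))  # ['B-PLANT', 'I-PLANT', 'O', 'O']
    sentences = [
        "诗人借杨柳寄托送别之情",
        "杨柳表达送别的不舍",
        "明月寄托思乡之情",
        "明月抒发送别的惆怅",
        "明月表达思乡",
        "明月抒发送别",
        "春光寄托依恋",  # seen only once, so filtered out by min_count
        "明月照长安",  # no trigger word, so skipped
    ]
    rels = candidate_relations(sentences, OBJECT_LEXICON)
    print(sorted(rels))
    print(split_exclusive_generic(rels))
```

Longest-match tagging keeps a short lexicon entry from splitting a longer term, and raising the frequency threshold trades recall for precision in the generated relations; both choices are illustrative rather than tuned.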

Keywords: classical poetry imagery, knowledge extraction, knowledge ontology, cultural schema, deep learning