Library and Information Service (图书情报工作), 2021, Vol. 65, Issue 8: 97-106. DOI: 10.13266/j.issn.0252-3116.2021.08.010

• Knowledge Organization •

A Method of Entity Semantic Similarity Calculation for Dunhuang Mural Thesaurus Linked Data with Experiment

Gao Jinsong¹, Fu Jiawei¹, Li Ke²

  ¹ School of Information Management, Central China Normal University, Wuhan 430079;
  ² Qingdao Hisense Hitachi Air Conditioning Marketing Co., Ltd, Qingdao 266510
  • Received: 2020-09-28 Revised: 2021-01-08 Online: 2021-04-20 Published: 2021-06-02
  • About the authors: Gao Jinsong (高劲松; ORCID: 0000-0003-0022-5923), professor, doctoral supervisor, E-mail: jsgao@mail.ccnu.edu.cn; Fu Jiawei (付家炜; ORCID: 0000-0002-2996-3762), doctoral candidate; Li Ke (李珂; ORCID: 0000-0002-5212-9733), master's degree.
  • Funding:
    This paper is one of the research outcomes of the Major Program of the National Social Science Fund of China, "Research on the Reconstruction of China's Literature and Information Resource Guarantee System in the New Era" (Grant No. 19ZDA345), and the Free Exploration Project of the Fundamental Research Funds for the Central Universities, "Research on User-Oriented Knowledge Services for Cultural Relic Information Resources" (Grant No. CCNU20A06025).

Abstract: [Purpose/significance] With the rise of cultural heritage digitization and the humanities computing paradigm, humanities scholars engaged in digital humanities research have an increasingly prominent need to use cultural heritage data resources. The semantic integration and interoperability of multi-source, heterogeneous cultural heritage information resources has become a key issue in building digital humanities data infrastructure, and an effective method for calculating entity semantic similarity is an important means of achieving this goal. [Method/process] Taking Dunhuang Mural Thesaurus Linked Data as an example, and based on an analysis of the dataset's ontology model and data framework, this paper proposes an entity semantic similarity calculation method that combines multi-granularity matching with weighted calculation, designed around the dataset's content distribution and structural characteristics. Entities related to "Feitian" (飞天) in the dataset were selected as the experimental objects, and the proposed method was compared with several existing entity semantic similarity calculation methods, including those based on attribute features and on edit distance. [Result/conclusion] The experimental results show that, compared with the other methods, the proposed method based on multi-granularity matching adapts better to the content and structural characteristics of Dunhuang Mural Thesaurus Linked Data and yields more accurate results. It therefore offers another feasible approach to promoting data interconnection and knowledge sharing among heterogeneous humanities information resources in the context of digital humanities.

Key words: Dunhuang murals, linked data, multi-granularity, semantic similarity, entity similarity
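
To illustrate the general idea named in the abstract (matching at several granularities, then combining the scores by weight), the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not state the granularity levels, the weights, or the formulas, so the three levels used here (label, attribute values, linked relations), the weight values in `WEIGHTS`, the Jaccard measure, and the sample entities are all assumptions made for illustration. Edit distance is used only for the label level, as one of the baseline techniques the abstract mentions.

```python
from typing import Dict, Set, Union

Entity = Dict[str, Union[str, Set[str]]]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def label_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical labels."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard(x: Set[str], y: Set[str]) -> float:
    """Set overlap in [0, 1]; two empty sets are treated as identical."""
    union = x | y
    return 1.0 if not union else len(x & y) / len(union)


# Hypothetical weights per granularity level (assumed values, summing to 1).
WEIGHTS = {"label": 0.4, "attributes": 0.3, "relations": 0.3}


def entity_similarity(e1: Entity, e2: Entity,
                      weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-granularity similarity scores."""
    scores = {
        "label": label_similarity(e1["label"], e2["label"]),
        "attributes": jaccard(e1["attributes"], e2["attributes"]),
        "relations": jaccard(e1["relations"], e2["relations"]),
    }
    return sum(weights[level] * scores[level] for level in weights)


# Hypothetical sample entities; the field values are invented for illustration.
feitian = {
    "label": "Feitian",
    "attributes": {"figure", "flying", "Buddhist"},
    "relations": {"broader:celestial being", "related:musician"},
}
feitian_musician = {
    "label": "Feitian musician",
    "attributes": {"figure", "flying", "instrument"},
    "relations": {"broader:celestial being", "related:Feitian"},
}

if __name__ == "__main__":
    print(round(entity_similarity(feitian, feitian_musician), 3))
```

Keeping each granularity as a separate scoring function makes it straightforward to swap in a different measure at one level, or to tune the weights, without touching the others.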

CLC Number: