[Purpose/significance] As cultural heritage digitization grows and the humanities computing research paradigm gains ground, humanities scholars doing digital humanities research need cultural heritage data resources more and more. A key issue in building digital humanities data infrastructure is the semantic integration and interoperability of multi-source, heterogeneous cultural heritage information resources. An effective method for computing the semantic similarity between entities is an important means of achieving this goal. [Method/process] This paper takes the Dunhuang Mural Thesaurus Linked Data as a case study and analyzes the dataset's ontology model and data framework. Based on how the dataset's content is distributed and structured, it proposes an entity semantic similarity calculation method that combines multi-granularity matching with weighted computation. Entities related to "Feitian" (flying apsaras) in the dataset were selected as experimental objects. The proposed method was compared in experiments against several existing entity semantic similarity methods, including methods based on attribute features and on edit distance. [Result/conclusion] The experimental results show that the proposed multi-granularity matching method adapts better to the content and structural characteristics of the Dunhuang Mural Thesaurus Linked Data. It also achieves higher accuracy than comparable methods. It therefore offers another feasible approach to data interlinking and knowledge sharing among heterogeneous humanities information resources in digital humanities.
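The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea of weighted multi-granularity matching. It is not the authors' method. It assumes three hypothetical granularities: a label string, a set of attribute features, and a set of linked (related) entities. It scores them with two of the baseline techniques mentioned above: a normalized Levenshtein edit distance on labels and Tversky's ratio model of feature similarity on sets. It then takes a weighted average. The entity dictionary keys (`label`, `attributes`, `relations`), the toy "Feitian" entities, and the weights are all hypothetical.

```python
from typing import Dict, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (unit cost for insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def label_similarity(a: str, b: str) -> float:
    """Label-level score: edit distance normalized into [0, 1]; identical labels score 1."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def tversky(x: Set[str], y: Set[str], alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky ratio model: |X & Y| / (|X & Y| + alpha*|X - Y| + beta*|Y - X|)."""
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    return common / denom if denom else 1.0  # two empty sets are treated as identical


def weighted_similarity(e1: Dict, e2: Dict, weights: Dict[str, float]) -> float:
    """Hypothetical weighted average of label-, attribute- and relation-level scores."""
    scores = {
        "label": label_similarity(e1["label"], e2["label"]),
        "attribute": tversky(e1["attributes"], e2["attributes"]),
        "relation": tversky(e1["relations"], e2["relations"]),
    }
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in scores) / total


# Hypothetical toy entities; not taken from the Dunhuang Mural Thesaurus dataset.
feitian = {"label": "Feitian",
           "attributes": {"type:Figure", "posture:Flying", "era:Tang"},
           "relations": {"broader:Deity", "related:Apsara"}}
musician = {"label": "Feitian Musician",
            "attributes": {"type:Figure", "posture:Flying", "holds:Instrument"},
            "relations": {"broader:Feitian"}}
weights = {"label": 0.3, "attribute": 0.4, "relation": 0.3}  # hand-picked, hypothetical

print(round(weighted_similarity(feitian, musician, weights), 4))
```

Here the label and attribute levels reward the shared name prefix and the shared features. The relation level contributes nothing because the two toy entities link to disjoint neighbours. In a real setting the weights would have to be tuned to the dataset's structure rather than fixed by hand.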