[Purpose/significance] With the development of cultural heritage digitization and the humanities computing paradigm, humanities scholars' demand for cultural heritage data resources has become increasingly prominent in digital humanities research. The semantic integration and interoperability of multi-source, heterogeneous cultural heritage information resources has become a key issue in building digital humanities data infrastructure, and an effective method of calculating entity semantic similarity has become an important means of achieving this goal. [Method/process] Based on an analysis of the ontology model and data framework of the Dunhuang Mural Thesaurus Linked Data, this paper proposes an entity semantic similarity calculation method that integrates multi-granularity matching with weighted calculation. "Feitian"-related entities in the dataset were selected as the experimental objects to compare the proposed method with existing semantic similarity calculation methods based on attribute features or edit distance. [Result/conclusion] The experimental results show that, compared with the other methods, the entity semantic similarity calculation method based on multi-granularity matching adapts better to the content and structural characteristics of the Dunhuang Mural Thesaurus Linked Data and calculates similarity more accurately. This paper thus offers another feasible approach to promoting data interconnection and knowledge sharing among heterogeneous humanities information resources in the context of digital humanities.
Gao Jinsong, Fu Jiawei, Li Ke. A Method of Entity Semantic Similarity Calculation for Dunhuang Mural Thesaurus Linked Data with Experiment[J]. Library and Information Service, 2021, 65(8): 97-106.
DOI: 10.13266/j.issn.0252-3116.2021.08.010