Library and Information Service ›› 2014, Vol. 58 ›› Issue (12): 130-135. DOI: 10.13266/j.issn.0252-3116.2014.12.020

• Knowledge Organization •

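Research on Literature Similarity Detection Based on Semantic Role Labeling

Wang Xiaodi, Zhu Na, Bai Rujiang, Wang Xiaoyue

  1. Institute of Scientific & Technical Information, Shandong University of Technology, Zibo 255049
  • Received: 2014-04-30 Revised: 2014-06-03 Online: 2014-06-20 Published: 2014-06-20
  • Corresponding author: Zhu Na, master's student, Institute of Scientific & Technical Information, Shandong University of Technology, E-mail: 742725186@qq.com
  • About the authors: Wang Xiaodi, master's student, Institute of Scientific & Technical Information, Shandong University of Technology; Bai Rujiang, lecturer, Institute of Scientific & Technical Information, Shandong University of Technology; Wang Xiaoyue, professor, Institute of Scientific & Technical Information, Shandong University of Technology.
  • Supported by:

    This paper is one of the research outputs of the National Social Science Fund of China project "Research on Detecting 'Semantic Plagiarism' in Academic Literature" (Grant No. 12CTQ032) and the Shandong University of Technology Humanities and Social Sciences Development Fund project "Web Information Retrieval and Intelligent Mining".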

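Abstract (from the Chinese original):

Semantic role labeling is used to annotate the literature, and semantic similarity is detected with the sentence as the smallest unit. Hypernyms are extracted for all terms in each document, forming a sentence-term-semantic role-hypernym four-partite graph per document. Semantically similar sentence pairs are determined by comparing sentences with reference to these graphs, and the Jaccard coefficient of the similar sentences in two documents is taken as their semantic similarity. Experimental results show that the semantic similarity identified is 13% higher than that of the character-granularity Jaccard coefficient method, the word-granularity Jaccard coefficient method, the Winnowing Jaccard coefficient method and others; however, limited by the corpus, the method still has considerable room for improvement.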
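The three baselines named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no parameters, so the character n-gram size `n`, the Winnowing k-gram length `k`, the window size `w`, and the use of MD5 as the k-gram hash are all assumptions made here for illustration, as are the function names.

```python
import hashlib


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def char_jaccard(text_a: str, text_b: str, n: int = 1) -> float:
    """Character-granularity baseline: Jaccard over character n-grams (n is an assumption)."""
    def grams(text: str) -> set:
        chars = [c for c in text if not c.isspace()]
        return {"".join(chars[i:i + n]) for i in range(len(chars) - n + 1)}
    return jaccard(grams(text_a), grams(text_b))


def word_jaccard(text_a: str, text_b: str) -> float:
    """Word-granularity baseline: Jaccard over lowercased whitespace-separated tokens.
    Chinese text would need a word segmenter first, which is not shown here."""
    return jaccard(set(text_a.lower().split()), set(text_b.lower().split()))


def winnow(text: str, k: int = 5, w: int = 4) -> set:
    """Winnowing fingerprints: hash every k-gram, then keep the minimum hash
    in each window of w consecutive hashes (k, w and MD5 are assumptions)."""
    s = "".join(c.lower() for c in text if c.isalnum())
    hashes = [
        int(hashlib.md5(s[i:i + k].encode("utf-8")).hexdigest()[:8], 16)
        for i in range(len(s) - k + 1)
    ]
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def winnowing_jaccard(text_a: str, text_b: str) -> float:
    """Winnowing baseline: Jaccard over the two fingerprint sets."""
    return jaccard(winnow(text_a), winnow(text_b))
```

All three baselines compare surface strings only, so a paraphrase that swaps in synonyms lowers every score, which is the gap the semantic-role approach is meant to address.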

Keywords (from the Chinese original): semantic role labeling, scientific and technical literature, similarity detection

Abstract:

In recent years, several cases of academic misconduct have caught the attention of both the academic community and the relevant authorities, making similarity detection an active research topic. To cope with semantic plagiarism, researchers have begun to exploit semantic information. This paper proposes a semantic similarity detection method for literature based on semantic role labeling (SRL). First, each paper is labeled with an SRL tool, with the sentence as the unit of comparison. Hypernyms are then extracted using a semantic dictionary, and every paper is represented as a sentence-term-semantic role-hypernym 4-partite graph. Sentences are compared with reference to this graph, and the Jaccard coefficient is computed to represent the similarity between two papers. Owing to the limitations of SRL tools, the detection results are not yet satisfactory; even so, the similarity identified is 13% higher than that of other methods.
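The pipeline described above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the abstract does not say how two sentences are judged similar, so the rule used here (Jaccard over (role, hypernym) pairs with a hypothetical `threshold` of 0.5, matched greedily one-to-one) is invented for the example. The `HYPERNYMS` table and the hand-labeled sentences stand in for the unnamed semantic dictionary and SRL tool.

```python
from collections import defaultdict

# Hypothetical toy hypernym table; the semantic dictionary used in the paper is not named.
HYPERNYMS = {"dog": "animal", "cat": "animal",
             "chases": "pursue", "pursues": "pursue", "ball": "object"}


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(labeled_sentences):
    """Build sentence-term, term-role and term-hypernym edges.
    Each sentence is a list of (term, role) pairs, as an SRL tool might emit."""
    sent_term = defaultdict(set)   # sentence id -> terms
    term_role = defaultdict(set)   # (sentence id, term) -> semantic roles
    term_hyper = {}                # term -> hypernym (falls back to the term itself)
    for sid, sentence in enumerate(labeled_sentences):
        for term, role in sentence:
            sent_term[sid].add(term)
            term_role[(sid, term)].add(role)
            term_hyper[term] = HYPERNYMS.get(term, term)
    return sent_term, term_role, term_hyper


def signature(graph, sid):
    """(role, hypernym) pairs reachable from sentence sid in the graph."""
    sent_term, term_role, term_hyper = graph
    return {(role, term_hyper[term])
            for term in sent_term[sid]
            for role in term_role[(sid, term)]}


def document_similarity(doc_a, doc_b, threshold=0.5):
    """Jaccard over sentences: matched / (|A| + |B| - matched).
    The matching rule and threshold are assumptions of this sketch."""
    graph_a, graph_b = build_graph(doc_a), build_graph(doc_b)
    sigs_a = [signature(graph_a, i) for i in range(len(doc_a))]
    sigs_b = [signature(graph_b, j) for j in range(len(doc_b))]
    unmatched_b = set(range(len(sigs_b)))
    matched = 0
    for sig_a in sigs_a:
        for j in sorted(unmatched_b):
            if jaccard(sig_a, sigs_b[j]) >= threshold:
                unmatched_b.remove(j)  # each sentence in B is matched at most once
                matched += 1
                break
    union = len(sigs_a) + len(sigs_b) - matched
    return matched / union if union else 0.0


doc_a = [[("dog", "A0"), ("chases", "V"), ("ball", "A1")],
         [("cat", "A0"), ("sleeps", "V")]]
doc_b = [[("cat", "A0"), ("pursues", "V"), ("ball", "A1")]]
print(document_similarity(doc_a, doc_b))  # 0.5
```

In the toy example, "dog chases ball" and "cat pursues ball" share no agent or verb at the surface level, yet they map to the same role-hypernym pairs, so one of the three distinct sentences is shared and the score is 0.5.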

Keywords: semantic role labeling, scientific literature, similarity detection

CLC number: