

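  • Bai Rujiang (白如江),
  • Wang Xiaodi (王晓笛),
  • Wang Xiaoyue (王效岳)
  • Institute of Scientific & Technical Information, Shandong University of Technology (山东理工大学科技信息研究所)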

Received date: 2013-05-28

  Revised date: 2013-07-12

  Online published: 2013-08-05


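This paper is one of the research outputs of the National Social Science Fund of China project "Research on Detecting 'Paraphrase Plagiarism' (意抄) in Academic Literature" (Grant No. 12CTQ032) and the Shandong Provincial Natural Science Foundation project "Parallel Processing and Semantic Classification of Large-Scale Academic Literature" (Grant No. ZR2011GL025).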

Literature Similarity Detection Based on Digital Fingerprint

  • Bai Rujiang,
  • Wang Xiaodi,
  • Wang Xiaoyue
  • Institute of Scientific & Technical Information, Shandong University of Technology, Zibo 255049

Received date: 2013-05-28

  Revised date: 2013-07-12

  Online published: 2013-08-05




Bai Rujiang, Wang Xiaodi, Wang Xiaoyue. Literature Similarity Detection Based on Digital Fingerprint[J]. Library and Information Service (图书情报工作), 2013, 57(15): 88-95. DOI: 10.7536/j.issn.0252-3116.2013.15.014


As a copyright protection technique, digital fingerprinting is an active research area. This paper proposes a digital fingerprinting algorithm for text based on Chinese word frequency. A frequency list is built from word-frequency and character-frequency statistics over a document repository. Using this list, a digital fingerprint can be generated for text of any length according to the maximum entropy principle. The similarity of two texts is then estimated by computing the Hamming distance between their fingerprints. We build a hash table from the zhwiki-20121129-all-titles corpus and use it in experiments on four core journals. The results show that this robust fingerprinting algorithm can detect common forms of plagiarism.
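
The comparison step described in the abstract can be illustrated with a minimal Python sketch. It covers only the Hamming distance between two already-computed fingerprints; the frequency-list and maximum-entropy generation step is not reproduced here. The 64-bit fingerprint width, the function names `hamming_distance` and `similarity`, and the normalization of the distance into a score between 0 and 1 are assumptions made for illustration, not details taken from the paper.

```python
def hamming_distance(fp_a: int, fp_b: int) -> int:
    """Count the bit positions in which two fingerprints differ.

    Fingerprints are assumed (hypothetically) to be non-negative
    integers holding fixed-width bit strings.
    """
    if fp_a < 0 or fp_b < 0:
        raise ValueError("fingerprints must be non-negative integers")
    # XOR leaves a 1 in every position where the two values disagree.
    return bin(fp_a ^ fp_b).count("1")


def similarity(fp_a: int, fp_b: int, bits: int = 64) -> float:
    """Map the Hamming distance to a score in [0, 1].

    `bits` is the assumed fingerprint width; 64 is an illustrative
    default, not a value stated in the paper.
    """
    if bits <= 0:
        raise ValueError("bits must be positive")
    if fp_a >> bits or fp_b >> bits:
        raise ValueError("fingerprint does not fit in the given width")
    return 1.0 - hamming_distance(fp_a, fp_b) / bits


if __name__ == "__main__":
    a = 0b1011
    b = 0b1001
    print(hamming_distance(a, b))
    print(similarity(a, b, bits=4))
```

Under this sketch, a small distance (a score near 1) suggests that the two texts are near-duplicates, while unrelated texts would be expected to differ in roughly half of their bits.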


