Library and Information Service
Research on Literature Similarity Detection Based on Semantic Role Labeling
Received date: 2014-04-30
Revised date: 2014-06-03
Online published: 2014-06-20
In recent years, several cases of academic misconduct have drawn the attention of both the academic community and the relevant authorities, making similarity detection an active research topic. To cope with semantic plagiarism, researchers have begun to exploit semantic information. This paper proposes a semantic similarity detection method for literature based on semantic role labeling (SRL). First, each paper is labeled with an SRL tool at sentence granularity. Hypernyms are then extracted using a semantic dictionary. Each paper is represented as a sentence-term-semantic role-hypernym 4-partite graph, and sentences are compared on the basis of this graph. The Jaccard coefficient is computed to represent the similarity between two papers. Owing to the limitations of current SRL tools, the semantic similarity detection results are not yet satisfactory; even so, they are still 13% higher than those of other methods.
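The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how sentences are matched on the 4-partite graph, so the sketch assumes that two sentences match when their sets of (semantic role, hypernym) pairs are identical. The role labels, the hypernyms, and the names `generalize` and `paper_similarity` are hypothetical and chosen only for illustration. In practice the triples would come from an SRL tool and a semantic dictionary.

```python
from typing import FrozenSet, List, Tuple

# One labeled item: (term, semantic role, hypernym).
# All values below are hand-written toy data, not output of a real SRL tool.
Triple = Tuple[str, str, str]
Sentence = FrozenSet[Triple]


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard coefficient |A & B| / |A | B|; taken as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def generalize(sentence: Sentence) -> frozenset:
    """Drop the surface term and keep (role, hypernym) pairs.

    Assumption: comparing at the hypernym level lets synonym
    substitutions (car -> automobile) still count as a match.
    """
    return frozenset((role, hyper) for _term, role, hyper in sentence)


def paper_similarity(p1: List[Sentence], p2: List[Sentence]) -> float:
    """Jaccard coefficient over the generalized sentences of two papers."""
    g1 = frozenset(generalize(s) for s in p1)
    g2 = frozenset(generalize(s) for s in p2)
    return jaccard(g1, g2)


paper_a = [
    frozenset({("car", "A0", "vehicle"), ("hit", "V", "touch"), ("tree", "A1", "plant")}),
    frozenset({("dog", "A0", "animal"), ("bark", "V", "utter")}),
]
paper_b = [
    # Same roles and hypernyms as the first sentence of paper_a, different words.
    frozenset({("automobile", "A0", "vehicle"), ("strike", "V", "touch"), ("oak", "A1", "plant")}),
    frozenset({("student", "A0", "person"), ("read", "V", "interpret")}),
]

# One shared generalized sentence out of three distinct ones.
print(f"{paper_similarity(paper_a, paper_b):.3f}")  # 0.333
```

Exact matching of generalized sentences is the simplest choice. A threshold on sentence-level overlap would tolerate partial paraphrases, but it is a further assumption beyond what the abstract states.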
Wang Xiaodi, Zhu Na, Bai Rujiang, Wang Xiaoyue. Research on Literature Similarity Detection Based on Semantic Role Labeling[J]. Library and Information Service, 2014, 58(12): 130-135. DOI: 10.13266/j.issn.0252-3116.2014.12.020