Sixth IEEE International Conference on Data Mining (ICDM'06)
Plagiarism Detection in arXiv
Hong Kong
December 18-22, 2006
ISBN: 0-7695-2701-9
Daria Sorokina, Cornell University, USA
Johannes Gehrke, Cornell University, USA
Simeon Warner, Cornell University, USA
Paul Ginsparg, Cornell University, USA
We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.
Citation:
Daria Sorokina, Johannes Gehrke, Simeon Warner, Paul Ginsparg, "Plagiarism Detection in arXiv," Sixth IEEE International Conference on Data Mining (ICDM'06), pp. 1070-1075, 2006