Making a Clean Sweep of Cultural Heritage
March/April 2009 (vol. 24 no. 2)
pp. 54-63
Antal van den Bosch, Tilburg University
Marieke van Erp, Tilburg University
Caroline Sporleder, Saarland University
Digitization opens up new ways of analyzing cultural heritage data. Automatic error detection, as input to semiautomatic error correction, is one such analysis that ranks high on the priority lists of cultural heritage data managers and researchers. We describe a general approach to cleaning cultural heritage databases and present four case studies on databases from different cultural heritage institutions. We also describe an information system that embeds our error detector in a larger framework, enabling researchers to access, check, and correct their data more easily than before.

Index Terms:
database cleaning, cultural heritage, automatic error detection
Citation:
Antal van den Bosch, Marieke van Erp, Caroline Sporleder, "Making a Clean Sweep of Cultural Heritage," IEEE Intelligent Systems, vol. 24, no. 2, pp. 54-63, March-April 2009, doi:10.1109/MIS.2009.33