Issue No.02 - March/April (2009 vol.24)
pp: 54-63
Marieke van Erp, Tilburg University
Antal van den Bosch, Tilburg University
Digitization brings about new ways of analyzing data from cultural heritage areas. Automatic error detection, as input to semiautomatic error correction, is one type of analysis that ranks high on the priority list of cultural heritage data managers and researchers. We describe a general approach to cleaning cultural heritage databases and present four case studies on databases from different cultural heritage institutions. We also describe an information system that embeds our error detector in a larger framework, enabling researchers to access, check, and correct their data more easily than before.
database cleaning, cultural heritage, automatic error detection
Marieke van Erp, Antal van den Bosch, "Making a Clean Sweep of Cultural Heritage," IEEE Intelligent Systems, vol. 24, no. 2, pp. 54-63, March/April 2009, doi:10.1109/MIS.2009.33