2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies
Optimization of Algorithm to Identification of Duplicate Tuples through Similarity Phonetic Based on Multithreading
Gwangju, Korea
October 20-October 22
ISBN: 978-0-7695-4564-6
Citation: Tiago Luís Andrade, Rogéria Cristiane Gratão de Souza, Maurizio Babini, Carlos Roberto Valêncio, "Optimization of Algorithm to Identification of Duplicate Tuples through Similarity Phonetic Based on Multithreading," in 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pp. 299-304, 2011.
BibTeX:
@article{10.1109/PDCAT.2011.58,
  author = {Tiago Luís Andrade and Rogéria Cristiane Gratão de Souza and Maurizio Babini and Carlos Roberto Valêncio},
  title = {Optimization of Algorithm to Identification of Duplicate Tuples through Similarity Phonetic Based on Multithreading},
  journal = {Parallel and Distributed Computing Applications and Technologies, International Conference on},
  volume = {0},
  year = {2011},
  isbn = {978-0-7695-4564-6},
  pages = {299-304},
  doi = {http://doi.ieeecomputersociety.org/10.1109/PDCAT.2011.58},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
DOI: http://doi.ieeecomputersociety.org/10.1109/PDCAT.2011.58
To ensure greater reliability and consistency of the data stored in a database, data cleaning is performed early in the Knowledge Discovery in Databases (KDD) process. This stage is responsible for eliminating problems in the data and preparing it for the later stages, especially data mining. Such problems occur at both the instance level and the schema level and include missing values, null values, duplicate tuples, and out-of-domain values, among others. Several algorithms have been developed to perform data cleaning, some of them specifically to work with the phonetics of words, since the same word can be written in different ways. From this perspective, the original contribution of this work is an optimized, multithreaded algorithm that detects duplicate tuples in databases through phonetic similarity without requiring training data, together with a language-independent environment to support it.
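The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general idea under stated assumptions: each tuple is reduced to a phonetic key, tuples that share a key are flagged as candidate duplicates, and key computation is spread over a thread pool. American Soundex is swapped in as the phonetic code purely for illustration; it is English-oriented, unlike the language-independent environment the paper describes. Exact key equality stands in for "phonetic similarity", and the names `soundex`, `tuple_key`, `find_duplicates`, and `workers` are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Soundex digit for each consonant class; vowels, H, W and Y map to no digit.
# (Soundex is an illustrative stand-in, not the paper's phonetic method.)
_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of `word` ('' if it has no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]                 # the first letter is always kept
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:   # adjacent letters with the same digit count once
            code += digit
        if c not in "HW":             # H and W do not separate equal digits; vowels do
            prev = digit
    return (code + "000")[:4]         # pad with zeros or truncate to 4 characters


def tuple_key(row: tuple[str, ...]) -> tuple[str, ...]:
    """Hypothetical phonetic key of a tuple: the Soundex code of every word in every field."""
    return tuple(soundex(word) for field in row for word in field.split())


def find_duplicates(rows: list[tuple[str, ...]], workers: int = 4) -> list[list[int]]:
    """Return groups (lists of row indices) of tuples that share the same phonetic key."""
    # Keys are computed concurrently; pool.map returns them in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        keys = list(pool.map(tuple_key, rows))
    # Grouping is sequential: tuples with identical keys fall into the same bucket.
    groups: dict[tuple[str, ...], list[int]] = defaultdict(list)
    for index, key in enumerate(keys):
        groups[key].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


rows = [
    ("Maria Souza", "Curitiba"),
    ("Maria Sousa", "Coritiba"),
    ("Carlos Silva", "Gwangju"),
    ("Carlos Sylva", "Gwangju"),
    ("Ana Lima", "Gwangju"),
]
print(find_duplicates(rows))  # [[0, 1], [2, 3]]
```

Grouping by key lets candidate duplicates be found with one pass over the table instead of comparing every pair of tuples. In CPython the global interpreter lock keeps threads from speeding up pure-Python, CPU-bound work like this, so a real speedup would call for `ProcessPoolExecutor` or a runtime with native parallel threads; the thread pool here only shows where the parallel step would sit.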
Index Terms:
Data cleansing, duplicated tuples, algorithm
