
Ana Granados, Manuel Cebrián, David Camacho, Francisco de Borja Rodríguez, "Reducing the Loss of Information through Annealing Text Distortion," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1090-1102, July 2011.
BibTeX
@article{10.1109/TKDE.2010.173,
  author    = {Ana Granados and Manuel Cebrián and David Camacho and Francisco de Borja Rodríguez},
  title     = {Reducing the Loss of Information through Annealing Text Distortion},
  journal   = {IEEE Transactions on Knowledge and Data Engineering},
  volume    = {23},
  number    = {7},
  issn      = {1041-4347},
  year      = {2011},
  pages     = {1090-1102},
  doi       = {http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.173},
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA},
}
RefWorks Procite/RefMan/Endnote
TY  - JOUR
JO  - IEEE Transactions on Knowledge and Data Engineering
TI  - Reducing the Loss of Information through Annealing Text Distortion
IS  - 7
SN  - 1041-4347
SP  - 1090
EP  - 1102
EPD - 1090-1102
A1  - Ana Granados
A1  - Manuel Cebrián
A1  - David Camacho
A1  - Francisco de Borja Rodríguez
PY  - 2011
KW  - Information distortion
KW  - data compression
KW  - normalized compression distance
KW  - clustering by compression
KW  - Kolmogorov complexity
VL  - 23
JA  - IEEE Transactions on Knowledge and Data Engineering
ER  -
[1] R.L. Cilibrasi and P.M. Vitanyi, "The Google Similarity Distance," IEEE Trans. Knowledge and Data Eng., vol. 19, no. 3, pp. 370-383, Mar. 2007.
[2] X. Zhang, Y. Hao, X. Zhu, and M. Li, "Information Distance from a Question to an Answer," KDD '07: Proc. 13th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 874-883, 2007.
[3] D. Ravichandran and E. Hovy, "Learning Surface Text Patterns for a Question Answering System," Proc. 40th Ann. Meeting on Assoc. for Computational Linguistics, pp. 41-47, 2001.
[4] X. Chen, B. Francia, M. Li, B. McKinnon, and A. Seker, "Shared Information and Program Plagiarism Detection," IEEE Trans. Information Theory, vol. 50, no. 7, pp. 1545-1551, July 2004.
[5] C. Ané and M. Sanderson, "Missing the Forest for the Trees: Phylogenetic Compression and Its Implications for Inferring Complex Evolutionary Histories," Systematic Biology, vol. 54, no. 1, pp. 146-157, 2005.
[6] H. Otu and K. Sayood, "A New Sequence Distance Measure for Phylogenetic Tree Construction," Bioinformatics, vol. 19, no. 16, pp. 2122-2130, 2003.
[7] A. Kocsor, A. Kertesz-Farkas, L. Kajan, and S. Pongor, "Application of Compression-Based Distance Measures to Protein Sequence Classification: A Methodological Study," Bioinformatics, vol. 22, no. 4, pp. 407-412, 2006.
[8] N. Krasnogor and D. Pelta, "Measuring the Similarity of Protein Structures by Means of the Universal Similarity Metric," Bioinformatics, vol. 20, no. 7, pp. 1015-1021, 2004.
[9] H. Pao and J. Case, "Computing Entropy for Ortholog Detection," ICCI 2004: Proc. Int'l Conf. Computational Intelligence, 2004.
[10] D. Benedetto, E. Caglioti, and V. Loreto, "Language Trees and Zipping," Physical Rev. Letters, vol. 88, no. 48702, 2002.
[11] M. Cuturi and J. Vert, "The Context-Tree Kernel for Strings," Neural Networks, vol. 18, no. 8, pp. 1111-1123, 2005.
[12] K. Emanuel, S. Ravela, E. Vivant, and C. Risi, "A Combined Statistical-Deterministic Approach of Hurricane Risk Assessment," Bull. of the Am. Meteorological Soc., vol. 87, no. 3, pp. 299-314, 2006.
[13] T. Arbuckle, A. Balaban, D. Peters, and M. Lawford, "Software Documents: Comparison and Measurement," SEKE '07: Proc. 18th Int'l. Conf. Software Eng. and Knowledge Eng., 2007.
[14] E.B. Allen, T.M. Khoshgoftaar, and Y. Chen, "Measuring Coupling and Cohesion of Software Modules: An Information-Theory Approach," Proc. Seventh Int'l Software Metrics Symp., 2001.
[15] W.T. Scott, "A New Approach to Data Mining for Software Design," CSITeA '04: Proc. Int'l Conf. Computer Science, Software Eng., Information Technology, E-Business, and Applications, 2004.
[16] R. Cilibrasi, P. Vitanyi, and R. de Wolf, "Algorithmic Clustering of Music," Proc. Fourth Int'l Conf. Web Delivering of Music (WEDELMUSIC '04), pp. 110-117, 2004.
[17] A. Kraskov, H. Stoegbauer, R. Andrzejak, and P. Grassberger, "Hierarchical Clustering Using Mutual Information," Europhysics Letters, vol. 70, no. 2, pp. 278-284, 2005.
[18] C. Santos, J. Bernardes, P. Vitanyi, and L. Antunes, "Clustering Fetal Heart Rate Tracings by Compression," CBMS '06: Proc. 19th IEEE Symp. Computer-Based Medical Systems, pp. 685-690, 2006.
[19] D. Parry, "Use of Kolmogorov Distance Identification of Web Page Authorship, Topic and Domain," Proc. Workshop Open Source Web Information Retrieval, 2005.
[20] T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," technical report, Dortmund Univ., 1997.
[21] E. Leopold and J. Kindermann, "Text Categorization with Support Vector Machines. How to Represent Texts in Input Space?," Machine Learning, vol. 46, nos. 1-3, pp. 423-444, 2002.
[22] C. Faloutsos and V. Megalooikonomou, "On Data Mining, Compression, and Kolmogorov Complexity," Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 3-20, 2007.
[23] R. Martínez, M. Cebrián, F. de Borja Rodríguez, and D. Camacho, "Contextual Information Retrieval Based on Algorithmic Information Theory and Statistical Outlier Detection," Proc. IEEE Information Theory Workshop, 2007.
[24] D. Salomon, Data Compression: The Complete Reference. Springer, 2004.
[25] M. Li, X. Chen, X. Li, B. Ma, and P. Vitanyi, "The Similarity Metric," IEEE Trans. Information Theory, vol. 50, no. 12, pp. 3250-3264, Dec. 2004.
[26] R. Cilibrasi and P. Vitanyi, "Clustering by Compression," IEEE Trans. Information Theory, vol. 51, no. 4, pp. 1523-1545, Apr. 2005.
[27] J. Seward, BZIP2, http://bzip.org/, 2011.
[28] I. Pavlov, LZMAX, http://www.7-zip.org/sdk.html, 2011.
[29] C. Bloom, PPMZ, http://www.cbloom.com, 2011.
[30] M. Cebrián, M. Alfonseca, and A. Ortega, "Common Pitfalls Using the Normalized Compression Distance: What to Watch Out for in a Compressor," Comm. Information and Systems, vol. 5, no. 4, pp. 367-384, 2005.
[31] M. Cebrián, M. Alfonseca, and A. Ortega, "The Normalized Compression Distance is Resistant to Noise," IEEE Trans. Information Theory, vol. 53, no. 5, pp. 1895-1900, May 2007.
[32] S. Verdú and T. Weissman, "The Information Lost in Erasures," IEEE Trans. Information Theory, vol. 54, no. 11, pp. 5030-5058, Nov. 2008.
[33] S. Fong, D. Roussinov, and D. Skillicorn, "Detecting Word Substitutions in Text," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 8, pp. 1067-1076, Aug. 2008.
[34] A. Turing, "On Computable Numbers, with an Application to the Entscheidungsproblem," Proc. London Math. Soc., vol. 2, no. 42, pp. 230-265, 1936.
[35] A. Kolmogorov, "Three Approaches to the Quantitative Definition of Information," Problems Information Transmission, vol. 1, no. 1, pp. 1-7, 1965.
[36] M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, second ed. Springer-Verlag, 1997.
[37] M. Sipser, Introduction to the Theory of Computation, second ed. PWS Publishing, 2006.
[38] R. Cilibrasi, A.L. Cruz, S. de Rooij, and M. Keijzer, CompLearn Toolkit, http://www.complearn.org/, 2011.
[39] UCI Knowledge Discovery in Databases Archive, Information and Computer Science, Univ. of California, Irvine, http://kdd.ics.uci.edu/, 2011.
[40] MedlinePlus Health Information, MedlinePlus Website, US Nat'l Library of Medicine and Nat'l Inst. of Health, http://medlineplus.gov/, 2011.
[41] IMDB, Internet Movie Database, http://www.imdb.com/, 2011.
[42] Y. Yang, "Noise Reduction in a Statistical Approach to Text Categorization," SIGIR: Proc. 18th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 256-263, 1995.
[43] C. Van Rijsbergen, Information Retrieval. Butterworth-Heinemann Newton, 1979.
[44] G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley Longman Publishing Co., 1989.
[45] W.J. Wilbur and K. Sirotkin, "The Automatic Identification of Stop Words," J. Information Science, vol. 18, no. 1, pp. 45-55, 1992.
[46] British National Corpus (BNC), http://www.natcorp.ox.ac.uk/, University of Oxford, 2010.
[47] M. Burrows and D.J. Wheeler, "A Block-Sorting Lossless Data Compression Algorithm," Digital Systems Research Center Research Report, vol. 124, p. 24, 1994.
[48] D.A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proc. Inst. of Radio Engineers, vol. 40, no. 9, pp. 1098-1101, 1952.
[49] A. Granados, M. Cebrián, D. Camacho, and F.B. Rodríguez, "Evaluating the Impact of Information Distortion on Normalized Compression Distance," Proc. Second Int'l Castle Meeting on Coding Theory and Applications (ICMCTA), A. Barbero, ed., pp. 69-79, 2008.
[50] S. Consoli, K. Darby-Dowman, G. Geleijnse, J. Korst, and S. Pauws, "Heuristic Approaches for the Quartet Method of Hierarchical Clustering," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 10, pp. 1428-1443, Oct. 2010.
[51] N. Tishby, F. Pereira, and W. Bialek, "The Information Bottleneck Method," Proc. 37th Ann. Allerton Conf. Comm., Control, and Computing, pp. 368-377, 1999.
[52] N. Slonim and N. Tishby, "Document Clustering Using Word Clusters via the Information Bottleneck Method," Proc. 23rd Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 208-215, 2000.
[53] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised Document Classification Using Sequential Information Maximization," SIGIR '02: Proc. 25th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 129-136, 2002.
[54] E. Keogh, S. Lonardi, and C.A. Ratanamahatana, "Towards Parameter-Free Data Mining," KDD '04: Proc. 10th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 206-215, 2004.
[55] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison-Wesley, 1999.
[56] S. Kullback and R. Leibler, "On Information and Sufficiency," Annals of Math. Statistics, vol. 22, pp. 79-86, 1951.
[57] S. Kullback, "The Kullback-Leibler Distance," The Am. Statistician, vol. 41, pp. 340-341, 1987.
[58] J.A. Hartigan and M. Wong, "A K-Means Clustering Algorithm," Applied Statistics, vol. 28, no. 1, pp. 100-108, 1979.
[59] K.R. Muller, S. Mika, G. Ratsch, K. Tsuda, and B. Schölkopf, "An Introduction to Kernel-Based Learning Algorithms," IEEE Trans. Neural Networks, vol. 12, no. 2, pp. 181-201, Mar. 2001.