
Bibliographic References  
Nizar Bouguila, "A Model-Based Approach for Discrete Data Clustering and Feature Weighting Using MAP and Stochastic Complexity," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 12, pp. 1649-1664, Dec. 2009.
[1] G.J. McLachlan and D. Peel, Finite Mixture Models. Wiley, 2000.
[2] N. Littlestone, “Learning Quickly when Irrelevant Attributes Abound: A New Linear-Threshold Algorithm,” Machine Learning, vol. 2, pp. 285-318, 1988.
[3] D. Angluin and P. Laird, “Learning from Noisy Examples,” Machine Learning, vol. 2, pp. 343-370, 1988.
[4] H. Almuallim and T.G. Dietterich, “Learning with Many Irrelevant Features,” Proc. Ninth Nat'l Conf. Artificial Intelligence (AAAI '91), pp. 547-552, 1991.
[5] S.J. Raudys and A.K. Jain, “Small Sample Size Effects in Statistical Pattern Recognition: Recommendations for Practitioners,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, no. 3, pp. 252-264, Mar. 1991.
[6] C. Schaffer, “Selecting a Classification Method by Cross-Validation,” Machine Learning, vol. 13, no. 1, pp. 135-143, 1993.
[7] R. Kohavi and D. Sommerfield, “Feature Subset Selection Using the Wrapper Method: Overfitting and Dynamic Search Space Topology,” Proc. First Int'l Conf. Knowledge Discovery and Data Mining (KDD '95), pp. 192-197, 1995.
[8] A.L. Blum and P. Langley, “Selection of Relevant Features and Examples in Machine Learning,” Artificial Intelligence, vol. 97, pp. 245-271, 1997.
[9] S. Cost and S. Salzberg, “A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features,” Machine Learning, vol. 10, no. 1, pp. 57-78, 1993.
[10] D. Wettschereck, D.W. Aha, and T. Mohri, “A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms,” Artificial Intelligence Rev., vol. 11, nos. 1-5, pp. 273-314, 1997.
[11] D.S. Modha and W.S. Spangler, “Feature Weighting in K-Means Clustering,” Machine Learning, vol. 52, pp. 217-237, 2003.
[12] G.H. John, R. Kohavi, and K. Pfleger, “Irrelevant Features and the Subset Selection Problem,” Proc. 11th Int'l Conf. Machine Learning (ICML '94), pp. 121-129, 1994.
[13] M. Dash and H. Liu, “Feature Selection for Classification,” Intelligent Data Analysis, vol. 1, no. 3, pp. 131-156, 1997.
[14] R. Kohavi and G.H. John, “Wrappers for Feature Subset Selection,” Artificial Intelligence, vol. 97, pp. 273-324, 1997.
[15] H. Liu and L. Yu, “Toward Integrating Feature Selection Algorithms for Classification and Clustering,” IEEE Trans. Knowledge and Data Eng., vol. 17, no. 4, pp. 491-502, Apr. 2005.
[16] A.K. Jain and D. Zongker, “Feature Selection: Evaluation, Application, and Small Sample Performance,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 2, pp. 153-158, Feb. 1997.
[17] H. Liu, F. Hussain, C.L. Tan, and M. Dash, “Discretization: An Enabling Technique,” Data Mining and Knowledge Discovery, vol. 6, no. 4, pp. 393-423, 2002.
[18] S. Katz, “Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 35, no. 3, pp. 400-401, Mar. 1987.
[19] N. Friedman and Y. Singer, “Efficient Bayesian Parameter Estimation in Large Discrete Domains,” Proc. Conf. Neural Information Processing Systems (NIPS '99), pp. 417-423, 1999.
[20] I.S. Dhillon and D.S. Modha, “Concept Decompositions for Large Sparse Text Data Using Clustering,” Machine Learning, vol. 42, nos. 1/2, pp. 143-175, 2001.
[21] M. Dash and H. Liu, “Feature Selection for Clustering,” Proc. Fourth Pacific-Asia Conf. Knowledge Discovery and Data Mining, Current Issues and New Applications (PAKDD '00), pp. 110-121, 2000.
[22] J.R. Quinlan, “Induction of Decision Trees,” Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[23] K.W. Church and P. Hanks, “Word Association Norms: Mutual Information and Lexicography,” Computational Linguistics, vol. 16, no. 1, pp. 22-29, 1990.
[24] Y. Li, C. Luo, and S.M. Chung, “Text Clustering with Feature Selection by Using Statistical Data,” IEEE Trans. Knowledge and Data Eng., vol. 20, no. 5, pp. 641-652, May 2008.
[25] I. Kononenko, “On Biases in Estimating Multi-Valued Attributes,” Proc. 14th Int'l Joint Conf. Artificial Intelligence (IJCAI '95), pp. 1034-1040, 1995.
[26] X. Wang and A. Kabán, “Model-Based Estimation of Word Saliency in Text,” Proc. Ninth Int'l Conf. Discovery Science, N. Lavrac, L. Todorovski, and K.P. Jantke, eds., pp. 279-290, 2006.
[27] D. Koller and M. Sahami, “Hierarchically Classifying Documents Using Very Few Words,” Proc. 14th Int'l Conf. Machine Learning (ICML '97), pp. 170-178, 1997.
[28] Y. Yang and J.O. Pedersen, “A Comparative Study on Feature Selection in Text Categorization,” Proc. Int'l Conf. Machine Learning (ICML '97), pp. 412-420, 1997.
[29] D. Mladenić and M. Grobelnik, “Feature Selection for Unbalanced Class Distribution and Naive Bayes,” Proc. 16th Int'l Conf. Machine Learning (ICML '99), pp. 258-267, 1999.
[30] M.A. Hall and G. Holmes, “Benchmarking Attribute Selection Techniques for Discrete Class Data Mining,” IEEE Trans. Knowledge and Data Eng., vol. 15, no. 6, pp. 1437-1447, Nov./Dec. 2003.
[31] A. Dasgupta, P. Drineas, B. Harb, V. Josifovski, and M.W. Mahoney, “Feature Selection Methods for Text Classification,” Proc. 13th Int'l Conf. Knowledge Discovery and Data Mining (KDD '07), pp. 230-239, 2007.
[32] S. Yu, K. Yu, V. Tresp, and H.-P. Kriegel, “A Probabilistic Clustering-Projection Model for Discrete Data,” Proc. 11th European Conf. Principles and Practice of Knowledge Discovery in Databases (PKDD '05), pp. 417-428, 2005.
[33] M.H.C. Law, M.A.T. Figueiredo, and A.K. Jain, “Simultaneous Feature Selection and Clustering Using Mixture Models,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1154-1166, Sept. 2004.
[34] M.W. Graham and D.J. Miller, “Unsupervised Learning of Parsimonious Mixtures on Large Spaces with Integrated Feature and Component Selection,” IEEE Trans. Signal Processing, vol. 54, no. 4, pp. 1289-1303, Apr. 2006.
[35] S. Vaithyanathan and B. Dom, “Generalized Model Selection for Unsupervised Learning in High Dimensions,” Proc. Conf. Neural Information Processing Systems (NIPS '99), pp. 970-976, 1999.
[36] J. Rissanen, Stochastic Complexity in Statistical Inquiry. World Scientific, 1989.
[37] J. Dougherty, R. Kohavi, and M. Sahami, “Supervised and Unsupervised Discretization of Continuous Features,” Proc. Int'l Conf. Machine Learning (ICML '95), pp. 194-202, 1995.
[38] N. Friedman and M. Goldszmidt, “Discretizing Continuous Attributes while Learning Bayesian Networks,” Proc. Int'l Conf. Machine Learning (ICML '96), pp. 157-165, 1996.
[39] P. Domingos and M. Pazzani, “On the Optimality of the Simple Bayesian Classifier under the Zero-One Loss,” Machine Learning, vol. 29, nos. 2/3, pp. 103-130, 1997.
[40] J. Novovicová, P. Pudil, and J. Kittler, “Divergence Based Feature Selection for Multimodal Class Densities,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 2, pp. 218-223, Feb. 1996.
[41] P. Pudil, J. Novovicová, N. Choakjarernwanit, and J. Kittler, “Feature Selection Based on the Approximation of Class Densities by Finite Mixtures of Special Type,” Pattern Recognition, vol. 28, no. 9, pp. 1389-1398, 1995.
[42] I.H. Witten and T.C. Bell, “The Zero-Frequency Problem: Estimating the Probabilities of Novel Events in Adaptive Text Compression,” IEEE Trans. Information Theory, vol. 37, no. 4, pp. 1085-1094, July 1991.
[43] N. Bouguila and D. Ziou, “Unsupervised Learning of a Finite Discrete Mixture: Applications to Texture Modeling and Image Databases Summarization,” J. Visual Comm. and Image Representation, vol. 18, no. 4, pp. 295-309, 2007.
[44] T.L. Griffiths and J.B. Tenenbaum, “Using Vocabulary Knowledge in Bayesian Multinomial Estimation,” Proc. Conf. Neural Information Processing Systems (NIPS '01), pp. 1385-1392, 2001.
[45] R.E. Krichevsky and V.K. Trofimov, “The Performance of Universal Encoding,” IEEE Trans. Information Theory, vol. IT-27, no. 2, pp. 199-207, Mar. 1981.
[46] B.S. Clarke and A.R. Barron, “Jeffreys' Prior is Asymptotically Least Favorable under Entropy Risk,” J. Statistical Planning and Inference, vol. 41, no. 1, pp. 37-60, 1994.
[47] Y. Freund, “Predicting a Binary Sequence Almost as Well as the Optimal Biased Coin,” Proc. Ninth Ann. Conf. Computational Learning Theory (COLT '96), pp. 89-98, 1996.
[48] Y. Freund, “Predicting a Binary Sequence Almost as Well as the Optimal Biased Coin,” Information and Computation, vol. 182, no. 1, pp. 73-94, 2003.
[49] N. Bouguila and D. Ziou, “Unsupervised Selection of a Finite Dirichlet Mixture Model: An MML-Based Approach,” IEEE Trans. Knowledge and Data Eng., vol. 18, no. 8, pp. 993-1009, Aug. 2006.
[50] D.M. Chickering and D. Heckerman, “Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables,” Machine Learning, vol. 29, pp. 181-212, 1997.
[51] P. Kontkanen, P. Myllymaki, T. Silander, H. Tirri, and P. Grunwald, “On Predictive Distributions and Bayesian Networks,” Statistics and Computing, vol. 10, pp. 39-54, 2000.
[52] J.J. Rissanen, “Fisher Information and Stochastic Complexity,” IEEE Trans. Information Theory, vol. 42, no. 1, pp. 40-47, Jan. 1996.
[53] G. Schwarz, “Estimating the Dimension of a Model,” Annals of Statistics, vol. 6, pp. 461-464, 1978.
[54] J.J. Rissanen, “Modeling by Shortest Data Description,” Automatica, vol. 14, pp. 465-471, 1978.
[55] P. Cheeseman and J. Stutz, “Bayesian Classification (AutoClass): Theory and Results,” Advances in Knowledge Discovery and Data Mining, chapter 6, pp. 153-180, AAAI Press, 1995.
[56] D.J.C. Mackay, “Choice of Basis for Laplace Approximation,” Machine Learning, vol. 33, no. 1, pp. 77-86, 1998.
[57] P.M. Lee, Bayesian Statistics: An Introduction, third ed. Arnold, 2004.
[58] J.O. Berger, Statistical Decision Theory and Bayesian Analysis. Springer, 1985.
[59] L. Rigouste, O. Cappé, and F. Yvon, “Inference and Evaluation of the Multinomial Mixture Model for Text Clustering,” Information Processing and Management, vol. 43, no. 5, pp. 1260-1280, 2007.
[60] S. Boutemedjet, D. Ziou, and N. Bouguila, “Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data,” Proc. Conf. Neural Information Processing Systems (NIPS), pp. 177-184, 2007.
[61] S. Boutemedjet, D. Ziou, and N. Bouguila, “A Graphical Model for Content Based Image Suggestion and Feature Selection,” Proc. 11th European Conf. Principles and Practice of Knowledge Discovery in Databases (PKDD '07), pp. 30-41, 2007.
[62] M. Szummer and R.W. Picard, “Indoor-Outdoor Image Classification,” Proc. IEEE Int'l Workshop Content-Based Access of Image and Video Databases, in Conjunction with Int'l Conf. Computer Vision (ICCV '98), pp. 42-51, 1998.
[63] A. Vailaya, A.K. Jain, and H.J. Zhang, “On Image Classification: City Images vs. Landscapes,” Pattern Recognition, vol. 31, no. 12, pp. 1921-1935, 1998.
[64] O. Chapelle, P. Haffner, and V.N. Vapnik, “Support Vector Machines for Histogram-Based Image Classification,” IEEE Trans. Neural Networks, vol. 10, no. 5, pp. 1055-1064, Sept. 1999.
[65] N. Bouguila and D. Ziou, “A Hybrid SEM Algorithm for High-Dimensional Unsupervised Learning Using a Finite Generalized Dirichlet Mixture,” IEEE Trans. Image Processing, vol. 15, no. 9, pp. 2657-2668, Sept. 2006.
[66] V.N. Vapnik, The Nature of Statistical Learning Theory, second ed. Springer-Verlag, 1999.
[67] T.S. Jaakkola and D. Haussler, “Exploiting Generative Models in Discriminative Classifiers,” Proc. Conf. Neural Information Processing Systems (NIPS '99), pp. 487-493, 1999.
[68] T. Hofmann, “Learning the Similarity of Documents: An Information-Geometric Approach to Document Retrieval and Categorization,” Proc. Conf. Neural Information Processing Systems (NIPS '00), pp. 914-920, 2000.
[69] C. Elkan, “Deriving TF-IDF as a Fisher Kernel,” Proc. 12th Int'l Conf. String Processing and Information Retrieval (SPIRE '05), pp. 295-300, 2005.
[70] G. Csurka, C.R. Dance, L. Fan, J. Willamowski, and C. Bray, “Visual Categorization with Bags of Keypoints,” Proc. Workshop Statistical Learning in Computer Vision, Eighth European Conf. Computer Vision (ECCV '04), 2004.
[71] L. Fei-Fei and P. Perona, “A Bayesian Hierarchical Model for Learning Natural Scene Categories,” Proc. IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition (CVPR '05), pp. 524-531, 2005.
[72] A. Oliva and A. Torralba, “Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope,” Int'l J. Computer Vision, vol. 42, no. 3, pp. 145-175, 2001.
[73] D.G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” Int'l J. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[74] C. Elkan, “Using the Triangle Inequality to Accelerate K-Means,” Proc. 20th Int'l Conf. Machine Learning (ICML '03), pp. 147-153, 2003.
[75] C.D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge Univ. Press, 2008.
[76] T. Liu, S. Liu, Z. Chen, and W.-Y. Ma, “An Evaluation on Feature Selection for Text Clustering,” Proc. Int'l Conf. Machine Learning (ICML '03), pp. 488-495, 2003.
[77] R.E. Madsen, D. Kauchak, and C. Elkan, “Modeling Word Burstiness Using the Dirichlet Distribution,” Proc. Int'l Conf. Machine Learning (ICML '05), pp. 545-552, 2005.
[78] C. Elkan, “Clustering Documents with an Exponential-Family Approximation of the Dirichlet Compound Multinomial Distribution,” Proc. Int'l Conf. Machine Learning (ICML '06), pp. 289-296, 2006.
[79] N. Bouguila, “Clustering of Count Data Using Generalized Dirichlet Multinomial Distributions,” IEEE Trans. Knowledge and Data Eng., vol. 20, no. 4, pp. 462-474, Apr. 2008.