18th International Conference on Data Engineering (ICDE) (2002)
San Jose, California
Feb. 26, 2002 to Mar. 1, 2002
ISBN: 0-7695-1531-2
pp: 0583
Raymond T. Ng, The University of British Columbia
Carson Kai-Sang Leung, The University of British Columbia
Heikki Mannila, University of Helsinki
ABSTRACT
Computing the frequency of a pattern is one of the key operations in data mining algorithms. We describe a simple yet powerful way of speeding up any form of frequency counting that satisfies the monotonicity condition. Our method, the optimized segment support map (OSSM), is a lightweight structure that partitions the collection of transactions into m segments so as to reduce the number of candidate patterns that require frequency counting. We study two problems: (1) what is the optimal number of segments to use, and (2) given a user-determined m, what is the best segmentation (composition) of the m segments? For Problem 1, we provide a thorough analysis and a theorem establishing the minimum value of m for which no accuracy is lost in using the OSSM. For Problem 2, we develop various segmentation algorithms and heuristics that efficiently generate compact and effective OSSMs.
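The idea of a segment support map can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the map stores per-segment counts of single items, and that a candidate pattern's frequency is bounded from above by summing, over the segments, the smallest count among its items. The function names and the equal-size contiguous split are hypothetical; the optimized choice of segments studied in the paper is not reproduced here.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set


def build_segment_map(transactions: Sequence[Set[str]], m: int) -> List[Counter]:
    """Split the transactions into at most m contiguous segments and count,
    per segment, how many transactions contain each item.
    (Assumption: a naive equal-size split, not an optimized segmentation.)"""
    size = max(1, -(-len(transactions) // m))  # ceiling division, at least 1
    segments = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    return [Counter(item for t in seg for item in t) for seg in segments]


def support_upper_bound(seg_map: List[Counter], pattern: Iterable[str]) -> int:
    """Upper bound on the pattern's frequency: within a segment, the pattern
    cannot occur more often than its rarest item does."""
    items = list(pattern)
    return sum(min(counts[item] for item in items) for counts in seg_map)


def exact_support(transactions: Sequence[Set[str]], pattern: Iterable[str]) -> int:
    """Exact frequency by a full scan, for comparison with the bound."""
    p = set(pattern)
    return sum(1 for t in transactions if p <= t)


transactions = [{"a", "b"}, {"a"}, {"a", "c"}, {"b", "c"}, {"b"}, {"b", "c"}]

one_segment = build_segment_map(transactions, 1)
two_segments = build_segment_map(transactions, 2)

print(support_upper_bound(one_segment, {"a", "b"}))   # 3
print(support_upper_bound(two_segments, {"a", "b"}))  # 1
print(exact_support(transactions, {"a", "b"}))        # 1
```

With a minimum frequency threshold of 2, the two-segment bound of 1 rules out the candidate {a, b} without scanning the transactions, whereas the single-segment bound of 3 does not. This is the sense in which a finer segmentation can reduce the number of candidates that need exact counting.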
INDEX TERMS
Data mining, frequent patterns, support counting, data structure, performance analysis
CITATION
Raymond T. Ng, Carson Kai-Sang Leung, Heikki Mannila, "OSSM: A Segmentation Approach to Optimize Frequency Counting", 18th International Conference on Data Engineering (ICDE), pp. 0583, 2002, doi:10.1109/ICDE.2002.994776