Issue No. 05 - October (1996 vol. 11)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/64.539013
<p>THE RAPIDLY EMERGING FIELD OF knowledge discovery in databases (KDD) has grown significantly in the past few years. This growth is driven by a mix of daunting practical needs and strong research interest. The technology for computing and storage has enabled people to collect and store information from a wide range of sources at rates that were, only a few years ago, considered unimaginable. Although modern database technology enables economical storage of these large streams of data, we do not yet have the technology to help us analyze, understand, or even visualize this stored data.</p> <p>Examples of this phenomenon abound in a wide spectrum of fields: finance, banking, retail sales, manufacturing, monitoring and diagnosis (be it of humans or machines), health care, marketing, and science data acquisition, among others. In science, modern instruments can easily measure and collect terabytes (10<sup>12</sup> bytes) of data. For example, NASA's Earth Observing System is expected to return data at rates of several gigabytes per hour by the end of the century. Quite appropriately, the problem of how to put the torrent of data to use in analysis is often called "drinking from the fire hose." What we mean by analysis is not well-defined because it is highly context- and goal-dependent. However, as I argue, it typically transcends by far anything achievable via simple queries, simple string matching, or mechanisms for displaying the data.</p> <p>Prolific sources of data are not restricted to esoteric endeavors involving spacecraft or sophisticated scientific instruments. Imagine a database receiving transactions from common daily activities such as supermarket or department store checkout-register sales, or credit card charges.
Or think of the information reaching your home television set as a stream of signals that, to be properly managed, need to be cataloged and indexed, and perhaps searched for interesting content at a higher level--channels, programs, genre, or mood, for example. The explosion in the number of resources available on the global computer network--the World Wide Web--is another challenge for indexing and searching through a continually changing and growing "database." </p>
U. M. Fayyad, "Data Mining and Knowledge Discovery: Making Sense Out of Data," in IEEE Intelligent Systems, vol. 11, no. 5, pp. 20-25, 1996.