Information systems are using an increasing amount of unstructured information in the form of text. This situation has spawned a need to improve the text-mining technologies used for information retrieval, filtering, and classification. This article compares some of the available options and examines how they can provide textual data-mining functionality to software applications. In particular, the authors focus on Pimiento, a new object-oriented application framework for text mining. This framework allows developers to easily create distributed applications that use machine learning and statistical techniques to automatically process documents.
Index Terms:
text mining, computational linguistics, categorization, clustering, information extraction, software frameworks
Citation:
Juan José García Adeva, Rafael A. Calvo, "Mining Text with Pimiento," IEEE Internet Computing, vol. 10, no. 4, pp. 27-35, July/Aug. 2006, doi:10.1109/MIC.2006.85