2009 IEEE 25th International Conference on Data Engineering (2009)
Mar. 29, 2009 to Apr. 2, 2009
DOI: http://doi.ieeecomputersociety.org/10.1109/ICDE.2009.81
Entity uncertainty is an unavoidable problem in modern enterprise databases, resulting from the integration of data from multiple sources. In traditional warehousing, the administrator, during an ETL process, manually and laboriously resolves inconsistent data records to discover "true" entities (customers, products, etc.) and identify their "correct" attribute values. At any point in time, however, the current entity resolution is merely a best guess, and OLAP query results based on this resolution are inherently imprecise. We propose a new approach that maintains the data in an unresolved state and deals with entity uncertainty dynamically at query time. We enhance the traditional OLAP model to return not a single query answer, but rather upper and lower bounds on each OLAP aggregate. This approach avoids expensive entity-resolution processing and helps identify potential risks when making business decisions based on the results of OLAP queries. By focusing on bounds rather than probability distributions, we can easily and efficiently process roll-up and group-by aggregation queries over all of the core aggregation functions. Moreover, our approach can be readily implemented in an existing RDBMS using SQL queries, and it does not require the user to specify explicit probabilities for alternative entity resolutions. Experiments show that the overhead of our new OLAP functionality is small over a wide range of scenarios.
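To make the idea of returning bounds instead of a single answer concrete, here is a minimal sketch under a deliberately simplified, hypothetical model; it is not the paper's algorithm or data model. Each fact record carries a measure value and the set of candidate groups (e.g., sales regions) it could belong to, because unresolved duplicate customer records disagree on that attribute. Assuming each record independently lands in exactly one of its candidates, a record that certainly belongs to a group contributes to both bounds of its SUM, while a merely possible record can only lower the minimum (if negative) or raise the maximum (if positive). The names `sum_bounds`, `count_bounds`, and `Record` are illustrative only.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record model: (measure value, candidate group labels).
# A record with one candidate is certain; more than one means its group is unresolved.
Record = Tuple[float, Set[str]]


def sum_bounds(records: Iterable[Record]) -> Dict[str, Tuple[float, float]]:
    """Return {group: (lower, upper)} bounds on SUM over all resolutions,
    assuming each record independently falls into one of its candidates."""
    bounds: Dict[str, List[float]] = {}
    for value, candidates in records:
        certain = len(candidates) == 1
        for group in candidates:
            b = bounds.setdefault(group, [0.0, 0.0])
            if certain:
                # The record is always in this group: it shifts both bounds.
                b[0] += value
                b[1] += value
            elif value < 0:
                # A possible negative contribution can only lower the minimum.
                b[0] += value
            else:
                # A possible positive contribution can only raise the maximum.
                b[1] += value
    return {group: (lo, hi) for group, (lo, hi) in bounds.items()}


def count_bounds(records: Iterable[Record]) -> Dict[str, Tuple[float, float]]:
    """COUNT bounds are SUM bounds with every measure value replaced by 1."""
    return sum_bounds((1.0, candidates) for _, candidates in records)


if __name__ == "__main__":
    sales = [
        (100.0, {"East"}),          # customer resolved unambiguously
        (50.0, {"East", "West"}),   # duplicate records disagree on region
        (-20.0, {"West", "East"}),  # an ambiguous refund
        (30.0, {"West"}),
    ]
    print(sum_bounds(sales))    # {'East': (80.0, 150.0), 'West': (10.0, 80.0)}
    print(count_bounds(sales))  # {'East': (1.0, 3.0), 'West': (1.0, 3.0)}
```

If records are correlated (for example, all records of one customer must move to the same region together), the set of possible resolutions shrinks, so these independent-choice bounds remain valid but may be looser than the tightest possible bounds.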
Y. Sismanis, P. J. Haas, B. Reinwald, L. Wang and A. Fuxman, "Resolution-Aware Query Answering for Business Intelligence," 2009 IEEE 25th International Conference on Data Engineering (ICDE), pp. 976-987, 2009.