Combining Structured and Unstructured Information Sources for a Study of Data Quality: A Case Study of Zillow.Com
Jan. 4, 2011 to Jan. 7, 2011
Zillow is a leading web-based real-estate information service in the US. We studied user-contributed facts in a sample of Zillow records. User-contributed information seems to improve the completeness and the level of detail of the information on Zillow.com. However, the accuracy of user-contributed facts may not be high. An investigation of the sources of error revealed several weaknesses, including conceptual challenges, information integration failures, and design deficiencies. The lack of a shared, user-friendly conceptual foundation was found to be a significant drawback. In part, errors are a product of Zillow's wide geographic coverage and highly networked operation. In addition, important peculiarities of a property are often unknown to the public. Information about such peculiarities is typically shared by a small group of people, whose levels of expertise and stakes in that property, and in real estate in general, may differ. This environment poses a challenge for harnessing collective intelligence. The results demonstrate the success of our unique evaluation strategy, which relies on a systematic review of a rich set of online sources. A similar strategy may also be useful for large-scale error detection and correction, if an efficient automated equivalent is developed to implement it.
Irit Askira Gelman, Ningning Wu, "Combining Structured and Unstructured Information Sources for a Study of Data Quality: A Case Study of Zillow.Com", 2011 44th Hawaii International Conference on System Sciences (HICSS 2011), 2011, pp. 1-12, doi:10.1109/HICSS.2011.115