A Framework for Analysis of Data Quality Research
August 1995 (vol. 7 no. 4)
pp. 623-640

Abstract: Organizational databases are pervaded with data of poor quality. However, there has not been an analysis of the data quality literature that provides an overall understanding of the state-of-the-art research in this area. Using an analogy between product manufacturing and data manufacturing, this paper develops a framework for analyzing data quality research and uses it as the basis for organizing the data quality literature. The framework consists of seven elements: management responsibilities, operation and assurance costs, research and development, production, distribution, personnel management, and legal function. The analysis reveals that most research efforts focus on operation and assurance costs, research and development, and production of data products. Unexplored research topics and unresolved issues are identified, and directions for future research are provided.

Index Terms:
Data quality, data manufacturing, data product, Total Quality Management (TQM), ISO9000, information quality, data quality analysis, data quality practices.
Citation:
Richard Y. Wang, Veda C. Storey, Christopher P. Firth, "A Framework for Analysis of Data Quality Research," IEEE Transactions on Knowledge and Data Engineering, vol. 7, no. 4, pp. 623-640, Aug. 1995, doi:10.1109/69.404034