Generating Compact Redundancy-Free XML Documents from Conceptual-Model Hypergraphs
August 2006 (vol. 18 no. 8)
pp. 1082-1096
As XML data becomes more and more prevalent and as larger quantities of data find their way into XML documents, the need for quality XML data organization will only increase. One standard way of structuring data well is to reduce and, if possible, eliminate redundancy, while at the same time making the storage structures as compact as possible. In this paper, we present a methodology to generate XML storage structures where conforming XML documents are redundancy-free, and for most practical cases, are also fully compact. Our methodology assumes the input is a conceptual-model hypergraph. For the special case that every edge in the hypergraph is binary, we present a simple algorithm, guaranteed to always generate redundancy-free storage structures. We show, however, that generating a minimum number of redundancy-free storage structures is NP-hard. We therefore provide heuristics to guide the process and observe that these heuristics result in satisfactory solutions, which are often optimal. We then present a general algorithm for n-ary edges and show that it generates redundancy-free storage structures. The general algorithm must overcome several problems that do not arise in the special case.

Index Terms:
XML data redundancy, compact XML storage structures, XML scheme generation.
Wai Yin Mok, David W. Embley, "Generating Compact Redundancy-Free XML Documents from Conceptual-Model Hypergraphs," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 8, pp. 1082-1096, Aug. 2006, doi:10.1109/TKDE.2006.125