Cache-Conscious Automata for XML Filtering
December 2006 (vol. 18 no. 12)
pp. 1629-1644
Hardware cache behavior is an important factor in the performance of memory-resident, data-intensive systems such as XML filtering engines. A key data structure in several recent XML filters is the automaton, which represents the long-running XML queries in main memory. In this paper, we study the cache performance of automaton-based XML filtering through analytical modeling and system measurement. Furthermore, we propose a cache-conscious automaton organization technique, called the hot buffer, to improve the locality of automaton state transitions. Our results show that 1) our cache performance model for XML filtering automata is highly accurate and 2) the hot buffer improves both the cache performance and the overall performance of automaton-based XML filtering.
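
The abstract does not describe how the hot buffer is built, so the following is only a minimal sketch of the general idea, assuming that "hot" means "most frequently visited" and that hot states are copied into a small separate table consulted first on each transition. The names `Automaton`, `profile_hot_states`, `build_hot_buffer`, and `step` are hypothetical and are not taken from the paper. Python dictionaries give no control over memory layout, so the sketch illustrates the bookkeeping only, not the cache effect.

```python
from collections import Counter


class Automaton:
    """Toy deterministic automaton over element names (hypothetical helper).

    transitions[state][symbol] gives the next state.
    """

    def __init__(self, transitions, start=0):
        self.transitions = transitions
        self.start = start


def profile_hot_states(automaton, sample_paths, k):
    """Return the k states visited most often while running sample_paths."""
    visits = Counter()
    for path in sample_paths:
        state = automaton.start
        visits[state] += 1
        for symbol in path:
            state = automaton.transitions.get(state, {}).get(symbol)
            if state is None:  # no matching transition: stop this path
                break
            visits[state] += 1
    return [state for state, _ in visits.most_common(k)]


def build_hot_buffer(automaton, hot_states):
    """Copy the transitions of the hot states into one small table.

    Assumption: a real implementation would pack these entries into a
    contiguous array sized to fit the cache; a dict stands in for it here.
    """
    return {s: dict(automaton.transitions.get(s, {})) for s in hot_states}


def step(automaton, hot_buffer, state, symbol):
    """Take one transition, consulting the hot buffer before the full table."""
    table = hot_buffer.get(state)
    if table is None:
        table = automaton.transitions.get(state, {})
    return table.get(symbol)


if __name__ == "__main__":
    auto = Automaton({0: {"a": 1}, 1: {"b": 2, "c": 3}, 2: {}, 3: {}})
    samples = [["a", "b"], ["a", "b"], ["a", "c"]]
    hot = profile_hot_states(auto, samples, k=2)
    buf = build_hot_buffer(auto, hot)
    print(hot, step(auto, buf, 1, "c"))
```

In this sketch the buffer changes only where a transition is looked up, never which state it leads to, so filtering results are the same with or without it.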

Index Terms:
Cache-conscious, automata, XML filtering, query processing, cache behavior model, buffer.
Bingsheng He, Qiong Luo, Byron Choi, "Cache-Conscious Automata for XML Filtering," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 12, pp. 1629-1644, Dec. 2006, doi:10.1109/TKDE.2006.184