Issue No. 08, Aug. 2013 (vol. 62)
pp. 1616-1628
Jie Tang, Beijing Institute of Technology, Beijing
Shaoshan Liu, Microsoft, Redmond
Chen Liu, Florida International University, Miami
Zhimin Gu, Beijing Institute of Technology, Beijing
Jean-Luc Gaudiot, University of California, Irvine, Irvine
Extensible Markup Language (XML) has become a widely adopted standard for data representation and exchange. However, its features also introduce significant overhead that threatens the performance of modern applications. In this paper, we present a study of XML parsing and determine that memory-side data loading in the parsing stage incurs a performance overhead as significant as that of the computation itself. Hence, we propose memory-side acceleration, which incorporates data prefetching techniques and can be applied on top of computation-side acceleration to speed up XML data parsing. To this end, we study the impact of our proposed scheme on performance and energy consumption and demonstrate that it can improve performance by up to 20 percent and save up to 12.77 percent of energy when implemented in 32-nm technology. In addition, we implement a prefetcher on an FPGA platform to evaluate its implementation feasibility in terms of area and energy overhead.
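To make the memory-side argument concrete, here is a minimal sketch of why prefetching can help a parser that streams through its input. The abstract does not specify the prefetcher design, so this uses a generic next-line (sequential) prefetcher chosen purely for illustration, not necessarily the scheme evaluated in the paper. It assumes a 64-byte cache line, a tiny fully associative LRU cache, and a toy tag scanner whose input reads are sequential; the names `LINE_SIZE`, `LRUCache`, `scan_tags`, and `count_demand_misses` are hypothetical. It counts demand misses only and does not model latency, bandwidth, or energy.

```python
from collections import OrderedDict

LINE_SIZE = 64  # assumed cache-line size in bytes (hypothetical)


class LRUCache:
    """Tiny fully associative LRU cache that tracks line numbers only."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()

    def contains(self, line):
        # A hit refreshes the line's LRU position.
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        return False

    def insert(self, line):
        # Insert (or refresh) a line, evicting the least recently used one if full.
        self.lines[line] = True
        self.lines.move_to_end(line)
        if len(self.lines) > self.num_lines:
            self.lines.popitem(last=False)


def scan_tags(doc, base=0):
    """Toy tokenizer: count start tags while recording the address of each
    character read by the main loop (a sequential input trace)."""
    trace = []
    tags = 0
    for i, ch in enumerate(doc):
        trace.append(base + i)
        if ch == "<" and i + 1 < len(doc) and doc[i + 1] != "/":
            tags += 1
    return tags, trace


def count_demand_misses(addresses, num_lines=8, prefetch_degree=0):
    """Count demand misses for a byte-address trace.

    prefetch_degree == 0 disables prefetching; k > 0 models a next-line
    prefetcher that brings in the next k lines on every demand miss."""
    cache = LRUCache(num_lines)
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if cache.contains(line):
            continue
        misses += 1
        cache.insert(line)
        for offset in range(1, prefetch_degree + 1):
            cache.insert(line + offset)
    return misses


if __name__ == "__main__":
    tags, trace = scan_tags("<item id='1'>value</item>" * 100)
    for degree in (0, 1, 3):
        print(degree, count_demand_misses(trace, prefetch_degree=degree))
```

On this 2,500-byte input the trace touches 40 cache lines, so demand misses drop from 40 without prefetching to 20 with a degree of 1 and 10 with a degree of 3. Fewer demand misses in a toy model do not translate directly into the 20 percent speedup reported above; real parsers also touch irregular data structures, and a real prefetcher must trade accuracy against extra memory traffic and energy.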
Keywords: Prefetching, XML, Hardware, Acceleration, Data models, Field programmable gate arrays, Hardware acceleration, XML parsing
Jie Tang, Shaoshan Liu, Chen Liu, Zhimin Gu, Jean-Luc Gaudiot, "Acceleration of XML Parsing through Prefetching", IEEE Transactions on Computers, vol. 62, no. 8, pp. 1616-1628, Aug. 2013, doi:10.1109/TC.2012.88