June 30, 2001 to July 4, 2001
S. Subramanya Sastry, University of Wisconsin-Madison
Rastislav Bodík, University of Wisconsin-Madison
James E. Smith, University of Wisconsin-Madison
Abstract: Sophisticated binary translators and dynamic optimizers demand a program profiler with low overhead, high accuracy, and the ability to collect a variety of profile types. A profiling scheme that achieves these goals is proposed. Conceptually, the hardware compresses a stream of profile data by counting identical events; the compressed profile data is passed to software for analysis. Compressing the high-bandwidth event stream greatly reduces software overhead. Because optimizations can tolerate some profiling errors, we allow the stream compressor to be lossy, thereby enabling a low-cost sampling-based hardware design. Because the hardware compressor is insensitive to the event content, it supports various profile types and can process multiple types simultaneously. Basic components of our framework are periodic and random samplers, counters, and hash functions. These components are composed to form a variety of stream compressors. One design is both simple and very effective: the input stream is hash-split into multiple substreams, each of which is fed into a simple periodic sampler that selects every kth event. This stratified periodic sampler performs better than conventional random sampling because it biases each substream towards a small number of unique events, thereby reducing sampling error and allowing faster convergence to an accurate profile. For example, for gcc, convergence to a given level of accuracy is about twice as fast as with random sampling. When sampling overhead is considered, the stratified periodic profiler achieves less than 3% error while incurring an overhead of only 3.5% for gcc.
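The stratified periodic sampler described in the abstract can be illustrated with a minimal software sketch. It assumes integer event IDs (for example, branch PCs). The hash function `substream_of`, the error metric `profile_error`, the synthetic event stream, and all parameter values are hypothetical illustrations, not the paper's hardware design or its exact error metric. Each substream keeps its own event counter and emits every k-th event routed to it. A plain random sampler, which keeps each event with probability 1/k, is included as the baseline. Scaling the sampled counts by k yields estimated event frequencies, the "counting identical events" compression step.

```python
import random
from collections import Counter
from typing import Iterable


def substream_of(event: int, num_substreams: int) -> int:
    """Hypothetical hash: multiplicative hash of an integer event ID
    (e.g. a branch PC), using the upper bits to pick a substream."""
    h = (event * 2654435761) & 0xFFFFFFFF
    return (h >> 16) % num_substreams


def stratified_periodic_sample(events: Iterable[int], num_substreams: int, k: int) -> Counter:
    """Hash-split the stream into substreams; each substream has its own
    periodic sampler that keeps every k-th event routed to it."""
    seen = [0] * num_substreams  # one event counter per substream
    samples: Counter = Counter()
    for ev in events:
        s = substream_of(ev, num_substreams)
        seen[s] += 1
        if seen[s] % k == 0:  # every k-th event of substream s
            samples[ev] += 1
    return samples


def random_sample(events: Iterable[int], k: int, seed: int = 0) -> Counter:
    """Baseline: keep each event independently with probability 1/k."""
    rng = random.Random(seed)
    samples: Counter = Counter()
    for ev in events:
        if rng.random() < 1.0 / k:
            samples[ev] += 1
    return samples


def estimate_counts(samples: Counter, k: int) -> Counter:
    """Scale sample counts: each sampled event stands for about k events."""
    return Counter({ev: n * k for ev, n in samples.items()})


def profile_error(estimate: Counter, exact: Counter) -> float:
    """Hypothetical error metric: half the L1 distance between the normalized
    frequency distributions (0.0 = identical profiles)."""
    est_total = sum(estimate.values()) or 1
    exact_total = sum(exact.values()) or 1
    keys = set(estimate) | set(exact)
    return 0.5 * sum(abs(estimate[e] / est_total - exact[e] / exact_total) for e in keys)


if __name__ == "__main__":
    rng = random.Random(42)
    hot = [0x400, 0x404, 0x408, 0x40C] * 20  # 4 frequently executed events
    cold = list(range(0x1000, 0x1040))       # 64 rarely executed events
    population = hot + cold
    stream = [rng.choice(population) for _ in range(100_000)]

    exact = Counter(stream)
    k = 100  # sampling period: keep 1 of every k events per substream
    strat = estimate_counts(stratified_periodic_sample(stream, 16, k), k)
    rand = estimate_counts(random_sample(stream, k, seed=1), k)
    print(f"stratified periodic error: {profile_error(strat, exact):.4f}")
    print(f"random sampling error:     {profile_error(rand, exact):.4f}")
```

Because events are routed by hash, every occurrence of a given event lands in the same substream, so each periodic sampler sees only a subset of the unique events. This is the bias toward a small number of unique events that the abstract credits for the lower sampling error. The synthetic stream here is not a real program trace, so its error figures are not expected to reproduce the paper's gcc results.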
S. Subramanya Sastry, Rastislav Bodík, James E. Smith, "Rapid Profiling via Stratified Sampling", Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA 2001), p. 278, doi:10.1109/ISCA.2001.937456