The case for a scalable coherence protocol for complex on-chip cache hierarchies in many core systems
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2013)
Edinburgh, United Kingdom
Sept. 7, 2013 to Sept. 11, 2013
Jiacheng Zhao, Inst. of Comput. Technol., Beijing, China
Xiaobing Feng, SKL Comput. Archit., Inst. of Comput. Technol., Beijing, China
Huimin Cui, SKL Comput. Archit., Inst. of Comput. Technol., Beijing, China
Youliang Yan, Shannon Lab., Huawei Technol. Co., Ltd., Shenzhen, China
Jingling Xue, Sch. of Comput. Sci. & Eng., Univ. of New South Wales, Sydney, NSW, Australia
Wensen Yang, Shannon Lab., Huawei Technol. Co., Ltd., Shenzhen, China
Despite their widespread adoption in cloud computing, multicore processors are heavily underutilized in terms of computing resources. To avoid negative and unpredictable interference, co-location of a latency-sensitive application with others on the same multicore processor is disallowed, leaving many cores idle and causing low machine utilization. Enabling co-location while providing QoS guarantees requires predicting the performance interference between co-located applications, which is challenging but important. This research is driven by two key insights. First, the performance degradation of an application can be represented as a predictor function of the aggregate pressures on shared resources from all cores, regardless of which applications are co-running and what their individual pressures are. Second, a predictor function is piecewise rather than non-piecewise as in prior work, so that different types of dominant contention factors can be captured more accurately by different subfunctions in its different subdomains. Based on these insights, we propose a two-phase regression approach for building a predictor function efficiently. Validation using a large number of benchmarks and nine real-world datacenter applications on three different platforms shows that our approach is also precise, with an average error not exceeding 0.4%. When applied to the nine datacenter applications, our approach improves overall resource utilization from 50% to 88% at the cost of 10% QoS degradation.
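As a rough illustration of the two insights in the abstract, the sketch below fits a two-piece linear predictor of degradation against a single aggregate pressure value. It is not the authors' two-phase regression: the single pressure dimension, the exhaustive breakpoint search, the synthetic samples, and the names `fit_line`, `fit_piecewise`, and `predict` are all assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = mean_y - a * mean_x
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_piecewise(samples, min_points=2):
    """Fit two linear subfunctions split at the breakpoint with the lowest error.

    samples: list of (aggregate_pressure, degradation) pairs (hypothetical data).
    Returns (breakpoint, (a_low, b_low), (a_high, b_high)).
    """
    if len(samples) < 2 * min_points:
        raise ValueError("not enough samples for two subdomains")
    samples = sorted(samples)
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    best = None
    # Try every split that leaves at least min_points samples on each side.
    for i in range(min_points, len(samples) - min_points + 1):
        low = fit_line(xs[:i], ys[:i])
        high = fit_line(xs[i:], ys[i:])
        total_sse = low[2] + high[2]
        if best is None or total_sse < best[0]:
            split = (xs[i - 1] + xs[i]) / 2
            best = (total_sse, split, low[:2], high[:2])
    return best[1], best[2], best[3]


def predict(model, per_core_pressures):
    """Predict degradation from the sum of pressures, ignoring who produced them."""
    split, low, high = model
    aggregate = sum(per_core_pressures)
    a, b = low if aggregate <= split else high
    return a * aggregate + b


# Synthetic training data (assumed): mild contention below 5, steep above.
samples = [(x, 0.01 * x) for x in range(5)] + [(x, 0.1 * x - 0.4) for x in range(5, 10)]
model = fit_piecewise(samples)
print(model[0], predict(model, [3, 4]), predict(model, [1, 1]))
```

The `predict` function takes only the sum of per-core pressures, mirroring the claim that degradation depends on aggregate pressure rather than on which applications are co-running. The separate low and high subfunctions stand in for subdomains with different dominant contention factors.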
Degradation, Bandwidth, Training, Interference, Aggregates, Predictive models, Abstracts
Jiacheng Zhao, Xiaobing Feng, Huimin Cui, Youliang Yan, Jingling Xue and Wensen Yang, "The case for a scalable coherence protocol for complex on-chip cache hierarchies in many core systems," Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT), Edinburgh, United Kingdom, 2013, pp. 201-212.