TY - GEN
T1 - Low-overhead, high-speed multi-core barrier synchronization
AU - Sartori, John
AU - Kumar, Rakesh
PY - 2010
Y1 - 2010
AB - Whereas efficient barrier implementations were once a concern only in high-performance computing, recent trends in core integration make the topic relevant even for general-purpose CMPs. While the nature of CMP applications requires low latency, the cost of low-latency barrier implementations using hardware-based techniques can be prohibitive for CMPs, where die area represents opportunities for throughput and yield. Similarly, whereas traditional multiprocessor barrier implementations were developed primarily for dedicated environments, scheduling and multi-programming on CMPs require more adaptable barrier implementations. In this paper, we present and evaluate three barrier implementations that are hybrids of software and dedicated hardware barriers and are specifically tailored for CMPs. The implementations leverage the unique characteristics of CMPs and provide low latency comparable to that of dedicated hardware networks at a fraction of the cost. The implementations also support adaptability, enabling efficient multi-programming and dynamic remapping of the barrier network.
UR - http://www.scopus.com/inward/record.url?scp=77949600101&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77949600101&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-11515-8_4
DO - 10.1007/978-3-642-11515-8_4
M3 - Conference contribution
AN - SCOPUS:77949600101
SN - 3642115144
SN - 9783642115141
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 18
EP - 34
BT - High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings
T2 - 5th International Conference on High Performance Embedded Architectures and Compilers, HiPEAC 2010
Y2 - 25 January 2010 through 27 January 2010
ER -