Low-overhead, high-speed multi-core barrier synchronization

John Sartori; Rakesh Kumar

doi:10.1007/978-3-642-11515-8_4

Electrical and Computer Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

22 Scopus citations

Abstract

Whereas efficient barrier implementations were once a concern only in high-performance computing, recent trends in core integration make the topic relevant even for general-purpose CMPs. While the nature of CMP applications requires low-latency, the cost of low-latency barrier implementations using hardware-based techniques can be prohibitive for CMPs, where die area represents opportunities for throughput and yield. Similarly, whereas traditional multiprocessor barrier implementations were developed primarily for dedicated environments, scheduling and multi-programming on CMPs require more adaptable barrier implementations. In this paper, we present and evaluate three barrier implementations that are hybrids of software and dedicated hardware barriers and are specifically tailored for CMPs. The implementations leverage the unique characteristics of CMPs and provide low latency comparable to that of dedicated hardware networks at a fraction of the cost. The implementations also support adaptability, enabling efficient multi-programming and dynamic remapping of the barrier network.

Original language	English (US)
Title of host publication	High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings
Pages	18-34
Number of pages	17
DOIs	https://doi.org/10.1007/978-3-642-11515-8_4
State	Published - 2010
Event	5th International Conference on High Performance Embedded Architectures and Compilers, HiPEAC 2010 - Pisa, Italy (Jan 25 2010 → Jan 27 2010)

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	5952 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Other

Other	5th International Conference on High Performance Embedded Architectures and Compilers, HiPEAC 2010
Country/Territory	Italy
City	Pisa
Period	1/25/10 → 1/27/10

Cite this

Sartori, J., & Kumar, R. (2010). Low-overhead, high-speed multi-core barrier synchronization. In High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings (pp. 18-34). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5952 LNCS). https://doi.org/10.1007/978-3-642-11515-8_4

Low-overhead, high-speed multi-core barrier synchronization. / Sartori, John; Kumar, Rakesh.
High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings. 2010. p. 18-34 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5952 LNCS).

Sartori, J & Kumar, R 2010, Low-overhead, high-speed multi-core barrier synchronization. in High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5952 LNCS, pp. 18-34, 5th International Conference on High Performance Embedded Architectures and Compilers, HiPEAC 2010, Pisa, Italy, 1/25/10. https://doi.org/10.1007/978-3-642-11515-8_4

Sartori J, Kumar R. Low-overhead, high-speed multi-core barrier synchronization. In High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings. 2010. p. 18-34. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-642-11515-8_4

@inproceedings{159d155a28cb4631abb21217bc308ef3,
  title = "Low-overhead, high-speed multi-core barrier synchronization",
  abstract = "Whereas efficient barrier implementations were once a concern only in high-performance computing, recent trends in core integration make the topic relevant even for general-purpose CMPs. While the nature of CMP applications requires low-latency, the cost of low-latency barrier implementations using hardware-based techniques can be prohibitive for CMPs, where die area represents opportunities for throughput and yield. Similarly, whereas traditional multiprocessor barrier implementations were developed primarily for dedicated environments, scheduling and multi-programming on CMPs require more adaptable barrier implementations. In this paper, we present and evaluate three barrier implementations that are hybrids of software and dedicated hardware barriers and are specifically tailored for CMPs. The implementations leverage the unique characteristics of CMPs and provide low latency comparable to that of dedicated hardware networks at a fraction of the cost. The implementations also support adaptability, enabling efficient multi-programming and dynamic remapping of the barrier network.",
  author = "John Sartori and Rakesh Kumar",
  year = "2010",
  doi = "10.1007/978-3-642-11515-8_4",
  language = "English (US)",
  isbn = "3642115144",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  pages = "18--34",
  booktitle = "High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings",
  note = "5th International Conference on High Performance Embedded Architectures and Compilers, HiPEAC 2010 ; Conference date: 25-01-2010 Through 27-01-2010",
}

TY - GEN

T1 - Low-overhead, high-speed multi-core barrier synchronization

AU - Sartori, John

AU - Kumar, Rakesh

PY - 2010

Y1 - 2010

N2 - Whereas efficient barrier implementations were once a concern only in high-performance computing, recent trends in core integration make the topic relevant even for general-purpose CMPs. While the nature of CMP applications requires low-latency, the cost of low-latency barrier implementations using hardware-based techniques can be prohibitive for CMPs, where die area represents opportunities for throughput and yield. Similarly, whereas traditional multiprocessor barrier implementations were developed primarily for dedicated environments, scheduling and multi-programming on CMPs require more adaptable barrier implementations. In this paper, we present and evaluate three barrier implementations that are hybrids of software and dedicated hardware barriers and are specifically tailored for CMPs. The implementations leverage the unique characteristics of CMPs and provide low latency comparable to that of dedicated hardware networks at a fraction of the cost. The implementations also support adaptability, enabling efficient multi-programming and dynamic remapping of the barrier network.

UR - http://www.scopus.com/inward/record.url?scp=77949600101&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77949600101&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-11515-8_4

DO - 10.1007/978-3-642-11515-8_4

M3 - Conference contribution

AN - SCOPUS:77949600101

SN - 3642115144

SN - 9783642115141

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 18

EP - 34

BT - High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings

T2 - 5th International Conference on High Performance Embedded Architectures and Compilers, HiPEAC 2010

Y2 - 25 January 2010 through 27 January 2010

ER -
