Efficient control and communication paradigms for coarse-grained spatial architectures

Michael Pellauer; Angshuman Parashar; Michael Adler; Bushra Ahsan; Randy Allmon; Neal Crago; Kermin Fleming; Mohit Gambhir; Aamer Jaleel; Tushar Krishna; Daniel Lustig; Stephen Maresh; Vladimir Pavlov; Rachid Rayess; Antonia Zhai; Joel Emer

doi:10.1145/2754930

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

There has been recent interest in exploring the acceleration of nonvectorizable workloads with spatially programmed architectures that are designed to efficiently exploit pipeline parallelism. Such an architecture faces two main problems: how to efficiently control each processing element (PE) in the system, and how to facilitate inter-PE communication without the overheads of traditional shared-memory coherent memory. In this article, we explore solving these problems using triggered instructions and latency-insensitive channels. Triggered instructions completely eliminate the program counter (PC) and allow programs to transition concisely between states without explicit branch instructions. Latency-insensitive channels allow efficient communication of inter-PE control information while simultaneously enabling flexible code placement and improving tolerance for variable events such as cache accesses. Together, these approaches provide a unified mechanism to avoid overserialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading. Our analysis shows that a spatial accelerator using triggered instructions and latency-insensitive channels can achieve 8× greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that triggered control reduces the number of static and dynamic instructions in the critical paths by 62% and 64%, respectively, over a PC-style baseline, increasing the performance of the spatial programming approach by 2.0×.

Original language	English (US)
Article number	10
Journal	ACM Transactions on Computer Systems
Volume	33
Issue number	3
DOIs	https://doi.org/10.1145/2754930
State	Published - Aug 1 2015

Bibliographical note

Publisher Copyright:
© 2015 ACM.

Keywords

Reconfigurable accelerators
Spatial programming

Cite this

Pellauer, M., Parashar, A., Adler, M., Ahsan, B., Allmon, R., Crago, N., Fleming, K., Gambhir, M., Jaleel, A., Krishna, T., Lustig, D., Maresh, S., Pavlov, V., Rayess, R., Zhai, A., & Emer, J. (2015). Efficient control and communication paradigms for coarse-grained spatial architectures. ACM Transactions on Computer Systems, 33(3), Article 10. https://doi.org/10.1145/2754930

@article{b827303cf2b84eea9742d1f9c252c99a,

title = "Efficient control and communication paradigms for coarse-grained spatial architectures",

abstract = "There has been recent interest in exploring the acceleration of nonvectorizable workloads with spatially programmed architectures that are designed to efficiently exploit pipeline parallelism. Such an architecture faces two main problems: how to efficiently control each processing element (PE) in the system, and how to facilitate inter-PE communication without the overheads of traditional shared-memory coherent memory. In this article, we explore solving these problems using triggered instructions and latency-insensitive channels. Triggered instructions completely eliminate the program counter (PC) and allow programs to transition concisely between states without explicit branch instructions. Latency-insensitive channels allow efficient communication of inter-PE control information while simultaneously enabling flexible code placement and improving tolerance for variable events such as cache accesses. Together, these approaches provide a unified mechanism to avoid overserialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading. Our analysis shows that a spatial accelerator using triggered instructions and latency-insensitive channels can achieve 8$\times$ greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that triggered control reduces the number of static and dynamic instructions in the critical paths by 62\% and 64\%, respectively, over a PC-style baseline, increasing the performance of the spatial programming approach by 2.0$\times$.",

keywords = "Reconfigurable accelerators, Spatial programming",

author = "Michael Pellauer and Angshuman Parashar and Michael Adler and Bushra Ahsan and Randy Allmon and Neal Crago and Kermin Fleming and Mohit Gambhir and Aamer Jaleel and Tushar Krishna and Daniel Lustig and Stephen Maresh and Vladimir Pavlov and Rachid Rayess and Antonia Zhai and Joel Emer",

note = "Publisher Copyright: {\textcopyright} 2015 ACM.",

year = "2015",

month = aug,

day = "1",

doi = "10.1145/2754930",

language = "English (US)",

volume = "33",

journal = "ACM Transactions on Computer Systems",

issn = "0734-2071",

publisher = "Association for Computing Machinery (ACM)",

number = "3",

}

TY - JOUR

T1 - Efficient control and communication paradigms for coarse-grained spatial architectures

AU - Pellauer, Michael

AU - Parashar, Angshuman

AU - Adler, Michael

AU - Ahsan, Bushra

AU - Allmon, Randy

AU - Crago, Neal

AU - Fleming, Kermin

AU - Gambhir, Mohit

AU - Jaleel, Aamer

AU - Krishna, Tushar

AU - Lustig, Daniel

AU - Maresh, Stephen

AU - Pavlov, Vladimir

AU - Rayess, Rachid

AU - Zhai, Antonia

AU - Emer, Joel

PY - 2015/8/1

Y1 - 2015/8/1

N2 - There has been recent interest in exploring the acceleration of nonvectorizable workloads with spatially programmed architectures that are designed to efficiently exploit pipeline parallelism. Such an architecture faces two main problems: how to efficiently control each processing element (PE) in the system, and how to facilitate inter-PE communication without the overheads of traditional shared-memory coherent memory. In this article, we explore solving these problems using triggered instructions and latency-insensitive channels. Triggered instructions completely eliminate the program counter (PC) and allow programs to transition concisely between states without explicit branch instructions. Latency-insensitive channels allow efficient communication of inter-PE control information while simultaneously enabling flexible code placement and improving tolerance for variable events such as cache accesses. Together, these approaches provide a unified mechanism to avoid overserialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading. Our analysis shows that a spatial accelerator using triggered instructions and latency-insensitive channels can achieve 8× greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that triggered control reduces the number of static and dynamic instructions in the critical paths by 62% and 64%, respectively, over a PC-style baseline, increasing the performance of the spatial programming approach by 2.0×.

AB - There has been recent interest in exploring the acceleration of nonvectorizable workloads with spatially programmed architectures that are designed to efficiently exploit pipeline parallelism. Such an architecture faces two main problems: how to efficiently control each processing element (PE) in the system, and how to facilitate inter-PE communication without the overheads of traditional shared-memory coherent memory. In this article, we explore solving these problems using triggered instructions and latency-insensitive channels. Triggered instructions completely eliminate the program counter (PC) and allow programs to transition concisely between states without explicit branch instructions. Latency-insensitive channels allow efficient communication of inter-PE control information while simultaneously enabling flexible code placement and improving tolerance for variable events such as cache accesses. Together, these approaches provide a unified mechanism to avoid overserialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading. Our analysis shows that a spatial accelerator using triggered instructions and latency-insensitive channels can achieve 8× greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that triggered control reduces the number of static and dynamic instructions in the critical paths by 62% and 64%, respectively, over a PC-style baseline, increasing the performance of the spatial programming approach by 2.0×.

KW - Reconfigurable accelerators

KW - Spatial programming

UR - http://www.scopus.com/inward/record.url?scp=84946081286&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84946081286&partnerID=8YFLogxK

U2 - 10.1145/2754930

DO - 10.1145/2754930

M3 - Article

AN - SCOPUS:84946081286

SN - 0734-2071

VL - 33

JO - ACM Transactions on Computer Systems

JF - ACM Transactions on Computer Systems

IS - 3

M1 - 10

ER -
