Abstract
While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS: stalls due to forwarding scalar values between threads that would otherwise cause frequent data dependences. We present and evaluate dataflow algorithms for three increasingly aggressive instruction scheduling techniques that reduce the critical forwarding path introduced by the synchronization associated with this data forwarding. In addition, we contrast our compiler techniques with related hardware-only approaches. With our most aggressive compiler and hardware techniques, we improve performance under TLS by 6.2-28.5% for 6 of 14 applications, and by at least 2.7% for half of the other applications.
Original language | English (US) |
---|---|
Pages | 171-183 |
Number of pages | 13 |
State | Published - 2002 |
Externally published | Yes |
Event | Tenth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, United States (Oct 5 2002 → Oct 9 2002) |
Other | Tenth International Conference on Architectural Support for Programming Languages and Operating Systems |
---|---|
Country/Territory | United States |
City | San Jose, CA |
Period | 10/5/02 → 10/9/02 |