TY - GEN
T1 - Loop selection for thread-level speculation
AU - Wang, Shengyue
AU - Dai, Xiaoru
AU - Yellajyosula, Kiran S.
AU - Zhai, Antonia
AU - Yew, Pen Chung
PY - 2006
Y1 - 2006
AB - Thread-level speculation (TLS) allows potentially dependent threads to speculatively execute in parallel, thus making it easier for the compiler to extract parallel threads. However, the high cost associated with unbalanced load, failed speculation, and inter-thread value communication makes it difficult to obtain the desired performance unless the speculative threads are carefully chosen. In this paper, we focus on extracting parallel threads from loops in general-purpose applications because loops, with their regular structures and significant coverage on execution time, are ideal candidates for extracting parallel threads. General-purpose applications, however, usually contain a large number of nested loops with unpredictable parallel performance and dynamic behavior, thus making it difficult to decide which set of loops should be parallelized to improve overall program performance. Our proposed loop selection algorithm addresses all these difficulties. We have found that (i) with the aid of profiling information, compiler analyses can achieve a reasonably accurate estimation of the performance of parallel execution, and that (ii) different invocations of a loop may behave differently, and exploiting this dynamic behavior can further improve performance. With a judicious choice of loops, we can improve the overall program performance of SPEC2000 integer benchmarks by as much as 20%.
UR - http://www.scopus.com/inward/record.url?scp=43949097906&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=43949097906&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-69330-7_20
DO - 10.1007/978-3-540-69330-7_20
M3 - Conference contribution
AN - SCOPUS:43949097906
SN - 3540693297
SN - 9783540693291
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 289
EP - 303
BT - Languages and Compilers for Parallel Computing - 18th International Workshop, LCPC 2005, Revised Selected Papers
T2 - 18th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2005
Y2 - 20 October 2005 through 22 October 2005
ER -