The speculative execution of threads in a multithreaded architecture, combined with the branch prediction used in each thread execution unit, allows many instructions to be executed speculatively, that is, before it is known whether the program will actually need them. In this study, we examine how load instructions executed on program paths that turn out to be incorrect affect memory system performance. We find that incorrect speculation (wrong execution) at both the instruction level and the thread level provides an indirect prefetching effect for the later correct execution paths and threads. By continuing to execute mispredicted load instructions even after the instruction- or thread-level control speculation is known to be incorrect, the cache misses observed on the correctly executed paths can be reduced by 16 to 73 percent, with an average reduction of 45 percent. However, we also find that these extra loads can increase memory traffic and pollute the cache. We introduce the small, fully associative Wrong Execution Cache (WEC) to eliminate the potential pollution caused by executing mispredicted load instructions. Our simulation results show that the WEC can improve the performance of a concurrent multithreaded architecture by up to 18.5 percent on the benchmark programs tested, with an average improvement of 9.7 percent, owing to the reduction in the number of cache misses.
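The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the authors' simulator: the cache geometry, the promotion policy on a WEC hit, the LRU replacement, and the five-access trace are all assumptions chosen for illustration, and names such as `ToyCache` and `run` are hypothetical. It compares three cases: wrong-path loads are discarded, wrong-path loads fill the main cache, and wrong-path loads fill a small fully associative side buffer standing in for the WEC.

```python
from collections import OrderedDict


class ToyCache:
    """Direct-mapped L1 plus an optional fully associative side buffer.

    Illustrative assumption only; the abstract does not specify these details.
    """

    def __init__(self, l1_sets=4, wec_entries=0):
        self.l1 = [None] * l1_sets      # one block number per set
        self.wec = OrderedDict()        # LRU order, oldest entry first
        self.wec_entries = wec_entries  # 0 disables the side buffer
        self.correct_misses = 0         # misses seen by correct-path loads

    def _fill_l1(self, block):
        self.l1[block % len(self.l1)] = block

    def _fill_wec(self, block):
        self.wec[block] = True
        self.wec.move_to_end(block)
        if len(self.wec) > self.wec_entries:
            self.wec.popitem(last=False)  # evict the least recently used block

    def load(self, block, wrong_path=False):
        if self.l1[block % len(self.l1)] == block:
            return "l1_hit"
        if block in self.wec:
            if wrong_path:
                self.wec.move_to_end(block)  # refresh LRU position
            else:
                del self.wec[block]          # assumed policy: promote to L1
                self._fill_l1(block)
            return "wec_hit"
        # Miss in both structures.
        if wrong_path and self.wec_entries > 0:
            self._fill_wec(block)            # keep wrong-path data out of L1
        else:
            self._fill_l1(block)
            if not wrong_path:
                self.correct_misses += 1
        return "miss"


def run(trace, wec_entries, execute_wrong_path=True):
    """Return the number of correct-path misses for a (block, wrong_path) trace."""
    cache = ToyCache(l1_sets=4, wec_entries=wec_entries)
    for block, wrong_path in trace:
        if wrong_path and not execute_wrong_path:
            continue  # wrong-path loads are squashed and never reach the cache
        cache.load(block, wrong_path)
    return cache.correct_misses


# Hypothetical trace: block 4 conflicts with block 0 in L1 (pollution),
# while the wrong-path load of block 5 is later needed on the correct path.
TRACE = [(0, False), (4, True), (0, False), (5, True), (5, False)]

if __name__ == "__main__":
    print("discard wrong-path loads:", run(TRACE, 0, execute_wrong_path=False))
    print("wrong-path loads fill L1:", run(TRACE, 0))
    print("wrong-path loads fill WEC:", run(TRACE, 2))
```

On this contrived trace, executing wrong-path loads directly into L1 gains one prefetch hit but loses it again to a conflict eviction, whereas the side buffer keeps the prefetch benefit without disturbing L1. The counts only illustrate the mechanism and say nothing about the reported 16 to 73 percent miss reductions.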
|Original language||English (US)|
|Number of pages||15|
|Journal||IEEE Transactions on Parallel and Distributed Systems|
|State||Published - Mar 2005|
Bibliographical note
Funding Information:
This work was supported in part by US National Science Foundation grants EIA-9971666 and CCR-9900605, IBM Corporation, Compaq’s Alpha development group, and the Minnesota Supercomputing Institute. Resit Sendag was supported in part by the University of Rhode Island new faculty startup fund. Preliminary versions of this work were presented at the ACM Euro-Par 2002 Conference and the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2003). The authors would like to thank the anonymous reviewers for their thoughtful comments on earlier versions of this paper. Their suggestions helped us to improve the presentation of this paper significantly.
Keywords
- Mispredicted loads
- Multithreaded architecture
- Wrong execution
- Wrong execution cache