TY - JOUR
T1 - Improving data cache performance via address correlation
T2 - An upper bound study
AU - Chuang, Peng Fei
AU - Sendag, Resit
AU - Lilja, David J.
PY - 2004/12/1
Y1 - 2004/12/1
N2 - Address correlation is a technique that links the addresses that reference the same data values. Using a detailed source-code level analysis, a recent study [1] revealed that different addresses containing the same data can often be correlated at run-time to eliminate on-chip data cache misses. In this paper, we study the upper-bound performance of an Address Correlation System (ACS), and discuss specific optimizations for a realistic hardware implementation. An ACS can effectively eliminate most of the L1 data cache misses by supplying the data from a correlated address already found in the cache to thereby improve the performance of the processor. For 10 of the SPEC CPU2000 benchmarks, 57 to 99% of all L1 data cache load misses can be eliminated, which produces an increase of 0 to 243% in the overall performance of a superscalar processor. We also show that an ACS with 1-2 correlations for a value can usually provide comparable performance results to that of the upper bound. Furthermore, a considerable number of correlations can be found within the same set in the L1 data cache, which suggests that a low-cost ACS implementation is possible.
AB - Address correlation is a technique that links the addresses that reference the same data values. Using a detailed source-code level analysis, a recent study [1] revealed that different addresses containing the same data can often be correlated at run-time to eliminate on-chip data cache misses. In this paper, we study the upper-bound performance of an Address Correlation System (ACS), and discuss specific optimizations for a realistic hardware implementation. An ACS can effectively eliminate most of the L1 data cache misses by supplying the data from a correlated address already found in the cache to thereby improve the performance of the processor. For 10 of the SPEC CPU2000 benchmarks, 57 to 99% of all L1 data cache load misses can be eliminated, which produces an increase of 0 to 243% in the overall performance of a superscalar processor. We also show that an ACS with 1-2 correlations for a value can usually provide comparable performance results to that of the upper bound. Furthermore, a considerable number of correlations can be found within the same set in the L1 data cache, which suggests that a low-cost ACS implementation is possible.
UR - http://www.scopus.com/inward/record.url?scp=35048821641&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35048821641&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:35048821641
SN - 0302-9743
VL - 3149
SP - 541
EP - 550
JO - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
JF - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER -