TY - GEN
T1 - An evaluation of a compiler optimization for improving the performance of a coherence directory
AU - Mounes-Toussi, Farnaz
AU - Lilja, David J.
AU - Li, Zhiyuan
PY - 1994/7/16
Y1 - 1994/7/16
N2 - Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache coherence in large-scale shared-memory multiprocessors, but both of these approaches have significant limitations. We examine the potential performance improvement of a new software-hardware controlled cache coherence mechanism [18]. This approach augments the run-time information available to a directory-based coherence mechanism with compile-time analysis that statically identifies write references that cannot cause coherence problems and writes that should be written through to memory. These references are marked as not needing to send invalidation messages, thereby reducing the network traffic produced by the directory while maintaining cache consistency. For memory references that are ambiguous, for instance because of conditional branches or the need for complex data flow analysis, the compiler conservatively marks the references and relies on the hardware directory to ensure coherence. Trace-driven simulations are used to emulate the compile-time analysis on memory traces and to estimate the potential performance improvement that could be expected from a compiler performing this optimization on the Perfect Club benchmark programs. By reducing the number of invalidations, this optimized directory scheme is capable of reducing the processor-memory network traffic by up to 54 percent compared to an unoptimized directory mechanism. In addition, the overall miss ratio can be reduced by up to 42 percent due to a corresponding reduction in the number of write misses.
AB - Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache coherence in large-scale shared-memory multiprocessors, but both of these approaches have significant limitations. We examine the potential performance improvement of a new software-hardware controlled cache coherence mechanism [18]. This approach augments the run-time information available to a directory-based coherence mechanism with compile-time analysis that statically identifies write references that cannot cause coherence problems and writes that should be written through to memory. These references are marked as not needing to send invalidation messages, thereby reducing the network traffic produced by the directory while maintaining cache consistency. For memory references that are ambiguous, for instance because of conditional branches or the need for complex data flow analysis, the compiler conservatively marks the references and relies on the hardware directory to ensure coherence. Trace-driven simulations are used to emulate the compile-time analysis on memory traces and to estimate the potential performance improvement that could be expected from a compiler performing this optimization on the Perfect Club benchmark programs. By reducing the number of invalidations, this optimized directory scheme is capable of reducing the processor-memory network traffic by up to 54 percent compared to an unoptimized directory mechanism. In addition, the overall miss ratio can be reduced by up to 42 percent due to a corresponding reduction in the number of write misses.
UR - http://www.scopus.com/inward/record.url?scp=84955604615&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84955604615&partnerID=8YFLogxK
U2 - 10.1145/181181.181281
DO - 10.1145/181181.181281
M3 - Conference contribution
AN - SCOPUS:84955604615
T3 - Proceedings of the International Conference on Supercomputing
SP - 75
EP - 84
BT - Proceedings of the 8th International Conference on Supercomputing, ICS 1994
PB - Association for Computing Machinery
T2 - 8th International Conference on Supercomputing, ICS 1994
Y2 - 11 July 1994 through 15 July 1994
ER -