In this paper, we present compiler algorithms for detecting references to stale data in shared-memory multiprocessors. Our approach consists of two key analysis techniques: stale reference detection and locality-preserving analysis. Stale reference detection finds the memory reference patterns that may violate cache coherence, and locality-preserving analysis minimizes the number of such stale references by analyzing both temporal and spatial reuse. By computing the regions of arrays referenced inside loops, we extend previous scalar algorithms to perform more precise analysis. We develop a full interprocedural array data-flow algorithm that performs both bottom-up side-effect analysis and top-down context analysis on the procedure call graph to further exploit locality across procedure boundaries. The interprocedural algorithm eliminates the cache invalidations at procedure boundaries that previous compiler algorithms assumed. We have fully implemented the algorithm in the Polaris parallelizing compiler. Using execution-driven simulations on the Perfect Club benchmarks, we demonstrate how automatic stale reference detection eliminates unnecessary cache misses. The algorithm can be used to implement cache coherence in shared-memory multiprocessors that do not have hardware directories, such as the Cray T3D.
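To make the idea of stale reference detection more concrete, the Python sketch below flags reads that may observe stale cached data in a deliberately simplified model. It is an illustration under stated assumptions, not the paper's algorithm. The `Access` record, the single-interval regions `[lo, hi]`, the `schedule` label, and the `same_owner` test are all hypothetical stand-ins. They replace the array region, temporal/spatial reuse, and interprocedural analyses described above. The model treats each parallel loop as one epoch, assumes references within an epoch are independent (doall loops), works at word granularity (ignoring cache-line effects), and is conservative: any earlier write by a possibly different processor marks the read.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Access:
    """One array reference inside a parallel loop (epoch).

    Hypothetical simplification: the region is a single index interval
    [lo, hi], and `schedule` labels the iteration-to-processor mapping.
    """
    kind: str      # "R" for read, "W" for write
    array: str
    lo: int        # first element index touched by the loop
    hi: int        # last element index touched by the loop
    schedule: str  # e.g. "block" for a static block schedule


def overlaps(a: Access, b: Access) -> bool:
    """True if the two references may touch a common element."""
    return a.array == b.array and a.lo <= b.hi and b.lo <= a.hi


def same_owner(w: Access, r: Access) -> bool:
    """Crude locality test (hypothetical): an identical region under an
    identical static schedule means each element is read by the processor
    that wrote it, so that processor's cached copy is up to date."""
    return (w.array, w.lo, w.hi, w.schedule) == (r.array, r.lo, r.hi, r.schedule)


def find_stale_reads(epochs: List[List[Access]]) -> List[Tuple[int, int]]:
    """Return (epoch, index) pairs of reads that may hit stale cached data.

    Conservative rule: a read in epoch k is marked if any earlier epoch
    j < k wrote an overlapping region under a mapping not proven to be
    the same processor.
    """
    stale = []
    for k, epoch in enumerate(epochs):
        for idx, ref in enumerate(epoch):
            if ref.kind != "R":
                continue
            for earlier in epochs[:k]:
                if any(w.kind == "W" and overlaps(w, ref) and not same_owner(w, ref)
                       for w in earlier):
                    stale.append((k, idx))
                    break
    return stale


# Hypothetical three-epoch program.
program = [
    [Access("W", "A", 0, 99, "block")],       # epoch 0: write A[0..99]
    [Access("R", "A", 0, 99, "block"),        # epoch 1: same owner, reuse is safe
     Access("R", "A", 1, 100, "block")],      # shifted read may see a neighbor's write
    [Access("R", "B", 0, 9, "block")],        # epoch 2: B is never written
]
print(find_stale_reads(program))  # [(1, 1)]
```

Only the shifted read is flagged. In a real system, a flagged read would need some coherence action, such as bypassing or invalidating the cached copy. The same-owner read keeps its cache hit, which is the kind of locality the locality-preserving analysis aims to retain.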
| Field | Value |
|---|---|
| Original language | English (US) |
| Number of pages | 18 |
| Journal | IEEE Transactions on Parallel and Distributed Systems |
| State | Published - Sep 2000 |
Bibliographical note
Funding Information:
The research described in this paper was supported in part by the U.S. National Science Foundation Grants MIP 93-07910, MIP9610379, and CDA 95-02979. Special thanks go to David Poulsen at Kuck and Associates, Inc., for his help in the development of execution-driven simulations. We also wish to thank Polaris group members Peng Tu, Jay Hoeflinger, and Prof. David Padua for their guidance on the implementation of array data-flow analysis in the Polaris compiler. Finally, we thank Lawrence Rauchwerger and IBM for providing RS6000 clusters for simulations. A preliminary version of some of this work appears in , , and .