TurboTiling: Leveraging prefetching to boost performance of tiled codes

Sanyam Mehta, Rajat Garg, Nishad Trivedi, Pen Chung Yew

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations

Abstract

Loop tiling or blocking improves temporal locality by dividing the problem domain into tiles and then repeatedly accessing the data within a tile. While this favors reuse, it also leads to an often ignored side-effect: breaking the streaming data access pattern. As a result, tiled codes are unable to exploit the sophisticated hardware prefetchers in present-day processors to extract extra performance. In this work, we propose a tiling algorithm to leverage prefetching to boost the performance of tiled codes. To achieve this, we propose to tile for the last-level cache as opposed to tiling for higher levels of cache as generally recommended. This approach not only exposes streaming access patterns in the tiled code that are amenable to prefetching, but also allows for a reduction in the off-chip traffic to memory (and therefore, better scaling with the number of cores). As a result, although we tile for the last-level cache, we effectively access the data in the higher levels of cache because the data is prefetched in time for computation. To achieve this, we propose an algorithm to select a tile size that aims to maximize data reuse and minimize conflict misses in the shared last-level cache in modern multi-core processors. We find that the combined effect of tiling for the last-level cache and effective hardware prefetching gives significant improvement over existing tiling algorithms that target higher-level L1/L2 caches and do not leverage the hardware prefetchers. When run on an Intel 8-core machine using different problem sizes, the proposed approach achieves an average improvement of 27% and 48% for smaller and larger problem sizes, respectively, over the best tile sizes selected by state-of-the-art algorithms.
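To make the trade-off in the abstract concrete, the following is a minimal sketch of a tiled matrix multiplication, written only to show the loop structure. It is not the paper's tile-size selection algorithm: `capacity_tile_edge` is a hypothetical, capacity-only heuristic that ignores conflict misses and cache sharing between cores, and the names, the three-array assumption, and the 0.5 fill fraction are all illustrative choices. The point it illustrates is that the innermost loop visits at most `tile` consecutive elements, so a small tile (sized for L1) cuts the unit-stride run short, while a larger tile (sized for the last-level cache) leaves longer contiguous runs of the kind a hardware prefetcher can follow.

```python
import math


def capacity_tile_edge(cache_bytes, elem_bytes=8, arrays=3, fill_fraction=0.5):
    """Largest square tile edge T such that `arrays` T x T tiles fit in
    `fill_fraction` of the cache.

    Hypothetical capacity-only heuristic for illustration; it does not
    model conflict misses or sharing of the cache between cores.
    """
    budget_elems = int(cache_bytes * fill_fraction / (arrays * elem_bytes))
    return max(1, math.isqrt(budget_elems))


def matmul_naive(a, b, n):
    """Reference C = A * B for n x n matrices stored as lists of rows."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            for j in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """C = A * B with the i, k and j loops each blocked by `tile`."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + tile, n)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        # Unit-stride run over j, at most `tile` elements long.
                        # A smaller tile shortens this contiguous run.
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

Calling `capacity_tile_edge` once with an L1-sized budget and once with a last-level-cache-sized budget shows how much longer the inner run becomes under the second choice. Python itself will not show the cache effects; the same loop nest would need to be written in a compiled language to measure them.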

Original language: English (US)
Title of host publication: Proceedings of the 2016 International Conference on Supercomputing, ICS 2016
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450343619
State: Published - Jun 1 2016
Event: 30th International Conference on Supercomputing, ICS 2016 - Istanbul, Turkey
Duration: Jun 1 2016 → Jun 3 2016

Publication series

Name: Proceedings of the International Conference on Supercomputing
Volume: 01-03-June-2016

Other

Other: 30th International Conference on Supercomputing, ICS 2016
Country/Territory: Turkey
City: Istanbul
Period: 6/1/16 → 6/3/16

Bibliographical note

Publisher Copyright:
© 2016 ACM.

Keywords

  • Loop tiling
  • Multi-core
  • Prefetching
