Managing shared last-level cache in a heterogeneous multicore processor

Vineeth Mekkat; Anup Holey; Pen Chung Yew; Antonia Zhai

doi:10.1109/PACT.2013.6618819

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

68 Scopus citations

Abstract

Heterogeneous multicore processors that integrate CPU cores and data-parallel accelerators such as GPU cores onto the same die raise several new issues for sharing various on-chip resources. The shared last-level cache (LLC) is one of the most important shared resources due to its impact on performance. Accesses to the shared LLC in heterogeneous multicore processors can be dominated by the GPU because it supports a significantly higher number of threads. Under current cache management policies, the CPU applications' share of the LLC can be significantly reduced in the presence of competing GPU applications. For cache-sensitive CPU applications, a reduced share of the LLC could lead to significant performance degradation. In contrast, GPU applications can often tolerate increased memory access latency in the presence of LLC misses when there is sufficient thread-level parallelism. In this work, we propose Heterogeneous LLC Management (HeLM), a novel shared LLC management policy that takes advantage of the GPU's tolerance for memory access latency. HeLM throttles GPU LLC accesses and yields LLC space to cache-sensitive CPU applications. GPU LLC access throttling is achieved by allowing GPU threads that can tolerate longer memory access latencies to bypass the LLC. The latency tolerance of a GPU application is determined by the availability of thread-level parallelism, which can be measured at runtime as the average number of threads that are available for issuing. Our heterogeneous LLC management scheme outperforms the LRU policy by 12.5% and TAP-RRIP by 5.6% for a processor with 4 CPU and 4 GPU cores.
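
The bypass idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it only mirrors the two statements above: latency tolerance is estimated from the average number of GPU threads available for issuing, and tolerant GPU accesses bypass the LLC. The names `TlpMonitor`, `should_bypass_llc`, `route_gpu_access`, and the `tlp_threshold` parameter are hypothetical, and how a real design would pick the threshold is not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class TlpMonitor:
    """Hypothetical runtime counter for GPU thread-level parallelism (TLP)."""

    ready_thread_sum: int = 0  # sum of ready-to-issue thread counts over samples
    samples: int = 0           # number of sampled cycles

    def sample(self, ready_threads: int) -> None:
        """Record how many GPU threads were available for issuing this cycle."""
        self.ready_thread_sum += ready_threads
        self.samples += 1

    def average_ready_threads(self) -> float:
        """Average number of threads available for issuing; 0.0 with no samples."""
        if self.samples == 0:
            return 0.0
        return self.ready_thread_sum / self.samples


def should_bypass_llc(monitor: TlpMonitor, tlp_threshold: float) -> bool:
    """Assumed rule: enough ready threads means the GPU can hide a longer latency."""
    return monitor.average_ready_threads() >= tlp_threshold


def route_gpu_access(monitor: TlpMonitor, tlp_threshold: float) -> str:
    """Return 'bypass' (skip the LLC) or 'llc' (allocate in the LLC as usual)."""
    return "bypass" if should_bypass_llc(monitor, tlp_threshold) else "llc"


if __name__ == "__main__":
    busy = TlpMonitor()
    for ready in (8, 12, 10):       # many threads ready to issue each cycle
        busy.sample(ready)

    starved = TlpMonitor()
    for ready in (1, 2, 3):         # few threads ready, so misses would stall the core
        starved.sample(ready)

    threshold = 6.0                 # illustrative value only
    print(route_gpu_access(busy, threshold))
    print(route_gpu_access(starved, threshold))
```

In this sketch a GPU application with many ready threads gives up LLC space, while one with little available parallelism keeps using the cache, which is the trade-off the abstract describes.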

Original language	English (US)
Title of host publication	PACT 2013 - Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques
Pages	225-234
Number of pages	10
DOIs	https://doi.org/10.1109/PACT.2013.6618819
State	Published - Nov 18 2013
Event	22nd International Conference on Parallel Architectures and Compilation Techniques, PACT 2013 - Edinburgh, United Kingdom Duration: Sep 7 2013 → Sep 11 2013

Publication series

Name	Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
ISSN (Print)	1089-795X

Other

Other	22nd International Conference on Parallel Architectures and Compilation Techniques, PACT 2013
Country/Territory	United Kingdom
City	Edinburgh
Period	9/7/13 → 9/11/13

Keywords

cache management policy
heterogeneous multicores
shared last-level cache

Cite this

Mekkat, V., Holey, A., Yew, P. C., & Zhai, A. (2013). Managing shared last-level cache in a heterogeneous multicore processor. In PACT 2013 - Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (pp. 225-234). Article 6618819 (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT). https://doi.org/10.1109/PACT.2013.6618819

Managing shared last-level cache in a heterogeneous multicore processor. / Mekkat, Vineeth; Holey, Anup; Yew, Pen Chung et al.
PACT 2013 - Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. 2013. p. 225-234 6618819 (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT).

Mekkat, V, Holey, A, Yew, PC & Zhai, A 2013, Managing shared last-level cache in a heterogeneous multicore processor. in PACT 2013 - Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques., 6618819, Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT, pp. 225-234, 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT 2013, Edinburgh, United Kingdom, 9/7/13. https://doi.org/10.1109/PACT.2013.6618819

@inproceedings{563c33b77c5d4a76ab444c6c918ffa0f,

title = "Managing shared last-level cache in a heterogeneous multicore processor",

abstract = "Heterogeneous multicore processors that integrate CPU cores and data-parallel accelerators such as GPU cores onto the same die raise several new issues for sharing various on-chip resources. The shared last-level cache (LLC) is one of the most important shared resources due to its impact on performance. Accesses to the shared LLC in heterogeneous multicore processors can be dominated by the GPU due to the significantly higher number of threads supported. Under current cache management policies, the CPU applications' share of the LLC can be significantly reduced in the presence of competing GPU applications. For cache sensitive CPU applications, a reduced share of the LLC could lead to significant performance degradation. On the contrary, GPU applications can often tolerate increased memory access latency in the presence of LLC misses when there is sufficient thread-level parallelism. In this work, we propose Heterogeneous LLC Management (HeLM), a novel shared LLC management policy that takes advantage of the GPU's tolerance for memory access latency. HeLM is able to throttle GPU LLC accesses and yield LLC space to cache sensitive CPU applications. GPU LLC access throttling is achieved by allowing GPU threads that can tolerate longer memory access latencies to bypass the LLC. The latency tolerance of a GPU application is determined by the availability of thread-level parallelism, which can be measured at runtime as the average number of threads that are available for issuing. Our heterogeneous LLC management scheme outperforms LRU policy by 12.5% and TAP-RRIP by 5.6% for a processor with 4 CPU and 4 GPU cores.",

keywords = "cache management policy, heterogeneous multicores, shared last-level cache",

author = "Vineeth Mekkat and Anup Holey and Yew, {Pen Chung} and Antonia Zhai",

year = "2013",

month = nov,

day = "18",

doi = "10.1109/PACT.2013.6618819",

language = "English (US)",

isbn = "9781479910212",

series = "Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT",

pages = "225--234",

booktitle = "PACT 2013 - Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques",

note = "22nd International Conference on Parallel Architectures and Compilation Techniques, PACT 2013 ; Conference date: 07-09-2013 Through 11-09-2013",

}

TY - GEN

T1 - Managing shared last-level cache in a heterogeneous multicore processor

AU - Mekkat, Vineeth

AU - Holey, Anup

AU - Yew, Pen Chung

AU - Zhai, Antonia

PY - 2013/11/18

Y1 - 2013/11/18

N2 - Heterogeneous multicore processors that integrate CPU cores and data-parallel accelerators such as GPU cores onto the same die raise several new issues for sharing various on-chip resources. The shared last-level cache (LLC) is one of the most important shared resources due to its impact on performance. Accesses to the shared LLC in heterogeneous multicore processors can be dominated by the GPU due to the significantly higher number of threads supported. Under current cache management policies, the CPU applications' share of the LLC can be significantly reduced in the presence of competing GPU applications. For cache sensitive CPU applications, a reduced share of the LLC could lead to significant performance degradation. On the contrary, GPU applications can often tolerate increased memory access latency in the presence of LLC misses when there is sufficient thread-level parallelism. In this work, we propose Heterogeneous LLC Management (HeLM), a novel shared LLC management policy that takes advantage of the GPU's tolerance for memory access latency. HeLM is able to throttle GPU LLC accesses and yield LLC space to cache sensitive CPU applications. GPU LLC access throttling is achieved by allowing GPU threads that can tolerate longer memory access latencies to bypass the LLC. The latency tolerance of a GPU application is determined by the availability of thread-level parallelism, which can be measured at runtime as the average number of threads that are available for issuing. Our heterogeneous LLC management scheme outperforms LRU policy by 12.5% and TAP-RRIP by 5.6% for a processor with 4 CPU and 4 GPU cores.

KW - cache management policy

KW - heterogeneous multicores

KW - shared last-level cache

UR - http://www.scopus.com/inward/record.url?scp=84887456430&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84887456430&partnerID=8YFLogxK

U2 - 10.1109/PACT.2013.6618819

DO - 10.1109/PACT.2013.6618819

M3 - Conference contribution

AN - SCOPUS:84887456430

SN - 9781479910212

T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

SP - 225

EP - 234

BT - PACT 2013 - Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques

T2 - 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT 2013

Y2 - 7 September 2013 through 11 September 2013

ER -
