Effectiveness of compiler-directed prefetching on data mining benchmarks

Ragavendra Natarajan; Vineeth Mekkat; Wei Chung Hsu; Antonia Zhai

doi:10.1142/S0218126612400063

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

For today's increasingly power-constrained multicore systems, integrating simpler and more energy-efficient in-order cores becomes attractive. However, since in-order processors lack complex hardware support for tolerating long-latency memory accesses, developing compiler technologies to hide such latencies becomes critical. Compiler-directed prefetching has been shown to be effective for some applications. On the application side, a large class of data-centric applications has emerged to explore the underlying properties of explosively growing data. These applications, in contrast to traditional benchmarks, are characterized by substantial thread-level parallelism, complex and unpredictable control flow, and intensive, irregular memory access patterns. They are expected to be the dominant workloads on future microprocessors. Thus, in this paper, we investigated the effectiveness of compiler-directed prefetching on data mining applications in in-order multicore systems. Our study reveals that although properly inserted prefetch instructions can often effectively reduce memory access latencies for data mining applications, the compiler is not always able to exploit this potential. Compiler-directed prefetching can become inefficient in the presence of complex control flow, complex memory access patterns, and architecture-dependent behaviors. The integration of multithreaded execution onto a single die makes it even more difficult for the compiler to insert prefetch instructions, since optimizations that are effective for single-threaded execution may or may not be effective in multithreaded execution. Thus, compiler-directed prefetching must be judiciously deployed to avoid creating performance bottlenecks that otherwise do not exist. Our experience suggests that dynamic performance tuning techniques that adjust to the behaviors of a program can potentially facilitate the deployment of aggressive optimizations in data mining applications.
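As a rough illustration of what an inserted prefetch instruction looks like, the C sketch below prefetches ahead in a loop with an irregular (indirect) access pattern. It is not taken from the paper: the function name `sum_indirect`, the constant `PREFETCH_DISTANCE`, and its value are assumptions made for this example, and `__builtin_prefetch` is the GCC/Clang builtin rather than whatever the authors' compiler emits.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in loop iterations. A compiler would
   normally derive this from memory latency and loop-body cost; the value
   here is an assumption for illustration only. */
#define PREFETCH_DISTANCE 8

/* Sums data[index[i]] for i in [0, n): an indirect access pattern in which
   the address of each load depends on a previously loaded value. */
long sum_indirect(const long *data, const size_t *index, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Guard so that index[] is never read out of bounds. */
        if (i + PREFETCH_DISTANCE < n) {
            /* GCC/Clang builtin: a hint only. Arguments are the address,
               0 for a read access, and a temporal-locality level (0 to 3). */
            __builtin_prefetch(&data[index[i + PREFETCH_DISTANCE]], 0, 1);
        }
        total += data[index[i]];
    }
    return total;
}
```

The prefetch does not change the result, only (possibly) the timing. The extra index load and bounds check in every iteration are the kind of overhead that can outweigh the benefit when the prefetch is poorly placed.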

Original language	English (US)
Article number	1240006
Journal	Journal of Circuits, Systems and Computers
Volume	21
Issue number	2
DOIs	https://doi.org/10.1142/S0218126612400063
State	Published - Apr 2012

Bibliographical note

Funding Information:
This work is supported in part by grants from National Science Foundation under CNS-0834599, CSR-0834599, and CPS-0931931, a contract from Semiconductor Research Corporation under SRC-2008-TJ-1819, and gift grants from HP, IBM and Intel.

Keywords

Multicore
compilers
data mining
optimization
prefetching

Cite this

@article{f8213b83254f4cbaa4078c7b2c65d2b6,
  title = "Effectiveness of compiler-directed prefetching on data mining benchmarks",
  abstract = "For today's increasingly power-constrained multicore systems, integrating simpler and more energy-efficient in-order cores becomes attractive. However, since in-order processors lack complex hardware support for tolerating long-latency memory accesses, developing compiler technologies to hide such latencies becomes critical. Compiler-directed prefetching has been demonstrated effective on some applications. On the application side, a large class of data centric applications has emerged to explore the underlying properties of the explosively growing data. These applications, in contrast to traditional benchmarks, are characterized by substantial thread-level parallelism, complex and unpredictable control flow, as well as intensive and irregular memory access patterns. These applications are expected to be the dominating workloads on future microprocessors. Thus, in this paper, we investigated the effectiveness of compiler-directed prefetching on data mining applications in in-order multicore systems. Our study reveals that although properly inserted prefetch instructions can often effectively reduce memory access latencies for data mining applications, the compiler is not always able to exploit this potential. Compiler-directed prefetching can become inefficient in the presence of complex control flow and memory access patterns; and architecture dependent behaviors. The integration of multithreaded execution onto a single die makes it even more difficult for the compiler to insert prefetch instructions, since optimizations that are effective for single-threaded execution may or may not be effective in multithreaded execution. Thus, compiler-directed prefetching must be judiciously deployed to avoid creating performance bottlenecks that otherwise do not exist. Our experiences suggest that dynamic performance tuning techniques that adjust to the behaviors of a program can potentially facilitate the deployment of aggressive optimizations in data mining applications.",
  keywords = "Multicore, compilers, data mining, optimization, prefetching",
  author = "Ragavendra Natarajan and Vineeth Mekkat and Hsu, {Wei Chung} and Antonia Zhai",
  note = "Funding Information: This work is supported in part by grants from National Science Foundation under CNS-0834599, CSR-0834599, and CPS-0931931, a contract from Semiconductor Research Corporation under SRC-2008-TJ-1819, and gift grants from HP, IBM and Intel.",
  year = "2012",
  month = apr,
  doi = "10.1142/S0218126612400063",
  language = "English (US)",
  volume = "21",
  journal = "Journal of Circuits, Systems and Computers",
  issn = "0218-1266",
  publisher = "World Scientific Publishing Co. Pte Ltd",
  number = "2",
}

TY  - JOUR
T1  - Effectiveness of compiler-directed prefetching on data mining benchmarks
AU  - Natarajan, Ragavendra
AU  - Mekkat, Vineeth
AU  - Hsu, Wei Chung
AU  - Zhai, Antonia
N1  - Funding Information: This work is supported in part by grants from National Science Foundation under CNS-0834599, CSR-0834599, and CPS-0931931, a contract from Semiconductor Research Corporation under SRC-2008-TJ-1819, and gift grants from HP, IBM and Intel.
PY  - 2012/4
Y1  - 2012/4
N2  - For today's increasingly power-constrained multicore systems, integrating simpler and more energy-efficient in-order cores becomes attractive. However, since in-order processors lack complex hardware support for tolerating long-latency memory accesses, developing compiler technologies to hide such latencies becomes critical. Compiler-directed prefetching has been demonstrated effective on some applications. On the application side, a large class of data centric applications has emerged to explore the underlying properties of the explosively growing data. These applications, in contrast to traditional benchmarks, are characterized by substantial thread-level parallelism, complex and unpredictable control flow, as well as intensive and irregular memory access patterns. These applications are expected to be the dominating workloads on future microprocessors. Thus, in this paper, we investigated the effectiveness of compiler-directed prefetching on data mining applications in in-order multicore systems. Our study reveals that although properly inserted prefetch instructions can often effectively reduce memory access latencies for data mining applications, the compiler is not always able to exploit this potential. Compiler-directed prefetching can become inefficient in the presence of complex control flow and memory access patterns; and architecture dependent behaviors. The integration of multithreaded execution onto a single die makes it even more difficult for the compiler to insert prefetch instructions, since optimizations that are effective for single-threaded execution may or may not be effective in multithreaded execution. Thus, compiler-directed prefetching must be judiciously deployed to avoid creating performance bottlenecks that otherwise do not exist. Our experiences suggest that dynamic performance tuning techniques that adjust to the behaviors of a program can potentially facilitate the deployment of aggressive optimizations in data mining applications.
AB  - For today's increasingly power-constrained multicore systems, integrating simpler and more energy-efficient in-order cores becomes attractive. However, since in-order processors lack complex hardware support for tolerating long-latency memory accesses, developing compiler technologies to hide such latencies becomes critical. Compiler-directed prefetching has been demonstrated effective on some applications. On the application side, a large class of data centric applications has emerged to explore the underlying properties of the explosively growing data. These applications, in contrast to traditional benchmarks, are characterized by substantial thread-level parallelism, complex and unpredictable control flow, as well as intensive and irregular memory access patterns. These applications are expected to be the dominating workloads on future microprocessors. Thus, in this paper, we investigated the effectiveness of compiler-directed prefetching on data mining applications in in-order multicore systems. Our study reveals that although properly inserted prefetch instructions can often effectively reduce memory access latencies for data mining applications, the compiler is not always able to exploit this potential. Compiler-directed prefetching can become inefficient in the presence of complex control flow and memory access patterns; and architecture dependent behaviors. The integration of multithreaded execution onto a single die makes it even more difficult for the compiler to insert prefetch instructions, since optimizations that are effective for single-threaded execution may or may not be effective in multithreaded execution. Thus, compiler-directed prefetching must be judiciously deployed to avoid creating performance bottlenecks that otherwise do not exist. Our experiences suggest that dynamic performance tuning techniques that adjust to the behaviors of a program can potentially facilitate the deployment of aggressive optimizations in data mining applications.
KW  - Multicore
KW  - compilers
KW  - data mining
KW  - optimization
KW  - prefetching
UR  - http://www.scopus.com/inward/record.url?scp=84862172593&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84862172593&partnerID=8YFLogxK
U2  - 10.1142/S0218126612400063
DO  - 10.1142/S0218126612400063
M3  - Article
AN  - SCOPUS:84862172593
SN  - 0218-1266
VL  - 21
JO  - Journal of Circuits, Systems and Computers
JF  - Journal of Circuits, Systems and Computers
IS  - 2
M1  - 1240006
ER  - 
