Data prefetching and data forwarding in shared memory multiprocessors

D. K. Poulsen; Pen Chung Yew

doi:10.1109/ICPP.1994.81


Computer Science and Engineering

Research output: Contribution to journal › Conference article › peer-review

31 Scopus citations

Abstract

This paper studies and compares the use of data prefetching and an alternative mechanism, data forwarding, for reducing memory latency due to interprocessor communication in cache coherent, shared memory multiprocessors. Two multiprocessor prefetching algorithms are presented and compared. A simple blocked vector prefetching algorithm, considerably less complex than existing software pipelined prefetching algorithms, is shown to be effective in reducing memory latency and increasing performance. A Forwarding Write operation is used to evaluate the effectiveness of forwarding. The use of data forwarding results in significant performance improvements over data prefetching for codes exhibiting less spatial locality. Algorithms for data prefetching and data forwarding are implemented in a parallelizing compiler. Evaluation of the proposed schemes and algorithms is accomplished via execution-driven simulation of large, optimized, parallel numerical application codes with loop-level and vector parallelism. More data, discussion, and experiment details can be found in [1].

Original language	English (US)
Article number	5727799
Pages (from-to)	II276-II280
Journal	Proceedings of the International Conference on Parallel Processing
Volume	2
DOIs	https://doi.org/10.1109/ICPP.1994.81
State	Published - 1994
Event	23rd International Conference on Parallel Processing, ICPP 1994 - Raleigh, NC, United States
Duration	Aug 15 1994 → Aug 19 1994


Cite this

@article{67df399bc550433fbbe0239b492fe932,
  title = "Data prefetching and data forwarding in shared memory multiprocessors",
  abstract = "This paper studies and compares the use of data prefetching and an alternative mechanism, data forwarding, for reducing memory latency due to interprocessor communication in cache coherent, shared memory multiprocessors. Two multiprocessor prefetching algorithms are presented and compared. A simple blocked vector prefetching algorithm, considerably less complex than existing software pipelined prefetching algorithms, is shown to be effective in reducing memory latency and increasing performance. A Forwarding Write operation is used to evaluate the effectiveness of forwarding. The use of data forwarding results in significant performance improvements over data prefetching for codes exhibiting less spatial locality. Algorithms for data prefetching and data forwarding are implemented in a parallelizing compiler. Evaluation of the proposed schemes and algorithms is accomplished via execution-driven simulation of large, optimized, parallel numerical application codes with loop-level and vector parallelism. More data, discussion, and experiment details can be found in [1].",
  author = "Poulsen, {D. K.} and Yew, {Pen Chung}",
  year = "1994",
  doi = "10.1109/ICPP.1994.81",
  language = "English (US)",
  volume = "2",
  pages = "II276--II280",
  journal = "Proceedings of the International Conference on Parallel Processing",
  issn = "0190-3918",
  note = "23rd International Conference on Parallel Processing, ICPP 1994 ; Conference date: 15-08-1994 Through 19-08-1994",
}

TY  - JOUR
T1  - Data prefetching and data forwarding in shared memory multiprocessors
AU  - Poulsen, D. K.
AU  - Yew, Pen Chung
PY  - 1994
Y1  - 1994
N2  - This paper studies and compares the use of data prefetching and an alternative mechanism, data forwarding, for reducing memory latency due to interprocessor communication in cache coherent, shared memory multiprocessors. Two multiprocessor prefetching algorithms are presented and compared. A simple blocked vector prefetching algorithm, considerably less complex than existing software pipelined prefetching algorithms, is shown to be effective in reducing memory latency and increasing performance. A Forwarding Write operation is used to evaluate the effectiveness of forwarding. The use of data forwarding results in significant performance improvements over data prefetching for codes exhibiting less spatial locality. Algorithms for data prefetching and data forwarding are implemented in a parallelizing compiler. Evaluation of the proposed schemes and algorithms is accomplished via execution-driven simulation of large, optimized, parallel numerical application codes with loop-level and vector parallelism. More data, discussion, and experiment details can be found in [1].


UR  - http://www.scopus.com/inward/record.url?scp=77954460854&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=77954460854&partnerID=8YFLogxK
U2  - 10.1109/ICPP.1994.81
DO  - 10.1109/ICPP.1994.81
M3  - Conference article
AN  - SCOPUS:77954460854
SN  - 0190-3918
VL  - 2
SP  - II276-II280
JO  - Proceedings of the International Conference on Parallel Processing
JF  - Proceedings of the International Conference on Parallel Processing
M1  - 5727799
T2  - 23rd International Conference on Parallel Processing, ICPP 1994
Y2  - 15 August 1994 through 19 August 1994
ER  -
