Efficiency of thread-level speculation in SMT and CMP architectures - performance, power and thermal perspective

Venkatesan Packirisamy; Yangchun Luo; Wei Lung Hung; Antonia Zhai; Pen Chung Yew; Tin Fook Ngai

doi:10.1109/ICCD.2008.4751875

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

The computer industry has adopted multi-threaded and multi-core architectures as clock-rate increases stalled in the early 2000s. However, because of the lack of compilers and other related software technologies, most general-purpose applications today still cannot take advantage of such architectures to improve their performance. Thread-level speculation (TLS) has been proposed as a way of using these multi-threaded architectures to parallelize general-purpose applications. Both simultaneous multithreading (SMT) and chip multiprocessors (CMP) have been extended to implement TLS. While the characteristics of SMT and CMP have been widely studied under multi-programmed and parallel workloads, their behavior under TLS workloads is not well understood. Because TLS threads are speculative and may be rolled back, and because the degree of parallelism available varies across applications, TLS workloads exhibit unique characteristics that distinguish them from other workloads. In this paper, we present a detailed study of the performance, power consumption and thermal effects of these multithreaded architectures compared with a superscalar processor of equal chip area. A wide spectrum of design choices and tradeoffs is also studied using commonly used simulation techniques. We show that the SMT-based TLS architecture performs about 21% better than the best CMP-based configuration, while incurring about 16% power overhead. In terms of the energy-delay-squared product (ED²), SMT-based TLS performs about 26% better than the best CMP-based TLS configuration and 11% better than the superscalar architecture. However, the SMT-based TLS configuration causes more thermal stress than the CMP-based TLS architectures.
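The ED² comparison above combines power and performance in a single metric. As a minimal sketch, assuming ED² is computed from a run's average power P and execution time D (energy E = P·D, so ED² = E·D² = P·D³), the snippet below shows how a design that draws more power can still win on ED² when it is sufficiently faster. The function names and the power/delay values are hypothetical illustrations, not the authors' evaluation code or results from the paper.

```python
def energy_delay_squared(avg_power_w: float, delay_s: float) -> float:
    """Return ED^2 for a run with the given average power (W) and delay (s).

    Energy E = P * D, so ED^2 = E * D^2 = P * D^3 (units: J * s^2).
    Lower is better.
    """
    energy_j = avg_power_w * delay_s
    return energy_j * delay_s ** 2


def relative_improvement(baseline: float, candidate: float) -> float:
    """Fractional reduction of a lower-is-better metric versus a baseline."""
    return 1.0 - candidate / baseline


if __name__ == "__main__":
    # Hypothetical numbers, NOT taken from the paper: the candidate draws
    # 20% more power but finishes in 25% less time than the baseline.
    baseline = energy_delay_squared(avg_power_w=10.0, delay_s=2.0)   # 80.0
    candidate = energy_delay_squared(avg_power_w=12.0, delay_s=1.5)  # 40.5
    # prints "ED^2 improvement: 49.4%"
    print(f"ED^2 improvement: {relative_improvement(baseline, candidate):.1%}")
```

Squaring the delay weights performance more heavily than the plain energy-delay product. To first order, if energy scales with V² and delay with 1/V under supply-voltage scaling, ED² is roughly independent of voltage, which is why it is a common basis for comparing architectures rather than operating points.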

Original language	English (US)
Title of host publication	26th IEEE International Conference on Computer Design 2008, ICCD
Pages	286-293
Number of pages	8
DOIs	https://doi.org/10.1109/ICCD.2008.4751875
State	Published - 2008
Event	26th IEEE International Conference on Computer Design 2008, ICCD - Lake Tahoe, CA, United States Duration: Oct 12 2008 → Oct 15 2008

Publication series

Name	26th IEEE International Conference on Computer Design 2008, ICCD

Cite this

Packirisamy, V., Luo, Y., Hung, W. L., Zhai, A., Yew, P. C., & Ngai, T. F. (2008). Efficiency of thread-level speculation in SMT and CMP architectures - performance, power and thermal perspective. In 26th IEEE International Conference on Computer Design 2008, ICCD (pp. 286-293). Article 4751875 (26th IEEE International Conference on Computer Design 2008, ICCD). https://doi.org/10.1109/ICCD.2008.4751875

Efficiency of thread-level speculation in SMT and CMP architectures - performance, power and thermal perspective. / Packirisamy, Venkatesan; Luo, Yangchun; Hung, Wei Lung et al.
26th IEEE International Conference on Computer Design 2008, ICCD. 2008. pp. 286-293, 4751875 (26th IEEE International Conference on Computer Design 2008, ICCD).

Packirisamy, V, Luo, Y, Hung, WL, Zhai, A, Yew, PC & Ngai, TF 2008, Efficiency of thread-level speculation in SMT and CMP architectures - performance, power and thermal perspective. in 26th IEEE International Conference on Computer Design 2008, ICCD, 4751875, 26th IEEE International Conference on Computer Design 2008, ICCD, pp. 286-293, 26th IEEE International Conference on Computer Design 2008, ICCD, Lake Tahoe, CA, United States, 10/12/08. https://doi.org/10.1109/ICCD.2008.4751875

@inproceedings{3d48b5ca0a8f475bb1eb321283072665,
title = "Efficiency of thread-level speculation in {SMT} and {CMP} architectures - performance, power and thermal perspective",
author = "Venkatesan Packirisamy and Yangchun Luo and Hung, {Wei Lung} and Antonia Zhai and Yew, {Pen Chung} and Ngai, {Tin Fook}",
year = "2008",
doi = "10.1109/ICCD.2008.4751875",
language = "English (US)",
isbn = "9781424426584",
series = "26th IEEE International Conference on Computer Design 2008, ICCD",
pages = "286--293",
booktitle = "26th IEEE International Conference on Computer Design 2008, ICCD",
note = "26th IEEE International Conference on Computer Design 2008, ICCD ; Conference date: 12-10-2008 Through 15-10-2008",
}

TY  - GEN
T1  - Efficiency of thread-level speculation in SMT and CMP architectures - performance, power and thermal perspective
AU  - Packirisamy, Venkatesan
AU  - Luo, Yangchun
AU  - Hung, Wei Lung
AU  - Zhai, Antonia
AU  - Yew, Pen Chung
AU  - Ngai, Tin Fook
PY  - 2008
Y1  - 2008
UR  - http://www.scopus.com/inward/record.url?scp=62349083686&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=62349083686&partnerID=8YFLogxK
U2  - 10.1109/ICCD.2008.4751875
DO  - 10.1109/ICCD.2008.4751875
M3  - Conference contribution
AN  - SCOPUS:62349083686
SN  - 9781424426584
T3  - 26th IEEE International Conference on Computer Design 2008, ICCD
SP  - 286
EP  - 293
BT  - 26th IEEE International Conference on Computer Design 2008, ICCD
T2  - 26th IEEE International Conference on Computer Design 2008, ICCD
Y2  - 12 October 2008 through 15 October 2008
ER  - 
