Recent work has shown that pipelining and multiple instruction issuing are architecturally equivalent in their abilities to exploit parallelism, but there has been little work directly comparing the performance of these fine-grain parallel architectures with that of the coarse-grain multiprocessors. Using trace-driven simulations, the authors compare the performance of a superscalar processor and a pipelined processor using dynamic dependence checking with that of a shared memory multiprocessor. For very parallel programs, they find that the fine-grain processors must bypass an unrealistically large number of branches to match the performance of the multiprocessor. When executing programs with a wide range of potential parallelism, the best performance is obtained using a multiprocessor where each individual processor has a fine-grain parallelism of two to four.
Original language: English (US)
Number of pages: 10
Journal: Proceedings of the Annual Hawaii International Conference on System Sciences
State: Published - 1991
Event: 24th Annual Hawaii International Conference on System Sciences, HICSS 1991 - Kauai, United States
Duration: Jan 8 1991 → Jan 11 1991
Bibliographical note
Funding Information:
This work was supported by the National Science Foundation under Grant No. NSF MIP-8410110, with additional support from NASA Ames Research Center Grant No. NASA NCC 2-559 (DARPA), National Science Foundation Grant No. NSF MIP-88-07775, and Department of Energy Grant No. DOE DE-FG02-85ER25001.
© 1991 IEEE.