The past decade has produced numerous CPU architectural innovations. These have included multiple cores per CPU, multiple simultaneous threads per core, and, especially with GPUs, highly complex memory hierarchies. As a result, performance portability has become a major challenge to programmers. We identify the SIMD engines in modern CPU and GPU cores as the key to obtaining high performance for scientific application codes. This common element of all present computing devices makes performance portability possible. However, we find that achieving this performance requires us to express the code in terms of intrinsic functions for the SIMD engine instructions, and these functions are different for each device. To assist the programmer in creating the necessary code expressions for each vendor's compilers, we have built an automated code translator that takes as input a single Fortran source code, written in a special style and annotated with directives, and creates output code for each device and compiler combination. The manual translations for GPU permit us here to evaluate the performance that our code transformations deliver on these devices. We present a performance study using our single-fluid PPM gas dynamics code and covering the latest multi-core processors and the Nvidia GPU.
|Original language||English (US)|
|Number of pages||4|
|Journal||Procedia Computer Science|
|State||Published - 2012|
|Event||12th Annual International Conference on Computational Science, ICCS 2012 - Omaha, NE, United States|
Duration: Jun 4 2012 → Jun 6 2012
Bibliographical note
Funding Information:
This work has been supported through grants CNS-0708822 and OCI-0832618 from the National Science Foundation and by the Department of Energy through a contract from the Los Alamos National Laboratory. We are also pleased to acknowledge helpful discussions on GPU programming with Guochun Shi at NCSA.
- High-performance computing
- Parallel computing
- Scientific computation