Programmable graphics processing units (GPUs) now offer very high computing performance at relatively low hardware cost and power consumption. In this paper, we present an implementation of the dynamics routine of the HIRLAM weather forecast model on the NVIDIA GeForce 9800 GX2 GPU, using the Compute Unified Device Architecture (CUDA) as the parallel programming model. We converted the original Fortran to C and CUDA by hand, in a straightforward manner and without much attention to optimization. On a single GPU, we observe a speed-up of an order of magnitude over the host CPU (Intel quad core, 1998 MHz), including the relatively costly copying of data between GPU and CPU memories; the calculation time proper decreased by a factor of 2000. A single GPU, however, does not have enough memory for practical use. We therefore investigated a parallel implementation on 4 GPUs and found a parallel speed-up of 3.6, which is not very promising if memory limitations force the use of many GPUs in parallel. We discuss several options to address this issue.
V.T. Vu, G.J. Cats, A.A. Wolters. GPU Acceleration of the Dynamics Routine in the HIRLAM Weather Forecast Model.
Accepted, 2010.