ZIDline
The HiRmod project. Mesoscale meteorological modelling on the VSC: performance evaluation
D. Arnold, I. Schicker, P. Seibert
Institute of Meteorology, University of Natural Resources and Applied Life Sciences, Wien

Introduction

Nowadays, global climate models (GCMs) provide simulations at resolutions of, at best, about 100 km, which may then be used to drive regional climate models (RCMs) for dynamical downscaling, typically down to a grid resolution of 10 km. Nevertheless, this resolution is not sufficient to properly represent the meteorological processes in regions such as the Alps, since valleys and ridges are only marginally resolved. Non-hydrostatic mesoscale meteorological models have shown promising results on the km- and sub-km-scales even for such mountainous regions (e.g., Zängl 2002, 2004, 2007). Although it is not yet affordable to run these models at such high resolution for the periods needed in climate studies, they should be prepared to work in climate mode with resolutions of 1 km or below for such complex areas. This increase of resolution is computationally quite challenging: the demands do not grow linearly with horizontal resolution, but at least quadratically or, depending on the implications for the vertical resolution and the time-step criteria, even with higher powers. In addition, RAM and mass-storage demands increase as well. This is leading the modelling community towards the intensive use of HPC facilities such as the Vienna Scientific Cluster (VSC). Within this context, the HiRmod project aims at a systematic approach to prepare two of the currently most widely used mesoscale meteorological models for creating climate-change scenarios at km or even sub-km resolution in mountainous terrain, making use of the VSC. The models are MM5 version 3.7 (Grell et al., 1994, http://www.mmm.ucar.edu/mm5/) and WRF-ARW version 3.2 (Skamarock et al., 2008, http://www.mmm.ucar.edu/wrf/). Optimum set-ups of the models, including machine-specific set-ups and benchmark tests, are being developed in this project.
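This scaling argument can be illustrated with a back-of-the-envelope calculation. The short Python sketch below (purely illustrative, not part of the model code) estimates the relative cost of refining the horizontal grid spacing for a fixed domain, assuming the cost grows with the number of horizontal grid points and, where the CFL criterion forces a proportionally shorter time step, with an additional power of the refinement factor.

# Back-of-the-envelope scaling of computational cost with horizontal
# grid spacing (illustrative only; actual costs depend on the model set-up).
def relative_cost(dx_old_km, dx_new_km, refine_timestep=True):
    factor = dx_old_km / dx_new_km      # horizontal refinement factor
    cost = factor ** 2                  # more horizontal grid points
    if refine_timestep:
        cost *= factor                  # shorter time step (CFL criterion)
    return cost

# Going from a typical RCM grid spacing of 10 km down to 1 km:
print(relative_cost(10.0, 1.0))         # ~1000x with time-step refinement
print(relative_cost(10.0, 1.0, False))  # ~100x for the grid points alone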

MM5 performance evaluation

The performance evaluation of the MM5 model was carried out for a two-day simulation period, including the spin-up time. The model configuration was chosen to be as close as possible to the simulations planned in the project, so that the benchmark reflects their complexity. There are 6 nested domains, centred on the Inn Valley near Innsbruck, with 2-way nesting interaction and grid distances down to 0.27 km; the innermost domain has 202 x 130 grid cells. All domains use 39 vertical model layers, with the model top at 50 hPa. To prevent CFL stability problems, a relatively short time step of 60 s is used in the outermost domain (64.8 km grid distance); it is automatically refined in the nested domains.
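The grid distances and time steps of the individual nests follow from the 3:1 refinement ratio of two-way interactive nests in MM5. The small Python sketch below (an illustration, not part of the benchmark set-up) reproduces the values implied by the outermost-domain settings quoted above, assuming this ratio is applied to both grid distance and time step.

# Grid distance and time step per nest level, assuming the 3:1 refinement
# ratio of two-way interactive nests in MM5, starting from the outermost
# domain (64.8 km, 60 s) described in the text.
dx_km, dt_s = 64.8, 60.0
for domain in range(1, 7):
    print(f"domain {domain}: dx = {dx_km:7.3f} km, dt = {dt_s:6.2f} s")
    dx_km /= 3.0
    dt_s /= 3.0
# The innermost (6th) domain ends up at about 0.27 km and 0.25 s.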

This simulation was first run in shared-memory mode on different platforms in order to compare the CPU time requirements. The results (Table 1) show that the current configuration of the Vienna Scientific Cluster is up to 70% faster than the other platforms available to the HiRmod project.

Machine   | Compiler        | Sim. time : real time | Cores | Processor
imp3      | ifort v10.0     | 1 h : 4 h 38 min      | 8     | Intel Xeon E5450 Quad Core, 3.00 GHz
imp3      | gfortran v4.3.1 | 1 h : 9 h 30 min      | 8     | Intel Xeon E5450 Quad Core, 3.00 GHz
imp9      | ifort v9.1      | 1 h : 12 h            | 4     | AMD Opteron 280 Dual Core, 2.4 GHz
PHOENIX   | ifort v9.1      | 1 h : ~5 h 5 min      | 16    | Intel 5472 Quad Core
VSC       | ifort v11.1     | 1 h : ~3 h 30 min     | 8     | Intel Xeon
VSC       | gfortran v4.3.1 | 1 h : ~7 h 20 min     | 8     | Intel Xeon

Table 1. Comparison of run times for the two-day benchmark run (wall-clock time per simulated hour). "imp" refers to machines available at BOKU-Met.
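The "up to 70%" figure quoted above can be reproduced directly from the table entries; the following small Python sketch (values transcribed from Table 1) computes by how much the wall-clock time of the VSC/ifort run is reduced relative to each of the other runs.

# Wall-clock time per simulated hour from Table 1, in minutes
# ("~" entries rounded to the listed values).
runtimes_min = {
    "imp3 / ifort":     4 * 60 + 38,
    "imp3 / gfortran":  9 * 60 + 30,
    "imp9 / ifort":    12 * 60,
    "PHOENIX / ifort":  5 * 60 + 5,
    "VSC / ifort":      3 * 60 + 30,
    "VSC / gfortran":   7 * 60 + 20,
}
vsc = runtimes_min["VSC / ifort"]
for machine, minutes in runtimes_min.items():
    reduction = 100.0 * (minutes - vsc) / minutes
    print(f"{machine:16s}: {reduction:5.1f}% less wall-clock time on VSC/ifort")
# The largest reduction, about 71%, is obtained relative to imp9.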

Since MM5 version 2, a parallelised version of the model has been available to the user community (http://www.mmm.ucar.edu/mm5/mpp/). This distributed-memory version has followed the development of the non-parallelised version and has been updated accordingly. Therefore, following the shared-memory comparison, the distributed-memory version of MM5 was implemented on the VSC in order to study the scalability of the model, using the Intel Fortran compiler and the QLogic parallel environment. This implementation required a modification of the top-level makefile of the parallel run-time system library RSL (http://www.mcs.anl.gov/~michalak/rsl/) so that it is compiled with gcc instead of the Intel C compiler, since the latter led to runtime problems even though compilation succeeded.

To study the effect of the number of processors on the parallel execution time, the benchmark simulation was then run with an increasing number of cores, in multiples of 8 so that complete nodes are used. The execution time decreases up to 384 cores, beyond which a plateau is reached (Figure 1). The slightly decreasing efficiency beyond this number could be due to the overhead of the message-passing communication. About 184 cores appear to be a good choice for production runs, combining efficient use of VSC resources with a good turn-around time.
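For reference, the quantities behind such a scalability plot, speed-up and parallel efficiency, can be computed from the measured wall-clock times as in the following Python sketch. The timing values used here are placeholders for illustration, not the actual HiRmod benchmark results.

# Speed-up and parallel efficiency relative to the smallest run.
# The wall-clock times below are placeholder values for illustration,
# not the measured benchmark results.
timings_h = {   # cores -> wall-clock time in hours (hypothetical)
      8: 80.0,
     64: 11.5,
    128:  6.3,
    384:  2.7,
    512:  2.6,
}
ref_cores = min(timings_h)
ref_time = timings_h[ref_cores]
for cores in sorted(timings_h):
    speedup = ref_time / timings_h[cores]
    efficiency = speedup / (cores / ref_cores)
    print(f"{cores:4d} cores: speed-up {speedup:5.1f}, efficiency {efficiency:4.2f}")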

Due to the limited duration of the simulation period, these benchmark runs are not well suited to assess the quality of such simulations. To evaluate this, a longer run was carried out and compared with measurements at weather stations, for a setting with 5 nested domains surrounding Vienna (serving also the Provision project Biokraftstoffe, http://www.provision-biokraftstoffe.at). The main purpose of this test was to check whether the distributed- and shared-memory implementations give the same results. Runs on the VSC in both modes performed well (Figure 2) and show only minimal differences. The periods where measurements and simulations differ significantly (more than 4 K for the temperature in Vienna, BOKU Dachstation) are mainly caused by the input data used in all the MM5 simulations and do not indicate any specific feature of the runs on the VSC. A noticeable difference, however, is the runtime needed: whereas the distributed-memory run on 128 cores took approximately 60 hours to simulate the 61-day period, the shared-memory run (using one node and thus only 8 cores) was stopped after the 51st simulated day, when it had already spent about 20 days (ca. 500 h) on the machine.
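A rough throughput comparison of the two long runs can be derived directly from the figures quoted above, as in this short Python sketch (approximate values; the shared-memory run was stopped early):

# Throughput of the two long runs in simulated days per wall-clock hour,
# using the approximate figures from the text.
runs = {
    "distributed memory, 128 cores": (61, 60.0),   # simulated days, wall-clock hours
    "shared memory, 8 cores":        (51, 500.0),
}
for name, (sim_days, wall_hours) in runs.items():
    print(f"{name:30s}: {sim_days / wall_hours:5.2f} simulated days per hour")
# The distributed-memory run is roughly a factor of 10 faster in throughput.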

Outlook

A similar study will be performed in the near future with the WRF-ARW model version 3.2, released in April 2010, which has better scalability than MM5. It will be tested with a set-up as similar as possible to the MM5 benchmarks, taking into account the intrinsic differences between the two models and the parametrizations available.

References

Grell, G. A., Dudhia, J. and Stauffer, D. R. (1994), A Description of the Fifth-Generation Penn State/NCAR Mesoscale Model (MM5). http://www.mmm.ucar.edu/mm5/documents/mm5-desc-doc.html

Skamarock, W. C., Klemp, J. B., Dudhia, J., Gill, D. O., Barker, D. M., Duda, M., Huang, X.-Y., Wang, W. and Powers, J. G. (2008), A Description of the Advanced Research WRF Version 3. http://www.mmm.ucar.edu/wrf/users/docs/arw_v3.pdf

Zängl, G. (2002), An Improved Method for Computing Horizontal Diffusion in a Sigma-Coordinate Model and Its Application to Simulations over Mountainous Topography. Mon. Wea. Rev., 130, 1423–1432.

Zängl, G. (2004), A reexamination of the valley wind system in the Alpine Inn Valley with numerical simulations. Meteorol. Atmos. Phys., 87, 241–256.

Zängl, G. (2007), To what extent does increased model resolution improve simulated precipitation fields? A case study of two north-Alpine heavy-rainfall events. Meteorol. Z., 16, 571–580.


The project is funded by the Klima- und Energiefonds (www.klimafonds.gv.at).