Parallel MEGAN Codes and Performance Benchmark

Shengting Cui, Serena Chung, and Joseph Vaughan
Washington State University, Pullman WA 99164

In this section we give a brief description of the parallelized version of EMPROC in MEGAN v2.10 for use with the chemical transport model CMAQ, along with its benchmark performance. The parallelized version of EMPROC can be run on any system capable of running CMAQ, i.e., a system with the Models-3 I/O API library and CMAQ's STENEX and PARIO libraries installed. Refer to the CMAQ documentation for instructions on installing these libraries. (Other parts of the MEGAN code, such as IOAPI2UAM, MET2MGN, MGN2MECH, and TXT2IOAPI, are unchanged.)

1. Parallelization Scheme

The parallelization of EMPROC generally follows that of CMAQ, i.e., a two-dimensional domain decomposition in the horizontal plane. The main subroutines involved are GRID_CONF.F, HGRD_DEFN.F, mpcomm_init.F, par_term.F, subhdomain.F, and subhfile.F, which are also provided with CMAQ. In our scheme, after the horizontal subdomains are set up and the grid is distributed among them, the parallelization work consists mainly of modifications to the source code of the serial emproc.F. In the parallel version, emproc.F becomes a subroutine, and a new driver.F routine is introduced as the main driver program for subroutine EMPROC. The driver.F code is responsible for setting up the domain decomposition, handling the MPI communication between the subdomains, and initializing and shutting down MPI and the Models-3 I/O API. The subdomain-based calculations are performed in the two-dimensional grid loops in emproc.F; these loops are similar to those in the serial version, except that they now run over each subdomain. Another important strategy is the use of both global and local domain variables. We introduced local variables for the individual subdomains, and for some variables it is necessary to keep both the subdomain (local) and the global versions. For those global variables needed during the iteration, we introduced a transformation from the local grid indices to the global grid system, which allows properties required on the global grid to be computed with little additional computing expense.
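As a concrete illustration, the sketch below shows the overall control flow of a driver of this kind. The MPI calls (MPI_INIT, MPI_COMM_SIZE, MPI_COMM_RANK, MPI_FINALIZE) and the I/O API calls (INIT3, SHUT3) are standard; the grid dimensions, the simple row-strip decomposition, and the EMPROC argument list are illustrative placeholders, not the interfaces used in the released code, which relies on the CMAQ routines listed above.

      PROGRAM DRIVER
C     Minimal sketch of a parallel MEGAN driver (illustrative only).
      IMPLICIT NONE
      INCLUDE 'mpif.h'                       ! MPI constants

      INTEGER, PARAMETER :: GL_NCOLS = 285   ! global grid size (example)
      INTEGER, PARAMETER :: GL_NROWS = 258

      INTEGER  NPROCS, MYPE, IERR, LOGDEV
      INTEGER  STRTCOL, ENDCOL, STRTROW, ENDROW   ! local subdomain window

      INTEGER, EXTERNAL :: INIT3             ! Models-3 I/O API start-up
      LOGICAL, EXTERNAL :: SHUT3             ! Models-3 I/O API shut-down

C.... Start MPI and the I/O API
      CALL MPI_INIT( IERR )
      CALL MPI_COMM_SIZE( MPI_COMM_WORLD, NPROCS, IERR )
      CALL MPI_COMM_RANK( MPI_COMM_WORLD, MYPE,   IERR )
      LOGDEV = INIT3()

C.... Domain decomposition.  The released code uses the CMAQ routines
C.... (GRID_CONF.F, HGRD_DEFN.F, subhdomain.F) to build a 2-D
C.... column-by-row decomposition; the row strips below are only a
C.... stand-in so the sketch is self-contained.
      STRTCOL = 1
      ENDCOL  = GL_NCOLS
      STRTROW = MYPE * GL_NROWS / NPROCS + 1
      ENDROW  = ( MYPE + 1 ) * GL_NROWS / NPROCS

C.... Subdomain emission calculation (EMPROC is now a subroutine;
C.... its actual argument list differs from this placeholder).
      CALL EMPROC( MYPE, STRTCOL, ENDCOL, STRTROW, ENDROW )

C.... Shut down the I/O API and MPI
      IF ( .NOT. SHUT3() ) WRITE( LOGDEV, * ) 'I/O API shutdown error'
      CALL MPI_FINALIZE( IERR )

      END PROGRAM DRIVER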

Using the Models-3 I/O API, the parallel version makes extensive use of the INTERPX function in place of the original READ3 calls, so that data are mapped directly onto the local subdomain. In some cases INTERPX cannot be used and READ3 must be retained; in those cases the transformation of the input variables from the global to the local domain is performed explicitly, using the mapping relationship between the global and local domains.
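The two read patterns can be sketched as follows. INTERPX, READ3, and M3EXIT are standard Models-3 I/O API routines; the file and variable names ('METCRO3D', 'TA') and the window and index variables (JDATE, JTIME, STRTCOL, ENDCOL, STRTROW, ENDROW, NCOLS_LOC, NROWS_LOC, and the local and global buffers TA_LOC, TA_GLB) are example placeholders assumed to be set up by the domain decomposition; this is a schematic fragment, not the actual EMPROC code.

C.... Windowed read: INTERPX returns only this subdomain, interpolated
C.... in time to JDATE:JTIME, so no explicit remapping is needed.
      IF ( .NOT. INTERPX( 'METCRO3D', 'TA', 'EMPROC',
     &                    STRTCOL, ENDCOL, STRTROW, ENDROW, 1, 1,
     &                    JDATE, JTIME, TA_LOC ) ) THEN
         CALL M3EXIT( 'EMPROC', JDATE, JTIME,
     &                'Could not read TA from METCRO3D', 2 )
      END IF

C.... Fallback when INTERPX cannot be used: READ3 returns the full
C.... (global) layer, and the local window is extracted explicitly
C.... with the global-to-local index mapping (STRTCOL, STRTROW are the
C.... global indices of the subdomain's first column and row).
      IF ( .NOT. READ3( 'METCRO3D', 'TA', 1, JDATE, JTIME,
     &                  TA_GLB ) ) THEN
         CALL M3EXIT( 'EMPROC', JDATE, JTIME,
     &                'Could not read TA from METCRO3D', 2 )
      END IF
      DO R = 1, NROWS_LOC
         DO C = 1, NCOLS_LOC
            TA_LOC( C, R ) = TA_GLB( STRTCOL + C - 1, STRTROW + R - 1 )
         END DO
      END DO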

2. Benchmark Performance of the Code

Benchmarks of the parallel code were carried out on the Aeolus cluster, a high-performance Linux system in the Voiland College of Engineering and Architecture at Washington State University, and are reported in Table 1. The table first shows the wall-clock time for the serial code running on one processor, followed by the times for the parallel code running on 2 to 16 processors. The times shown in Table 1 are averages over multiple runs.

Table 1. Number of processors used and the average time taken for a 24-hour EMPROC simulation on a 285 x 258 grid.

No. of Processors        Average Time to Complete the Run (minutes)
        1                                  55.2
        2                                  29.9
        4                                  13.9
        8                                   7.5
       16                                   4.3
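For reference, the parallel efficiency implied by Table 1 follows from the standard definition, where T(p) is the wall-clock time on p processors:

      E(p)  = T(1) / ( p * T(p) )
      E(8)  = 55.2 / ( 8 * 7.5 )   = 0.92
      E(16) = 55.2 / ( 16 * 4.3 )  = 0.80

These values correspond to the 80 to 90% scaling efficiency cited in the Conclusions.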

3. Input and Output Files

In addition to the input files required by the serial code, the parallel version of the MEGAN code requires three additional input files for the domain setup: GRIDDESC, GRIDCRO2D, and METCRO3D, all provided by the MCIP meteorological preprocessor.

4. Linux Scripts and Makefile

The script files needed for setting up the environment, the date and time, and the startup file paths are qsub_emproc_parallel.csh, run_emproc_parallel.csh, and setcase.csh; these scripts are slight modifications of the original (serial-version) run_emproc.csh and setcase.csh. The Makefile for compiling the code uses the same compiler options and libraries as CMAQ. The compiler used is OPENMPI-PGI/1.6. The entire parallel MEGAN package is available for download at: (To be determined?)

5. Conclusions

In conclusion, the parallel code shows a scaling efficiency of approximately 80 to 90% for up to 16 processors. The significance of parallelizing MEGAN is that it reduces the MEGAN calculation step from nearly an hour to less than 10 minutes when 8 or 16 processors are used. This makes MEGAN much more viable as a component of an air-quality forecast system.
