This is the README file for the RAMSES application benchmark,
distributed with the DEISA Benchmark Suite:
http://www.deisa.eu/science/benchmarking/

Last modified by the DEISA Benchmark Team on 2008-08-25.

-------------
RAMSES readme
-------------

Contents
--------

1 General description
2 Code structure
3 Parallelization
4 Building
5 Running the code
6 Input and output data


1 General description
=====================

The RAMSES package is intended to be a versatile platform for developing applications that use Adaptive Mesh Refinement (AMR) for computational astrophysics. The current implementation solves the Euler equations in the presence of self-gravity and cooling, treated as additional source terms in the momentum and energy equations.

RAMSES contains various algorithms designed for:

- Cartesian AMR grids in 1D, 2D or 3D
- Solving the Poisson equation with a Multi-grid and a Conjugate Gradient solver
- Using various Riemann solvers (Lax-Friedrichs, HLLC, exact) for adiabatic gas dynamics
- Computing the dynamics of collisionless particles (dark matter and stars) using a PM (particle-mesh) code
- Computing the cooling and heating of a metal-rich plasma due to atomic physics processes and a homogeneous UV background (Haardt and Madau model)
- Implementing a model of star formation based on a standard Schmidt law with the traditional set of parameters
- Implementing a model of supernova-driven winds based on a local Sedov blast wave solution


2 Code structure
================

The basic RAMSES algorithm is the following:

1) Read input/restart files for each module and initialize data
2) Temporal loop
   a) Refine mesh
   b) Load balance every N timesteps (if AMR is activated)
   c) Recursive calls to update the solution on the different mesh levels
   d) Dump restart or output files if needed


3 Parallelization
=================

Parallelization is done by domain decomposition. RAMSES tries to balance the cells equally between the processes. With AMR, it is necessary to perform load balancing from time to time.
This is done using a Peano-Hilbert space-filling curve every N timesteps. Other load balancing techniques are available, but only this one is used in the benchmark.


4 Building
==========

To build RAMSES within JuBE for a new architecture (NEWARCH), the following steps usually have to be carried out:

1) Under the platform directory, create a new NEWARCH directory containing a batch job skeleton (see the other architectures for examples).

2) Update the platform/platform.xml file:
   Create a new section where NEWARCH is the same as in the directory ~/platform. Set the values for the compilers (names, default flags) and all needed library paths.

3) For each application, create a new top-level XML file for this new architecture. Most often this is done by copying one of the files already available, for example:

      cp bench-jump.xml bench-NEWARCH.xml

   Edit bench-NEWARCH.xml and correct the values according to NEWARCH. Change the values of $threadspertask, $taskspernode and $nodes according to the characteristics of NEWARCH.

4) Edit compile.xml:
   Create a new section where NEWARCH is the same as in the directory DEISA_BENCH/platform. Set the values in the new compile section to those appropriate for the new architecture. Pay particular attention to the values:
   ARFLAGS, FFLAGS, F90FLAGS, CFLAGS, CXXFLAGS, LDFLAGS

5) Run the compile step with JuBE:
   Edit bench-NEWARCH.xml and make sure that you have set something like this:

   This will build the RAMSES binary for 3 runs on 32 and 64 MPI tasks (threadspertask="1", taskspernode="4" and nodes="8,16").

   If the compile process fails, go to the directory where JuBE ran the compile command (tmp/.../src) and run the gmake command manually. Analyze the error and try to fix it by modifying the file Makefile.defs. Once you have a working Makefile.defs file, report the correct configuration values back into the compile.xml file.
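
The MPI task counts quoted in step 5 follow from multiplying the tasks per node by each node count. The short Python sketch below only illustrates that arithmetic; the function name mpi_task_counts is hypothetical, and how JuBE itself expands these parameters is not described in this README.

```python
def mpi_task_counts(taskspernode, nodes):
    """Return one MPI task count per entry of a comma-separated node list.

    Assumption: total tasks = taskspernode * nodes, as implied by the
    example values in step 5 (taskspernode="4", nodes="8,16").
    """
    per_node = int(taskspernode)
    return [per_node * int(n) for n in nodes.split(",")]


print(mpi_task_counts("4", "8,16"))  # [32, 64]
```

With threadspertask="1" each task runs a single thread, so these numbers are also the number of cores requested.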

5 Running the code
==================

For each benchmark, JuBE automatically generates a job script and the directory where the job is run and where RAMSES writes its output files. The file execute.xml describes how the job script is set up for the different architectures. Inputs for the different benchmark cases are taken from the directory "input", as described in the prepare.xml file.

To select a given benchmark case, edit the file:

    bench--.xml

and set active="1" in the benchmark tag you are interested in.

To run the benchmarks within JuBE, simply execute:

    ../../bench/jube bench--.xml

To run the benchmarks manually:

- Create a run directory on a filesystem seen by all compute nodes.
- Copy the binary and the input namelist file into this run directory.
- Create a job script suitable for your queuing system containing the following command. For example, to run RAMSES on 64 processors:

      ./ramses3d > run.out

  The command must be prefixed with the program used to load the executable on the remote nodes, for example "mpirun -n 64".


6 Input and output data
=======================

6.1 Input:
----------

The input file is input/sedov3d.nml.in. This file is preprocessed by JuBE, which modifies the following two values:

  #TIMESTEPS#: the number of timesteps
  #LEVELS#: the number of grid levels (if increased by 1, the size of the problem is multiplied by 8)

6.2 Timings:
------------

Timing information can be found:

- on standard output (or the output of the batch job), issued by the time command,
- in the run.out file (timing per MPI task), issued by the binary.

6.3 Validation of the runs:
---------------------------

To validate a run, you only have to check the run.out file it produces (this is the standard output of RAMSES redirected to the run.out file).
At each timestep, the following information is written to stdout:

 Mesh structure
 Level 1 has       1 grids (      0,      1,      0,)
 Level 2 has       8 grids (      1,      1,      1,)
 Level 3 has      64 grids (      8,      8,      8,)
 Level 4 has     512 grids (     64,     64,     64,)
 Level 5 has    4096 grids (    512,    512,    512,)
 Level 6 has   32768 grids (   4096,   4096,   4096,)
 Level 7 has  262144 grids (  32768,  32768,  32768,)
 Level 8 has 2097152 grids ( 262144, 262144, 262144,)
 Main step= 5 mcons= 0.00E+00 econs= 7.42E-14 epot= 0.00E+00 ekin= 1.25E-01
 Fine step= 5 t= 9.08971E-07 dt= 2.126E-07 a= 1.000E+00 mem=41.7%

The 'Level' lines describe the distribution of the cells over the different processes (min, max and average). They will therefore change for a given test case if you run with a different number of processes.

To validate the run, check that the values of mcons, econs, epot, ekin, t, dt and a at a given timestep are the same between runs of the same test case (reference files are available in the reference directory). The difference between the reference and the run values must be less than 1e-8.

6.4 Other files:
----------------

Currently, this benchmark does not write other files (this can be changed in the input parameters).
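
The comparison described in section 6.3 can be automated. The following Python sketch is not part of the benchmark suite; the names parse_steps and validate are hypothetical. It assumes the "Main step=" and "Fine step=" line layout shown in 6.3, and it interprets "difference" as the absolute difference between run and reference values.

```python
import re

# Quantities that section 6.3 asks to compare against the reference.
FIELDS = ("mcons", "econs", "epot", "ekin", "t", "dt", "a")

# Matches "name= value" pairs such as "econs= 7.42E-14" or "step= 5".
PAIR = re.compile(r"(\w+)=\s*([-+]?\d[\d.]*(?:[Ee][-+]?\d+)?)")


def parse_steps(text):
    """Return {step: {field: value}} from 'Main step=' / 'Fine step=' lines."""
    steps = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(("Main step=", "Fine step=")):
            continue  # skip 'Level' lines and anything else
        pairs = dict(PAIR.findall(line))
        record = steps.setdefault(int(pairs.pop("step")), {})
        for name in FIELDS:
            if name in pairs:
                record[name] = float(pairs[name])
    return steps


def validate(run_text, ref_text, tol=1e-8):
    """Return a list of (step, field, difference) for values outside tol."""
    run, ref = parse_steps(run_text), parse_steps(ref_text)
    common = sorted(set(run) & set(ref))
    if not common:
        raise ValueError("no timestep present in both outputs")
    failures = []
    for step in common:
        for name in FIELDS:
            if name in run[step] and name in ref[step]:
                diff = abs(run[step][name] - ref[step][name])
                if not diff < tol:
                    failures.append((step, name, diff))
    return failures


sample = """\
 Main step= 5 mcons= 0.00E+00 econs= 7.42E-14 epot= 0.00E+00 ekin= 1.25E-01
 Fine step= 5 t= 9.08971E-07 dt= 2.126E-07 a= 1.000E+00 mem=41.7%
"""
print(validate(sample, sample))  # []
```

In practice run_text would be the contents of run.out and ref_text the contents of the matching file from the reference directory. An empty list means that every compared value agrees within the tolerance. If a step number appears more than once in the output, this sketch keeps only the last occurrence.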