This is the README file for the SU3_AHiggs application benchmark,
distributed with the DEISA Benchmark Suite:
http://www.deisa.eu/science/benchmarking/

Last modified by the DEISA Benchmark Team on 2008-08-22.

-----------------
SU3_AHiggs readme
-----------------

Contents
--------

1 General description
2 Code structure
3 Parallelisation
4 Building
5 Execution
6 Verification
7 Input data
8 Output data


1 General description
=====================

SU3_AHiggs is a lattice quantum chromodynamics (QCD) code intended for
computing the conditions of the Early Universe. Instead of full QCD,
the code applies an effective field theory, which is valid at high
temperatures. In the effective theory, the lattice is 3D. For this
reason, SU3_AHiggs stresses different parts of the architecture than
conventional QCD applications, which use 4D lattices.

SU3_AHiggs has its roots in the MILC code, but it has been heavily
rewritten by Prof. Kari Rummukainen (University of Oulu, Finland). The
code is written solely in C and uses MPI for communication. No
external libraries are needed to run the program.

The directory SU3/src contains several closely related QCD programs:

  * SU3_4D
  * SU3_AHiggs
  * SU3_Gauge

In the DEISA benchmarks, only the code SU3_AHiggs is used.

If you find errors in any of the files in the SU3 package, please
contact benchmarking@deisa.eu.


2 Code structure
================

In SU3_AHiggs, spacetime is discretised and replaced with a 3D cubic
lattice. Every lattice vertex contains a 3 x 3 traceless Hermitian
matrix. From each vertex, in turn, there are six edges (links) to the
nearest-neighbour vertices. Each edge carries a 3 x 3 unitary matrix.
The aim of the SU3_AHiggs computation is to generate lattice
configurations from the canonical distribution, which is the
statistical equilibrium state of the system. The program uses
heat-bath and over-relaxation algorithms to update lattice vertices
and links. The computation starts from a random initial configuration.
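
The lattice layout described above can be sketched in C as follows.
This is only an illustrative sketch, not code from the SU3 package: the
names lattice_dims, site_index, neighbour_index and six_neighbours are
hypothetical, and both the periodic boundary conditions and the
x-fastest site ordering are assumptions that this README does not
state.

```c
#include <stddef.h>

/* Number of lattice dimensions in the effective theory (3D). */
enum { NDIM = 3 };

/* Hypothetical container for the lattice extents (nx, ny, nz). */
typedef struct {
    int n[NDIM];
} lattice_dims;

/* Map coordinates c = (x, y, z) to a linear site index.
   Assumption: x runs fastest, then y, then z. */
size_t site_index(const lattice_dims *d, const int c[NDIM])
{
    return (size_t)c[0]
         + (size_t)d->n[0] * ((size_t)c[1] + (size_t)d->n[1] * (size_t)c[2]);
}

/* Linear index of the neighbour of site c in direction dir (0, 1, 2),
   one step forward (sign = +1) or backward (sign = -1).
   Assumption: periodic boundaries, so coordinates wrap around. */
size_t neighbour_index(const lattice_dims *d, const int c[NDIM],
                       int dir, int sign)
{
    int nc[NDIM] = { c[0], c[1], c[2] };
    nc[dir] = (c[dir] + sign + d->n[dir]) % d->n[dir];
    return site_index(d, nc);
}

/* Fill out[] with the six nearest neighbours of site c,
   in the order +x, -x, +y, -y, +z, -z. */
void six_neighbours(const lattice_dims *d, const int c[NDIM],
                    size_t out[2 * NDIM])
{
    for (int dir = 0; dir < NDIM; dir++) {
        out[2 * dir]     = neighbour_index(d, c, dir, +1);
        out[2 * dir + 1] = neighbour_index(d, c, dir, -1);
    }
}
```

For example, on a 4 x 4 x 4 lattice the site (0, 3, 2) has linear index
44, and its neighbour in the +y direction wraps around to (0, 0, 2),
which has index 32.
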

The main function of SU3_AHiggs is in the file su3h_n/control.c. After
the initial setup, main calls the function runthis, which in turn
calls other functions in the SU3 package. If the dataset is
sufficiently large, most of the computing time is spent on lattice
updates (functions updategauge and updatehiggs in the files
su3h_n/updategauge.c and su3h_n/updatehiggs.c). If the dataset is too
small, the computation becomes communication bound instead. MPI
routines are not called directly but through customised communication
functions defined in generic/comdefs.h and generic/com_mpi.c.


3 Parallelisation
=================

SU3_AHiggs uses a 3D domain decomposition for parallelisation. Each
MPI task communicates with its six neighbouring tasks only.

The communication routines are defined in the files generic/comdefs.h
and generic/com_mpi.c. The most important routines are:

  * start_get()

    This function starts the asynchronous sends and receives required
    to gather neighbouring lattice vertices and links. The call graph
    looks like this:

      start_get() --> start_gather() --> MPI_Irecv(), MPI_Isend()

  * wait_get()

    This function waits for the receives to finish, ensuring that the
    data has actually arrived. The call graph looks like this:

      wait_get() --> wait_gather() --> MPI_Wait()

With a 32^3 lattice, the program performs well up to 256 processes.
With a 256^3 lattice, the speedup is almost linear in the number of
processes. The largest process count tested so far is 2048.

The lattice size and the number of iterations are controlled by four
user-adjustable parameters.


4 Building
==========

To build SU3_AHiggs with the JUBE tool on a new architecture
(NEWARCH), do the following steps:

1) Create a new top-level XML file for the new architecture
   (bench-NEWARCH.xml).

   For this task, you can use the existing files as a starting point:
   bench-Cray-XT4-Louhi.xml, bench-IBM-SP4-Jump.xml, and
   bench-SGI-Altix-HLRB2.xml. Normally you only have to change the
   values of $nodes and $taskspernode.

2) Edit compile.xml:

   Create a new section for NEWARCH, where NEWARCH is the same name as
   in the file DEISA_BENCH/platform/platform.xml. Substitute the
   values in the new compile section with those appropriate for the
   new architecture. Normally you need to change #CFLAGS# and
   #LDFLAGS#. You may also want to change #CC# and #MPI_CC#.

3) Run the compile step within the benchmark "test":

   Edit bench-NEWARCH.xml and make sure that you have:

     ...

   Then run:

     perl ../../bench/jube -debug bench-NEWARCH.xml

If the compile step fails, go to the directory where JUBE has run the
compile step:

  tmp/SU3_NEWARCH_test_i000001/.../src

Then try to run the command make manually. Analyse the error and try
to fix it by modifying the file Makefile.defs. After the problem is
solved, edit the file compile.xml accordingly. If you cannot solve the
problem just by editing compile.xml, please contact
benchmarking@deisa.eu.


5 Execution
===========

To run SU3_AHiggs with the JUBE tool, do the following steps:

1) Before running the benchmarks, you need an execute script template,
   such as:

     DEISA_BENCH/platform/Cray-XT4-Louhi/cray_qsub.job.in

2) Edit execute.xml:

   Create a new section for NEWARCH, and match the values in the new
   section with the execute script template.

3) Run a benchmark:

   Select a benchmark by setting active="1" in the file
   bench-NEWARCH.xml. Then run:

     perl ../../bench/jube -submit bench-NEWARCH.xml

To run SU3_AHiggs manually (without JUBE), do the following steps:

1) Copy the SU3_AHiggs executable to a directory that is accessible
   from the compute nodes. The name of the SU3_AHiggs executable is:

     src/su3h_n/su3_ahiggs

2) Copy the input files beta, parameter, and status to the same
   directory.
   In the directory input, there are several sets of input files
   available:

     input/lat_256/*   (256^3 lattice, 100 iterations)
     input/lat_32/*    (32^3 lattice, 10000 iterations)
     input/test/*      (32^3 lattice, 100 iterations)

3) Start the program with an MPI launcher available on your system,
   for example:

     aprun -n 8 ./su3_ahiggs

The test benchmark takes approximately 10 seconds with 8 processor
cores. Other benchmarks run longer: approximately 1 minute with 1024
cores.

Important: The number of tasks in su3_ahiggs must be a power of 2.
Otherwise the program cannot lay out the lattice, and the execution
stops.


6 Verification
==============

JUBE verifies benchmark results automatically as part of the result
analysis step.

In SU3_AHiggs, the verification cannot be done by directly comparing
benchmark results with reference results. The reason for this is that
the results are very sensitive to compiler optimisations and to the
number of MPI tasks. As a consequence, results can appear very
different from the reference results and still be correct, as long as
the two are statistically the same.

Therefore SU3_AHiggs uses a statistical comparison test (Student's
t-test) to verify benchmark results. The significance level is chosen
to be 1e-4, so correct results are rejected about once in every 10000
runs. The first 50 iterations are not included in the comparison.

The reference results are found at:

  reference/lat_256/higgs.out   (256^3 lattice, 100 iterations)
  reference/lat_32/higgs.out    (32^3 lattice, 10000 iterations)
  reference/test/higgs.out      (32^3 lattice, 100 iterations)

These files contain the Higgs field at each iteration for a given
lattice size.

To verify benchmark results manually (without JUBE), do the following
steps:

1) Copy the executable src/aa/aa to the directory SU3/run.

2) Run the following command in the directory SU3:

     perl run/check_results_su3.pl output.xml stdout.log stderr.log \
          $RUNDIR reference/lat_256

   The environment variable $RUNDIR should point to the directory
   where SU3_AHiggs has been executed.

3) If the benchmark results are correct, the file output.xml includes
   the following lines:

   If not, the same lines look like this:


7 Input data
============

The input data for SU3_AHiggs consist of three short ASCII files
containing simulation parameters related to temperature, lattice size,
iterations, etc. For example, the files for the test benchmark look
like this:

  input/test/beta:

    betag 12
    x 0.06
    y 0.69025056

  input/test/parameters:

    nx 32
    ny 32
    nz 32
    micro steps 4
    n_measurement 1
    n_correlation 10000
    w_correlation 100000
    n_save -1000
    blocking levels 1
    level 0 1
    level 1 1

  input/test/status:

    restart 0
    n_iteration 100
    n_thermal 0
    seed 479817384
    run status
    iteration
    time: gauge
    time: higgs
    time: rest

It is easy to create new datasets by changing the lattice size
(variables nx, ny, and nz), the number of iterations (n_iteration),
and the seed for the random number generator (seed). The duration of a
simulation is roughly proportional to:

  nx * ny * nz * n_iteration

SU3_AHiggs currently has three datasets:

  test    32^3 lattice,  100 iterations
  small   32^3 lattice,  10000 iterations  (artificial dataset)
  large   256^3 lattice, 100 iterations    (real research dataset)

The test dataset is designed to help with porting to new
architectures. The small dataset, in turn, is designed for
benchmarking purposes. With it, benchmark timings depend strongly on
the interconnect speed.


8 Output data
=============

During the benchmarks, SU3_AHiggs writes its results to the following
files:

  correl
  measure
  status

Note that the file named status is both an input and an output file;
SU3_AHiggs modifies it during the computation.

The file measure is a binary file that contains simulation results at
each iteration.
Its contents can be read with the tool named aa, available in the
directory src/aa.

The benchmark timings are written to the standard output. JUBE reads
them automatically as part of the analysis step. To get the benchmark
timings manually, grep for "total time in seconds" in the standard
output.