This is the README file for the Fenfloss application benchmark,
distributed with the DEISA Benchmark Suite:
http://www.deisa.eu/science/benchmarking/

Last modified by the DEISA Benchmark Team on 2008-08-25.

---------------
Fenfloss readme
---------------

Contents
--------

1. General description
2. Code structure
3. Parallelization
4. Building
5. Running the code
6. Output data


1. General description
======================

Fenfloss is a code that uses the Finite Element Method (FEM) on
unstructured meshes to solve the incompressible Navier-Stokes equations.
It can compute laminar as well as turbulent flows and is mainly used to
simulate the flow through hydromechanical turbines. Fenfloss can handle
static and moving geometries and may be coupled with a structural
mechanics code to simulate Fluid Structure Interactions (FSI).

The code is developed at the Institute of Fluid Mechanics and Hydraulic
Machinery (IHS) at the University of Stuttgart. The homepage of this
project can be found at:
http://www.ihs.uni-stuttgart.de/forschung/projekte/fenfloss/index.en.shtml

The code is not publicly available, but a special version, intended only
for benchmarking with the DEISA Benchmark Suite, may be obtained from
HLRS. This version has reduced functionality and is streamlined for
benchmarking.


2. Code structure
=================

The basic building blocks of Fenfloss are the following:

a) Read input/restart (already distributed) files.
b) Initialize data.
c) Timestep loop for instationary (unsteady) flows.
d) Equation system solving iterations (linearization).
e) Assembly of FEM matrix.
f) Solving linear equation system.

Other building blocks, such as turbulence modelling and updates due to
moving geometries, may optionally be applied, but in any case most of
the calculation time is spent in parts (e) and (f). The linear equation
system is generally solved with a BiCGSTAB algorithm.

The input files must already be distributed in a preprocessing step.
Each process reads only its own data from those files.

For the DEISA benchmark, 10 steps to solve the nonlinear equation system
are used for a stationary problem. A simple geometry is generated by an
included preprocessing program (PAGI).


3. Parallelization
==================

Fenfloss is parallelized with MPI. Communication with large messages is
point to point; the linear equation solver, however, makes heavy use of
MPI_Allreduce calls with small messages.


4. Building
===========

To build Fenfloss within JuBE for a new architecture (NEWARCH), the
following issues usually have to be addressed:

1) Create a new top-level XML file for the new architecture. Most often
   this is done by copying one of the files already available, for
   example:

     cp bench-IBM-SP4-Jump.xml bench-NEWARCH.xml

   Edit bench-NEWARCH.xml and correct the values accordingly (following
   the example above, substitute jump with NEWARCH).

   In the Change the values for $threadspertask, $taskspernode, $nodes
   according to the characteristics of NEWARCH.

2) Edit compile.xml and create a new section, where NEWARCH is the same
   name as used in the directory DEISA_BENCH/platform.

   Substitute the values in the new compile section with those
   appropriate for the new architecture. Pay particular attention to the
   values:

     #MPI_F90#
     #F90#
     #FFLAGS#
     #LDFLAGS#
     #STRIPLEN#

   Please note that PAGI is a program on its own and is compiled as a
   sequential program. It needs to be executed on the machine (generally
   on the frontend) before the actual execution of Fenfloss to generate
   the input files. Pay attention to the path specified in #EXECNAME#
   for the pagi part. It needs to be accessible by the preparation step.

   #STRIPLEN# is a tuning parameter that adapts the strip-mining length
   of inner loops. You will probably need to determine a good setting
   for your machine. As a reference, a well-performing setting on JUMP
   is 24.

3) Run the compile step within the quick_test_1pe:
   edit bench-NEWARCH.xml and be sure that ...
   If the compile step fails, go to the directory where JuBE ran the
   compile command (tmp/.../src) and try to run make manually within
   that directory. Analyze the error and try to fix it by modifying the
   Makefile.

   Note that most problems at compile time come from #FFLAGS#, and a few
   others from values like #MPI_F90#. To guess the correct values to be
   used, you can

   Once you have a working Makefile, report the correct configuration
   values back into the compile.xml file.


5. Running the code
===================

For each benchmark, JuBE automatically generates a job script and the
directory in which the job is run and in which Fenfloss writes its
output files. The file execute.xml describes how the job script is set
up for the different architectures. Inputs for the different benchmark
cases are taken from the directory "input", as described in the
prepare.xml file.

The #MATFORM# parameter in flow.stf.in is another tuning parameter; it
selects the representation of the matrix in the code. It can be "VEK",
in which case Jagged Diagonal Storage is chosen and the loops are
strip-mined with the #STRIPLEN# parameter given in the compile step.
The other option is "CRS", which stores the matrix in Compressed Row
Storage format; in this case no strip mining is done. The best selection
depends on your architecture.

To select a given benchmark case, edit the file:

  bench--.xml

and set active="1" in the benchmark tag you are interested in.

To run the benchmarks within JuBE, simply execute:

  ../../bench/jube bench--.xml

To run the benchmarks manually:

Create a run directory at your convenience on a filesystem visible to
all compute nodes and copy the executables fen and pagi.x from the
compilation step into this directory. Copy cavity.in and flow.stf.in to
that directory and modify the #...# values to your liking. The cavity
parameter file is used as input for the pagi preprocessing program and
needs to be renamed to pagi.stf. Run the pagi executable (generated by
the compile step)
to create the input files. The other parameter file is used to control
Fenfloss itself and needs to be renamed to flow.stf. Create a job script
suitable for your queuing system containing the following command:

  ./fen > fenfloss.out

The command has to be preceded by the program used to load the
executable on the remote nodes, for example "mpirun".


6. Output data
==============

The binary files generated as output by Fenfloss are themselves not of
interest for the benchmark. The relevant timing information can be found
at the end of the generated TIME_ file:

  *************************** SUMMARY ****************************
  *                                                              *
  * Maximal time spent on complete program execution:            *
  **                    7.431 seconds                           **
  *                                                              *
  *                                                              *
  * Maximal time spent by a process on iterations (excluding     *
  * initialization and finalization, but including result        *
  * management with file output of the results):                 *
  **                    6.890 seconds                           **
  *                                                              *
  *                                                              *
  * Maximal time spent on the result management (which includes  *
  * file output):                                                *
  **                    0.585 seconds                           **
  *                                                              *
  *                                                              *
  * Relevant computational time:                                 *
  **                    6.305 seconds                           **
  *                                                              *
  *                                                              *
  ****************************************************************
  ****************************************************************

The timings are taken with MPI_WTIME within the program.

The timing considered in the DEISA benchmark is the Iterations-Time,
which is written to standard output and looks like:

  **************************************************
  Iterations-Time:     25.797 seconds.
  **************************************************

For verification, the changes over the iterations are used instead of
the final binary simulation output itself. Fenfloss writes these changes
to the file ITER_MIT.

Other files:
------------

The generator pagi writes an informational file about the distribution
called pagi.info. It contains an estimate of the maximal expected
speedup for the given number of processors.

-------------------------------------------------
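
The Iterations-Time line described in section 6 can be picked out of the
captured standard output with a few lines of script. The following is
only a minimal Python sketch and not part of the benchmark suite: the
function name iterations_time and the sample text are assumptions made
for illustration, and the pattern assumes that the line has exactly the
form shown in section 6.

```python
import re

# Matches the line Fenfloss writes to standard output, for example
# "Iterations-Time:     25.797 seconds." (format taken from section 6).
ITER_TIME = re.compile(r"Iterations-Time:\s*([0-9]+(?:\.[0-9]+)?)\s*seconds")


def iterations_time(text):
    """Return the last Iterations-Time found in text as a float, or None."""
    matches = ITER_TIME.findall(text)
    return float(matches[-1]) if matches else None


# Illustrative sample mimicking the end of the standard output.
sample = """\
 **************************************************
 Iterations-Time:     25.797 seconds.
 **************************************************
"""

print(iterations_time(sample))
```

For a manual run, the text would be read from the redirected output
file (fenfloss.out in the command shown in section 5) instead of the
sample string.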