This is the README file for the CPMD application benchmark, distributed with
the DEISA Benchmark Suite: http://www.deisa.eu/science/benchmarking/

Last modified by the DEISA Benchmark Team on 2008-08-25.

-----------
CPMD readme
-----------

Contents
--------

1. General description
2. Code structure
3. Parallelization
4. Building
5. Running the code
6. Output data

1. General description
======================

The CPMD code is a plane wave/pseudopotential implementation of Density
Functional Theory, particularly designed for ab-initio molecular dynamics.
Its first version was developed by Jurg Hutter at the IBM Zurich Research
Laboratory, starting from the original Car-Parrinello codes. Over the years
many people from diverse organizations have contributed to the development
of the code and of its pseudopotential library: Michele Parrinello, Jurg
Hutter, D. Marx, P. Focher, M. Tuckerman, W. Andreoni, A. Curioni, E. Fois,
U. Roetlisberger, P. Giannozzi, T. Deutsch, A. Alavi, D. Sebastiani,
A. Laio, J. VandeVondele, A. Seitsonen, S. Billeter and others.

The current version, 3.13.2, is copyrighted jointly by IBM Corp. and by the
Max Planck Institute, Stuttgart. It is distributed free of charge to
non-profit organizations. For-profit organizations interested in the code
should contact the CPMD developers (see www.cpmd.org).

CPMD runs on many different computer architectures and is well parallelized
(MPI and mixed MPI/SMP). Complete documentation can be found at the code
site: www.cpmd.org

2. Code structure
=================

The basic building blocks of CPMD are the following:

a) Read input/restart files.
b) Initialize and distribute data.
c) Main loop for Molecular Dynamics.
d) Compute averages and other observables every N steps, where N is given
   in input.
e) Dump the restart file and output every M steps, where M is given in
   input.

Several different kinds of phase space sampling and energy minimization
techniques are implemented in the main loop.
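The following sketch illustrates how steps c) to e) are scheduled. It is a schematic only. Each pass through the loop stands for one iteration of the main loop. Observables are computed every N steps and the restart file is dumped every M steps. The function name md_schedule is hypothetical. CPMD itself is written in Fortran and has no such function.

```sh
# md_schedule NSTEPS N M  (hypothetical helper, schematic of steps c-e)
# Prints one line per main-loop step, tagging the periodic actions:
#   "avg"  every N steps (averages and other observables, step d)
#   "dump" every M steps (restart file and output, step e)
md_schedule() {
    nsteps=$1 n=$2 m=$3
    step=1
    while [ "$step" -le "$nsteps" ]; do
        actions="step"   # step c: one iteration of the main loop
        if [ $((step % n)) -eq 0 ]; then actions="$actions avg"; fi
        if [ $((step % m)) -eq 0 ]; then actions="$actions dump"; fi
        echo "$step: $actions"
        step=$((step + 1))
    done
}
```

Consider `md_schedule 10 5 10`, where N=5 and M=10 are arbitrary example values. It lists the ten steps of a benchmark-length run, tags steps 5 and 10 with "avg", and tags only step 10 with "dump".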
Regardless of the technique, at each step the electronic wave functions are
transformed back and forth between real and reciprocal space through an
ad-hoc 3D FFT. Depending on the algorithm used, the electronic wave
functions are also orthogonalized at each step. Likewise, depending on the
kind of dynamics selected in the input, the ionic positions, the electrons
and the system geometry are propagated at each step to integrate Newton's
equations of motion.

The DEISA benchmark uses 10 steps of steepest descent energy minimization of
a water box. Boxes of different sizes, containing different numbers of
molecules, provide benchmarks of different computational weight.

3. Parallelization
==================

CPMD is parallelized using a mixed MPI/OpenMP paradigm. The reciprocal space
basis set (plane waves) and the real space vectors are distributed across
all MPI tasks. Within each MPI task, the computation is distributed among
OpenMP threads.

If required, CPMD can group MPI tasks into so-called task groups, which are
used to scale up the 3D FFT. The 3D FFT transforms the electronic wave
functions between reciprocal and real space. This operation would otherwise
stop scaling once the number of processors exceeds the number of FFT
planes. To avoid that, data can be redistributed to task groups so that each
group processes several wave functions at the same time.

For linear algebra operations, the Hamiltonian and the other matrices used
in the iterative diagonalization are distributed block-wise across the
ortho group. The ortho group is a subgroup of the pool processors, organized
in a square 2D grid.

Images and pools are loosely coupled: processors communicate between
different images and pools only once in a while. Processors within each
pool, by contrast, are tightly coupled and communicate significantly.

The command line arguments -nimage, -npools and -ntg control the number of
images, pools and task groups. The code selects the dimension of the ortho
group automatically.
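The README does not say how the ortho group dimension is chosen. As an illustration only, assume two things: the MPI tasks are split evenly among the pools, and the ortho group is the largest square grid that fits inside one pool. Under those assumptions its size can be computed as below. ortho_grid is a hypothetical helper, and CPMD's actual selection rule may differ.

```sh
# ortho_grid NTASKS NPOOLS  (hypothetical helper)
# Prints "side tasks" for the largest square grid that fits in one pool.
# Assumptions: tasks are divided evenly among pools, and the ortho group is
# the largest square that fits; the real CPMD rule may differ.
ortho_grid() {
    ntasks=$1 npools=$2
    per_pool=$((ntasks / npools))
    if [ "$per_pool" -lt 1 ]; then
        echo "need at least one MPI task per pool" >&2
        return 1
    fi
    # integer square root: stop at the first side whose square exceeds per_pool
    next=2
    while [ $((next * next)) -le "$per_pool" ]; do
        next=$((next + 1))
    done
    side=$((next - 1))
    echo "$side $((side * side))"
}
```

For example, `ortho_grid 64 2` prints `5 25`. Each of the two pools holds 32 tasks, and under the stated assumption a 5 x 5 grid of them would form the ortho group.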
For the DEISA benchmark, unless otherwise specified, one image, two pools
and one task group are used.

4. Building
===========

This benchmark is designed to work with version 3.13.2 of CPMD.

Building CPMD within JuBE for a new architecture (NEWARCH) usually involves
the following steps:

1) Create a new top-level XML file for the new architecture. Most often
   this is done by copying one of the files already available, for example:

     cp bench-jump.xml bench-NEWARCH.xml

   Edit bench-NEWARCH.xml and correct the values accordingly (following the
   example above, substitute jump with NEWARCH). In that file, set the
   values of $threadspertask, $taskspernode and $nodes to match the
   characteristics of NEWARCH.

2) Edit compile.xml and create a new section where NEWARCH is the same as
   in the directory DEISA_BENCH/platform. Replace the values in the new
   compile section with those appropriate for the new architecture. Pay
   particular attention to the values:

     #CPP# #CPPFLAGS# #LAPACK_LIB# #LAPACK_DIR#

3) Run the compile step within the quick_test_1pe: edit bench-NEWARCH.xml
   and be sure that ...

If the compilation fails, go to the directory where JuBE ran the compile
command (tmp/.../src/SOURCE) and try running make manually within the
directory SOURCE. Analyze the error and try to fix it by modifying the file
Makefile.defs.

Most compile-time problems come from #CPPFLAGS#, and a few others from
#CPP#, #LAPACK_LIB# and #LAPACK_DIR#. To find the correct values, browse the
directory SOURCE/INSTALL, or run the CPMD configuration script mkconfig.sh.

Once you have a working Makefile.defs file, copy the correct configuration
values back into the compile.xml file.

5. Running the code
===================

For each benchmark, JuBE automatically generates a job script and the
directory where the job runs and where CPMD writes its output files.
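The values $nodes, $taskspernode and $threadspertask set in the bench-*.xml file (section 4) determine the hybrid MPI/OpenMP layout of the generated job. This sketch assumes what the names suggest: each node runs $taskspernode MPI tasks, and each task runs $threadspertask OpenMP threads. Under that assumption the totals follow from two multiplications. job_layout is a hypothetical helper, not part of JuBE.

```sh
# job_layout NODES TASKSPERNODE THREADSPERTASK  (hypothetical helper)
# Assumption: every node runs TASKSPERNODE MPI tasks and every MPI task runs
# THREADSPERTASK OpenMP threads, as the bench-*.xml variable names suggest.
job_layout() {
    nodes=$1 tpn=$2 tpt=$3
    tasks=$((nodes * tpn))    # total MPI tasks to request from the launcher
    cores=$((tasks * tpt))    # total cores used (tasks x threads per task)
    echo "mpi_tasks=$tasks omp_threads=$tpt cores=$cores"
}
```

For example, `job_layout 4 8 2` prints `mpi_tasks=32 omp_threads=2 cores=64`. In a hand-written job script, the thread count would typically be exported as OMP_NUM_THREADS before calling the launcher.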
The file execute.xml describes how the job script is set up for different
architectures. Inputs for the different benchmark cases are taken from the
directory "input", as described in the file prepare.xml.

To select a given benchmark case, edit the file:

  bench--.xml

and set active="1" in the benchmark tag you are interested in. Then, to run
the benchmarks within JuBE, simply execute:

  ../../bench/jube bench--.xml

To run the benchmarks manually, create a run directory at your convenience
on a filesystem visible to all compute nodes, with at least 10 GB of free
space. Copy the files H_SPRIK, O_SPRIK and input.in into the run directory.
Take them from the directory input/h2o_128mol, or from one of the other
directories h2o_1mol, h2o_256mol, h2o_384mol and h2o_512mol, which
correspond to different benchmark sizes. Create a job script suitable for
your queuing system containing the following command:

  ./cpmd.x input.in > cpmd.out

where is the program used to load the executable on the remote nodes, for
example "mpirun". For an explanation of the code input and output files,
point your browser to http://www.cpmd.org/

6. Output data
==============

The inputs for the different simulation sizes used in this benchmark are
stored in the following directories:

  input/h2o_1mol
  input/h2o_128mol
  input/h2o_256mol
  input/h2o_384mol
  input/h2o_512mol

Each directory contains three files:

  i   - input.in   CPMD keywords and system geometry
  ii  - H_SPRIK    Hydrogen pseudopotentials
  iii - O_SPRIK    Oxygen pseudopotentials

The inputs represent a simulation of a real system, water in a box, at
different system sizes. The output of interest for the benchmarks is printed
on standard output.
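Everything of interest goes to standard output, so a manual run should redirect it to a file, as the launch command in section 5 does. The sketch below prepares a run directory for one of the input directories listed above. It copies the three input files and checks for the recommended 10 GB of free space first. setup_run_dir is a hypothetical helper. It assumes it is called from the benchmark directory that contains "input/".

```sh
# setup_run_dir SIZE RUNDIR [MIN_KB]  (hypothetical helper)
# Copies input.in, H_SPRIK and O_SPRIK from input/h2o_${SIZE}mol into RUNDIR.
# SIZE is one of 1, 128, 256, 384, 512. MIN_KB defaults to 10 GB expressed in
# KB (10 * 1024 * 1024). Assumes the current directory contains "input/".
setup_run_dir() {
    size=$1 rundir=$2 min_kb=${3:-10485760}
    src="input/h2o_${size}mol"
    [ -d "$src" ] || { echo "no such benchmark case: $src" >&2; return 1; }
    mkdir -p "$rundir" || return 1
    # df -P gives one unwrapped line per filesystem; field 4 is available KB
    avail_kb=$(df -Pk "$rundir" | awk 'NR == 2 { print $4 }')
    if [ "$avail_kb" -lt "$min_kb" ]; then
        echo "only ${avail_kb} KB free in $rundir" >&2
        return 1
    fi
    for f in input.in H_SPRIK O_SPRIK; do
        cp "$src/$f" "$rundir/" || return 1
    done
}
```

For example, `setup_run_dir 128 /scratch/run128` prepares the 128-molecule case. The job script would then change into /scratch/run128 and run the launch command from section 5, with standard output redirected to cpmd.out.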
For timing, the relevant information is the time per step. It is printed in
the last column (TCPU) of the short step information. For example, for the
h2o_128mol benchmark the short step information looks like:

   NFI    GEMAX      CNORM          ETOT        DETOT     TCPU
     1  6.790E-02  2.513E-03  -2160.009617   0.000E+00   14.25
     2  6.679E-02  2.370E-03  -2164.989104  -4.979E+00   13.99
     3  6.549E-02  2.237E-03  -2169.244665  -4.256E+00   13.99
     4  6.405E-02  2.113E-03  -2172.892020  -3.647E+00   14.02
     5  6.251E-02  1.998E-03  -2176.028008  -3.136E+00   14.03
     6  6.088E-02  1.890E-03  -2178.733324  -2.705E+00   13.99
     7  5.920E-02  1.789E-03  -2181.075059  -2.342E+00   13.97
     8  5.748E-02  1.695E-03  -2183.108949  -2.034E+00   14.00
     9  5.574E-02  1.606E-03  -2184.881317  -1.772E+00   13.99
    10  5.401E-02  1.523E-03  -2186.430722  -1.549E+00   13.98

Other useful timing information is printed at the end of the output:

 ****************************************************************
 *                                                              *
 *                            TIMING                            *
 *                                                              *
 ****************************************************************
 SUBROUTINE       CALLS         CPU TIME      ELAPSED TIME
 N-FFTCOM          8704            52.83             52.83
 INVFFTN           5632            35.91             35.91
 ...                ...              ...
 FWFFT               55              .54               .54
 ----------------------------------------------------------------
 TOTAL TIME                       183.86            183.86
 ****************************************************************

For verification, the relevant information is the last value of the total
energy. For the h2o_128mol benchmark it looks like:

 (K+E1+L+N+X)           TOTAL ENERGY =        -2186.43072156 A.U.
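The timing and verification values described above can be extracted with a few awk calls. The helper names are hypothetical, and the patterns follow the sample output in this README. They may need adjusting for other CPMD versions or run types. This README does not specify a tolerance for the energy comparison, so any value passed in, such as 1e-6 A.U., is an assumption.

```sh
# Hypothetical post-processing helpers for a CPMD output file (e.g. cpmd.out).

# Mean of the TCPU column over the step lines that follow the
# "NFI GEMAX CNORM ETOT DETOT TCPU" header (6 fields, integer NFI).
cpmd_mean_tcpu() {
    awk '$1 == "NFI" && $NF == "TCPU" { inside = 1; next }
         inside && NF == 6 && $1 ~ /^[0-9]+$/ { sum += $6; n++ }
         END { if (n > 0) printf "%d steps, mean TCPU %.3f s\n", n, sum / n }' "$1"
}

# Elapsed time from the "TOTAL TIME" line of the TIMING table.
cpmd_total_time() {
    awk '$1 == "TOTAL" && $2 == "TIME" { print $NF }' "$1"
}

# Last reported total energy in A.U. (the value used for verification).
cpmd_final_energy() {
    awk '/TOTAL ENERGY =/ { e = $(NF - 1) } END { print e }' "$1"
}

# cpmd_check_energy FILE REF TOL: succeeds if |final energy - REF| <= TOL.
cpmd_check_energy() {
    e=$(cpmd_final_energy "$1")
    [ -n "$e" ] || return 1
    awk -v e="$e" -v ref="$2" -v tol="$3" \
        'BEGIN { d = e - ref; if (d < 0) d = -d; exit !(d <= tol) }'
}
```

For the h2o_128mol case, `cpmd_check_energy cpmd.out -2186.43072156 1e-6 && echo PASSED` compares the final energy with the reference value quoted above.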