This is the README file for the IQCS application benchmark, distributed
with the DEISA Benchmark Suite:
http://www.deisa.eu/science/benchmarking/

Last modified by the DEISA Benchmark Team on 2009-03-11.

-----------
IQCS readme
-----------

Contents
--------

1. General description
2. Code structure
3. Parallelization
4. Building
5. Execution
6. Data


1. General description
======================

The Improving Quantum Computer Simulations (IQCS) benchmark does not
belong to a conventional class of HPC codes such as Monte Carlo (MC),
molecular dynamics (MD) or computational fluid dynamics (CFD). Instead,
since the simulation of quantum operations is memory bound (adding
another qubit doubles the size of the state vector), it serves primarily
as a benchmark for memory bandwidth and internode communication.

In contrast to a classical bit, which can be either 0 or 1, the state of
a qubit |q> is given by a linear superposition of the basis states |0>
and |1>

    |q> = a |0> + b |1> ;   |a|^2 + |b|^2 = 1

with a and b being complex numbers. The state vector of |q> is defined
by (a,b)^t. A quantum operation on |q> is represented by a unitary
2x2 matrix U. For example, the Hadamard operation H is given by

                     (1  1)
    H = 1/sqrt(2)    (1 -1)          (*)

mapping the basis states according to

    |0> -> 1/sqrt(2) (1,1)^t ,   |1> -> 1/sqrt(2) (1,-1)^t .

The state |q> of an n-qubit system can be written as

    |q> = a_{0...00} |0...00> + a_{0...01} |0...01> + a_{0...10} |0...10>
          + ... + a_{0...11} |0...11> + a_{1...11} |1...11>

where

    (a_{0...00}, a_{0...01}, a_{0...10}, ..., a_{0...11}, a_{1...11})^t

corresponds to the state vector. The state vector has 2^n components, so
its size grows exponentially with the number of qubits. Storing each
complex component of the state vector in two doubles, each component
requires 16 Byte of memory. Therefore, the state vector of an n-qubit
system consumes (16 * 2^n)/1024^4 TByte. In case of 37 qubits,
16 * 2^37 Byte = 2 TB are needed to store the state vector of the system.


2. Code structure
=================

The application source for the IQCS benchmark program is distributed as
a tar file. It comprises the files:

    iqcs.f90           - Main application
    iqcs_param.f90.in  - Definition of simulation parameters, especially
                         'nbits' and 'nlogtasks'
    Makefile.defs.in   - Template for definitions of the library paths
                         and compilation parameters
    Makefile.in        - Template for main Makefile
    measure_all.f90    - Wrapping subroutine to remeasure all qubits
    measure.f90        - Remeasurement subroutine called after operation
    H.f90              - Hadamard operation subroutine


3. Parallelization
==================

A Hadamard operation on qubit i (we count the rightmost qubit as #1, the
leftmost as #n) is represented by a 2^n x 2^n matrix which can be broken
down into a sequence of 2^(n-1) operations of form (*). These 2x2
operations are independent of each other and can be executed in
parallel. The IQCS code applies a Hadamard operation sequentially to all
qubits ranging from 1 to n.

Since it is usually (i.e. for larger systems) not possible to store the
whole state vector locally, it has to be distributed over different
tasks. In order to perform a quantum operation on the whole state
vector, communication between the different tasks is needed. Therefore,
each task possesses an additional buffer which receives half of the
state vector part held by its partner task. After the local computation
is finished, this updated part of the state vector is communicated back.
Due to this additional buffer the overall memory consumption is larger
by a factor of 1.5 than in the case of simple state vector storage.

The implementations of the Hadamard operation as well as the
remeasurement provide an OpenMP parallelization for the loop structures.

Details of the implementation are described in:

    K. De Raedt, K. Michielsen, H. De Raedt, B. Trieu, G. Arnold,
    M. Richter, Th. Lippert, H. Watanabe, N. Ito:
    Massively parallel quantum computer simulator,
    Comp. Phys. Comm. 176 (2007) 121-136.


4. Building
===========

Building the IQCS benchmark code outside of the benchmark environment
should be straightforward after setting the needed parameters by hand.

1) First set the two simulation parameters 'nbits' and 'nlogtasks' in
   the file iqcs_param.f90 to the appropriate values.

   The value for 'nbits' defines the overall memory needed for the
   simulation according to the following table:

   Number of Qubits   Size of state vector [GB]   Overall memory needed [GB]
         27                       2                            3
         28                       4                            6
         29                       8                           12
         30                      16                           24
         31                      32                           48
         32                      64                           96
         ..                      ..                           ..

   Each additional qubit doubles the amount of overall memory needed by
   the application.

   The parameter 'nlogtasks' defines the number of tasks the executable
   must be called with in the form:

       number of tasks = 2^(nlogtasks)

   Therefore it influences the memory needed per task.

2) Copy Makefile.defs.in and Makefile.in to Makefile.defs and Makefile,
   respectively. Change the needed values in both files to fit your
   system parameters.

3) Run

       make clean all

   In case the compilation fails due to insufficient memory, the
   preprocessor tag DYNALLOC can be used to switch to dynamic
   allocation.


5. Execution
============

The compiled binary can be executed without further options. It must be
started with 2^(nlogtasks) tasks. A wallclock time of 30 minutes per run
should be sufficient on most benchmarked systems.


6. Data
=======

All simulation data for the benchmark is generated on the fly by the
simulation code. The two parameters influencing the data generated
('nbits' and 'nlogtasks') are compiled into the application. All output
of the application is written to stdout.
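
Note on choosing 'nbits' and 'nlogtasks': the memory figures in
section 4 follow directly from the 16 Byte per component stated in
section 1 and the buffer factor of 1.5 stated in section 3. The
following Python sketch is not part of the benchmark distribution; the
function names are hypothetical and it assumes that the overall memory
is spread evenly over the 2^(nlogtasks) tasks. It may help to check
whether a given parameter pair fits on a target system before building.

```python
# Rough memory estimate for an IQCS run (illustrative sketch only).
# Assumption: memory is distributed evenly over all tasks.

BYTES_PER_COMPONENT = 16  # one complex number stored as two 8-byte doubles
GIB = 1024 ** 3


def state_vector_bytes(nbits):
    """Size of the full state vector of an nbits-qubit system in Byte."""
    return BYTES_PER_COMPONENT * 2 ** nbits


def overall_bytes(nbits):
    """State vector plus communication buffer (factor 1.5, see section 3)."""
    return 3 * state_vector_bytes(nbits) // 2


def bytes_per_task(nbits, nlogtasks):
    """Memory per task, assuming an even split over 2^(nlogtasks) tasks."""
    return overall_bytes(nbits) // 2 ** nlogtasks


if __name__ == "__main__":
    # Print the same columns as the table in section 4.
    for nbits in range(27, 33):
        print(nbits,
              state_vector_bytes(nbits) // GIB,
              overall_bytes(nbits) // GIB)
```

For example, under the even-split assumption, bytes_per_task(32, 5)
estimates the memory each of the 32 tasks needs for a 32-qubit run. The
actual footprint per task also includes code and runtime overhead, which
this sketch ignores.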