This is the README file for the IQCS application benchmark,
distributed with the DEISA Benchmark Suite:
http://www.deisa.eu/science/benchmarking/
Last modified by the DEISA Benchmark Team on 2009-03-11.
-----------
IQCS readme
-----------
Contents
--------
1. General description
2. Code structure
3. Parallelization
4. Building
5. Execution
6. Data
1. General description
======================
The Improving Quantum Computer Simulations (IQCS) benchmark does not
belong to a conventional class of HPC codes such as MC, MD or CFD.
Instead, since the simulation of quantum operations is memory bound
(adding another qubit doubles the size of the state vector), it serves
primarily as a benchmark for memory bandwidth and internode
communication.
In contrast to a classical bit, which can be either 0 or 1, the state
of a qubit |q> is given by a linear superposition of the basis states
|0> and |1>
|q> = a |0> + b |1> ; |a|^2 + |b|^2 = 1
with a and b being complex numbers. The state vector of |q> is defined
by (a,b)^t. A quantum operation on |q> is represented by a unitary
2x2-matrix U. For example, the Hadamard-operation H is given by
                    ( 1   1 )
    H = 1/sqrt(2) * ( 1  -1 )      (*)
mapping the basis states according to
|0> -> 1/sqrt(2) (1,1)^t ,
|1> -> 1/sqrt(2) (1,-1)^t .
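
The mapping above can be checked with a few lines of Python. This is
only an illustrative sketch and not part of the benchmark sources; the
names 'apply_gate' and 'norm_squared' are hypothetical helpers chosen
for this example.

```python
import math

# Hadamard matrix (*), stored as nested tuples.
S = 1.0 / math.sqrt(2.0)
H = ((S, S),
     (S, -S))


def apply_gate(u, state):
    """Apply a 2x2 matrix u to a single-qubit state vector (a, b)."""
    a, b = state
    return (u[0][0] * a + u[0][1] * b,
            u[1][0] * a + u[1][1] * b)


def norm_squared(state):
    """Return |a|^2 + |b|^2, which a unitary operation preserves."""
    return sum(abs(c) ** 2 for c in state)


zero = (1 + 0j, 0 + 0j)   # basis state |0>
one = (0 + 0j, 1 + 0j)    # basis state |1>

plus = apply_gate(H, zero)    # 1/sqrt(2) (1, 1)^t
minus = apply_gate(H, one)    # 1/sqrt(2) (1, -1)^t
```

Applying 'apply_gate(H, ...)' twice returns the original state, since
the Hadamard matrix is its own inverse.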
The state |q> of an n-qubit system can be written as
|q> = a_{0...00} |0...00> + a_{0...01} |0...01> + a_{0...10} |0...10> + ...
+ a_{0...11} |0...11> + a_{1...11} |1...11>
where (a_{0...00}, a_{0...01}, a_{0...10}, ..., a_{0...11}, a_{1...11})^t
corresponds to the state vector. The size of the state vector therefore
grows exponentially with the number of qubits. Each complex component
is stored in two doubles and thus requires 16 bytes of memory, so the
state vector of an n-qubit system consumes (16 * 2^n)/1024^4 TByte. In
case of 37 qubits, 2 TB are needed to store the state vector of the
system.
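
As a worked example of the formula above, the following Python sketch
computes the state vector size and, conversely, the largest number of
qubits that fits into a given amount of memory. The function names are
hypothetical and exist only for this illustration.

```python
BYTES_PER_AMPLITUDE = 16  # two 8-byte doubles per complex component


def state_vector_bytes(nqubits):
    """Size of the state vector of an n-qubit system in bytes."""
    return BYTES_PER_AMPLITUDE * 2 ** nqubits


def state_vector_tbyte(nqubits):
    """Size of the state vector in TByte: (16 * 2^n) / 1024^4."""
    return state_vector_bytes(nqubits) / 1024 ** 4


def max_qubits(memory_bytes):
    """Largest n whose state vector alone fits into memory_bytes."""
    amplitudes = memory_bytes // BYTES_PER_AMPLITUDE
    if amplitudes < 1:
        raise ValueError("not enough memory for a single amplitude")
    return amplitudes.bit_length() - 1


print(state_vector_tbyte(37))
```

Note that 'max_qubits' accounts for the state vector only, not for any
additional working memory of a simulation.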
2. Code structure
=================
The application source for the IQCS benchmark program is distributed
as a tar file. It comprises the files:
iqcs.f90          - Main application
iqcs_param.f90.in - Definition of simulation parameters, especially
                    'nbits' and 'nlogtasks'
Makefile.defs.in  - Template for definitions of the library paths and
                    compilation parameters
Makefile.in       - Template for main Makefile
measure_all.f90   - Wrapping subroutine to remeasure all qubits
measure.f90       - Remeasurement subroutine called after operation
H.f90             - Hadamard operation subroutine
3. Parallelization
==================
A Hadamard-operation on qubit i (we count the rightmost qubit as #1,
the leftmost as #n) is represented by a 2^n x 2^n matrix which can be
broken down into a sequence of 2^(n-1) operations of form (*). These
2x2 operations are independent of each other and can be executed in
parallel. The IQCS code applies a Hadamard-operation sequentially to
all qubits ranging from 1 to n.
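
The decomposition into 2^(n-1) independent 2x2 operations can be
sketched in serial Python as follows. This is a minimal illustration
under the qubit numbering given above, not the Fortran implementation
in H.f90; the function names and the choice of |0...00> as initial
state are assumptions made for this example.

```python
import math


def hadamard(state, i):
    """Apply a Hadamard operation to qubit i (1 = rightmost) in place."""
    s = 1.0 / math.sqrt(2.0)
    stride = 1 << (i - 1)          # index distance between paired amplitudes
    # The two loops together visit 2^(n-1) amplitude pairs; each pair is
    # updated independently, so the iterations could run in parallel.
    for base in range(0, len(state), 2 * stride):
        for k in range(base, base + stride):
            a, b = state[k], state[k + stride]   # qubit i = 0 and qubit i = 1
            state[k] = s * (a + b)
            state[k + stride] = s * (a - b)
    return state


def hadamard_all(n):
    """Start from |0...00> and apply a Hadamard operation to qubits 1..n."""
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    for i in range(1, n + 1):
        hadamard(state, i)
    return state
```

Starting from |0...00>, 'hadamard_all(n)' yields the uniform
superposition in which every component equals 1/sqrt(2^n).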
Since it is usually (i.e. for larger systems) not possible to store
the whole state vector locally, it has to be distributed over
different tasks. In order to perform a quantum operation on the whole
state vector, communication between the different tasks is
needed. Therefore, each task possesses an additional buffer which
receives half of its partner task's part of the state vector. After the
local computation is finished, this updated part of the state vector is
communicated back. Due to this additional buffer, the overall memory
consumption is larger by a factor of 1.5 than for simple state vector
storage.
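
The exchange scheme described above can be illustrated with the
following Python sketch, in which a list of lists stands in for the
distributed memory of the tasks. It rests on assumptions that are not
taken from the IQCS sources: the state vector is split into equally
sized contiguous blocks, the lowest 'nlocal' qubits are local to each
task, and the partner of a task is found by flipping one bit of its
rank. The name 'hadamard_nonlocal' is hypothetical.

```python
import math


def hadamard_nonlocal(blocks, i, nlocal):
    """Apply a Hadamard operation to a qubit i > nlocal.

    blocks[rank] is the local part of the state vector held by each task.
    """
    s = 1.0 / math.sqrt(2.0)
    half = len(blocks[0]) // 2
    bit = 1 << (i - nlocal - 1)      # rank bit that corresponds to qubit i

    def work_offset(rank):
        # Tasks holding qubit i = 0 update the lower half of each block,
        # their partners (qubit i = 1) update the upper half.
        return half if rank & bit else 0

    # Step 1: each task receives half of its partner's block in a buffer.
    buffers = []
    for rank in range(len(blocks)):
        lo = work_offset(rank)
        buffers.append(blocks[rank ^ bit][lo:lo + half])

    # Step 2: local computation on the own half and the buffer.
    for rank, buf in enumerate(buffers):
        own, lo = blocks[rank], work_offset(rank)
        for k in range(half):
            if rank & bit:           # buffer holds the qubit i = 0 amplitudes
                a, b = buf[k], own[lo + k]
                buf[k], own[lo + k] = s * (a + b), s * (a - b)
            else:                    # buffer holds the qubit i = 1 amplitudes
                a, b = own[lo + k], buf[k]
                own[lo + k], buf[k] = s * (a + b), s * (a - b)

    # Step 3: the updated buffer is communicated back to the partner.
    for rank, buf in enumerate(buffers):
        lo = work_offset(rank)
        blocks[rank ^ bit][lo:lo + half] = buf
    return blocks
```

Each task holds its own block plus a buffer of half that size, which
corresponds to the factor of 1.5 in memory consumption.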
The implementations of the Hadamard-operation as well as the
remeasurement provide an OpenMP parallelization for the loop
structures.
Details of the implementation are described in:
K. De Raedt, K. Michielsen, H. De Raedt, B. Trieu, G. Arnold,
M. Richter, Th. Lippert, H. Watanabe, N. Ito:
Massively parallel quantum computer simulator,
Comp. Phys. Comm. 176 (2007) 121-136.
4. Building
===========
Building the IQCS benchmark code outside of the benchmark environment
should be straightforward once the required parameters have been set
by hand.
1) First set the two simulation parameters 'nbits' and 'nlogtasks' in the
file iqcs_param.f90 to the appropriate values.
The value for 'nbits' defines the overall memory needed for the simulation
according to the following table:
Number of qubits   Size of state vector [GB]   Overall memory needed [GB]
       27                      2                            3
       28                      4                            6
       29                      8                           12
       30                     16                           24
       31                     32                           48
       32                     64                           96
       ..                     ..                           ..
Each additional qubit doubles the amount of overall memory needed by the
application. The parameter 'nlogtasks' defines the number of tasks the
executable must be started with:
number of tasks = 2^(nlogtasks)
It therefore determines the memory needed per task.
2) Copy Makefile.defs.in and Makefile.in to Makefile.defs and Makefile,
respectively. Adjust the values in both files to fit your system.
3) Run make clean all
In case the compilation fails due to insufficient memory, the preprocessor
tag DYNALLOC can be used to switch to dynamic allocation.
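
When choosing 'nbits' and 'nlogtasks' in step 1, the resulting memory
footprint can be estimated with a small Python sketch such as the one
below. It reproduces the table from step 1 and assumes that the overall
memory is spread evenly over all tasks, which is an assumption of this
example. The function name 'memory_estimate' is hypothetical.

```python
BYTES_PER_AMPLITUDE = 16   # two doubles per complex component
BUFFER_FACTOR = 1.5        # state vector plus communication buffer
GB = 1024 ** 3


def memory_estimate(nbits, nlogtasks):
    """Estimate the memory footprint for given simulation parameters."""
    ntasks = 2 ** nlogtasks
    state_gb = BYTES_PER_AMPLITUDE * 2 ** nbits / GB
    overall_gb = BUFFER_FACTOR * state_gb
    return {
        "tasks": ntasks,
        "state_vector_gb": state_gb,
        "overall_gb": overall_gb,
        # Assumption: memory is distributed evenly over all tasks.
        "per_task_gb": overall_gb / ntasks,
    }


for nbits in range(27, 33):
    est = memory_estimate(nbits, 5)
    print(nbits, est["state_vector_gb"], est["overall_gb"], est["per_task_gb"])
```

Here 'nlogtasks = 5' (32 tasks) is only an example value.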
5. Execution
============
The compiled binary can be executed without further options. A
wallclock time of 30 minutes per run should be sufficient on most
benchmarked systems.
6. Data
=======
All simulation data for the benchmark is generated on the fly by the
simulation code. The two parameters influencing the data generated are
compiled into the application. All output of the application is written
to stdout.