CINECA HPC SYSTEM: MARCONI 100

MARCONI 100 (M100) is the new accelerated cluster based on the IBM Power9 architecture and NVIDIA Volta GPUs, acquired by Cineca within the PPI4HPC European initiative. This system opens the way to the pre-exascale Leonardo supercomputer, expected to be installed in 2021. It has been available to Italian public and industrial researchers since April 2020. Its computing capacity is about 32 PFlop/s.

Quick Startup Guide

M100 Features

  • Nodes: 980
  • Processors: 2×16 cores IBM POWER9 AC922 at 3.1 GHz
  • Accelerators: 4 × NVIDIA Volta V100 GPUs, NVLink 2.0, 16 GB
  • Cores: 32 cores/node
  • Hyper-Threading: 128 (virtual) CPUs [32 physical cores with 4 hardware threads each]
  • RAM: 256 GB/node (242 usable)
  • Peak Performance: 32 PFlop/s

OPEN SOURCE QUANTUM EMULATORS

  • Qiskit, open-source Python library by IBM for developing quantum algorithms. Qiskit supports multi-threaded (OpenMP) emulation methods and additional configurable options.
Webpage Qiskit

  • Cirq, open-source Python library by Google for developing quantum algorithms.
Webpage Cirq

  • Qsim (Cirq), a circuit simulator that uses gate fusion, AVX/FMA vectorized instructions and OpenMP multi-threading to achieve state-of-the-art simulation of quantum circuits. qsim is integrated with Cirq.
Webpage Qsim

  • QuTiP, open-source Python library for circuit emulation and for solving open quantum systems.
Webpage QuTiP

PERFORMANCE ON M100

Performance is measured as the computational time needed to emulate the full state vector of the QFT (Quantum Fourier Transform) algorithm (circuit below) on n qubits using a single node of M100.

Fig. 1
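To make "emulating the full state vector of the QFT" concrete, here is a minimal pure-Python sketch. It is not the code used for the benchmarks and not how Qiskit, Cirq, qsim or QuTiP implement their emulators. The function names (apply_hadamard, apply_controlled_phase, apply_swap, qft, dft_reference) are hypothetical, and the circuit layout (Hadamards, controlled phase rotations, final swaps) is the textbook QFT, assumed to correspond to the circuit in Fig. 1. The result is checked against the discrete Fourier transform of a basis state.

```python
import cmath
import math


def apply_hadamard(state, q):
    """Apply a Hadamard gate to qubit q (bit position q of the basis index)."""
    s = 1 / math.sqrt(2)
    bit = 1 << q
    for i in range(len(state)):
        if not i & bit:  # visit each (bit=0, bit=1) amplitude pair once
            a, b = state[i], state[i | bit]
            state[i] = s * (a + b)
            state[i | bit] = s * (a - b)


def apply_controlled_phase(state, control, target, angle):
    """Multiply by exp(i*angle) every amplitude where both qubits are 1."""
    phase = cmath.exp(1j * angle)
    mask = (1 << control) | (1 << target)
    for i in range(len(state)):
        if (i & mask) == mask:
            state[i] *= phase


def apply_swap(state, q1, q2):
    """Swap two distinct qubits by exchanging the matching amplitudes."""
    b1, b2 = 1 << q1, 1 << q2
    for i in range(len(state)):
        if (i & b1) and not (i & b2):
            j = (i ^ b1) | b2
            state[i], state[j] = state[j], state[i]


def qft(state, n):
    """Apply the textbook QFT circuit in place to a list of 2**n amplitudes."""
    for q in reversed(range(n)):
        apply_hadamard(state, q)
        for c in reversed(range(q)):
            apply_controlled_phase(state, c, q, math.pi / 2 ** (q - c))
    for q in range(n // 2):  # final swaps reverse the qubit order
        apply_swap(state, q, n - 1 - q)
    return state


def dft_reference(x, n):
    """Expected QFT output for the basis state |x>."""
    size = 2 ** n
    return [cmath.exp(2j * math.pi * x * k / size) / math.sqrt(size)
            for k in range(size)]


n, x = 3, 5  # small illustrative values, not the benchmark sizes
state = [0j] * (2 ** n)
state[x] = 1 + 0j
qft(state, n)
max_error = max(abs(a - b) for a, b in zip(state, dft_reference(x, n)))
print(f"max deviation from the DFT reference: {max_error:.2e}")
```

Every gate touches all 2^n amplitudes, which is why both runtime and memory grow exponentially with the number of qubits.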

In general, a state vector of n qubits holds 2^n complex values (each complex number needs 16 bytes in double precision or 8 bytes in single precision). The memory usage is shown in the plot below. Remember that each node of M100 has 242 GB of available RAM.

Fig. 2

FIG. 2: Memory required (expressed in GB) by the state vector as the number of qubits n increases (x-axis of the plot). Since each node has 242 GB of usable memory, the theoretical maximum number of qubits that a single node can handle is 33 in double precision and 34 in single precision.
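The qubit limits quoted in the caption follow directly from the 2^n scaling and can be checked with a few lines of Python. This is only a sketch: the helper names (statevector_bytes, max_qubits) are hypothetical, 242 GB is assumed to mean decimal gigabytes, and the overhead of the emulator itself is ignored.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Bytes needed to store the 2**n complex amplitudes of an n-qubit state."""
    return (2 ** n_qubits) * bytes_per_amplitude


def max_qubits(available_bytes, bytes_per_amplitude=16):
    """Largest n whose state vector still fits in the available memory."""
    n = 0
    while statevector_bytes(n + 1, bytes_per_amplitude) <= available_bytes:
        n += 1
    return n


# Assumption: 242 GB of usable RAM per node, read as decimal gigabytes.
NODE_RAM_BYTES = 242 * 10**9

max_double = max_qubits(NODE_RAM_BYTES, bytes_per_amplitude=16)
max_single = max_qubits(NODE_RAM_BYTES, bytes_per_amplitude=8)

print(f"double precision: up to {max_double} qubits "
      f"({statevector_bytes(max_double, 16) / 1e9:.1f} GB)")
print(f"single precision: up to {max_single} qubits "
      f"({statevector_bytes(max_single, 8) / 1e9:.1f} GB)")
```

Adding one qubit doubles the memory, so halving the bytes per amplitude (single instead of double precision) buys exactly one extra qubit.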

A. QISKIT: StatevectorSimulator

StatevectorSimulator is an ideal (noise-free) quantum circuit emulator that computes the state vector and is designed to run on a single machine. It supports multi-threading.

For more info

Results up to n = 30

Fig. 3

FIG. 3: The plot (log scale) shows the runtime of the QFT algorithm in seconds (y-axis) as the number of qubits increases (x-axis). Using a single node with 32 threads, we were able to run the QFT algorithm on up to 30 qubits in under 100 seconds.

Results up to n = 33 in double precision and n = 34 in single precision

Fig. 4

FIG. 4: The plot shows the runtime of the QFT algorithm in seconds (y-axis) as the number of qubits increases (x-axis). Using a single node with 128 threads, we were able to run the QFT algorithm on up to 33 qubits (double precision) in under 1000 seconds and on 34 qubits (single precision) in under 2000 seconds.