The I/O Bottleneck in Quantum Computing

Quantum computers of the next few years will be suitable for simulating quantum physics in chemistry and materials science, not for problems that require large amounts of data.

When you simulate quantum algorithms on current classical hardware, you quickly run out of machines powerful enough for the job. A single qubit in a superposition state \(\alpha |0\rangle + \beta |1\rangle\) is described by two complex amplitudes (\(\alpha, \beta\)). Each amplitude consists of two 4-byte floating-point numbers (real and imaginary part), or 8 bytes, so a single qubit requires 2 · 8 bytes = 16 bytes.

The table below summarizes the exponentially growing requirements for simulating \(n\) qubits on a classical computer. For instance, to simulate 50 qubits, 9 PB of RAM are required, an amount found only in machines like Oak Ridge National Laboratory’s Frontier supercomputer.

| Qubits | States | Memory | Comparison |
|-------:|--------|--------|------------|
| 1 | \(2^{1} = 2\) | 16 bytes | |
| 2 | \(2^{2} = 4\) | 32 bytes | |
| 3 | \(2^{3} = 8\) | 64 bytes | |
| 4 | \(2^{4} = 16\) | 128 bytes | Atari 2600 (1977) |
| 5 | \(2^{5} = 32\) | 256 bytes | |
| … | | | |
| 9 | \(2^{9} = 512\) | 4 KB | Apollo 11: 4 KB (1969) |
| 10 | \(2^{10} = 1{,}024\) | 8 KB | |
| 13 | \(2^{13} = 8{,}192\) | 65 KB | Commodore 64: 64 KB (1982) |
| 20 | \(2^{20} = \mathcal{O}(10^{6})\) | 8 MB | |
| 24 | \(2^{24} = \mathcal{O}(10^{7})\) | 134 MB | iPhone 1: 128 MB (2007) |
| 25 | \(2^{25} = \mathcal{O}(10^{8})\) | 268 MB | |
| 30 | \(2^{30} = \mathcal{O}(10^{9})\) | 9 GB | |
| 33 | \(2^{33} = \mathcal{O}(10^{10})\) | 69 GB | NVIDIA A100: 80 GB (2020) |
| 40 | \(2^{40} = \mathcal{O}(10^{12})\) | 9 TB | |
| 45 | \(2^{45} = \mathcal{O}(10^{14})\) | 281 TB | NVIDIA DGX GH200: 144 TB (2023) |
| 50 | \(2^{50} = \mathcal{O}(10^{15})\) | 9 PB | HPE Frontier: 9.2 PB (2022) |
| 75 | \(2^{75} = \mathcal{O}(10^{23})\) | 302 ZB | All data stored by 2025: 200 ZB |
| 100 | \(2^{100} = \mathcal{O}(10^{30})\) | \(10^{10}\) ZB | |
| 265 | \(2^{265} = \mathcal{O}(10^{80})\) | 😲 | Protons in the universe |
| 1,000 | \(2^{1000} = \mathcal{O}(10^{301})\) | 😱 | |
| 10,000 | \(2^{10000} = \mathcal{O}(10^{3010})\) | 🤯 | |
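The memory figures in the table follow directly from the 8 bytes per complex amplitude established above. A minimal sketch in Python (function name is illustrative):

```python
# Classical memory needed to store the state vector of n qubits:
# 2**n complex amplitudes, each stored as two 4-byte floats (8 bytes).

BYTES_PER_AMPLITUDE = 8  # two single-precision floats per complex number

def statevector_bytes(n_qubits: int) -> int:
    """Bytes required to hold a full n-qubit state vector."""
    return 2 ** n_qubits * BYTES_PER_AMPLITUDE

print(statevector_bytes(1))   # 16 bytes, one qubit
print(statevector_bytes(50))  # ~9 PB, matching the Frontier row
```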

The I/O Bottleneck

The table may lead us to conclude erroneously that classical computers will be the bottleneck for quantum computers. That is not the case. Quite the contrary.

Following the argument by Hoefler and collaborators, we can estimate the peak bandwidth of a quantum computer by \(N/\tau\), where \(N\) is the number of qubits and \(\tau\) is the average gate speed. The assumption is that only one gate operation is required to initialize or load each qubit. The typical gate speed is between 10 ns (superconductors) and 10 μs (e.g. trapped ions, neutral atoms, nitrogen vacancies in diamond).

The current state of the art is approximately 25 logical qubits for ion traps. A machine with 100 logical qubits is feasible within a decade, which puts the peak bandwidth in the range of 10 Mbit/s to 10 Gbit/s.
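Under the same assumption (one gate operation loads one bit into one qubit), the \(N/\tau\) estimate can be sketched directly; the numbers below reproduce the quoted range:

```python
def peak_bandwidth_bps(n_qubits: int, gate_time_s: float) -> float:
    """Peak input bandwidth N / tau in bits per second,
    assuming one gate operation loads one bit into one qubit."""
    return n_qubits / gate_time_s

# A future machine with 100 logical qubits:
print(peak_bandwidth_bps(100, 10e-9))  # 10 ns gates (superconducting): ~1e10 b/s = 10 Gbit/s
print(peak_bandwidth_bps(100, 10e-6))  # 10 us gates (trapped ions):    ~1e7 b/s = 10 Mbit/s
```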

For comparison, the peak memory bandwidth of the latest Intel i9 or Apple M2 CPUs is 100 Gbit/s; an Intel Xeon Max comes in at 1,000 Gbit/s, and an NVIDIA A100 GPU at 10,000 Gbit/s. That is a difference of at least two orders of magnitude.

QRAM

With quantum memory, that I/O bottleneck might not be an insurmountable problem. Unfortunately, no realistic QRAM implementation exists yet. Decoherence, qubit connectivity, and cost stand in the way of a QRAM large enough to solve the I/O bottleneck.

> [A] QRAM that can address millions or billions of different memory elements is not likely feasible in the foreseeable future. (Hann, 2021)

Loading 1 TB of data at a decent 1 Gbit/s takes more than 2 hours, which is on par with coherence records for a single qubit: 6 h for nuclear spins at 2 K, 3 h for Si at 4 K, 1 h for ions around 1 K, and only 39 min for Si at room temperature. And of course we need a few more qubits than one.
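The two-hour figure works out as follows (a quick sanity check of the arithmetic, not taken from the original source):

```python
def load_time_seconds(n_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Time to transfer n_bytes over a link with the given bit rate."""
    return n_bytes * 8 / bandwidth_bits_per_s

seconds = load_time_seconds(1e12, 1e9)  # 1 TB at 1 Gbit/s
print(seconds)         # 8000 seconds
print(seconds / 3600)  # just over 2.2 hours
```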

In quantum machine learning, particularly with QCNNs, you may need only as many data points as there are parameters in the model itself. While this significantly reduces the amount of data needed, many modern ML models routinely have millions, billions, or even trillions of parameters.

Outlook

Current NISQ-era machines and the early fault-tolerant quantum computers anticipated within a decade are only expected to deliver an advantage for algorithms with exponential speed-ups: Shor, QFT, HHL, and quantum simulations. Quadratic speed-ups are simply not enough for practical quantum advantage.

For Shor’s algorithm, at least 372 logical qubits are needed to break RSA encryption, which is likely well beyond the reach of the next decade or so. To solve large, sparse linear systems with the HHL algorithm, we will once again bump into the I/O bottleneck.

Speed-ups for many of the use cases being explored today with quantum annealing, QAOA, VQE, or QML are not yet known; they may or may not offer a quantum advantage in the foreseeable future. Approaches based on Grover’s algorithm, which offers only a quadratic speed-up (e.g. certain machine learning algorithms, protein folding, or solving non-linear differential equations, as in computational fluid dynamics), are not expected to deliver any advantage any time soon.

> [Q]uantum computers will be practical for ‘big compute’ problems on small data, not big data problems. (Hoefler et al., 2023)