This page explains the core concepts behind QDP (Quantum Data Plane): the architecture, the supported encoding methods, GPU memory management, DLPack zero-copy integration, and key performance characteristics.
At a high level, QDP is organized as layered components:
- `qumat.qdp` exposes a friendly Python entry point (`QdpEngine`).
- `qdp/qdp-python` builds the `_qdp` module, bridging Python ↔ Rust and implementing the Python-side DLPack protocol (`__dlpack__`).
- `qdp/qdp-core` contains the engine (`QdpEngine`), encoder implementations, IO readers, GPU pipelines, and DLPack export.
- `qdp/qdp-kernels` provides the CUDA kernels invoked by the Rust encoders.

Data flow (conceptual):
```text
Python (qumat.qdp) → _qdp (PyO3) → qdp-core (Rust) → qdp-kernels (CUDA)
        │                │               │                  │
        └──── torch.from_dlpack(qtensor) ◄──────┴── DLPack DLManagedTensor (GPU ptr)
```
QDP encodes classical data into a state vector $\vert\psi\rangle$ for $n$ qubits.
Output shapes:

- Single sample: `[1, 2^n]` (always 2D)
- Batch: `[batch_size, 2^n]`

QDP supports two output precisions:
- `float32`
- `float64`

QDP uses a strategy pattern for encoding methods:
- `encoding_method` is a string, e.g. `"amplitude"`, `"basis"`, `"angle"`.

All encoders perform input validation (at minimum):
- For amplitude: `len(data) <= 2^n`
- For angle: `len(data) = n` (one angle per qubit)
- For basis: the index must be in range `[0, 2^n)`

Note: $n=30$ is already very large; just the output state for a single sample is on the order of $2^{30}$ complex numbers.
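The validation rules above can be sketched as a small standalone function. This is a simplified CPU-side illustration, not QDP's actual Rust-side validator; the function name and error messages are hypothetical:

```python
def validate_input(data, num_qubits: int, encoding_method: str) -> None:
    """Minimal sketch of the per-encoder input checks described above."""
    dim = 2 ** num_qubits
    if encoding_method == "amplitude":
        # Amplitude encoding: at most 2^n features (the rest are zero-padded).
        if len(data) > dim:
            raise ValueError(f"amplitude: len(data)={len(data)} exceeds 2^n={dim}")
    elif encoding_method == "angle":
        # Angle encoding: exactly one rotation angle per qubit.
        if len(data) != num_qubits:
            raise ValueError(f"angle: expected {num_qubits} angles, got {len(data)}")
    elif encoding_method == "basis":
        # Basis encoding: a single integer index into [0, 2^n).
        index = data[0]
        if not (0 <= index < dim):
            raise ValueError(f"basis: index {index} out of range [0, {dim})")
    else:
        raise ValueError(f"unknown encoding_method: {encoding_method}")
```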
Goal: represent a real-valued feature vector $x$ as quantum amplitudes:
\[\vert\psi\rangle = \sum_{i=0}^{2^{n}-1} \frac{x_i}{\|x\|_2}\,\vert i\rangle\]

Key properties in QDP:
- The input is $L_2$-normalized, so the amplitudes form a valid quantum state.
- If `len(x) < 2^n`, the remaining amplitudes are treated as zeros (zero-padding).

When to use it:

- Dense, real-valued feature vectors, since $n$ qubits can hold up to $2^n$ amplitudes.
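The normalization and zero-padding above can be sketched with NumPy. This is a CPU-only illustration of the math, not QDP's GPU implementation, and the function name is hypothetical:

```python
import numpy as np

def amplitude_encode(x, num_qubits: int) -> np.ndarray:
    """Normalize x and zero-pad it to a length-2^n state vector."""
    dim = 2 ** num_qubits
    x = np.asarray(x, dtype=np.float64)
    if len(x) > dim:
        raise ValueError(f"len(x)={len(x)} exceeds 2^n={dim}")
    state = np.zeros(dim)
    state[: len(x)] = x / np.linalg.norm(x)  # amplitudes x_i / ||x||_2
    return state.reshape(1, dim)             # always 2D: [1, 2^n]
```

For example, `amplitude_encode([3.0, 4.0], num_qubits=2)` yields amplitudes `0.6` and `0.8` followed by two zeros, and the squared amplitudes sum to 1.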
Trade-offs:
- Output size grows exponentially with `num_qubits` ($2^n$ amplitudes), so it can become memory-heavy quickly.

Goal: map an integer index $i$ into a computational basis state $\vert i\rangle$.
For $n$ qubits with $0 \le i < 2^n$:

\[\vert\psi\rangle = \vert i\rangle \quad \text{(amplitude } 1 \text{ at position } i \text{, } 0 \text{ elsewhere)}\]
Key properties in QDP:
- Input is a single integer index, passed as `[index]` (so `sample_size = 1`).
- The index must satisfy `0 <= index < 2^num_qubits`.

When to use it:

- Discrete, categorical data that is naturally represented as an integer label.
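Basis encoding reduces to a one-hot state vector; a minimal NumPy sketch (CPU-only illustration, hypothetical helper name):

```python
import numpy as np

def basis_encode(index: int, num_qubits: int) -> np.ndarray:
    """Build |i>: a one-hot state vector with amplitude 1 at position i."""
    dim = 2 ** num_qubits
    if not (0 <= index < dim):
        raise ValueError(f"index {index} out of range [0, {dim})")
    state = np.zeros(dim)
    state[index] = 1.0
    return state.reshape(1, dim)  # [1, 2^n], matching QDP's output shape
```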
Goal: map one real angle per qubit to a product state. For angles $x_0, \ldots, x_{n-1}$ (one per qubit):
\[\vert\psi(x)\rangle = \bigotimes_{k=0}^{n-1} \bigl( \cos(x_k)\,\vert 0\rangle + \sin(x_k)\,\vert 1\rangle \bigr)\]

Each basis state $\vert i\rangle$ with bits $i_0 \ldots i_{n-1}$ therefore gets the real amplitude $\prod_{k=0}^{n-1} c_k$, where $c_k = \cos(x_k)$ if $i_k = 0$ and $c_k = \sin(x_k)$ if $i_k = 1$ (no phase; the state is real-valued in the computational basis).
Key properties in QDP:
- `sample_size` must equal `num_qubits`. Single sample: a length-$n$ vector; batch: shape `[batch_size, n]`.
- The method string `"angle"` is also supported.

When to use it:

- Low-dimensional, continuous features where one rotation angle per qubit is a natural fit.
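The tensor product above can be sketched in NumPy via repeated Kronecker products. Again a CPU-only illustration with a hypothetical function name:

```python
import numpy as np

def angle_encode(angles, num_qubits: int) -> np.ndarray:
    """Build the product state ⊗_k (cos(x_k)|0> + sin(x_k)|1>)."""
    if len(angles) != num_qubits:
        raise ValueError("angle encoding needs exactly one angle per qubit")
    state = np.array([1.0])
    for x in angles:
        # Append the single-qubit factor for angle x_k.
        state = np.kron(state, np.array([np.cos(x), np.sin(x)]))
    return state.reshape(1, -1)  # [1, 2^n]
```

With `angles = [0.0, pi/2]` the first qubit is $\vert 0\rangle$ and the second is $\vert 1\rangle$, so nearly all weight sits on basis state $\vert 01\rangle$.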
Trade-offs:

- Encodes only $n$ features into $n$ qubits, and the result is an unentangled product state.
QDP is designed to keep the encoded states on the GPU and to avoid unnecessary allocations/copies where possible.
For each encoded sample, QDP allocates a state vector of size $2^n$. Memory usage grows exponentially: at `float64` precision (16 bytes per complex amplitude), $n = 20$ costs $2^{20} \times 16\,\text{B} = 16\,\text{MiB}$ per sample, while $n = 30$ costs $16\,\text{GiB}$.
QDP performs pre-flight checks before large allocations to fail fast with an OOM-aware message (e.g., suggesting smaller num_qubits or batch size).
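Such a pre-flight check can be sketched as follows. This is a simplified CPU-side illustration with a hypothetical helper name; in practice the free-memory figure would come from the CUDA runtime:

```python
def preflight_check(num_qubits: int, batch_size: int,
                    free_bytes: int, bytes_per_amplitude: int = 16) -> None:
    """Fail fast with an OOM-aware message before allocating 2^n-sized states.

    bytes_per_amplitude=16 corresponds to complex values at float64 precision.
    """
    required = batch_size * (2 ** num_qubits) * bytes_per_amplitude
    if required > free_bytes:
        raise MemoryError(
            f"need {required} bytes for batch_size={batch_size}, "
            f"num_qubits={num_qubits}, but only {free_bytes} are free; "
            "try a smaller num_qubits or batch size"
        )
```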
For high-throughput IO → GPU workflows (especially streaming from Parquet), QDP uses the techniques described below.
Streaming Parquet encoding is implemented as a producer/consumer pipeline, so the next record batch can be read while previously read batches are being encoded.
In the current implementation, streaming Parquet supports:
- `"amplitude"`
- `"angle"`
- `"basis"`

For large workloads, QDP uses multiple CUDA streams and CUDA events to overlap host → device copies with encoding-kernel execution.
This reduces time spent waiting on PCIe transfers and can improve throughput substantially for large batches.
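The producer/consumer overlap can be illustrated CPU-side with two threads and a bounded queue. This is a conceptual sketch only; in QDP the consumer side is the GPU encode pipeline, and the names here are hypothetical:

```python
import queue
import threading

def run_pipeline(batches, encode, max_in_flight: int = 2):
    """Overlap 'IO' (producer) with 'encode' (consumer) via a bounded queue."""
    q = queue.Queue(maxsize=max_in_flight)  # bounds the memory held in flight
    results = []

    def producer():
        for batch in batches:  # stands in for reading Parquet record batches
            q.put(batch)
        q.put(None)            # sentinel: no more data

    def consumer():
        while True:
            batch = q.get()
            if batch is None:
                break
            results.append(encode(batch))  # stands in for the GPU encode step

    t_prod = threading.Thread(target=producer)
    t_cons = threading.Thread(target=consumer)
    t_prod.start(); t_cons.start()
    t_prod.join(); t_cons.join()
    return results
```

The bounded queue is what gives back-pressure: the reader blocks once `max_in_flight` batches are waiting, instead of buffering the whole file.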
QDP exposes results using the DLPack protocol, which allows frameworks like PyTorch to consume GPU memory without copying.
Conceptually:
- QDP wraps the GPU buffer in a `DLManagedTensor` and exposes it through `__dlpack__`.
- A consumer such as PyTorch calls `torch.from_dlpack(qtensor)` and takes ownership via DLPack’s deleter.

Important details:
- Keep `num_qubits` realistic: output size is $2^n$ and becomes the dominant cost quickly.

If you need to understand where time is spent (copy vs compute), QDP supports NVTX-based profiling. See Observability.