cudaErrorIllegalAddress error when using exp_pauli(...) on multiple GPUs #2434

FabianLangkabel · 2024-11-28T15:14:43Z

Required prerequisites

Consult the security policy. If reporting a security vulnerability, do not report the bug using this form. Use the process described in the policy to report the issue.
Make sure you've read the documentation. Your issue may be addressed there.
Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
If possible, make a PR with a failing test to give us a starting point to work on!

Describe the bug

When a kernel calls exp_pauli(...) with a Pauli string and the Python script is started with multiple ranks/GPUs (mpirun -n 4 python ...), cudaq.observe(...) and cudaq.sample(...) calls fail with the error:

RuntimeError: cudaErrorIllegalAddress
RuntimeError: cudaErrorIllegalAddress
RuntimeError: cudaErrorIllegalAddress
terminate called after throwing an instance of 'ubackend::RuntimeError'
  what():  cudaErrorIllegalAddress

Single-qubit gates and CX gates work without problems, even if the number of qubits exceeds the memory of one GPU and several GPUs are required. The error also does not occur with only one rank/GPU (mpirun -n 1 python ...).
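
For context on the memory remark, the footprint of a full state vector can be estimated in a few lines of plain Python. This is only a back-of-the-envelope sketch: the helper name statevector_bytes is hypothetical, and it assumes 8 bytes per amplitude (single-precision complex) and an even split of the state across ranks, neither of which is stated in this report.

```python
GIB = 1024 ** 3


def statevector_bytes(qubit_count: int, bytes_per_amplitude: int = 8) -> int:
    """Bytes needed to store all 2**qubit_count amplitudes.

    bytes_per_amplitude is an assumption: 8 for single-precision complex,
    16 for double-precision complex.
    """
    return (2 ** qubit_count) * bytes_per_amplitude


# Assumed: the state is divided evenly across the MPI ranks.
for ranks in (1, 4):
    per_rank = statevector_bytes(30) / ranks
    print(f"{ranks} rank(s): {per_rank / GIB:.1f} GiB per rank")
```

Under these assumptions, a 30-qubit state takes 8.0 GiB on one rank and 2.0 GiB per rank on four.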

Steps to reproduce the bug

Create exp_pauli.py script:

import cudaq
from cudaq import spin
from time import perf_counter_ns


cudaq.set_target("nvidia", option="mgpu")
rank = cudaq.mpi.rank()


@cudaq.kernel
def kernel(qubit_count: int):
    qubits = cudaq.qvector(qubit_count)
    exp_pauli(0.1, qubits, "IIIIYXYYIIIIIIIIIIIIIIIIIIIIII")

qubit_count = 30
op = cudaq.spin.z(0)

t0 = perf_counter_ns()
result = cudaq.observe(kernel, op, qubit_count)
t = (perf_counter_ns() - t0) * 1e-9
if rank == 0:
    print(f"time: {t:.3f} sec")
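
To double-check the Pauli word passed to exp_pauli in the script, a small helper in plain Python (no CUDA-Q required) can list where the non-identity factors sit. The name pauli_support is hypothetical and not part of CUDA-Q; the sketch only inspects the string and assumes nothing about how the simulator maps characters to qubits.

```python
def pauli_support(word: str) -> dict[int, str]:
    """Map each character index holding a non-identity Pauli to its letter."""
    invalid = set(word) - set("IXYZ")
    if invalid:
        raise ValueError(f"unexpected characters in Pauli word: {sorted(invalid)}")
    return {i: p for i, p in enumerate(word) if p != "I"}


word = "IIIIYXYYIIIIIIIIIIIIIIIIIIIIII"
qubit_count = 30

# Compare the word length with the register size used in the script.
print(len(word) == qubit_count)
print(pauli_support(word))  # {4: 'Y', 5: 'X', 6: 'Y', 7: 'Y'}
```

Comparing len(word) against qubit_count is a cheap sanity check before launching a multi-GPU run.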

Start the script with several ranks/GPUs:
mpirun -n 4 python exp_pauli.py

Expected behavior

The script prints the time taken by the cudaq.observe(...) call.

Is this a regression? If it is, put the last known working version (or commit) here.

Not a regression

Environment

CUDA-Q version: 0.8 and 0.9 (both tested)
Python version: 3.11
Operating system: Linux

Suggestions

No response


1tnguyen self-assigned this Nov 28, 2024

1tnguyen added the bug Something isn't working label Nov 28, 2024

1tnguyen added this to the release 0.9.1 milestone Nov 28, 2024

1tnguyen removed this from the release 0.9.1 milestone Dec 10, 2024

1tnguyen mentioned this issue Dec 12, 2024

Enable trajectory simulation for the nvidia target #2466

Open
