Memory profiling of qml.state()
#771
Comments
Benchmarking
I have performed a benchmark with the following setup:
Across the range of
Profiling
I chose to use 2 profilers:
Here I used a larger num_wires=27 to help identify the memory allocations. We first look at the profiling from memray (
Memory allocation
From the call stack we can see three distinct phases of memory allocation.
During the application, the state-vector in the C++ memory buffer is:
The Python bindings to these C++ state-vector manipulations are from
Each copy of the state vector is about 2GB (2^27 amplitudes × 16 bytes per 128-bit complex amplitude = 2^31 bytes), and the 3 copies created above explain the peak usage of ~6.51GB. Comparing the memory footprint to the pure Python
Runtime cost
Back to
This confirms that:
Bottlenecks
The latter two points above might be improved. Copying the C++ state-vector into a Python NumPy array may not be necessary. For this circuit, no further gates are applied before the state is returned, so no operations take place before the copy. If an explicit copy is not needed in Python, we can simply expose the C++ array by creating a view in Python, without copying it, as in https://github.com/PennyLaneAI/pennylane-lightning/blob/master/pennylane_lightning/core/src/simulators/lightning_qubit/bindings/LQubitBindings.hpp#L206 . It might be beneficial to have both a copy method and a view method to improve general memory management. In terms of the
Possible improvement
By returning |
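The copy-versus-view distinction raised under Bottlenecks can be illustrated with a minimal sketch that uses only the Python standard library. It assumes a small `array` object standing in for the buffer owned by the C++ side; it is not Lightning's actual binding code, and the names `state`, `view` and `copy` are hypothetical.

```python
from array import array

# Stand-in for a state buffer owned by native code (hypothetical; this is
# not Lightning's actual buffer or binding).
state = array("d", [0.0] * 8)
state[0] = 1.0

view = memoryview(state)   # wraps the same memory; nothing is duplicated
copy = array("d", state)   # allocates a second buffer of equal size

state[1] = 0.5             # mutate the original buffer afterwards

view_sees_update = view[1] == 0.5   # the view reads the owner's memory
copy_sees_update = copy[1] == 0.5   # the copy is an independent snapshot

print("view tracks original:", view_sees_update)
print("copy tracks original:", copy_sees_update)
```

A view avoids the extra allocation, but it is only valid while the owner keeps the buffer alive and unchanged, which is one reason offering both a copy and a view method can be useful.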
Important Note
Context
The lightning.qubit device in PennyLane-Lightning has optimal support for many quantum gates and measurement processes at both the Python and C++ layers. The LightningMeasurements class at lightning_qubit/_measurements.py implements the Python interface for performant C++ measurement routines in MeasurementsLQubit. Among PennyLane's measurement processes, qml.state, which returns the underlying quantum state in the computational basis, is backed by the public methods of StateVectorLQubitManaged.hpp.
The Python <> C++ memory management plays an important role in the performance of qml.state, although returning the underlying state-vector is not computationally intensive. Some preliminary results showed poor scaling of qml.state in lightning.qubit compared to default.qubit, the default pure-Python PennyLane device.
Requirements
lightning.qubit vs default.qubit. In this code sample, device_name can be either lightning.qubit or default.qubit and 5 < num_wires < 25. Define some thresholds where default.qubit is faster than lightning.qubit.
qml.state?
Please provide your answers as follow-up comments in this GitHub issue. You may use GitHub Gist for larger files.
Feel free to ask any questions or raise any concerns regarding the issue. We'll be happy to discuss with you!
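The code sample referred to in the requirements is not reproduced on this page. As a rough illustration of the kind of comparison being asked for, here is a hypothetical sketch, not the issue's own sample. The Hadamard-per-wire workload, the helper names `best_time`, `make_state_circuit` and `compare_devices`, and the repeat count are all assumptions; only `qml.device`, `qml.qnode`, `qml.Hadamard` and `qml.state` are PennyLane calls.

```python
import time


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time (seconds) over `repeats` calls to fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def make_state_circuit(device_name, num_wires):
    """Build a QNode that returns qml.state() on the requested device."""
    # Imported lazily so the timing helper above works without PennyLane.
    import pennylane as qml

    dev = qml.device(device_name, wires=num_wires)

    @qml.qnode(dev)
    def circuit():
        # Assumed workload: one Hadamard per wire, then return the full state.
        for wire in range(num_wires):
            qml.Hadamard(wires=wire)
        return qml.state()

    return circuit


def compare_devices(wire_counts=range(6, 25), repeats=5):
    """Time both devices for each wire count in 5 < num_wires < 25."""
    rows = []
    for n in wire_counts:
        t_lightning = best_time(make_state_circuit("lightning.qubit", n), repeats)
        t_default = best_time(make_state_circuit("default.qubit", n), repeats)
        rows.append((n, t_lightning, t_default))
    return rows
```

Calling `compare_devices()` with PennyLane and PennyLane-Lightning installed returns one `(num_wires, lightning_seconds, default_seconds)` tuple per wire count; rows where the third value is smaller than the second would mark the wire counts at which default.qubit is faster.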