-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Working compilation #2
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR should correctly build a wheel with Catalyst support included
Would you mind expanding on what is required to support catalyst?
requirements.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we pin a version for scikit-build
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, why? A breaking version change won't necessarily break this. If a breaking change does end backwards compatibility, it's likely 20 minutes of work to update and pin a version at that time.
@natestemen The lines of code to wrap a back end for Catalyst are relatively minimal, though there's at least a The C++ interface is found by PennyLane and its users via a method in the Python back end class for the same "device" (which is a software simulator, in this case). Basically, Catalyst can swap in the C++ back end in for the Python back end, with the point being to improve performance (though this explanation is likely a gross simplification of what's going on and what's intended). Besides the C++ back end, and the Python back end, there are respective device capability schema manifests, for Python and C++, which must basically be at logical parity for capability. In general, if a capability is declared on either side, Python or C++, it must be declared and implemented on both sides. TL;DNR: Python and C++ back ends, with Catalyst and PennyLane, are device wrappers with total parity between capability manifests, such that the two back ends are perfectly interchangeable for the interface each supports. (The point of this implementation exercise is to let Catalyst improve your performance in the context of PennyLane or more generally.) |
We'll fix the tests on the next PR. (We inherited them from the original |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work 🚀
U1 = { properties = [ "controllable", "invertible" ] } | ||
MultiControlledX = { properties = [ "controllable", "invertible" ] } | ||
Identity = { properties = [ "controllable", "invertible" ] } | ||
QFT = { properties = [ "invertible" ] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, that's a pretty high level primitive 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Qrack always supported it as a primitive (without the round of terminal swap gates)! I'm not 100% clear on whether you folks include the round of terminal swap gates, by definition, but I think you do, and that's included in this implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we do indeed include the swap gates 👍
https://github.com/PennyLaneAI/pennylane/blob/5b192e62267d7bb39c629b52ff930006279719d5/pennylane/templates/subroutines/qft.py#L187
"CNOT", | ||
"C(CNOT)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious whether these native controlled gates are a new feature of the Python device with this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are. I didn't even realize these would be correctly interpreted, this way, but Catalyst complained about the mismatch between C++ and Python back end operation manifests, without these.
auto AllocateQubit() -> QubitIdType override { | ||
return qsim->Allocate(1U); | ||
} | ||
auto AllocateQubits(size_t num_qubits) -> std::vector<QubitIdType> override { | ||
std::vector<QubitIdType> ids(num_qubits); | ||
for (size_t i = 0U; i < num_qubits; ++i) { | ||
ids[i] = qsim->Allocate(1U); | ||
} | ||
return ids; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dynamic qubit allocation is something that we planned for early but unfortunately never got to test because PennyLane doesn't support it.
Do you generally make use of that feature?
Small note: I would expect allocating qubits one by one to be slower than a single resize on a simulator device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We offer first class support for dynamic qubit allocation and deallocation! (Actually, at least Microsoft Q# supports or requires this feature in simulators, as well.) This is made possible largely by Qrack's historical focus on "Schmidt decomposition" as a means of optimization. It's worth noting specifically, the Release()
(or "Decompose()
" and "Dispose()
" methods) do not check that their target qubits are separable, first, which could lead to denormalization of the state and undefined behavior, but this is very easily avoided by simply measuring all directly targeted qubits before releasing them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it's worth clarifying something about your note: I do not expect allocating qubits one-by-one to be significantly slower than allocating them all-at-once, with Qrack's default Schmidt decomposition optimizations!
When Qrack is Schmidt decomposing (by default), if an n
-qubit simulator is allocated in default reset state, each qubit is actually separately allocated and initialized as a single two-state system, to begin with, and it's entangling gates that lead to the necessity of Kronecker products. So, whether allocated all-at-once or one-by-one, Qrack is allocating these one-by-one as two-state systems, either way!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh interesting, glad to learn something new :)
// Measure to prevent denormalization | ||
qsim->M(id); | ||
// Deallocate | ||
qsim->Dispose(id, 1U); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note that this behaviour is divergent from our device, which doesn't allow deallocation of an entangled qubit. In your case the deallocation would proceed while also randomly projecting the remainder of the system. I don't think either is necessarily better but the difference could lead to surprises down the road 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the only randomness is in the measurement before deallocation. Once measurement occurs, if there are no further entangling gates, the round trip of deallocation and Kronecker product, back to original state post-measurement, would be 100% idempotent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, what I mean is that the state post-deallocation (including the measurement) is random, is it not? That is you have to make a choice about which state-vector half you continue computing with, and that choice is made randomly with the measurement?
This PR should correctly build a wheel with Catalyst support included (with help from
scikit-build
). All supported API features should be seen as at parity between the Python and C++ back ends, according to Catalyst.Still to do: we need tests, we need to debug, and we need to complete the implementation of the supported observables in the manifest.