Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split PySparseObservable off SparseObservable #13595

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Cryoris
Copy link
Contributor

@Cryoris Cryoris commented Dec 23, 2024

Summary

Closes #13594 to prepare for SparseObservable's C API. This change has been tested with our basic C API for SparseObservable, which will come in a separate PR to keep the review load in balance 🙂

Details and comments

This PR splits the sparse observable class into a Rust-only SparseObservable struct and a PySparseObservable, which serves as Python interface. As suggested in #13391, the Python interface keeps an Arc to a read-write-locked SparseObservable. The API Change label is only due to some minuscule change in an error message, the Python interface remains unchanged.

The implementation is based on

#[pyclass(name = "SparseObservable", ...)]  // exposed as qiskit.quantum_info.SparseObservable, as before
struct PySparseObservable {
    // This class keeps a pointer to a pure Rust-SparseTerm and serves as interface from Python.
    inner: Arc<RwLock<SparseObservable>>,
}

and methods on PySparseObservable first acquire the read- or write-lock to perform actions on the inner data. For example, implementing transpose becomes

    fn transpose(&self) -> PyResult<Self> {
        // acquire the read lock, mapping the PoisonError into our own error that can be cast to a PyErr
        let inner = self.inner.read().map_err(|_| InnerReadError)?;

        // perform the action
        let result = inner.transpose();  
        
        // return a new Arc<RwLock> (if we did an inplace operation, we would just return nothing)
        Ok(Self { inner: Arc::new(RwLock::new(result)) })
    }

Some notes/questions:

  • For SparseTerm we analogously split off PySparseTerm, since it can be returned to Python. The view/mutable view versions are not returned to Python and don't need a specific interface.
  • We couldn't implement IntoPy to PoisonError (coming from RwLock::read/write), so as solution we introduced custom InnerReadErrors and InnerWriteErrors.
  • We moved some methods from the pymethods into the core Rust object and restricted direct access to the inner data, in favor of using public getters/methods.
  • The SparseObservable docstring is moved to the Python interface for now, though we might want to add a bit more Rust-specific info.

@Cryoris Cryoris added Changelog: API Change Include in the "Changed" section of the changelog Rust This PR or issue is related to Rust code in the repository labels Dec 23, 2024
@Cryoris Cryoris added this to the 2.0.0 milestone Dec 23, 2024
@Cryoris Cryoris requested a review from a team as a code owner December 23, 2024 10:50
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@coveralls
Copy link

Pull Request Test Coverage Report for Build 12465601613

Details

  • 1192 of 1245 (95.74%) changed or added relevant lines in 4 files are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.02%) to 88.974%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/accelerate/src/sparse_observable.rs 149 151 98.68%
crates/accelerate/src/py_sparse_observable.rs 1041 1092 95.33%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/lex.rs 1 92.73%
crates/accelerate/src/sparse_observable.rs 3 93.37%
Totals Coverage Status
Change from base Build 12420636821: 0.02%
Covered Lines: 79695
Relevant Lines: 89571

💛 - Coveralls

Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this.

This is just a quick high-level overview - I'll look more in detail in the new year, especially since I'll have to use a lot more local tools to do a good comparison - with the file move and changes to the code, it's hard to see what's gone on here.

Top level questions:

  • Why split py_sparse_observable into a separate flat file? I'd have expected any of:
    • keep both in the same file
    • make a sparse_observable module to put them in
    • make a separate crate that contains only the C components
      with a rough preference to just keeping everything in the same file for now. This form to me has meant that a lot of logically private functions have had to become pub(crate), and now there's more places to look to understand the code.
  • For everything that's become pub(crate): in some cases, I think pub(crate) just indicates that a function is defined in the wrong file. In many others, since this PR is looking to a future when SparseObservable is consumable by non-Qiskit crates directly from Rust, I suspect that anything that became pub(crate) should be either private or fully pub. If it's useful for the Python wrapper, feels highly likely it ought to be a proper public interface.

Comment on lines +154 to +165
/// A single term from a complete :class:`SparseObservable`.
///
/// These are typically created by indexing into or iterating through a :class:`SparseObservable`.
#[pyclass(name = "Term", frozen, module = "qiskit.quantum_info")]
#[derive(Clone, Debug)]
struct PySparseTerm {
// This class keeps a pointer to a pure Rust-SparseTerm and serves as interface from Python.
// Note that we can use the standard clone, since the PySparseTerm is immutable from Python
// space, hence we can just clone the pointer. Otherwise we'd have to impl Clone manually,
// like for the PySparseObservable.
inner: Arc<RwLock<SparseTerm>>,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does PySparseTerm need to include an Arc<RwLock<SparseTerm>>, rather than just being a wrapper around SparseTerm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean because it's immutable anyways?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't imagine a situation where there'd be shared ownership of an underlying SparseTerm - the only reason the Arc<RwLock<T>> is useful for SparseObservable is because of the shallow-copy qargs-after-__call__ behaviour of the QI classes (so strictly it's not even needed for this PR).

Comment on lines +981 to +984
let inner = SparseObservable::zero(num_qubits);
Self {
inner: Arc::new(RwLock::new(inner)),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In normal situations, this kind of thing is what the IntoPy/IntoPyObject trait is for. I think we can do that in a coherent manner, even with a clean crate structure, since IntoPyObject takes a type argument that will be Py<PySparseObservable>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, this is actually one of the to-dos we didn't try in the end, that would be much nicer 🙂

Comment on lines +1764 to +1768
// get the number of qubits in the layout and map the layout to Vec<u32> to
// call SparseObservable.apply_layout
let (num_qubits, layout): (u32, Option<Vec<u32>>) = if layout.is_none() {
// if the layout is none,
(num_qubits.unwrap_or(inner.num_qubits()), None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the comments left around the file seem like they're old helper reminders? Might be good to run through and clean them up.

Comment on lines +1902 to +1904
// we acquire the read lock once here and use the internal methods
// to avoid checking and acquiring the lock in every method call
let inner = self.inner.read().map_err(|_| InnerReadError)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this comment is helpful to you, feel free to leave it, but this is really the base expectation for how to handle the RwLock - it would be weird to keep locking and unlocking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove it, it was mainly helpful for my own sanity starting to use these objects 😜

Comment on lines +800 to +810

impl Clone for PySparseObservable {
// we need to implement clone specifically to ensure the underlying inner data
// is cloned, not the reference to it
fn clone(&self) -> Self {
let inner = self.inner.read().unwrap();
Self {
inner: Arc::new(RwLock::new(inner.clone())),
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this implementation? In what situations are you thinking that this will be called from Rust space, and how sure are you that these semantics are always what would be expected? I'm pretty uncomfortable with a Clone implementation that can panic.

Since this is object intended for consumption only from Python space, perhaps it would be cleaner to override __copy__ and __deepcopy__, if copy.copy and copy.deepcopy aren't doing what you expect (though I think they probably should already work).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that we didn't think of putting this into __copy__ and we wanted to ensure we clone the object, not the ref 🙂 You're right, it's not really intended to be called from Rust space

(though I think they probably should already work).

I.e. that Python's native __copy__ would already copy the inner data instead of the reference to it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy.copy and copy.deepcopy don't need __copy__ etc to be defined - they'll use the pickle-protocol methods (__reduce__ or __getstate__/__statestate__) if they're not, and that causes SparseObservable to integrate with them already. I'm relatively sure I wrote a test.

@@ -146,7 +126,7 @@ impl BitTerm {
/// returning `Ok(None)` for it. All other letters outside the alphabet return the complete
/// error condition.
#[inline]
fn try_from_u8(value: u8) -> Result<Option<Self>, BitTermFromU8Error> {
pub(crate) fn try_from_u8(value: u8) -> Result<Option<Self>, BitTermFromU8Error> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this needs to be pub beyond this module, I don't see much reason to make it pub(crate) and not just pub, looking to a future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to make it pub with some of the C functionality, but I wanted to be as cautious as possible -- we can just make it pub pending the structural question you raised above 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I've seen very few reasons for pub(crate) so far - most of them have either been that the function is misplaced, an API boundary has got very muddled, or they should just be pub. We've had quite a lot of churn turning pub(crate) into pub.

Comment on lines +204 to +205
#[error("cannot shrink the qubit count in an observable from {current} to {target}")]
NotEnoughQubits { current: usize, target: usize },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just a straight-up code change? Might be better to do that in a separate PR, if you think it's necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I think we added this because it comes up in apply_layout, which was previously only a Python method which directly triggered a PyErr. Now that apply_layout is moved into the Rust methods, we needed a new error -- so it's linked to this PR due to the move of the apply_layout logic 🙂 (which could ofc also be separated out if you prefer)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sure, if it's actually an important part of the split, it's fine to do here. (Part of the difficulty in reviewing this kind of PR!)

/// costs. If this does not fit your use cases, you can either periodically call :meth:`simplify`,
/// or discuss further APIs with us for better building of observables.
#[pyclass(module = "qiskit.quantum_info", sequence)]
/// See PySparseObservable in crates/accelerate/src/py_sparse_observable for detailed docs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibley

See [PySparseObservable] for detailed docs.

might be clearer that it's just implied you should go find that struct, and less likely for the file to get out of sync.

Comment on lines +385 to +423
#[inline]
pub fn coeffs(&self) -> &Vec<Complex64> {
&self.coeffs
}

#[inline]
pub fn coeffs_mut(&mut self) -> &mut Vec<Complex64> {
&mut self.coeffs
}

#[inline]
pub fn indices(&self) -> &Vec<u32> {
&self.indices
}

#[inline]
pub fn indices_mut(&mut self) -> &mut Vec<u32> {
&mut self.indices
}

#[inline]
pub fn boundaries(&self) -> &Vec<usize> {
&self.boundaries
}

#[inline]
pub fn boundaries_mut(&mut self) -> &mut Vec<usize> {
&mut self.boundaries
}

#[inline]
pub fn bit_terms(&self) -> &Vec<BitTerm> {
&self.bit_terms
}

#[inline]
pub fn bit_terms_mut(&mut self) -> &mut Vec<BitTerm> {
&mut self.bit_terms
}
Copy link
Member

@jakelishman jakelishman Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning &Vec<T> is almost never correct; you almost invariably mean &[T] (and you do, in every case here). &mut Vec can be correct in some circumstances, but in this case is very not - all these methods allow the Vec-resizing methods to be called, which easily makes the data incoherent. For ones we definitely need to expose, it should be &mut [T].

boundaries_mut and indices_mut might very well be considered unsafe, since you can easily break data coherence by writing bad values to them. I can't remember what the old code did around this, but if they are actually needed and used, then they may well want to be unsafe in Rust interfaces.

Comment on lines +1 to +8
---
upgrade:
- |
Binary arithmetic operations on :class:`.SparseObservable` have a slightly updated
error message, if the two observables in the operation have incompatible numberss of qubits.
Previously, the resulting ``ValueError`` contained the message
``incompatible numbers of qubits: A and B``, which now is
``mismatched numbers of qubits: A, B``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good thinking, but I don't think we really need to consider this an interface break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah even better -- I was just being double safe 😉

@Cryoris
Copy link
Contributor Author

Cryoris commented Dec 23, 2024

Thanks for the comments! I think they all make sense but I'll read them more carefully next year as well 🙂 Regarding

Why split py_sparse_observable into a separate flat file? I'd have expected any of:

  • keep both in the same file
  • make a sparse_observable module to put them in
  • make a separate crate that contains only the C components

with a rough preference to just keeping everything in the same file for now. This form to me has meant that a lot of logically private functions have had to become pub(crate), and now there's more places to look to understand the code.

To me, having a separate crate (I assume into py_ext?) sounds the cleanest, but I didn't want just move it w/o discussion, so I moved it into a separate file to facilitate that process 😛 I'm fine with keeping it in the same file as well for now too, though.

@jakelishman
Copy link
Member

btw should Max be a co-author, or is this all you so far?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: API Change Include in the "Changed" section of the changelog Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Separate SparseObservable into a Rust-only core and Python interface
4 participants