Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] Ingest somacore classes #3307

Merged
merged 5 commits into from
Nov 19, 2024
Merged

[python] Ingest somacore classes #3307

merged 5 commits into from
Nov 19, 2024

Conversation

ryan-williams
Copy link
Member

@ryan-williams ryan-williams commented Nov 7, 2024

Accompanying SOMA "core" changes: single-cell-data/SOMA#244

It can be awkward having some implementations in that repo:

  • I'm experimenting with a dask kwarg to ExperimentAxisQuery.to_anndata; EAQ impl has lived in the core repo, to date, but arguably should not
  • @bkmartinjr has been working on improved "fast CSR" algorithms, which were previously implemented and used in the SOMA core repo, but arguably belong in an implementation repo like this one.

Changes:
Move files and classes from single-cell-data/SOMA to this repo:

  • _query.py:
    • Axis
    • AxisIndexer
    • AxisQueryResult
    • ExperimentAxisQuery
    • MatrixAxisQuery
    • JoinIDCache
  • _fast_csr.py
  • _eager_iter.py

See also #3345 -- this isn't a performance PR per se but it will enable an ExperimentAxisQuery performance-improvement follow-up to #3328

[sc-59595]

Copy link

codecov bot commented Nov 7, 2024

Codecov Report

Attention: Patch coverage is 87.70833% with 59 lines in your changes missing coverage. Please review.

Project coverage is 85.81%. Comparing base (8b6ab64) to head (3017ad8).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3307      +/-   ##
==========================================
+ Coverage   85.53%   85.81%   +0.28%     
==========================================
  Files          54       57       +3     
  Lines        5703     6169     +466     
==========================================
+ Hits         4878     5294     +416     
- Misses        825      875      +50     
Flag Coverage Δ
python 85.81% <87.70%> (+0.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 85.81% <87.70%> (+0.28%) ⬆️
libtiledbsoma ∅ <ø> (∅)
---- 🚨 Try these New Features:

@ryan-williams ryan-williams marked this pull request as ready for review November 8, 2024 14:00
@johnkerl johnkerl changed the title Ingest somacore classes [python] Ingest somacore classes Nov 13, 2024
Copy link
Member

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like straightforward code motion (as intended!) -- looks good to me

Holding off on approval until we get the somacore PR merged & a release tagged

self,
index_factory=index_factory,
)
self._index_factory = index_factory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also remove all of this factory support for indexers. It was added to break the circular dependency when the C++ indexer was added to tiledbsoma. At this point, there is no reason the query implementation can't just use tiledbsoma.IntIndexer directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this a bit but couldn't immediately figure it out; suggest we leave it for follow-on work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me. I need to do a bunch of work in this code, so I can take it on

Copy link
Member

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good to me, with one optional item I noted in comments (clean-up opportunity in query.py). I think we should do that cleanup, but I'm fine if it arrives in a later PR.

index_factory=index_factory,
)
self._index_factory = index_factory
self._threadpool_: Optional[ThreadPoolExecutor] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also remove this class-specific threadpool, and use Experiment.context.threadpool (the common threadpool used elsewhere). The implementation in somacore predated the existence of the SOMA context threadpool and at this point is obsolete/wasteful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, I can do the threadpool cleanup when I add the partitioned reader -- if that is helpful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just added 3017ad8 to this PR, which was my attempt to address this.

AFAICT, it is now safe to assume Experiment.context.threadpool is not None. Existing code+test implied otherwise, but I'm guessing that's out of date / no longer relevant. Can someone confirm?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that the tests are out of date. There is code in the repo that assumes threadpool must not be None.

We should update the tests!

@ryan-williams ryan-williams merged commit cde6da1 into main Nov 19, 2024
31 checks passed
@ryan-williams ryan-williams deleted the rw/sc branch November 19, 2024 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants