[python] Ingest somacore classes #3307
Force-pushed from e216405 to 9b5366f
Codecov Report

@@            Coverage Diff             @@
##             main    #3307      +/-   ##
==========================================
+ Coverage   85.53%   85.81%   +0.28%
==========================================
  Files          54       57       +3
  Lines        5703     6169     +466
==========================================
+ Hits         4878     5294     +416
- Misses        825      875      +50
Looks like straightforward code motion (as intended!) -- looks good to me
Holding off on approval until we get the somacore PR merged & a release tagged
Force-pushed from 45c1bb2 to 8c6ff84
        self,
        index_factory=index_factory,
    )
    self._index_factory = index_factory
We could also remove all of this factory support for indexers. It was added to break the circular dependency when the C++ indexer was added to tiledbsoma. At this point, there is no reason the query implementation can't just use tiledbsoma.IntIndexer directly.
I looked into this a bit but couldn't immediately figure it out; suggest we leave it for follow-on work.
Works for me. I need to do a bunch of work in this code, so I can take it on.
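The trade-off discussed in this thread can be sketched as follows. This is a minimal illustration, not the code in _query.py: DictIndexer is a hypothetical stand-in for an indexer such as tiledbsoma.IntIndexer, and QueryWithFactory and QueryDirect are invented names. The stand-in returns -1 for unknown keys, following the pandas get_indexer convention.

```python
from typing import Callable, Dict, List, Sequence


class DictIndexer:
    """Hypothetical stand-in for an indexer such as tiledbsoma.IntIndexer.

    Maps each key to its position; unknown keys map to -1.
    """

    def __init__(self, keys: Sequence[int]) -> None:
        self._positions: Dict[int, int] = {k: i for i, k in enumerate(keys)}

    def get_indexer(self, targets: Sequence[int]) -> List[int]:
        return [self._positions.get(t, -1) for t in targets]


# Factory style: the query never names the indexer class, so the module that
# defines the query does not have to import the module that defines the indexer.
IndexFactory = Callable[[Sequence[int]], DictIndexer]


class QueryWithFactory:
    def __init__(self, joinids: Sequence[int], index_factory: IndexFactory) -> None:
        self._joinids = joinids
        self._index_factory = index_factory

    def positions(self, targets: Sequence[int]) -> List[int]:
        return self._index_factory(self._joinids).get_indexer(targets)


# Direct style: once both classes live in the same package, the query can
# construct the indexer itself and the extra constructor parameter goes away.
class QueryDirect:
    def __init__(self, joinids: Sequence[int]) -> None:
        self._joinids = joinids

    def positions(self, targets: Sequence[int]) -> List[int]:
        return DictIndexer(self._joinids).get_indexer(targets)


joinids = [10, 20, 30]
via_factory = QueryWithFactory(joinids, index_factory=DictIndexer).positions([30, 10, 99])
direct = QueryDirect(joinids).positions([30, 10, 99])
```

Both styles return the same positions; the direct style simply drops the index_factory argument and the attribute that stores it.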
This PR looks good to me, with one optional item I noted in comments (clean-up opportunity in query.py). I think we should do that cleanup, but I'm fine if it arrives in a later PR.
apis/python/src/tiledbsoma/_query.py (outdated)
        index_factory=index_factory,
    )
    self._index_factory = index_factory
    self._threadpool_: Optional[ThreadPoolExecutor] = None
We should also remove this class-specific threadpool, and use Experiment.context.threadpool (the common threadpool used elsewhere). The implementation in somacore predated the existence of the SOMA context threadpool and at this point is obsolete/wasteful.
As an aside, I can do the threadpool cleanup when I add the partitioned reader -- if that is helpful.
Yes, I agree that the tests are out of date. There is code in the repo that assumes threadpool must not be None.
We should update the tests!
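The threadpool cleanup proposed in this thread might look roughly like the sketch below. It is an illustration only: Context, QueryOwnPool, and QuerySharedPool are hypothetical names, and Context stands in for whatever object exposes the shared pool (the thread refers to it as Experiment.context.threadpool).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence


class Context:
    """Hypothetical stand-in for a SOMA context that owns one shared pool."""

    def __init__(self, max_workers: int = 4) -> None:
        self.threadpool = ThreadPoolExecutor(max_workers=max_workers)


def _square(value: int) -> int:
    return value * value


class QueryOwnPool:
    """Older pattern: each query lazily creates and owns a private executor."""

    def __init__(self) -> None:
        self._threadpool_: Optional[ThreadPoolExecutor] = None

    @property
    def _threadpool(self) -> ThreadPoolExecutor:
        if self._threadpool_ is None:
            self._threadpool_ = ThreadPoolExecutor()
        return self._threadpool_

    def squares(self, values: Sequence[int]) -> List[int]:
        return list(self._threadpool.map(_square, values))

    def close(self) -> None:
        # The query must remember to shut down the pool it created.
        if self._threadpool_ is not None:
            self._threadpool_.shutdown()
            self._threadpool_ = None


class QuerySharedPool:
    """Proposed pattern: borrow the pool that the context already owns."""

    def __init__(self, context: Context) -> None:
        self._context = context

    def squares(self, values: Sequence[int]) -> List[int]:
        return list(self._context.threadpool.map(_square, values))


own = QueryOwnPool()
own_result = own.squares([1, 2, 3])
own.close()

ctx = Context()
q1, q2 = QuerySharedPool(ctx), QuerySharedPool(ctx)
shared_result = q1.squares([1, 2, 3]) + q2.squares([4])
same_pool = q1._context.threadpool is q2._context.threadpool
ctx.threadpool.shutdown()
```

With the shared pattern, any number of queries reuse one executor and none of them needs its own shutdown logic; the pool's lifetime is tied to the context instead.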
Accompanying SOMA "core" changes: single-cell-data/SOMA#244
It can be awkward having some implementations in that repo:
- dask kwarg to ExperimentAxisQuery.to_anndata; EAQ impl has lived in the core repo, to date, but arguably should not

Changes:
Move files and classes from single-cell-data/SOMA to this repo:
- _query.py:
  - Axis
  - AxisIndexer
  - AxisQueryResult
  - ExperimentAxisQuery
  - MatrixAxisQuery
  - JoinIDCache
- _fast_csr.py
- _eager_iter.py
See also #3345 -- this isn't a performance PR per se but it will enable an ExperimentAxisQuery performance-improvement follow-up to #3328

[sc-59595]