Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] Review TileDB-SOMA for spec compliance #834

Closed
3 tasks done
atolopko-czi opened this issue Jan 30, 2023 · 9 comments · Fixed by single-cell-data/SOMA#145
Closed
3 tasks done

[python] Review TileDB-SOMA for spec compliance #834

atolopko-czi opened this issue Jan 30, 2023 · 9 comments · Fixed by single-cell-data/SOMA#145
Assignees

Comments

@atolopko-czi
Copy link
Member

atolopko-czi commented Jan 30, 2023

Once TileDB-SOMA RC0 is code-complete, review the TileDB-SOMA public API to ensure compliance with the SOMA abstract specification w.r.t class names, class properties, and method names , and method signatures.

The review team is:
@atolopko-czi , @mlin , @thetorpedodog , @aaronwolen

  • Any discrepancies found can simply be noted on this issue as comments. @atolopko-czi will coalesce into one or more issues for any fixes that may be needed.
  • Review discrepancies with the larger team if it is not clear whether it is the implementation or the spec that should be corrected.
  • Python-perspective review & impl fixes, as needed (@atolopko-czi)
    - [ ] R-perspective review & impl fixes, as needed (@aaronwolen)
@atolopko-czi
Copy link
Member Author

@johnkerl @@thetorpedodog We still have many "⚠️" notes in the spec. Do we need to address these before performing a spec compliance check or proceed as is?

  • "Issue: are there parts of the Arrow type system that we wish to explicitly exclude from SOMA?"
  • "Issue: it would be a good idea to factor SOMAExperiment and SOMAMeasurement into separate sections."
  • "[SOMADataFrame] To be further specified: all methods need specification"
  • "[SOMADenseNDArray] To be further specified: this is incomplete."
  • "[Operation: read()] Issue: support for other formats, such as Arrow Table, is under discussion."
  • "[General Utilities] To be further specified"
  • "[Value Filters] To be further specified"
  • "[Other Issues (open issues with this doc)] Issues to be resolved: ..."

@atolopko-czi
Copy link
Member Author

Tracking review progress and action items here.

@johnkerl
Copy link
Member

johnkerl commented Mar 2, 2023

cc @bkmartinjr

@atolopko-czi
Copy link
Member Author

atolopko-czi commented Mar 3, 2023

@thetorpedodog @mlin Found and fixed some discrepancies in spec in single-cell-data/SOMA#145, but looking for feedback on how to handle the following additional discrepancies:

  • Collection.add_new_collection: cls param: cls (impl) vs type (spec) [use kind in both places, consistently throughout]
  • Should spec mention closed and mode properties? (e.g. it does mention context property) [ignore]
  • DataFrame.write: values param only admits pa.Table (impl) vs spec admits pa.Table or pa.RecordBatch [issue to add to impl; also for SparseNDArray]

@bkmartinjr
Copy link
Member

Regarding the above open issues - my general attitude is we should move away from inline "issue" tracking and start using GH issues (to capture discussion, etc).

On the specific items listed above:

"Issue: are there parts of the Arrow type system that we wish to explicitly exclude from SOMA?"

I believe this has been resolved by the decision to have create methods raise an error when an unsupported type is requested (i.e., where the underlying storage engine is unable or unwilling to support a given schema).

Propose we remove this issue from the document.

"Issue: it would be a good idea to factor SOMAExperiment and SOMAMeasurement into separate sections."

As this is editorial, and non-critical, it could be removed. At most, it belongs in a low-priority issue on the repo.

"[SOMADataFrame] To be further specified: all methods need specification"

I believe all methods do have at lease a cursory specification, so this can be removed.

"[SOMADenseNDArray] To be further specified: this is incomplete."

Ditto

"[Operation: read()] Issue: support for other formats, such as Arrow Table, is under discussion."

These are largely resolved, so I would remove and any new discussion should be directed to a GH issue.

"[General Utilities] To be further specified"

remove

"[Value Filters] To be further specified"

remove

"[Other Issues (open issues with this doc)] Issues to be resolved: ..."

item (1) seems resolved.

items (2) and (3) could be moved to a GH issue and this section could be removed.

@atolopko-czi
Copy link
Member Author

atolopko-czi commented Mar 3, 2023

Regarding

@johnkerl @@thetorpedodog We still have many "⚠️" notes in the spec. Do we need to address these before performing a spec compliance check or proceed as is?

and

Regarding the above open issues - my general attitude is we should move away from inline "issue" tracking and start using GH issues (to capture discussion, etc).

Removed 'em all, creating follow-up github issues for 2 specifically:

@maniarathi
Copy link
Contributor

Couple more discrepancies I noted -- the SOMACollection spec says that has() and iterator are available operations, but this is currently not true. I am not sure whether there is a plan to add these in the future or not, but right now it will output a has no attribute error if those methods are called on a Collection object.

@atolopko-czi
Copy link
Member Author

Couple more discrepancies I noted -- the SOMACollection spec says that has() and iterator are available operations, but this is currently not true. I am not sure whether there is a plan to add these in the future or not, but right now it will output a has no attribute error if those methods are called on a Collection object.

These are handled by the Collection's Python dunder methods, __iter__() and __contains__(), which work:

In [13]: [k for k in c]
Out[13]: ['census_info', 'census_data']

In [14]: 'census_info' in c
Out[14]: True

So while Collection does not provide an explicit has() or iterator() method, I think we're covered by these Python idiomatic patterns.

@atolopko-czi
Copy link
Member Author

Split off the R-related portion of the issue to #1062

@johnkerl johnkerl changed the title Review TileDB-SOMA for spec compliance [python] Review TileDB-SOMA for spec compliance Mar 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants