Sparse matrices as Imgs #331

Open: minnerbe wants to merge 12 commits into base: master


@minnerbe commented Jun 2, 2023

I propose adding CSR/CSC sparse arrays wrapped as Img.

Motivation

The need to support traditional sparse arrays arose while working on a spatial transcriptomics visualization library using ImgLib2. This library needs to read and write files stored in the (Python-based) AnnData format, which stores certain data as scipy CSR/CSC arrays on disk. While an on-the-fly conversion to a dense array would be possible in most cases, it is more convenient to have the sparse data represented as proper ImgLib2 Imgs.
Also, considering the popularity of CSR/CSC arrays in scientific computing in general, it may be of independent interest to represent and visualize them in ImgLib2.

Changes

I introduced the following without changing existing code:

  • An abstract SparseImg class from which CSR- and CSC-style Imgs are derived. These essentially wrap three 1D ArrayImgs: the data, the indices within each contiguous slice of the matrix, and the pointers to where each slice starts in the data / indices arrays. This abstract class also provides some static utility methods.
  • A SparseRandomAccess that, given an index, finds the corresponding element in O(log(n)), where n is the size of the contiguous dimension.
  • A SparseLocalizingCursor that traverses the whole sparse array in linear time and (trivially) without cache misses on the underlying ArrayImgs. Since any cursor needs to track its current position internally to check whether the next element is zero, Img::cursor also returns the localizing cursor.
  • A SparseImgFactory that creates an empty CSR/CSC array.
  • A test asserting correctness of most functionality.
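
For readers less familiar with the layout, here is a minimal sketch of the three-array CSR idea using plain Java arrays instead of ArrayImgs. The class name CsrSketch and its methods are hypothetical and are not the classes in this PR; the sketch assumes column indices are sorted within each row. It shows a binary-search lookup within one row and a single linear pass that tracks the next stored entry, which is roughly what the random access and the cursor described above have to do.

```java
import java.util.Arrays;

/** Hypothetical plain-array CSR sketch; not the SparseImg class from this PR. */
public class CsrSketch {
    public final double[] data;  // non-zero values, stored row by row
    public final int[] indices;  // column index of each entry in data (sorted per row)
    public final int[] indptr;   // row r occupies data[indptr[r] .. indptr[r + 1])
    public final int numCols;

    public CsrSketch(double[] data, int[] indices, int[] indptr, int numCols) {
        this.data = data;
        this.indices = indices;
        this.indptr = indptr;
        this.numCols = numCols;
    }

    /** Random access: binary search over the column indices of one row. */
    public double get(int row, int col) {
        int pos = Arrays.binarySearch(indices, indptr[row], indptr[row + 1], col);
        return pos >= 0 ? data[pos] : 0.0; // negative result: implicit zero
    }

    /** Cursor-like traversal: visits every position once, in row-major order. */
    public double[] toDense() {
        int numRows = indptr.length - 1;
        double[] dense = new double[numRows * numCols];
        int k = 0; // index of the next stored entry that has not been visited yet
        for (int row = 0; row < numRows; row++) {
            for (int col = 0; col < numCols; col++) {
                // The current position is non-zero only if it matches entry k of this row.
                if (k < indptr[row + 1] && indices[k] == col) {
                    dense[row * numCols + col] = data[k++];
                }
            }
        }
        return dense;
    }

    public static void main(String[] args) {
        // 3x4 matrix:
        // [ 0 5 0 0 ]
        // [ 0 0 0 0 ]
        // [ 7 0 0 9 ]
        CsrSketch m = new CsrSketch(
            new double[] {5, 7, 9}, new int[] {1, 0, 3}, new int[] {0, 1, 1, 3}, 4);
        System.out.println(m.get(2, 3));
        System.out.println(Arrays.toString(m.toDense()));
    }
}
```

A CSC layout would be the same structure with the roles of rows and columns swapped.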

Possible problems

  • CSR/CSC arrays are inherently limited to 2D.
  • Since adding new non-zero elements requires O(number of current non-zero elements) steps, this operation is usually not supported by sparse arrays. Thus, populating a new sparse matrix created by a factory is not possible, severely limiting the usefulness of SparseImgFactory.
  • The documentation is quite sparse. ;) I can add more if requested.
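
To illustrate the insertion cost mentioned above, here is a rough sketch of what setting a previously zero element would involve for a CSR layout backed by plain Java arrays. The class CsrInsertSketch is hypothetical and not part of this PR; it only shows why every stored entry after the insertion point has to be shifted and every later row pointer incremented.

```java
import java.util.Arrays;

/** Hypothetical sketch of inserting one non-zero into CSR arrays; not code from this PR. */
public class CsrInsertSketch {
    public double[] data;
    public int[] indices;
    public int[] indptr;

    public CsrInsertSketch(int numRows) {
        data = new double[0];
        indices = new int[0];
        indptr = new int[numRows + 1]; // all zeros: every row starts out empty
    }

    /** Sets (row, col) to value; a new entry shifts all later entries by one slot. */
    public void set(int row, int col, double value) {
        int pos = Arrays.binarySearch(indices, indptr[row], indptr[row + 1], col);
        if (pos >= 0) {
            data[pos] = value; // entry already stored: cheap overwrite
            return;
        }
        int at = -(pos + 1); // insertion point that keeps the row sorted
        int nnz = data.length;
        double[] newData = new double[nnz + 1];
        int[] newIndices = new int[nnz + 1];
        // Copy everything before the insertion point, then the new entry, then the rest.
        System.arraycopy(data, 0, newData, 0, at);
        System.arraycopy(indices, 0, newIndices, 0, at);
        newData[at] = value;
        newIndices[at] = col;
        System.arraycopy(data, at, newData, at + 1, nnz - at);
        System.arraycopy(indices, at, newIndices, at + 1, nnz - at);
        data = newData;
        indices = newIndices;
        // All rows after the modified one now start one slot later.
        for (int r = row + 1; r < indptr.length; r++) {
            indptr[r]++;
        }
    }
}
```

Each call that adds a new entry copies the whole data and indices arrays, so filling a matrix element by element this way costs time quadratic in the number of non-zeros.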
