Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Can not handle an h5 file with 5D data structure #1111

Open
knowblesse opened this issue Apr 1, 2024 · 3 comments
Open

BUG: Can not handle an h5 file with 5D data structure #1111

knowblesse opened this issue Apr 1, 2024 · 3 comments

Comments

@knowblesse
Copy link

Describe the issue:

When suite2p try to open an h5 file with 5D data structure, it can not iterate through frames with this error.
TypeError: Indexing arrays must have integer dtypes
The cause of the error is the fact that arguments of np.arange receives int / int when hdims != 3.

Reproduce the code example:

# needs an h5 file with 5D array structure.

Error message:

Traceback (most recent call last):
  File "/home/jihoon_jeong/miniforge3/envs/2p_prep/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3550, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-5c404d568d0f>", line 1, in <module>
    f[key][irange, ...]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/jihoon_jeong/miniforge3/envs/2p_prep/lib/python3.9/site-packages/h5py/_hl/dataset.py", line 831, in __getitem__
    selection = sel.select(self.shape, args, dataset=self)
  File "/home/jihoon_jeong/miniforge3/envs/2p_prep/lib/python3.9/site-packages/h5py/_hl/selections.py", line 82, in select
    return selector.make_selection(args)
  File "h5py/_selector.pyx", line 282, in h5py._selector.Selector.make_selection
  File "h5py/_selector.pyx", line 197, in h5py._selector.Selector.apply_args
TypeError: Indexing arrays must have integer dtypes

Version information:

0.14.0

Context for the issue:

No response

@knowblesse
Copy link
Author

knowblesse commented Apr 1, 2024

Double value in np.arange was not the only problem.

Fix 1

In line 63-64, nframes_all which I think a variable representing total number of frames, only consider the first two dimension. This works fine with 4D, but when 5D data is provided,
the algorithm must consider number of frames * number of planes * number of channels

nframes_all = f[key].shape[0] if hdims == 3 else f[key].shape[0] * f[key].shape[1]

Fix 2

I don't know why there is this comment data = (nchan x) (nframes x) nplanes x pixels x pixels

If that comment is right, than...

  1. both nframes x nplanes x pixels x pixels and nchan x nframes x pixels x pixels should work for 4D data,
  2. and 5D data with nchan x nframes x nplanes x pixels x pixels

but the second case of the 4D and the 5D fail because of the line 81
im = f[key][irange, ...]

It would be much safer to check the shape of the data and raise error when the data shape didn't match with ops.npy.

Same thing applies to line 82-83

Fix 3

I don't understand the logic in line 97 i0 = nchannels * ((j) % nplanes)

as ((j) % nplanes) is always j

For the 5D, don't we want something like this?

[fr0 ch0 pl0], [fr0 ch0 pl1], [fr0 ch1 pl0],[fr0 ch1 pl1] [fr1 ch0 pl0], ...

In this case, i0 = j will do the job as it will move one index further for each plane.

Then line 98-99 and line 104-105 should be changed too.

Minor

line 87 if type(im[0, 0, 0]) == np.uint16: would be much direct if im.dtype == np.uint16:


Please correct me if I'm wrong. BTW, it's an awesome package! Thank you for your job!

@carsen-stringer
Copy link
Member

Thanks @knowblesse for noticing this and trying to fix this issue. We don't use h5 files ourselves, but I think this deinterleaving of irange is necessary for scanbox converted files? I believe these files had data arrays which were/are 3D/4D which needed to be converted to 5D format to put into our binaries (tiffs have this sort of ordering too from Scanimage). But I honestly don't remember who we added this specific 4D/5D fix for.

If possible, ideally the fix would be backwards compatible, e.g. for Fix 1,

nframes_all = np.prod(np.array(f[key].shape[:-2]))

For Fix 2 -- check if the h5 file has the right dimensions -- if so, skip all of the deinterleaving steps and write straight to binary?

I'm not sure if Fix 3 means there is a bug for these sorts of files already anyway?

@knowblesse
Copy link
Author

Hi Dr. Stringer, I'm Ji Hoon, and thank you for your comment! Your feedback is truly an honor!
Our lab, led by PI Tristan Geiller at Yale, has recently embarked on building our own data processing pipeline. During this process, we had the pleasure of discovering your remarkable package. After careful consideration, we decided to adopt the H5 format with a 5D array over the TIFF stack option.

I actually fixed the issue, and tested on 3D, 4D (with multiple channels), 4D (with multiple planes), and 5D (both multiple channels and planes). I've submitted a pull request (#1112), and I would greatly appreciate your review.

  • While I cannot guarantee compatibility with the "scanbox file" as I've never used it, I've ensured functionality for H5 files. Additionally, I've created a test dataset labeled with each frame's information, which I'm happy to share.
  • For issue 1, I've implemented the exact code as suggested.
  • Regarding issue 2, I've included additional error-checking scripts in the commit. However, if you prefer a cleaner codebase without the messy sanity checks, I can remove them.
  • As for issue 3, the original code won't save the frame properly for the 5D data in current state. I recommend testing this behavior with the sample data I've used, as flattening arrays can often be a source of confusion.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants