Description
Our current _load_tiffs function temporarily holds two full copies of the data when loading from disk: the list of frames and the stacked array, until the function returns and the list is garbage collected. For medium to large datasets this can cause a memory error even though the dataset alone fits into memory just fine. I've confirmed this behaviour with a Python profiler.
```python
import glob

import numpy as np
from tifffile import imread  # assuming tifffile as the reader, implied by out= below
from tqdm import tqdm


def _load_tiffs(data_dir):
    """Load a directory of individual frames into a stack.

    Args:
        data_dir (Path): Path to directory of tiff files

    Raises:
        FileNotFoundError: No tif files found in data_dir

    Returns:
        np.array: 4D array with dims TYXC
    """
    files = np.sort(glob.glob(f"{data_dir}/*.tif*"))
    if len(files) == 0:
        raise FileNotFoundError(f"No tif files were found in {data_dir}")
    ims = []
    for f in tqdm(files, "Loading TIFFs"):
        ims.append(imread(f))
    # ims now holds a full sized copy of the data
    mov = np.stack(ims)
    # both ims and mov hold full sized copies of the data, before we return
    return mov
```
We should update this code to peek at the first frame to get its shape and dtype, then preallocate a NumPy array that we assign into. Roughly as below:
```python
def _load_tiffs(data_dir):
    files = np.sort(glob.glob(f"{data_dir}/*.tif*"))
    if len(files) == 0:
        raise FileNotFoundError(f"No tif files were found in {data_dir}")
    # Peek at the first frame to get the shape and dtype for the whole stack
    first_im = imread(files[0])
    shape = (len(files), *first_im.shape)
    dtype = first_im.dtype
    # Preallocate once, so only a single full-sized copy is ever alive
    stack = np.zeros(shape=shape, dtype=dtype)
    stack[0] = first_im
    for i, f in enumerate(tqdm(files[1:], "Loading TIFFs")):
        imread(f, out=stack[i + 1])
    return stack
```
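Note that the out= keyword assumes the reader is tifffile.imread, which can decode directly into a preallocated array. If we end up using a reader without an out= parameter, plain assignment (stack[i + 1] = imread(f)) gives the same peak behaviour, with only a transient frame-sized buffer instead of a second full-sized copy.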
Minimal example to reproduce the bug
The best way to reproduce is to load a dataset that is more than half your RAM size. I've usually noticed this when running pipelines for multiple datasets, but as mentioned above, I have confirmed it with a Python profiler (I'll try to reproduce the profile at some stage if we want, but I don't think I have a copy anymore...).
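In lieu of the old profile, here's a minimal self-contained sketch of the measurement (assuming tifffile for reading and writing; the temp directory, frame count, and frame size are made up for the demo). tracemalloc tracks NumPy array allocations on any recent NumPy, so the two peaks should differ by roughly the size of the dataset:

```python
import glob
import tempfile
import tracemalloc
from pathlib import Path

import numpy as np
from tifffile import imread, imwrite

with tempfile.TemporaryDirectory() as tmp:
    # Write a few small synthetic frames so the example is self-contained.
    for i in range(20):
        frame = np.random.randint(0, 256, size=(512, 512), dtype=np.uint8)
        imwrite(Path(tmp) / f"frame_{i:03d}.tif", frame)
    files = sorted(glob.glob(f"{tmp}/*.tif"))

    # Current approach: list of frames, then np.stack -> two full copies alive.
    tracemalloc.start()
    ims = [imread(f) for f in files]
    mov = np.stack(ims)
    _, peak_list = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del ims, mov

    # Proposed approach: preallocate, then fill in place -> one full copy
    # (plain assignment instead of out=, so the sketch works with any reader).
    tracemalloc.start()
    first = imread(files[0])
    stack = np.zeros((len(files), *first.shape), dtype=first.dtype)
    stack[0] = first
    for i, f in enumerate(files[1:]):
        stack[i + 1] = imread(f)
    _, peak_fill = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    print(f"peak (list + np.stack): {peak_list / 1e6:.1f} MB")
    print(f"peak (preallocated):    {peak_fill / 1e6:.1f} MB")
```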
Severity
- Unusable
- Annoying, but still functional
- Very minor