After opening a file, if a user accesses `num_frames` of a scan, tifffile iterates over every page to find its offset (see step 2 in the *Details of data loading* section of the README). This turns out to be very slow when the file is read over the network (almost 200x slower than when the file is local).
The chain of calls is `scan.num_frames` -> `len(TiffFile.pages)` -> `TiffFile.TiffPages.seek(-1)`. `seek(-1)` starts at the first page, which has already been read, and moves page by page, reading each page's offset and saving it in an index. Per page, it performs two seeks and two reads on the TIFF file handle (an `io.BufferedReader` object); these reads take most of the time.
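A simplified, self-contained sketch of the walk described above, assuming a little-endian BigTIFF layout (8-byte tag count, 20-byte tag entries, 8-byte offsets). This is not tifffile's actual code: `walk_ifd_offsets`, `CountingReader`, and `make_fake_bigtiff` are hypothetical names, and an in-memory file stands in for the network file handle. It shows why the number of seeks and reads grows linearly with the number of pages:

```python
import io
import struct

# Assumed BigTIFF sizes (little-endian): 8-byte tag count,
# 20-byte tag entries, 8-byte offsets.
TAGNOSIZE, TAGSIZE, OFFSETSIZE = 8, 20, 8


class CountingReader:
    """Wraps an in-memory file and counts seek() and read() calls."""

    def __init__(self, data):
        self._fh = io.BytesIO(data)
        self.seeks = 0
        self.reads = 0

    def seek(self, pos, whence=io.SEEK_SET):
        self.seeks += 1
        return self._fh.seek(pos, whence)

    def read(self, size=-1):
        self.reads += 1
        return self._fh.read(size)


def walk_ifd_offsets(fh, first_offset):
    """Follow the IFD chain from the first page and return every page offset."""
    offsets = []
    offset = first_offset
    while offset != 0:
        offsets.append(offset)
        fh.seek(offset)  # seek 1: start of this page's IFD
        (ntags,) = struct.unpack("<Q", fh.read(TAGNOSIZE))  # read 1: tag count
        fh.seek(offset + TAGNOSIZE + ntags * TAGSIZE)  # seek 2: skip tag entries
        (offset,) = struct.unpack("<Q", fh.read(OFFSETSIZE))  # read 2: next IFD
    return offsets


def make_fake_bigtiff(n_pages, ntags=3, payload=64):
    """Hypothetical test helper: equally sized pages with zeroed tags and data."""
    first = 16  # BigTIFF header is 16 bytes
    block = TAGNOSIZE + ntags * TAGSIZE + OFFSETSIZE + payload
    out = bytearray(b"II" + struct.pack("<HHHQ", 43, 8, 0, first))
    for i in range(n_pages):
        nxt = first + (i + 1) * block if i < n_pages - 1 else 0
        out += struct.pack("<Q", ntags) + bytes(ntags * TAGSIZE)
        out += struct.pack("<Q", nxt) + bytes(payload)
    return bytes(out), first


data, first = make_fake_bigtiff(1000)
fh = CountingReader(data)
offsets = walk_ifd_offsets(fh, first)
print(len(offsets), fh.seeks, fh.reads)  # 1000 2000 2000
```

Locally each of these calls is cheap; over a network share, each small read that is not already in the reader's buffer may become a separate request, which is consistent with (but does not prove) the slowdown reported here.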
However, each read is only 8 bytes (`fh.read(tagnosize)` reads the number of tags and `fh.read(offsetsize)` reads the actual offset), which is not enough data to be a bottleneck: even assuming each 8-byte read is sent as a 96-byte TCP packet, that is only around 4 Mb in total, which would not take 28 seconds. My guess is that the sheer number of packets is causing the problem.
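The "number of packets, not bytes" hypothesis can be checked with a back-of-envelope model that splits the walk's cost into a per-request latency term and a data-transfer term. The page count, latency, and bandwidth below are made-up illustrative values, not measurements from this issue; only the two reads per page and the 96-byte packet size come from the text above:

```python
def estimate_walk_cost(n_pages, latency_s, bandwidth_bytes_per_s,
                       bytes_per_read=96, reads_per_page=2):
    """Return (latency_cost, transfer_cost) in seconds for a page-by-page
    offset walk, using a simple additive model (hypothetical helper)."""
    n_reads = n_pages * reads_per_page
    latency_cost = n_reads * latency_s  # one round trip per read
    transfer_cost = n_reads * bytes_per_read / bandwidth_bytes_per_s
    return latency_cost, transfer_cost


# Hypothetical values: 10,000 pages, 1 ms per request, 100 MB/s link.
lat, xfer = estimate_walk_cost(10_000, 1e-3, 1e8)
print(f"latency {lat:.1f} s, transfer {xfer:.4f} s")  # latency 20.0 s, transfer 0.0192 s
```

Under these assumptions the latency term is about a thousand times larger than the transfer term, so reducing the number of requests matters far more than reducing the bytes read.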
In any case, because all pages in ScanImage's TIFF files are the same size on disk, the offset from one page to the next is always the same, so we only need to compute it once overall (or perhaps once per file, to be safe and avoid read errors if two files come from different scans). This will require changing the `seek` function in `tifffile.TiffPages` to compute the offset only once and fill in the remaining page offsets with it.
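A minimal sketch of the proposed shortcut, assuming a little-endian BigTIFF layout, equally sized pages, and a last page that ends at the end of the file. `next_ifd`, `extrapolate_offsets`, and `make_fake_bigtiff` are hypothetical names, not tifffile API. The stride is taken from the first two pages, the remaining offsets are extrapolated, and a cheap consistency check on the last two pages raises if the prediction does not match the real IFD chain. The check catches some, but not all, cases of unequal page sizes:

```python
import io
import struct

# Assumed BigTIFF sizes (little-endian): 8-byte tag count,
# 20-byte tag entries, 8-byte offsets.
TAGNOSIZE, TAGSIZE, OFFSETSIZE = 8, 20, 8


def next_ifd(fh, offset):
    """Return the next-IFD pointer of the page at `offset` (2 seeks, 2 reads)."""
    fh.seek(offset)
    (ntags,) = struct.unpack("<Q", fh.read(TAGNOSIZE))
    fh.seek(offset + TAGNOSIZE + ntags * TAGSIZE)
    (nxt,) = struct.unpack("<Q", fh.read(OFFSETSIZE))
    return nxt


def extrapolate_offsets(fh, first_offset, file_size):
    """Predict every page offset from the first two pages (hypothetical helper)."""
    second = next_ifd(fh, first_offset)
    if second == 0:
        return [first_offset]  # single-page file
    stride = second - first_offset
    if stride <= 0:
        raise ValueError("IFDs are not stored in increasing order")
    n_pages = (file_size - first_offset) // stride
    offsets = [first_offset + i * stride for i in range(n_pages)]
    # The penultimate page must point at the last one, and the last page
    # must terminate the chain; otherwise fall back to a full walk.
    if len(offsets) >= 2 and next_ifd(fh, offsets[-2]) != offsets[-1]:
        raise ValueError("pages are not equally sized")
    if next_ifd(fh, offsets[-1]) != 0:
        raise ValueError("pages are not equally sized")
    return offsets


def make_fake_bigtiff(payloads, ntags=3):
    """Hypothetical test helper: one IFD with zeroed tags per payload size."""
    first = 16  # BigTIFF header is 16 bytes
    out = bytearray(b"II" + struct.pack("<HHHQ", 43, 8, 0, first))
    pos = first
    for i, size in enumerate(payloads):
        pos += TAGNOSIZE + ntags * TAGSIZE + OFFSETSIZE + size
        nxt = pos if i < len(payloads) - 1 else 0
        out += struct.pack("<Q", ntags) + bytes(ntags * TAGSIZE)
        out += struct.pack("<Q", nxt) + bytes(size)
    return bytes(out), first


data, first = make_fake_bigtiff([64] * 500)
offsets = extrapolate_offsets(io.BytesIO(data), first, len(data))
print(len(offsets), offsets[:3])  # 500 [16, 156, 296]
```

This uses a constant number of reads regardless of page count, but it depends entirely on the equal-size assumption; a full page-by-page walk remains the safe fallback when the check fails.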
> because all of ScanImage's tiff files' pages are the same size on file, the offset from page to page will be exactly the same so we only need to compute one offset overall
FWIW, this is not true for ScanImage > 2015 BigTIFF files, where the ImageDescription tag value varies. See also cgohlke/tifffile#29.
Hi @cgohlke
Tags changing size will be annoying; I would have to check when that is the case. I remember checking the offsets for some test cases and they were the same, but maybe they change for some configurations. At least it will be patently obvious if the offsets are wrong (all kinds of stuff should break).
Thanks for letting us know 👍
PS: I'm not sure why that would have been a problem in the referenced issue. I thought tifffile explicitly reads the offsets page by page (even if `is_scanimage` is True), which is what this issue was supposed to be about.