
Deprecating hardcoded OFFSET constant #1

Open

mkitti opened this issue May 5, 2022 · 4 comments

Labels: question (Further information is requested)

mkitti commented May 5, 2022

Proposal: Making OFFSET, the start of the array image data, a variable

Currently, .dat readers typically hardcode OFFSET as the fixed constant 1024. This value encodes the start of the array data and the end of the attribute metadata.
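For concreteness, here is a minimal sketch of the status quo in Python. The function name, dtype, and shape handling are illustrative assumptions, not any particular reader's API; only the hardcoded 1024-byte offset comes from the existing readers.

```python
import numpy as np

OFFSET = 1024  # hardcoded start of the array image data in current readers

def read_dat_array(path, shape, dtype=">i2"):
    # Skip the fixed-size metadata header, then read exactly the number of
    # contiguous samples implied by the shape (a recipe trailer may follow).
    count = int(np.prod(shape))
    with open(path, "rb") as f:
        f.seek(OFFSET)
        data = np.fromfile(f, dtype=dtype, count=count)
    return data.reshape(shape)
```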


For forward compatibility, in anticipation of the need for additional metadata to support multiple microscopes, I propose making Offset an independent variable rather than a fixed constant. The variable could either be a directly encoded attribute or be calculated from other existing attributes.


Proposal: An attribute for Offset at offset 992

An independent attribute for Offset would be a robust solution, as it can uniquely delineate the separation between metadata and array data. I propose storing it at byte offset 992 as an unsigned 64-bit integer in big-endian format, for consistency with the existing integer attributes.

  • A value of 0x0000000000000000 indicates that the Offset value should be calculated or assumed to be 1024 to enable backwards compatibility.
  • A value of 0xffffffffffffffff indicates that there is no meaningful Offset for contiguous array data. For example, this may indicate that the array data is chunked and/or compressed.
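A reader sketch of the proposed attribute might look like the following. The constant names and return convention are hypothetical; only the byte offset, encoding, and the two special values come from this proposal.

```python
import struct

OFFSET_ATTR_POS = 992           # proposed location of the Offset attribute
LEGACY_OFFSET = 1024            # value assumed for backwards compatibility
NO_OFFSET = 0xFFFFFFFFFFFFFFFF  # sentinel: no meaningful contiguous offset

def read_offset(f):
    # Read the proposed Offset attribute: a big-endian uint64 at byte 992.
    f.seek(OFFSET_ATTR_POS)
    (value,) = struct.unpack(">Q", f.read(8))
    if value == 0:
        return LEGACY_OFFSET  # legacy file: calculate or assume 1024
    if value == NO_OFFSET:
        return None           # e.g. chunked and/or compressed array data
    return value
```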

Proposal: Calculate Offset from FileLength and other attributes

Alternatively, Offset could be calculated from the FileLength attribute, which currently indicates the end of the array data and is stored at byte offset 1000 as a 64-bit big-endian integer. Since the length of the array can be calculated as the product of Xresolution, Yresolution, Number_of_channels, and the size of the datatype, the Offset value can be calculated from the FileLength as follows.

Offset = FileLength - Xresolution * Yresolution * Number_of_channels * nbytes_in_datatype

A new special value for FileLength, 0xffffffffffffffff, indicates that FileLength should be interpreted as the actual end of the file and may not be a reliable value from which to calculate the array data offset. For example, this value should be used if the array data is chunked and/or compressed, or if the CSV recipe data is no longer stored in the trailer of the file.
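A sketch of the corresponding calculation, under the assumption that Xresolution, Yresolution, and Number_of_channels have already been parsed from the header (how they are parsed is not specified here):

```python
import struct

FILE_LENGTH_POS = 1000                     # existing FileLength attribute
FILE_LENGTH_UNRELIABLE = 0xFFFFFFFFFFFFFFFF

def calculate_offset(f, xres, yres, nchannels, nbytes_in_datatype):
    # Derive Offset from FileLength per the formula above.
    f.seek(FILE_LENGTH_POS)
    (file_length,) = struct.unpack(">Q", f.read(8))
    if file_length == FILE_LENGTH_UNRELIABLE:
        return None  # FileLength is just the end of file; Offset not derivable
    return file_length - xres * yres * nchannels * nbytes_in_datatype
```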

Example application: Proposed hybrid DAT/HDF5 file

If this proposal is implemented, a hybrid DAT/HDF5 file becomes possible, where the extra metadata space is used to contain HDF5 metadata laid out according to the HDF5 file format.

A simple contiguous HDF5 file can accommodate a userblock of 1 KB, or some doubling thereof. This userblock can accommodate the existing DAT file metadata. The HDF5 metadata header can be contained within a subsequent 2 KB written by the HDF5 library with an early-allocation flag. A hybrid HDF5/DAT file could be made if the OFFSET were shifted to 3 KB (3072 bytes).
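As a sketch of the userblock mechanics using h5py (the file name and dataset are illustrative, and the early-allocation step for the 2 KB HDF5 metadata header is not shown here):

```python
import h5py
import numpy as np

# Create an HDF5 file whose first 1 KB is an opaque userblock that the
# HDF5 library ignores; the existing DAT metadata could live there.
# userblock_size must be a power of two >= 512; 1024 matches the
# current DAT metadata size.
with h5py.File("hybrid.h5", "w", userblock_size=1024) as f:
    f.create_dataset("image", data=np.zeros((2048, 2048), dtype=">i2"))

# The userblock can then be filled in place with the legacy DAT header.
dat_header = b"\x00" * 1024  # placeholder for the real 1024-byte DAT metadata
with open("hybrid.h5", "r+b") as f:
    f.write(dat_header)
```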


A potential modification to the LabVIEW writer consists of changing a single constant from 1024 to 3072. The details of the potential writer modification are out of scope for this proposal.


To clarify, the scope of this proposal applies solely to DAT file readers.


@mkitti mkitti changed the title Deprecating hardcoded OFFSET constant? Deprecating hardcoded OFFSET constant May 5, 2022
@mkitti mkitti added the question Further information is requested label May 5, 2022
clbarnes commented
Writing out an HDF5 file would be enormously helpful and would basically solve all implementation problems on the read side. As I understand it, the scope fills up a memory buffer as it goes and then writes out to a file all at the end, which gives quite a lot of latitude in terms of juggling the numbers around before the write (e.g. splitting channels into separate datasets, writing valid metadata as it pertains to both the group and the channels, etc.). If giving a flexible offset is the first step, I'm all for it.

mkitti commented May 19, 2022

Hi @clbarnes,

My understanding is that "2D scan Tclk.vi" is the main image-writer component of the software. The top half puts data into a queue (a memory buffer), and the bottom half reads from that queue and writes to disk. This happens concurrently but not synchronously; that is, we do not have to wait for data to be written to disk before acquiring new data.

The bottom component has two hardcoded values: byte offset 1024 indicates where the image data begins, and byte offset 1000 holds where the image data ends. After this, the recipe is written.

The way we currently convert the .dat file to an HDF5 file is to use a reader to read in the .dat file and then write out an equivalent HDF5 file, perhaps with the image data chunked and compressed.
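A minimal sketch of that conversion path, assuming a hypothetical read_dat helper that returns the image array and its attributes (the chunking and compression parameters are illustrative):

```python
import h5py

def dat_to_hdf5(dat_path, h5_path, read_dat):
    # Resave a .dat file as HDF5. `read_dat` is a hypothetical reader
    # returning (array, metadata_dict).
    data, attrs = read_dat(dat_path)
    with h5py.File(h5_path, "w") as f:
        dset = f.create_dataset(
            "image", data=data,
            chunks=True, compression="gzip",  # chunked and compressed
        )
        for key, value in attrs.items():
            dset.attrs[key] = value
```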

An alternative would be to reopen the file and write in an HDF5 header somewhere. For example, if we moved the DAT header somewhere else, we could overwrite the DAT header with an HDF5 header at the beginning of the file, and then add the attributes to the HDF5 file. The only advantage of this approach is that the image data does not need to be rewritten to obtain an HDF5 file. This could also be done during transmission of the file off the acquisition computer.

We will likely proceed with the current method of resaving the entire file in the near term.

-Mark

[image: zoom-in of the bottom file-writing component of "2D scan Tclk.vi"]

[image: overview of "2D scan Tclk.vi"]

clbarnes commented
Got it, thank you! I suppose both cases, rewriting the file or just writing HDF5 metadata into it, require a reader to be co-maintained with the microscope software, which needs to be robust, scalable, relatively standalone, etc. In which case we may as well just use that tool to do whatever conversion we need: HDF5/zarr/N5/TIFFs/npy/whatever. The value of having the scope software just generate a valid HDF5 to begin with is that everyone's starting point looks the same, but if that's not an option, it's not an option.

d-v-b commented May 20, 2022

> but if that's not an option, it's not an option.

Is it really not an option, though? Is there centralized maintenance of the acquisition software? If not, then someone could go ahead and just add the HDF5-writing functionality, and the problem is solved (for that person / group).
