Metadata in the conversion of DAT file to HDF5 file #2

Open
mkitti opened this issue May 9, 2022 · 7 comments
Labels
question Further information is requested

Comments

@mkitti
Collaborator

mkitti commented May 9, 2022

Dear @acardona and @histonemark,

@d-v-b currently has code to ingest and export the DAT files to HDF5 in Python here:
https://github.com/janelia-cosem/fibsem-tools/blob/master/src/fibsem_tools/io/fibsem.py
https://github.com/janelia-cosem/fibsem-tools/blob/master/src/fibsem_tools/io/h5.py

See also janelia-cellmap/fibsem-tools#24.

I also wrote some code for an early demo here which may be more concrete at the moment:
https://github.com/mkitti/fibsem-tools/blob/fibsem_h5/src/fibsem_tools/io/fibsem_h5.py

Earlier you had expressed an interest in text-readable metadata. That could be exported via the hdf5-json package:
https://hdf5-json.readthedocs.io/en/latest/

We will likely include the former 1024-byte DAT headers as an attribute of the HDF5 file. An alternative, which is not mutually exclusive with the first, would be to put them into a 1 KB HDF5 userblock so that legacy DAT reader code could ingest the old header.
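To make the two options concrete, here is a minimal sketch, not the project's actual code. It assumes h5py and NumPy for the writing side; the attribute name `dat_header` and both function names are made up for illustration. The reader uses only plain file I/O, as legacy DAT code would, and checks that the standard HDF5 signature starts right after the 1 KB userblock.

```python
HEADER_SIZE = 1024  # size of the legacy DAT header, per the discussion above
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte signature that starts an HDF5 superblock


def write_h5_with_header(path, header: bytes) -> None:
    """Store the raw header both as an attribute and in the userblock."""
    # Imported lazily so the reader below works without h5py/NumPy installed.
    import h5py
    import numpy as np

    if len(header) != HEADER_SIZE:
        raise ValueError(f"expected {HEADER_SIZE} header bytes, got {len(header)}")
    # userblock_size reserves space in front of the HDF5 superblock.
    with h5py.File(path, "w", userblock_size=HEADER_SIZE) as f:
        # np.void stores the bytes as opaque data; "dat_header" is a hypothetical name.
        f.attrs["dat_header"] = np.void(header)
    # The userblock is ordinary file space, so it is filled with plain file I/O
    # after h5py has closed the file.
    with open(path, "r+b") as raw:
        raw.write(header)


def read_legacy_header(path) -> bytes:
    """Read the first 1024 bytes the way a legacy DAT reader would."""
    with open(path, "rb") as raw:
        header = raw.read(HEADER_SIZE)
        signature = raw.read(len(HDF5_SIGNATURE))
    if len(header) != HEADER_SIZE:
        raise ValueError("file is shorter than one DAT header")
    if signature != HDF5_SIGNATURE:
        raise ValueError("no HDF5 signature found after a 1 KB userblock")
    return header
```

With this layout a legacy reader never needs an HDF5 library to get at the old header, while HDF5-aware tools can read the same bytes back from the attribute.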

We could send you a sample HDF5 file. Do you have any preferences with regard to metadata processing?

-Mark

mkitti added the question label (Further information is requested) May 9, 2022
@histonemark
Collaborator

histonemark commented May 10, 2022

Hi Mark, thanks! In principle we have data, so we can test your tool directly here. Regarding the header data storage, either of the options you list is valid in my opinion. It's something you want to be able to read in case you need it, but you rarely do, so I don't have a strong preference. Perhaps @acardona has one? Just to add to the pile, Chris Barnes from our lab also wrote a Python implementation of a fibsem.dat reader. Here is the link in case there is something of interest:
jfibsem_dat repo

@mkitti
Collaborator Author

mkitti commented May 10, 2022

I've added Chris Barnes' jfibsem dat reader to the list of reader implementations in the README.

@mkitti
Collaborator Author

mkitti commented May 10, 2022

Looking at all these implementations makes me realize that we need to work on the canonical terms for all the attributes.

@clbarnes
Collaborator

Yes, I did reorganise/rename them in mine in order to make it more pythonic. I was (and generally am) very set on using member variables to represent the metadata items rather than accessing them like a dict with arbitrary/runtime-assigned keys; that is much easier to document, discover, reason about, and test against. As I only had access to the MATLAB implementation, I had no way of telling whether the variable names there were canonical anyway.
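As a rough illustration of the difference (not taken from any of the implementations linked in this thread), the sketch below uses a frozen dataclass with explicitly named fields. The class name and all four field names are hypothetical and do not claim to match the real DAT header layout.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DatMetadata:
    """Hypothetical subset of header fields; the names are illustrative only."""

    file_version: int
    channel_count: int
    x_resolution: int
    y_resolution: int

    @classmethod
    def from_dict(cls, raw: dict) -> "DatMetadata":
        """Build from a loosely keyed dict, failing on unknown or missing keys."""
        expected = {f.name for f in fields(cls)}
        unknown = set(raw) - expected
        if unknown:
            raise KeyError(f"unexpected metadata keys: {sorted(unknown)}")
        return cls(**raw)  # a missing key raises TypeError here


meta = DatMetadata.from_dict(
    {"file_version": 8, "channel_count": 2, "x_resolution": 4096, "y_resolution": 4096}
)
print(meta.x_resolution)
```

A misspelled attribute such as `meta.x_resolutoin` raises `AttributeError` immediately, and the field list doubles as documentation, whereas a dict with runtime-assigned keys only reveals a wrong key when that lookup happens to run.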

@clbarnes
Collaborator

If names are to be canonicalised, could I request that they be more explicit than those currently in use? Saving a few characters when writing the first implementation is meaningless compared to the number of times the implementation is read, and any further writing should autocomplete in any sane setup.

@clbarnes
Collaborator

Just to complete the loop mentioned in the discussions, my implementation of a script to do this is here: https://github.com/clbarnes/jeiss-convert/

@mkitti
Collaborator Author

mkitti commented Jun 15, 2022

Thank you, Chris.
