Oxford uncompressed .ebsp file load problem in Aztec V6.1 #690

Closed
Tijmenvermeij opened this issue Oct 4, 2024 · 18 comments · Fixed by #700
Labels
bug (Something isn't working), help wanted (Would be nice if someone could help)

@Tijmenvermeij

Hi,

We recently upgraded AZtec to V6.1, and we can no longer load uncompressed .ebsp files with kikuchipy. The reader raises an error saying that the .ebsp file is compressed, but this is not the case (according to the settings we chose in the AZtec software).
Does anyone have any idea what the problem could be?

A workaround for us would be to export the patterns to .h5oina and load those, but that would double the amount of data that we end up storing...

Thanks,
Tijmen

@hakonanes
Member

Hi @Tijmenvermeij,

Thanks for this report. Just to clarify, you can read uncompressed patterns written with AZtec v6.0, but not with 6.1?

The reason the reader claims that the patterns are compressed lies here, in OxfordBinaryReader.get_single_pattern_header(offset):

self.file.seek(offset)
header = np.fromfile(self.file, dtype=self.pattern_header_dtype, count=1)
return (
    bool(header["is_compressed"][0]),
    int(header["nrows"]),
    int(header["ncols"]),
    int(header["n_bytes"]),
)

If the particular byte assumed to contain the boolean is anything other than 0, it evaluates to True.
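
To illustrate the point about truthiness, here is a minimal sketch in plain Python. The helper name `parse_compression_flag` is hypothetical and not part of kikuchipy; it only shows how a stricter check could flag an unexpected header layout instead of silently reporting compressed patterns.

```python
def parse_compression_flag(raw_value: int) -> bool:
    """Interpret a header integer as a boolean, rejecting unexpected values.

    Hypothetical helper (not part of kikuchipy): only 0 and 1 are accepted.
    """
    if raw_value not in (0, 1):
        raise ValueError(
            f"Unexpected is_compressed value {raw_value}; "
            "the header layout may differ from the one assumed"
        )
    return bool(raw_value)


# Plain bool() treats any non-zero integer as True
print(bool(0), bool(1), bool(37))  # False True True
```

With such a check, a value like 37 in the flag position would raise an error pointing at the layout rather than being read as "compressed".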

Can you send me a small .ebsp file of uncompressed patterns written with this software version? The only way we can fix this is by reverse engineering.

Unfortunately, Oxford Instruments doesn't publish specifications for the binary .ebsp files (they are meant for internal use only, I've been told). The specification of their public HDF5 format (H5OINA) provides the following information on what's new in H5OINA v6.0 (corresponding to AZtec v6.1):

Add support for Unity, including export of multidetector systems with Unity and an auxillary detector.

I have no clue what they have changed in the binary format to accommodate this.

A workaround for us would be to export the patterns to .h5oina and load those, but that would double the amount of data that we end up storing...

This is a valid concern and something that could be brought up with the Oxford Instruments folks!

@hakonanes added the bug and help wanted labels Oct 4, 2024

@Tijmenvermeij
Author

Hi,

We upgraded AZtec from V5.1 to V6.1 (SP3). V5.1 patterns worked fine with kikuchipy.
Here is a small .ebsp file, saved as uncompressed with V6.1: https://www.dropbox.com/scl/fi/4nm24gdddb92u3ve303uw/32014751-d469-4c74-88fc-af3797e6872a.ebsp?rlkey=j1iob6d6ng6wawfxgmg5zz9db&dl=1

Note that the patterns are pure noise.

If necessary, we can contact Oxford Instruments and ask for clarification.

Thanks!
Tijmen

@hakonanes
Member

Can you share any metadata about the file (number of patterns, pattern rows and columns, data type uint8 or uint16)? The reader interprets the file as containing 401 patterns of shape (nrows, ncols) = (128, 156), but this leaves just 8 007 209 - (401 * 128 * 156) = 41 bytes for the remaining information, which I'm 99% sure is too little. The .ebsp files I've seen have an 8-byte file version at the beginning, then each pattern's starting byte position, then each pattern prepended with a 16-byte header and sometimes appended with an 18-byte footer. I think the number of patterns, 401, is incorrect.
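
The size argument above can be checked with a few lines of Python. The function name `expected_file_size` and the assumption of 8 bytes per stored pattern position are illustrative only and not taken from any Oxford Instruments specification; the file size, pattern shape, and header and footer sizes come from the comment.

```python
FILE_SIZE = 8_007_209  # bytes, size of the test file reported above
NROWS, NCOLS = 128, 156  # pattern shape as interpreted by the reader


def expected_file_size(
    n_patterns: int,
    header_bytes: int = 16,
    footer_bytes: int = 18,
    version_bytes: int = 8,
    offset_bytes: int = 8,  # assumption: one 8-byte position per pattern
) -> int:
    """File size for uint8 patterns under the layout described above."""
    per_pattern = header_bytes + NROWS * NCOLS + footer_bytes
    return version_bytes + n_patterns * (offset_bytes + per_pattern)


# Bytes left over if the file held 401 bare patterns and nothing else
leftover = FILE_SIZE - 401 * NROWS * NCOLS
print(leftover)  # 41

# 401 patterns with headers and footers would need more bytes than the file has
print(expected_file_size(401) > FILE_SIZE)  # True
```

Under these assumptions, 401 patterns with their headers and footers would not fit in the file, which is consistent with the suspicion that the pattern count is being misread.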

@Tijmenvermeij
Author

Hi,

Here is the .h5oina file with the patterns included: https://www.dropbox.com/scl/fi/elp7deoyrz0r9l2clh6g3/Project-2-Specimen-1-Site-1-Map-Data-2.h5oina?rlkey=v1gpnhvjgjldtp8k1lhixzs2v&dl=1

There should be 400 patterns and they should be 8-bit... The pattern size you mention seems to be correct.

@Yimin-Zhu

Hi both, regarding the new AZtec .ebsp file, I got some clues from the EMsoft developer (Marc DeGraef): "starting from AZtec6 the number of bytes in each pattern header change from 16 to 42, and there is a 25 offsets to the pattern". Hope this helps.

@Tijmenvermeij
Author

Thanks @Yimin-Zhu

I tried to load patterns from .h5oina, which works, but only when patterns are saved "processed" (8-bit, incl. BG correction). When saving as "unprocessed", the .h5oina file doesn't seem to load properly in kikuchipy. Not sure what the problem is; the memory starts filling up, indicating that lazy loading does not work.

So I will start looking at loading the .ebsp files myself. @hakonanes, please let me know if you have already made any progress... I'll do the same.

Thanks!
Tijmen

@CiosG

CiosG commented Oct 10, 2024 via email

@Tijmenvermeij
Author

Thanks for the information, Grzegorz!

Yeah I'm quite annoyed by Oxford's data management. Basically I'm storing 3 times as much data as actually needed, which adds up quite fast with some of our larger scans... Pat Trimby leaving doesn't help I guess, but perhaps we need to start bothering Mark Coleman or someone else about this :)

I'm not sure why Kikuchipy has issues importing the .h5oina files with unprocessed patterns included. The name of the processed dataset in the h5 structure seems to be the same... I'll run some more trials on a small dataset.

@Tijmenvermeij
Author

So about importing .h5oina data in kikuchipy, it seems that the dataset name 'Processed Patterns' is still valid. But once there are also Unprocessed Patterns stored in the h5 file, it seems to me that the kikuchipy reader starts to load these into memory, even when lazy=True.
Does the reader by default import all other data into memory, except for the 'Processed Patterns', when lazy=True?

@Tijmenvermeij
Author

So about importing .h5oina data in kikuchipy, it seems that the dataset name 'Processed Patterns' is still valid. But once there are also Unprocessed Patterns stored in the h5 file, it seems to me that the kikuchipy reader starts to load these into memory, even when lazy=True. Does the reader by default import all other data into memory, except for the 'Processed Patterns', when lazy=True?

Sorry for the spam, but I solved this issue, at least temporarily. I found the line that specifies that the pattern dataset should not be read into memory (I think), and added "Unprocessed Patterns" to it. I changed line 99 in oxford_h5ebsd.py to the following:
dd = _hdf5group2dict(group["EBSD/Data"], data_dset_names=[self.patterns_name, "Unprocessed Patterns"])
This solves the issue for me.
Perhaps this is not a suitable permanent solution though. I guess the user should have the option to import unprocessed patterns instead of processed patterns, if they like...

@hakonanes
Member

I got some clues from the EMsoft developer (Marc DeGraef): "starting from AZtec6 the number of bytes in each pattern header change from 16 to 42, and there is a 25 offsets to the pattern"

Thank you for sharing, @Yimin-Zhu, this may be exactly the information we need.

@Tijmenvermeij, I opened #692 to track your issue with lazy loading of H5OINA files. Thank you for reporting this. I suggest we continue that discussion there.

@CiosG, thank you for bringing your knowledge of AZtec's workings to this issue. I've opened #693 to address the *.uebsp extension (unknown to me!).

@hakonanes
Member

@Tijmenvermeij, can you confirm that this is what the first pattern in your small test dataset (*.ebsp and *.h5oina) should look like?

[Image: first_pattern]

@hakonanes
Member

The new pattern header in Oxford Instruments' *.ebsp files with version 6 (possibly also 5) is:

  • int32 (map x)
  • int32 (map y)
  • int32 (is_compressed)
  • int32 (n pattern rows)
  • int32 (n pattern columns)
  • int32 (n pattern bytes)

The footer stays the same, with beam (x, y), which stores the same information as map (x, y) scaled by the step size.
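
As a rough illustration of why the old reader misreports compression, here is a sketch using only the standard library `struct` module. It assumes little-endian int32 values and that the earlier 16-byte header held the four fields `is_compressed`, `nrows`, `ncols` and `n_bytes`; both are assumptions, not a published specification. The function `parse_header` and the example values are made up for illustration.

```python
import struct

# Assumed 16-byte layout of older files: is_compressed, nrows, ncols, n_bytes
OLD_HEADER = struct.Struct("<4i")
# 24-byte layout listed above: map x, map y, is_compressed, nrows, ncols, n_bytes
NEW_HEADER = struct.Struct("<6i")

NEW_FIELDS = ("map_x", "map_y", "is_compressed", "nrows", "ncols", "n_bytes")
OLD_FIELDS = NEW_FIELDS[2:]


def parse_header(buf: bytes, new_layout: bool) -> dict:
    """Unpack a pattern header with either the old or the new field layout."""
    if new_layout:
        layout, fields = NEW_HEADER, NEW_FIELDS
    else:
        layout, fields = OLD_HEADER, OLD_FIELDS
    header = dict(zip(fields, layout.unpack_from(buf)))
    header["is_compressed"] = bool(header["is_compressed"])
    return header


# Made-up header of an uncompressed 128 x 156 uint8 pattern at map position (3, 7)
raw = NEW_HEADER.pack(3, 7, 0, 128, 156, 128 * 156)

print(parse_header(raw, new_layout=True))
print(parse_header(raw, new_layout=False))
```

Read with the old layout, the map x value lands in the `is_compressed` slot, so any pattern with a non-zero map x would be reported as compressed, and the shape fields are shifted as well.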

[Image: metadata]

@marcdegraef, thanks for pointing the extra bytes out to @Yimin-Zhu, who pointed it out to us here.

@hakonanes
Member

@Tijmenvermeij, I made a fix in https://github.com/hakonanes/kikuchipy/tree/690-oxford-ebsp-aztec-6.1, could you try it out? python -m pip install 'kikuchipy@git+https://github.com/hakonanes/kikuchipy@690-oxford-ebsp-aztec-6.1'

@Tijmenvermeij
Author

@Tijmenvermeij, can you confirm that this is what the first pattern in your small test dataset (*.ebsp and *.h5oina) should look like?

[Image: first_pattern]

Hi, this seems correct!

@hakonanes
Member

Thanks for confirming! Then I'll go ahead with the patch release.

@hakonanes
Member

Fixed in #700, will be part of a patch release 0.11.1 soon.

@hakonanes
Member

This shouldn't be a problem in the new 0.11.1 release.
