ENH: consider adding support for GUANO audio metadata parsing/writing #270

Open
sammlapp opened this issue Oct 10, 2024 · 5 comments
@sammlapp

Describe the solution you'd like
GUANO is a metadata convention used by the bat acoustics community.

see spec https://github.com/riggsd/guano-spec/blob/master/guano_specification.md

and python package https://pypi.org/project/guano/

It defines a useful set of default fields and also allows "namespaces" with custom sets of fields.

Implementation-wise, it writes the metadata to a separate chunk of the WAV file (only WAV is supported), similar to the default header, except that the chunk can sit at the end of the file for some reason. However, the Python package also seems to support I/O of text files and other formats.
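
To make the "separate section of the WAV file" concrete, here is a minimal sketch using only the Python standard library. It assumes, based on my reading of the spec, that the metadata lives in a RIFF subchunk with ID `guan` containing UTF-8 `Key: Value` lines, with an optional `Namespace|` prefix on the key. The helper names (`read_guano_chunk`, `parse_guano_text`, `build_demo_wav`) are made up for illustration and are not part of the `guano` package or Crowsetta.

```python
import io
import struct


def read_guano_chunk(stream):
    """Return the text of the first 'guan' chunk in a RIFF/WAVE stream, or None."""
    riff, _size, wave = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return None  # reached the end without finding a 'guan' chunk
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"guan":
            # Assumption: the payload is UTF-8 text, possibly null-padded.
            return stream.read(chunk_size).decode("utf-8").rstrip("\x00")
        # RIFF chunks are word-aligned, so skip a pad byte after odd sizes.
        stream.seek(chunk_size + (chunk_size & 1), io.SEEK_CUR)


def parse_guano_text(text):
    """Parse 'Namespace|Key: Value' lines into a {(namespace, key): value} dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank or malformed line
        namespace, _bar, name = key.rpartition("|")
        fields[(namespace.strip(), name.strip())] = value.strip()
    return fields


def build_demo_wav(guano_text, samples=b"\x00\x00" * 4):
    """Build a tiny in-memory WAV with the 'guan' chunk after the audio data."""
    def chunk(chunk_id, payload):
        pad = b"\x00" if len(payload) % 2 else b""
        return chunk_id + struct.pack("<I", len(payload)) + payload + pad

    fmt = chunk(b"fmt ", struct.pack("<HHIIHH", 1, 1, 8000, 16000, 2, 16))
    body = b"WAVE" + fmt + chunk(b"data", samples) + chunk(b"guan", guano_text.encode("utf-8"))
    return b"RIFF" + struct.pack("<I", len(body)) + body


if __name__ == "__main__":
    # Illustrative metadata only; the field values are invented.
    demo = "GUANO|Version: 1.0\nTimestamp: 2024-10-10T12:00:00\nUser|Note: hello\n"
    text = read_guano_chunk(io.BytesIO(build_demo_wav(demo)))
    for (namespace, key), value in parse_guano_text(text).items():
        print(namespace or "(default)", key, value)
```

Because the reader walks chunk by chunk, it does not matter whether the `guan` chunk comes before or after the audio data. Values are kept as strings here; converting timestamps, numbers, and so on would be a separate step.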

@sammlapp sammlapp added the ENH: enhancement New feature or request label Oct 10, 2024
@NickleDave
Collaborator

NickleDave commented Oct 10, 2024

Thank you @sammlapp for suggesting this.

I have looked at GUANO before but hadn't added it.
Are you seeing a lot of usage of this format in the wild?

We are biased right now towards formats for annotating speech-like sequences of sounds, such as birdsong syllables; it seems like GUANO is at the other extreme, where the goal is to provide as much relevant metadata as possible about a detection, using the term as it is used in bioacoustics. (I know you know this, just adding context for anyone else who stumbles on the issue.)

It's not clear to me from the spec: is there a way to represent multiple detections within a single file?

Don't mean to grill you, I really appreciate your suggesting this -- I'm just hoping that, since you're an actual bioacoustician, you might have more insight into how this format is being used in the wild.

Some related discussion here: tdwg/ac#264 and tdwg/ac#247 (related in the sense that it provides context about how standards groups are thinking about GUANO)

@NickleDave
Collaborator

Also ... are you aware of any publicly available datasets that use this format?

I think I looked before and couldn't find any, which is another reason I didn't raise an issue about it.
Just asking since it would help to test that we can actually parse / write the format.

@NickleDave
Collaborator

🤔 this at least says it was collected with Anabat (and Audiomoth):
https://databank.illinois.edu/datasets/IDB-4200947

@sammlapp
Author

To be honest, I wouldn't prioritize this, as I don't see it being used a lot and haven't had a reason to need it. I opened the feature request because there was an open feature request about GUANO on OpenSoundscape, but it seems like if integration is implemented anywhere, it should be in Crowsetta rather than OpenSoundscape.

@NickleDave
Collaborator

Got it, thank you @sammlapp. Happy to add it if you start seeing more of a need for it; it looks fairly painless judging from the Python implementation.
