
Inference: only images without audio #6

Closed
Oktai15 opened this issue Mar 11, 2019 · 3 comments

Comments


Oktai15 commented Mar 11, 2019

Hello, @miha-skalic, great work!

Can I use your model without audio features? For example, I want to test your model on my own video, but I don't have a feature extractor for audio (because it was not published). Is there a way to try your model? If so, how?

miha-skalic (Owner) commented

Hi @Oktai15 ,

Unfortunately, Google has not yet released the audio feature extraction part. I'm guessing that one could use a vector of zeros for the audio features. Note that we have not tested this, so we cannot say anything about the impact on performance.
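A minimal sketch of the zero-vector workaround, assuming the model expects YouTube-8M-style 128-dimensional audio embeddings per frame (the frame count below is just an illustrative placeholder):

```python
import numpy as np

# Assumed dimensions: YouTube-8M audio embeddings are 128-dim per frame.
# The number of frames here is an arbitrary placeholder for one video.
num_frames = 300
audio_dim = 128

# Stand-in audio features: all zeros, matching the expected shape/dtype.
audio_features = np.zeros((num_frames, audio_dim), dtype=np.float32)

print(audio_features.shape)  # (300, 128)
```

These zero vectors could then be concatenated with the video features wherever the model normally consumes the audio stream; as noted above, the effect on accuracy is untested.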

Oktai15 (Author) commented Mar 16, 2019

Thank you, @miha-skalic!

@Oktai15 Oktai15 closed this as completed Mar 16, 2019
ideaRunner commented Apr 18, 2019

Regarding the audio feature extraction, I found this code: https://github.com/tensorflow/models/tree/master/research/audioset#output-embeddings
There is also someone who has used it and gotten good results: antoine77340/Youtube-8M-WILLOW#28

The released AudioSet embeddings were postprocessed before release by applying a PCA transformation (which performs both PCA and whitening) as well as quantization to 8 bits per embedding element. This was done to be compatible with the YouTube-8M project which has released visual and audio embeddings for millions of YouTube videos in the same PCA/whitened/quantized format.
We provide a Python implementation of the postprocessing which can be applied to batches of embeddings produced by VGGish. vggish_inference_demo.py shows how the postprocessor can be run after inference.
If you don't need to use the released embeddings or YouTube-8M, then you could skip postprocessing and use raw embeddings.
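The postprocessing described above can be sketched in a few lines. This is only a conceptual illustration, not the released implementation: the PCA matrix, means, and clipping range below are placeholders (in the real AudioSet release these parameters ship with the model files, and the `Postprocessor` in `vggish_postprocess.py` applies them):

```python
import numpy as np

dim = 128  # VGGish/YouTube-8M embedding dimensionality

# Placeholder PCA/whitening parameters; the real ones are fitted offline
# and distributed alongside the VGGish checkpoint.
pca_matrix = np.eye(dim, dtype=np.float32)
pca_means = np.zeros((dim, 1), dtype=np.float32)

# Illustrative clipping range applied before 8-bit quantization.
quant_min, quant_max = -2.0, 2.0

def postprocess(embeddings):
    """Apply PCA/whitening, clip, and quantize embeddings to uint8."""
    # PCA + whitening: subtract the mean, project with the fitted matrix.
    x = np.dot(pca_matrix, (embeddings.T - pca_means)).T
    # Clip to the expected range, then map linearly onto [0, 255].
    x = np.clip(x, quant_min, quant_max)
    q = (x - quant_min) * (255.0 / (quant_max - quant_min))
    return q.astype(np.uint8)

rng = np.random.default_rng(0)
raw = rng.standard_normal((10, dim)).astype(np.float32)
quantized = postprocess(raw)
print(quantized.shape, quantized.dtype)  # (10, 128) uint8
```

As the quoted README notes, this step is only needed for compatibility with the released YouTube-8M embeddings; raw VGGish embeddings can be used directly otherwise.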
