Audio embedding length #186

Open
NevermoreCY opened this issue Aug 29, 2024 · 0 comments

NevermoreCY commented Aug 29, 2024

Hi, I have a question about the audio embedding. In the paper, you mention that "Given the contextual influence on sequential audio data, we extracted the corresponding 5-second audio segment for the S frames." However, in talk_video.py (line 250), audio_tensor is set to the corresponding 5 frames of the audio embedding. Is "5-second" a typo in the paper, or did I misunderstand the pipeline?

From my understanding, the audio is first extracted from the video and then processed by wav2vec 2.0 to obtain the audio embedding, so the audio embedding has the same length as the video data (measured in number of frames). Does that mean you cut the videos into 5-second slices before they go to the data_preprocess.py script?
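To make the discrepancy concrete, here is a minimal sketch of the two readings, assuming one embedding vector per video frame and a frame rate of 25 fps. The function names, the frame rate, and the dummy data are hypothetical and are not taken from the repository.

```python
def frames_for_seconds(seconds, fps):
    """Number of per-frame embedding vectors that cover `seconds` of audio."""
    return int(round(seconds * fps))


def audio_window(embedding, start, length):
    """Return `length` consecutive per-frame vectors beginning at `start`.

    Indices outside the clip are clamped to the first/last frame so the
    window always has exactly `length` entries.
    """
    last = len(embedding) - 1
    return [embedding[min(max(i, 0), last)] for i in range(start, start + length)]


# Hypothetical clip: 10 s at an assumed 25 fps, one dummy vector per frame.
fps = 25
embedding = [[float(i)] for i in range(10 * fps)]

# Reading 1: a window of 5 frames (what the code appears to do).
five_frames = audio_window(embedding, start=100, length=5)

# Reading 2: a window of 5 seconds (what the paper's wording suggests).
five_seconds = audio_window(embedding, start=100, length=frames_for_seconds(5, fps))

print(len(five_frames))   # 5
print(len(five_seconds))  # 125
```

Under these assumptions the two readings differ by a factor of 25 in window length, which is why I would like to confirm which one is intended.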

Thanks for reading and answering my concerns.
