Where to set number of frames #11
Hi, thank you for your interest in our work, and sorry about the confusion. It is true that we use T=64 for the Coarse stream input and T=128 for the Fine stream input. The value you mention, 320 (80x4), is the number of frames we consider at the original frame rate. However, we sample at a lower frame rate (gamma_tau=5), which gives 64 frames by default (320 = 64x5). Similarly, when we pre-extract Fine features and feed them to the two-stream model, we extract features for 128 such frames.

P.S. gamma_tau becomes 10 (i.e., 5 -> 10) within the dataset file, but we still sample 64/128 frames (the corresponding temporal receptive field increases to 640/1280 frames at the original frame rate).

Note that we tuned these hyperparameters to work best on Charades. For your own dataset, you can experiment with them. In general, our X3D backbone can work with frame rates as low as 2.5 FPS (25 original FPS / 10). Ideally, the frames in the Fine stream should cover the whole video. On Charades, 128 frames at this lower frame rate are enough to cover the full temporal duration of more than 90% of videos.

Does this make things clear? Thanks!
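The frame arithmetic in the reply above can be checked with a few lines of Python. This is a minimal sketch: the helper names are hypothetical (they are not from the repository), and it assumes the 25 FPS original frame rate quoted in the reply.

```python
def receptive_field(num_frames: int, gamma_tau: int) -> int:
    """Original-rate frames spanned when keeping every gamma_tau-th frame."""
    return num_frames * gamma_tau


def effective_fps(original_fps: float, gamma_tau: int) -> float:
    """Frame rate actually seen by the backbone after temporal subsampling."""
    return original_fps / gamma_tau


def seconds_covered(num_frames: int, gamma_tau: int, original_fps: float) -> float:
    """Wall-clock duration covered by one clip of num_frames sampled frames."""
    return receptive_field(num_frames, gamma_tau) / original_fps


if __name__ == "__main__":
    # Default sampling: 64 frames at gamma_tau = 5 span 320 original frames (the 80 x 4 value).
    print(receptive_field(64, 5))        # 320
    # Inside the dataset file gamma_tau is raised to 10.
    print(receptive_field(64, 10))       # 640 (Coarse stream)
    print(receptive_field(128, 10))      # 1280 (Fine stream)
    print(effective_fps(25, 10))         # 2.5 (assumes 25 FPS originals)
    print(seconds_covered(128, 10, 25))  # 51.2 seconds per Fine-stream clip
```

Under these assumptions, a 128-frame Fine-stream clip spans about 51 seconds of video, which is consistent with the reply's remark that it covers most Charades videos in full.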
Hello, thank you very much for your response. I have one follow-up question about lines 81 and 86 of the extract_fineFEAT.py script: 'testing' is set for both dataloaders. This causes an issue when I run the next script, train_coarse_fineFEAT.py, because it cannot find features for the training data. Your help is much appreciated!
Sorry about the delay in response. The purpose of using
Hello,
I recently read your paper, and it states that for "Charades dataset, we configure our network to use T = 64, T = 128 and α = 1/4."
Could you point me to where in the code the number of frames is set to 128?
I've only seen the frames parameter being set to 80 in:
Would you mind clarifying why the value is 80?
I'm trying to train my own dataset and would like to make sure I adjust the number of frames at the correct points in the code. I would greatly appreciate it if you could point me in the right direction :)
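For a custom dataset, the maintainer's reply suggests choosing the Fine-stream frame count so that it covers whole videos. One way to pick it is sketched below, assuming you know each video's duration in seconds and its original frame rate. The helper names, the 0.9 default coverage target, and the step of 16 frames are hypothetical choices, not taken from the repository.

```python
import math
from typing import Sequence


def frames_needed(duration_s: float, original_fps: float, gamma_tau: int) -> int:
    """Sampled frames required to span a whole video at stride gamma_tau."""
    return math.ceil(duration_s * original_fps / gamma_tau)


def coverage(durations_s: Sequence[float], num_frames: int,
             original_fps: float, gamma_tau: int) -> float:
    """Fraction of videos fully covered by a clip of num_frames sampled frames."""
    if not durations_s:
        raise ValueError("durations_s must not be empty")
    covered = sum(
        1 for d in durations_s
        if frames_needed(d, original_fps, gamma_tau) <= num_frames
    )
    return covered / len(durations_s)


def smallest_t(durations_s: Sequence[float], original_fps: float, gamma_tau: int,
               target: float = 0.9, step: int = 16) -> int:
    """Smallest multiple of step whose clip fully covers >= target of the videos."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    t = step
    # Terminates: once t exceeds the longest video's requirement, coverage is 1.0.
    while coverage(durations_s, t, original_fps, gamma_tau) < target:
        t += step
    return t


if __name__ == "__main__":
    durations = [10, 20, 30, 45, 60]  # toy durations in seconds, not real data
    print(smallest_t(durations, original_fps=25, gamma_tau=10, target=0.8))  # 128
```

Rounding T up to a multiple of a fixed step keeps the clip length friendly to backbones that downsample in time; if the resulting T is too large for memory, increasing gamma_tau (a lower effective frame rate) is the other knob the reply mentions.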