Where to set number of frames #11

Open
margauxbowditch opened this issue Mar 17, 2023 · 3 comments

Comments

@margauxbowditch

margauxbowditch commented Mar 17, 2023

Hello,
I recently read your paper and it states that for "Charades dataset, we configure our network to use T = 64,
T= 128 and α = 1/4."
Could you point me to where in the code the number of frames is set to 128?

I've only seen the frames parameter being set to 80 in:

  • train_fine.py on line 57 (80x4)
  • extract_fineFEAT.py on line 61
  • train_coarse_fineFEAT.py line 60 (80x4)

Would you mind clarifying why the value is 80?
I'm trying to train my own dataset and would like to make sure I adjust the number of frames at the correct points in the code. I would greatly appreciate it if you could point me in the right direction :)

margauxbowditch changed the title from "here to set number of frames" to "Where to set number of frames" on Mar 17, 2023
@kkahatapitiya
Owner

Hi,

Thank you for your interest in our work, and sorry about the confusion. It is true that we use T=64 for the Coarse stream input and T=128 for the Fine stream input. The value you mention, 320 (80x4), is the number of frames we consider at the original frame rate. However, we sample at a lower frame rate (gamma_tau=5), which results in 64 frames by default (320=64x5). Similarly, when we pre-extract Fine features and feed them to the two-stream model, we extract features for 128 such frames.

PS: gamma_tau becomes 10 (i.e., 5 -> 10) within the dataset file, but we still sample 64/128 frames (the corresponding temporal receptive field increases to 640/1280 frames at the original frame rate).
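The arithmetic in this reply can be checked with a short sketch. It assumes the simple relation used above (frames covered at the original rate = sampled frames x gamma_tau); `frames_spanned` is a hypothetical helper written for illustration and is not a function from the repository.

```python
def frames_spanned(num_sampled: int, gamma_tau: int) -> int:
    """Frames covered at the original frame rate when one frame is kept
    out of every `gamma_tau` (hypothetical helper, not repository code)."""
    return num_sampled * gamma_tau


# 64 sampled frames at gamma_tau=5 match the 320 (80x4) seen in the scripts.
print(frames_spanned(64, 5))    # 320
# With gamma_tau=10 inside the dataset file, the same counts cover more video.
print(frames_spanned(64, 10))   # 640  (Coarse stream)
print(frames_spanned(128, 10))  # 1280 (Fine stream)
```

When adapting this to another dataset, the quantity to keep track of is the product: changing either the sampled frame count or gamma_tau changes how much of the original video each clip covers.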

However, we tuned these hyperparameters so that they work best for Charades. For your own dataset, you can experiment with them. Generally, our backbone X3D can work with frame rates as low as 2.5 FPS (25 original FPS / 10). Ideally, the number of frames in the Fine stream should cover the whole video. On Charades, 128 frames at this lower frame rate are sufficient to cover the full temporal duration of >90% of videos.
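For a different dataset, a rough way to pick the Fine-stream frame count is to work out how many seconds a given setting covers. The sketch below only restates the arithmetic from this reply (25 FPS / 10 = 2.5 FPS); the function names and the 30-second example video are assumptions for illustration, not values from the repository.

```python
import math


def effective_fps(original_fps: float, gamma_tau: int) -> float:
    """Frame rate after keeping one frame out of every `gamma_tau`."""
    return original_fps / gamma_tau


def covered_seconds(num_frames: int, original_fps: float, gamma_tau: int) -> float:
    """Video duration (in seconds) spanned by `num_frames` sampled frames."""
    return num_frames / effective_fps(original_fps, gamma_tau)


def frames_needed(duration_s: float, original_fps: float, gamma_tau: int) -> int:
    """Sampled frames required to span a video of `duration_s` seconds."""
    return math.ceil(duration_s * original_fps / gamma_tau)


print(effective_fps(25, 10))         # 2.5
print(covered_seconds(128, 25, 10))  # 51.2
# Hypothetical 30-second video recorded at 25 FPS:
print(frames_needed(30, 25, 10))     # 75
```

Running `frames_needed` over the video durations of your own dataset would show what frame count covers most of them, in the same spirit as the >90% coverage figure quoted for Charades.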

Does this make things clear?

Thanks!

@margauxbowditch
Author

Hello,

Thank you very much for your response to my question.

I do have one follow-up question about lines 81 and 86 of the extract_fineFEAT.py script, where 'testing' is set for both dataloaders. This causes an issue when I run the next script, train_coarse_fineFEAT.py, as it cannot find features for the training data.
How do you propose one should fix this? Can one set line 81 to be 'training' instead of 'testing' so that features will be extracted for all the data?

Your help is much appreciated!

@kkahatapitiya
Owner

Sorry about the delay in responding. The purpose of using the testing flag for both train/val splits when extracting features is to avoid the random sampling and augmentations that apply under the training flag. The extracted features then correspond to the actual inputs as they are, not augmented versions. Even with this flag, extract_fineFEAT.py extracts features for both the train and val splits. Can you verify that this is the case for your data? As long as you extract features with the testing flag for both of your train/val splits, you should be fine.
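To illustrate why the flag matters for feature extraction, here is a minimal sketch of how a mode switch typically changes frame sampling. It is an assumption about the general pattern, not the repository's dataset code: `sample_indices`, its parameters, and the choice of a fixed start at frame 0 in testing mode are all hypothetical.

```python
import random


def sample_indices(total_frames, num_samples, stride, mode, rng=None):
    """Pick `num_samples` frame indices spaced `stride` apart.

    Hypothetical illustration: "training" draws a random start offset,
    any other mode always starts at frame 0, so results are repeatable.
    Assumes total_frames >= 1.
    """
    span = num_samples * stride
    max_start = max(total_frames - span, 0)
    if mode == "training":
        rng = rng or random
        start = rng.randint(0, max_start)  # differs from call to call
    else:
        start = 0                          # deterministic
    # Clamp so short videos repeat their last frame instead of overflowing.
    return [min(start + i * stride, total_frames - 1) for i in range(num_samples)]


# Testing mode returns the same indices every time, so cached features
# always describe the same frames of a given video.
print(sample_indices(1000, 4, 10, "testing"))  # [0, 10, 20, 30]
```

Under this pattern, features extracted in training mode would describe a different random clip on every run, which is why a deterministic mode is the safer choice for features that are saved to disk and reused.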
