[PARENT ISSUE] Implement the temporal changes in 4M to account for video #2

Open
kdu4108 opened this issue Jun 21, 2024 · 2 comments

kdu4108 commented Jun 21, 2024

Implement the model according to this design: https://docs.google.com/presentation/d/1AY3QV1N_hoi9aXI1r8QTqrNmDK9LyorgJDQMPWb8hBo/edit#slide=id.g2e696416940_0_144.

This includes (at least) several steps, each of which will be detailed in its own GitHub issue/PR:

kdu4108 changed the title from "Implement the temporal changes in 4M to account for video" to "[PARENT ISSUE] Implement the temporal changes in 4M to account for video" on Jun 21, 2024
@garjania

For the RGB frames, we need to tokenize them before adding anything to modality_info or modality_transform. I therefore suggest including the RGB tokenization step for the video datasets among the first steps.


kdu4108 commented Jul 3, 2024

Why? We need to tokenize RGB (and all other vision-like modalities) because they can be fed to the model as tokens. In fact, RGB is the only modality that can also be fed in as pixel patches, which would not require tokenization.
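
To make the distinction concrete, here is a minimal sketch of the two input paths for video frames: raw pixel patches (RGB only) versus discrete token ids. Everything in it is an assumption for illustration: `patchify`, `tokenize`, `video_inputs`, and the nearest-neighbour codebook are hypothetical stand-ins, not 4M's actual tokenizer or API.

```python
# Toy illustration only: all names and shapes here are hypothetical, not 4M's API.

def patchify(frame, patch):
    """Split an H x W frame (rows of (r, g, b) tuples) into flat pixel patches."""
    height, width = len(frame), len(frame[0])
    assert height % patch == 0 and width % patch == 0, "frame must tile evenly"
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    flat.extend(frame[y][x])  # append r, g, b values
            patches.append(flat)
    return patches


def tokenize(patches, codebook):
    """Map each patch to the index of its nearest codebook vector.

    This is a toy stand-in for a learned tokenizer, used only to show
    that the output is a sequence of integer ids rather than pixels.
    """
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda i: sq_dist(patch, codebook[i]))
        for patch in patches
    ]


def video_inputs(frames, patch, codebook=None):
    """Per frame, return pixel patches (no codebook) or token ids (with codebook)."""
    out = []
    for frame in frames:
        patches = patchify(frame, patch)
        out.append(patches if codebook is None else tokenize(patches, codebook))
    return out
```

Calling `video_inputs(frames, patch)` without a codebook corresponds to the pixel-patch path; passing a codebook corresponds to the tokenized path, which is the form the other vision-like modalities would need.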
