User/rcadene/2024 10 07 vla #467
base: main
)

hidden_states = llava_output.hidden_states[-1]  # Use last layer's hidden state
hidden_states = hidden_states[:, -4:, :]  # make 4 a config parameter
This takes the last 4 embeddings as the input to the action decoder. The value 4 was chosen because chunk_size is divisible by 4.
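A minimal sketch of how the hard-coded 4 could become a parameter, using a NumPy array as a stand-in for the hidden-state tensor. The helper `select_action_tokens`, the parameter name `num_action_tokens`, and the shapes are hypothetical and not taken from the PR; the divisibility check only mirrors the requirement stated in the comment above.

```python
import numpy as np


def select_action_tokens(hidden_states, num_action_tokens, chunk_size):
    """Return the last `num_action_tokens` embeddings of every sequence.

    hidden_states: array of shape (batch, seq_len, hidden_dim).
    """
    if num_action_tokens <= 0:
        raise ValueError("num_action_tokens must be positive")
    if chunk_size % num_action_tokens != 0:
        raise ValueError("chunk_size must be divisible by num_action_tokens")
    if hidden_states.shape[1] < num_action_tokens:
        raise ValueError("sequence is shorter than num_action_tokens")
    # Same slice as hidden_states[:, -4:, :], with 4 read from a parameter.
    return hidden_states[:, -num_action_tokens:, :]


# Hypothetical shapes: batch of 2, 10 tokens, hidden size 8.
hidden_states = np.zeros((2, 10, 8))
tokens = select_action_tokens(hidden_states, num_action_tokens=4, chunk_size=20)
print(tokens.shape)  # (2, 4, 8)
```

The same slice expression works unchanged on a PyTorch tensor.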
processed_inputs = self.processor(
    text=batch["prompt"], videos=list(batch["observation.images"]), return_tensors="pt", padding=True, do_rescale=False
).to(self.device)
For some reason I lose the original batch size here; it should be 2. This may be because the input is 5-dimensional, and I am not sure what to do with the camera index. I also removed normalization, because the processor takes unnormalized PIL images/frames.
resolved
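For reference, one common way to keep track of the batch size when an input carries an extra camera axis is to fold that axis into the batch before encoding and unfold it afterwards. The sketch below is not necessarily how this thread was resolved; it assumes a hypothetical layout of (batch, cameras, channels, height, width), uses NumPy in place of tensors, and the helper names are made up for illustration.

```python
import numpy as np


def fold_cameras(images):
    """Merge the camera axis into the batch axis: (B, N, C, H, W) -> (B*N, C, H, W)."""
    b, n = images.shape[:2]
    return images.reshape(b * n, *images.shape[2:]), (b, n)


def unfold_cameras(features, batch_and_cams):
    """Restore the batch axis: (B*N, D) -> (B, N, D)."""
    b, n = batch_and_cams
    return features.reshape(b, n, *features.shape[1:])


# Hypothetical input: batch of 2, 3 cameras, 3x4x4 images.
images = np.arange(2 * 3 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 3, 4, 4)
flat, dims = fold_cameras(images)

# Stand-in for a per-image encoder: one feature (the mean pixel) per image.
features = flat.reshape(flat.shape[0], -1).mean(axis=1, keepdims=True)

restored = unfold_cameras(features, dims)
print(flat.shape, restored.shape)  # (6, 3, 4, 4) (2, 3, 1)
```

Carrying `(b, n)` alongside the folded array makes the original batch size recoverable at any later stage.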
Returns:
    action_logits: Tensor of predicted actions.
"""
batch_size = hidden_states.size(0)  # Ensure batch size is extracted
This part is the core, and I think it might contain mistakes. The way we repeat the input to match chunk_size, and the value of encoder_out, both need to be checked and reviewed.
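The PR's actual repeat logic is not shown in this excerpt, so the following is only a sketch of the two usual ways to expand k embeddings to chunk_size positions, which differ in ordering and are easy to confuse during review. The helper `expand_to_chunk` and its `interleave` flag are hypothetical, and NumPy arrays stand in for tensors.

```python
import numpy as np


def expand_to_chunk(tokens, chunk_size, interleave=True):
    """Expand (batch, k, dim) embeddings to (batch, chunk_size, dim).

    interleave=True repeats each token in place: t0 t0 t1 t1 ...
    interleave=False tiles the whole block:      t0 t1 t0 t1 ...
    """
    k = tokens.shape[1]
    if chunk_size % k != 0:
        raise ValueError("chunk_size must be divisible by the number of tokens")
    reps = chunk_size // k
    if interleave:
        return np.repeat(tokens, reps, axis=1)
    return np.tile(tokens, (1, reps, 1))


# Four one-dimensional "embeddings" with values 0..3, so the order is visible.
tokens = np.arange(4, dtype=np.float32).reshape(1, 4, 1)
interleaved = expand_to_chunk(tokens, chunk_size=8)
tiled = expand_to_chunk(tokens, chunk_size=8, interleave=False)
print(interleaved[0, :, 0])  # [0. 0. 1. 1. 2. 2. 3. 3.]
print(tiled[0, :, 0])        # [0. 1. 2. 3. 0. 1. 2. 3.]
```

In PyTorch the corresponding calls would be `torch.repeat_interleave(tokens, reps, dim=1)` and `tokens.repeat(1, reps, 1)`. Which ordering is intended depends on how the action decoder maps positions to time steps, which is what this comment asks reviewers to verify.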
What this does
Explain what this PR does. Feel free to tag your PR with the appropriate label(s).
Examples:
How it was tested
Explain/show how you tested your changes.
Examples:
test_something in tests/test_stuff.py.
new_feature and checked that training converges with policy X on dataset/environment Y.
some_function, it now runs X times faster than previously.
How to checkout & try? (for the reviewer)
Provide a simple way for the reviewer to try out your changes.
Examples:
SECTION TO REMOVE BEFORE SUBMITTING YOUR PR
Note: Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR. Try to avoid tagging more than 3 people.
Note: Before submitting this PR, please read the contributor guideline.