save_visual_results in visualBERT #36

Open
guanhdrmq opened this issue Aug 28, 2023 · 0 comments

Hi Hila,
Could you point me to where the save_visual_results function is defined? I am using ViLT as the multimodal transformer, but I cannot use num_tokens = image_attn_blocks[0].attn_probs.shape[-1] to set the number of tokens. For example, with ViLT on the VQA task and a 384*384 image, the combined vision-and-text sequence has 185 tokens including the cls token: 144 vision tokens and 40 text tokens (the maximum text length).
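
To make the token counts concrete, here is a minimal sketch of the arithmetic. It assumes a patch size of 32 (the ViLT-B/32 setting, which is not stated above) and a single cls token counted separately from the 40 text tokens. The function name vilt_token_layout is hypothetical and is not part of this repository or of ViLT.

```python
def vilt_token_layout(image_size=384, patch_size=32, max_text_len=40, num_cls=1):
    """Count the tokens in a joint vision-and-text sequence.

    Assumptions (not confirmed by the repository): square images,
    non-overlapping square patches, and `num_cls` extra class tokens.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")

    patches_per_side = image_size // patch_size
    num_image_tokens = patches_per_side ** 2
    total = num_cls + num_image_tokens + max_text_len

    return {
        "patches_per_side": patches_per_side,
        "image_tokens": num_image_tokens,
        "text_tokens": max_text_len,
        "cls_tokens": num_cls,
        "total": total,
    }


layout = vilt_token_layout()
print(layout)
# {'patches_per_side': 12, 'image_tokens': 144, 'text_tokens': 40,
#  'cls_tokens': 1, 'total': 185}
```

Under these assumptions the last dimension of a joint attention map is 185, which covers both modalities, so it cannot be used directly as the number of image tokens.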
Thanks very much
