The code and model weights for the paper [CVPR 2024] AM-RADIO: Agglomerative Vision Foundation Model - Reduce All Domains Into One have been released by NVIDIA.
RADIO, a new vision foundation model (in practice, a new set of pretrained ViT weights), excels across visual domains and serves as a superior drop-in replacement for existing vision backbones. It is trained by distilling CLIP variants, DINOv2, and SAM into a single model, preserving their distinctive capabilities such as text grounding and segmentation correspondence.
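For anyone who wants to try it as a backbone, here is a minimal sketch of loading RADIO via torch.hub, based on the NVlabs/RADIO README; the entrypoint name `radio_model`, the `version` tag, and the output convention are assumptions to verify against the current repo:

```python
import torch

# Load RADIO from the NVlabs/RADIO repo via torch.hub.
# NOTE: the entrypoint name 'radio_model' and the 'version' argument follow
# the repo README; check the repo if the API has changed.
model = torch.hub.load(
    'NVlabs/RADIO', 'radio_model',
    version='radio_v2.5-b',  # assumed model tag; see the repo for available versions
    progress=True,
)
model.eval()

# Assumption per the README: RADIO normalizes internally, so RGB input
# in [0, 1] is fine; pick a resolution divisible by the patch size.
x = torch.rand(1, 3, 512, 512)

with torch.no_grad():
    # The model is described as returning a (summary, spatial_features) pair:
    #   summary          - a global image embedding (CLIP/DINOv2-style token)
    #   spatial_features - per-patch features for dense tasks like segmentation
    summary, spatial_features = model(x)

print(summary.shape, spatial_features.shape)
```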
I haven't tried it yet. But I notice the new RADIOv2.5 model has been released, which merges knowledge from DFN CLIP, DINOv2, SigLIP, and SAM via multi-teacher distillation. It looks very practical for downstream tasks. https://github.com/NVlabs/RADIO/blob/main/RADIOv2.5_tech_report.md
https://github.com/NVlabs/RADIO