Add support for directly running segmentation on video files. #46

Open. Wants to merge 10 commits into base: main

Conversation

rolson24

This PR adds support for running segmentation directly on video files instead of individual image files. It uses torchvision's built-in VideoReader object and adds only one dependency, PyAV, which provides the Python bindings for FFmpeg. Alternatively, users could compile torchvision from source with the video_reader backend if they don't want to install PyAV. I think this could really improve the ease of building demos for SAM-2, because the entire video no longer has to be extracted into frames first and then read into RAM.

I will do some more rigorous testing to make sure it doesn't affect the expected behavior, but it seems to be working for now.
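
For readers who want to see the general idea, here is a minimal sketch of lazy frame reading. It is not the code in this PR: `open_video` and `iter_frames` are hypothetical helper names, and the sketch assumes a torchvision release that still provides `torchvision.io.VideoReader` together with PyAV or the video_reader backend. Frames are yielded one at a time, so the whole video never has to sit in RAM.

```python
from typing import Any, Dict, Iterable, Iterator


def open_video(path: str) -> Iterable[Dict[str, Any]]:
    """Open a video lazily with torchvision (hypothetical helper, not from the PR)."""
    # Imported here so the rest of the sketch does not require torchvision.
    from torchvision.io import VideoReader

    # Iterating the reader yields dicts with a "data" tensor and a "pts" timestamp.
    return VideoReader(path, "video")


def iter_frames(reader: Iterable[Dict[str, Any]], stride: int = 1) -> Iterator[Any]:
    """Yield the "data" entry of every `stride`-th item produced by the reader."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    for index, packet in enumerate(reader):
        if index % stride == 0:
            yield packet["data"]
```

Usage would look like `for frame in iter_frames(open_video("clip.mp4"), stride=2): ...`, where `clip.mp4` is a placeholder path.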

@facebook-github-bot

Hi @rolson24!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@jordan-barrett-jm

Hey @rolson24 this is super interesting! I've actually been attempting to run video segmentation using longer videos (5+ minutes) and I've been running into memory allocation errors. Have you tested longer videos using this method?

@rolson24
Author

rolson24 commented Aug 1, 2024

@jordan-barrett-jm
I have not fully tested it yet, but I'm going to right now. I will let you know if it works. I also think the Ultralytics team is working hard to integrate SAM-2.0 into their library with online video segmentation, but it's not quite ready yet.

I also have a Colab notebook, based on the Roboflow one, that demonstrates this change here

@jordan-barrett-jm

Thanks! One solution I've found in the interim is mini-batching the images.
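
As a rough illustration of what mini-batching could look like (the comment does not say how it was done, so this is an assumption), the sketch below splits a list of frame file names into fixed-size chunks. `mini_batches` is a hypothetical helper; carrying tracking state from one batch to the next is not handled here.

```python
from typing import Iterator, List, Sequence


def mini_batches(frame_paths: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` frame paths, in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(frame_paths), batch_size):
        # The final chunk may be shorter than batch_size.
        yield list(frame_paths[start:start + batch_size])
```

Each chunk can then be processed on its own, so only `batch_size` frames need to be in memory at once.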

@MattLiutt

Thanks for the great work! Just curious, it seems like we still cannot add new points during inference, right? What I mean is something like real-time tracking.

@rolson24
Author

rolson24 commented Aug 1, 2024 via email

@MattLiutt

I've tested and checked the code: once inference has started, adding new points is prohibited.

@Bhumika28661773

@rolson24 How do I run and test it, and how can I find out the label assigned to each object in the video?

dcnieho added a commit to dcnieho/segment-anything-2 that referenced this pull request Sep 27, 2024
5 participants