
Face detection not working as expected #42

Open
razvanphp opened this issue Jan 5, 2025 · 7 comments

@razvanphp

Hello,

Thank you for your work and sharing it, really appreciated.

I've tried it for accurate face detection, but the results are not as good as I would expect. Some parts are not included in the mask, and limiting the prompt to "[semantic] face" also does not do a great job; the mask bleeds a lot.

Any idea how to improve this? Would fine-tuning on some face datasets help?

Here are some samples:

[two sample images attached]

@CoderZhangYx
Collaborator

Hi, you may try prompting the model with "[semantic] human face", which matches the annotation used in the datasets we train on.
Besides, if you want highly accurate segmentation masks, our model may not serve you well, because we focus more on universal tasks. You could fine-tune the model on part-segmentation datasets like humanparsing, PartImageNet, and pascal-part, or simply apply other strong specialist models.
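The prompt format described above can be sketched as a small helper. The helper itself is hypothetical and not part of the repo's API; only the "[semantic]" prefix and the "human face" category wording come from this thread:

```python
# Illustrative helper for building text prompts in the format the
# maintainer describes. The function is hypothetical; only the
# "[semantic]" token convention comes from this discussion.
def build_prompt(category: str, semantic: bool = True) -> str:
    """Prefix a category name with the [semantic] token used in training."""
    return f"[semantic] {category}" if semantic else category

# Face region only, matching the training annotation:
print(build_prompt("human face"))  # -> [semantic] human face
# Face plus hair, per the follow-up suggestion:
print(build_prompt("head"))        # -> [semantic] head
```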

@CoderZhangYx
Collaborator

Furthermore, if you want to include human hair, try prompting with "[semantic] head".

@DanahYatim

Hi, could this have to do with resizing the condition image to 224x224?
Also, for the video case: do you have any suggestions for dealing with objects that are in the scene but don't appear in the first frame? In cases like this, I get segmentations of other objects.
Is there a negative prompting option, like the negative points in SAM2?

@CoderZhangYx
Collaborator

1. The 224x224 image is responsible for providing a coarse indication of the grounded object, so I'd guess it has little to do with the detailed segmentation quality.
2. Our model cannot handle cases where the object is not in the first frame; this is dictated by the SAM2 framework. If future SAM-like models solve this problem, our model will integrate with them.
3. Our model doesn't support direct negative prompts, but you can construct your prompt with words like 'except' or 'without' to get a similar effect.
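The "except"/"without" workaround from point 3 can be sketched as follows. The helper is hypothetical; only the idea of phrasing exclusions in natural language comes from the comment above:

```python
# Hypothetical sketch: emulate negative prompting through natural-language
# exclusion words, as suggested in the reply above. The helper and the
# example categories are illustrative, not part of the repo's API.
def exclusion_prompt(target: str, excluded: str, word: str = "without") -> str:
    """Build a prompt that asks for `target` while excluding `excluded`."""
    return f"[semantic] {target} {word} {excluded}"

print(exclusion_prompt("human face", "hair"))
# -> [semantic] human face without hair
print(exclusion_prompt("person", "the background", word="except"))
# -> [semantic] person except the background
```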

@DanahYatim

DanahYatim commented Jan 14, 2025

Thanks!
The SAM2 framework supports conditioning on multiple frames; did you experiment with this?
Also wanted to ask:

1. Do you have a suggestion for how to prompt the model so that body parts, like animal legs or tails, are not cut off?
2. What is happening behind the scenes with the special token [semantic]? Fine-tuning?

@CoderZhangYx
Collaborator

1. I overlooked the multiple-frame conditioning. I will take a look at it when I'm done with my current work, thanks!
2. I suggest you simply prompt the model with "animal".
3. When I use semantic-segmentation datasets like ADE20K, I add '[semantic]' because in those datasets the category name refers to all instances within the image. This is in contrast to the original referring datasets, where the prompt refers only to a single instance. The special token resolves this conflict and makes the model behave better during joint training. By employing semantic-segmentation datasets, the model is able to segment multiple objects, and also stuff categories, when prompted with '[semantic]'.
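The contrast between the two prompt styles described in point 3 can be illustrated like this. The example phrases and the checker function are hypothetical; only the '[semantic]' convention comes from the comment:

```python
# Illustrative contrast between the two prompt styles described above.
# A referring-style prompt targets a single instance; a semantic-style
# prompt (with the [semantic] token) targets every instance of the
# category, including stuff classes.
referring_prompt = "the dog on the left"  # one instance, no token
semantic_prompt = "[semantic] dog"        # all dogs in the image

def is_semantic(prompt: str) -> bool:
    """Check whether a prompt uses the [semantic] convention."""
    return prompt.startswith("[semantic] ")

print(is_semantic(referring_prompt))  # -> False
print(is_semantic(semantic_prompt))   # -> True
```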

@DanahYatim

Great! Thank you so much for your help!
