
Face detection not working as expected #42

Open
razvanphp opened this issue Jan 5, 2025 · 7 comments

@razvanphp

Hello,

Thank you for your work and sharing it, really appreciated.

I've tried it for accurate face detection, but the results are not as good as I would expect. Some parts are not included in the mask, and limiting the prompt to "[semantic] face" also does not do a great job; the mask bleeds a lot.

Any idea how to improve this? Would fine-tuning on some face datasets help?

Here are some samples:

[two sample images attached]

@CoderZhangYx
Collaborator

Hi, you may try prompting the model with "[semantic] human face", which matches the annotation used in the datasets we train on.
Besides, if you want highly accurate segmentation masks, our model may not serve you well, because we focus more on universal tasks. You could fine-tune the model on part-segmentation datasets like humanparsing, PartImageNet, and pascal-part, or simply apply other strong specialist models.
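The prompt format described above can be sketched as a small helper. The helper itself is hypothetical and not part of the repo's API; only the "[semantic]" prefix and the "human face" category wording come from this thread:

```python
# Illustrative helper for building text prompts in the format the
# maintainer describes. The function is hypothetical; only the
# "[semantic]" token convention comes from this discussion.
def build_prompt(category: str, semantic: bool = True) -> str:
    """Prefix a category name with the [semantic] token used in training."""
    return f"[semantic] {category}" if semantic else category

# Face region only, matching the training annotation:
print(build_prompt("human face"))  # -> [semantic] human face
# Face plus hair, per the follow-up suggestion:
print(build_prompt("head"))        # -> [semantic] head
```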

@CoderZhangYx
Collaborator

Furthermore, if you want to include human hair, try prompting with "[semantic] head".

@DanahYatim

Hi, could this have to do with resizing the condition image to 224x224?
Also, for the video case: do you have any suggestions for dealing with objects that are in the scene but don't appear in the first frame? In cases like this, I get segmentations of other objects.
Is there a negative prompting option, like the negative points in SAM2?

@CoderZhangYx
Collaborator

1. The 224x224 image is responsible for providing a coarse indication of the grounded object, so I'd guess it has little to do with the detailed segmentation quality.
2. Our model cannot handle cases where the object is not in the first frame; this is dictated by the SAM2 framework. If future SAM-like models solve this problem, our model will integrate with them.
3. Our model doesn't support direct negative prompts, but you can construct your prompt with words like 'except' or 'without' to get a similar effect.
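The "except"/"without" workaround from point 3 can be sketched as follows. The helper is hypothetical; only the idea of phrasing exclusions in natural language comes from the comment above:

```python
# Hypothetical sketch: emulate negative prompting through natural-language
# exclusion words, as suggested in the reply above. The helper and the
# example categories are illustrative, not part of the repo's API.
def exclusion_prompt(target: str, excluded: str, word: str = "without") -> str:
    """Build a prompt that asks for `target` while excluding `excluded`."""
    return f"[semantic] {target} {word} {excluded}"

print(exclusion_prompt("human face", "hair"))
# -> [semantic] human face without hair
print(exclusion_prompt("person", "the background", word="except"))
# -> [semantic] person except the background
```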

@DanahYatim

DanahYatim commented Jan 14, 2025

Thanks!
The SAM2 framework supports conditioning on multiple frames; did you experiment with this?
Also wanted to ask:

1. Do you have a suggestion for how to prompt the model so that body parts, like animal legs or tails, are not cut off?
2. What is happening behind the scenes with the special token [semantic]? Fine-tuning?

@CoderZhangYx
Collaborator

1. I overlooked the multiple-frame conditioning. I will take a look at it when I'm done with my current work, thanks!
2. I suggest you simply prompt the model with "animal".
3. When I use semantic-segmentation datasets like ADE20K, I add '[semantic]' because in those datasets the category name refers to all instances within the image. This is in contrast to the original referring datasets, where the prompt refers only to a single instance. The special token resolves this conflict and makes the model behave better during joint training. By employing semantic-segmentation datasets, the model is able to segment multiple objects, and also stuff categories, when prompted with '[semantic]'.
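The contrast between the two prompt styles described in point 3 can be illustrated like this. The example phrases and the checker function are hypothetical; only the '[semantic]' convention comes from the comment:

```python
# Illustrative contrast between the two prompt styles described above.
# A referring-style prompt targets a single instance; a semantic-style
# prompt (with the [semantic] token) targets every instance of the
# category, including stuff classes.
referring_prompt = "the dog on the left"  # one instance, no token
semantic_prompt = "[semantic] dog"        # all dogs in the image

def is_semantic(prompt: str) -> bool:
    """Check whether a prompt uses the [semantic] convention."""
    return prompt.startswith("[semantic] ")

print(is_semantic(referring_prompt))  # -> False
print(is_semantic(semantic_prompt))   # -> True
```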

@DanahYatim

Great! Thank you so much for your help!
