Object detection without predicting bounding box #608

Closed
nikky4D opened this issue Jan 18, 2021 · 10 comments
Labels
documentation Improvements or additions to documentation example request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@nikky4D

nikky4D commented Jan 18, 2021

📓 New <Tutorial/Example>

Is this a request for a tutorial or for an example?
This is a request for an example of how to set up the parser and model training for an object prediction setup. Some guidance of how to

What is the task?
Object Prediction. In this case, the training set is the same as for object detection: images with bounding boxes. However, we are interested in predicting the label of a provided bounding box, not in predicting the bounding box itself. Could I set this up using FasterRCNN or DETR by removing the components that predict bounding boxes?

Is this example for a specific model?
Object detection is typically centered around finding bounding boxes and labels. Here I only want the labels, as bounding boxes are given.

Is this example for a specific dataset?
Any dataset (PASCAL VOC, etc.) would do.


Don't remove
Main issue for examples: #39

@nikky4D nikky4D added documentation Improvements or additions to documentation example request good first issue Good for newcomers help wanted Extra attention is needed labels Jan 18, 2021
@lgvaz
Collaborator

lgvaz commented Jan 19, 2021

Does the Dataset.from_images method work for your case? Check out this tutorial.

@nikky4D
Author

nikky4D commented Jan 19, 2021

Thanks for the link. My question is more about how to modify an RCNN-type network, which takes in an image and predicts a bounding box and a label, so that it instead takes in an image and a bounding box and predicts only a label. Would that just be a simple image classification problem where we focus the network's attention on the bounding box area?

@lgvaz
Collaborator

lgvaz commented Jan 19, 2021

Aaah, gotcha!

I think the simpler approach would be to just do image classification, as you suggested: crop the regions of the bboxes and feed those crops to a simple classifier.
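For anyone landing here later, the cropping step could look roughly like the sketch below. It assumes the image is an H x W x C NumPy array and the boxes are (xmin, ymin, xmax, ymax) tuples in pixel coordinates; `crop_boxes` is a hypothetical helper written for this illustration, not part of the library discussed in this issue.

```python
import numpy as np


def crop_boxes(image, boxes):
    """Return (box_index, crop) pairs for (xmin, ymin, xmax, ymax) pixel boxes.

    Hypothetical helper: boxes are clamped to the image, and boxes that end up
    empty are skipped. The index lets you look up the matching label.
    """
    height, width = image.shape[:2]
    crops = []
    for index, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        # Clamp to the image bounds and round outwards to whole pixels.
        x0 = max(0, int(np.floor(xmin)))
        y0 = max(0, int(np.floor(ymin)))
        x1 = min(width, int(np.ceil(xmax)))
        y1 = min(height, int(np.ceil(ymax)))
        if x1 <= x0 or y1 <= y0:
            continue  # degenerate box, or box entirely outside the frame
        crops.append((index, image[y0:y1, x0:x1].copy()))
    return crops


# Placeholder image and boxes, only to show the call.
image = np.zeros((100, 200, 3), dtype=np.uint8)
boxes = [(10, 20, 60, 90), (150, -5, 250, 40), (300, 300, 310, 310)]
crops = crop_boxes(image, boxes)
```

Each crop can then be resized and passed to any ordinary image classifier, with the label taken from the annotation at the same index.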

You can of course build a model that takes the image and the bbox coordinates as input, but there is currently no "easy" way of doing that.

The parser itself will not change.

Can I ask what your pipeline looks like? I'm curious how you have the bboxes but are only interested in predicting labels.

@nikky4D
Author

nikky4D commented Jan 19, 2021

Thanks for the information. It is very helpful.

I am doing image-based human action classification. That means classifying actions using static images, no video. When the system is done, I expect that the model will take detections of humans from a typical object detector, then classify the human pose to some action label.

So, to train the system, I got data from the ActivityNet challenge, AVA-Actions dataset. The dataset has label and bounding box annotations at specific timeframes for different actions. To get images, I take their videos and extract the annotated frames. This becomes the dataset that has images and bounding boxes. It needed a lot of cleaning, but I have about 500 images per class of interest for my work.

I've actually tried two image classification models (fastai transfer learning with resnet and densenet). The first model uses the full image, and the second uses images cropped to the bounding box. The second works better than the first, but both suffer on the test set, so I know something is wrong somewhere. That is why I want to try using the image and the bounding box together as input and see how it does.

Hope this helps. And thanks again for the help.

@lgvaz
Collaborator

lgvaz commented Jan 20, 2021

Have you tried to train a "normal" detection model and see how it does?

> I expect that the model will take detections of humans from a typical object detector

Why doesn't this first object detector output labels for the actions as well?

@nikky4D
Author

nikky4D commented Jan 20, 2021

For the first question: I did have that idea, and that is what brought me here from the fastai forums. When I trained the model, it predicted actions and bounding boxes but missed detections on the person of interest in my image. Since I care more about the action than the person detection, I wanted to find a way to focus the model on the bounding box without cropping the image, if possible, because cropping can distort the aspect ratio and remove context information that may be helpful for the system.
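One way to point a classifier at a box without cropping is to pass the box as an extra binary mask channel alongside the image. This is only a sketch of that idea under my own assumptions, not something proposed or tested in this thread: `add_box_mask_channel` is a hypothetical helper, the image is assumed to be an H x W x C NumPy array, and the classifier's first layer would have to be changed to accept the additional channel.

```python
import numpy as np


def add_box_mask_channel(image, box):
    """Append a mask channel that is "on" inside the box and 0 elsewhere.

    Hypothetical helper: `box` is (xmin, ymin, xmax, ymax) in pixels. The full
    image is kept, so aspect ratio and surrounding context are unchanged.
    """
    height, width = image.shape[:2]
    xmin, ymin, xmax, ymax = box
    # Clamp the box to the image bounds.
    x0, y0 = max(0, int(xmin)), max(0, int(ymin))
    x1, y1 = min(width, int(xmax)), min(height, int(ymax))
    mask = np.zeros((height, width, 1), dtype=image.dtype)
    # Use 255 for 8-bit images and 1 for float images.
    mask[y0:y1, x0:x1] = 255 if image.dtype == np.uint8 else 1
    return np.concatenate([image, mask], axis=-1)


# Placeholder RGB image; the result has four channels (RGB + box mask).
image = np.zeros((100, 200, 3), dtype=np.uint8)
stacked = add_box_mask_channel(image, (10, 20, 60, 90))
```

Whether this helps on a given dataset is an open question; it simply gives the network both the full scene and an indication of which person to classify.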

For the second question: I am making a modular system so each part can be fine-tuned separately. I already have a person detector running that works very well, so I'm just adding the action-detection-from-pose portion.

@lgvaz
Collaborator

lgvaz commented Jan 22, 2021

Got it! I'm afraid I can't offer much help here. The second point is definitely possible: you would need to add another input branch to your model that takes the bbox coordinates. But since the number of boxes per image is variable, you will need to get creative here.
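One possible way around the variable number of boxes is to turn every box into its own training sample, so the extra input branch always receives exactly four numbers. The sketch below assumes each annotation record is a plain dict with `width`, `height`, `boxes` and `labels` keys; that record layout, the label names and `flatten_to_box_samples` are hypothetical and only serve the illustration.

```python
def flatten_to_box_samples(records):
    """Expand per-image records into one sample per bounding box.

    Hypothetical helper: boxes are (xmin, ymin, xmax, ymax) in pixels and are
    normalised to the 0..1 range so they do not depend on image size.
    """
    samples = []
    for image_id, record in enumerate(records):
        width, height = record["width"], record["height"]
        for (xmin, ymin, xmax, ymax), label in zip(record["boxes"], record["labels"]):
            norm_box = (xmin / width, ymin / height, xmax / width, ymax / height)
            samples.append({"image_id": image_id, "box": norm_box, "label": label})
    return samples


# Made-up annotations: the first image has two people, the second has one.
records = [
    {"width": 200, "height": 100,
     "boxes": [(10, 20, 60, 90), (100, 10, 150, 80)],
     "labels": ["stand", "sit"]},
    {"width": 400, "height": 300,
     "boxes": [(40, 30, 200, 270)],
     "labels": ["walk"]},
]
samples = flatten_to_box_samples(records)
```

With this layout an image that contains several people is simply seen several times, once per box, and a model with an image branch plus a four-value box branch can be trained with a fixed input size.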

@nikky4D
Author

nikky4D commented Jan 22, 2021

Thank you again for all the help. When I figure out the modification, I'll put in a PR tutorial for it.

@nikky4D nikky4D closed this as completed Jan 22, 2021
@lennyjuma

> Does the Dataset.from_images method work for your case? Check out this tutorial.

This link is not reachable.

@lgvaz
Collaborator

lgvaz commented Jun 17, 2021

@lennyjuma try this one: https://airctic.com/0.8.0/inference/

3 participants