Object detection without predicting bounding box #608
Thanks for the link. My question is more about how to modify an RCNN-type network that takes in images and predicts a bounding box and label, so that it instead takes in an image and a bounding box and predicts a label only. Would that just be a simple image classification problem where we focus the network's attention on the bounding box area?
Aaah, gotcha! I think the simpler approach would be to just do image classification as you suggested. Crop the regions of the bboxes and feed them to a simple classifier. You can of course build a model that takes the image and the bbox coordinates as input, but there is currently no "easy" way of doing that. The parser itself will not change. Can I ask what your pipeline looks like? I'm curious how you have the bboxes but are only interested in predicting labels.
Thanks for the information. It is very helpful. I am doing image-based human action classification, i.e. classifying actions from static images, no video. When the system is done, I expect the model to take detections of humans from a typical object detector and then classify the human pose into an action label. To train the system, I got data from the ActivityNet challenge, the AVA-Actions dataset. The dataset has label and bounding box annotations at specific timeframes for different actions. To get images, I take their videos and extract the annotated frames. This becomes the dataset of images and bounding boxes. It needed a lot of cleaning, but I have about 500 images per class of interest for my work. I've actually tried two image classification models (fastai transfer learning with resnet and densenet). The first model uses the full image, and the second uses images cropped to the bounding box. The second works better than the first, but both suffer on the test set, so I know something is wrong somewhere. Hence I want to try using the image and bounding box as input and see how that does. Hope this helps, and thanks again for the help.
Have you tried training a "normal" detection model to see how it does?
Why doesn't the first object detector output labels for the actions as well?
For the first question: I did have that idea, and that brought me here from the fastai forums. When I trained the model, it predicted actions and bounding boxes but missed detections on the person of interest in my image. Since I care more about the action than the person detection, I wanted to see if I could focus the detection on the bounding box without cropping the image, since cropping can distort the aspect ratio and remove context information that may be helpful for the system. For the second question: I am making a modular system so each part can be finetuned separately. I already have a person detector running that works very well, so I'm just adding the action-from-pose portion.
Got it! I'm afraid I can't offer much help here. The second point is definitely possible: you would need to add another input branch to your model that takes the bbox coordinates, but since the number of inputs is variable you will need to get creative here.
Thank you again for all the help. When I figure out the modification, I'll put in a PR with a tutorial for it.
This link is not reachable.
@lennyjuma try this one: https://airctic.com/0.8.0/inference/ |
📓 New <Tutorial/Example>
Is this a request for a tutorial or for an example?
This is a request for an example of how to set up the parser and model training for an object prediction setup. Some guidance of how to
What is the task?
Object prediction. In this case, the training set is the same as for object detection: images with bounding boxes. However, we are interested in predicting the label of a provided bounding box, not the bounding box itself. Could I set this up using FasterRCNN or DETR by removing the components that predict bounding boxes?
Is this example for a specific model?
Object detection is typically centered around finding bounding boxes and labels. Here I only want the labels, since the bounding boxes are given.
Is this example for a specific dataset?
Any dataset (PASCAL VOC, etc.) would do.
Don't remove
Main issue for examples: #39