Looking for advice on increasing accuracy and speed of predictions #2036
Hi @GraysonButcher,

Thanks for the extensive documentation of your workflow so far!! That might be a contender for the most thorough description of a problem that we've ever received here 😁

Let's dig in:
You might consider using a magnetometer in your setup as it's been well validated for HTR. This paper from Adam Halberstadt even describes a cool approach for handling the data analysis piece. Not sure how it works in an arena like yours, but it could be worth a shot :) In case you're not at the limit of your camera's capacity, you could try some of the tips here, in particular the binning, since you have resolution to spare. If the issue is in acquisition, try out campy, which has an easy configuration system for messing around with encoding settings so that your acquisition PC can keep up with the high-FPS data stream, or so that you can use the GPU for accelerated encoding.
Yep, the mAP score is a best-effort metric that conveniently summarizes performance in a single number, but to use it appropriately, you'd really need to do a good bit of work to determine the true labeling variability in your data. This page describes this in a lot more detail. In SLEAP, the problem is indeed that the number of keypoints you drop will greatly affect the OKS (and consequently the mAP), even more so when you have few keypoints. I would recommend just treating it as something that lets you compare relative performance between different model configurations trained on the same data.
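For reference, the COCO-style OKS that the mAP is built on looks roughly like this (SLEAP's implementation differs in some details, like how the per-keypoint scale constants are estimated):

$$\mathrm{OKS} = \frac{\sum_i \exp\!\left(-\frac{d_i^2}{2 s^2 k_i^2}\right)\,\mathbb{1}[v_i > 0]}{\sum_i \mathbb{1}[v_i > 0]}$$

where $d_i$ is the distance between the predicted and labeled keypoint $i$, $s$ is the object scale, $k_i$ is a per-keypoint constant capturing expected labeling variability, and the sums run only over labeled keypoints. A dropped prediction contributes essentially zero to the numerator, so when you only have a handful of keypoints, each drop wipes out a large fraction of the score -- which is why OKS (and mAP) swings so much in your case.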
Not sure about the smoothing across time (SLEAP predicts on individual images without temporal dependencies), but the rest can totally happen just by chance, since model optimization has multiple layers of stochasticity. For example, it might be that several of your almost-identical frames land in the validation set by chance, leaving those poses underrepresented in the training data.
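Just to put a rough number on the "by chance" part, here's a quick standalone simulation (not SLEAP code; the frame counts are made up purely for illustration):

```python
# How often does a small cluster of near-identical frames get partially held out
# in the validation split purely by chance? Toy numbers, for illustration only.
import random

n_frames = 300       # total labeled frames (hypothetical)
n_dupes = 5          # near-identical frames of the same behavior (hypothetical)
val_fraction = 0.1   # validation split fraction
n_trials = 100_000

hits = 0
for _ in range(n_trials):
    val = set(random.sample(range(n_frames), int(n_frames * val_fraction)))
    # Treat frames 0..n_dupes-1 as the near-duplicates.
    if sum(i in val for i in range(n_dupes)) >= 2:
        hits += 1

print(f"P(at least 2 of {n_dupes} near-duplicates land in validation) ≈ {hits / n_trials:.3f}")
```

Even a single-digit-percent chance per cluster adds up once a dataset has lots of repeated poses.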
Alright, so let's get into some model optimization, starting off with your favorite model:

1. Decrease the validation fraction to 0.1, which will recover a bit more of your labels to be used for training.
2. Set your sigma to 2.5. A sigma of 5 is pretty huge, especially for trying to recover HTRs! The sigma controls the spread of the confidence maps: the bigger they are, the easier the model is to train, but the less spatially precise your estimates will be. This notebook covers some theory and has some experiments that help build an intuition for this behavior (there's also a small sketch right after this list).
3. After retraining with the above changes, try training again, but this time enable the "Use Trained Model" and "Resume Training" checkboxes. This will initialize the model with the weights from your last run. Importantly, also enable the "Online Mining" checkbox. This turns on a special optimization mode that upweights underperforming node types, encouraging the model to focus more on hard nodes that are often missing. We find this works great when the model is already doing well in general, but using it when training from scratch often leads to instability. You're welcome to try both, but I think resuming from the model that's doing well as a baseline should help you bridge that last gap.
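To build a bit of intuition for the sigma change, here's a tiny standalone sketch (plain NumPy, not SLEAP's actual confidence map code) showing how much wider the training target gets at sigma 5 vs. 2.5:

```python
# Render a Gaussian confidence map for a single keypoint at two sigmas and
# compare how spread out the "hot" region is. Larger sigma -> more pixels with
# high confidence -> easier optimization target, but a flatter peak and a less
# precise argmax (i.e., less spatially precise landmark estimates).
import numpy as np

def confidence_map(shape, keypoint_xy, sigma):
    """2D Gaussian confidence map centered on a keypoint."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    x0, y0 = keypoint_xy
    return np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * sigma ** 2))

cm_small = confidence_map((64, 64), (32.3, 30.7), sigma=2.5)
cm_large = confidence_map((64, 64), (32.3, 30.7), sigma=5.0)

# Count pixels with confidence > 0.5 as a rough measure of peak spread.
print("pixels > 0.5 at sigma=2.5:", int((cm_small > 0.5).sum()))
print("pixels > 0.5 at sigma=5.0:", int((cm_large > 0.5).sum()))
```

Doubling sigma roughly quadruples the high-confidence area, which is why the bigger setting trains easily but costs you precision on something as small as an HTR.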
Assuming the model isn't majorly misconfigured, labeling more should actually always help. We didn't think it would until recently, when we started experimenting with very large dataset sizes (10k-100k labels), and it turns out the model just keeps getting better -- even with a limited-capacity model like the ones SLEAP uses! That said, you definitely hit diminishing returns past ~1k labels, so at that point you're really just optimizing for the outliers and improving generalization across datasets/animals.
If your goal is to just increase throughput, the easiest thing is to increase the batch size. The default is 4 so that images will fit on most GPUs, but that's pretty conservative and usually won't make use of the whole GPU's capacity. Try 16 or 32 and see where you're at. You'll need to use the CLI for this.

Beyond that, you've already observed how changing the model can affect speed. I would focus on getting to an accurate model before tuning it for speed, but the thing that'll give you the most gain will definitely be setting the output stride to 4 or 8. This subsamples the confidence maps, which drastically improves speed at the cost of decreased resolution of your landmarks (i.e., you'll get more quantization error).

Give these tips a go and let us know how it goes!

Cheers,

Talmo