Can one train a top down centered instance model without retraining an outstanding centroid model every time? #1291
-
Hello, I am doing a lot of training and retraining of a top-down model with only a single instance on Google Colab. The centroid model is accurately trained every time I go through training, but of course the centered instance model is what needs more training. I understand that if I generate new training data by correcting predicted instances, those frames must also be added to the centroid model's training set, or else the centered instance model will have no centroid on those new training frames to crop and use as input for training. However, sometimes if I find that the previous training session did not improve the predictions, I might simply change some hyperparameters on the centered instance model (like different types of augmentation, adjustment of hard keypoints for OHKM, the number of filters in the model, etc.) and run the training again to see if that helps improve accuracy.

I have two adjacent questions:

1. When retraining the top-down model with no new training frames, can I reuse the previous centroid model as-is and only retrain the centered instance model?
2. If I have an already trained centroid model and I have generated additional training data in order to retrain the centered instance model, is there a way to use the previous centroid model to simply predict (rather than train) the centroids in the new training subset and add those predicted centroid frames to the input group of training frames for the centered instance model training set?

This idea may be flawed, but ultimately I want to avoid having to 'train' an already trained centroid model while still being able to retrain the centered instance model with either different hyperparameters or new training data. Thanks for your help!
-
Hi @amblypatty,

Yes! You can opt to just use the trained model instead of retraining it. There should be a "Use Trained Model" checkbox that pops up if you select an already trained model. Let us know if this helps!

Thanks,
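If it's useful, the rough equivalent outside the GUI is to only retrain the centered instance model and then point inference at both model folders. Here is a minimal sketch using the high-level Python API (the model and video paths below are just placeholders, not your actual files):

```python
import sleap

# Build a top-down predictor from the previously trained centroid model
# (reused as-is) plus the newly retrained centered instance model.
predictor = sleap.load_model([
    "models/centroid_model",           # already trained, not retrained
    "models/centered_instance_model",  # the model you just retrained
])

# Run inference on a video and save the predictions for proofreading.
video = sleap.load_video("session01.mp4")
labels = predictor.predict(video)
labels.save("predictions.slp")
```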
-
Hi @roomrys,

I am still having some trouble with retraining my model when I have new annotations: the training and validation losses don't decrease, even though I am using the same hyperparameters as the previous training session. I feel confident that the new annotations I've made should help the model learn and make more accurate predictions (I specifically worked on fixing the erroneous predictions, nearly doubling the number of training annotations compared to the previous training session, from 164 to 300 labeled frames). The following are my general steps:
Am I doing something wrong, like not actually making use of the trained centroid model? Additionally, should I be adjusting the learning rate when retraining the centered_instance model to get better loss and val_loss behavior? If so, in which direction and by how much would one expect to adjust it? For example, if the first training session had an initial learning rate of 0.0001 and stopped at 1.5e-06, would I adjust the initial learning rate of the retraining configuration up to 0.0005 or 0.001, or down to something like 0.00005?
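For reference, this is roughly how I would try applying such a change before rerunning training (I'm assuming the initial learning rate lives under the `optimization` section of the exported `training_config.json`, so the key path below is a guess on my part, and the file paths are placeholders):

```python
import json

# Load the training profile saved with the previous centered instance run
# (path is a placeholder for my actual model folder).
with open("models/centered_instance_model/training_config.json") as f:
    cfg = json.load(f)

# Bump the initial learning rate, e.g. from 0.0001 up to 0.0005,
# assuming the config stores it under optimization.initial_learning_rate.
cfg["optimization"]["initial_learning_rate"] = 0.0005

# Write out a new profile to use for the next training run.
with open("centered_instance_retrain.json", "w") as f:
    json.dump(cfg, f, indent=2)
```

I would then point the next training run at this new profile.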
Hi @amblypatty,
Apologies for the delay! Let me tackle some of these:
👍
We're chasing down a bug right now where it looks like this functionality might not be working from the GUI due to how we handle the CLI vs config specification of resuming training. We're working on a hotfix that we hope will be ready for release next week.
In the meantime, you can try using the `--base_ch…`