Help!!! How to save the current training checkpoint in colab #5620

ZFbaby · 2024-09-11T02:59:36Z

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

macOS

Python Version

3.7

MediaPipe Model Maker version

No response

Task name (e.g. Image classification, Gesture recognition etc.)

Object detection

Describe the actual behavior

Training progress is lost when the Colab runtime is reset, and I do not know how to save the current training checkpoint.

Describe the expected behaviour

A way to save the current training checkpoint in Colab and resume training from it later.

Standalone code/steps you may have used to try to get what you need

I am running the following training code in Colab:

model = object_detector.ObjectDetector.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options)

Once the Colab session has been running for more than 12 hours, all data is cleared and training cannot continue. How can I save the training progress so that I can resume from the same point next time? Is there any code I can use as a reference? I have not found a working solution yet.


kuaashish · 2024-09-11T06:39:19Z

Hi @ZFbaby,

We do not currently offer a direct API for this, but you can get the desired behavior with custom code. Please refer to the create method and replicate its function calls, excluding train_model and save_float_ckpt. Once the ObjectDetector instance has been initialized without the training step, you can load the model with restore_float_ckpt.
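The steps just described could be sketched roughly as follows. This is a minimal sketch, not the library's own code: `create_without_training` is a hypothetical helper, and the constructor keyword arguments (`model_spec`, `label_names`, `hparams`, `model_options`) are assumptions that should be checked against the `create` method in the installed Model Maker version. Only `restore_float_ckpt` is named in this thread.

```python
def create_without_training(detector_cls, model_spec, label_names,
                            hparams, model_options):
    """Build a detector and restore its weights, skipping training.

    detector_cls is expected to be the ObjectDetector class. The keyword
    names passed to it below are assumptions, not confirmed in this thread.
    """
    detector = detector_cls(
        model_spec=model_spec,
        label_names=label_names,
        hparams=hparams,
        model_options=model_options,
    )
    # Deliberately omitted: the training call and the float checkpoint
    # save that the create method would normally run at this point.
    detector.restore_float_ckpt()
    return detector
```

With Model Maker installed, `detector_cls` would be `object_detector.ObjectDetector`, and `hparams` must carry the same `export_dir` as the original run.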

Note that this approach requires using the same hparams.export_dir as the initial training run, where the float checkpoint was saved. To avoid this dependency, you can modify the restore_float_ckpt method with the following custom code:

self._model.load_checkpoint(
    <INSERT CUSTOM PATH>,
    include_last_layer=True,
)
self._model.compile()
self._is_qat = False
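If editing the library method in place is inconvenient, the same calls could be wrapped in a standalone helper. This is a sketch under assumptions: `restore_from_path` is a hypothetical name, it reaches into the private attributes `_model` and `_is_qat` shown in the snippet above, and it assumes the detector's underlying model has already been built.

```python
def restore_from_path(detector, checkpoint_path):
    """Load a float checkpoint from an arbitrary path into a detector.

    Mirrors the snippet above, with the detector passed in explicitly
    instead of being accessed as self.
    """
    if not checkpoint_path:
        raise ValueError("checkpoint_path must be a non-empty path")
    detector._model.load_checkpoint(
        checkpoint_path,
        include_last_layer=True,
    )
    detector._model.compile()
    # The restored weights are float weights, not quantization-aware ones.
    detector._is_qat = False
    return detector
```

Because it depends on private attributes, a helper like this may break when the library is upgraded.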

For further clarification, please refer to the related issue #5522.
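Since a Colab runtime reset clears local files, the export directory also has to be kept somewhere persistent for either approach to work. A minimal sketch of copying it to and from persistent storage is shown below. `copy_dir` is a hypothetical helper, and every path in the usage note is a placeholder. The sketch only preserves files that already exist on disk when it runs; this thread does not confirm which files Model Maker writes while training is still in progress.

```python
import shutil
from pathlib import Path


def copy_dir(src, dst):
    """Copy the directory src to dst, replacing any earlier copy at dst."""
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    if dst.exists():
        # Remove the stale copy so files deleted from src do not linger.
        shutil.rmtree(dst)
    shutil.copytree(src, dst)
    return dst
```

In Colab, one option is to mount Google Drive with `google.colab.drive.mount('/content/drive')`, call `copy_dir(export_dir, '/content/drive/MyDrive/od_backup')` after training, and call it with the arguments swapped at the start of the next session, before restoring the checkpoint.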

Thank you.

github-actions · 2024-09-19T01:57:55Z

This issue has been marked stale because it has no recent activity since 7 days. It will be closed if no further activity occurs. Thank you.

ZFbaby added the type:modelmaker label Sep 11, 2024

google-ml-butler bot assigned kuaashish Sep 11, 2024

kuaashish added the os:macOS, platform:python, task:object detection, and stat:awaiting response labels Sep 11, 2024

github-actions bot added the stale label Sep 19, 2024
