Help!!! How to save the current training checkpoint in colab #5620

Open
ZFbaby opened this issue Sep 11, 2024 · 2 comments
Labels: os:macOS, platform:python, stale, stat:awaiting response, task:object detection, type:modelmaker

Comments


ZFbaby commented Sep 11, 2024

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

macOS

Python Version

3.7

MediaPipe Model Maker version

No response

Task name (e.g. Image classification, Gesture recognition etc.)

Object detection

Describe the actual behavior

Colab clears all data after running for more than 12 hours, so the training progress is lost.

Describe the expected behaviour

The current training checkpoint can be saved in Colab, and training can be resumed from it later.

Standalone code/steps you may have used to try to get what you need

I use Colab to run the following code for training:

model = object_detector.ObjectDetector.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options)

After the Colab session has run for more than 12 hours, all data is cleared and training cannot continue. How can I save the training progress so that I can resume from the same point next time? Is there any reference code for this? I haven't found a working solution yet.

kuaashish (Collaborator) commented

Hi @ZFbaby,

We currently do not offer a direct API for this functionality, but you can achieve the desired behavior with custom code. Please refer to the create method and replicate its function calls, excluding train_model and save_float_ckpt. After initializing the ObjectDetector instance without the training step, you can load the saved model weights by calling restore_float_ckpt.
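
The steps above can be sketched roughly as follows. This is a minimal sketch, not a MediaPipe API: build_detector_from_checkpoint is a hypothetical helper, and it assumes the ObjectDetector constructor accepts model_spec, hparams, and model_options keyword arguments in the way create passes them, which should be checked against the installed Model Maker source. The detector class is passed in as an argument so the sketch does not depend on a particular import path.

```python
def build_detector_from_checkpoint(detector_cls, model_spec, hparams, model_options):
    """Build a detector without training, then restore its float checkpoint.

    Hypothetical helper: mirrors what `create` is described as doing, minus
    the train_model and save_float_ckpt calls. The constructor keyword names
    are assumptions; verify them in your Model Maker version.
    """
    # Same construction step as `create`, but with no training afterwards.
    detector = detector_cls(
        model_spec=model_spec,
        hparams=hparams,
        model_options=model_options,
    )
    # Load the float checkpoint that the earlier training run saved
    # under hparams.export_dir.
    detector.restore_float_ckpt()
    return detector
```

With the real library, detector_cls would be object_detector.ObjectDetector, and the remaining arguments would be built the same way create builds them from options.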

Note that this approach requires using the same hparams.export_dir as the initial training run in which the float checkpoint was saved. To avoid this dependency, you can modify the restore_float_ckpt method with the following custom code:

self._model.load_checkpoint(
    <INSERT CUSTOM PATH>,
    include_last_layer=True,
)
self._model.compile()
self._is_qat = False
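
As an alternative to editing the library source, the same statements could be wrapped in a standalone function. This is a minimal sketch, assuming the private attributes _model and _is_qat exist on the detector exactly as in the snippet above; restore_float_ckpt_from and ckpt_path are hypothetical names used only for illustration.

```python
def restore_float_ckpt_from(detector, ckpt_path):
    """Restore a float checkpoint from an arbitrary path (hypothetical helper).

    Applies the same three statements as the snippet above, but to a detector
    passed in as an argument rather than to `self`.
    """
    # Load weights from the custom path, including the last layer.
    detector._model.load_checkpoint(ckpt_path, include_last_layer=True)
    # Compile the model after the weights have been loaded.
    detector._model.compile()
    # Clear the quantization-aware-training flag, as the snippet does.
    detector._is_qat = False
    return detector
```

Because this relies on private attributes, it may stop working between Model Maker releases.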

For further clarification, please refer to the related issue #5522.

Thank you.


This issue has been marked stale because it has had no activity for 7 days. It will be closed if no further activity occurs. Thank you.
