After training finished, resuming training showed that the learning rate curve had reversed. #88

Open
Mr47121836 opened this issue May 14, 2024 · 6 comments

Comments

@Mr47121836

image
image
This is my training log and my TensorBoard log.

@p0p4k
Owner

p0p4k commented May 14, 2024

Did you stop and continue training?

@Mr47121836
Author

Mr47121836 commented May 14, 2024 via email

@Mr47121836
Author

> Did you stop and continue training?

Yes. When I stop and continue training, I find that the learning rate does not pick up from the last learning rate of the previous run. I also found that the original VITS behaves the same way.
This is the VITS training run:
image

@p0p4k
Owner

p0p4k commented May 16, 2024

Yeah, it's normal. The checkpoint was saved at one step and the resumed run continues from that step, while the first run kept training for a few more steps after that checkpoint. TensorBoard shows everything from both runs, so the steps overlap. You can sort by time by switching the horizontal axis to Wall with the button on the left side of TensorBoard.
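To make the overlap concrete, here is a minimal sketch in plain Python. It is only an illustration, not the repository's training code: the helper `simulate_run`, the step counts, and the per-epoch exponential decay schedule are all assumptions chosen for the example.

```python
# Hypothetical simulation: every name and number here is an assumption
# for illustration, not taken from the repository.

def simulate_run(start_step, end_step, steps_per_epoch, base_lr, decay, wall_start):
    """Return (wall_time, step, lr) records for one training run.

    The learning rate is assumed to decay exponentially once per epoch.
    """
    records = []
    for i, step in enumerate(range(start_step, end_step + 1)):
        epoch = step // steps_per_epoch
        lr = base_lr * decay ** epoch
        records.append((wall_start + i, step, lr))
    return records

CHECKPOINT_STEP = 300  # last checkpoint written before the run was stopped

# First run: the checkpoint is at step 300, but training went on to step 450.
first_run = simulate_run(0, 450, 100, 2e-4, 0.9, wall_start=0)
# Resumed run: starts again from the checkpoint step, later in wall time.
resumed_run = simulate_run(CHECKPOINT_STEP, 600, 100, 2e-4, 0.9, wall_start=1000)

# Both runs write to the same log directory, so the board sees both.
merged = first_run + resumed_run
overlap_steps = sorted({s for _, s, _ in first_run} & {s for _, s, _ in resumed_run})

last_lr_first = first_run[-1][2]      # lr at step 450 of the first run
first_lr_resumed = resumed_run[0][2]  # lr at step 300 of the resumed run

# Ordered by wall time, the step counter jumps backwards at the resume point.
steps_by_wall = [s for _, s, _ in sorted(merged)]

print("overlapping steps:", overlap_steps[0], "to", overlap_steps[-1])
print("lr at end of first run:", last_lr_first)
print("lr at start of resumed run:", first_lr_resumed)
```

Under these assumptions the resumed run starts with a higher learning rate than the one logged at the end of the first run, simply because it restarts from an earlier step. Plotted against step, the two runs draw over each other in the overlapping range, which can look like the curve going backwards.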

@Mr47121836
Author

> Yeah, it's normal. The checkpoint was saved at one step and the resumed run continues from that step, while the first run kept training for a few more steps after that checkpoint. TensorBoard shows everything from both runs, so the steps overlap. You can sort by time by switching the horizontal axis to Wall with the button on the left side of TensorBoard.

But when I continued training, I found a huge change in the curves.
image
image
image
I think it can't load the checkpoint?

@Mr47121836
Author

> Yeah, it's normal. The checkpoint was saved at one step and the resumed run continues from that step, while the first run kept training for a few more steps after that checkpoint. TensorBoard shows everything from both runs, so the steps overlap. You can sort by time by switching the horizontal axis to Wall with the button on the left side of TensorBoard.
>
> But when I continued training, I found a huge change in the curves. I think it can't load the checkpoint?

image
The total loss shows a huge change.
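One rough way to tell a harmless step overlap from a checkpoint that was not restored is to compare the loss logged just before the checkpoint step with the loss logged just after resuming. The sketch below is a heuristic under stated assumptions: the function `resume_looks_consistent`, the 20% tolerance, the window size, and the sample values are all hypothetical, and noisy losses may need a wider window or tolerance.

```python
# Hypothetical heuristic: names, tolerance and sample values are assumptions.

def resume_looks_consistent(first_run, resumed_run, checkpoint_step,
                            rel_tol=0.2, window=3):
    """Compare mean loss around the checkpoint step in two runs.

    first_run / resumed_run: dicts mapping step -> logged loss.
    Returns True if the mean of the first `window` resumed values is
    within `rel_tol` (relative) of the mean of the last `window` values
    logged at or before the checkpoint step in the first run.
    """
    before = [first_run[s] for s in sorted(first_run) if s <= checkpoint_step][-window:]
    after = [resumed_run[s] for s in sorted(resumed_run) if s >= checkpoint_step][:window]
    if not before or not after:
        raise ValueError("not enough logged points around the checkpoint step")
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return abs(mean_after - mean_before) <= rel_tol * abs(mean_before)

# Made-up example values, for illustration only.
first = {100: 40.0, 200: 38.0, 300: 37.0, 400: 36.5}
smooth_resume = {300: 37.2, 400: 36.4, 500: 36.0}
jumpy_resume = {300: 80.0, 400: 75.0, 500: 70.0}

ok_smooth = resume_looks_consistent(first, smooth_resume, checkpoint_step=300)
ok_jumpy = resume_looks_consistent(first, jumpy_resume, checkpoint_step=300)
print("smooth resume consistent:", ok_smooth)
print("jumpy resume consistent:", ok_jumpy)
```

If the check fails on real logs, it would be worth confirming which checkpoint files the resumed run actually loaded and which step it reported at startup.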
