Early-stopping does not work properly in Keras 3 when used in a for loop #20256
Comments
Hi @Senantq - Could you share the dataset so that I can reproduce the issue?
Hi @mehtamansi29
Hi @Senantq - Thanks, but the Drive link is not accessible to me. Can you provide an accessible link?
@Senantq Some possible causes that wouldn't be a bug:
I do not see further differences, but it may depend on the actual code if it differs from what is included here. In the OP, a standard classification task would normally use SparseCE or CE, but I assume the OP knows this and the current loss is used for a reason. It is easier to help when the issue includes a minimal, self-contained code snippet. Datasets are very easy to load: Example.
Hi @Senantq - I am unable to run your exact code with your dataset because your Drive link is not accessible. However, I ran your model with some of its layers on the MNIST dataset with the same early-stopping callback, and it seems to work fine. See the attached gist for reference.
Hi everyone,
The code is run as a .py script, so the problem does not come from there.
It could be, maybe, but then I don't see why it works perfectly fine with TF 2.15/Keras 2.
This is completely intentional, thank you for the reminder.
Understood. I will try to provide the simplest possible code next time, but I was unsure because of the particularities of the training here. I am also encountering another problem with the very same script on a cluster, where the code stops within the first 30 minutes due to an OOM error on an A100, but runs for 7 h straight on a V100, which has 8 GB less memory than the A100. So I am beginning to suspect a memory leak that could be due to the CUDA libs. Thank you for the time spent.
Hi @Senantq -
Thanks for the code. This is what I get after running it.
Code:
This means that no images are coming through during training. Because of the loop, the model is initialized and trained for a few epochs, and the iteration stops once it receives zero training images.
The fact that one of the main folders (here Caucasians) has no training images at the beginning of the 'proportion in prop' for loop is expected. This is for research purposes related to my PhD in psychology. But the model should still receive plenty of training images from the other main folder (Afro_Americans, something like 20*130 images). I don't think this should stop the training, however.
Hello,
I am using Keras 3.5 with TF 2.17. My code is more or less the following (but it is not a grid search as in the real code I also increment some other variables that are not directly linked to the network):
However, when I run it, only the very first run in the whole code works fine. All the others stop after something like 1 or 2 epochs, even though the 'val_mse' value is decreasing. I ran the same code with Keras 2.15.0 (tensorflow 2.15.0.post1) and it worked fine.
Any help is much appreciated, thank you
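For readers trying to reason about this symptom, here is a minimal, pure-Python sketch of one way an early-stopping object reused across loop iterations could stop later runs after 1 or 2 epochs even while the monitored value decreases. It is only an illustration under stated assumptions: `ToyEarlyStopper`, `run`, and the loss values are hypothetical and are not Keras code, and the thread does not confirm that this is the cause of the behavior reported here.

```python
class ToyEarlyStopper:
    """Toy stand-in for an early-stopping callback (hypothetical, NOT the Keras implementation)."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("inf")  # lowest monitored value seen so far
        self.wait = 0             # epochs since the last improvement

    def should_stop(self, value):
        # An improvement resets the patience counter.
        if value < self.best:
            self.best = value
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience


def run(stopper, val_losses):
    """Return how many epochs ran before the stopper halted training."""
    for epoch, value in enumerate(val_losses, start=1):
        if stopper.should_stop(value):
            return epoch
    return len(val_losses)


# Made-up validation losses for two successive trainings in a loop.
first_run = [1.0, 0.5, 0.2, 0.1]   # decreasing
second_run = [0.9, 0.7, 0.5, 0.3]  # also decreasing, but never below 0.1

# Same object reused: `best` from the first run carries over to the second.
shared = ToyEarlyStopper(patience=2)
epochs_shared = [run(shared, first_run), run(shared, second_run)]

# New object per run: each training starts with a clean state.
epochs_fresh = [run(ToyEarlyStopper(patience=2), r) for r in (first_run, second_run)]

print(epochs_shared)  # [4, 2]
print(epochs_fresh)   # [4, 4]
```

If something similar were happening in the real script, a cautious workaround to try would be constructing a new `keras.callbacks.EarlyStopping` instance inside the loop, next to the model creation, rather than reusing one instance across calls to `fit`.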