Discrepancy between training logs and manually evaluating metrics #679

Open
JulienVig opened this issue May 29, 2024 · 0 comments
Labels: bug (Something isn't working), discojs (Related to Disco.js)


JulienVig (Collaborator) commented May 29, 2024

While training a model in the webapp, the training accuracy reported by the TensorFlow.js `fit` method can differ widely from the accuracy obtained by evaluating the model manually, especially when the model contains batch norm layers (as MobileNet does).

For example, calling `model.evaluateDataset` on the training dataset after each epoch can show diverging trends: the tfjs accuracy logs rise to 1 while the manual evaluation stays constant around random chance, or even drops to 0.
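For reference, this is roughly how the two numbers can be compared; a minimal sketch, assuming a model already compiled with an accuracy metric and a pre-batched `{xs, ys}` dataset (the helper name `trainAndCompare` and the logging are mine, not Disco.js code):

```ts
import * as tf from '@tensorflow/tfjs';

// Sketch: log the accuracy aggregated by fit() next to a manual
// evaluation of the end-of-epoch model on the same training dataset.
async function trainAndCompare(
  model: tf.LayersModel,
  trainingSet: tf.data.Dataset<{ xs: tf.Tensor; ys: tf.Tensor }>,
  epochs: number
): Promise<void> {
  await model.fitDataset(trainingSet, {
    epochs,
    callbacks: {
      onEpochEnd: async (epoch, logs) => {
        // Accuracy reported by fit(): aggregated batch by batch
        console.log(`epoch ${epoch}: fit acc = ${logs?.acc}`);
        // Accuracy of the model as it stands at the end of the epoch
        const [, acc] = (await model.evaluateDataset(
          trainingSet
        )) as tf.Scalar[];
        console.log(`epoch ${epoch}: eval acc = ${(await acc.data())[0]}`);
      },
    },
  });
}
```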

Similarly, using the webapp to test a model we just trained, with the training set selected as the test set, yields different results from what is reported in the training board (which shows the tfjs training logs).

The difference stays small for small networks, but the accuracies completely diverge when doing transfer learning with a pre-trained model such as MobileNet.

A small difference is expected from how the tfjs `fit` method computes accuracy: since the model is updated after each batch, the reported accuracy is an aggregate over many model versions rather than a single model evaluated on the whole training set.
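As a toy illustration (the numbers are made up): if the per-batch accuracies rise within an epoch as the model improves, the value `fit` reports is their average, not the accuracy of the final weights:

```ts
// Hypothetical per-batch accuracies recorded by fit() within one epoch,
// rising as the weights improve batch after batch.
const perBatchAcc = [0.2, 0.5, 0.8, 1.0];
const fitReported = perBatchAcc.reduce((a, b) => a + b) / perBatchAcc.length;
console.log(fitReported); // 0.625 — an average over four model versions
// Evaluating the final model on the full training set could instead give
// ~1.0, so a modest gap between the two numbers is expected even without
// batch norm in the picture.
```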

This is related to keras-team/keras#6977, which seems to blame dropout and batch normalization layers. I did not manage to mitigate the issue by following the fixes mentioned there (sometimes because TensorFlow.js does not allow certain operations).

This Stack Overflow post reports a similar issue during transfer learning, solved by retraining all batch normalization layers so that their statistics fit the new dataset.
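If we wanted to try that workaround in tfjs, something along these lines should select the batch norm layers of an otherwise frozen base model; a sketch under the assumption that the tfjs `trainable` flag behaves like Keras's here (`unfreezeBatchNorm` is a hypothetical helper, not existing Disco.js code):

```ts
import * as tf from '@tensorflow/tfjs';

// Sketch: keep the pre-trained base frozen except its batch norm layers,
// so their statistics can adapt to the new dataset during fine-tuning.
function unfreezeBatchNorm(baseModel: tf.LayersModel): void {
  for (const layer of baseModel.layers) {
    // tfjs reports 'BatchNormalization' as the class name of BN layers
    layer.trainable = layer.getClassName() === 'BatchNormalization';
  }
  // Note: the model must be (re)compiled for the change to take effect.
}
```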

Main points:

  • If the webapp training board (i.e., the tfjs `fit` method training logs) shows a certain accuracy, it doesn't mean that evaluating the model on the same training set will yield the same accuracy.
  • Empirically, large discrepancies seem to occur only when models contain batch norm layers; models without batch norm show a small and expected difference. I didn't manage to mitigate the issue (mostly because the known fixes are in Python and tfjs limits our options), but it is worth investigating further.
JulienVig added the bug and discojs labels on May 29, 2024