ggml/ex: ref. CEL, ggml_backend_sched for MNIST #976
Conversation
Force-pushed 9216056 to caf425b
There seems to be an issue with CPU MNIST training where the validation loss is much too low.
I think I figured out the problem:
In …
You need to call …
Adding calls to …
I don't think that's feasible without adding too much complexity, …
The instances where I think a reallocation could maybe be causing problems:
Do you think it would be feasible to allocate separate graphs for the forward pass, the backward pass, and the optimizer step?
Everything can be reallocated between graph evaluations, the goal of ggml-alloc (the …). I can see two ways to handle this: …
Actually, that won't work. Both the forward and backward pass need to be in the same graph because the backward pass may need to use data from the forward pass that cannot be overwritten until then.
I think you could have a separate graph and ggml-alloc (or sched) for the optimizer only. To do so, you would have to flag all the inputs to the optimizer as graph outputs.
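As a rough illustration of that suggestion (not code from this PR), the optimizer step could be built into its own graph, with the tensors it reads flagged via `ggml_set_output` so their buffers are not reused between graph evaluations. The helper name `build_opt_graph`, the `params`/`grads` arrays, and the plain SGD update are all hypothetical:

```c
#include "ggml.h"

// Hypothetical helper, not part of this PR: build a separate graph containing
// only the optimizer step (plain SGD here). Flagging each gradient with
// ggml_set_output() tells ggml-alloc / ggml_backend_sched to treat it as a
// graph output, so its memory is not reused before this graph is evaluated.
static struct ggml_cgraph * build_opt_graph(
        struct ggml_context * ctx,
        struct ggml_tensor ** params,
        struct ggml_tensor ** grads,
        int n_params, float lr) {
    struct ggml_cgraph * gopt = ggml_new_graph(ctx);

    for (int i = 0; i < n_params; ++i) {
        ggml_set_output(grads[i]); // keep gradient data valid across graphs

        // w <- w - lr * grad, written back in place
        struct ggml_tensor * w_new = ggml_sub(ctx, params[i],
                ggml_scale(ctx, grads[i], lr));
        ggml_build_forward_expand(gopt, ggml_cpy(ctx, w_new, params[i]));
    }
    return gopt;
}
```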
Force-pushed caf425b to e8c7030
Force-pushed e8c7030 to c7d77f7
I was not able to nail down the exact problem with moderate effort. For now I'll just revert the MNIST changes; once I have a more high-level API for training I'll adapt and expand …
This PR refactors the CPU implementation of cross entropy loss to avoid false sharing (from the partial sums being in the same cache line) as well as potential issues with the loop variables being `int`. It also adds `ggml_backend_sched` support for the MNIST example.
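For context on the false-sharing point, here is a minimal sketch of the pattern such a refactor aims for: thread-local accumulation, cache-line padding of the per-thread slots, and `int64_t` loop counters. This is an illustrative example, not the actual ggml kernel:

```c
// Illustrative sketch (not the actual ggml source): computing a cross entropy
// sum across threads without false sharing. Each thread accumulates into a
// local variable and writes once to its own cache-line-padded slot.
#include <stdint.h>

#define CACHE_LINE_SIZE 64

struct padded_sum {
    float sum;
    char  pad[CACHE_LINE_SIZE - sizeof(float)];
};

// One thread's share of the reduction. int64_t counters avoid the overflow
// problems that plain int loop variables can have on very large tensors.
static void cross_entropy_partial(
        const float * log_probs,   // precomputed log-softmax values
        const float * labels,      // target probabilities
        int64_t i0, int64_t i1,    // this thread's element range
        struct padded_sum * out) { // this thread's private slot
    float local = 0.0f;
    for (int64_t i = i0; i < i1; ++i) {
        local -= labels[i] * log_probs[i];
    }
    out->sum = local; // single write, no repeated updates to shared memory
}
```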