ggml/examples: add backend support for numerical optimization #949

Merged 36 commits on Sep 20, 2024.

Commits:

- `a3c341d` CUDA eval works (JohannesGaessler, Sep 2, 2024)
- `03c5b72` stochastic gradient descent op (JohannesGaessler, Sep 2, 2024)
- `1b2a4e5` Adam except decay (JohannesGaessler, Sep 6, 2024)
- `06bf41b` CUDA CROSS_ENTROPY_LOSS_BACK (JohannesGaessler, Sep 6, 2024)
- `fd31a57` CUDA mnist-fc training works (JohannesGaessler, Sep 7, 2024)
- `dacab7b` backend CLI arg (JohannesGaessler, Sep 10, 2024)
- `c7adfba` refactor gguf load (JohannesGaessler, Sep 10, 2024)
- `7094b55` remove sched from opt_step_adam (JohannesGaessler, Sep 11, 2024)
- `2040338` implement l1 regularization (weight decay) (JohannesGaessler, Sep 11, 2024)
- `5d687c0` extra call to add optimizer (JohannesGaessler, Sep 12, 2024)
- `16cb38f` initialize gradients with ggml_graph_reset (JohannesGaessler, Sep 12, 2024)
- `3e93361` gradient accumulation (JohannesGaessler, Sep 13, 2024)
- `14d19f6` increment iter per eval instead of epoch (JohannesGaessler, Sep 13, 2024)
- `7dd2c94` adjust backend interfaces (JohannesGaessler, Sep 13, 2024)
- `7e12e32` fix ggml_graph_reset without backend (JohannesGaessler, Sep 13, 2024)
- `c37fb9a` fix ggml graph export/import (JohannesGaessler, Sep 14, 2024)
- `46722f9` fixup (JohannesGaessler, Sep 14, 2024)
- `544d86a` rename (JohannesGaessler, Sep 14, 2024)
- `1c0a888` revert ggml_opt changes (JohannesGaessler, Sep 14, 2024)
- `2472d51` more general CUDA repeat_back (JohannesGaessler, Sep 14, 2024)
- `6e64c8c` update documentation, fix CNN (JohannesGaessler, Sep 14, 2024)
- `bde1131` validation split (JohannesGaessler, Sep 14, 2024)
- `7b90357` add clarifying comment (JohannesGaessler, Sep 15, 2024)
- `d813691` optimize PyTorch training (JohannesGaessler, Sep 15, 2024)
- `cf0f60e` adjust buffer size, thread count (JohannesGaessler, Sep 15, 2024)
- `d07b6e3` fix 0.0f validation split (JohannesGaessler, Sep 16, 2024)
- `c1d13df` Update examples/mnist/mnist-common.cpp (JohannesGaessler, Sep 16, 2024)
- `478472b` fix gradient accumulation (JohannesGaessler, Sep 16, 2024)
- `1d0e3ca` tensor flag for accumulators -> tensor hash set (JohannesGaessler, Sep 17, 2024)
- `db89f4e` Update include/ggml.h (JohannesGaessler, Sep 17, 2024)
- `02e1a37` Update tests/test-backend-ops.cpp (JohannesGaessler, Sep 17, 2024)
- `dbcd543` Update tests/test-backend-ops.cpp (JohannesGaessler, Sep 17, 2024)
- `594a143` fix test prints (JohannesGaessler, Sep 18, 2024)
- `0642b69` Update src/ggml-backend.c (JohannesGaessler, Sep 18, 2024)
- `00b43cf` better CUDA support for noncontiguous out_prod (JohannesGaessler, Sep 18, 2024)
- `461b648` add comment (JohannesGaessler, Sep 20, 2024)

examples/mnist/README.md (122 changes: 69 additions & 53 deletions)
@@ -18,7 +18,7 @@ $ python3 mnist-train-fc.py mnist-fc-f32.gguf

...

Test loss: 0.069983+-0.009196, Test accuracy: 97.94+-0.14%
Test loss: 0.066051+-0.011630, Test accuracy: 98.07+-0.14%

Model tensors saved to mnist-fc-f32.gguf:
fc1.weight (500, 784)
@@ -28,7 +28,7 @@ fc2.bias (10,)
```

The training script includes an evaluation of the model on the test set.
To evaluate the model using GGML, run:
To evaluate the model on the CPU using GGML, run:

```bash
$ ../../build/bin/mnist-eval mnist-fc-f32.gguf data/MNIST/raw/t10k-images-idx3-ubyte data/MNIST/raw/t10k-labels-idx1-ubyte
@@ -37,45 +37,50 @@ ________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________######__________________
____________________________########____________________
________________________########________________________
____________________########________________##__________
__________________######____________________##__________
________________######______________________####________
______________######________________________####________
____________######__________________________####________
____________####____________________________####________
__________####______________________________####________
__________####______________________________####________
__________##________________________________####________
__________##______________________________####__________
__________##____________________________######__________
__________##__________________________######____________
____________##____________________########______________
____________##########################__________________
______________##################________________________
________________________________________________________
________________________________________________________
__________________________________####__________________
______________________________########__________________
__________________________##########____________________
______________________##############____________________
____________________######________####__________________
__________________________________####__________________
__________________________________####__________________
________________________________####____________________
______________________________####______________________
________________________##########______________________
______________________########__####____________________
________________________##__________##__________________
____________________________________##__________________
__________________________________##____________________
__________________________________##____________________
________________________________##______________________
____________________________####________________________
__________##____________######__________________________
__________##############________________________________
________________####____________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
mnist_graph_eval: trying to load a ggml graph from mnist-fc-f32.gguf
ggml_graph_import: invalid magic number, got 46554747
mnist_graph_eval: could not load a ggml graph from mnist-fc-f32.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
mnist_model: using CPU backend
mnist_model_init_from_file: loading model weights from 'mnist-fc-f32.gguf'
mnist_model_init_from_file: model arch is mnist-fc
mnist_model_init_from_file: successfully loaded weights from mnist-fc-f32.gguf
main: loaded model in 1.52 ms
mnist_model_eval: model evaluation on 10000 images took 26.65 ms, 2.66 us/image
main: predicted digit is 0
main: test_loss=0.069983+-0.009196
main: test_acc=97.94+-0.14%
main: loaded model in 13.03 ms
mnist_model_eval: model evaluation on 10000 images took 95.02 ms, 9.50 us/image
main: predicted digit is 3
main: test_loss=0.066051+-0.009343
main: test_acc=98.07+-0.14%
```

In addition to the evaluation on the test set, the GGML evaluation also prints a random image from the test set as well as the model prediction for said image.
To train a fully connected model using GGML, run:
To train a fully connected model on the CPU using GGML, run:

``` bash
$ ../../build/bin/mnist-train mnist-fc mnist-fc-f32.gguf data/MNIST/raw/train-images-idx3-ubyte data/MNIST/raw/train-labels-idx1-ubyte
@@ -96,12 +101,12 @@ $ python3 mnist-train-cnn.py mnist-cnn-f32.gguf

...

Test loss: 0.046456
Test accuracy: 98.40%
Test loss: 0.045483
Test accuracy: 98.56%
GGUF model saved to 'mnist-cnn-f32.gguf'
```

The saved model can be evaluated using the `mnist-eval` binary:
The saved model can be evaluated on the CPU using the `mnist-eval` binary:

```bash
$ ../../build/bin/mnist-eval mnist-cnn-f32.gguf data/MNIST/raw/t10k-images-idx3-ubyte data/MNIST/raw/t10k-labels-idx1-ubyte
@@ -111,50 +116,61 @@ ________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________####____________________________
__________________________##____________________________
__________________________##____________________________
__________________________##____________________________
__________________________##____________________________
__________________________##____________________________
____________________________##__________________________
____________________________##__________________________
____________________________##__________________________
______________________________##________________________
______________________________##________________________
______________________________####______________________
________________________________##______________________
________________________________##______________________
________________________________####____________________
______________________________________##________________
______________________________________##________________
______________________________________##________________
____________________________________##__________________
__________________________________####__________________
__________________________________##____________________
________________________________##______________________
______________________________##________________________
____________________________####________________________
____________________________##__________________________
__________________________##____________________________
________________________##______________________________
______________________##________________________________
____________________####________________________________
____________________##__________________________________
__________________##____________________________________
________________##______________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
mnist_graph_eval: trying to load a ggml graph from mnist-cnn-f32.gguf
ggml_graph_import: invalid magic number, got 46554747
mnist_graph_eval: could not load a ggml graph from mnist-cnn-f32.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
mnist_model: using CPU backend
mnist_model_init_from_file: loading model weights from 'mnist-cnn-f32.gguf'
mnist_model_init_from_file: model arch is mnist-cnn
mnist_model_init_from_file: successfully loaded weights from mnist-cnn-f32.gguf
main: loaded model in 5.45 ms
mnist_model_eval: model evaluation on 10000 images took 605.60 ms, 60.56 us/image
main: loaded model in 11.88 ms
mnist_model_eval: model evaluation on 10000 images took 1074.09 ms, 107.41 us/image
main: predicted digit is 1
main: test_loss=0.046456+-0.007354
main: test_acc=98.40+-0.13%
main: test_loss=0.045483+-0.006884
main: test_acc=98.56+-0.12%
```

Like with the fully connected network, the convolutional network can also be trained using GGML:
Like with the fully connected network, the convolutional network can also be trained on the CPU using GGML:

``` bash
$ ../../build/bin/mnist-train mnist-cnn mnist-cnn-f32.gguf data/MNIST/raw/train-images-idx3-ubyte data/MNIST/raw/train-labels-idx1-ubyte
```

As always, the evaluation is done using `mnist-eval`, and like with the fully connected network the GGML graph is exported to `mnist-cnn-f32.ggml`.

## CUDA

The fully connected model can be trained and evaluated using CUDA.
`mnist-train` and `mnist-eval` accept an additional, optional argument after those listed so far to specify the backend.
The default is `CPU`; by specifying `CUDA0` the first available CUDA device can be used instead (make sure to compile GGML with CUDA support).
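
For example, based on the backend argument described above, the fully connected model could be trained on the first CUDA device by appending the backend name as the last argument (same file paths as in the earlier commands; the same applies to `mnist-eval`):

```bash
$ ../../build/bin/mnist-train mnist-fc mnist-fc-f32.gguf data/MNIST/raw/train-images-idx3-ubyte data/MNIST/raw/train-labels-idx1-ubyte CUDA0
```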

## Web demo

The evaluation code can be compiled to WebAssembly using [Emscripten](https://emscripten.org/) (may need to re-login to update `$PATH` after installation).