ggml/examples: add backend support for numerical optimization (#949)
* CUDA eval works

* stochastic gradient descent op

* Adam except decay

* CUDA CROSS_ENTROPY_LOSS_BACK

* CUDA mnist-fc training works

* backend CLI arg

* refactor gguf load

* remove sched from opt_step_adam

* implement l1 regularization (weight decay)

* extra call to add optimizer

* initialize gradients with ggml_graph_reset

* gradient accumulation

* increment iter per eval instead of epoch

* adjust backend interfaces

* fix ggml_graph_reset without backend

* fix ggml graph export/import

* fixup

* rename

* revert ggml_opt changes

* more general CUDA repeat_back

* update documentation, fix CNN

* validation split

* add clarifying comment

* optimize PyTorch training

* adjust buffer size, thread count

* fix 0.0f validation split

* Update examples/mnist/mnist-common.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* fix gradient accumulation

* tensor flag for accumulators -> tensor hash set

* Update include/ggml.h

Co-authored-by: slaren <[email protected]>

* Update tests/test-backend-ops.cpp

Co-authored-by: slaren <[email protected]>

* Update tests/test-backend-ops.cpp

Co-authored-by: slaren <[email protected]>

* fix test prints

* Update src/ggml-backend.c

Co-authored-by: Georgi Gerganov <[email protected]>

* better CUDA support for noncontiguous out_prod

* add comment

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: slaren <[email protected]>
3 people authored Sep 20, 2024
1 parent ea40f60 commit e7b2390
Showing 33 changed files with 1,290 additions and 344 deletions.
122 changes: 69 additions & 53 deletions examples/mnist/README.md
@@ -18,7 +18,7 @@ $ python3 mnist-train-fc.py mnist-fc-f32.gguf

...

Test loss: 0.069983+-0.009196, Test accuracy: 97.94+-0.14%
Test loss: 0.066051+-0.011630, Test accuracy: 98.07+-0.14%

Model tensors saved to mnist-fc-f32.gguf:
fc1.weight (500, 784)
@@ -28,7 +28,7 @@ fc2.bias (10,)
```

The training script includes an evaluation of the model on the test set.
To evaluate the model using GGML, run:
To evaluate the model on the CPU using GGML, run:

```bash
$ ../../build/bin/mnist-eval mnist-fc-f32.gguf data/MNIST/raw/t10k-images-idx3-ubyte data/MNIST/raw/t10k-labels-idx1-ubyte
@@ -37,45 +37,50 @@ ________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________######__________________
____________________________########____________________
________________________########________________________
____________________########________________##__________
__________________######____________________##__________
________________######______________________####________
______________######________________________####________
____________######__________________________####________
____________####____________________________####________
__________####______________________________####________
__________####______________________________####________
__________##________________________________####________
__________##______________________________####__________
__________##____________________________######__________
__________##__________________________######____________
____________##____________________########______________
____________##########################__________________
______________##################________________________
________________________________________________________
________________________________________________________
__________________________________####__________________
______________________________########__________________
__________________________##########____________________
______________________##############____________________
____________________######________####__________________
__________________________________####__________________
__________________________________####__________________
________________________________####____________________
______________________________####______________________
________________________##########______________________
______________________########__####____________________
________________________##__________##__________________
____________________________________##__________________
__________________________________##____________________
__________________________________##____________________
________________________________##______________________
____________________________####________________________
__________##____________######__________________________
__________##############________________________________
________________####____________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
mnist_graph_eval: trying to load a ggml graph from mnist-fc-f32.gguf
ggml_graph_import: invalid magic number, got 46554747
mnist_graph_eval: could not load a ggml graph from mnist-fc-f32.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
mnist_model: using CPU backend
mnist_model_init_from_file: loading model weights from 'mnist-fc-f32.gguf'
mnist_model_init_from_file: model arch is mnist-fc
mnist_model_init_from_file: successfully loaded weights from mnist-fc-f32.gguf
main: loaded model in 1.52 ms
mnist_model_eval: model evaluation on 10000 images took 26.65 ms, 2.66 us/image
main: predicted digit is 0
main: test_loss=0.069983+-0.009196
main: test_acc=97.94+-0.14%
main: loaded model in 13.03 ms
mnist_model_eval: model evaluation on 10000 images took 95.02 ms, 9.50 us/image
main: predicted digit is 3
main: test_loss=0.066051+-0.009343
main: test_acc=98.07+-0.14%
```

In addition to the evaluation on the test set, the GGML evaluation also prints a random image from the test set along with the model's prediction for that image.
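The digit printout appears to use two characters per pixel on the 28×28 MNIST grid, `##` for ink and `__` for background, which gives 56 characters per row. The Python sketch below reproduces that layout. The function name `render_digit`, the assumption that pixel values are normalized to [0, 1], and the 0.5 threshold are our own choices for illustration, not taken from the GGML source.

```python
# Hypothetical sketch: render a 28x28 MNIST image in the style of the
# mnist-eval printout, two characters per pixel.
# Assumptions (not from the GGML source): pixels normalized to [0, 1],
# and a fixed 0.5 threshold separating ink from background.


def render_digit(pixels, width=28, threshold=0.5):
    """pixels: flat, row-major sequence of width*width floats in [0, 1]."""
    if len(pixels) != width * width:
        raise ValueError(f"expected {width * width} pixels, got {len(pixels)}")
    rows = []
    for r in range(width):
        row = pixels[r * width:(r + 1) * width]
        # '##' marks ink, '__' marks background -> 2 * width characters per row
        rows.append("".join("##" if p >= threshold else "__" for p in row))
    return "\n".join(rows)


if __name__ == "__main__":
    # Synthetic test image: a vertical stroke in column 13, rows 5..22.
    img = [0.0] * (28 * 28)
    for r in range(5, 23):
        img[r * 28 + 13] = 1.0
    print(render_digit(img))
```

Raw IDX files store pixels as unsigned bytes (0 to 255), so real test images would need to be divided by 255 before being passed to this sketch.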
To train a fully connected model using GGML, run:
To train a fully connected model on the CPU using GGML, run:

``` bash
$ ../../build/bin/mnist-train mnist-fc mnist-fc-f32.gguf data/MNIST/raw/train-images-idx3-ubyte data/MNIST/raw/train-labels-idx1-ubyte
@@ -96,12 +101,12 @@ $ python3 mnist-train-cnn.py mnist-cnn-f32.gguf

...

Test loss: 0.046456
Test accuracy: 98.40%
Test loss: 0.045483
Test accuracy: 98.56%
GGUF model saved to 'mnist-cnn-f32.gguf'
```

The saved model can be evaluated using the `mnist-eval` binary:
The saved model can be evaluated on the CPU using the `mnist-eval` binary:

```bash
$ ../../build/bin/mnist-eval mnist-cnn-f32.gguf data/MNIST/raw/t10k-images-idx3-ubyte data/MNIST/raw/t10k-labels-idx1-ubyte
@@ -111,50 +116,61 @@ ________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________####____________________________
__________________________##____________________________
__________________________##____________________________
__________________________##____________________________
__________________________##____________________________
__________________________##____________________________
____________________________##__________________________
____________________________##__________________________
____________________________##__________________________
______________________________##________________________
______________________________##________________________
______________________________####______________________
________________________________##______________________
________________________________##______________________
________________________________####____________________
______________________________________##________________
______________________________________##________________
______________________________________##________________
____________________________________##__________________
__________________________________####__________________
__________________________________##____________________
________________________________##______________________
______________________________##________________________
____________________________####________________________
____________________________##__________________________
__________________________##____________________________
________________________##______________________________
______________________##________________________________
____________________####________________________________
____________________##__________________________________
__________________##____________________________________
________________##______________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
mnist_graph_eval: trying to load a ggml graph from mnist-cnn-f32.gguf
ggml_graph_import: invalid magic number, got 46554747
mnist_graph_eval: could not load a ggml graph from mnist-cnn-f32.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
mnist_model: using CPU backend
mnist_model_init_from_file: loading model weights from 'mnist-cnn-f32.gguf'
mnist_model_init_from_file: model arch is mnist-cnn
mnist_model_init_from_file: successfully loaded weights from mnist-cnn-f32.gguf
main: loaded model in 5.45 ms
mnist_model_eval: model evaluation on 10000 images took 605.60 ms, 60.56 us/image
main: loaded model in 11.88 ms
mnist_model_eval: model evaluation on 10000 images took 1074.09 ms, 107.41 us/image
main: predicted digit is 1
main: test_loss=0.046456+-0.007354
main: test_acc=98.40+-0.13%
main: test_loss=0.045483+-0.006884
main: test_acc=98.56+-0.12%
```
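The README does not say how the `+-` on `test_acc` is computed. As an inference on our part (not confirmed by the source), every accuracy uncertainty printed in the logs above matches the binomial standard error of a proportion, sqrt(p(1 - p) / N), with N = 10000 test images. The sketch below checks this; the helper name `binomial_stderr` is hypothetical, and the uncertainty on `test_loss` is not reproduced here.

```python
import math


def binomial_stderr(accuracy, n):
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


N_TEST = 10000  # size of the MNIST t10k test set

# Accuracies taken from the mnist-eval logs (FC old/new, CNN old/new).
for acc in (0.9794, 0.9807, 0.9840, 0.9856):
    se = binomial_stderr(acc, N_TEST)
    # prints 97.94+-0.14%, 98.07+-0.14%, 98.40+-0.13%, 98.56+-0.12%
    print(f"test_acc={100 * acc:.2f}+-{100 * se:.2f}%")
```

A standard error of roughly 0.12 to 0.14 percentage points also means that the accuracy differences between the old and new runs shown in this diff are on the order of one standard error.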

As with the fully connected network, the convolutional network can also be trained using GGML:
As with the fully connected network, the convolutional network can also be trained on the CPU using GGML:

``` bash
$ ../../build/bin/mnist-train mnist-cnn mnist-cnn-f32.gguf data/MNIST/raw/train-images-idx3-ubyte data/MNIST/raw/train-labels-idx1-ubyte
```

As before, evaluation is done with `mnist-eval`, and, as with the fully connected network, the GGML graph is exported to `mnist-cnn-f32.ggml`.

## CUDA

The fully connected model can be trained and evaluated using CUDA.
`mnist-train` and `mnist-eval` accept an additional, optional argument after those listed so far to specify the backend.
The default is `CPU`; specifying `CUDA0` uses the first available CUDA device instead (make sure to compile GGML with CUDA support).
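To illustrate the optional trailing backend argument, the hypothetical Python helpers below build argument lists for `mnist-eval` and `mnist-train`, appending the backend name only when one is given. The helper names and the `BIN_DIR` constant are assumptions for illustration; the positional order follows the commands shown earlier in this README.

```python
# Hypothetical helpers (not part of GGML) that build mnist-eval / mnist-train
# command lines with the optional trailing backend argument.

BIN_DIR = "../../build/bin"  # build output directory used in the examples above


def mnist_eval_cmd(model, images, labels, backend=None):
    """Argument list for mnist-eval; backend is the optional last argument."""
    cmd = [f"{BIN_DIR}/mnist-eval", model, images, labels]
    if backend is not None:
        cmd.append(backend)  # e.g. "CUDA0"; if omitted, the CPU default is used
    return cmd


def mnist_train_cmd(arch, model_out, images, labels, backend=None):
    """Argument list for mnist-train; arch is "mnist-fc" or "mnist-cnn".

    Per the README, only the fully connected model ("mnist-fc") is
    supported with CUDA.
    """
    cmd = [f"{BIN_DIR}/mnist-train", arch, model_out, images, labels]
    if backend is not None:
        cmd.append(backend)
    return cmd


if __name__ == "__main__":
    train = mnist_train_cmd(
        "mnist-fc", "mnist-fc-f32.gguf",
        "data/MNIST/raw/train-images-idx3-ubyte",
        "data/MNIST/raw/train-labels-idx1-ubyte",
        backend="CUDA0",
    )
    print(" ".join(train))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; leaving `backend` unset keeps the CPU default.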

## Web demo

The evaluation code can be compiled to WebAssembly using [Emscripten](https://emscripten.org/) (you may need to log in again after installation so that `$PATH` is updated).