GGUF #2398

Merged 253 commits into master from gguf on Aug 21, 2023.

Commits
6873148
gguf : first API pass
ggerganov Jul 26, 2023
8d6acfe
gguf : read header + meta data
ggerganov Jul 26, 2023
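For orientation, the header this commit starts reading has a small fixed prefix. A minimal sketch, assuming the v1 layout this PR converges on (4 magic bytes, then three little-endian uint32 fields; later GGUF versions widen the counts to uint64) — written against the published spec, not the code in this commit:

```python
# Hedged sketch: read the GGUF v1 header (magic, version, tensor count, KV count).
import struct

def read_gguf_header(path: str):
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version, n_tensors, n_kv = struct.unpack("<III", f.read(12))
    return version, n_tensors, n_kv
```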
d91b985
gguf : read tensor info
ggerganov Jul 26, 2023
78b226a
gguf : initial model loading - not tested
ggerganov Jul 26, 2023
860c9c6
gguf : add gguf_get_tensor_name()
ggerganov Jul 26, 2023
cb871fa
gguf : do not support passing existing ggml_context to gguf_init
ggerganov Jul 26, 2023
d313c0f
gguf : simplify gguf_get_val
ggerganov Jul 26, 2023
e46870f
gguf : gguf.c is now part of ggml.c
ggerganov Jul 26, 2023
5628ec7
gguf : read / write sample models
ggerganov Jul 26, 2023
d8491fc
gguf : add comments
ggerganov Jul 26, 2023
c85d317
refactor : reduce code duplication and better API (#2415)
monatis Jul 27, 2023
d89533d
gguf : expose the gguf_type enum through the API for now
ggerganov Jul 27, 2023
d2b6ca1
gguf : add array support
ggerganov Jul 27, 2023
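The array support added here rounds out the KV value types. A reference copy of the type codes as they stand in the merged v1 spec (illustrative, not generated from ggml.h); an ARRAY value carries an element type plus a count, so arrays of any scalar type — and of strings — can be stored:

```python
# GGUF KV value types at merge time (codes 0-9, per the spec).
from enum import IntEnum

class GGUFValueType(IntEnum):
    UINT8   = 0
    INT8    = 1
    UINT16  = 2
    INT16   = 3
    UINT32  = 4
    INT32   = 5
    FLOAT32 = 6
    BOOL    = 7
    STRING  = 8
    ARRAY   = 9
```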
158be8f
gguf.py : some code style changes
ggerganov Jul 27, 2023
68f5348
convert.py : start a new simplified implementation by removing old stuff
ggerganov Jul 27, 2023
d2bb3ac
convert.py : remove GGML vocab + other obsolete stuff
ggerganov Jul 27, 2023
11ef380
GGUF : write tensor (#2426)
monatis Jul 28, 2023
3492f84
gguf : add gguf_find_key (#2438)
klosax Jul 28, 2023
1495735
gguf : fix writing tensors
monatis Jul 28, 2023
9475cdb
Merge branch 'gguf-write-tokenization' into gguf
monatis Jul 28, 2023
08dc8fd
gguf : do not hardcode tensor names to read
monatis Jul 29, 2023
06f423a
gguf : write sample tensors to read
monatis Jul 29, 2023
d54f53c
gguf : add tokenization constants
monatis Jul 29, 2023
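A few of the tokenizer keys this commit standardizes, as they appear in the merged constants — an illustrative subset, not an exhaustive list:

```python
# Hedged sample of the tokenizer KV keys (per the merged constants.py).
KEY_TOKENIZER_MODEL  = "tokenizer.ggml.model"    # e.g. "llama" or "gpt2"
KEY_TOKENIZER_LIST   = "tokenizer.ggml.tokens"
KEY_TOKENIZER_SCORES = "tokenizer.ggml.scores"
KEY_TOKENIZER_MERGES = "tokenizer.ggml.merges"
KEY_TOKENIZER_BOS_ID = "tokenizer.ggml.bos_token_id"
KEY_TOKENIZER_EOS_ID = "tokenizer.ggml.eos_token_id"
```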
999431c
quick and dirty conversion example
klosax Jul 29, 2023
ea5f9ad
gguf : fix writing gguf arrays
monatis Jul 29, 2023
aa99562
Merge branch 'gguf' of https://github.com//ggerganov/llama.cpp into gguf
monatis Jul 29, 2023
93f7f7a
gguf : write tensors one by one and code reuse
monatis Jul 29, 2023
0c219fb
gguf : fix writing gguf arrays
monatis Jul 29, 2023
c861e23
gguf : write tensors one by one
monatis Jul 29, 2023
8a76dd8
gguf : write tensors one by one
monatis Jul 29, 2023
cc3dd7f
gguf : write tokenizer data
monatis Jul 29, 2023
0317c41
gguf : upd gguf conversion script
monatis Jul 29, 2023
8ad7cd4
Update convert-llama-h5-to-gguf.py
klosax Jul 29, 2023
0f5e57f
gguf : handle already encoded string
monatis Jul 29, 2023
34469b9
ggml.h : get array str and f32
klosax Jul 29, 2023
2c22e3b
ggml.c : get arr str and f32
klosax Jul 29, 2023
9577821
gguf.py : support any type
klosax Jul 29, 2023
06c3e4a
Update convert-llama-h5-to-gguf.py
klosax Jul 29, 2023
32e037f
gguf : fix set is not subscriptable
monatis Jul 29, 2023
87c34e4
gguf : update convert-llama-h5-to-gguf.py
monatis Jul 29, 2023
0790c12
constants.py : add layer norm eps
klosax Jul 30, 2023
ccd81a7
gguf.py : add layer norm eps and merges
klosax Jul 30, 2023
b4676ee
ggml.h : increase GGML_MAX_NAME to 64
klosax Jul 30, 2023
b19c117
ggml.c : add gguf_get_arr_n
klosax Jul 30, 2023
4ed98bf
Update convert-llama-h5-to-gguf.py
klosax Jul 30, 2023
e9192b0
add gptneox gguf example
klosax Jul 30, 2023
f175b05
Makefile : add gptneox gguf example
klosax Jul 30, 2023
2fabc17
Update convert-llama-h5-to-gguf.py
klosax Jul 30, 2023
30c4ea4
add gptneox gguf example
klosax Jul 30, 2023
068a8e0
Update convert-llama-h5-to-gguf.py
klosax Jul 30, 2023
2a09146
Update convert-gptneox-h5-to-gguf.py
klosax Jul 30, 2023
4f5b622
Update convert-gptneox-h5-to-gguf.py
klosax Jul 31, 2023
6b3a7b9
Update convert-llama-h5-to-gguf.py
klosax Jul 31, 2023
7aa0a0e
gguf : support custom alignment value
monatis Jul 31, 2023
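What "custom alignment" means on disk, as a hedged arithmetic sketch: tensor data begins at an offset padded up to `general.alignment`, which defaults to 32 bytes in the merged format:

```python
# Pad an offset up to the next multiple of the alignment.
def align_offset(offset: int, alignment: int = 32) -> int:
    return offset + (alignment - offset % alignment) % alignment

assert align_offset(100) == 128 and align_offset(128) == 128
```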
b26f5b2
gguf : fix typo in function call
monatis Jul 31, 2023
bb42aef
gguf : mmap tensor data example
monatis Jul 31, 2023
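A hedged sketch of what the mmap example demonstrates: map the file once and slice tensor bytes in place, rather than copying them into freshly allocated buffers (names and parameters here are illustrative, not the example's API):

```python
# Map the file read-only and return a zero-copy view of one tensor's bytes.
import mmap

def map_tensor_bytes(path: str, data_start: int, offset: int, n_bytes: int):
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return memoryview(mm)[data_start + offset : data_start + offset + n_bytes]
```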
f3de876
fix : update convert-llama-h5-to-gguf.py
monatis Jul 31, 2023
da4900e
Update convert-llama-h5-to-gguf.py
klosax Jul 31, 2023
e7a7416
convert-gptneox-h5-to-gguf.py : Special tokens
klosax Aug 1, 2023
c77fabb
gptneox-main.cpp : special tokens
klosax Aug 1, 2023
36a36c3
Update gptneox-main.cpp
klosax Aug 1, 2023
ff1cb02
constants.py : special tokens
klosax Aug 1, 2023
49380a2
gguf.py : accumulate kv and tensor info data + special tokens
klosax Aug 1, 2023
1b4f9c8
convert-gptneox-h5-to-gguf.py : accumulate kv and ti + special tokens
klosax Aug 1, 2023
cf365fb
gguf : gguf counterpart of llama-util.h
monatis Aug 2, 2023
c3a65c4
gguf-util.h : update note
monatis Aug 2, 2023
e1e9b28
convert-llama-h5-to-gguf.py : accumulate kv / ti + special tokens
klosax Aug 2, 2023
c5ba5ef
convert-llama-h5-to-gguf.py : special tokens
klosax Aug 2, 2023
23abbe8
Delete gptneox-common.cpp
klosax Aug 4, 2023
6691aa8
Delete gptneox-common.h
klosax Aug 4, 2023
2922280
convert-gptneox-h5-to-gguf.py : gpt2bpe tokenizer
klosax Aug 4, 2023
e6f19ba
gptneox-main.cpp : gpt2 bpe tokenizer
klosax Aug 4, 2023
5d98989
gpt2 bpe tokenizer (handles merges and unicode)
klosax Aug 4, 2023
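For context, a hedged mini-version of the greedy merge loop a GPT-2 style BPE tokenizer runs (the real header also handles byte-to-unicode mapping; this shows only the rank-driven merging):

```python
# Repeatedly fuse the adjacent symbol pair with the lowest merge rank.
def bpe_merge(symbols: list, ranks: dict) -> list:
    while len(symbols) > 1:
        rank, i = min(
            (ranks.get((a, b), float("inf")), i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
        )
        if rank == float("inf"):
            break  # no mergeable pair left
        symbols = symbols[:i] + [symbols[i] + symbols[i + 1]] + symbols[i + 2:]
    return symbols
```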
fb0b243
Makefile : remove gptneox-common
klosax Aug 4, 2023
278ada9
gguf.py : bytesarray for gpt2bpe tokenizer
klosax Aug 4, 2023
db5618a
cmpnct_gpt2bpe.hpp : comments
klosax Aug 4, 2023
4357e69
gguf.py : use custom alignment if present
klosax Aug 7, 2023
1da82c5
Merge branch 'master' into gguf
ggerganov Aug 7, 2023
8083ae3
gguf : minor stuff
ggerganov Aug 7, 2023
65559a2
Update gptneox-main.cpp
klosax Aug 7, 2023
ece4fc1
map tensor names
klosax Aug 8, 2023
f4d137d
convert-gptneox-h5-to-gguf.py : map tensor names
klosax Aug 8, 2023
7d5f452
convert-llama-h5-to-gguf.py : map tensor names
klosax Aug 8, 2023
0246d0d
gptneox-main.cpp : map tensor names
klosax Aug 8, 2023
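The "map tensor names" commits above rewrite HF checkpoint names into a flat GGUF scheme. An illustrative subset, per the naming the PR converges on — hedged, not the literal table from gguf_tensor_map.py:

```python
# Hedged excerpt of the HF -> GGUF tensor-name mapping ({bid} = block index).
TENSOR_MAP = {
    "model.embed_tokens.weight":                  "token_embd.weight",
    "model.layers.{bid}.self_attn.q_proj.weight": "blk.{bid}.attn_q.weight",
    "model.layers.{bid}.self_attn.k_proj.weight": "blk.{bid}.attn_k.weight",
    "model.layers.{bid}.mlp.gate_proj.weight":    "blk.{bid}.ffn_gate.weight",
    "model.norm.weight":                          "output_norm.weight",
    "lm_head.weight":                             "output.weight",
}
```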
1c4d8bf
gguf : start implementing libllama in GGUF (WIP)
monatis Aug 10, 2023
4f86518
gguf : start implementing libllama in GGUF (WIP)
monatis Aug 10, 2023
4c0f64e
rm binary committed by mistake
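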
monatis Aug 10, 2023
22de6c5
upd .gitignore
monatis Aug 10, 2023
42cc04d
gguf : calculate n_mult
monatis Aug 10, 2023
cfb8e35
gguf : inference with 7B model working (WIP)
monatis Aug 10, 2023
f316b94
gguf : rm deprecated function
monatis Aug 10, 2023
e7d346c
gguf : start implementing gguf_file_saver (WIP)
monatis Aug 11, 2023
a356b0e
gguf : start implementing gguf_file_saver (WIP)
monatis Aug 11, 2023
b2440f1
gguf : start implementing gguf_file_saver (WIP)
monatis Aug 11, 2023
eb8ca69
gguf : add gguf_get_kv_type
monatis Aug 11, 2023
e3a4960
gguf : add gguf_get_kv_type
monatis Aug 11, 2023
28abfc9
gguf : write metadata in gguf_file_saver (WIP)
monatis Aug 11, 2023
781b9ec
gguf : write metadata in gguf_file_saver (WIP)
monatis Aug 11, 2023
d09fd10
gguf : write metadata in gguf_file_saver
monatis Aug 11, 2023
61919c1
gguf : rm references to old file formats
monatis Aug 11, 2023
7009cf5
gguf : shorter name for member variable
monatis Aug 11, 2023
f44bbd3
gguf : rm redundant method
monatis Aug 11, 2023
e732423
gguf : get rid of n_mult, read n_ff from file
monatis Aug 11, 2023
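Context for dropping n_mult, hedged: the old loader reconstructed the FFN width from n_mult with arithmetic roughly like the below, whereas GGUF simply stores n_ff (llama.feed_forward_length) in the file:

```python
# Approximate reconstruction of the legacy n_ff derivation.
def n_ff_from_n_mult(n_embd: int, n_mult: int) -> int:
    return ((2 * (4 * n_embd) // 3 + n_mult - 1) // n_mult) * n_mult

assert n_ff_from_n_mult(4096, 256) == 11008  # LLaMA-7B
```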
2a5ac7a
Update gguf_tensor_map.py
klosax Aug 11, 2023
e76c59d
Update gptneox-main.cpp
klosax Aug 11, 2023
2f52008
gguf : rm references to old file magics
monatis Aug 12, 2023
186c496
Merge branch 'gguf' of https://github.com//ggerganov/llama.cpp into gguf
monatis Aug 12, 2023
4fa017a
gguf : start implementing quantization (WIP)
monatis Aug 12, 2023
0e1a3c7
gguf : start implementing quantization (WIP)
monatis Aug 12, 2023
c4f02b4
gguf : start implementing quantization (WIP)
monatis Aug 12, 2023
b2571af
gguf : start implementing quantization (WIP)
monatis Aug 12, 2023
fa7c395
gguf : start implementing quantization (WIP)
monatis Aug 12, 2023
1fc3d30
gguf : start implementing quantization (WIP)
monatis Aug 12, 2023
202eab0
gguf : quantization is working
monatis Aug 12, 2023
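What "quantization is working" re-wires, hedged: GGUF reuses the existing ggml quantization block formats, so e.g. a Q4_0 block still packs 32 weights into one fp16 scale plus 16 nibble bytes:

```python
# Q4_0 size arithmetic, matching the ggml block layout.
QK4_0 = 32                        # weights per block
Q4_0_BLOCK_SIZE = 2 + QK4_0 // 2  # fp16 scale + packed nibbles = 18 bytes

def q4_0_nbytes(n_elements: int) -> int:
    assert n_elements % QK4_0 == 0
    return n_elements // QK4_0 * Q4_0_BLOCK_SIZE
```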
60d5408
gguf : proper closing of file
monatis Aug 12, 2023
5d81a71
gguf.py : no need to convert tensors twice
klosax Aug 12, 2023
8f09157
convert-gptneox-h5-to-gguf.py : no need to convert tensors twice
klosax Aug 12, 2023
4cef57c
convert-llama-h5-to-gguf.py : no need to convert tensors twice
klosax Aug 12, 2023
f821847
convert-gptneox-h5-to-gguf.py : simplify nbytes
klosax Aug 12, 2023
e606ffe
convert-llama-h5-to-gguf.py : simplify nbytes
klosax Aug 12, 2023
5e58ffa
gptneox-main.cpp : n_layer --> n_block
klosax Aug 12, 2023
8b5f0c5
constants.py : n_layer --> n_block
klosax Aug 12, 2023
d2ce9cf
gguf.py : n_layer --> n_block
klosax Aug 12, 2023
489616e
convert-gptneox-h5-to-gguf.py : n_layer --> n_block
klosax Aug 12, 2023
e91a222
convert-llama-h5-to-gguf.py : n_layer --> n_block
klosax Aug 12, 2023
c7bd8c1
gptneox-main.cpp : n_layer --> n_block
klosax Aug 12, 2023
9bf5a7e
Update gguf_tensor_map.py
klosax Aug 12, 2023
e3d1f07
convert-gptneox-h5-to-gguf.py : load model in parts to save memory
klosax Aug 13, 2023
17800cd
convert-llama-h5-to-gguf.py : load model in parts to save memory
klosax Aug 13, 2023
91d4bfd
convert : write more metadata for LLaMA
monatis Aug 13, 2023
1d60468
fix conflicts
monatis Aug 13, 2023
bf2dad3
convert : rm quantization version
monatis Aug 13, 2023
2827b84
convert-gptneox-h5-to-gguf.py : add file_type key
klosax Aug 13, 2023
6beebf3
gptneox-main.cpp : add file_type key
klosax Aug 13, 2023
24f4883
fix conflicts
monatis Aug 13, 2023
196b50f
gguf : add todos and comments
monatis Aug 14, 2023
56a1f32
Merge branch 'master' into gguf
ggerganov Aug 14, 2023
5d22a9d
convert-gptneox-h5-to-gguf.py : tensor name map changes
klosax Aug 14, 2023
51939d7
Create gguf_namemap.py : tensor name map changes
klosax Aug 14, 2023
806a157
Delete gguf_tensor_map.py
klosax Aug 14, 2023
d753dfb
gptneox-main.cpp : tensor name map changes
klosax Aug 14, 2023
a7d226f
convert-llama-h5-to-gguf.py : fixes
klosax Aug 14, 2023
5c5a95b
gguf.py : dont add empty strings
klosax Aug 14, 2023
0c19ae7
simple : minor style changes
ggerganov Aug 14, 2023
62490f1
gguf : use UNIX line ending
ggerganov Aug 14, 2023
6f64b6c
Create convert-llama-7b-pth-to-gguf.py
klosax Aug 14, 2023
f00780b
llama : sync gguf-llama.cpp with latest llama.cpp (#2608)
ggerganov Aug 14, 2023
6f14854
gitignore : add gptneox-main
ggerganov Aug 14, 2023
8af3a99
Merge branch 'master' into gguf
ggerganov Aug 14, 2023
ec1b100
llama : tokenizer fixes (#2549)
goerch Aug 14, 2023
afc4ca2
convert : update convert-new.py with tokenizer fixes (#2614)
goerch Aug 14, 2023
7494c78
llama : sync gguf-llama with llama (#2613)
ggerganov Aug 14, 2023
6c63550
llama : update tokenizer style
ggerganov Aug 14, 2023
7ec125b
convert-llama-h5-to-gguf.py : add token types
klosax Aug 14, 2023
5d518d4
constants.py : add token types
klosax Aug 14, 2023
cedb487
gguf.py : add token types
klosax Aug 14, 2023
ab2cbd0
convert-llama-7b-pth-to-gguf.py : add token types
klosax Aug 14, 2023
ca47582
gguf-llama.cpp : fix n_head_kv
klosax Aug 14, 2023
2dd5d2c
convert-llama-h5-to-gguf.py : add 70b gqa support
klosax Aug 14, 2023
b6056c3
gguf.py : add tensor data layout
klosax Aug 15, 2023
66756c8
convert-llama-h5-to-gguf.py : add tensor data layout
klosax Aug 15, 2023
2ae0e98
convert-llama-7b-pth-to-gguf.py : add tensor data layout
klosax Aug 15, 2023
4a1741a
gptneox-main.cpp : add tensor data layout
klosax Aug 15, 2023
ea5615a
convert-llama-h5-to-gguf.py : clarify the reverse permute
klosax Aug 16, 2023
758ff1b
llama : refactor model loading code (#2620)
ggerganov Aug 16, 2023
88b5769
gguf : deduplicate (#2629)
ggerganov Aug 16, 2023
c8ee87f
gguf.py : merge all files in gguf.py
ggerganov Aug 16, 2023
5ec1893
convert-new.py : pick #2427 for HF 70B support
ggerganov Aug 16, 2023
42f8fe1
examples/gguf : no need to keep q option for quantization any more
monatis Aug 17, 2023
5a0a2c5
llama.cpp : print actual model size
klosax Aug 17, 2023
d6fd53a
llama.cpp : use ggml_elements()
klosax Aug 17, 2023
e0429d3
convert-new.py : output gguf (#2635)
ggerganov Aug 17, 2023
2ddd968
convert.py : update to support GGUF output
ggerganov Aug 17, 2023
dd016cc
Revert "ci : disable CI temporary to not waste energy"
ggerganov Aug 17, 2023
d646c4e
convert.py : n_head_kv optional and .gguf file extension
klosax Aug 17, 2023
8ace03a
convert.py : better always have n_head_kv and default it to n_head
ggerganov Aug 17, 2023
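A hedged sketch of the rule this commit settles on: n_head_kv is always written, defaulting to n_head when the HF config carries no GQA field (the config key names are the standard HF ones, assumed here):

```python
# Resolve the KV-head count from an HF config dict.
def resolve_n_head_kv(hf_config: dict) -> int:
    return hf_config.get("num_key_value_heads",
                         hf_config["num_attention_heads"])
```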
11bf436
llama : sync with recent PRs on master
ggerganov Aug 17, 2023
6d66ef9
Merge branch 'master' into gguf
ggerganov Aug 17, 2023
c3b7393
editorconfig : ignore models folder
ggerganov Aug 17, 2023
dd9e2fc
ci : update ".bin" to ".gguf" extension
ggerganov Aug 17, 2023
81a2c2a
llama : fix llama_model_loader memory leak
ggerganov Aug 17, 2023
93f285b
gptneox : move as a WIP example
ggerganov Aug 17, 2023
899f9a5
llama : fix lambda capture
ggerganov Aug 17, 2023
e72c8c2
ggml : fix bug in gguf_set_kv
ggerganov Aug 17, 2023
fb11dd3
common.h : .bin --> .gguf
klosax Aug 17, 2023
78e1e57
quantize-stats.cpp : .bin --> .gguf
klosax Aug 17, 2023
acaa982
convert.py : fix HF tensor permuting / unpacking
ggerganov Aug 17, 2023
b3cc182
llama.cpp : typo
klosax Aug 17, 2023
57eaadb
llama : throw error if gguf fails to init from file
ggerganov Aug 17, 2023
5484737
llama : fix tensor name grepping during quantization
ggerganov Aug 17, 2023
fc3a523
gguf.py : write tensors in a single pass (#2644)
monatis Aug 17, 2023
b668cd3
convert-gptneox-hf-to-gguf.py : fixes
klosax Aug 17, 2023
640ddc4
gguf.py : gptneox mapping
klosax Aug 17, 2023
9e2d4dd
convert-llama-hf-to-gguf.py : fixes
klosax Aug 17, 2023
3c1b721
convert-llama-7b-pth-to-gguf.py : fixes
klosax Aug 17, 2023
c20ae49
ggml.h : reverse GGUF_MAGIC
klosax Aug 17, 2023
147a99b
gguf.py : reverse GGUF_MAGIC
klosax Aug 17, 2023
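Why the magic constant was reversed, as a hedged illustration: written as a little-endian uint32, 0x46554747 puts the literal bytes "GGUF" at the start of the file, so `head -c 4 model.gguf` prints GGUF:

```python
# The reversed magic serializes to the ASCII bytes G, G, U, F.
import struct
assert struct.pack("<I", 0x46554747) == b"GGUF"
```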
d9e6890
test-tokenizer-0.cpp : fix warning
klosax Aug 17, 2023
306070c
llama.cpp : print kv general.name
klosax Aug 17, 2023
b275de7
llama.cpp : get special token kv and linefeed token id
klosax Aug 18, 2023
aa3efe8
llama : print number of tensors per type + print arch + style
ggerganov Aug 18, 2023
856afff
Merge branch 'master' into gguf
ggerganov Aug 18, 2023
e35f8c7
tests : update vocab file with new magic
ggerganov Aug 18, 2023
dea5be6
editorconfig : fix whitespaces
ggerganov Aug 18, 2023
660ca9b
llama : re-order functions
ggerganov Aug 18, 2023
38016ed
Merge branch 'master' into gguf
ggerganov Aug 18, 2023
2d6c2c7
llama : remove C++ API + reorganize common source in /common dir
ggerganov Aug 18, 2023
035d511
llama : minor API updates
ggerganov Aug 18, 2023
5d2656d
llama : avoid hardcoded special tokens
ggerganov Aug 18, 2023
a4ad2bf
llama : fix MPI build
ggerganov Aug 18, 2023
25b8a89
llama : introduce enum llama_vocab_type + remove hardcoded string con…
ggerganov Aug 18, 2023
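A hedged mirror of the split this commit makes explicit: the loader now picks the vocabulary implementation from the tokenizer.ggml.model KV instead of hardcoded strings ("llama" for the SentencePiece-style vocab, "gpt2" for BPE — values as used by the converters in this PR):

```python
# Vocab type keyed by the tokenizer.ggml.model value.
from enum import Enum

class VocabType(Enum):
    SPM = "llama"
    BPE = "gpt2"
```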
fb7c883
convert-falcon-hf-to-gguf.py : falcon HF --> gguf conversion, not tested
klosax Aug 18, 2023
d5e976c
falcon-main.cpp : falcon inference example
klosax Aug 18, 2023
16ab9ba
convert-falcon-hf-to-gguf.py : remove extra kv
klosax Aug 18, 2023
c0e4ca6
convert-gptneox-hf-to-gguf.py : remove extra kv
klosax Aug 18, 2023
593b04f
convert-llama-7b-pth-to-gguf.py : remove extra kv
klosax Aug 18, 2023
281d6d1
convert-llama-hf-to-gguf.py : remove extra kv
klosax Aug 18, 2023
bd5a579
gguf.py : fix for falcon 40b
klosax Aug 18, 2023
1d80eea
falcon-main.cpp : fix for falcon 40b
klosax Aug 18, 2023
2c8055b
convert-falcon-hf-to-gguf.py : update ref
klosax Aug 18, 2023
b3a7a2b
convert-falcon-hf-to-gguf.py : add tensor data layout
klosax Aug 19, 2023
dadf098
cmpnct_gpt2bpe.hpp : fixes
klosax Aug 19, 2023
781bf24
falcon-main.cpp : fixes
klosax Aug 19, 2023
8945d47
gptneox-main.cpp : fixes
klosax Aug 19, 2023
6a2e520
cmpnct_gpt2bpe.hpp : remove non-general stuff
klosax Aug 19, 2023
c0a1269
Update examples/server/README.md
klosax Aug 19, 2023
28b8c26
cmpnct_gpt2bpe.hpp : cleanup
klosax Aug 19, 2023
76b4662
convert-llama-hf-to-gguf.py : special tokens
klosax Aug 20, 2023
f838faa
convert-llama-7b-pth-to-gguf.py : special tokens
klosax Aug 20, 2023
5a02b96
convert-permute-debug.py : permute debug print
klosax Aug 21, 2023
4f92488
convert-permute-debug-master.py : permute debug for master
klosax Aug 21, 2023
7de7cb4
convert-permute-debug.py : change permute type of attn_q
klosax Aug 21, 2023
d5c8fcf
convert.py : 70b model working (change attn_q permute)
klosax Aug 21, 2023
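A hedged reconstruction of the permute that the debugging above converges on: rows are regrouped so RoPE's (real, imag) pairs land where the C++ inference code expects them, and the 70B fix is to permute attn_q with n_head while attn_k uses n_head_kv, rather than using one count for both:

```python
# Interleave rotary row pairs for one attention weight matrix.
import numpy as np

def permute(w: np.ndarray, n_head: int) -> np.ndarray:
    return (w.reshape(n_head, 2, w.shape[0] // n_head // 2, *w.shape[1:])
             .swapaxes(1, 2)
             .reshape(w.shape))
```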
287db51
Delete convert-permute-debug-master.py
klosax Aug 21, 2023
58bde5c
Delete convert-permute-debug.py
klosax Aug 21, 2023
c818c40
convert-llama-hf-to-gguf.py : fix attn_q permute
klosax Aug 21, 2023
6a69a69
gguf.py : fix rope scale kv
klosax Aug 21, 2023
5f6ff38
convert-llama-hf-to-gguf.py : rope scale and added tokens
klosax Aug 21, 2023
dc1f051
convert-llama-7b-pth-to-gguf.py : rope scale and added tokens
klosax Aug 21, 2023
c082b9f
llama.cpp : use rope scale kv
klosax Aug 21, 2023
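A hedged sketch of the rope-scale plumbing in these commits: a linear rope_scaling factor from the HF config is written to a GGUF KV and read back at load time (the key name below is an assumption based on this PR's llama.rope.* naming):

```python
KEY_ROPE_SCALE = "llama.rope.scale_linear"  # assumed key name

def rope_scale_from_config(hf_config: dict):
    rs = hf_config.get("rope_scaling") or {}
    return rs.get("factor") if rs.get("type") == "linear" else None
```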
9070e33
convert-llama-7b-pth-to-gguf.py : rope scale fix
klosax Aug 21, 2023
7a7d1ba
convert-llama-hf-to-gguf.py : rope scale fix
klosax Aug 21, 2023
1e7a009
Merge branch 'master' into gguf
ggerganov Aug 21, 2023
6490ff7
py : fix whitespace
ggerganov Aug 21, 2023
e06cbce
gguf : add Python script to convert GGMLv3 LLaMA models to GGUF (#2682)
KerfuffleV2 Aug 21, 2023
8d177ed
llama : improve token type support (#2668)
goerch Aug 21, 2023
0b53b8b
llama : add API for token type
ggerganov Aug 21, 2023
49c25cc
tests : use new tokenizer type API (#2692)
goerch Aug 21, 2023
811f653
py : cosmetics
ggerganov Aug 21, 2023
66a66a0
readme : add notice about new file format
ggerganov Aug 21, 2023
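Taken together, the commits above fix the on-disk order: header, KV pairs, tensor-info records, padding to general.alignment, then tensor data. A hedged sketch of one tensor-info record, assuming the v1 field widths (uint32 name length, dim count and dims; uint64 offset relative to the start of the data section):

```python
# Read one tensor-info record from an open GGUF file positioned after the KVs.
import struct

def read_tensor_info(f):
    name_len, = struct.unpack("<I", f.read(4))
    name = f.read(name_len).decode("utf-8")
    n_dims, = struct.unpack("<I", f.read(4))
    dims = struct.unpack(f"<{n_dims}I", f.read(4 * n_dims))
    ggml_type, offset = struct.unpack("<IQ", f.read(12))
    return name, dims, ggml_type, offset
```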
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,6 +1,7 @@
 *.o
 *.a
 *.so
+*.gguf
 *.bin
 .DS_Store
 .build/
@@ -47,6 +48,8 @@ models-mnt
 /server
 /Pipfile
 /embd-input-test
+/gguf
+/gguf-llama-simple
 /libllama.so
 /llama-bench
 build-info.h
@@ -65,7 +68,6 @@ perf-*.txt
 
 examples/jeopardy/results.txt
 
-
 pyproject.toml
 poetry.lock
 poetry.toml
13 changes: 11 additions & 2 deletions CMakeLists.txt
@@ -497,9 +497,11 @@ else()
 endif()
 
 #
-# Build libraries
+# libraries
 #
 
+# ggml
+
 add_library(ggml OBJECT
             ggml.c
             ggml.h
@@ -524,10 +526,11 @@ if (BUILD_SHARED_LIBS)
     install(TARGETS ggml_shared LIBRARY)
 endif()
 
+# llama
+
 add_library(llama
             llama.cpp
             llama.h
-            llama-util.h
             )
 
 target_include_directories(llama PUBLIC .)
@@ -546,6 +549,10 @@ if (BUILD_SHARED_LIBS)
     install(TARGETS llama LIBRARY)
 endif()
 
+#
+# install
+#
+
 include(GNUInstallDirs)
 install(
     FILES convert.py
@@ -584,6 +591,8 @@ endif()
 # programs, examples and tests
 #
 
+add_subdirectory(common)
+
 if (LLAMA_BUILD_TESTS AND NOT CMAKE_JS_VERSION)
     include(CTest)
     add_subdirectory(tests)
23 changes: 13 additions & 10 deletions Makefile
@@ -1,5 +1,5 @@
 # Define the default target now so that it is always the first target
-BUILD_TARGETS = main quantize quantize-stats perplexity embedding vdot train-text-from-scratch convert-llama2c-to-ggml simple server embd-input-test llama-bench
+BUILD_TARGETS = main quantize quantize-stats perplexity embedding vdot train-text-from-scratch convert-llama2c-to-ggml simple server embd-input-test gguf llama-bench
 
 # Binaries only useful for tests
 TEST_TARGETS = tests/test-llama-grammar tests/test-grammar-parser tests/test-double-float tests/test-grad0 tests/test-opt tests/test-quantize-fns tests/test-quantize-perf tests/test-sampling tests/test-tokenizer-0
@@ -45,8 +45,8 @@ OPT = -Ofast
 else
 OPT = -O3
 endif
-CFLAGS   = -I.              $(OPT) -std=c11   -fPIC
-CXXFLAGS = -I. -I./examples $(OPT) -std=c++11 -fPIC
+CFLAGS   = -I.            $(OPT) -std=c11   -fPIC
+CXXFLAGS = -I. -I./common $(OPT) -std=c++11 -fPIC
 LDFLAGS  =
 
 ifdef LLAMA_DEBUG
@@ -329,23 +329,23 @@ ggml-alloc.o: ggml-alloc.c ggml.h ggml-alloc.h
 
 OBJS += ggml-alloc.o
 
-llama.o: llama.cpp ggml.h ggml-alloc.h ggml-cuda.h ggml-metal.h llama.h llama-util.h
+llama.o: llama.cpp ggml.h ggml-alloc.h ggml-cuda.h ggml-metal.h llama.h
 	$(CXX) $(CXXFLAGS) -c $< -o $@
 
-common.o: examples/common.cpp examples/common.h
+common.o: common/common.cpp common/common.h
 	$(CXX) $(CXXFLAGS) -c $< -o $@
 
-console.o: examples/console.cpp examples/console.h
+console.o: common/console.cpp common/console.h
 	$(CXX) $(CXXFLAGS) -c $< -o $@
 
-grammar-parser.o: examples/grammar-parser.cpp examples/grammar-parser.h
+grammar-parser.o: common/grammar-parser.cpp common/grammar-parser.h
 	$(CXX) $(CXXFLAGS) -c $< -o $@
 
 libllama.so: llama.o ggml.o $(OBJS)
 	$(CXX) $(CXXFLAGS) -shared -fPIC -o $@ $^ $(LDFLAGS)
 
 clean:
-	rm -vf *.o *.so *.dll main quantize quantize-stats perplexity embedding benchmark-matmult save-load-state server simple vdot train-text-from-scratch convert-llama2c-to-ggml embd-input-test llama-bench build-info.h $(TEST_TARGETS)
+	rm -vf *.o *.so *.dll main quantize quantize-stats perplexity embedding benchmark-matmult save-load-state server simple vdot train-text-from-scratch convert-llama2c-to-ggml embd-input-test gguf llama-bench build-info.h $(TEST_TARGETS)
 
 #
 # Examples
@@ -385,7 +385,10 @@ $(LIB_PRE)embdinput$(DSO_EXT): examples/embd-input/embd-input.h examples/embd-in
 embd-input-test: $(LIB_PRE)embdinput$(DSO_EXT) examples/embd-input/embd-input-test.cpp build-info.h ggml.o llama.o common.o $(OBJS)
 	$(CXX) $(CXXFLAGS) $(filter-out %$(DSO_EXT),$(filter-out %.h,$(filter-out %.hpp,$^))) -o $@ $(LDFLAGS) -L. -lembdinput
 
-train-text-from-scratch: examples/train-text-from-scratch/train-text-from-scratch.cpp build-info.h ggml.o llama.o $(OBJS)
+gguf: examples/gguf/gguf.cpp build-info.h ggml.o llama.o $(OBJS)
+	$(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS)
+
+train-text-from-scratch: examples/train-text-from-scratch/train-text-from-scratch.cpp build-info.h ggml.o llama.o common.o $(OBJS)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS)
 
 convert-llama2c-to-ggml: examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp build-info.h ggml.o llama.o $(OBJS)
@@ -418,7 +421,7 @@ vdot: pocs/vdot/vdot.cpp ggml.o $(OBJS)
 tests/test-llama-grammar: tests/test-llama-grammar.cpp build-info.h ggml.o llama.o common.o $(OBJS)
 	$(CXX) $(CXXFLAGS) $(filter-out %.txt,$^) -o $@ $(LDFLAGS)
 
-tests/test-grammar-parser: tests/test-grammar-parser.cpp examples/grammar-parser.cpp build-info.h ggml.o llama.o common.o $(OBJS)
+tests/test-grammar-parser: tests/test-grammar-parser.cpp build-info.h ggml.o llama.o common.o $(OBJS)
 	$(CXX) $(CXXFLAGS) $(filter-out %.txt,$^) -o $@ $(LDFLAGS)
 
 tests/test-double-float: tests/test-double-float.cpp build-info.h ggml.o llama.o common.o $(OBJS)
34 changes: 21 additions & 13 deletions README.md
@@ -9,11 +9,17 @@
 
 Inference of [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++
 
-### 🚧 Incoming breaking change + refactoring:
+### Hot topics
 
-See PR https://github.com/ggerganov/llama.cpp/pull/2398 for more info.
+A new file format has been introduced: [GGUF](https://github.com/ggerganov/llama.cpp/pull/2398)
 
-To devs: avoid making big changes to `llama.h` / `llama.cpp` until merged
+Last revision compatible with the old format: [dadbed9](https://github.com/ggerganov/llama.cpp/commit/dadbed99e65252d79f81101a392d0d6497b86caa)
+
+### Current `master` should be considered in Beta - expect some issues for a few days!
+
+### Be prepared to re-convert and / or re-quantize your GGUF models while this notice is up!
+
+### Issues with non-GGUF models will be considered with low priority!
 
 ----
 
@@ -291,7 +297,7 @@ When built with Metal support, you can enable GPU inference with the `--gpu-laye
 Any value larger than 0 will offload the computation to the GPU. For example:
 
 ```bash
-./main -m ./models/7B/ggml-model-q4_0.bin -n 128 -ngl 1
+./main -m ./models/7B/ggml-model-q4_0.gguf -n 128 -ngl 1
 ```
 
 ### MPI Build
@@ -330,7 +336,7 @@ The above will distribute the computation across 2 processes on the first host a
 Finally, you're ready to run a computation using `mpirun`:
 
 ```bash
-mpirun -hostfile hostfile -n 3 ./main -m ./models/7B/ggml-model-q4_0.bin -n 128
+mpirun -hostfile hostfile -n 3 ./main -m ./models/7B/ggml-model-q4_0.gguf -n 128
 ```
 
 ### BLAS Build
@@ -513,10 +519,10 @@ python3 convert.py models/7B/
 python convert.py models/7B/ --vocabtype bpe
 
 # quantize the model to 4-bits (using q4_0 method)
-./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0
+./quantize ./models/7B/ggml-model-f16.gguf ./models/7B/ggml-model-q4_0.gguf q4_0
 
 # run the inference
-./main -m ./models/7B/ggml-model-q4_0.bin -n 128
+./main -m ./models/7B/ggml-model-q4_0.gguf -n 128
 ```
 
 When running the larger models, make sure you have enough disk space to store all the intermediate files.
@@ -572,7 +578,7 @@ Here is an example of a few-shot interaction, invoked with the command
 ./examples/chat-13B.sh
 
 # custom arguments using a 13B model
-./main -m ./models/13B/ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
+./main -m ./models/13B/ggml-model-q4_0.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
 ```
 
 Note the use of `--color` to distinguish between user input and generated text. Other parameters are explained in more detail in the [README](examples/main/README.md) for the `main` example program.
@@ -635,6 +641,8 @@ OpenLLaMA is an openly licensed reproduction of Meta's original LLaMA model. It
 
 ### Using [GPT4All](https://github.com/nomic-ai/gpt4all)
 
+*Note: these instructions are likely obsoleted by the GGUF update*
+
 - Obtain the `tokenizer.model` file from LLaMA model and put it to `models`
 - Obtain the `added_tokens.json` file from Alpaca model and put it to `models`
 - Obtain the `gpt4all-lora-quantized.bin` file from GPT4All model and put it to `models/gpt4all-7B`
@@ -710,7 +718,7 @@ If your issue is with model generation quality, then please at least scan the fo
 #### How to run
 
 1. Download/extract: https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip?ref=salesforce-research
-2. Run `./perplexity -m models/7B/ggml-model-q4_0.bin -f wiki.test.raw`
+2. Run `./perplexity -m models/7B/ggml-model-q4_0.gguf -f wiki.test.raw`
 3. Output:
 ```
 perplexity : calculating perplexity over 655 chunks
@@ -809,13 +817,13 @@ docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-
 On completion, you are ready to play!
 
 ```bash
-docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
+docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
 ```
 
 or with a light image:
 
 ```bash
-docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
+docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
 ```
 
 ### Docker With CUDA
@@ -846,8 +854,8 @@ The resulting images, are essentially the same as the non-CUDA images:
 After building locally, Usage is similar to the non-CUDA examples, but you'll need to add the `--gpus` flag. You will also want to use the `--n-gpu-layers` flag.
 
 ```bash
-docker run --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
-docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+docker run --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
 ```
 
 ### Contributing
44 changes: 22 additions & 22 deletions ci/run.sh
@@ -159,17 +159,17 @@ function gg_run_open_llama_3b_v2 {
 
 python3 ../convert.py ${path_models}
 
-model_f16="${path_models}/ggml-model-f16.bin"
-model_q8_0="${path_models}/ggml-model-q8_0.bin"
-model_q4_0="${path_models}/ggml-model-q4_0.bin"
-model_q4_1="${path_models}/ggml-model-q4_1.bin"
-model_q5_0="${path_models}/ggml-model-q5_0.bin"
-model_q5_1="${path_models}/ggml-model-q5_1.bin"
-model_q2_k="${path_models}/ggml-model-q2_k.bin"
-model_q3_k="${path_models}/ggml-model-q3_k.bin"
-model_q4_k="${path_models}/ggml-model-q4_k.bin"
-model_q5_k="${path_models}/ggml-model-q5_k.bin"
-model_q6_k="${path_models}/ggml-model-q6_k.bin"
+model_f16="${path_models}/ggml-model-f16.gguf"
+model_q8_0="${path_models}/ggml-model-q8_0.gguf"
+model_q4_0="${path_models}/ggml-model-q4_0.gguf"
+model_q4_1="${path_models}/ggml-model-q4_1.gguf"
+model_q5_0="${path_models}/ggml-model-q5_0.gguf"
+model_q5_1="${path_models}/ggml-model-q5_1.gguf"
+model_q2_k="${path_models}/ggml-model-q2_k.gguf"
+model_q3_k="${path_models}/ggml-model-q3_k.gguf"
+model_q4_k="${path_models}/ggml-model-q4_k.gguf"
+model_q5_k="${path_models}/ggml-model-q5_k.gguf"
+model_q6_k="${path_models}/ggml-model-q6_k.gguf"
 
 wiki_test_60="${path_wiki}/wiki.test-60.raw"
 
@@ -285,17 +285,17 @@ function gg_run_open_llama_7b_v2 {
 
 python3 ../convert.py ${path_models}
 
-model_f16="${path_models}/ggml-model-f16.bin"
-model_q8_0="${path_models}/ggml-model-q8_0.bin"
-model_q4_0="${path_models}/ggml-model-q4_0.bin"
-model_q4_1="${path_models}/ggml-model-q4_1.bin"
-model_q5_0="${path_models}/ggml-model-q5_0.bin"
-model_q5_1="${path_models}/ggml-model-q5_1.bin"
-model_q2_k="${path_models}/ggml-model-q2_k.bin"
-model_q3_k="${path_models}/ggml-model-q3_k.bin"
-model_q4_k="${path_models}/ggml-model-q4_k.bin"
-model_q5_k="${path_models}/ggml-model-q5_k.bin"
-model_q6_k="${path_models}/ggml-model-q6_k.bin"
+model_f16="${path_models}/ggml-model-f16.gguf"
+model_q8_0="${path_models}/ggml-model-q8_0.gguf"
+model_q4_0="${path_models}/ggml-model-q4_0.gguf"
+model_q4_1="${path_models}/ggml-model-q4_1.gguf"
+model_q5_0="${path_models}/ggml-model-q5_0.gguf"
+model_q5_1="${path_models}/ggml-model-q5_1.gguf"
+model_q2_k="${path_models}/ggml-model-q2_k.gguf"
+model_q3_k="${path_models}/ggml-model-q3_k.gguf"
+model_q4_k="${path_models}/ggml-model-q4_k.gguf"
+model_q5_k="${path_models}/ggml-model-q5_k.gguf"
+model_q6_k="${path_models}/ggml-model-q6_k.gguf"
 
 wiki_test="${path_wiki}/wiki.test.raw"
 
20 changes: 20 additions & 0 deletions common/CMakeLists.txt
@@ -0,0 +1,20 @@
+# common
+
+set(TARGET common)
+
+add_library(${TARGET} OBJECT
+    common.h
+    common.cpp
+    console.h
+    console.cpp
+    grammar-parser.h
+    grammar-parser.cpp
+    )
+
+if (BUILD_SHARED_LIBS)
+    set_target_properties(${TARGET} PROPERTIES POSITION_INDEPENDENT_CODE ON)
+endif()
+
+target_include_directories(${TARGET} PUBLIC .)
+target_compile_features(${TARGET} PUBLIC cxx_std_11)
+target_link_libraries(${TARGET} PRIVATE llama)