Metal support #78

Open · ghost opened this issue Aug 2, 2023 · 2 comments

ghost commented Aug 2, 2023

ggml and llama.cpp support Metal. Do Apple Silicon users need to use llama.cpp, or can they use ggllm.cpp with Falcon?

pjezek commented Aug 19, 2023

I tried the following.

Build: LLAMA_METAL=1 make falcon_main falcon_quantize falcon_perplexity

Then run the model with: ./falcon_main -t 4 -ngl 100 -b 1 -m ../Models/WizardLM-Uncensored-Falcon-7B-GGML/wizardlm-7b-uncensored.ggccv1.q4_0.bin -enc -p "write a story about llamas"

It outputs:

main: build = 883 (2b487f2)
falcon.cpp: loading model from ../Models/WizardLM-Uncensored-Falcon-7B-GGML/wizardlm-7b-uncensored.ggccv1.q4_0.bin
falcon.cpp: file version 10
+---------------+------------+---------+---------+-------+--------+---------------+---------+--------+-------+--------+
|          Info |     format | n_vocab |   n_bpe | n_ctx | n_embd |   n_head ; kv | n_layer | falcon | ftype |   n_ff |
+---------------+------------+---------+---------+-------+--------+---------------+---------+--------+-------+--------+
|               |    ggcc v1 |   65024 |   64784 |  2048 |   4544 |      71 ;   1 |      32 |  7; 7B |     2 |  18176 |
+---------------+------------+---------+---------+-------+--------+---------------+---------+--------+-------+--------+
falcon_model_load_internal: ggml ctx size =    0.00 MB (mmap size = 3872.00 MB)
falcon.cpp: Special mode: Wizard-type finetuning - changing tensor shape
falcon.cpp: Special mode: Wizard-type finetuning - changing tensor shape
falcon_model_load_internal: mem required  = 4196.81 MB (+   48.00 MB per state)
[==================================================] 100%  Tensors populated
falcon_context_prepare: Context falcon_main RAM buffers - key_val =   16.00 MB, Compute =  160.00 MB, Scratch 0 =  124.00 MB, Scratch 1 =   40.14 MB
ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '/Volumes/SanDisk/ggllm.cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                            0x14160b850
ggml_metal_init: loaded kernel_mul                            0x14160bf70
ggml_metal_init: loaded kernel_mul_row                        0x14160c5a0
ggml_metal_init: loaded kernel_scale                          0x14160cac0
ggml_metal_init: loaded kernel_silu                           0x14160cfe0
ggml_metal_init: loaded kernel_relu                           0x14160d500
ggml_metal_init: loaded kernel_gelu                           0x14160da20
ggml_metal_init: loaded kernel_soft_max                       0x14160e0d0
ggml_metal_init: loaded kernel_diag_mask_inf                  0x14160e730
ggml_metal_init: loaded kernel_get_rows_f16                   0x14160edb0
ggml_metal_init: loaded kernel_get_rows_q4_0                  0x14160f430
ggml_metal_init: loaded kernel_get_rows_q4_1                  0x14160fc20
ggml_metal_init: loaded kernel_get_rows_q2_k                  0x1416102a0
ggml_metal_init: loaded kernel_get_rows_q3_k                  0x141610920
ggml_metal_init: loaded kernel_get_rows_q4_k                  0x141610fa0
ggml_metal_init: loaded kernel_get_rows_q5_k                  0x141611620
ggml_metal_init: loaded kernel_get_rows_q6_k                  0x141611ca0
ggml_metal_init: loaded kernel_rms_norm                       0x141612350
ggml_metal_init: loaded kernel_norm                           0x141612a00
ggml_metal_init: loaded kernel_mul_mat_f16_f32                0x1416133d0
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32               0x141613ab0
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32               0x141614190
ggml_metal_init: loaded kernel_mul_mat_q2_k_f32               0x141614870
ggml_metal_init: loaded kernel_mul_mat_q3_k_f32               0x1416150f0
ggml_metal_init: loaded kernel_mul_mat_q4_k_f32               0x1416157d0
ggml_metal_init: loaded kernel_mul_mat_q5_k_f32               0x141615eb0
ggml_metal_init: loaded kernel_mul_mat_q6_k_f32               0x141616590
ggml_metal_init: loaded kernel_rope                           0x141617080
ggml_metal_init: loaded kernel_alibi_f32                      0x141617940
ggml_metal_init: loaded kernel_cpy_f32_f16                    0x1416181d0
ggml_metal_init: loaded kernel_cpy_f32_f32                    0x141618a60
ggml_metal_init: loaded kernel_cpy_f16_f16                    0x1416192f0
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  3874.44 MB
ggml_metal_add_buffer: allocated 'eval            ' buffer, size =   160.00 MB
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =    48.02 MB
ggml_metal_add_buffer: allocated 'scr0            ' buffer, size =   124.00 MB
ggml_metal_add_buffer: allocated 'scr1            ' buffer, size =    40.14 MB

+------------+-----+------+--------+-------------+-------------+-----+------+---------+------+---------+------+------+------+-----+
| Syst. Info | AVX | AVX2 | AVX512 | AVX512_VBMI | AVX512_VNNI | FMA | NEON | ARM_FMA | F16C | FP16_VA | SIMD | BLAS | SSE3 | VSX |
+------------+-----+------+--------+-------------+-------------+-----+------+---------+------+---------+------+------+------+-----+
|  4/10 thrd | 0   | 0    | 0      | 0           | 0           | 0   | 1    | 1       | 0    | 1       | 0    | 1    | 0    | 0   |
+------------+-----+------+--------+-------------+-------------+-----+------+---------+------+---------+------+------+------+-----+

+------------+-------+-------+-------+-------+-------+-------+-------+-------+------+------+--------+---------+
|   Sampling | rpt_n | rpt_p | prs_p | frq_p | top_k | tfs_z | top_p | typ_p | temp | miro | mir_lr | mir_ent |
+------------+-------+-------+-------+-------+-------+-------+-------+-------+------+------+--------+---------+
|            |    64 | 1.100 | 0.000 | 0.000 |    40 | 1.000 | 0.950 | 1.000 | 0.80 |    0 | 0.1000 | 5.00000 |
+============+=======+=======+=======+=======+=======+=======+-------+-------+------+------+--------+---------+
| Generation |   Ctx | Batch |  Keep | Prom. |          Seed |             Finetune | Stop |
+------------+-------+-------+-------+-------+---------------+----------------------+------+
|            |  2048 |     1 |     0 |    10 |    1692449979 |               WIZARD | #  1 |
+------------+-------+-------+-------+-------+---------------+----------------------+------+


GGML_ASSERT: ggml-metal.m:530: ne02 == ne12
GGML_ASSERT: ggml-metal.m:530: ne02 == ne12
zsh: abort      ./falcon_main -t 4 -ngl 100 -b 1 -m  -enc -p "write a story about llamas"
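
The assertion points at ggml's Metal matrix-multiplication path: in this build, ggml-metal.m requires the two matmul operands to have matching sizes in their third dimension (ne02 == ne12). Falcon uses multi-query attention (note the "71 ; 1" head counts in the model table above), so Q carries 71 heads while K/V carry only one, and the broadcast that the CPU backend performs for this case is not implemented in the Metal kernel. Below is a minimal sketch of the failing check, with illustrative values taken from the table; the variable roles are a reading of the ggml convention, not confirmed against this exact build.

/* Sketch of the check behind "GGML_ASSERT: ggml-metal.m:530: ne02 == ne12".
 * In ggml, ne[0..3] are a tensor's dimensions; ne02 and ne12 here are the
 * third ("head") dimension of the two matmul source tensors. */
#include <assert.h>
#include <stdint.h>

int main(void) {
    const int64_t ne02 = 1;   /* K/V heads: Falcon-7B has n_head_kv = 1 */
    const int64_t ne12 = 71;  /* Q heads:   Falcon-7B has n_head   = 71 */

    /* The CPU backend broadcasts the single K/V head across all 71 query
     * heads; the Metal kernel in this build instead asserts equality, so
     * the run aborts exactly as in the log above. */
    assert(ne02 == ne12);
    return 0;
}

If that reading is right, the crash is specific to GPU offload, and running with -ngl 0 (no layers offloaded) should keep the whole graph on the CPU path, at the cost of losing Metal acceleration.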

0x3333 commented Dec 22, 2023

Same issue here... I'll try to convert the model to other types.
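
For reference, a re-quantization attempt might look like the following; this is a sketch assuming falcon_quantize follows the usual <input> <output> <type> invocation and that an f16 source file is available, with placeholder file names:

./falcon_quantize wizardlm-7b-uncensored.ggccv1.f16.bin wizardlm-7b-uncensored.ggccv1.q5_1.bin q5_1

Note that if the assertion above really concerns attention-head shapes rather than the quantization format, any quant type would likely hit the same check under Metal.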
