-
I trained my custom model (basically a small GPT-2-style Transformer with custom tokenization) in PyTorch (and HuggingFace). Now I'd like to use this model with ggml, but I'm struggling to make it work. What I did so far is copy-paste the GPT-2 example from this repo (the example is awesome), and:
It runs, but I get nonsense predictions (and sometimes NaN), so obviously something is going wrong somewhere. So I started debugging: viewing the tensors' shapes and contents, and comparing them to the outputs in Python. But even the most basic operations give odd results. For example, when printing the weights and the resulting tensor of the very first operation (embedding the input tokens):
And after the operation (Line 596 in 6b846cb) I get:
Which... doesn't make sense? The first token is

In this situation, any tips on how to debug and fix the issues?
-
Do you get the correct values if you make the graph have only the `ggml_get_rows` operation? The reason I'm asking is that further operations can overwrite the results of previous ops, so if you are looking at the `ggml_get_rows` memory after computing the graph, you could be looking at new values generated by the next operations in the graph. So the first step is to remove all other ops and make sure that result is as expected.
-
Yeah, @ggerganov is right. How you actually print the tensor values when you debug is critical, because memory is reused and this can lead to wrong debug output, so please show us how you do it. To actually get the correct output, one way to go about it is to set the tensor name:
Then if you are using the CPU backend:
where I define `print_t_f16`, e.g., as: