This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

fix #338 - use wte if no lm_head for gpt2 #343

Closed
philpax wants to merge 1 commit from the gpt2-optional-lm-head branch

Conversation

philpax (Collaborator) commented on Jul 2, 2023

I think this should fix #338, but I don't have a model without a LM head to test.
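
For reference, the change amounts to roughly the following sketch (not the actual diff; `Tensor` and `load_tensor` are made-up stand-ins for the loader machinery). The idea is that GPT-2 checkpoints with tied weights ship no separate `lm_head` tensor and reuse the token-embedding matrix `wte` as the output projection, so loading should fall back instead of erroring out:

```rust
/// Hypothetical stand-in for the ggml tensor handle.
#[derive(Clone)]
struct Tensor;

/// Sketch of the fallback; names are illustrative, not the crate's exact
/// API. If the checkpoint has no `model/lm_head` tensor, reuse the
/// token-embedding matrix `wte` as the output projection.
fn resolve_lm_head(load_tensor: impl Fn(&str) -> Option<Tensor>, wte: &Tensor) -> Tensor {
    load_tensor("model/lm_head")
        // No head in the file: the weights are tied to `wte`.
        .unwrap_or_else(|| wte.clone())
}
```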

@steventrouble do you have a sample model to test with?

LLukas22 (Contributor) commented on Jul 2, 2023

See ggerganov's original gpt-2 models.

philpax (Collaborator, Author) commented on Jul 2, 2023

Hm, ok, tested with that - turns out it segfaults due to an out-of-bounds memory access. Will investigate some other time...

steventrouble (Contributor) commented on Jul 9, 2023

I found the node that's causing the segfault. It's the 1d view of memory_k here being called with an out-of-bounds offset.

Here's an assertion that may help you repro.
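
It's roughly this shape (a sketch of the idea, not the linked code verbatim; the helper and its parameters are made up):

```rust
/// Sketch of the bounds check (hypothetical helper; the real assertion
/// would sit next to the ggml view call). It verifies that the requested
/// element span fits inside the backing tensor's allocation.
fn assert_view_1d_in_bounds(
    tensor_nbytes: usize,
    element_size: usize,
    n_elements: usize,
    byte_offset: usize,
) {
    let span_bytes = n_elements * element_size;
    assert!(
        byte_offset + span_bytes <= tensor_nbytes,
        "1d view out of bounds: offset {byte_offset} + {span_bytes} bytes exceeds {tensor_nbytes} bytes"
    );
}
```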

The issue seems to stem from confusion between the similarly named hyperparameters.n_ctx and context_size. Updating start_session to use self.context_size instead of self.hyperparameters.n_ctx fixes the issue for me, but I don't know enough about ggml to be sure that's the root cause. Example commit
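
Concretely, the change looks something like this (types trimmed down to the relevant fields; these are not the crate's real definitions):

```rust
/// Trimmed-down sketch of the fix; only the relevant fields are shown.
struct Hyperparameters {
    /// Maximum context length baked into the checkpoint.
    n_ctx: usize,
}

struct Gpt2 {
    hyperparameters: Hyperparameters,
    /// Context length the session is actually configured with.
    context_size: usize,
}

impl Gpt2 {
    /// Length used when start_session sizes the per-session KV cache.
    fn session_context_len(&self) -> usize {
        // was: self.hyperparameters.n_ctx
        self.context_size
    }
}
```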

LLukas22 (Contributor) commented on Jul 9, 2023

I'll try to take a look at this later, but good job finding this; it saves me a lot of time 👍

philpax (Collaborator, Author) commented on Jul 9, 2023

> I found the node that's causing the segfault. It's the 1d view of memory_k here being called with an out-of-bounds offset.
>
> Here's an assertion that may help you repro.
>
> The issue seems to stem from confusion between the similarly named hyperparameters.n_ctx and context_size. Updating start_session to use self.context_size instead of self.hyperparameters.n_ctx fixes the issue for me, but I don't know enough about ggml to be sure that's the root cause. Example commit

Well spotted! Feel free to open a PR with that and I'll close this. What you've (correctly) identified there is that the context length that the model supports (hyperparameters.n_ctx) is not the same as the context length that's actually in use. Oops.
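
To put numbers on it (all illustrative, and assuming a llama.cpp-style cache layout where each layer owns a contiguous slice; the crate's exact offsets differ):

```rust
fn main() {
    // Illustrative numbers only; none of these come from a real model file.
    let n_ctx = 1024; // context length the checkpoint supports
    let context_size = 2048; // context length the session actually uses
    let n_embd = 768;
    let n_layer = 12;
    let f32_size = std::mem::size_of::<f32>();

    // If the KV cache is sized from the checkpoint's n_ctx...
    let memory_k_bytes = n_layer * n_ctx * n_embd * f32_size;

    // ...while the graph computes per-layer view offsets from the larger
    // in-use context_size, the later layers point past the allocation.
    let last_layer = n_layer - 1;
    let view_offset = last_layer * context_size * n_embd * f32_size;

    println!("cache size  : {memory_k_bytes} bytes"); // 37_748_736
    println!("view offset : {view_offset} bytes");    // 69_206_016
    assert!(view_offset > memory_k_bytes); // out of bounds -> segfault
}
```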

philpax (Collaborator, Author) commented on Jul 9, 2023

Obsoleted by #362

philpax closed this on Jul 9, 2023
philpax deleted the gpt2-optional-lm-head branch on Jul 16, 2023
Successfully merging this pull request may close these issues: GPT-2 doesn't always have an lm_head (#338).