This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

fix #338 - use wte if no lm_head for gpt2 #343

Closed
philpax wants to merge 1 commit from the gpt2-optional-lm-head branch

Conversation

philpax (Collaborator) commented on Jul 2, 2023

I think this should fix #338, but I don't have a model without a LM head to test.
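
For reference, the change amounts to roughly the following sketch (not the actual diff; `Tensor` and `load_tensor` are made-up stand-ins for the loader machinery). The idea is that GPT-2 checkpoints with tied weights ship no separate `lm_head` tensor and reuse the token-embedding matrix `wte` as the output projection, so loading should fall back instead of erroring out:

```rust
/// Hypothetical stand-in for the ggml tensor handle.
#[derive(Clone)]
struct Tensor;

/// Sketch of the fallback; names are illustrative, not the crate's exact
/// API. If the checkpoint has no `model/lm_head` tensor, reuse the
/// token-embedding matrix `wte` as the output projection.
fn resolve_lm_head(load_tensor: impl Fn(&str) -> Option<Tensor>, wte: &Tensor) -> Tensor {
    load_tensor("model/lm_head")
        // No head in the file: the weights are tied to `wte`.
        .unwrap_or_else(|| wte.clone())
}
```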

@steventrouble do you have a sample model to test with?

LLukas22 (Contributor) commented on Jul 2, 2023

See ggerganov's original gpt-2 models.

philpax (Collaborator, Author) commented on Jul 2, 2023

Hm, ok, tested with that - turns out it segfaults due to an out-of-bounds memory access. Will investigate some other time...

steventrouble (Contributor) commented on Jul 9, 2023

I found the node that's causing the segfault. It's the 1d view of memory_k here being called with an out-of-bounds offset.

Here's an assertion that may help you repro.
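
It's roughly this shape (a sketch of the idea, not the linked code verbatim; the helper and its parameters are made up):

```rust
/// Sketch of the bounds check (hypothetical helper; the real assertion
/// would sit next to the ggml view call). It verifies that the requested
/// element span fits inside the backing tensor's allocation.
fn assert_view_1d_in_bounds(
    tensor_nbytes: usize,
    element_size: usize,
    n_elements: usize,
    byte_offset: usize,
) {
    let span_bytes = n_elements * element_size;
    assert!(
        byte_offset + span_bytes <= tensor_nbytes,
        "1d view out of bounds: offset {byte_offset} + {span_bytes} bytes exceeds {tensor_nbytes} bytes"
    );
}
```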

The issue seems to stem from confusion between the similarly named hyperparameters.n_ctx and context_size. Updating start_session to use self.context_size instead of self.hyperparameters.n_ctx fixes the issue for me, but I don't know enough about ggml to be sure that's the root cause. Example commit
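
Concretely, the change looks something like this (types trimmed down to the relevant fields; these are not the crate's real definitions):

```rust
/// Trimmed-down sketch of the fix; only the relevant fields are shown.
struct Hyperparameters {
    /// Maximum context length baked into the checkpoint.
    n_ctx: usize,
}

struct Gpt2 {
    hyperparameters: Hyperparameters,
    /// Context length the session is actually configured with.
    context_size: usize,
}

impl Gpt2 {
    /// Length used when start_session sizes the per-session KV cache.
    fn session_context_len(&self) -> usize {
        // was: self.hyperparameters.n_ctx
        self.context_size
    }
}
```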

LLukas22 (Contributor) commented on Jul 9, 2023

I'll try to take a look at this later, but good job finding this; it saves me a lot of time 👍

philpax (Collaborator, Author) commented on Jul 9, 2023

> I found the node that's causing the segfault. It's the 1d view of memory_k here being called with an out-of-bounds offset.
>
> Here's an assertion that may help you repro.
>
> The issue seems to stem from confusion between the similarly named hyperparameters.n_ctx and context_size. Updating start_session to use self.context_size instead of self.hyperparameters.n_ctx fixes the issue for me, but I don't know enough about ggml to be sure that's the root cause. Example commit

Well spotted! Feel free to open a PR with that and I'll close this. What you've (correctly) identified there is that the context length that the model supports (hyperparameters.n_ctx) is not the same as the context length that's actually in use. Oops.
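
To put numbers on it (all illustrative, and assuming a llama.cpp-style cache layout where each layer owns a contiguous slice; the crate's exact offsets differ):

```rust
fn main() {
    // Illustrative numbers only; none of these come from a real model file.
    let n_ctx = 1024; // context length the checkpoint supports
    let context_size = 2048; // context length the session actually uses
    let n_embd = 768;
    let n_layer = 12;
    let f32_size = std::mem::size_of::<f32>();

    // If the KV cache is sized from the checkpoint's n_ctx...
    let memory_k_bytes = n_layer * n_ctx * n_embd * f32_size;

    // ...while the graph computes per-layer view offsets from the larger
    // in-use context_size, the later layers point past the allocation.
    let last_layer = n_layer - 1;
    let view_offset = last_layer * context_size * n_embd * f32_size;

    println!("cache size  : {memory_k_bytes} bytes"); // 37_748_736
    println!("view offset : {view_offset} bytes");    // 69_206_016
    assert!(view_offset > memory_k_bytes); // out of bounds -> segfault
}
```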

philpax (Collaborator, Author) commented on Jul 9, 2023

Obsoleted by #362

philpax closed this on Jul 9, 2023
philpax deleted the gpt2-optional-lm-head branch on Jul 16, 2023
Successfully merging this pull request may close these issues: GPT-2 doesn't always have an lm_head (#338).