Model Export to Hugging Face format and optionally upload #571
Conversation
Seems to work reasonably well for me. You can see the model here: https://huggingface.co/eliebak/dummy-model2, and here is the generation test:
That's great to hear @eliebak. I've since fixed and improved the script to allow selecting the output dtype, either float16 or float32, irrespective of the model.bin format. I've also run local evals using the EleutherAI harness, as described on the open_llm_leaderboard, and compared with the scores published for openai-community/gpt2 to get a better picture of the performance of the edu-fineweb-10B model.
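For illustration, here is a minimal sketch of the kind of dtype handling described above; the function name and structure are assumptions for this example, not the exact code in the PR:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical sketch: build an HF GPT-2 model from weights already converted
# out of the llm.c model.bin, cast to the requested dtype, and save.
def save_hf_model(state_dict, config: GPT2Config, out_dir: str, dtype: str = "float32"):
    model = GPT2LMHeadModel(config)
    model.load_state_dict(state_dict)         # weights converted from model.bin (float32)
    model = model.to(getattr(torch, dtype))   # cast to torch.float16 / torch.float32 / torch.bfloat16
    model.save_pretrained(out_dir)            # writes config.json and the weight files
```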
Btw, here is related code from @matthewdouglas: https://gist.github.com/matthewdouglas/1c0833f7fa9adbc54e4f5dc09e2b59a2. I'll want to merge one of these two into master.
@rhys101 can you share your Eleuther eval harness command? I was a bit surprised that their docs are very sparse on the actual evals one should be running.
I think both are pretty similar; in fact, I ran the float16 evals on its conversion and got the same output scores. One suggestion for the gist code would be to make the … @matthewdouglas, please use whatever bits from this PR that are useful!
I followed the guide on the Open LLM Leaderboard for the version of the EleutherAI harness they use and how they configured each test. Here are the two scripts I use locally to run the evals (a Python script and a shell script):
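The scripts themselves are not reproduced above. As a rough sketch of what an equivalent run looks like with a recent lm-evaluation-harness (the model path and task choice below are only illustrative, not the actual configuration used):

```python
# Rough sketch only, not the scripts referenced above.
# Assumes a recent lm-evaluation-harness (>= 0.4) and an exported model at ./model_name.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./model_name,dtype=float32",  # exported HF model directory
    tasks=["hellaswag"],                                  # one of the leaderboard tasks
    num_fewshot=10,                                       # the leaderboard runs HellaSwag 10-shot
    batch_size=8,
)
print(results["results"])
```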
Ty @rhys101! I'm organizing everything together right now and running it.
@rhys101 is there a reason I'm missing for why bfloat16 isn't an option?
…on bfloat16 before using in earnest.
For e.g. HellaSwag I get a warning like: "Token indices sequence length is longer than the specified maximum sequence length for this model (1091 > 1024). Running this sequence through the model will result in indexing errors." I'm not sure if the code handles this correctly, i.e. cropping the number of shots as needed to fit everything in.
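As an aside, a small illustrative check of the cropping being asked about (the model path is hypothetical, and this is not how the harness actually implements it):

```python
from transformers import GPT2TokenizerFast

# Illustrative only: GPT-2 has a 1024-token context, so an over-long prompt
# (e.g. too many few-shot examples) has to be cropped before the forward pass.
tok = GPT2TokenizerFast.from_pretrained("./model_name")  # hypothetical exported model dir
prompt_with_shots = "..."                                 # prompt plus few-shot examples

ids = tok(prompt_with_shots)["input_ids"]
max_len = 1024
if len(ids) > max_len:
    ids = ids[-max_len:]  # keep only the most recent max_len tokens
```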
I am digging into why the Eleuther eval is SO slow. In llm.c it takes me 1-2 seconds to evaluate HellaSwag, but here it is taking many long minutes, even when messing with the batch size (which in your code defaults to 1).
The principle I was trying to follow was to keep float32 accuracy throughout, then convert if needed to float16 / bfloat16. I've added the bfloat16 option in the latest push, but testing on the 774M model is generating very low evals compared to float32 and float16, so it needs some thought / checking to see why. Here are the evals on float32 for the 774M model:
For float16:
Then bfloat16:
This is with both this PR's code and the gist code (you can just edit config.json to set the torch_dtype for inference; the evals pick this up). I need to check a bit more on what's happening with bfloat16 here. For reference, the openai-community/gpt2-large average score is 32.07.
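For reference, that config.json edit can also be made through transformers; a small sketch, assuming the exported model lives at ./model_name:

```python
from transformers import AutoConfig

# Switch the dtype the evals will load the model in, without re-exporting.
cfg = AutoConfig.from_pretrained("./model_name")  # hypothetical exported model dir
cfg.torch_dtype = "bfloat16"                       # or "float16" / "float32"
cfg.save_pretrained("./model_name")                # rewrites config.json in place
```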
@rhys101 how long does it take for you to run e.g. only HellaSwag? I'm used to this taking only a few seconds in llm.c.
Yes, it's unreal; it can take quite a while to fully run the evals, the MMLU set especially. I've tried to set the test order in the shell script so that the quickest are run first, giving early guidance on which direction the evals are going.
I see in …
I've just fired off a 124M eval on a local 4090 to measure (nvidia-smi shows a single RTX 4090). The GPU drops to 0% in between each test as the CPU does the data prep, often for a long time. Timings are (min:sec):
(^ HellaSwag still running; that's the indicative time at the start)
It's reasonable to add …. The lm-eval-harness should be using the GPU via its device option, and there should also be an "auto" option for the batch size.
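A short sketch of GPU selection plus automatic batch sizing with a recent lm-evaluation-harness (the exact options discussed in this thread are not shown above, so treat this as an assumption):

```python
# Assumes lm-evaluation-harness >= 0.4 and an exported model at ./model_name.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./model_name",
    tasks=["hellaswag"],
    num_fewshot=10,
    device="cuda:0",    # run the evals on the GPU
    batch_size="auto",  # let the harness find the largest batch that fits
)
```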
merged |
This continues the work on exporting llm.c models to Hugging Face formats (Issue 502).
It's a standalone export script that will convert a GPT2 llm.c binary model file to a local HF model directory. It copies over a standard GPT2 tokenizer into the HF model as well.
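As a rough illustration of the tokenizer-copy step (the exact code in export_hf.py may differ, and the output path is illustrative), the standard GPT-2 tokenizer can simply be saved into the export directory:

```python
from transformers import GPT2TokenizerFast

# Copy the standard GPT-2 tokenizer files into the exported model directory
# so the resulting HF model can be used end to end.
GPT2TokenizerFast.from_pretrained("gpt2").save_pretrained("./model_name")
```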
python export_hf.py --input input_file.bin --output model_name

It can also optionally upload the model to Hugging Face under the current logged-in user account (use huggingface-cli login if needed before running the script):

python export_hf.py --input input_file.bin --output model_name --push true
I've tested on a 124M example export, which gave some semi-coherent output; it improves quite a bit with a repetition_penalty set to 1.3.
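For example, a quick generation check of that kind could look like this (model path and prompt are illustrative):

```python
from transformers import pipeline

# Quick sanity check on the exported model; repetition_penalty=1.3 as noted above.
generate = pipeline("text-generation", model="./model_name")
out = generate(
    "The meaning of life is",
    max_new_tokens=64,
    do_sample=True,
    repetition_penalty=1.3,
)
print(out[0]["generated_text"])
```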
There may well be mis-configurations in the exported model config.json or issues with the conversion script which may become apparent with further review and testing.