add end to end llama test, including generating and running vmfb #224
Conversation
python/turbine_models/gen_external_params/gen_external_params.py
def test_export(quantization: Literal["int4", None], precision: Literal["f16", "f32"]):
    llama.export_transformer_model(
        hf_model_name="llSourcell/medllama2_7b",
I think this model is actually slightly different from the meta version. It might be better to use "Trelis/Llama-2-7b-chat-hf-function-calling-v2", or a secret HF token for the actual meta model we use.
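A minimal sketch of what that swap could look like in the test. The keyword names mirror the snippet above and are assumptions, not the confirmed signature of `export_transformer_model`; the `HF_AUTH_TOKEN` secret name is also hypothetical.

```python
import os

# Open alternative suggested above.
hf_model_name = "Trelis/Llama-2-7b-chat-hf-function-calling-v2"

# Or keep the gated meta checkpoint and authenticate with a CI secret
# (secret name is an assumption for illustration):
# hf_model_name = "meta-llama/Llama-2-7b-chat-hf"
hf_auth_token = os.environ.get("HF_AUTH_TOKEN")

llama.export_transformer_model(
    hf_model_name=hf_model_name,
    hf_auth_token=hf_auth_token,
)
```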
python/turbine_models/gen_external_params/gen_external_params.py
args.external_weight_file = "medllama2_7b_f16_int4.safetensors"
args.run_vmfb = True
args.device = "llvm-cpu"
args.precision = precision
Are these flags set from the pytest_generate_tests in conftest.py? If so, this may be a very long test for a CI, as each config will probably take at least 5 min.
Yup, they are. But it only runs one combo unless the --all flag is passed to pytest.
We can do a regular pytest run with only one config for usual tests, and then pass the --all flag for e.g. monthly releases.
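For readers unfamiliar with the pattern, here is a minimal sketch of how a conftest.py can gate the parametrization behind an `--all` option. The option name follows the discussion above; the exact quantization/precision combos and the default are illustrative, not the repo's actual conftest.

```python
# conftest.py
def pytest_addoption(parser):
    parser.addoption(
        "--all", action="store_true", default=False,
        help="run every quantization/precision combination",
    )


def pytest_generate_tests(metafunc):
    if {"quantization", "precision"} <= set(metafunc.fixturenames):
        if metafunc.config.getoption("--all"):
            # Full sweep, e.g. for a nightly or release CI.
            combos = [(q, p) for q in ("int4", None) for p in ("f16", "f32")]
        else:
            # Single default combo keeps the regular CI run short.
            combos = [("int4", "f16")]
        metafunc.parametrize("quantization,precision", combos)
```

With this in place, a plain `pytest` run exercises one config, while `pytest --all` expands to every combination.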
Cool. Yeah, for this CI I think we just want to run one config, and eventually have a nightly CI that runs all configs, or at least a larger subset of them.
I'd like to get the CPU CI in ASAP, and then take some time later to make a more comprehensive CI.
Thanks @IanNod ! I'll make a push later today to:
- Switch Default Model: Set Trelis/Llama-2-7b-chat-hf-function-calling-v2 as the default model.
- Enhance Tests: Develop tests to check for both crashes and functional correctness.
- Update Dependencies: Add fire to requirements.txt if it's used in the project.
- Improve run_vmfb_comparison: Update it to automatically detect and report output discrepancies.
I'll also see if I can get away with fewer default flags.
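On the run_vmfb_comparison point: a minimal sketch of the kind of discrepancy check described above, assuming the comparison code already has the VMFB and torch outputs as arrays. The function name and tolerances are illustrative, not the actual turbine_models API.

```python
import numpy as np


def assert_outputs_close(turbine_output, torch_output, rtol=1e-2, atol=1e-2):
    """Fail loudly when the VMFB and torch results diverge beyond tolerance."""
    turbine = np.asarray(turbine_output, dtype=np.float32)
    torch_ref = np.asarray(torch_output, dtype=np.float32)
    if not np.allclose(turbine, torch_ref, rtol=rtol, atol=atol):
        max_err = np.max(np.abs(turbine - torch_ref))
        raise AssertionError(f"VMFB vs torch mismatch, max abs error {max_err:.4f}")
```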
- Saves weights to a .safetensors file
- Loads weights at runtime with a "stripped" .mlir
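A hedged sketch of the external-weights flow this commit describes: cast the model's parameters, write them to a .safetensors file, and keep the .mlir "stripped" so weights are loaded at runtime rather than embedded. The function name is illustrative, not the actual gen_external_params API, and the int4 quantization step is elided.

```python
import torch
from safetensors.torch import save_file


def save_external_params(model: torch.nn.Module, path: str, dtype=torch.float16):
    # Collect parameters, cast to the target precision, and write them out,
    # e.g. to "medllama2_7b_f16_int4.safetensors".
    tensors = {
        name: p.detach().to(dtype).contiguous()
        for name, p in model.named_parameters()
    }
    save_file(tensors, path)
```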
…y fail tests on comparison fail
…n argparse and function params
You need to rebase. Once you do, we can see how long this takes. I'm concerned that comparing against torch is overkill.
…y fail tests on comparison fail
…n argparse and function params
python/turbine_models/gen_external_params/gen_external_params.py
Requested changes addressed; we just need a bigger CI (we currently have an 86 GB memory requirement, the CI machine has 62 GB, and a 128 GB CI machine would probably be pretty future-proof).
Also: