
add end to end llama test, including generating and running vmfb #224

Merged (29 commits, Dec 11, 2023)

Conversation

@renxida (Contributor) commented Dec 7, 2023

Also:

  • fixes some errors caused by the IREE version bump
  • makes gen_external_params use Google Fire, eliminating the argparse sprawl and keeping the Python API and CLI consistent (see the sketch below)
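
For context, a minimal sketch of what a Fire-based entry point looks like (the parameter names here are illustrative, not necessarily the actual gen_external_params signature):

import fire

def gen_external_params(
    hf_model_name: str = "llSourcell/medllama2_7b",
    quantization: str = "int4",
    precision: str = "f16",
):
    # Export external weight parameters for the given model. Because Fire
    # exposes this function directly, one signature serves both the Python
    # API and the CLI, e.g.:
    #   python gen_external_params.py --quantization=int4 --precision=f16
    ...

if __name__ == "__main__":
    fire.Fire(gen_external_params)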

@renxida requested a review from @IanNod, December 7, 2023 01:16
Resolved review threads (outdated): python/turbine_models/custom_models/stateless_llama.py, python/turbine_models/tests/conftest.py

def test_export(quantization: Literal["int4", None], precision: Literal["f16", "f32"]):
    llama.export_transformer_model(
        hf_model_name="llSourcell/medllama2_7b",
@IanNod (Contributor) commented:

I think this model is actually slightly different from the Meta version. It might be better to use "Trelis/Llama-2-7b-chat-hf-function-calling-v2", or a secret HF token for the actual Meta model we use.
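
If we go the token route, the usual pattern is to read it from a CI secret; a sketch using huggingface_hub (the environment variable name is an assumption, not an existing secret in this repo):

import os
from huggingface_hub import login

# HF_AUTH_TOKEN would be supplied as a CI secret; the name is an assumption.
login(token=os.environ["HF_AUTH_TOKEN"])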

Resolved review thread (outdated): python/turbine_models/tests/stateless_llama_test.py
args.external_weight_file = "medllama2_7b_f16_int4.safetensors"
args.run_vmfb = True
args.device = "llvm-cpu"
args.precision = precision
@IanNod (Contributor) commented:

Are these flags set from pytest_generate_tests in conftest.py? If so, this may be a very long test for a CI, since each config will probably take at least 5 minutes.

@renxida (Contributor, Author) commented:

Yup, they are. But it only runs one combo unless the --all flag is passed to pytest.

We can run plain pytest for a single config in the usual tests, and pass --all for, e.g., monthly releases. A sketch of the pattern is below.
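
For reference, a minimal conftest.py sketch of this pattern (the option and parameter names are assumptions based on this thread, not necessarily the actual file):

def pytest_addoption(parser):
    parser.addoption(
        "--all",
        action="store_true",
        default=False,
        help="Run every quantization/precision combination.",
    )

def pytest_generate_tests(metafunc):
    if {"quantization", "precision"} <= set(metafunc.fixturenames):
        if metafunc.config.getoption("--all"):
            params = [(q, p) for q in ("int4", None) for p in ("f16", "f32")]
        else:
            # Default: one representative combination to keep CI fast.
            params = [("int4", "f16")]
        metafunc.parametrize("quantization,precision", params)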

@IanNod (Contributor) commented:

Cool. Yeah, for this CI I think we just want to run one config, and eventually have a nightly CI that runs all configs, or at least a larger subset.

@renxida (Contributor, Author) commented:

I'd like to get the CPU CI in ASAP, and then take some time later to build a more comprehensive CI.

@renxida (Contributor, Author) left a comment:

Thanks @IanNod! I'll make a push later today to:

  • Switch default model: set Trelis/Llama-2-7b-chat-hf-function-calling-v2 as the default model.
  • Enhance tests: develop tests that check for both crashes and functional correctness.
  • Update dependencies: add fire to requirements.txt if it isn't already included.
  • Improve run_vmfb_comparison: update it to automatically detect and report output discrepancies (a sketch of the check is below).

I'll also see if I can get away with fewer default flags.
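
A minimal sketch of such a discrepancy check (the function name and tolerances are placeholders, not the actual run_vmfb_comparison API):

import numpy as np

def report_discrepancies(turbine_out: np.ndarray, torch_out: np.ndarray,
                         rtol: float = 1e-2, atol: float = 1e-2) -> None:
    # Compare the VMFB (turbine) output against the torch reference and
    # report the worst mismatch instead of just dumping both outputs.
    if turbine_out.shape != torch_out.shape:
        raise AssertionError(f"shape mismatch: {turbine_out.shape} vs {torch_out.shape}")
    abs_err = np.abs(turbine_out - torch_out)
    if not np.allclose(turbine_out, torch_out, rtol=rtol, atol=atol):
        worst = np.unravel_index(np.argmax(abs_err), abs_err.shape)
        raise AssertionError(f"outputs differ: max abs err {abs_err.max():.4g} at index {worst}")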


Resolved review thread (outdated): python/turbine_models/tests/stateless_llama_test.py
@dan-garvey (Member) left a comment:

You need to rebase. Once you do, we can see how long this takes. I'm concerned that comparing to torch is overkill.

Resolved review threads: .github/workflows/test_models.yml; python/turbine_models/tests/conftest.py (outdated); python/turbine_models/tests/vmfb_comparison.py (3 threads, outdated)
@renxida (Contributor, Author) commented Dec 11, 2023

Requested changes addressed; we just need a bigger CI machine (the test currently needs 86 GB of memory, the CI machine has 62 GB, and a 128 GB machine would probably be fairly future-proof).

@dan-garvey merged commit 4a9976a into nod-ai:main on Dec 11, 2023. 3 checks passed.

@renxida deleted the end-to-end-test branch on December 15, 2023.