Support for custom files for run_lora_clm.py #1039
Force-pushed from 9e96f2b to 4aca51d.
Force-pushed from 4aca51d to 4871d58.
Converting to draft to resolve issues from the merge.
@regisss, I am trying to add a new test for the changes I made:
What I usually do to remove a very specific test, like `optimum-habana/tests/test_examples.py` line 223 in c495f47
Regarding the test, if it's a functional one, feel free to write a new test script.
Force-pushed from 47dad76 to e75335f.
@regisss, I updated the PR with a new test (~3 min run time).
@regisss, please take a look.
Review comment on `tests/test_custom_file_input.py` (outdated):
```python
("bigcode/starcoder", ["--do_train", f"--train_file {PATH_TO_RESOURCES}/custom_dataset.jsonl", "--validation_split_percentage 10"]),
("bigcode/starcoder", ["--do_train", f"--train_file {PATH_TO_RESOURCES}/custom_dataset.txt", "--validation_split_percentage 10"]),
("bigcode/starcoder", ["--do_train", f"--train_file {PATH_TO_RESOURCES}/custom_dataset.jsonl", "--do_eval", f"--validation_file {PATH_TO_RESOURCES}/custom_dataset.jsonl", "--validation_split_percentage 20"]),
("bigcode/starcoder", ["--do_train", f"--train_file {PATH_TO_RESOURCES}/custom_dataset.txt", "--do_eval", f"--validation_file {PATH_TO_RESOURCES}/custom_dataset.txt", "--validation_split_percentage 20"]),
("bigcode/starcoder", ["--do_train", "--dataset_name timdettmers/openassistant-guanaco", "--do_eval", f"--validation_file {PATH_TO_RESOURCES}/custom_dataset.jsonl", "--validation_split_percentage 20"]),
```
I guess this only works on Gaudi2 given the size of StarCoder, right?
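Each test case in the review excerpt pairs a model name with a list of CLI argument strings, and some strings carry a value after a space (for example `--train_file <path>`). A minimal sketch of how such a case could be flattened into an argv list for launching `run_lora_clm.py`, assuming the script takes the model via `--model_name_or_path`; `build_command` and the example path are hypothetical and are not the actual test harness in this PR.

```python
import shlex
from typing import List, Sequence


def build_command(model_name: str, extra_args: Sequence[str],
                  script: str = "run_lora_clm.py") -> List[str]:
    """Flatten one (model, args) test case into an argv list (hypothetical helper)."""
    # Assumption: the model is passed with --model_name_or_path.
    cmd = ["python", script, "--model_name_or_path", model_name]
    for arg in extra_args:
        # "--train_file path/to/file.jsonl" becomes two argv entries;
        # shlex also keeps quoted paths containing spaces intact.
        cmd.extend(shlex.split(arg))
    return cmd


if __name__ == "__main__":
    case = ("bigcode/starcoder",
            ["--do_train", "--train_file resources/custom_dataset.jsonl",
             "--validation_split_percentage 10"])
    print(build_command(*case))
```

Using `shlex.split` instead of `str.split` means an argument such as `--train_file "my data.jsonl"` still yields a single path entry.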
Force-pushed from e5d200a to 3ff2174.
What does this PR do?
Fixes:
- `Dataset 'jsonl' doesn't exist on the Hub or cannot be accessed.`
- `AttributeError: 'DataArguments' object has no attribute 'keep_linebreaks'`
- `ValueError: Instruction "train[:0%]" corresponds to no data!`
- `KeyError: 'instruction'`
- `FAILED tests/test_examples.py::CausalLanguageModelingLORAExampleTester::test_run_lora_clm_falcon-40b_single_card - KeyError: 'databricks/databricks-dolly-15k'` is covered by another PR: https://github.com/huggingface/optimum-habana/pull/1139/files

Additions
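The first three errors fixed in this PR relate to how local files are handed to Hugging Face Datasets' `load_dataset`: there is no `jsonl` builder (JSON Lines files are read by the `json` builder), only the `text` builder takes a `keep_linebreaks` option, and a slice such as `train[:0%]` is rejected because it selects no rows. A minimal sketch of guarding against all three, modeled loosely on the upstream transformers `run_clm.py` example; the helper names, the field subset, and the defaults of this `DataArguments` are assumptions for illustration, not the code in this PR.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical mapping from file extension to load_dataset builder name.
_BUILDERS = {"json": "json", "jsonl": "json", "txt": "text"}


@dataclass
class DataArguments:
    # Illustrative subset of fields; defaults are assumptions.
    train_file: Optional[str] = None
    validation_file: Optional[str] = None
    validation_split_percentage: int = 0
    keep_linebreaks: bool = field(
        default=True,
        metadata={"help": "Whether to keep line breaks when using TXT files."},
    )


def resolve_builder(path: str) -> str:
    """Return the load_dataset builder name for a local file."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in _BUILDERS:
        raise ValueError(f"Unsupported file extension: {ext!r}")
    return _BUILDERS[ext]


def builder_kwargs(args: DataArguments, builder: str) -> Dict[str, object]:
    # Only the "text" builder understands keep_linebreaks.
    return {"keep_linebreaks": args.keep_linebreaks} if builder == "text" else {}


def split_specs(args: DataArguments) -> Tuple[Optional[str], Optional[str]]:
    """Return (train_split, validation_split) slices, or (None, None) if no carve-out is needed."""
    pct = args.validation_split_percentage
    if args.validation_file is not None or pct <= 0:
        # Use the whole train file; never emit an empty "train[:0%]" slice.
        return None, None
    return f"train[{pct}%:]", f"train[:{pct}%]"
```

Returning `None` lets the caller load the full split instead of building a zero-percent slice.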
Assumptions
Sample command
Contents of sample `custom_dataset.jsonl`
Datasets tested with:
- timdettmers/openassistant-guanaco
- tatsu-lab/alpaca
- flytech/python-codes-25k
- databricks/databricks-dolly-15k
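The tested datasets do not share one schema: `timdettmers/openassistant-guanaco` stores each example in a single `text` column, `tatsu-lab/alpaca` uses `instruction`/`input`/`output`, and `databricks/databricks-dolly-15k` uses `instruction`/`context`/`response`. Code that indexes `record["instruction"]` unconditionally therefore raises `KeyError: 'instruction'` on text-only data. A minimal sketch of a schema-tolerant formatter; `record_to_text` and its prompt template are hypothetical and are not the template used by `run_lora_clm.py`.

```python
from typing import Any, Mapping


def _first(record: Mapping[str, Any], *keys: str) -> str:
    # Return the first non-empty value among candidate column names.
    for key in keys:
        value = record.get(key)
        if value:
            return str(value)
    return ""


def record_to_text(record: Mapping[str, Any]) -> str:
    """Turn one dataset row into a training string (hypothetical template)."""
    if "instruction" not in record:
        # Plain-text datasets expose only a "text" column.
        if "text" not in record:
            raise KeyError("record needs either an 'instruction' or a 'text' field")
        return str(record["text"])
    parts = [f"### Instruction:\n{record['instruction']}"]
    context = _first(record, "input", "context")  # alpaca vs. dolly naming
    if context:
        parts.append(f"### Input:\n{context}")
    parts.append(f"### Response:\n{_first(record, 'output', 'response')}")
    return "\n\n".join(parts)
```

Checking for the `instruction` column before indexing keeps one code path for both instruction-style and plain-text datasets.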