Fix import issue when running huggingface_lowresource.sh #4
base: main
Conversation
Thanks very much for the PR; here is some feedback:
- There is no need to change the way packages are imported in `taa/archive.py` etc., since I have moved the `__main__` function of `taa/search.py` outside (to `examples/reproduce_experiment.py`).
- The `__main__` function in `taa/search_augment_train.py` may also need to move to the `examples` folder; please check it.
- If a dataset in `huggingface/datasets` doesn't have an original validation set, please split out 10% of the training samples for validation (see the sketch after this list).
- Other specific suggestions on revision are commented after each file; please check them.

Thanks again.
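A minimal sketch of the suggested 10% validation split, assuming the HuggingFace `datasets` library; the dataset name and split handling here are illustrative assumptions, not from the PR:

```python
from datasets import load_dataset

# Hypothetical dataset choice; ag_news ships only 'train' and 'test' splits.
dataset = load_dataset('ag_news')

if 'validation' not in dataset:
    # No official validation set: carve 10% out of the training samples.
    split = dataset['train'].train_test_split(test_size=0.1, seed=42)
    dataset['train'] = split['train']
    dataset['validation'] = split['test']
```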
```python
if C.get()['ir'] < 1 and C.get()['method'] != 'bt':
    # rebalanced data
    ir_index = np.where(labels == 0)
    texts = np.append(texts, texts[ir_index].repeat(int(1 / C.get()['ir']) - 1))
    labels = np.append(labels, labels[ir_index].repeat(int(1 / C.get()['ir']) - 1))
```
Fixed.
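A hedged numeric illustration of the rebalancing snippet above (not part of the PR): with imbalance ratio `ir = 0.25`, each minority-class (label 0) sample is appended `int(1 / 0.25) - 1 = 3` extra times.

```python
import numpy as np

texts = np.array(['a', 'b', 'c', 'd', 'e'])
labels = np.array([0, 1, 1, 1, 1])  # label 0 is the minority class
ir = 0.25                           # stand-in for C.get()['ir']

ir_index = np.where(labels == 0)
texts = np.append(texts, texts[ir_index].repeat(int(1 / ir) - 1))
labels = np.append(labels, labels[ir_index].repeat(int(1 / ir) - 1))

print(texts)   # ['a' 'b' 'c' 'd' 'e' 'a' 'a' 'a']
print(labels)  # [0 1 1 1 1 0 0 0]
```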
```python
transform_train.transforms.insert(0, Augmentation(default_policy()))
pass
```
Fixed.
```python
    [('tfidf_word_insert', 0.6572641245063084, 0.32120987775289295),
     ('random_word_swap', 0.4009335761117499, 0.3015697007069029)]]


def default_policy():
```
I have deleted the default policy since it is not used anymore.
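For reference, the deleted snippet above implies the policy format: a list of sub-policies, each a list of `(op_name, probability, magnitude)` tuples. A hedged sketch with rounded values:

```python
def example_policy():
    # One sub-policy: apply tfidf_word_insert, then random_word_swap,
    # each with its own (probability, magnitude) pair.
    return [
        [('tfidf_word_insert', 0.657, 0.321),
         ('random_word_swap', 0.401, 0.302)],
    ]
```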
```python
class_num = train_dataset.features['label'].num_classes
all_train_examples = get_examples(train_dataset, text_key)

train_examples, valid_examples = general_split(all_train_examples, test_size=test_size, train_size=1 - test_size)
```
A conditional check should be added here: if the dataset already has a validation set, there is no need to split one out of the training set (a sketch follows).
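A hedged sketch of that check, reusing `get_examples`, `general_split`, `text_key`, and `test_size` from the diff above; the `raw_datasets` dict and its `'validation'` key are assumptions, not from the PR:

```python
if 'validation' in raw_datasets:
    # The dataset ships with an official validation split; use it directly.
    train_examples = get_examples(raw_datasets['train'], text_key)
    valid_examples = get_examples(raw_datasets['validation'], text_key)
else:
    # Otherwise, fall back to splitting it out of the training set.
    all_train_examples = get_examples(train_dataset, text_key)
    train_examples, valid_examples = general_split(
        all_train_examples, test_size=test_size, train_size=1 - test_size)
```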