fix import issue when running huggingface_lowresource.sh #4

Status: Open (1 commit, base branch: main)


wolvecap (Contributor)

No description provided.

RenShuhuai-Andy (Member) left a comment:

Thanks very much for the PR. Here is some feedback:

  1. There is no need to change how packages are imported in taa/archive.py and the other files, since I have moved the `__main__` code out of taa/search.py and into examples/reproduce_experiment.py.
  2. The `__main__` code in taa/search_augment_train.py may also need to be moved to the examples folder; please check.
  3. If a dataset in huggingface/datasets has no original validation set, please split off 10% of the training samples for validation.
  4. More specific revision suggestions are left as comments on each file; please check them.

Thanks again.
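Point 1 rests on a general Python behavior: a module that uses package-relative imports (`from .archive import ...`) fails with `ImportError` when it is executed directly as a script, because it then has no parent package; the same module imports fine when a script outside the package imports it. The sketch below reproduces this with throwaway files. It assumes (not confirmed by this thread) that the `taa` modules use relative imports; `pkg`, `archive.py`, `search.py`, `examples/run.py` and `POLICY` are hypothetical stand-ins.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run(args, cwd):
    # Run a Python file in a fresh interpreter and capture its output.
    return subprocess.run([sys.executable, *args], cwd=cwd,
                          capture_output=True, text=True)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "pkg"  # hypothetical stand-in for the taa package
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "archive.py").write_text("POLICY = 'p'\n")
        # A package module with a relative import AND a __main__ block.
        (pkg / "search.py").write_text(
            "from .archive import POLICY\n"
            "if __name__ == '__main__':\n"
            "    print(POLICY)\n"
        )
        # Entry point moved outside the package, as in examples/.
        examples = root / "examples"
        examples.mkdir()
        (examples / "run.py").write_text(
            "import sys, pathlib\n"
            "sys.path.insert(0, str(pathlib.Path(__file__).resolve().parent.parent))\n"
            "from pkg.search import POLICY\n"
            "print(POLICY)\n"
        )
        as_script = run(["pkg/search.py"], root)       # relative import fails
        from_examples = run(["examples/run.py"], root)  # import succeeds
        return as_script, from_examples


if __name__ == "__main__":
    failed, ok = demo()
    print(failed.stderr.strip().splitlines()[-1])
    print(ok.stdout.strip())
```

Running the package module with `python -m pkg.search` from the project root would also work; moving the entry point to `examples/` simply avoids rewriting the package's imports.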

Comment on lines +109 to +113
```python
if C.get()['ir'] < 1 and C.get()['method'] != 'bt':
    # rebalanced data
    ir_index = np.where(labels == 0)
    texts = np.append(texts, texts[ir_index].repeat(int(1 / C.get()['ir']) - 1))
    labels = np.append(labels, labels[ir_index].repeat(int(1 / C.get()['ir']) - 1))
```
Member: I have fixed this.
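The rebalancing snippet above oversamples label 0: each label-0 example is appended `int(1 / ir) - 1` more times, so that class ends up roughly `1 / ir` times its original size (exactly so only when `1 / ir` is an integer, because `int()` truncates). A standalone sketch of just that step follows, under stated assumptions: `oversample_label` is a hypothetical name, the `C.get()` config lookup and the `method != 'bt'` guard are left out, and label 0 is taken to be the downsampled class, as in the snippet.

```python
import numpy as np


def oversample_label(texts, labels, ir, target_label=0):
    """Hypothetical helper mirroring the PR snippet: repeat the examples of
    `target_label` so that class grows by a factor of about 1 / ir."""
    texts = np.asarray(texts)
    labels = np.asarray(labels)
    if ir >= 1:
        # Balanced setting: nothing to do.
        return texts, labels
    extra = int(1 / ir) - 1  # extra copies per minority example (truncated)
    idx = np.where(labels == target_label)
    # np.repeat places the copies of each element next to each other.
    texts = np.append(texts, texts[idx].repeat(extra))
    labels = np.append(labels, labels[idx].repeat(extra))
    return texts, labels


if __name__ == "__main__":
    t, l = oversample_label(["a", "b", "c", "d"], [0, 1, 1, 1], ir=0.25)
    print(t.tolist(), l.tolist())
```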

Comment on lines -48 to +59
```python
transform_train.transforms.insert(0, Augmentation(default_policy()))
pass
```
Member: I have fixed this.

```python
[('tfidf_word_insert', 0.6572641245063084, 0.32120987775289295),
 ('random_word_swap', 0.4009335761117499, 0.3015697007069029)]]


def default_policy():
```
Member: I have deleted the default policy since it is no longer used.
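The policy entries in the snippet above are tuples of an operation name and two numbers. A minimal sketch of how such a policy could be applied follows, assuming an AutoAugment-style reading that the thread does not confirm: each tuple is `(operation, probability, magnitude)`, a policy is a list of sub-policies, and one sub-policy is drawn at random per example. The two operations here are toy, hypothetical implementations (`tfidf_word_insert` is not reproduced), and none of these function names come from the `taa` code.

```python
import random

# Toy operations (hypothetical): each takes a word list, a magnitude in
# [0, 1] and a random generator, and returns a new word list.


def random_word_swap(words, magnitude, rng):
    # Swap random word pairs; the number of swaps scales with magnitude.
    words = list(words)
    n_swaps = max(1, int(magnitude * len(words)))
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_word_delete(words, magnitude, rng):
    # Drop each word with probability `magnitude`, keeping at least one word.
    kept = [w for w in words if rng.random() >= magnitude]
    return kept or list(words[:1])


OPS = {"random_word_swap": random_word_swap,
       "random_word_delete": random_word_delete}


def apply_sub_policy(text, sub_policy, rng):
    # Apply each (name, probability, magnitude) step with its own probability.
    words = text.split()
    for name, prob, magnitude in sub_policy:
        if rng.random() < prob:
            words = OPS[name](words, magnitude, rng)
    return " ".join(words)


def apply_policy(text, policy, rng):
    # Draw one sub-policy uniformly at random and apply it.
    return apply_sub_policy(text, rng.choice(policy), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    policy = [[("random_word_swap", 0.66, 0.32), ("random_word_delete", 0.40, 0.30)]]
    print(apply_policy("the movie was surprisingly good", policy, rng))
```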

```python
class_num = train_dataset.features['label'].num_classes
all_train_examples = get_examples(train_dataset, text_key)

train_examples, valid_examples = general_split(all_train_examples, test_size=test_size, train_size=1-test_size)
```
Member: A check should be added here: if the dataset already has its own validation set, there is no need to split one off from the training set.
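Together with point 3 of the review, the requested logic is: use the dataset's own validation split when it exists, otherwise hold out 10% of the training examples. The sketch below is stdlib-only and works on a plain dict of lists. It assumes the validation split is keyed `"validation"` (some datasets use other names), and `split_train_valid` / `get_train_valid` are hypothetical helpers, not the PR's `general_split`. With Hugging Face `datasets`, the same check could be written as `"validation" in dataset_dict`, and `Dataset.train_test_split(test_size=0.1, seed=...)` can perform the hold-out.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_valid(examples: Sequence[T], valid_ratio: float = 0.1,
                      seed: int = 0) -> Tuple[List[T], List[T]]:
    # Shuffle indices with a fixed seed and hold out `valid_ratio` of them.
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    n_valid = int(round(len(examples) * valid_ratio))
    valid_idx = set(indices[:n_valid])
    train = [ex for i, ex in enumerate(examples) if i not in valid_idx]
    valid = [ex for i, ex in enumerate(examples) if i in valid_idx]
    return train, valid


def get_train_valid(splits: Dict[str, List[T]], valid_ratio: float = 0.1,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    # Only split the training set when no original validation set exists.
    if "validation" in splits:
        return list(splits["train"]), list(splits["validation"])
    return split_train_valid(splits["train"], valid_ratio, seed)


if __name__ == "__main__":
    train, valid = get_train_valid({"train": list(range(100))})
    print(len(train), len(valid))
```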
