[WIP] Refactor madx_run_clm.py #27

Draft · wants to merge 1 commit into master

haileyschoelkopf
Collaborator

Current changes: just removing some unused or commented-out code from madx_run_clm.py. There is more, but I wasn't certain why some parts were commented out.

We'll need to refactor the script as well once we add new fine-tuning strategies.

I also wonder whether it would be helpful to turn the language experiments into a single packaged script (train tokenizer + adapt model + possibly run eval?), so that it is easier to onboard others and have them run experiments.

@yongzx
Collaborator

yongzx commented Jun 28, 2022

> I also wonder whether it would be helpful to turn language experiments into a single packaged script (train tokenizer + adapt model + possibly run eval?) So that it is easier to onboard and have the others run experiments.

Maybe in the future, but at least during these sprints, let's keep them separate.

@lintangsutawika
Collaborator

We might also want to reorganize the file structure.

My thoughts would be something like:

multilingual-modeling/
- lang-adapt/
    - README.md
    - scripts/
    - finetune/
        - *.py
    - *.py
- evaluation/
    - eval_xnli/
    - eval_exp_sentence_retreival_eval/

@yongzx
Collaborator

yongzx commented Jun 29, 2022

Yeah, the structure is a mess right now. There's too much duplication (e.g., on the eval side, we actually don't need eval_xnli) left over from legacy code.

I am working on it right now.

@yongzx
Collaborator

yongzx commented Jun 29, 2022

1fb6504

multilingual-modeling/
- lang-adapt/
    - README.md
    - scripts/
    - *.py
- evaluation/
    - wikiann/  #scripts
    - xnli/  #scripts
    - eval.py
    - README.md
- exp_sentence_retreival_eval/

for now.

@lintangsutawika What do you have in mind in the finetune/ folder?

@haileyschoelkopf
Collaborator Author

This makes sense to me, but I had problems downloading XNLI when there was a folder called "xnli" in the same path. Renaming the folder to anything else (xnli_scripts, etc.) fixes the problem.

@yongzx
Collaborator

yongzx commented Jul 1, 2022

@haileyschoelkopf Fixed by a8486d4 (the folders now use scripts_* names instead).

@lintangsutawika
Collaborator

@yongzx I'm not sure. I think parameter-efficient fine-tuning should be included in lang-adapt/.
