
Unable to run RAG model for tokenizing, fine-tuning, and predicting with jiant on branch IRT_experiments #1331

Open
pk1130 opened this issue Aug 2, 2021 · 2 comments


pk1130 commented Aug 2, 2021

Describe the bug

Hey @sleepinyourhat @zphang @jeswan @HaokunLiu! I noticed that you worked on adding new models to `JiantTransformersModel`, so I tagged you here :) I was trying to fine-tune a RAG model on the MrQA-NQ dataset using jiant, but this does not seem to be supported. It throws `KeyError: 'rag'` when I run the following command:

```bash
source jiant/irt_scripts/run_train_task.sh
run_train_task facebook/rag-token-base mrqa_natural_questions 1
```

where run_train_task.sh is a shell script that I wrote to run exactly the same commands as run_train_task.sbatch without using sbatch.

To Reproduce

  1. Tell us which version of jiant you're using - I've git cloned the repo as-is, but since I'm running an IRT experiment, I'm using the `irt_scripts/` directory in tandem with the `jiant/` directory on the branch `IRT_experiments`.
  2. Describe the environment where you're using jiant, e.g., "2 P40 GPUs" - I'm using jiant in Google Colab along with Google Drive.

Expected behavior
I expected the RAG model to start fine-tuning and generate the cache files in the `experiments/cache/` directory as outlined in the README.

Screenshots
(screenshot of the error traceback omitted)

Additional context
On investigating further, I realized that `IRT_experiments` was still using `transformers==3.1.0`, which does not support RAG architectures. I uninstalled that version of transformers and tried upgrading to `transformers>=3.5.0` to see if that would fix the issue, but that resulted in a new error: `ModuleNotFoundError: No module named 'transformers.tokenization_bert'`. It looks like transformers refactored their module layout in later versions while incorporating different models. If the `IRT_experiments` branch were up to date with the changes on the master branch, would that fix things? I noticed that jiant on master uses `transformers==4.5.0`. Is there any other way I can use a RAG model along with the scripts in `irt_scripts/` for my IRT research? Specifically, I need the fine-tuning, predicting, and post-processing scripts that are available in the `irt_scripts/` directory on the `IRT_experiments` branch. Please respond at your earliest convenience. Thanks!
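For what it's worth, the `ModuleNotFoundError` is consistent with the v4 reorganization, where per-model tokenizer modules moved under `transformers.models.*`. Below is a small illustrative helper (not part of jiant or transformers) that picks the BERT tokenizer import path based on the installed version string, assuming the move happened at 4.0:

```python
# Illustrative sketch only: maps a transformers version string to the
# import path of the BERT tokenizer module. Assumes the module
# reorganization (transformers.tokenization_bert ->
# transformers.models.bert.tokenization_bert) landed in v4.0.

def bert_tokenizer_module(transformers_version: str) -> str:
    """Return the BERT tokenizer module path for a given version string."""
    major = int(transformers_version.split(".")[0])
    if major >= 4:
        return "transformers.models.bert.tokenization_bert"
    return "transformers.tokenization_bert"

print(bert_tokenizer_module("3.1.0"))  # transformers.tokenization_bert
print(bert_tokenizer_module("4.5.0"))  # transformers.models.bert.tokenization_bert
```

So any code on `IRT_experiments` that imports the old flat paths would need updating before it can run against `transformers==4.5.0`.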


pk1130 commented Aug 2, 2021

Even further investigation has led me to understand that in order to carry out my research with the existing `irt_scripts/` framework and the latest version of jiant, I would have to port that directory to the master branch on my local machine and add the RAG model to `ModelArchitectures` and `TOKENIZER_DICT` in `jiant/proj/main/modeling/primary.py`, as outlined here:
https://github.com/nyu-mll/jiant/blob/51e9be2a8ed8589e884ea927e348df8342c40fcf/guides/models/adding_models.md
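As I understand the guide, the registration step roughly amounts to adding an enum member and a tokenizer mapping. The snippet below is a simplified stand-in mirroring that pattern, NOT the actual jiant source; the definitions and tokenizer class names are assumptions for illustration:

```python
# Simplified stand-in for the registration pattern in
# jiant/proj/main/modeling/primary.py (names follow the guide, but the
# bodies here are illustrative, not jiant's real definitions).
from enum import Enum

class ModelArchitectures(Enum):
    BERT = "bert"
    ROBERTA = "roberta"
    RAG = "rag"  # the new entry a RAG port would add

# Maps each architecture to the tokenizer class it should load
# (class names are assumptions for illustration).
TOKENIZER_DICT = {
    ModelArchitectures.BERT: "BertTokenizer",
    ModelArchitectures.ROBERTA: "RobertaTokenizer",
    ModelArchitectures.RAG: "RagTokenizer",
}

print(TOKENIZER_DICT[ModelArchitectures.RAG])  # RagTokenizer
```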

I'm having trouble understanding how to implement the `normalize_tokenizations()`, `get_mlm_weights_dict()`, and `get_feat_spec()` functions in the subclass created for the RAG model. Any suggestions or advice on how to move forward @sleepinyourhat @zphang @jeswan @HaokunLiu? Thanks a lot!

@pk1130 pk1130 changed the title Unable to run RAG model for fine-tuning and predicting with jiant Unable to run RAG model for tokenizing, fine-tuning, and predicting with jiant on branch IRT_experiments Aug 2, 2021

zphang commented Aug 24, 2021

Hi, sorry for the delay in my response.

  • normalize_tokenizations has to do with aligning token spans between raw text and the model tokenizer's tokens. Depending on which tokenizer you're using, you might be able to piggyback off an existing implementation.
  • get_mlm_weights_dict gets the weights for the MLM head from the pretrained model. In contrast to standard NLU tasks, which use a new classifier head, an MLM task ought to reuse the MLM head from pretraining. So if you are not using an MLM task, this should not impact you.
  • get_feat_spec is a somewhat older abstraction for describing different tokenizer setups, e.g. padding IDs. Like with normalize_tokenizations, you might be able to piggyback off an existing implementation if you are using a similar tokenizer.
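To make the third point concrete, here is a minimal sketch of the kind of information a feat-spec-like object might carry: tokenizer-specific constants such as the padding id and special-token layout. The field names and values below are assumptions for illustration, not jiant's actual `FeatSpec`:

```python
# Minimal sketch of the sort of data get_feat_spec() describes.
# Field names/values are illustrative assumptions, not jiant's FeatSpec.
from dataclasses import dataclass

@dataclass
class FeatSpecSketch:
    max_seq_length: int
    pad_token_id: int
    cls_token_at_start: bool       # e.g. BERT-style [CLS] first
    sep_token_between_segments: bool

# A RAG model whose question encoder is BERT-like could plausibly
# piggyback off BERT-like settings, per the advice above:
rag_like_spec = FeatSpecSketch(
    max_seq_length=512,
    pad_token_id=0,
    cls_token_at_start=True,
    sep_token_between_segments=True,
)
print(rag_like_spec.pad_token_id)  # 0
```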
