Dear @flowersteam, I am trying to reproduce your results for coursework, and I have run into a number of issues with the code. Here is a list of what I have found so far.
Importing
Several files within the `experiments` folder, e.g. `experiments/agents/ppo/base_ppo_agent.py`, use imports of the form `from experiments.agents.etc import etc`. These imports fail because, as per my tests, the files themselves already live inside the `experiments` folder. Simply dropping the `experiments` prefix from the import path fixes these issues; see the sketch below.
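For illustration, a minimal before/after (I am assuming `base_ppo_agent.py` defines `BasePPOAgent`, as its name and the superclass mentioned below suggest; any import carrying the `experiments.` prefix needs the same one-line change):

```python
# Before: fails, because the script already runs from inside experiments/
# from experiments.agents.ppo.base_ppo_agent import BasePPOAgent

# After: resolves correctly relative to the experiments/ folder
from agents.ppo.base_ppo_agent import BasePPOAgent
```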
Class Naming
Some classes fail to import because, I believe, they have been renamed over time. This is the case for, e.g., `ValueModuleFn` (imported in `post-training_tests.py`), which I assume is now called `ValueHeadModuleFn`.
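The corresponding one-line fix in `post-training_tests.py` (the module path here is an assumption for illustration; only the class name changes):

```python
# Old class name, no longer present in the codebase:
# from train_language_agent import ValueModuleFn

# Renamed class (module path assumed unchanged):
from train_language_agent import ValueHeadModuleFn
```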
Training Using Lamorel (on one GPU)
After fixing these minor issues, I managed to launch training with the default single-GPU config, using the Flan-T5 small LLM and running Lamorel as two separate server-client instances as described in #11, but obtained odd results: the reward stays flat at 0.0 for all episodes. I attach my logs std.txt and final results log.csv for clarity.
Testing Using Lamorel (on one GPU)
I could not manage to run the `post-training_tests.py` script the same way as the training script, as it has several issues of its own. Using once again the standard Lamorel config for one GPU, two server-client instances, and the previously trained model, errors occur when initializing the `LLMPPOAgent` named `algo` at line 390. Once again, I believe the initialization signature has changed over time, as several parameters that require specification are missing. For instance, `num_frames_per_proc` defaults to `None`, but keeping that default raises an error when `self.obss` is initialized in the `BasePPOAgent` superclass. The required logging information (`saving_path_logs`, `saving_path_model`, `id_expe`) is not passed either. Also, some optional parameters (`reshape_reward`, `subgoals`) are not passed by keyword (e.g., `LLMPPOAgent(..., reshape_reward=reshape_reward)`), so their values end up bound to the wrong parameters when the object is constructed. Again, these were easy fixes; a sketch of the corrected call follows this paragraph. However, I then got stuck on an error occurring at line 310, whose full log is reported below the sketch.
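For concreteness, this is roughly the call I ended up with (every value here is a placeholder from my run, not a repository default; only the parameters discussed above are spelled out, and the remaining arguments stay as in the original script):

```python
# Sketch of the corrected LLMPPOAgent initialization (post-training_tests.py, line 390).
algo = LLMPPOAgent(
    envs=envs,                          # environments, as in the original call
    lm_server=lm_server,
    num_frames_per_proc=40,             # placeholder; must be set explicitly, since the
                                        # default None breaks self.obss in BasePPOAgent
    reshape_reward=reshape_reward,      # passed by keyword so it binds to the right parameter
    subgoals=subgoals,                  # likewise
    saving_path_logs=saving_path_logs,  # required logging information
    saving_path_model=saving_path_model,
    id_expe=id_expe,
)
```

The full log from that run: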
/Absolute/Path/To/Grounding_LLMs_with_online_RL/lamorel/lamorel/src/lamorel_launcher/launch.py:15: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path='', config_name='')
/Absolute/Path/To/miniconda3/envs/dlp/lib/python3.10/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
[DATE TIME][root][INFO] - Using nproc_per_node=1.
/Absolute/Path/To/Grounding_LLMs_with_online_RL/experiments/train_language_agent.py:373: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path='config', config_name='config')
/Absolute/Path/To/Grounding_LLMs_with_online_RL/experiments/post-training_tests.py:281: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path='config', config_name='config')
/Absolute/Path/To/miniconda3/envs/dlp/lib/python3.10/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
[DATE TIME][lamorel_logger][INFO] - Init rl group for process 1
[DATE TIME][lamorel_logger][INFO] - Init llm group for process 1
[DATE TIME][lamorel_logger][INFO] - Init rl-llm group for process 1
[DATE TIME][lamorel_logger][INFO] - Using CPU on process 1 (index 0)
Parallelising HF LLM on 1 devices
Loading model t5-small
/Absolute/Path/To/miniconda3/envs/dlp/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Error executing job with overrides: ['rl_script_args.path=/Absolute/Path/To/Grounding_LLMs_with_online_RL/experiments/post-training_tests.py', 'lamorel_args.accelerate_args.machine_rank=1']
Traceback (most recent call last):
File "/Absolute/Path/To/Grounding_LLMs_with_online_RL/experiments/post-training_tests.py", line 310, in main
lm_server = Caller(config_args.lamorel_args, custom_updater=LoadSpecificWeightsUpdater,
File "/Absolute/Path/To/Grounding_LLMs_with_online_RL/lamorel/lamorel/src/lamorel/caller.py", line 53, in __init__
Server(
File "/Absolute/Path/To/Grounding_LLMs_with_online_RL/lamorel/lamorel/src/lamorel/server/server.py", line 60, in __init__
self._updater.set_llm_module(
TypeError: BaseUpdater.set_llm_module() missing 1 required positional argument: 'llm_module'
Since running through the Lamorel launcher is the advised procedure, I find it difficult to investigate this error with my usual debugger-based workflow.
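If I read the traceback correctly, `set_llm_module` is invoked on the updater class rather than on an instance, so `llm_module` gets consumed as `self` and Python reports it as missing. My best guess, assuming newer Lamorel versions expect an updater instance where the script still passes the class, is a change like this (untested):

```python
# Guess at a fix: pass an *instance* of the custom updater rather than the class
# itself, so that set_llm_module(llm_module) becomes a bound method call.
lm_server = Caller(
    config_args.lamorel_args,
    custom_updater=LoadSpecificWeightsUpdater(),  # was: custom_updater=LoadSpecificWeightsUpdater
    # ...remaining keyword arguments unchanged from the original call
)
```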
If you can reproduce the issue and help me solve it, I would be very thankful. I find your paper very insightful.
Thank you for your time!
Thanks for reaching out and spotting all these issues!
I opened a PR.
Importing
It should be fixed now.
Class Naming
It should also be fixed.
Training Using Lamorel (on one GPU)
I fixed lamorel's initialization in train_language_agent.py so that the client and server can be launched together (see here).
Regarding your results, you seem to have run only 1k steps. I am not surprised the agent has not solved a single episode: in the paper (Figure 14), Flan-T5 small reached a 0.2 success rate only after approximately 10k steps. Also, we used 32 environments in our experiments, leading to much more diversity in each batch.