Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout for Gloo adjustable by env vars #43

Merged
merged 1 commit into from
Nov 5, 2024

Conversation

HRezaei
Copy link
Contributor

@HRezaei HRezaei commented Sep 1, 2024

This PR adds more flexibility by allowing the developer to configure timeout for Gloo using a new env variable "GLOO_TIMEOUT".

The default value is 1800 seconds and thus we have to wait 30 minutes to get error if other required processes are not running. In experimental deployments we'd rather to set this to a smaller value to avoid 30 minutes waiting and see the timeout error sooner.

On the other hand, we need to set this to a larger value for debugging, otherwise the execution is halted and the timeout error occurs if the line by line debugging takes more than 30 minutes.

@ClementRomac
Copy link
Collaborator

Hi, thanks for your PR!
You are right, this 30 minutes timeout can be painful.
Would you mind adding the timeout variable as a parameter of lamorel's config instead of an environment variable? It would allow users to set this as any other parameters in lamorel (e.g. number of LLMs).

@HRezaei
Copy link
Contributor Author

HRezaei commented Sep 20, 2024

You're welcome!

Would you mind adding the timeout variable as a parameter of lamorel's config instead of an environment variable?

Yes, sure, is it OK to be directly under the lamorel_args? I mean something like this:

lamorel_args:
  log_level: info
  gloo_timeout: 1800
...

The other options are to nest it under either lamorel_args.accelerate_args or lamorel_args.distributed_setup_args even though I'm not sure if it fits there.

@ClementRomac
Copy link
Collaborator

Hey, sorry for the late answer.
Putting it directly under lamorel_args seems good to me. I will add it and merge your PR.

Copy link
Collaborator

@ClementRomac ClementRomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks for the contribution!

@ClementRomac ClementRomac merged commit 4318812 into flowersteam:main Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants