
A syntax error in __call_model #19

Open
Clayfigure opened this issue Sep 27, 2023 · 3 comments

Comments

@Clayfigure

In the __call_model function in lamorel/caller.py, you set object_gather_list=None. However, this is not allowed in torch.distributed.
[Screenshot: the gather_object call with object_gather_list=None in lamorel/caller.py]
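To make the report concrete, the call in question presumably looks something like the following (a reconstruction from the screenshot and the discussion below, not a copy of lamorel's source; the helper name and rank argument are made up):

```python
import torch.distributed as dist

def _send_to_llm_master(result, llm_master_rank):
    # Hypothetical helper mirroring the reported call: the sending
    # process passes object_gather_list=None to gather_object and
    # forwards its result to the LLM master rank.
    dist.gather_object(result, object_gather_list=None, dst=llm_master_rank)
```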

@ClementRomac
Collaborator

Hi,

Did you get an error? What is the version of Pytorch you are using?

@Clayfigure
Author

I am using 1.9.0+cu111, and the error message says: argument "gather_list" must be specified on destination rank.
Also, I am confused: what is the point of gathering all the information into None in the LLM master process?
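For reference, that particular message is raised when the calling process is itself the destination rank but leaves object_gather_list as None. A minimal single-process sketch that reproduces it (assuming the gloo backend is available; the address and port are arbitrary):

```python
import torch.distributed as dist

# One-process group, just enough to reach gather_object's validation.
dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:29500",
    rank=0,
    world_size=1,
)

# Rank 0 is also the destination here, so passing object_gather_list=None
# raises: ValueError: Argument ``gather_list`` must be specified on
# destination rank.
dist.gather_object({"output": 42}, object_gather_list=None, dst=0)
```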

@ClementRomac
Collaborator

As per PyTorch 1.9.0's documentation (https://pytorch.org/docs/1.9.0/distributed.html), the torch.distributed.gather_object method still takes an object_gather_list argument, so I don't understand why you get this error.

Concerning the None: the object_gather_list argument specifies the variable into which the destination process gathers every obj passed by the other processes. A process that only sends an obj therefore has no need to specify an object_gather_list. Conversely, the destination process (here self._llm_master_process) does not specify any obj but does provide an object_gather_list, as it is receiving objects rather than sending one. You can find the destination process' code here.
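In other words, the two sides of the collective look roughly like this (an illustrative sketch with made-up names, assuming an already-initialized process group; not lamorel's actual code):

```python
import torch.distributed as dist

def gather_on_master(obj, llm_master_rank=0):
    """Illustrative: send obj to the master rank, which collects all of them."""
    if dist.get_rank() == llm_master_rank:
        # Destination side: pre-allocate one slot per rank, send no
        # payload of its own (obj=None), and receive everything here.
        gathered = [None] * dist.get_world_size()
        dist.gather_object(None, object_gather_list=gathered, dst=llm_master_rank)
        return gathered
    # Sender side: contribute obj only; object_gather_list stays None
    # because non-destination ranks receive nothing back.
    dist.gather_object(obj, object_gather_list=None, dst=llm_master_rank)
    return None
```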
