resolve KeyError: 'PDSH_SSH_ARGS_APPEND' (#5318)
When starting a job with `deepspeed --hostfile hostfile --master_addr $MASTER_IP --ssh_port 20023 src/train_bash.py`, the launcher fails with KeyError: 'PDSH_SSH_ARGS_APPEND' at
https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/launcher/multinode_runner.py#L77

This happens because PDSH_SSH_ARGS_APPEND is not set in the environment, and the `+=` on that line reads the existing value before assigning the new one.
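
The failure described above can be reproduced outside the launcher. The following is a minimal sketch, assuming `environment` behaves like a plain dict; the variable names `raised` and `missing_key` are introduced here for illustration and do not appear in DeepSpeed.

```python
# Minimal reproduction; `environment` is a plain dict standing in for the
# launcher's environment mapping (an assumption for illustration).
environment = {"PDSH_RCMD_TYPE": "ssh"}  # PDSH_SSH_ARGS_APPEND is not set
ssh_port = 20023

try:
    # `+=` on a dict entry reads the old value first, so a missing key raises.
    environment["PDSH_SSH_ARGS_APPEND"] += f" -p {ssh_port}"
    raised = False
except KeyError as err:
    raised = True
    missing_key = err.args[0]

# Reading with a default avoids the failed lookup.
existing = environment.get("PDSH_SSH_ARGS_APPEND", "")
environment["PDSH_SSH_ARGS_APPEND"] = f"{existing} -p {ssh_port}"
```

When the variable is already set in the environment, both forms behave the same: the port option is appended to the existing value.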

---------

Co-authored-by: Logan Adams <[email protected]>
Lzhang-hub and loadams authored Apr 1, 2024
1 parent b5e2045 commit cc897ec
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion deepspeed/launcher/multinode_runner.py
@@ -74,7 +74,8 @@ def name(self):
     def get_cmd(self, environment, active_resources):
         environment['PDSH_RCMD_TYPE'] = 'ssh'
         if self.args.ssh_port is not None: # only specify ssh port if it is specified
-            environment["PDSH_SSH_ARGS_APPEND"] += f" -p {self.args.ssh_port}"
+            environment["PDSH_SSH_ARGS_APPEND"] = f"{environment.get('PDSH_SSH_ARGS_APPEND', '')} \
+            -p {self.args.ssh_port}"

         active_workers = ",".join(active_resources.keys())
         logger.info("Running on the following workers: %s" % active_workers)
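
One detail of the change: a backslash continuation inside a string literal drops the newline, but any indentation on the continued line stays in the resulting value. The sketch below shows one way to build the same value without stray whitespace. `append_ssh_port` is a hypothetical helper written for illustration, not part of DeepSpeed, and it assumes `environment` is a dict-like mapping.

```python
def append_ssh_port(environment, ssh_port):
    """Hypothetical helper, not part of DeepSpeed: append `-p <port>` to
    PDSH_SSH_ARGS_APPEND, keeping any arguments that are already set."""
    existing = environment.get("PDSH_SSH_ARGS_APPEND", "").strip()
    port_arg = f"-p {ssh_port}"
    # strip() removes the leading space left over when `existing` is empty.
    environment["PDSH_SSH_ARGS_APPEND"] = f"{existing} {port_arg}".strip()
    return environment


# Key absent: the value is just the port argument.
print(append_ssh_port({}, 20023))
# Key present: the port argument is appended after the existing options.
print(append_ssh_port({"PDSH_SSH_ARGS_APPEND": "-o StrictHostKeyChecking=no"}, 20023))
```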