Issues: huggingface/accelerate
#3239 · OOM error when training llama 7B model using Accelerate FSDP setting (opened Nov 14, 2024 by JlPang863)
#3237 · slurmstepd: error: execve(): accelerate: No such file or directory (opened Nov 13, 2024 by huiyang865)
#3233 · Code Logical Bug: Using Init Handler Kwargs for Grad Scaler In FP8 Training (accelerate/accelerator.py) (opened Nov 11, 2024 by immortalCO)
#3232 · fsdp checkpoint saving leads to NCCL WARN Cuda failure 2 'out of memory' (opened Nov 10, 2024 by edchengg)
#3230 · Error while fine-tuning with peft, lora, accelerate, SFTConfig and SFTTrainer (opened Nov 8, 2024 by Isdriai)
#3225 · torch.cuda.is_available() false when running multi-gpu inference with accelerate launch (opened Nov 6, 2024 by paulgekeler)
#3224 · "mat2 must be a matrix" error when finetuning Dreambooth flux with FSDP (opened Nov 5, 2024 by weixiong-ur)
#3218 · Incorrect type in output of utils.pad_across_processes when input is torch.bool (opened Nov 4, 2024 by mariusarvinte)
#3216 · PyPI published Accelerate==1.1.0 is missing Source Distributions (opened Nov 4, 2024 by helloworld1)
#3214 · ConnectionError: Tried to launch distributed communication on port 29401, but another process is utilizing it. Please specify a different port (such as using the --main_process_port flag or specifying a different main_process_port in your config file) and rerun your script. To automatically use the next open port (on a single node), you can set this to 0. (opened Nov 4, 2024 by qinchangchang)
#3210 · How could I convert ZeRO-0 deepspeed weights into an fp32 model checkpoint? (opened Nov 1, 2024 by liming-ai)
#3209 · The optimizer is not receiving the FSDP model parameters. (opened Nov 1, 2024 by eljandoubi)
#3203 · Command line arguments related to deepspeed for accelerate launch do not override those of default_config.yaml (opened Oct 29, 2024 by JdbermeoUZH)
#3184 · Unable to access model gradients with DeepSpeed and Accelerate (opened Oct 22, 2024 by shouyezhe)
#3182 · accelerator.prepare() gets OOM, but works on a single GPU (opened Oct 21, 2024 by lqf0624)
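One recurring theme in this list (#3214 in particular) is the rendezvous port that accelerate uses for distributed communication. The error text quoted in #3214 says that setting main_process_port to 0 makes accelerate take the next open port on a single node. The sketch below illustrates the underlying OS mechanism, binding TCP port 0 so the kernel assigns a free ephemeral port; find_free_port is a hypothetical helper for illustration, not accelerate's own code.

```python
import socket

def find_free_port() -> int:
    """Ask the OS for a currently free TCP port by binding port 0.

    This is the same mechanism a launcher can use when the configured
    main process port (e.g. 29401 in issue #3214) is already taken.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 => kernel picks a free port
        return s.getsockname()[1]

if __name__ == "__main__":
    print(find_free_port())  # an OS-assigned ephemeral port
```

In practice you would pass the chosen port (or 0 directly) to accelerate launch via the --main_process_port flag mentioned in the error message, or set main_process_port in your config file.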