SIGBUS error while using multiprocessor training. #1595
Unanswered
shahzaibbaig123 asked this question in Q&A
Replies: 0 comments
I am fine-tuning a Mistral model on a server with 4 GPUs. Training works fine when I use one or two GPUs, but it fails with an error when I increase the number of processes in my training command. This is the command I am using:
accelerate launch --num_processes=3 -m axolotl.cli.train /data/axo_configs/mistral/extract/config.yml --deepspeed deepspeed_configs/zero1.json
Also, this is the command I use to start a podman container from the image:
podman run -v /data:/data --device nvidia.com/gpu=all --security-opt=label=disable --rm -it axolotl
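In case it helps with diagnosis: a SIGBUS inside a container is sometimes related to a small shared-memory mount, although I have not confirmed that this is the cause here. Below is a minimal sketch for checking how much space `/dev/shm` has from inside the container. The helper name `shm_report` is made up for illustration and is not part of axolotl, accelerate, or DeepSpeed.

```python
import os
import shutil


def shm_report(path: str = "/dev/shm") -> dict:
    """Return total and free space at `path` in MiB.

    `path` defaults to the usual Linux shared-memory mount; this is an
    assumption about the container layout, not something axolotl defines.
    """
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return {
        "path": path,
        "total_mib": usage.total / mib,
        "free_mib": usage.free / mib,
    }


if __name__ == "__main__":
    # Only report if the mount actually exists in this environment.
    if os.path.isdir("/dev/shm"):
        report = shm_report()
        print(
            f"{report['path']}: "
            f"{report['total_mib']:.0f} MiB total, "
            f"{report['free_mib']:.0f} MiB free"
        )
    else:
        print("/dev/shm not found in this environment")
```

If the reported size turns out to be very small, `podman run` accepts `--shm-size` and `--ipc=host`, which could be tried as an experiment. Whether either changes the behaviour with three processes is untested.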
This is the error: