Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=1, NumelOut=1, Timeout(ms)=120000) ran for 120010 milliseconds before timing out.
This usually happens when model loading takes a very long time (e.g. when loading from very slow storage). Initially I thought it might be caused by EETQ quantization, but that seems pretty fast on all of the 70B's matrices (it even survives a 10 second NCCL timeout on an L4).
Thanks for the help. I was able to fix this by reinstalling and updating CUDA. Now it works well.
System Info
text-generation-inference docker: sha-5e0fb46 (latest)
OS: Ubuntu 22.04
Model: meta-llama/Llama-3.1-70B-Instruct
GPU Used: 4
nvidia-smi
Information
Tasks
Reproduction
Expected behavior
Expecting TGI to be able to run distributed inference over 4xA10 GPUs.
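For context, a typical sharded TGI launch over 4 GPUs looks like the following; the image tag matches the one reported above, but the volume path and port are placeholder assumptions:

```shell
# Launch TGI sharded across 4 GPUs (paths/ports are illustrative)
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:sha-5e0fb46 \
  --model-id meta-llama/Llama-3.1-70B-Instruct \
  --num-shard 4
```

`--num-shard` is the launcher option that enables tensor-parallel inference across the visible GPUs.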