Parallelization on RTX 4060 Ti cards. #1789
AntonThai2022 asked this question in Q&A
I have four RTX 4060 Ti cards, all attached to a single PCI Express bridge. These cards are known not to support NVIDIA peer-to-peer (P2P) access. I need to run a TensorRT-LLM engine on them. Starting from a ready-made checkpoint, I built the engine with:

```
trtllm-build --checkpoint_dir /workspace/TensorRT-LLM/quantized-llama-3-70b-pp1-tp4-awq-w4a16-kvint8-gs64 --output_dir ./quantized-llama-3-70b --gemm_plugin auto
```

Then I try to run it with:

```
mpirun -n 4 --allow-run-as-root python3 ../run.py --max_output_len=40 --tokenizer_dir ./llama70b_hf/models--meta-llama--Meta-Llama-3-70B-Instruct/snapshots/7129260dd854a80eb10ace5f61c20324b472b31c/ --engine_dir quantized-llama-3-70b --input_text "In Bash, how do I list all text files?"
```

When I run the engine, I get this error:
```
[TensorRT-LLM][WARNING] Device 0 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 3 is not available.
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/TensorRT-LLM/examples/llama/../run.py", line 632, in <module>
    main(args)
  File "/workspace/TensorRT-LLM/TensorRT-LLM/examples/llama/../run.py", line 478, in main
    runner = runner_cls.from_dir(**runner_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 222, in from_dir
    executor = trtllm.Executor(engine_dir, trtllm.ModelType.DECODER_ONLY,
RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in error: peer access is not supported between these two devices
```
Clearly the cards cannot exchange data directly over the PCI Express bus.
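For reference, the interconnect topology can be confirmed with `nvidia-smi` (a minimal sketch; it assumes the NVIDIA driver is installed and just reports that the tool is missing otherwise):

```shell
#!/bin/sh
# Print the GPU interconnect topology matrix. Links reported as
# PIX/PXB/PHB go through the PCIe hierarchy, where these cards
# cannot do peer-to-peer transfers.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi topo -m
else
    echo "nvidia-smi not found"
fi
```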
How can I change the engine build or launch settings so that the GPUs communicate through host RAM instead? Or do I need to rework the model itself?
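In case it helps frame an answer: with plain NCCL, transfers can be staged through host memory by disabling the P2P transport, e.g.:

```shell
# Ask NCCL to avoid direct GPU-to-GPU (P2P) transfers and stage
# communication through host shared memory instead.
export NCCL_P2P_DISABLE=1
```

I am not sure whether TensorRT-LLM's custom all-reduce plugin respects this setting, or whether it has to be disabled at engine build time (I have seen a `--use_custom_all_reduce` flag mentioned for `trtllm-build`, but have not confirmed it avoids the peer-access path), so treat this only as the direction I am exploring.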