Parallelization on RTX 4060 Ti cards. #1789
AntonThai2022 asked this question in Q&A
I have four RTX 4060 Ti cards, all attached to a single PCI Express bridge. These cards are known not to support NVIDIA peer-to-peer (P2P) access. I need to run a TensorRT-LLM engine on them. Starting from a ready-made checkpoint, I built the engine with:

```
trtllm-build --checkpoint_dir /workspace/TensorRT-LLM/quantized-llama-3-70b-pp1-tp4-awq-w4a16-kvint8-gs64 --output_dir ./quantized-llama-3-70b --gemm_plugin auto
```

Then I try to run it with:

```
mpirun -n 4 --allow-run-as-root python3 ../run.py --max_output_len=40 --tokenizer_dir ./llama70b_hf/models--meta-llama--Meta-Llama-3-70B-Instruct/snapshots/7129260dd854a80eb10ace5f61c20324b472b31c/ --engine_dir quantized-llama-3-70b --input_text "In Bash, how do I list all text files?"
```

When I run the engine, I get this error:
```
[TensorRT-LLM][WARNING] Device 0 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 3 is not available.
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/TensorRT-LLM/examples/llama/../run.py", line 632, in <module>
    main(args)
  File "/workspace/TensorRT-LLM/TensorRT-LLM/examples/llama/../run.py", line 478, in main
    runner = runner_cls.from_dir(**runner_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 222, in from_dir
    executor = trtllm.Executor(engine_dir, trtllm.ModelType.DECODER_ONLY,
RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in error: peer access is not supported between these two devices
```
Clearly the cards cannot exchange data directly over the PCI Express bus.
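For reference, the interconnect topology can be confirmed with `nvidia-smi` (a minimal sketch; it assumes the NVIDIA driver is installed and just reports that the tool is missing otherwise):

```shell
#!/bin/sh
# Print the GPU interconnect topology matrix. Links reported as
# PIX/PXB/PHB go through the PCIe hierarchy, where these cards
# cannot do peer-to-peer transfers.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi topo -m
else
    echo "nvidia-smi not found"
fi
```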
How can I change the engine build or launch settings so that the GPUs communicate through host RAM instead? Or do I need to rework the model itself?
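In case it helps frame an answer: with plain NCCL, transfers can be staged through host memory by disabling the P2P transport, e.g.:

```shell
# Ask NCCL to avoid direct GPU-to-GPU (P2P) transfers and stage
# communication through host shared memory instead.
export NCCL_P2P_DISABLE=1
```

I am not sure whether TensorRT-LLM's custom all-reduce plugin respects this setting, or whether it has to be disabled at engine build time (I have seen a `--use_custom_all_reduce` flag mentioned for `trtllm-build`, but have not confirmed it avoids the peer-access path), so treat this only as the direction I am exploring.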