Cannot use msccl-tools' xml #37

Open
Eevan-zq opened this issue Sep 9, 2024 · 2 comments

Comments

Eevan-zq commented Sep 9, 2024

Why wasn't the algorithm from the XML I generated with msccl-tools invoked when I ran this command:

mpirun --allow-run-as-root -np 8 -x LD_LIBRARY_PATH=/home/msccl-tool/msccl/executor/msccl-executor-nccl/build/lib/:$LD_LIBRARY_PATH -x NCCL_DEBUG=INFO -x NCCL_DEBUG_SUBSYS=ALL /home/msccl-tool/msccl/tests/msccl-tests-nccl/build/all_reduce_perf -b 1 -e 32MB -f 2 -g 1 -n 100 -w 20 -z 0

I checked the code here:
[image]
I found that status.algoMetas.size() = 0, so I traced further to here:
[image]

I found that none of the .xml files generated by msccl-tools contain minBytes. Is this the reason why the algorithm in the XML wasn't scheduled when I ran the mpirun command? If so, what should I do?
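
As a quick way to confirm which generated files lack the attribute, here is a minimal sketch that reads minBytes and maxBytes from the root element of an XML document. It assumes the two attributes live on the root element (the "XML header" mentioned in this thread); the sample headers and the helper name read_size_range are hypothetical and only for illustration, not taken from msccl-tools.

```python
import xml.etree.ElementTree as ET


def read_size_range(xml_text):
    """Return (minBytes, maxBytes) from the root element.

    A missing attribute is reported as None rather than guessed.
    """
    root = ET.fromstring(xml_text)

    def as_int(name):
        value = root.get(name)
        return int(value) if value is not None else None

    return as_int("minBytes"), as_int("maxBytes")


# Hypothetical headers for illustration; real generated files have more attributes.
old_style = '<algo name="demo" proto="LL"></algo>'
new_style = '<algo name="demo" proto="LL" minBytes="0" maxBytes="0"></algo>'

print(read_size_range(old_style))
print(read_size_range(new_style))
```

To check a real file, pass its contents instead, for example read_size_range(open("test.xml").read()).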

@jiangxiaobin96

The new version of msccl-tools fixes this error.

@Eevan-zq
Author

By the way:
1: When I run this command:
[image]

the XML header is:
[image]
Why are minBytes and maxBytes both equal to 0? Will this have any impact?

2: The following appears at the end of the XML file:
[image]
This may be due to an error in the final Check validation in allreduce_a100_pcie_hierarchical.py:
[image]

I am currently unsure whether the XML file generated by running python ./allreduce_a100_pcie_hierarchical.py --protocol=LL 8 1 > test.xml is correct.
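
One basic sanity check on a redirected file like test.xml is whether it still parses as a single well-formed XML document. The sketch below assumes the extra content at the end of the file is plain text written to stdout after the closing tag; that is an assumption, since the actual trailing output is only shown in an image. The sample strings and the helper name is_well_formed are hypothetical. A passing check only means the file is well-formed, not that the algorithm in it is correct.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as one XML document, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Text after the closing root tag is reported as a parse error.
        return False


# Hypothetical file contents for illustration only.
clean = '<algo minBytes="0" maxBytes="0"></algo>\n'
polluted = clean + "some message printed after the XML\n"

print(is_well_formed(clean))
print(is_well_formed(polluted))
```

For a real file, call is_well_formed(open("test.xml").read()).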
