wip: test transport layer improvements #395

Closed
wants to merge 3 commits into from


csegarragonz
Collaborator

This PR combines:

We will merge them separately, but ideally both PRs above should be green first, so that we can then measure the combined benefit (hopefully we are on par with #382).

csegarragonz changed the title from "Transport improvements" to "wip: test transport layer improvements" on Mar 13, 2024
@lgarithm
Contributor

lgarithm commented Mar 13, 2024

5b7667e

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0130s, total workload: 384000B, rate: 0.028GiB/s
bench_allreduce(np=4) took 0.0124s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0123s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0123s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0122s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0124s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0120s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0116s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0116s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0116s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.2637s, total workload: 1.144GiB, rate: 4.337GiB/s
bench_allreduce(np=4) took 0.2543s, total workload: 1.144GiB, rate: 4.497GiB/s
bench_allreduce(np=4) took 0.2547s, total workload: 1.144GiB, rate: 4.491GiB/s
bench_allreduce(np=4) took 0.2571s, total workload: 1.144GiB, rate: 4.448GiB/s
bench_allreduce(np=4) took 0.2604s, total workload: 1.144GiB, rate: 4.391GiB/s
bench_allreduce(np=4) took 0.2504s, total workload: 1.144GiB, rate: 4.567GiB/s
bench_allreduce(np=4) took 0.2560s, total workload: 1.144GiB, rate: 4.468GiB/s
bench_allreduce(np=4) took 0.2545s, total workload: 1.144GiB, rate: 4.494GiB/s
bench_allreduce(np=4) took 0.2544s, total workload: 1.144GiB, rate: 4.495GiB/s
bench_allreduce(np=4) took 0.2533s, total workload: 1.144GiB, rate: 4.515GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.3831s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3537s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3587s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3512s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3410s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3503s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3591s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3488s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3470s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3470s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.0902s, total workload: 1.144GiB, rate: 1.049GiB/s
bench_allreduce(np=4) took 1.0794s, total workload: 1.144GiB, rate: 1.060GiB/s
bench_allreduce(np=4) took 0.9957s, total workload: 1.144GiB, rate: 1.149GiB/s
bench_allreduce(np=4) took 0.8494s, total workload: 1.144GiB, rate: 1.346GiB/s
bench_allreduce(np=4) took 0.7802s, total workload: 1.144GiB, rate: 1.466GiB/s
bench_allreduce(np=4) took 0.7733s, total workload: 1.144GiB, rate: 1.479GiB/s
bench_allreduce(np=4) took 0.7723s, total workload: 1.144GiB, rate: 1.481GiB/s
bench_allreduce(np=4) took 0.7585s, total workload: 1.144GiB, rate: 1.508GiB/s
bench_allreduce(np=4) took 0.7731s, total workload: 1.144GiB, rate: 1.479GiB/s
bench_allreduce(np=4) took 0.7653s, total workload: 1.144GiB, rate: 1.494GiB/s
END ======================================== bench_allreduce remote ========================================

ba0d691

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0124s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0120s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0120s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0119s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0119s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0115s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0114s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0114s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0114s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0115s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.2637s, total workload: 1.144GiB, rate: 4.336GiB/s
bench_allreduce(np=4) took 0.2531s, total workload: 1.144GiB, rate: 4.519GiB/s
bench_allreduce(np=4) took 0.2512s, total workload: 1.144GiB, rate: 4.553GiB/s
bench_allreduce(np=4) took 0.2541s, total workload: 1.144GiB, rate: 4.500GiB/s
bench_allreduce(np=4) took 0.2541s, total workload: 1.144GiB, rate: 4.501GiB/s
bench_allreduce(np=4) took 0.2536s, total workload: 1.144GiB, rate: 4.510GiB/s
bench_allreduce(np=4) took 0.2544s, total workload: 1.144GiB, rate: 4.495GiB/s
bench_allreduce(np=4) took 0.2535s, total workload: 1.144GiB, rate: 4.512GiB/s
bench_allreduce(np=4) took 0.2543s, total workload: 1.144GiB, rate: 4.497GiB/s
bench_allreduce(np=4) took 0.2528s, total workload: 1.144GiB, rate: 4.524GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.3830s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3861s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3815s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3696s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3268s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3261s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3305s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3322s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3092s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3340s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.1843s, total workload: 1.144GiB, rate: 0.966GiB/s
bench_allreduce(np=4) took 1.0431s, total workload: 1.144GiB, rate: 1.096GiB/s
bench_allreduce(np=4) took 1.0408s, total workload: 1.144GiB, rate: 1.099GiB/s
bench_allreduce(np=4) took 0.9793s, total workload: 1.144GiB, rate: 1.168GiB/s
bench_allreduce(np=4) took 0.9878s, total workload: 1.144GiB, rate: 1.158GiB/s
bench_allreduce(np=4) took 1.0487s, total workload: 1.144GiB, rate: 1.091GiB/s
bench_allreduce(np=4) took 0.9917s, total workload: 1.144GiB, rate: 1.153GiB/s
bench_allreduce(np=4) took 1.0071s, total workload: 1.144GiB, rate: 1.136GiB/s
bench_allreduce(np=4) took 0.9798s, total workload: 1.144GiB, rate: 1.167GiB/s
bench_allreduce(np=4) took 1.0379s, total workload: 1.144GiB, rate: 1.102GiB/s
END ======================================== bench_allreduce remote ========================================

71d3f79

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0038s, total workload: 384000B, rate: 0.093GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.098GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0035s, total workload: 384000B, rate: 0.103GiB/s
bench_allreduce(np=4) took 0.0033s, total workload: 384000B, rate: 0.108GiB/s
bench_allreduce(np=4) took 0.0032s, total workload: 384000B, rate: 0.110GiB/s
bench_allreduce(np=4) took 0.2252s, total workload: 1.144GiB, rate: 5.079GiB/s
bench_allreduce(np=4) took 0.2040s, total workload: 1.144GiB, rate: 5.606GiB/s
bench_allreduce(np=4) took 0.2034s, total workload: 1.144GiB, rate: 5.622GiB/s
bench_allreduce(np=4) took 0.2023s, total workload: 1.144GiB, rate: 5.652GiB/s
bench_allreduce(np=4) took 0.2098s, total workload: 1.144GiB, rate: 5.450GiB/s
bench_allreduce(np=4) took 0.2046s, total workload: 1.144GiB, rate: 5.590GiB/s
bench_allreduce(np=4) took 0.2052s, total workload: 1.144GiB, rate: 5.573GiB/s
bench_allreduce(np=4) took 0.2044s, total workload: 1.144GiB, rate: 5.596GiB/s
bench_allreduce(np=4) took 0.2060s, total workload: 1.144GiB, rate: 5.551GiB/s
bench_allreduce(np=4) took 0.2059s, total workload: 1.144GiB, rate: 5.553GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.3308s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3140s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3002s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3112s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3261s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3326s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3107s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3072s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.2969s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.2862s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.0909s, total workload: 1.144GiB, rate: 1.048GiB/s
bench_allreduce(np=4) took 0.9866s, total workload: 1.144GiB, rate: 1.159GiB/s
bench_allreduce(np=4) took 0.9789s, total workload: 1.144GiB, rate: 1.168GiB/s
bench_allreduce(np=4) took 0.9276s, total workload: 1.144GiB, rate: 1.233GiB/s
bench_allreduce(np=4) took 0.9301s, total workload: 1.144GiB, rate: 1.230GiB/s
bench_allreduce(np=4) took 0.9553s, total workload: 1.144GiB, rate: 1.197GiB/s
bench_allreduce(np=4) took 0.9490s, total workload: 1.144GiB, rate: 1.205GiB/s
bench_allreduce(np=4) took 0.9726s, total workload: 1.144GiB, rate: 1.176GiB/s
bench_allreduce(np=4) took 0.8959s, total workload: 1.144GiB, rate: 1.277GiB/s
bench_allreduce(np=4) took 0.9749s, total workload: 1.144GiB, rate: 1.173GiB/s
END ======================================== bench_allreduce remote ========================================
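
To compare the three commits without reading every line, the logs above can be reduced to one median per section and workload. The following is only a rough sketch: the function name `summarize` and the regular expressions are assumptions based on the line format shown in these logs, not part of the benchmark tooling itself.

```python
import re
from collections import defaultdict
from statistics import median

# Assumed line format, copied from the logs in this thread:
#   bench_allreduce(np=4) took 0.0130s, total workload: 384000B, rate: 0.028GiB/s
LINE_RE = re.compile(
    r"bench_allreduce\(np=\d+\) took ([\d.]+)s, "
    r"total workload: (\S+), rate: ([\d.]+)GiB/s"
)
# Section header, e.g. "BGN ===== bench_allreduce local =====".
BGN_RE = re.compile(r"BGN =+ bench_allreduce (\w+) =+")


def summarize(log_text):
    """Return {(section, workload): (median_seconds, median_rate_gib_s)}.

    `section` is "local" or "remote"; `workload` is the label exactly as
    printed in the log (for example "384000B" or "1.144GiB").
    """
    times = defaultdict(list)
    rates = defaultdict(list)
    section = None
    for line in log_text.splitlines():
        header = BGN_RE.search(line)
        if header:
            section = header.group(1)
            continue
        match = LINE_RE.search(line)
        if match and section is not None:
            key = (section, match.group(2))
            times[key].append(float(match.group(1)))
            rates[key].append(float(match.group(3)))
    return {key: (median(times[key]), median(rates[key])) for key in times}
```

Calling `summarize` once per commit's log and comparing entries such as `("remote", "1.144GiB")` gives a like-for-like view of the runs. The median is used here because the first iteration of each batch tends to be slower than the rest.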

lgarithm#2 (comment)
