~NGC release testing #52

Manually triggered April 18, 2024 22:34
Status: Failure
Total duration: 6h 1m 27s
Artifacts: 4

ngc-release-testing.yaml

on: workflow_dispatch
Matrix: test-jax / run-unit-test
Matrix: test-rosetta-pax / rosetta-pax-multi-node-te
Matrix: test-rosetta-pax / rosetta-pax-multi-node
Matrix: test-rosetta-pax / rosetta-pax-single-node-dropout-te
Matrix: test-rosetta-pax / single-process-evaluation-te
Matrix: test-rosetta-pax / single-process-multi-device-te
test-jax / runner / launch-slurm-runner: 2h 27m
test-rosetta-pax / summary: 0s
test-rosetta-pax / metrics: 0s
test-rosetta-pax / sitrep / sitrep: 5s
test-rosetta-pax / outcome: 0s
finalize / workflow-badge: 3s
finalize / report: 5s
finalize / upload-badge: 4s
finalize / publish-badge: 4s

Annotations

36 errors
test-jax / jax-V100-unit-test
Process completed with exit code 1.
test-jax / jax-A100-unit-test
Process completed with exit code 1.
test-rosetta-pax / rosetta-pax-multi-node (1, 4, 1, 2)
The job running on runner GitHub Actions 423 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / rosetta-pax-multi-node (1, 4, 1, 2)
The operation was canceled.
test-rosetta-pax / rosetta-pax-multi-node-te (4DP1FSDP2TP1PP_TE, 1, 4, 1, 2, 4)
The job running on runner GitHub Actions 85 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / rosetta-pax-multi-node (4, 2, 1, 1)
The job running on runner GitHub Actions 270 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / rosetta-pax-multi-node (4, 2, 1, 1)
The operation was canceled.
test-rosetta-pax / single-process-multi-device-te (1, 1, 2, 4)
The job running on runner GitHub Actions 105 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / single-process-multi-device-te (1, 1, 2, 4)
The operation was canceled.
test-rosetta-pax / rosetta-pax-multi-node (1, 8, 1, 1)
The job running on runner GitHub Actions 325 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / rosetta-pax-multi-node (1, 8, 1, 1)
The operation was canceled.
test-rosetta-pax / rosetta-pax-multi-node-te (5B_fused_attn_1, 1, 1, 8, 1, 2, --model-type 5B --enable-fused-attn)
The job running on runner GitHub Actions 478 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / rosetta-pax-multi-node-te (5B_fused_attn_0, 1, 1, 8, 1, 2, --model-type 5B)
The job running on runner GitHub Actions 97 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / rosetta-pax-single-node-dropout-te (1, 8, 1, 1)
The job running on runner GitHub Actions 414 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / rosetta-pax-single-node-dropout-te (1, 8, 1, 1)
The operation was canceled.
test-rosetta-pax / rosetta-pax-multi-node (4, 2, 1, 2)
The job running on runner GitHub Actions 370 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / rosetta-pax-multi-node (4, 2, 1, 2)
The operation was canceled.
test-rosetta-pax / rosetta-pax-multi-node-te (1DP8FSDP1TP1PP_TE, 1, 1, 8, 1, 4)
The job running on runner GitHub Actions 353 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / single-process-evaluation-te (1, 8, 1, 1)
The job running on runner GitHub Actions 438 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / single-process-evaluation-te (1, 8, 1, 1)
The operation was canceled.
test-rosetta-pax / rosetta-pax-multi-node-te (1DP1FSDP1TP1PP_TE, 1, 1, 1, 1, 4)
The job running on runner GitHub Actions 471 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / rosetta-pax-multi-node-te (LLaMA_eval_TE, 1, 1, 8, 1, 4, true, --model-type LLaMA70BProxy --evalu...
The job running on runner GitHub Actions 395 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / single-process-multi-device-te (1, 8, 1, 1)
The job running on runner GitHub Actions 178 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / single-process-multi-device-te (1, 8, 1, 1)
The operation was canceled.
test-rosetta-pax / rosetta-pax-multi-node-te (16DP1FSDP1TP1PP_TE, 1, 16, 1, 1, 4)
The job running on runner GitHub Actions 187 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / rosetta-pax-multi-node-te (8DP1FSDP1TP1PP_TE, 1, 8, 1, 1, 4)
The job running on runner GitHub Actions 460 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / sitrep / sitrep
Process completed with exit code 2.
test-rosetta-pax / outcome
Process completed with exit code 2.

Artifacts

Produced during runtime

Name                        Size       Status
artifact-final-report       230 Bytes  Expired
artifact-workflow-metadata  267 Bytes  Expired
jax-unit-test-A100          19.4 KB    Expired
jax-unit-test-V100          22.8 KB    Expired