feat(checkpoint): support universal checkpoint #1251

Triggered via pull request December 25, 2024 07:37
Status Cancelled
Total duration 2h 3m 54s

e2e_test.yaml

on: pull_request
training_4GPU 0s
training_8GPU_ISP 0s
training_8GPU_ISP_CKPT 0s
training_8GPU_4DP2PP_ZB 0s
Matrix: training_16GPU_4DP2TP2PP_FSP
Matrix: training_16GPU_4DP2TP2PP_MSP
Matrix: training_16GPU_4DP2TP2PP_MTP
Matrix: training_8GPU_4DP2PP
Matrix: training_8GPU_4DP2TP
Matrix: training_8GPU_4DP2TPSP
Matrix: training_llama2

Annotations

11 errors
training_8GPU_ISP_CKPT
The run was canceled by @kkscilife.
training_4GPU
The run was canceled by @kkscilife.
training_8GPU_4DP2TPSP (t_cluster)
The run was canceled by @kkscilife.
training_8GPU_ISP
The run was canceled by @kkscilife.
training_8GPU_4DP2PP_ZB
The run was canceled by @kkscilife.
training_8GPU_4DP2PP (t_cluster)
The run was canceled by @kkscilife.
training_8GPU_4DP2TP (t_cluster)
The run was canceled by @kkscilife.
training_16GPU_4DP2TP2PP_MTP (t_cluster)
The run was canceled by @kkscilife.
training_16GPU_4DP2TP2PP_FSP (t_cluster)
The run was canceled by @kkscilife.
training_16GPU_4DP2TP2PP_MSP (t_cluster)
The run was canceled by @kkscilife.
training_llama2 (t_cluster)
The run was canceled by @kkscilife.