feat(checkpoint): support universal checkpoint #1251
Triggered via pull request on December 25, 2024 07:37
Status: Cancelled
Total duration: 2h 3m 54s
Artifacts: none
e2e_test.yaml
on: pull_request
training_4GPU: 0s
training_8GPU_ISP: 0s
training_8GPU_ISP_CKPT: 0s
training_8GPU_4DP2PP_ZB: 0s
Matrix: training_16GPU_4DP2TP2PP_FSP
Matrix: training_16GPU_4DP2TP2PP_MSP
Matrix: training_16GPU_4DP2TP2PP_MTP
Matrix: training_8GPU_4DP2PP
Matrix: training_8GPU_4DP2TP
Matrix: training_8GPU_4DP2TPSP
Matrix: training_llama2
Annotations
11 errors
training_8GPU_ISP_CKPT: The run was canceled by @kkscilife.
training_4GPU: The run was canceled by @kkscilife.
training_8GPU_4DP2TPSP (t_cluster): The run was canceled by @kkscilife.
training_8GPU_ISP: The run was canceled by @kkscilife.
training_8GPU_4DP2PP_ZB: The run was canceled by @kkscilife.
training_8GPU_4DP2PP (t_cluster): The run was canceled by @kkscilife.
training_8GPU_4DP2TP (t_cluster): The run was canceled by @kkscilife.
training_16GPU_4DP2TP2PP_MTP (t_cluster): The run was canceled by @kkscilife.
training_16GPU_4DP2TP2PP_FSP (t_cluster): The run was canceled by @kkscilife.
training_16GPU_4DP2TP2PP_MSP (t_cluster): The run was canceled by @kkscilife.
training_llama2 (t_cluster): The run was canceled by @kkscilife.