v1.3
Quick start solutions
- Add finetuning gemma on GKE with L4 GPUs example (#697)
- Jetstream autoscaling guide (#703)
- Enable Ray autoscaler for RAG example application (#722)
- Fix GKE training tutorial (#706)
- Update Kueue to 0.7.0 (#707)
ML Platform release (#715)
- Documentation
- Add notebook packaging guide to docs (#690)
- Added enhancements to the data processing use cases
- Infrastructure
- Added H100 and A100 40GB DWS node pools
- Moved cpu node pool from n2 to n4 machines
- Updated Kueue to 0.7.0
- Added initial test harness
- Configuration Management
- ConfigSync git repository name allows for easier use of multiple environments. Standardized GitOps scripts
- Added a GitLab project module and allowed users to choose between GitHub and GitLab
- Observability
- Added NVIDIA DCGM
- Added environment_name to the Ray dashboard endpoint
- Added Config Controller Terraform module
- Security
- Added allow KubeRay Operator to the namespace network policy
- Added Secret Manager add-on to the cluster
TPU Provisioner
- Add admission label selectors and e2e test script (#702)
Benchmarking
General
Add custom metrics stackdriver adapter Terraform module (#718)
Add prometheus adapter Terraform module (#716)
Add Jetstream MaxText Terraform module (#719)