Release Notes - v1.0.0

We're thrilled to announce the initial release of sagemaker-hyperpod-recipes!

🎉 Features

Unified Job Submission: Submit training and fine-tuning workflows to SageMaker HyperPod or SageMaker training jobs using a single entry point
Flexible Configuration: Customize your training jobs with three types of configuration files:
- General Configuration (e.g., recipes_collection/config.yaml)
- Cluster Configuration (e.g., recipes_collection/cluster/slurm.yaml)
- Recipe Configuration (e.g., recipes_collection/recipes/training/llama/hf_llama3_8b_seq16k_gpu_p5x16_pretrain.yaml)
Pre-defined LLM Recipes: Access a collection of ready-to-use recipes for training Large Language Models
Cluster Agnostic: Compatible with SageMaker HyperPod (with Slurm or Amazon EKS orchestrators) and SageMaker training jobs
Built on the NVIDIA NeMo Framework: Leverages the NVIDIA NeMo Framework Launcher for efficient job management

Each of the three configuration types covers a different scope:
- General Configuration: Common settings such as default parameters and environment variables
- Cluster Configuration: Cluster-specific settings (e.g., volume and label for Kubernetes; job name for Slurm)
- Recipe Configuration: Training job settings, including model type, sharding degree, and dataset paths
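
To illustrate how general, cluster, and recipe settings might layer on top of one another, here is a minimal Python sketch that merges three dictionaries in order. It is only an illustration under stated assumptions: the helper `deep_merge` and every key and value shown (`env_vars`, `job_name`, `shard_degree`, and so on) are hypothetical placeholders, not the actual schema or loading mechanism used by sagemaker-hyperpod-recipes.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` take precedence over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, or new keys: the override wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical stand-ins for the three configuration files (not the real schema).
general = {"env_vars": {"LOG_LEVEL": "INFO"}, "results_dir": "./results"}
cluster = {"cluster": {"type": "slurm", "job_name": "demo-job"}}
recipe = {
    "model": {"model_type": "llama", "shard_degree": 8},
    "data": {"train_dir": "/path/to/dataset"},
}

# Apply the layers from most general to most specific.
job_config: dict = {}
for layer in (general, cluster, recipe):
    job_config = deep_merge(job_config, layer)

print(job_config)
```

Merging from most general to most specific keeps shared defaults in one place while letting a recipe or cluster file override only the values it cares about.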

We welcome contributions to enhance the capabilities of sagemaker-hyperpod-recipes. Please refer to our contributing guidelines for more information.

Thank you for choosing sagemaker-hyperpod-recipes for your large-scale language model training needs!