Release v1.0.0

jessech-en released this 07 Dec 00:52
5c66df4

Release Notes - v1.0.0

We're thrilled to announce the initial release of sagemaker-hyperpod-recipes!

🎉 Features

  • Unified Job Submission: Submit training and fine-tuning workflows to SageMaker HyperPod or SageMaker training jobs using a single entry point
  • Flexible Configuration: Customize your training jobs with three types of configuration files:
    • General Configuration (e.g., recipes_collection/config.yaml)
    • Cluster Configuration (e.g., recipes_collection/cluster/slurm.yaml)
    • Recipe Configuration (e.g., recipes_collection/recipes/training/llama/hf_llama3_8b_seq16k_gpu_p5x16_pretrain.yaml)
  • Pre-defined LLM Recipes: Access a collection of ready-to-use recipes for training Large Language Models
  • Cluster Agnostic: Compatible with SageMaker HyperPod (with Slurm or Amazon EKS orchestrators) and SageMaker training jobs
  • Built on the NVIDIA NeMo Framework: Uses the NVIDIA NeMo Framework Launcher to manage job submission
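
To illustrate the "single entry point" idea, here is a minimal sketch of how one submit function might route a recipe to different backends. Everything in it is an assumption made for illustration: the handler functions, the `BACKENDS` table, and the cluster type strings (`slurm`, `k8s`, `sm_jobs`) are hypothetical and are not taken from the launcher's code. See the README.md for the actual interface.

```python
from typing import Callable, Dict


# Hypothetical backend handlers. A real launcher would render and submit
# a job; these only return a label so the routing can be inspected.
def submit_slurm(recipe: str) -> str:
    return f"slurm:{recipe}"


def submit_eks(recipe: str) -> str:
    return f"k8s:{recipe}"


def submit_training_job(recipe: str) -> str:
    return f"sm_jobs:{recipe}"


# Hypothetical mapping from a cluster type string to its handler.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "slurm": submit_slurm,
    "k8s": submit_eks,
    "sm_jobs": submit_training_job,
}


def submit(recipe: str, cluster_type: str) -> str:
    """Route a recipe to the handler registered for cluster_type."""
    try:
        handler = BACKENDS[cluster_type]
    except KeyError:
        raise ValueError(f"unknown cluster type: {cluster_type!r}") from None
    return handler(recipe)
```

With this layout, supporting another orchestrator means adding one entry to the table while callers keep using the same `submit` function.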

🗂️ Repository Structure

  • main.py: Primary entry point for submitting training jobs
  • launcher_scripts/: Collection of commonly used scripts for LLM training
  • recipes_collection/: Pre-defined LLM recipes provided by developers

🔧 Key Components

  1. General Configuration: Common settings like default parameters and environment variables
  2. Cluster Configuration: Cluster-specific settings (e.g., volume, label for Kubernetes; job name for Slurm)
  3. Recipe Configuration: Training job settings including model types, sharding degree, and dataset paths
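
The three layers above can be pictured as nested dictionaries that are combined into one job configuration. The sketch below is illustrative only: the key names (`env_vars`, `job_name_prefix`, `shard_degree`, and so on) and the rule that later layers override earlier ones are assumptions, not the launcher's documented schema or merge behavior.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_layers(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge config layers left to right; later layers win on conflicts.

    Nested dictionaries are merged recursively. Inputs are not mutated.
    """
    merged: Dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_layers(merged[key], value)
            else:
                merged[key] = deepcopy(value)
    return merged


# Hypothetical contents for each layer, loosely following the list above.
general = {"env_vars": {"LOG_LEVEL": "INFO"}, "base_results_dir": "./results"}
cluster = {"cluster": {"type": "slurm", "job_name_prefix": "demo-"}}
recipe = {
    "model": {"model_type": "llama", "shard_degree": 8},
    "data": {"train_dir": "/path/to/train"},
    "env_vars": {"LOG_LEVEL": "DEBUG"},  # recipe-level override (assumed)
}

config = merge_layers(general, cluster, recipe)
```

Here the recipe's `LOG_LEVEL` replaces the general default, while `base_results_dir` and the cluster settings pass through unchanged.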

📚 Documentation

  • Refer to the README.md for detailed usage instructions and examples

🤝 Contributing

We welcome contributions to enhance the capabilities of sagemaker-hyperpod-recipes. Please refer to our contributing guidelines for more information.

Thank you for choosing sagemaker-hyperpod-recipes for your large-scale language model training needs!