The FMBench Orchestrator is a tool that automates the deployment and management of FMBENCH on multiple EC2 instances in the AWS cloud. It creates the security groups, key pairs, and EC2 instances, runs FMBENCH with a specified config on each instance, retrieves the results, and shuts down the instances when the runs complete. This simplifies the benchmarking process and provides a streamlined, scalable workflow.
- Prerequisites
- Installation
- Conda Environment Setup
- Configuration
- Usage
- Workflow
- Cleaning Up
- Contributing
- License
- IAM role: You need an active AWS account and an IAM role that has the permissions necessary to create, manage, and terminate EC2 instances. See this link for the permissions and trust policies that this IAM role needs to have. Name this IAM role fmbench-orchestrator.
- Service quota: Your AWS account needs service quota limits high enough to start the Amazon EC2 instances you want to use for benchmarking. This may require submitting service quota increase requests; use this link to submit them. Usually this means increasing the vCPU limits for your account, getting quota for specific instance types, and so on.
- EC2 instance: It is recommended to run the orchestrator on an EC2 instance with the IAM role attached, preferably in the same AWS region where you plan to launch the benchmarking instances (launching instances across regions is also supported).
  - Use Ubuntu as the instance OS, specifically the ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20240927 AMI.
  - Use t3.xlarge as the instance type, preferably with at least 100 GB of disk space.
  - Associate the fmbench-orchestrator IAM role with this instance.
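Only the trust-policy half of the IAM role prerequisite is standard enough to sketch here. The snippet below is a minimal illustration, assuming the role is used as an EC2 instance profile (so EC2 must be allowed to assume it). The function name ec2_trust_policy is hypothetical, and the permission policies themselves must still come from the linked documentation.

```python
import json


def ec2_trust_policy() -> dict:
    """Hypothetical helper: trust policy that lets EC2 assume the role (needed for an instance profile)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, e.g. to paste into the IAM console.
    print(json.dumps(ec2_trust_policy(), indent=2))
```

You could pass json.dumps(ec2_trust_policy()) as the AssumeRolePolicyDocument argument of boto3's IAM create_role call. Attaching the role to an EC2 instance goes through an instance profile, which the IAM console creates for you but the API does not.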
- Install conda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b  # Run the Miniconda installer in batch mode (no manual intervention)
rm -f Miniconda3-latest-Linux-x86_64.sh  # Remove the installer script after installation
eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)"  # Initialize conda for the bash shell
conda init  # Initialize conda, adding it to the shell
- Clone the repository
git clone https://github.com/awslabs/fmbench-orchestrator.git
cd fmbench-orchestrator
- Create a conda environment with Python 3.11:
conda create --name fmbench-orchestrator-py311 python=3.11 -y
- Activate the environment:
conda activate fmbench-orchestrator-py311
- Install the required packages:
pip install -r requirements.txt
You can either use an existing config file included in this repo, such as configs/config.yml, or create your own using the files in the configs directory as templates. Then run the orchestrator with that config file:
python main.py -f configs/config.yml
This configuration file is used to manage the deployment and orchestration of multiple EC2 instances for running FMBENCH benchmarks. The file defines various parameters, including AWS settings, run steps, security group settings, key pair management, and instance-specific configurations. This guide explains the purpose of each section and provides details on how to customize the file according to your requirements.
AWS Settings
This section contains the basic settings required to interact with AWS services.
- region: The AWS region where the EC2 instances will be launched (e.g., us-east-1). Set this to the region where you want to deploy your resources.
- iam_instance_profile_arn: The Amazon Resource Name (ARN) of the IAM instance profile to attach to the EC2 instances. This profile must have the permissions required for the orchestrator's operations.
- hf_token_fpath: Path to a file containing your Hugging Face token, used for accessing specific resources or APIs. Replace {hf_token_fp_here} with the path to your Hugging Face token file.
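Because hf_token_fpath points to a file rather than holding the token itself, whatever consumes it has to read that file. A minimal sketch, assuming the file contains only the token (optionally followed by a newline); read_hf_token is a hypothetical helper, not part of the orchestrator's code.

```python
from pathlib import Path


def read_hf_token(hf_token_fpath: str) -> str:
    """Hypothetical helper: read a Hugging Face token from a file that holds only the token."""
    path = Path(hf_token_fpath).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"hf_token_fpath does not point to a file: {path}")
    # Strip the trailing newline most editors add.
    token = path.read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"Hugging Face token file is empty: {path}")
    return token
```

Failing early on a missing or empty file is cheaper than discovering the problem after the EC2 instances have already been launched.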
Run Steps
This section defines the steps in the orchestration process. Set each step to yes or no depending on whether you want that step to run.
- security_group_creation: Whether to create a new security group for the EC2 instances. Set to yes to create a new security group or no to use an existing one.
- key_pair_generation: Whether to generate a new key pair for accessing the EC2 instances. If set to no, ensure you have an existing key pair available.
- deploy_ec2_instance: Whether to deploy the EC2 instances specified in the instances section.
- run_bash_script: Whether to run a startup script on each EC2 instance after deployment.
- delete_ec2_instance: Whether to terminate the EC2 instances after the benchmarks complete.
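The yes/no values have a subtlety if the config is loaded with PyYAML (an assumption about the loader): PyYAML follows YAML 1.1, where bare yes and no load as the booleans True and False, while quoted "yes" and "no" stay strings. The hypothetical helpers step_enabled and enabled_steps below accept both forms and, as a further assumption, treat a missing step as no.

```python
# Hypothetical helpers, not the orchestrator's code.
RUN_STEPS = (
    "security_group_creation",
    "key_pair_generation",
    "deploy_ec2_instance",
    "run_bash_script",
    "delete_ec2_instance",
)

_TRUE = {"yes", "true"}
_FALSE = {"no", "false"}


def step_enabled(value) -> bool:
    """Interpret one run_steps value: bare yes/no arrive as bools, quoted ones as strings."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in _TRUE:
            return True
        if text in _FALSE:
            return False
    raise ValueError(f"expected yes or no, got {value!r}")


def enabled_steps(run_steps: dict) -> list[str]:
    """Return the enabled steps in workflow order; a missing step counts as no (assumption)."""
    return [step for step in RUN_STEPS if step_enabled(run_steps.get(step, False))]
```

Rejecting anything other than yes/no (for example a typo such as "yse") avoids silently skipping a step.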
Security Group
This section configures the security group for the EC2 instances.
- group_name: Name of the security group to create or use. If a group with this name already exists, it will be used.
- description: A brief description of the security group, such as "MultiDeploy EC2 Security Group."
- vpc_id: The ID of the VPC in which to create the security group. Leave this blank to use the default VPC.
Key Pair Management
This section manages the SSH key pair used to access the EC2 instances.
- key_pair_name: Name of the key pair to create or use. If key_pair_generation is set to no, ensure this matches the name of an existing key pair.
- key_pair_fpath: The local file path where the key pair file (.pem) will be stored. Update this path if you have an existing key pair.
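When key_pair_generation is yes, the new private key has to be saved to key_pair_fpath with permissions tight enough for ssh to accept it. A minimal sketch, assuming the key material is the KeyMaterial string returned by boto3's EC2 client create_key_pair call; save_private_key is a hypothetical helper. It refuses to overwrite an existing file, so an existing key pair is never clobbered.

```python
import os
from pathlib import Path


def save_private_key(key_material: str, key_pair_fpath: str) -> Path:
    """Hypothetical helper: write a new private key to key_pair_fpath, readable only by its owner."""
    path = Path(key_pair_fpath).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    # O_EXCL makes os.open fail with FileExistsError instead of overwriting an existing key.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as key_file:
        key_file.write(key_material)
    # ssh rejects private keys that other users can read; 0o400 is owner read-only.
    os.chmod(path, 0o400)
    return path
```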
Instances
This section defines the EC2 instances to launch for running the benchmarks. It can contain multiple instance configurations.
- instance_type: The type of EC2 instance to launch (e.g., g5.2xlarge). Choose based on your resource requirements.
- deploy: (Optional, default: yes) Set to yes if you want to run benchmarking on this instance, no otherwise. This is handy if you want to skip a particular instance in a run without removing it from the config file.
- ami_id: The Amazon Machine Image (AMI) ID to use for the instance. Different AMIs can be specified for different instance types.
- startup_script: Path to the startup script that is executed when the instance is launched. This script should be stored in the startup_scripts directory.
- post_startup_script: Path to a script that is executed after the initial startup script. Use this for any additional configuration or benchmark execution commands.
- fmbench_config: URL or file path of the FMBENCH configuration file that will be used by the orchestrator.
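The deploy flag's default of yes can be captured in a few lines. A minimal sketch with a hypothetical helper instances_to_deploy, assuming the instances list has already been loaded into Python dicts; it accepts both booleans (how PyYAML loads bare yes/no) and the strings "yes"/"no".

```python
def instances_to_deploy(instances: list[dict]) -> list[dict]:
    """Hypothetical helper: keep entries whose deploy flag is yes (deploy defaults to yes)."""
    selected = []
    for inst in instances:
        deploy = inst.get("deploy", True)  # absent means yes, per the documented default
        if isinstance(deploy, str):
            deploy = deploy.strip().lower() in {"yes", "true"}
        if deploy:
            selected.append(inst)
    return selected
```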
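The following is an example instance configuration template. Replace each {...} placeholder with your own values; for example, you could use it to deploy a g5.2xlarge and a g5.12xlarge instance with a specific AMI (such as Ubuntu Deep Learning OSS) and startup scripts.
Note: fmbench_config accepts either a URL (http/https link) or a local file path, so one instance can point to a remote config and another to a local one.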
instances:
- instance_type: {instance_name_here}
  region: {region_here}
  ami_id: {ami_id_here}
  device_name: /dev/sda1
  ebs_del_on_termination: True  # or False
  ebs_Iops: 16000
  ebs_VolumeSize: {Volume_Size_Here}
  ebs_VolumeType: {Volume_type_Here}
  # Defaults to none. You can use either the Capacity Reservation ID, the resource group ARN, or both
  CapacityReservationPreference: open  # or none
  CapacityReservationId: {The ID of the Capacity Reservation in which to run the instance.}
  CapacityReservationResourceGroupArn: {The ARN of the Capacity Reservation resource group in which to run the instance.}
  startup_script: startup_scripts/gpu_ubuntu_startup.txt
  post_startup_script: post_startup_scripts/fmbench.txt
  # Timeout period in seconds before a run is stopped
  fmbench_complete_timeout: 1200
  fmbench_config: {fmbench_config_here}
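The ebs_* fields in the template correspond closely to the BlockDeviceMappings parameter of boto3's EC2 client run_instances call. The sketch below shows one plausible translation, not the orchestrator's confirmed behavior: the helper name block_device_mappings, the DeleteOnTermination default of True, and the /dev/sda1 fallback are assumptions. It omits Iops for volume types that do not accept it (gp2, st1, sc1, standard).

```python
# Volume types that accept an Iops value in run_instances (io1/io2 require it).
_IOPS_VOLUME_TYPES = {"gp3", "io1", "io2"}


def _as_bool(value) -> bool:
    """Accept real booleans as well as strings such as "True"/"False" or "yes"/"no"."""
    if isinstance(value, str):
        return value.strip().lower() in {"true", "yes"}
    return bool(value)


def block_device_mappings(instance_cfg: dict) -> list[dict]:
    """Hypothetical helper: build run_instances' BlockDeviceMappings from ebs_* fields."""
    ebs = {
        # Assumption: delete the volume with the instance unless told otherwise.
        "DeleteOnTermination": _as_bool(instance_cfg.get("ebs_del_on_termination", True)),
        "VolumeSize": int(instance_cfg["ebs_VolumeSize"]),  # size in GiB
        "VolumeType": instance_cfg["ebs_VolumeType"],
    }
    if "ebs_Iops" in instance_cfg and ebs["VolumeType"] in _IOPS_VOLUME_TYPES:
        ebs["Iops"] = int(instance_cfg["ebs_Iops"])
    # Assumption: fall back to /dev/sda1, the device name used in the template.
    return [{"DeviceName": instance_cfg.get("device_name", "/dev/sda1"), "Ebs": ebs}]
```

The result would be passed as BlockDeviceMappings to run_instances alongside ImageId, InstanceType, MinCount, and MaxCount.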
- AWS Settings: Ensure that region and iam_instance_profile_arn are set for your AWS account and region.
- Run Steps: Toggle the steps based on your requirements. If you have an existing key pair or security group, set the respective steps to no.
- Security Group: Update vpc_id if you want to use a specific VPC. If left blank, the default VPC is used.
- Key Pair Management: If you choose not to generate a new key pair, ensure the existing key pair is specified in the config, stored at the specified path, and registered with AWS.
- Instances: Modify the instance configurations to match your benchmarking requirements. Ensure the AMI ID and instance type are available in the chosen region.
- Initialization: Reads the configuration file and initializes the necessary AWS resources.
- Instance Creation: Launches the specified number of EC2 instances with the provided configuration.
- FMBENCH Execution: Runs the FMBENCH benchmark script on each instance.
- Results Collection: Collects the results from each instance and uploads them to the specified S3 bucket.
- Instance Termination: Terminates all instances to prevent unnecessary costs.
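The FMBENCH execution step needs a way to stop waiting once fmbench_complete_timeout seconds have passed. A minimal polling sketch, assuming some hypothetical is_complete() check exists (for example, one that looks for a results marker on the instance); wait_for_completion is a hypothetical helper, and its sleep and clock parameters are injectable only so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_completion(
    is_complete: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_complete() until it returns True or timeout_s seconds have elapsed.

    Returns True if the run finished in time and False on timeout, at which point
    the caller would stop the run (e.g. with fmbench_complete_timeout: 1200)."""
    deadline = clock() + timeout_s
    while True:
        if is_complete():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))
```

Using a monotonic clock keeps the timeout correct even if the system wall-clock time is adjusted during a long run.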
Cleanup is done automatically. However, if you set delete_ec2_instance to no in the config file, you will have to terminate the instances manually from the EC2 console.
Contributions are welcome! Please fork the repository and submit a pull request with your changes. For major changes, please open an issue first to discuss what you would like to change.
See CONTRIBUTING for more information.
This project is licensed under the MIT-0 License - see the LICENSE file for details.