SageMaker HyperPod recipes release 1.0.0
ArjunKrishnak committed Dec 4, 2024
1 parent 03e58c4 commit 0a556d9
Showing 195 changed files with 14,921 additions and 0 deletions.
8 changes: 8 additions & 0 deletions .coveragerc
@@ -0,0 +1,8 @@
[run]
# Exclude submodule directory from coverage
omit =
    launcher/nemo/nemo_framework_launcher/*
    template/*

[report]
fail_under = 85
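
A minimal sketch of how the two settings above can be read and applied, assuming the file is parsed as standard INI text. The helper names (`load_settings`, `is_omitted`, `passes_threshold`) are hypothetical, and `fnmatch` only approximates the path matching that coverage.py performs internally.

```python
import configparser
from fnmatch import fnmatch

# Copy of the .coveragerc content; continuation lines under `omit =` must be indented.
COVERAGERC = """\
[run]
# Exclude submodule directory from coverage
omit =
    launcher/nemo/nemo_framework_launcher/*
    template/*

[report]
fail_under = 85
"""


def load_settings(text):
    """Return (omit_patterns, fail_under) parsed from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw_omit = parser.get("run", "omit")
    omit = [line.strip() for line in raw_omit.splitlines() if line.strip()]
    fail_under = parser.getfloat("report", "fail_under")
    return omit, fail_under


def is_omitted(path, patterns):
    """Approximate check: does the path match any omit pattern?"""
    return any(fnmatch(path, pattern) for pattern in patterns)


def passes_threshold(total_percent, fail_under):
    """The report step fails when total coverage is below fail_under."""
    return total_percent >= fail_under


omit_patterns, threshold = load_settings(COVERAGERC)
```

Under these assumptions, files inside the submodule and `template/` directories are skipped, and a run with total coverage below 85 would be reported as failing.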
31 changes: 31 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,31 @@
---
name: Bug report
about: File a report to help us reproduce and fix the problem
title: ''
labels: 'bug'
assignees: ''

---

## Describe the bug
A clear and concise description of what the bug is.

## How to Reproduce?
A clear, step-by-step set of instructions to reproduce the bug.
The provided code needs to be **complete** and **runnable**. If additional data is needed, please include it in the issue.

## Expected behavior
A clear and concise description of what you expected to happen.

## Screenshots, error messages or logs
If applicable, please share with us screenshots, error messages or logs to help explain your problem.

## System information
A description of your system. Please provide:
- **Docker image you ran against**:
- **Source code version you ran against**:
- **Python version**:
- **Hardware accelerator used**:

## Additional context
Add any other context about the problem here. Please provide any additional steps you have tried to solve your issue here.
17 changes: 17 additions & 0 deletions .github/ISSUE_TEMPLATE/documentation_request.md
@@ -0,0 +1,17 @@
---
name: Documentation request
about: Request improved documentation
title: ''
labels: 'documentation request'
assignees: ''

---

## What did you find confusing?
A clear and concise description of what you found confusing. Ex. I tried to [...] but I didn't understand how to [...]

## Describe how documentation can be improved
A clear and concise description of where documentation was lacking and how it can be improved.

## Additional context
Add any other context or screenshots about the documentation request here.
20 changes: 20 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest new functionality for this project
title: ''
labels: 'feature request'
assignees: ''

---

## Describe the feature you'd like
A clear and concise description of the functionality you want.

## How would this feature be used?
A clear and concise description of the use case for this feature. Please provide an example, if possible.

## Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

## Additional context
Add any other context about the feature request here.
25 changes: 25 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,25 @@
## Description

### Motivation
Explain the motivation

### Changes
* List your changes

### Testing
Explain how the changes were tested

## Merge Checklist
Put an x in the boxes that apply. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

### General
- [ ] I have read the [CONTRIBUTING](../CONTRIBUTING.md) doc
- [ ] I have run `pre-commit run --all-files` on my code. It will check for [this configuration](../.pre-commit-config.yaml).
- [ ] I have updated any necessary documentation, including [READMEs](../README.md) and API docs (if appropriate)
- [ ] I have verified the licenses used in the license-files artifact generated in the Python License Scan CI check. If the license workflow fails, kindly check the licenses used in the artifact.

### Tests
- [ ] I have run `pytest` on my code and all unit tests passed.
- [ ] I have added tests that prove my fix is effective or that my feature works (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
28 changes: 28 additions & 0 deletions .github/workflows/pre-commit-check-runner-push.yml
@@ -0,0 +1,28 @@
name: Python Pre Commit Check CI After Commit

on:
  push:
    branches:
      - main # Triggers on direct pushes to the main branch

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.8' # Set python version to 3.8

      - name: Install pre-commit dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pre-commit
      - name: Run pre-commit checks
        run: |
          pre-commit run --all-files
69 changes: 69 additions & 0 deletions .github/workflows/repo-monitoring-cron.yml
@@ -0,0 +1,69 @@
name: Repository Monitoring

on:
  schedule:
    - cron: '0 16 * * *'

concurrency:
  group: ${{ github.workflow }}-${{ github.run_id }}
  cancel-in-progress: true

permissions:
  id-token: write # This is required for requesting the JWT
  contents: read # This is required for actions/checkout

jobs:
  check-pr-alerts:
    runs-on: ubuntu-latest
    if: github.event.repository.visibility == 'public'
    timeout-minutes: 10
    outputs:
      pr_count: ${{ steps.pr-count.outputs.count }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Check for open PRs
        id: pr-count
        env:
          GITHUB_TOKEN: ${{ secrets.GH_PAT }}
        run: |
          pr_count=$(gh pr list --state open --limit 1000 | wc -l)
          echo "count=$pr_count" >> $GITHUB_OUTPUT
  check-issue-alerts:
    runs-on: ubuntu-latest
    if: github.event.repository.visibility == 'public'
    timeout-minutes: 10
    outputs:
      issue_count: ${{ steps.issue-count.outputs.count }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Check for open issues
        id: issue-count
        env:
          GITHUB_TOKEN: ${{ secrets.GH_PAT }}
        run: |
          issue_count=$(gh issue list --state open --limit 1000 | wc -l)
          echo "count=$issue_count" >> $GITHUB_OUTPUT
  put-metric-data:
    runs-on: ubuntu-latest
    if: github.event.repository.visibility == 'public'
    timeout-minutes: 10
    needs: [check-pr-alerts, check-issue-alerts]
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: ${{ secrets.RUNNER_ROLE_ARN }}
          role-session-name: repo-monitoring-cron-session
          aws-region: us-west-2

      - name: Put PR Alert Metric Data
        run: |
          aws cloudwatch put-metric-data --metric-name PRAlert --namespace RepoMetrics --value ${{ needs.check-pr-alerts.outputs.pr_count }} --unit Count --dimensions ProjectName=sagemaker-hyperpod-recipes
      - name: Put Issue Alert Metric Data
        run: |
          aws cloudwatch put-metric-data --metric-name IssueAlert --namespace RepoMetrics --value ${{ needs.check-issue-alerts.outputs.issue_count }} --unit Count --dimensions ProjectName=sagemaker-hyperpod-recipes
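
The counting and publishing steps in this workflow can be sketched in Python as follows. This is an illustration only, not part of the commit: the helper names `count_open_items` and `put_metric_command` are hypothetical, and the sketch assumes the `gh` listing prints one item per line. It counts non-empty lines, which matches `wc -l` for newline-terminated output.

```python
def count_open_items(listing: str) -> int:
    """Count non-empty lines in a one-item-per-line listing."""
    return sum(1 for line in listing.splitlines() if line.strip())


def put_metric_command(metric_name: str, value: int,
                       namespace: str = "RepoMetrics",
                       project: str = "sagemaker-hyperpod-recipes") -> list:
    """Build the argument list for the `aws cloudwatch put-metric-data` call."""
    return [
        "aws", "cloudwatch", "put-metric-data",
        "--metric-name", metric_name,
        "--namespace", namespace,
        "--value", str(value),
        "--unit", "Count",
        "--dimensions", f"ProjectName={project}",
    ]


# Hypothetical listing with two open pull requests (tab-separated fields).
sample_listing = "12\tFix typo\tfix-typo\n15\tAdd recipe\tadd-recipe\n"
pr_count = count_open_items(sample_listing)
command = put_metric_command("PRAlert", pr_count)
```

Passing the arguments as a list (for example to `subprocess.run`) avoids shell quoting issues if a value ever contains spaces.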
100 changes: 100 additions & 0 deletions .github/workflows/security-monitoring-cron.yml
@@ -0,0 +1,100 @@
name: Security Monitoring

on:
  schedule:
    - cron: '0 16 * * *'

concurrency:
  group: ${{ github.workflow }}-${{ github.run_id }}
  cancel-in-progress: true

permissions:
  id-token: write # This is required for requesting the JWT
  contents: read # This is required for actions/checkout

jobs:
  check-dependabot-alerts:
    runs-on: ubuntu-latest
    outputs:
      dependabot_alert_status: ${{ steps.check-dependabot-alerts.outputs.dependabot_alert_status }}
    steps:
      - name: Check for dependabot alerts
        id: check-dependabot-alerts
        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea
        with:
          github-token: ${{ secrets.GH_PAT }}
          script: |
            async function checkAlerts() {
              const owner = '${{ github.repository_owner }}';
              const repo = '${{ github.event.repository.name }}';
              const dependabotAlerts = await github.rest.dependabot.listAlertsForRepo({
                owner,
                repo,
                headers: {
                  'accept': 'application/vnd.github+json'
                }
              });
              const activeDependabotAlerts = dependabotAlerts.data.filter(alert => alert.state === 'open');
              core.setOutput('dependabot_alert_status', activeDependabotAlerts.length > 0 ? '1': '0');
            }
            await checkAlerts();
  check-code-scanning-alerts:
    runs-on: ubuntu-latest
    outputs:
      code_scanning_alert_status: ${{ steps.set-code-scanning-alerts-output.outputs.code_scanning_alert_status }}
    steps:
      - name: Check for security alerts for public repository
        id: check-code-scanning-alerts
        if: github.event.repository.visibility == 'public'
        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea
        with:
          github-token: ${{ secrets.GH_PAT }}
          script: |
            async function checkAlerts() {
              const owner = '${{ github.repository_owner }}';
              const repo = '${{ github.event.repository.name }}';
              const ref = 'refs/heads/main';
              const codeScanningAlerts = await github.rest.codeScanning.listAlertsForRepo({
                owner,
                repo,
                ref: ref
              });
              const activeCodeScanningAlerts = codeScanningAlerts.data.filter(alert => alert.state === 'open');
              return activeCodeScanningAlerts.length > 0 ? '1': '0';
            }
            // Return the value so that github-script exposes it as the step's `result` output
            return await checkAlerts();
      - name: Set code scanning alerts output
        id: set-code-scanning-alerts-output
        run: |
          if ${{ github.event.repository.visibility == 'public' }}; then
            echo "code_scanning_alert_status=${{ steps.check-code-scanning-alerts.outputs.result }}" >> $GITHUB_OUTPUT
          else
            echo "code_scanning_alert_status=0" >> $GITHUB_OUTPUT
          fi
  put-metric-data:
    runs-on: ubuntu-latest
    needs: [check-dependabot-alerts, check-code-scanning-alerts]
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@12e3392609eaaceb7ae6191b3f54bbcb85b5002b
        with:
          role-to-assume: ${{ secrets.RUNNER_ROLE_ARN }}
          aws-region: us-west-2
      - name: Put Dependabot Alert Metric Data
        run: |
          if [ "${{ needs.check-dependabot-alerts.outputs.dependabot_alert_status }}" == "1" ]; then
            aws cloudwatch put-metric-data --metric-name DependabotAlert --namespace SecurityMonitoringMetrics --value 1 --unit Count --dimensions ProjectName=sagemaker-hyperpod-recipes
          else
            aws cloudwatch put-metric-data --metric-name DependabotAlert --namespace SecurityMonitoringMetrics --value 0 --unit Count --dimensions ProjectName=sagemaker-hyperpod-recipes
          fi
      - name: Put Code Scanning Alert Metric Data
        run: |
          if [ "${{ needs.check-code-scanning-alerts.outputs.code_scanning_alert_status }}" == "1" ]; then
            aws cloudwatch put-metric-data --metric-name CodeScanningAlert --namespace SecurityMonitoringMetrics --value 1 --unit Count --dimensions ProjectName=sagemaker-hyperpod-recipes
          else
            aws cloudwatch put-metric-data --metric-name CodeScanningAlert --namespace SecurityMonitoringMetrics --value 0 --unit Count --dimensions ProjectName=sagemaker-hyperpod-recipes
          fi
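
Both alert checks in this workflow reduce to the same rule: report `1` if any alert is in the `open` state, otherwise `0`. A minimal Python sketch of that rule, assuming each alert is a mapping with a `state` key as in the API responses filtered above; the function name `alert_status` and the sample data are hypothetical.

```python
def alert_status(alerts) -> str:
    """Return '1' if at least one alert is open, else '0'."""
    return "1" if any(alert.get("state") == "open" for alert in alerts) else "0"


# Hypothetical sample data for illustration only.
sample_alerts = [
    {"number": 1, "state": "dismissed"},
    {"number": 2, "state": "open"},
]
status = alert_status(sample_alerts)
metric_value = int(status)  # the value that would be passed to --value
```

Emitting a 0/1 flag rather than a raw count keeps the CloudWatch metric easy to alarm on, at the cost of losing how many alerts are open.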
32 changes: 32 additions & 0 deletions .github/workflows/unit-test-runner-push.yml
@@ -0,0 +1,32 @@
name: Python Unit Test CI After Commit

on:
  push:
    branches:
      - main # Triggers on direct pushes to the main branch

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v3
        with:
          submodules: recursive # Checkout submodules as well

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.8' # Set python version to 3.8

      - name: Install unit test dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r launcher/nemo/nemo_framework_launcher/requirements.txt
          pip install pytest
          pip install pytest-cov
      - name: Run unit tests
        run: |
          python -m pytest
27 changes: 27 additions & 0 deletions .gitignore
@@ -0,0 +1,27 @@
# log and data files
trace
.DS_Store
.hydra
.bash_history.local
results/
outputs/
tmp/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
**.pyc
core.*

# Unit test / coverage reports
coverage_html_report/
.coverage
.coverage.*
.cache
*.cover
.hypothesis/
.pytest_cache/

# Playground area
mypg/
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "launcher/nemo/nemo_framework_launcher"]
path = launcher/nemo/nemo_framework_launcher
url = https://github.com/NVIDIA/NeMo-Framework-Launcher.git