Running under a matrix strategy #1419

Open
leonsparrowJM opened this issue Jul 30, 2024 · 6 comments

leonsparrowJM commented Jul 30, 2024

Hello,
I'm trying to centralise this action; we have close to 800 repos in our org, and I don't want to commit (or use .git) this action to each of them.

So, using the APP authentication, I gather a list of repos that have changed since it last ran:

echo "SARIFLASTRUNTIMESTAMP: $TIMESTAMP"
 response=$(curl -s -H "Authorization: token ${{ steps.access_token.outputs.token }}" \
              "https://api.github.com/search/repositories?q=org:ourOrgName+archived:false+pushed:>=$TIMESTAMP&per_page=$per_page&page=$page")

Then I update the timestamp with the now date.
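A minimal sketch of the "changed since last run" query, written in Python for brevity. The helper names (`build_search_url`, `utc_timestamp`) are made up for illustration; the query qualifiers are the ones from the curl call above. One design choice worth considering, and an assumption on my part rather than something the workflow does today, is to capture the new timestamp before the search runs, so pushes that land while the workflow is running are picked up by the next run.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def build_search_url(org: str, since: str, page: int = 1, per_page: int = 100) -> str:
    """Build the repository search URL for repos pushed at or after `since`."""
    query = f"org:{org} archived:false pushed:>={since}"
    # urlencode percent-encodes ':', '>' and '=' and turns spaces into '+'
    params = urlencode({"q": query, "per_page": per_page, "page": page})
    return f"https://api.github.com/search/repositories?{params}"


def utc_timestamp(now: Optional[datetime] = None) -> str:
    """Format a UTC time in the shape the workflow validates (YYYY-MM-DDTHH:MM:SSZ)."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    next_timestamp = utc_timestamp()  # captured before searching, stored afterwards
    print(build_search_url("ourOrgName", "2024-07-01T00:00:00Z"))
    print("store after a successful run:", next_timestamp)
```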

That gives me a JSON list of the repos the SARIF uploader needs to run on, so I create a matrix from it:

  run-scorecard:
    needs: fetch-repos
    runs-on: ubuntu-latest
    if: ${{ needs.fetch-repos.outputs.repo-list != '[]' }}
    strategy:
      matrix:
        repo: ${{ fromJson(needs.fetch-repos.outputs.repo-list) }}
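
A sketch of how the JSON array for `fromJson` could be produced, again in Python. The function names are hypothetical. It also splits the list into batches, because as far as I know GitHub caps a matrix at 256 jobs per workflow run, which matters for an org with several hundred repos; treat that limit as an assumption to verify against the current documentation.

```python
import json
from typing import Iterable, List

MAX_MATRIX_JOBS = 256  # assumed per-run matrix limit; check the current GitHub docs


def to_repo_list(lines: Iterable[str]) -> List[str]:
    """Strip whitespace and drop blank lines and duplicates, keeping order."""
    seen = set()
    repos = []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            repos.append(name)
    return repos


def to_matrix_batches(repos: List[str], size: int = MAX_MATRIX_JOBS) -> List[str]:
    """Return compact JSON arrays, each small enough for a single matrix."""
    return [
        json.dumps(repos[i:i + size], separators=(",", ":"))
        for i in range(0, len(repos), size)
    ]


if __name__ == "__main__":
    raw = ["MyOrg/repo-a\n", "\n", "MyOrg/repo-b\r\n", "MyOrg/repo-a\n"]
    for batch in to_matrix_batches(to_repo_list(raw)):
        print(batch)
```

Each batch is a string that can be written to a step output and passed to `fromJson`; how the batches are spread across runs or jobs is left open here.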

For each matrix item I re-authenticate with the App and run Scorecard (SARIF output).

The trouble is that the action takes the repo name from the event payload in /github/workflow/event.json, which describes the main (parent) workflow's repo, not the repo I checked out.

Since I have checked out the 'child repo' in the matrix, is there any way I can pass an input to this step:

      - name: "Run analysis"
        uses: ossf/scorecard-action@v2.4.0
        with:
          results_file: results.sarif
          results_format: sarif
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          publish_results: false

so it runs against the repo I have checked out in the matrix?
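
To illustrate the behaviour described above: the event payload is written by the runner for the workflow that was triggered, so checking out a different repository does not change it. This is not scorecard-action's code, only a sketch assuming the payload carries `repository.full_name`, as GitHub event payloads generally do. `repo_from_event` is a hypothetical helper.

```python
import json
import os
import tempfile
from typing import Optional


def repo_from_event(event_path: Optional[str] = None) -> str:
    """Return owner/name from the event payload the runner wrote for this workflow."""
    path = event_path or os.environ["GITHUB_EVENT_PATH"]
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    return payload["repository"]["full_name"]


if __name__ == "__main__":
    # Simulate a payload for the parent repo; a later checkout of a child repo
    # into the workspace leaves this file untouched.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
        json.dump({"repository": {"full_name": "MyOrg/central-workflows"}}, tmp)
    print(repo_from_event(tmp.name))
    os.remove(tmp.name)
```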

With this approach I stay in control of when the security alerts are updated (aligning with my other tools). For example, I can update the alerts nightly for only the repos that have had commits, and at the same time produce scorecards for the org, which get published...

I'll include my code (it's full of debug information steps atm, dirty wip) for completeness so you know what I am talking about:

name: New Central Sarif Alert Uploader

on:
  workflow_dispatch:


permissions:
  contents: read
  issues: read
  pull-requests: read
  security-events: write  
  id-token: write
  actions: read
  checks: read

jobs:
  fetch-repos:
    runs-on: ubuntu-latest
    outputs:
      repo-list: ${{ steps.set-repo-list.outputs.repo-list }}
    steps:
      - name: Generate JWT
        id: jwt
        run: |
          # Quoted delimiter: stops bash collapsing '\\n' to '\n' before Python sees it
          python - <<'EOF'
          import os
          import time
          import jwt
          private_key = os.getenv('SECDEVOPS_APP_PRIVATE_KEY', '').replace('\\n', '\n')
          payload = {
              'iat': int(time.time()),
              'exp': int(time.time()) + (10 * 60),
              'iss': os.getenv('SECDEVOPS_APP_ID')
          }
          token = jwt.encode(payload, private_key, algorithm='RS256')
          print(f"::set-output name=token::{token}")
          EOF
        env:
          SECDEVOPS_APP_PRIVATE_KEY: ${{ secrets.SECDEVOPS_APP_PRIVATE_KEY }}
          SECDEVOPS_APP_ID: ${{ secrets.SECDEVOPS_APP_ID }}

      - name: Get Installation Access Token
        id: access_token
        run: |
          curl -X POST \
            -H "Authorization: Bearer ${{ steps.jwt.outputs.token }}" \
            -H "Accept: application/vnd.github.v3+json" \
            https://api.github.com/app/installations/${{ secrets.SECDEVOPS_APP_INSTALLATION_ID }}/access_tokens \
            > response.json
          token=$(jq -r .token < response.json)
          echo "::set-output name=token::$token"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Fetch repository list based on timestamp
        id: fetch
        run: |
          # Define test mode flag
          TEST_MODE=${TEST_MODE:-false}  # Set TEST_MODE to true to enable test mode true/false

          # Set timestamp to a very early date if in test mode
          if [ "$TEST_MODE" = true ]; then
            echo "Test mode is enabled. Overriding TIMESTAMP to a very early date."
            TIMESTAMP="1970-01-01T00:00:00Z"
          else
            TIMESTAMP=${{ vars.SARIFLASTRUNTIMESTAMP }}
          fi
          echo "SARIFLASTRUNTIMESTAMP: $TIMESTAMP"

          # Validate timestamp format (ISO 8601 format check)
          if [[ ! "$TIMESTAMP" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$ ]]; then
            echo "Invalid or missing SARIFLASTRUNTIMESTAMP: $TIMESTAMP"
            exit 1
          fi

          page=1
          per_page=100
          > raw_repos.json

          while true; do
            response=$(curl -s -H "Authorization: token ${{ steps.access_token.outputs.token }}" \
              "https://api.github.com/search/repositories?q=org:<redactedorg>+archived:false+pushed:>=$TIMESTAMP&per_page=$per_page&page=$page")
            echo "API response for page $page:"
            echo "$response"
            repo_count=$(echo "$response" | jq -r '.items | length')
            if [ "$repo_count" -eq 0 ]; then
              break
            fi
            echo "$response" | jq -r '.items[] | "\(.full_name) - created_at: \(.created_at), pushed_at: \(.pushed_at)"'
            echo "$response" | jq -r '.items[] | select(.size > 0) | .full_name' >> raw_repos.json # added check to not include empty repos
            echo "Fetched page $page with $repo_count repositories"
            page=$((page + 1))
          done

          echo "Repositories found (raw):"
          cat raw_repos.json || echo "No repositories found"

          # Filter to include only test repositories
          filtered_repos=()
          while IFS= read -r repo; do
            repo=$(echo "$repo" | tr -d '\n' | tr -d '\r')
            if [[ "$repo" =~ SecDevOps---(Test) ]]; then
             filtered_repos+=("$repo")
            fi
          done < raw_repos.json

          echo "Filtered repositories:"
          printf "%s\n" "${filtered_repos[@]}"

          if [ ${#filtered_repos[@]} -eq 0 ]; then
            echo "No repositories to process."
            echo "::set-output name=repo-list::[]"
            exit 0
          fi

          printf "%s\n" "${filtered_repos[@]}" | jq -R -s -c 'split("\n")[:-1]' > repo_list.json
          cat repo_list.json

          # Toggle between filtered and raw repositories
          USE_FILTERED=true  # Default; set to false to use the raw (unfiltered) list
          if [ "$TEST_MODE" = true ]; then
            USE_FILTERED=true  # Override to true in test mode
          fi

          if [ "$USE_FILTERED" = true ]; then
            echo "Using filtered repositories."
            cp repo_list.json output.json
          else
            echo "Using raw repositories."
            jq -R -s -c 'split("\n")[:-1]' < raw_repos.json > output.json
          fi

      - name: Set repository list output
        id: set-repo-list
        run: |
          if [ -f output.json ]; then
            echo "::set-output name=repo-list::$(cat output.json)"
          else
            echo "::set-output name=repo-list::[]"
          fi

      - name: Print repository list
        run: |
          echo "Repository list contents:"
          cat output.json || echo "No repositories found"

      - name: Upload repository list
        uses: actions/upload-artifact@v3
        with:
          name: repo-list
          path: output.json
          retention-days: 1

      - name: Upload raw repository list
        uses: actions/upload-artifact@v3
        with:
          name: raw-repos
          path: raw_repos.json
          retention-days: 1

      - name: Update SARIFLASTRUNTIMESTAMP
        run: |
         if [ "${{ env.UPDATE_TIMESTAMP }}" = "true" ]; then
            echo "Setting SARIFLASTRUNTIMESTAMP variable"
            NEW_TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
            echo "Generated new timestamp: $NEW_TIMESTAMP"
            
            gh api \
            -H "Authorization: Bearer $GH_TOKEN" \
            -X PATCH \
            /repos/${{ github.repository }}/actions/variables/SARIFLASTRUNTIMESTAMP \
            -f name=SARIFLASTRUNTIMESTAMP \
            -f value="$NEW_TIMESTAMP"
            
            echo "Verifying the variable was set"
            gh api \
             -H "Authorization: Bearer $GH_TOKEN" \
             /repos/${{ github.repository }}/actions/variables \
             | grep SARIFLASTRUNTIMESTAMP
         else
            echo "UPDATE_TIMESTAMP is set to false. Skipping timestamp update."
         fi
        env:
         GH_TOKEN: ${{ steps.access_token.outputs.token }}
         UPDATE_TIMESTAMP: true  # Set to false to disable the timestamp update

  #log-repo-list:
  #  needs: fetch-repos
  #  runs-on: ubuntu-latest
  #  steps:
  #    - name: Log Repository List
  #      run: echo "${{ needs.fetch-repos.outputs.repo-list }}"

  run-scorecard:
    needs: fetch-repos
    runs-on: ubuntu-latest
    if: ${{ needs.fetch-repos.outputs.repo-list != '[]' }}
    strategy:
      matrix:
        repo: ${{ fromJson(needs.fetch-repos.outputs.repo-list) }}
    
    steps:
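      - name: Print repository list
        # Block scalar so each echo runs as its own command; single quotes
        # because the JSON list itself contains double quotes.
        run: |
          echo '${{ needs.fetch-repos.outputs.repo-list }}'
          echo "The repo in the matrix is ${{ matrix.repo }}"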

      - name: Generate JWT
        id: jwt
        run: |
          # Quoted delimiter: stops bash collapsing '\\n' to '\n' before Python sees it
          python - <<'EOF'
          import os
          import time
          import jwt
          private_key = os.getenv('SECDEVOPS_APP_PRIVATE_KEY', '').replace('\\n', '\n')
          payload = {
              'iat': int(time.time()),
              'exp': int(time.time()) + (10 * 60),
              'iss': os.getenv('SECDEVOPS_APP_ID')
          }
          token = jwt.encode(payload, private_key, algorithm='RS256')
          print(f"::set-output name=token::{token}")
          EOF
        env:
          SECDEVOPS_APP_PRIVATE_KEY: ${{ secrets.SECDEVOPS_APP_PRIVATE_KEY }}
          SECDEVOPS_APP_ID: ${{ secrets.SECDEVOPS_APP_ID }}
    
      - name: Get Installation Access Token
        id: access_token
        run: |
          curl -X POST \
            -H "Authorization: Bearer ${{ steps.jwt.outputs.token }}" \
            -H "Accept: application/vnd.github.v3+json" \
            https://api.github.com/app/installations/${{ secrets.SECDEVOPS_APP_INSTALLATION_ID }}/access_tokens \
            > response.json
          token=$(jq -r .token < response.json)
          echo "::set-output name=token::$token"
        env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: "Checkout code"
        id: checkout
        uses: actions/checkout@v2
        with:
            repository: ${{ matrix.repo }}
            token: ${{ steps.access_token.outputs.token }}
        continue-on-error: true

      - name: Confirm checkout and list directory
        run: |
            echo "Checkout directory: $(pwd)"
            echo "Listing contents of the checked-out directory:"
            ls -lah
            
      - name: Check for checkout error
        id: check_checkout
        run: |
            if [ ! -d .git ]; then
            echo "checkout_failed=true" >> $GITHUB_ENV
            echo "::warning::Checkout failed for ${{ matrix.repo }}"
            fi

      - name: Set Directory Permissions
        run: sudo chmod -R 777 ${{ github.workspace }}
        if: ${{ env.checkout_failed != 'true' }}
        continue-on-error: true

      - name: Extract repository name
        id: extract_repo_name
        run: |
            echo "Matrix repo: ${{ matrix.repo }}"
            repo_name=$(basename "${{ matrix.repo }}")
            echo "Extracted repo name: $repo_name"
            echo "::set-output name=repo_name::$repo_name"
        if: ${{ env.checkout_failed != 'true' }}
        continue-on-error: true

      - name: Test GITHUB_ENV
        run: |
         echo "TEST_VAR=TestValue" >> $GITHUB_ENV

      - name: Show TEST_VAR
        run: |
         echo "Test Variable: $TEST_VAR"

      - name: Test modify GITHUB_ENV
        run: |
         echo "TEST_VAR=TestValue-${{ matrix.repo }}" >> $GITHUB_ENV

      - name: Show modded TEST_VAR
        run: |
         echo "Test Variable: $TEST_VAR"

      - name: Display event.json content
        run: cat /home/runner/work/_temp/_github_workflow/event.json
       
      - name: Show original variables
        run: |
          echo "Original GITHUB_REPOSITORY: $GITHUB_REPOSITORY"
          echo "Original GITHUB_WORKSPACE: $GITHUB_WORKSPACE"
          echo "Original GITHUB_EVENT_NAME: $GITHUB_EVENT_NAME"
          echo "Original GITHUB_EVENT_PATH: $GITHUB_EVENT_PATH"
          echo "Original GITHUB_REF: $GITHUB_REF"
          echo "Original GITHUB_API_URL: $GITHUB_API_URL"

      - name: "Run analysis"
        uses: ossf/scorecard-action@v2.4.0
        with:
          results_file: results.sarif
          results_format: sarif
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          publish_results: false



      - name: Debug SARIF file
        run: |
            echo "Contents of results.sarif:"
            cat results.sarif
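
A side note on the `::set-output` lines in the workflow above: GitHub has deprecated that command in favour of appending `name=value` lines to the file named by the `GITHUB_OUTPUT` environment variable. A minimal sketch of how the Python steps could do that; `write_output` is a hypothetical helper, not part of any library.

```python
import os
import tempfile
from typing import Optional


def write_output(name: str, value: str, path: Optional[str] = None) -> None:
    """Append a single-line step output to the GITHUB_OUTPUT file."""
    if "\n" in value or "\r" in value:
        # Multi-line values use a heredoc-style delimiter syntax, not covered here.
        raise ValueError("multi-line values need the delimiter syntax instead")
    target = path or os.environ["GITHUB_OUTPUT"]
    with open(target, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


if __name__ == "__main__":
    # Outside Actions there is no GITHUB_OUTPUT, so write to a temp file instead.
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        pass
    write_output("repo-list", '["MyOrg/repo-a"]', tmp.name)
    with open(tmp.name, encoding="utf-8") as fh:
        print(fh.read(), end="")
    os.remove(tmp.name)
```

In the shell steps the equivalent would be `echo "token=$token" >> "$GITHUB_OUTPUT"`.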

leonsparrowJM commented Jul 30, 2024

You can see here that the name of the matrix repo that is checked out is SecDevOps--Test:
image

image

But we cannot pass the repo name as an input, because the action gathers it from here:

image

which means it always runs against the repo I am running the parent action from:

image

@leonsparrowJM
Author

If it helps, I do keep all my JSON scorecards in a central directory, but I'm not sure whether that's relevant to:
https://github.com/ossf/scorecard-action/blob/main/docs/development.md
or how to use the Docker image correctly from an action:

      - name: "Run analysis"
        run: |
          set +e
          docker run --rm \
            -v "${GITHUB_WORKSPACE}:/github/workspace" \
            -e "GITHUB_AUTH_TOKEN=${{ steps.access_token.outputs.token }}" \
            -e INPUT_RESULTS_FILE="results.sarif" \
            -e INPUT_RESULTS_FORMAT="sarif" \
            -e INPUT_REPO_TOKEN="${{ steps.access_token.outputs.token }}" \
            -e INPUT_PUBLISH_RESULTS="false" \
            -e GITHUB_WORKSPACE="${{ github.workspace }}" \
            -e GITHUB_REF="${{ github.ref }}" \
            -e GITHUB_EVENT_NAME="branch_protection_rule" \
            -e GITHUB_EVENT_PATH="MyOrg/SecDevOps---Scorecard/scorecards/SecDevOps---Test-scorecard.json" \
            -e GITHUB_REPOSITORY="ossf/scorecard" \
            gcr.io/openssf/scorecard-action:v2.4.0

@spencerschrock
Member

> Is there any way, as I have checked out the 'child repo' in the matrix, I can pass a input so it runs against the repo I have checked out in the matrix?

The checkout step is there to handle the pull_request event. We don't currently support what you're trying to do.

Using the Scorecard Docker image seems like a reasonable workaround, which is what you posted in your OpenSSF Slack message. I'll copy it here to save it from getting deleted:

      - name: Run Scorecard
        id: run_scorecard
        run: |
          mkdir -p scorecards
          sudo chmod -R 777 scorecards
          repo_name=$(basename ${{ matrix.repo }})
          set +e
          docker run --rm \
            -v "${GITHUB_WORKSPACE}:/github/workspace" \
            -e "GITHUB_AUTH_TOKEN=${{ steps.access_token.outputs.token }}" \
            gcr.io/openssf/scorecard:stable \
            --repo "https://github.com/${{ matrix.repo }}" \
            --format json \
            --checks ${{ steps.construct_checks.outputs.checks }} \
            --show-details \
            -o "/github/workspace/scorecards/${repo_name}-scorecard.json"
          SCORECARD_EXIT_CODE=$?
          if [ $SCORECARD_EXIT_CODE -ne 0 ]; then
            echo "scorecard_failed=true" >> $GITHUB_ENV
            echo "::warning::Scorecard failed for ${{ matrix.repo }}"
          fi
          set -e
        if: ${{ env.checkout_failed != 'true' }}
        continue-on-error: true
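
The same invocation can be sketched as a small Python wrapper, which may be easier to loop over a repo list than a matrix. The flags and image name are the ones from the step above; `scorecard_command` and its parameters are hypothetical. The token is passed with a bare `-e GITHUB_AUTH_TOKEN`, which makes Docker inherit the value from the caller's environment instead of placing it on the command line.

```python
import shlex
import subprocess
from typing import List, Optional


def scorecard_command(repo: str, workspace: str, checks: Optional[str] = None) -> List[str]:
    """Build the docker argv for one repo given as 'owner/name'."""
    repo_name = repo.rsplit("/", 1)[-1]
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{workspace}:/github/workspace",
        "-e", "GITHUB_AUTH_TOKEN",  # value comes from the caller's environment
        "gcr.io/openssf/scorecard:stable",
        "--repo", f"https://github.com/{repo}",
        "--format", "json",
        "--show-details",
        "-o", f"/github/workspace/scorecards/{repo_name}-scorecard.json",
    ]
    if checks:
        cmd += ["--checks", checks]
    return cmd


def run_scorecard(repo: str, workspace: str, checks: Optional[str] = None) -> bool:
    """Run Scorecard for one repo; return False instead of raising on failure."""
    result = subprocess.run(scorecard_command(repo, workspace, checks), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works without Docker.
    print(shlex.join(scorecard_command("MyOrg/SecDevOps---Test", "/tmp/ws")))
```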

I will note that Scorecard supports reading some of these GitHub app values directly if it saves you from needing to do some of the JWT steps:

https://github.com/ossf/scorecard/blob/d50480ac12bb49caae2d27bbc5ff1a655a6a7e01/clients/githubrepo/roundtripper/roundtripper.go#L32-L37

@spencerschrock
Member

Perhaps there's some overlap with https://github.com/ossf/scorecard-monitor, but that only looks at the visualization aspect, not the generation which is what you're trying to do.


leonsparrowJM commented Aug 21, 2024

Hmm, I prefer my own implementation, it's much less of a wall of text:

image

It's just a shame scorecard-action cannot be centrally run via a github App; no one sane wants to make workflow commits to 900+ repos and all the wonderful automated deployments it will kick off :-)

I have a piece of work now to dissect the scorecard.json files and try and create a batch remote sarif uploader so this can be centralised; the trouble is, the sarif specs document is so complex.
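
As a starting point for that conversion, here is a rough sketch of mapping a Scorecard JSON result to a minimal SARIF 2.1.0 document. It assumes the JSON has a `checks` array whose entries carry `name`, `score` and `reason`, and that a score of -1 means inconclusive. Code scanning, as far as I know, expects each result to have a location, so the sketch attaches a caller-supplied placeholder path. The function name, the tool name and the "below 10 is a finding" rule are all illustrative choices, not Scorecard's own SARIF mapping.

```python
import json
from typing import Any, Dict

SARIF_SCHEMA = "https://json.schemastore.org/sarif-2.1.0.json"


def scorecard_to_sarif(scorecard: Dict[str, Any], placeholder_uri: str) -> Dict[str, Any]:
    """Turn each check scoring 0-9 into one SARIF result; skip -1 and 10."""
    rules, results = [], []
    for check in scorecard.get("checks", []):
        score = check.get("score", -1)
        if score < 0 or score >= 10:
            continue  # inconclusive or perfect: nothing to report
        rule_id = check["name"]
        rules.append({"id": rule_id, "shortDescription": {"text": rule_id}})
        results.append({
            "ruleId": rule_id,
            "level": "warning",
            "message": {"text": f"score is {score}: {check.get('reason', '')}"},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": placeholder_uri},
                    "region": {"startLine": 1},
                }
            }],
        })
    return {
        "$schema": SARIF_SCHEMA,
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "Scorecard (converted)", "rules": rules}},
            "results": results,
        }],
    }


if __name__ == "__main__":
    sample = {"checks": [{"name": "Branch-Protection", "score": 3, "reason": "example"}]}
    print(json.dumps(scorecard_to_sarif(sample, "README.md"), indent=2))
```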

If this suite of centralised Scorecard tools and the centralised SARIF uploader ever gets finished, I wonder whether anyone would be interested in it being open-sourced?


lelia commented Sep 5, 2024

> It's just a shame scorecard-action cannot be centrally run via a github App; no one sane wants to make workflow commits to 900+ repos and all the wonderful automated deployments it will kick off :-)

A related issue was just filed in the main Scorecard repository and discussed today at the community meeting: ossf/scorecard#4333

EDIT: I've filed a top-level tracking issue to formalize support for large-scale use cases: ossf/scorecard#4339
