Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

option to turn off publishing assets in parallel #220

Open
moltar opened this issue May 30, 2022 · 18 comments
Open

option to turn off publishing assets in parallel #220

moltar opened this issue May 30, 2022 · 18 comments

Comments

@moltar
Copy link

moltar commented May 30, 2022

It is a bit annoying and error-prone to re-generate the GitHub Actions file every time there is a new asset is added. Which can be ambiguous as many constructs add assets under the hood.

Is there a way to combine the assets into a single job?

CodePipeline has that option.

@kaizencc
Copy link
Contributor

Hi @moltar, thanks for opening the issue. I'm assuming by GitHub Actions file you are referring to the deploy.yml workflow file. It's definitely an interesting concept but I'm not convinced that this is the behavior we should go down.

Is there a way to combine the assets into a single job?

I don't think there's anything blocking this behavior, though it would be a large refactor. Basically, instead of writing a file per asset, we would write one well-known file containing instructions for deploying every single asset and link that to deploy.yml. But then suddenly we'd have to keep track of this well-known file along with deploy.yml. There's also all sorts of weird issues that arise when you're expecting deploy.yml not to change but the pipeline behavior to change.

CodePipeline has that option.

The feature parity we strive for is for cdk-pipelines-github to be functionally equivalent to @aws-cdk/pipelines whenever possible. AFAIK, @aws-cdk/pipelines publishes each asset individually, but I could be wrong. Can you link me to the behavior you see in CodePipeline, or better yet the equivalent behavior in @aws-cdk/pipelines?

@moltar
Copy link
Author

moltar commented Jun 1, 2022

Hi @kaizencc, appreciate your detailed response!

The prop I am talking about is publishAssetsInParallel.

Docs: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.pipelines.CodePipeline.html#publishassetsinparallel

asset and link that to deploy.yml. But then suddenly we'd have to keep track of this well-known file along with deploy.yml

Why can't we use cdk.out manifests for tracking the assets? If this goes into the artifact, and then come out of artifact all intact this should work. I mean it works in the CodeBuild setup, and it doesn't have any capacity to save state between executions, besides artifacts as well. So if it works in CB, should work with GHA too. No?

@kaizencc
Copy link
Contributor

kaizencc commented Jun 1, 2022

Why can't we use cdk.out manifests for tracking the assets?

We don't actually track the contents of cdk.out and recommend you not commit this folder to your repository. What should be happening is that the synth step of your pipeline will create a cdk.out and that is the assembly that gets deployed. It becomes problematic if the same deploy.yml file can produce different cdk.out folders imo.

Regarding publishAssetsInParallel, I see your point. The default behavior of true is the only behavior that is allowed in cdk-pipelines-github. I suppose it is reasonable to expect a way to not publish assets in parallel via github actions. However, I don't think introducing something similar in this module would mean that deploy.yml doesn't change when you introduce a new asset to the pipeline. In this scenario, sure there's only one "publish asset" step but deploy.yml still updates to a new publish-assets-<hash>.sh file. I think you want deploy.yml to change every time your pipeline changes.

Updating this issue to reflect the feature request of a publishAssetsInParallel option, though I don't think it's on my immediate roadmap without a few additional +1s.

@kaizencc kaizencc changed the title Combine asset publishing in one job option to publish assets in parallel Jun 1, 2022
@kaizencc kaizencc changed the title option to publish assets in parallel option to turn off publishing assets in parallel Jun 1, 2022
@moltar
Copy link
Author

moltar commented Jun 1, 2022

and recommend you not commit this folder to your repository.

Of course, but that's not what I meant.

I meant tracking the assets via artifacts in GH.

  1. Build generates cdk.out
  2. Adds it to artifact
  3. Next step downloads the cdk.out artifact
  4. Read manifest file for available assets
  5. Upload all assets in one step

@kaizencc
Copy link
Contributor

kaizencc commented Jun 2, 2022

Yep. I agree with all that. What I disagree with is keeping the manifest file "well-known", so that deploy.yml does not change. That will introduce unintended consequences imo. I think this works if the manifest file name is hashed, so deploy.yml will target a different file name whenever the contents change. You'll get your assets in one step, but deploy.yml will still change every time your assets change.

I'll think this through a bit more, but those are my thoughts right now.

@moltar
Copy link
Author

moltar commented Jun 2, 2022

I don't understand tho why manifest needs to be well known?

To my understanding, cdk can work with cdk.out alone. That's how CodePipeline via AWS works now, I think.

Once you do cdk synth and package up the cdk.out, that is all that is needed to have as the final artifact to be able to do cdk deploy.

npx cdk deploy --app "cdk.out/"

@maryal-iag
Copy link

Did this go anywhere? It would great if all the assets could be uploaded in a single job rather than multiple in parallel. We are getting pipeline which look like the following and it is spawning a separate runner for each publish...

image

@nick-kang
Copy link

To add to the above comment, GitHub charges on a per-minute granularity. Most of these Publish Asset jobs are less than a minute (some even less than 10 seconds). Unfortunately, they still get billed as 1 minute, which adds up when you have 20+ of these jobs per workflow and workflows are run multiple times a day.

It'd be useful to have the publishAssetsInParallel option to save on compute costs.

@automartin5000
Copy link

automartin5000 commented Apr 28, 2024

A bit confused what the scope of this issue is after the initial conversation, but for my part, I agree with combining the "Publish Assets". I'd prefer the reduced noise (and I suppose cost savings) over what probably amounts to not a lot of time savings. Anyways, it seems like the aws cdk should be implementing this within the CLI instead of exposing this complexity to the CI/CD implementations (maybe that should be an issue in the aws cdk repo 😅).

@moltar
Copy link
Author

moltar commented Apr 28, 2024

The CDK pipeline has the option to combine asset publishing into 2 steps already. It doesn't need any CLI changes. Just execute the CLI many times in one job.

@svyatogor
Copy link

We are deploying a serverless application from a monorepo and the number of jobs is really getting out of hand. Ability to combine the uploads into a single job, especially as this flag is already there for aws codepipelines would be a great addition.

@kichik
Copy link
Contributor

kichik commented May 1, 2024

I was also hit by this. It was so bad, I had to hack around the workflow file to combine all the assets into one job. As others have mentioned, this wasted hundreds of dollars a month. The large number of assets also made the UI unusably slow.

@nick-kang
Copy link

The CDK pipeline has the option to combine asset publishing into 2 steps already. It doesn't need any CLI changes. Just execute the CLI many times in one job.

To add to this, the 2 steps are Docker asset publishing (for ECS/EKS/Docker Lambda) and File asset publishing (for lambda).

We're also facing this issue.

@huchister
Copy link

I would like to add to this and am wondering if publishing the asset can be done within a single Docker container, instead of replicating over 50 action jobs. The current approach seems to be increasing our GitHub usage hours and cluttering the action flow UI.

@huchister
Copy link

I was also hit by this. It was so bad, I had to hack around the workflow file to combine all the assets into one job. As others have mentioned, this wasted hundreds of dollars a month. The large number of assets also made the UI unusably slow.

How were you able to hack around the workflow file to combine all asset into one job? Could you provide some insight for this hack around?

@kichik
Copy link
Contributor

kichik commented Aug 13, 2024

Adjust the OIDC step and call this nasty method on your workflow object after app.synth().

export function combineAssetJobs(pipeline: GitHubWorkflow) {
  const assetCommands: string[] = [];
  const assetOutputs: { [key: string]: string } = {};

  const workflow = parse(pipeline.workflowFile.toYaml());
  Object.entries(workflow.jobs).forEach(([id, job]: [string, any]) => {
    // asset jobs need to be combined
    if (id.startsWith('Assets-FileAsset')) {
      // remove the job
      workflow.jobs[id] = undefined;

      // add the publish step to the combined publish assets job
      const publishJobCommand = job.steps.slice(-1)[0].run.trim();
      if (!publishJobCommand.startsWith('/bin/bash ')) {
        throw new Error(`Expected publish command to start with /bin/bash\nFound: ${publishJobCommand}\nFile: ${publishJobCommand}`);
      }

      // patch and execute the publish command
      const publishScriptPath = publishJobCommand.split(' ')[1];
      assetCommands.push(`sed -i -e "s/echo 'asset-hash=/echo '${id}-asset-hash=/" ${publishScriptPath}`);
      assetCommands.push(publishJobCommand);
      assetOutputs[`${id}-asset-hash`] = `$\{{ steps.Publish.outputs.${id}-asset-hash }}`;
    }

    // deploy job needs to point to new asset job
    if (id.endsWith('-Deploy')) {
      // remove old asset jobs
      job.needs = job.needs.filter((need: string) => !need.startsWith('Assets-FileAsset'));
      // add new job
      job.needs.push('Assets');
      // fix the output
      const oldTemplate = job.steps[2].with.template;
      job.steps[2].with.template = oldTemplate.replace(/needs\.Assets-FileAsset(\d+)\.outputs\.asset-hash/g, 'needs.Assets.outputs.Assets-FileAsset$1-asset-hash');
    }
  });

  // add combined assets job
  workflow.jobs.Assets = {
    'name': 'Publish Assets',
    'needs': ['Build-Build'],
    'permissions': {
      'contents': 'read',
      'id-token': 'write',
    },
    'runs-on': 'ubuntu-latest',
    'outputs': assetOutputs,
    'steps': [
      {
        name: 'Download cdk.out',
        uses: 'actions/download-artifact@v4',
        with: {
          name: 'cdk.out',
          path: 'cdk.out',
        },
      },
      {
        name: 'Install',
        run: 'npm install --no-save cdk-assets',
      },
      {
        name: 'Authenticate Via OIDC Role',
        uses: 'aws-actions/configure-aws-credentials@v4',
        with: {
          'aws-region': 'us-east-1',
          'role-duration-seconds': 1800,
          'role-skip-session-tagging': true,
          'role-to-assume': 'arn:xxxx',
        },
      },
      {
        id: 'Publish',
        name: 'Publish Assets',
        run: assetCommands.join('\n'),
      },
    ],
  };

  // match download and upload action versions
  workflow.jobs['Build-Build'].steps.slice(-1)[0].uses = 'actions/upload-artifact@v4';

  // update workflow file
  const patchedWorkflow = '# ' + pipeline.workflowFile.commentAtTop!.split('\n').join('\n# ') + '\n' + stringify(workflow);
  fs.writeFileSync('.github/workflows/deploy.yml', patchedWorkflow);
}

@huchister
Copy link

Adjust the OIDC step and call this nasty method on your workflow object after app.synth().

export function combineAssetJobs(pipeline: GitHubWorkflow) {
  const assetCommands: string[] = [];
  const assetOutputs: { [key: string]: string } = {};

  const workflow = parse(pipeline.workflowFile.toYaml());
  Object.entries(workflow.jobs).forEach(([id, job]: [string, any]) => {
    // asset jobs need to be combined
    if (id.startsWith('Assets-FileAsset')) {
      // remove the job
      workflow.jobs[id] = undefined;

      // add the publish step to the combined publish assets job
      const publishJobCommand = job.steps.slice(-1)[0].run.trim();
      if (!publishJobCommand.startsWith('/bin/bash ')) {
        throw new Error(`Expected publish command to start with /bin/bash\nFound: ${publishJobCommand}\nFile: ${publishJobCommand}`);
      }

      // patch and execute the publish command
      const publishScriptPath = publishJobCommand.split(' ')[1];
      assetCommands.push(`sed -i -e "s/echo 'asset-hash=/echo '${id}-asset-hash=/" ${publishScriptPath}`);
      assetCommands.push(publishJobCommand);
      assetOutputs[`${id}-asset-hash`] = `$\{{ steps.Publish.outputs.${id}-asset-hash }}`;
    }

    // deploy job needs to point to new asset job
    if (id.endsWith('-Deploy')) {
      // remove old asset jobs
      job.needs = job.needs.filter((need: string) => !need.startsWith('Assets-FileAsset'));
      // add new job
      job.needs.push('Assets');
      // fix the output
      const oldTemplate = job.steps[2].with.template;
      job.steps[2].with.template = oldTemplate.replace(/needs\.Assets-FileAsset(\d+)\.outputs\.asset-hash/g, 'needs.Assets.outputs.Assets-FileAsset$1-asset-hash');
    }
  });

  // add combined assets job
  workflow.jobs.Assets = {
    'name': 'Publish Assets',
    'needs': ['Build-Build'],
    'permissions': {
      'contents': 'read',
      'id-token': 'write',
    },
    'runs-on': 'ubuntu-latest',
    'outputs': assetOutputs,
    'steps': [
      {
        name: 'Download cdk.out',
        uses: 'actions/download-artifact@v4',
        with: {
          name: 'cdk.out',
          path: 'cdk.out',
        },
      },
      {
        name: 'Install',
        run: 'npm install --no-save cdk-assets',
      },
      {
        name: 'Authenticate Via OIDC Role',
        uses: 'aws-actions/configure-aws-credentials@v4',
        with: {
          'aws-region': 'us-east-1',
          'role-duration-seconds': 1800,
          'role-skip-session-tagging': true,
          'role-to-assume': 'arn:xxxx',
        },
      },
      {
        id: 'Publish',
        name: 'Publish Assets',
        run: assetCommands.join('\n'),
      },
    ],
  };

  // match download and upload action versions
  workflow.jobs['Build-Build'].steps.slice(-1)[0].uses = 'actions/upload-artifact@v4';

  // update workflow file
  const patchedWorkflow = '# ' + pipeline.workflowFile.commentAtTop!.split('\n').join('\n# ') + '\n' + stringify(workflow);
  fs.writeFileSync('.github/workflows/deploy.yml', patchedWorkflow);
}

Superb, thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants