all: faster builds by caching the build cache ⚡ #41033
base: main
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
dev-tools/mage/crossbuild.go
Outdated
// To speed up cross-compilation, we need to persist the build cache so that subsequent builds
// for the same arch are faster (⚡).
//
// As we want to persist the build cache, we need to mount the cache directory to the Docker host.
// This is done by mounting the host directory to the container.
//
// Path of the cache directory on the host:
// <repoInfo.RootDir>/build/.go-build/<b.Platform>
// Example: <rootdir>/build/.go-build/linux/amd64
//
// As per: https://docs.docker.com/engine/storage/bind-mounts/#differences-between--v-and---mount-behavior
// If the directory doesn't exist, Docker does not automatically create it for you, but generates an error.
// So, we need to create the directory before mounting it.
//
// Also, in the container, the cache directory is mounted to /root/.cache/go-build.
buildCacheHostDir := filepath.Join(repoInfo.RootDir, "build", ".go-build", b.Platform)
buildCacheContainerDir := "/root/.cache/go-build"
if err = os.MkdirAll(buildCacheHostDir, 0755); err != nil {
	return fmt.Errorf("failed to create directory %s: %w", buildCacheHostDir, err)
}

// Common arguments
args = append(args,
	"--rm",
	"--env", "GOFLAGS=-mod=readonly -buildvcs=false",
	"--env", "MAGEFILE_VERBOSE="+verbose,
	"--env", "MAGEFILE_TIMEOUT="+EnvOr("MAGEFILE_TIMEOUT", ""),
	"--env", fmt.Sprintf("SNAPSHOT=%v", Snapshot),
	"--env", "SNAPSHOT="+strconv.FormatBool(Snapshot),

	// To persist the build cache, we need to mount the cache directory to the Docker host.
	// With docker run, mount types are: bind, volume and tmpfs. For our use case, we have
	// decided to use the bind mount type.
	"--mount", fmt.Sprintf("type=bind,source=%s,target=%s", buildCacheHostDir, buildCacheContainerDir),
The speed-up from persisting the build cache happens here.
echo "~~~ Downloading artifacts"
buildkite-agent artifact download x-pack/agentbeat/build/distributions/** . --step 'agentbeat-package-linux'
ls -lah x-pack/agentbeat/build/distributions/
I don't know why we were doing this. For the integration test, the test binaries get created anyway, which is already handled in mage goIntegTest.
// Parallelize conservatively to avoid overloading the host.
if maxParallel >= 2 {
	return maxParallel / 2
}
This needs to be discussed. I changed this to n/2 (compared to ~n before) because there were cases where the BK agent was lost. Judging from the Slack threads and the BK notice about "Agent lost", this seems to happen a lot, so maybe we should keep this change.
With the change, it now works. Cross-compilations are quite expensive, and I think we shouldn't spawn all of them at once.
I suspect the BK agents being lost is related to a more infra-level problem; I recall seeing some communication about this.
IMO we should use as much CPU as possible to ensure our builds are as fast as possible. The Buildkite agent's only job is to compile and test things quickly.
I agree with your point. However, excessive parallelization can be problematic in certain scenarios. Prior to implementing this change, I frequently encountered the "Agent Lost" error. But after the change, none! So I thought I'd keep the change and bring it up for discussion.
Certainly, I can remove this change for now and revisit it later. The primary focus of this PR is to discuss the persistence of the build cache.
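For reference, the n/2 rule discussed in this thread can be sketched as a standalone function. The function name is hypothetical, and the fallback of 1 for values below 2 is an assumption: the diff only shows the `maxParallel >= 2` branch.

```go
package main

import (
	"fmt"
	"runtime"
)

// conservativeParallelism halves the available parallelism so that expensive
// cross-compilation jobs are not all spawned at once.
// Assumption: below 2 it falls back to a single worker; the diff does not
// show what the real code returns in that case.
func conservativeParallelism(maxParallel int) int {
	if maxParallel >= 2 {
		return maxParallel / 2
	}
	return 1
}

func main() {
	// runtime.NumCPU() stands in for however maxParallel is really derived.
	n := runtime.NumCPU()
	fmt.Printf("max=%d conservative=%d\n", n, conservativeParallelism(n))
}
```

Note that integer division rounds down, so an agent with 3 available slots would run a single cross-build at a time under this rule.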
After playing a lot with BK, I think I'd need some help with how things work in CI. Locally, it has been working great. For example: Inside Here's what I need help with:
// To speed up cross-compilation, we need to persist the build cache so that subsequent builds
// for the same arch are faster (⚡).
//
// As we want to persist the build cache, we need to mount the cache directory to the Docker host.
// This is done by mounting the host directory to the container.
//
// Path of the cache directory on the host:
// <os.TempDir>/build/.go-build/<b.Platform>
// Example: /tmp/build/.go-build/linux/amd64
// <os.TempDir> is used as the base instead of <repoInfo.RootDir> because, for
// builds happening on CI, the paths look similar to:
// /opt/buildkite-agent/builds/bk-agent-prod-gcp-1727515099712207954/elastic/beats-xpack-agentbeat/
// where bk-agent-prod-gcp-1727515099712207954 is the agent, so it keeps changing. So even if we do cache the
// build, it will be useless as the cache directory will be different for every build.
//
// As per: https://docs.docker.com/engine/storage/bind-mounts/#differences-between--v-and---mount-behavior
// If the directory doesn't exist, Docker does not automatically create it for you, but generates an error.
// So, we need to create the directory before mounting it.
//
// Also, in the container, the cache directory is mounted to /root/.cache/go-build.
buildCacheHostDir := filepath.Join(os.TempDir(), "build", ".go-build", b.Platform)
buildCacheContainerDir := "/root/.cache/go-build"

if err = os.MkdirAll(buildCacheHostDir, 0755); err != nil {
	return fmt.Errorf("failed to create directory %s: %w", buildCacheHostDir, err)
}

// Common arguments
args = append(args,
	"--rm",
	"--env", "GOFLAGS=-mod=readonly -buildvcs=false",
	"--env", "MAGEFILE_VERBOSE="+verbose,
	"--env", "MAGEFILE_TIMEOUT="+EnvOr("MAGEFILE_TIMEOUT", ""),
	"--env", fmt.Sprintf("SNAPSHOT=%v", Snapshot),
	"--env", "SNAPSHOT="+strconv.FormatBool(Snapshot),

	// To persist the build cache, we need to mount the cache directory to the Docker host.
	// With docker run, mount types are: bind, volume and tmpfs. For our use case, we have
	// decided to use the bind mount type.
	"--mount", fmt.Sprintf("type=bind,source=%s,target=%s", buildCacheHostDir, buildCacheContainerDir),
This is the part where build caching is happening.
There's also
This is another option that we can use in our CI.
Why are we making this change? The CI agents are ephemeral, so any caching between runs is lost unless we share the cache. Are we improving a local dev workflow, where we want to make rebuilding faster, or improving CI timing?
Yeah, currently the change works well in dev but not on CI, for the reason you've mentioned. See the 3rd and 4th points here: #41033 (comment). So I need some help to do this in CI. We can use this, but that would take some more work.
Answering each bullet
Why would we need to persist the build cache in CI, and what problem are we solving?
Again, I agree with you, and this is why I am asking you to define the problem we are solving. Overall, I see this as follows: We maintain a set of custom VM images for Beats that come pre-bundled with container images and go packages to help improve CI times. Can we improve something there?
Thank you @alexsapran . That makes sense. I propose we proceed as follows:
Changes for points 1 and 2 are ready. I can split these changes into separate PRs. Additionally, I'll create an issue to track the work for point 3. What do you think?
From my side, splitting the PR is optional. Thanks in advance for raising the issue for 3.
This pull request is now in conflicts. Could you fix it? 🙏
Proposed commit message
Tested the change by building the agentbeat binary for linux/amd64 with a darwin/arm64 Docker host. After debugging, I found that most of the time is spent on cross-compilation during packaging. So, to isolate and benchmark the change, I pulled out only the docker command that's responsible for building the binaries:
Along with the introduction of build caching, I've made some general improvements including fixing code smells, better errors, tiny optimizations, etc.
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc
Disruptive User Impact
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs