Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3882: [Java][Build] Capture build scans on ge.apache.org to benefit from deep build insights #2546

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

clayburn
Copy link

@iemejia and @RyanSkraba, it was nice meeting you at Community over Code this weekend. This is a PR that would utilize Develocity as discussed.

What is the purpose of the change

This PR publishes a build scan for every CI build and for every local build from an authenticated Apache committer. The build will not fail if publishing fails.

The build scans of the Apache Avro project are published to the Develocity instance at ge.apache.org, hosted by the Apache Software Foundation and run in partnership between the ASF and Gradle. This Develocity instance has all features and extensions enabled and is freely available for use by the Apache Avro project and all other Apache projects.

On this Develocity instance, Apache Avro will have access not only to all of the published build scans but other aggregate data features such as:

  • Dashboards to view all historical build scans, along with performance trends over time
  • Build failure analytics for enhanced investigation and diagnosis of build failures
  • Test failure analytics to better understand trends and causes around slow, failing, and flaky tests

Please let me know if there are any questions about the value of Develocity or the changes in this pull request and I’d be happy to address them.

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Documentation

  • Does this pull request introduce a new feature? no

@github-actions github-actions bot added the build label Oct 10, 2023
@KalleOlaviNiemitalo
Copy link
Contributor

I am not an authenticated Apache committer. How does this know that it should not publish my builds? Is it because I have not provisioned a Gradle Enterprise access key for ge.apache.org as instructed in https://docs.gradle.com/enterprise/maven-extension/#authenticating_with_gradle_enterprise, or does this anyway ask the server whether I am authenticated?

@clayburn
Copy link
Author

It is the former: because you have no access key, it will not attempt to authenticate. The unauthenticated contributor won't see anything different happen in their build.

@martin-g
Copy link
Member

What useful information would GE give to the developer ?
I checked several projects (e.g. Bean, Kafka and Camel) and I didn't see anything that is not already in the logs.
Since it should not affect my local builds I don't mind having this, but I doubt that anyone would check the GE results after the "excitement" period

Copy link
Member

@martin-g martin-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think there is no need to upload data for the non-Java workflows, i.e. the interop jobs.

@KalleOlaviNiemitalo
Copy link
Contributor

Related apache.org links:

Comment on lines +25 to +29
<extension>
<groupId>com.gradle</groupId>
<artifactId>gradle-enterprise-maven-extension</artifactId>
<version>1.19.2</version>
</extension>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the license terms for gradle-enterprise-maven-extension, when invoked in an Avro build without an access key?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://mvnrepository.com/artifact/com.gradle/gradle-enterprise-maven-extension/1.19.2 points to Gradle Terms of Service, which says it does not apply to "Develocity software", and it's not clear to me whether the extension is included in that.

(This PR adds common-custom-user-data-maven-extension as well, but https://mvnrepository.com/artifact/com.gradle/common-custom-user-data-maven-extension/1.12.4 points to Apache-2.0.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this PR cause new legal responsibilities for a person who is not an Apache committer but makes a pull request to this repository, triggering a GitHub Action that accesses the Develocity server? That action seems to be "contributing code", and the following definition in Develocity Software License Agreement would then cover the person:

“User” means a person who accesses, uses, interacts with or directs the Software in the performance of its functions and/or contributing code to a source repository that has a build that is connected with Develocity to collect build data, use the Develocity build cache or use the Develocity testing agent. For the avoidance of doubt, any person who accesses, uses or interacts with the Software for oversight or governance purposes (i.e. internal audit, external audit, or internal security risk personnel) shall not be deemed to be a “User” under this Agreement for purposes of Seat count.

Would such a "User" then also be an "end user" as referenced in clause 3.2 ("any end user of the Software shall be subject to and bound by the terms of this Agreement")?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a pull request in itself == 'committing code'. That would happen when a committer, who is already covered by their ICLA, commits the pull request

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The license agreement references "contributing code", not "committing code".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question. I believe there is a particular license in place for usage with ASF. Let me track that information down.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there ... I was pointed to this issue as something to review and help to resolve. As the Officer in charge of Infrastructure, I'll get an FAQ developed for our communities. As @clayburn notes: there is a specific agreement that I signed with Gradle, Inc for our usage of Develocity, which is not the same as their general/public terms.

We'll get this sorted, and come back with more information.

@clayburn
Copy link
Author

Also I think there is no need to upload data for the non-Java workflows, i.e. the interop jobs.

Done, although I left if for the Java one. If it doesn't make sense there either, let me know.

What useful information would GE give to the developer ?
I checked several projects (e.g. Bean, Kafka and Camel) and I didn't see anything that is not already in the logs.
Since it should not affect my local builds I don't mind having this, but I doubt that anyone would check the GE results after the "excitement" period

Good question. Most that we work with do not like having to parse through logs, but beyond that:

  • Having the information in this format makes the information much more shareable. You can link directly to different parts of the build scan to more easily collaborate.
  • Will measure the wall clock impact of downloading dependencies (this one is not as interesting a zero seconds), plus how long downloads take
  • Can filter through dependencies. For example, these are all dependencies that resolved to a different version than requested
  • It will store the cache report, since you are using the Maven cache
  • You can compare two builds to see what is different between them

By aggregating all of this data, you can start to see trends as well through some of the performance, failure, and test dashboards. One popular use case is identifying flaky tests so that they can be acted on (for example).

Perhaps those I tagged in the PR will have more direct ideas on how this can help Avro based on what they saw. I created a build scan of the Avro project over here if you are interested in seeing the specifics for this project.

@RyanSkraba
Copy link
Contributor

Wow -- thanks for this PR, it's really educational for me! I'm currently still on a vacation after the Community over Code conference, but I'm looking forward to looking at this next week ❤️

Copy link
Contributor

@RyanSkraba RyanSkraba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey -- I've reviewed this PR and it's consistent with the instructions in the INFRA onboarding. Taking a look at the example build: it looks pretty nice, and even just having the timings for the tasks and tests is a pretty compelling feature.

I don't see any harm in enabling this, but I'll reach out on the dev@ mailing list regardless before merging, so people can see and be reassured that nothing changes for local development or on their own forks.

As a foundation-wide service, I'm assuming this was vetted as safe to merge, but I'll also wait to hear back from @gstein! (Thanks for reading so carefully @KalleOlaviNiemitalo).

<enabled>false</enabled>
</local>
<remote>
<enabled>false</enabled>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment is probably out of scope for this PR but I am curious @clayburn
If we enable remote caching, will it improve our build speed even if we use maven?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, although there are some ASF Infra side changes we want to make before supporting remote caching for projects in order to make sure that projects cannot read cache entries from other projects.

It may also be the case that there are some behaviors in the project that invalidate cache entries frequently. After this PR is accepted, we will put in time to investigating these things and offering a PR to fix them if they exist.

@iemejia
Copy link
Member

iemejia commented Oct 29, 2023

I am +1 with this, the changes seem not intrusive so we can 'easily' roll them back if we decide so and the advantages are there.

(Ryan just for my own reference where can we find the INFRA instructions for this tool).

<testLogging>true</testLogging>
</capture>
<backgroundBuildScanUpload>#{isFalse(env['GITHUB_ACTIONS'])}</backgroundBuildScanUpload>
<publish>ALWAYS</publish>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be a way to 'easily' see/find the URL of the published build in the run CI jobs? I was expecting some change on the github actions file for this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this for gradle builds via the gradle-build-action. It creates a job summary with links to the build scans. For example, you can see them in this GitHub Actions build.

We don't yet have this for Maven builds, although it is in our backlog to make something like this for Maven.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants