GPU Pixel Testing With Gold

This page describes various extra details of the Skia Gold service that the GPU pixel tests use. For information on running the tests locally, see this section. For common information on triaging, modification, or general pixel wrangling, see GPU Pixel Wrangling or these sections (1, 2) of the general GPU testing documentation.

[TOC]

Skia Gold

Gold is an image diff service developed by the Skia team. It was originally developed solely for Skia's usage and only supported post-submit tests, but has been picked up by other projects such as Chromium and PDFium and now supports trybots. Unlike other image diff solutions in Chromium, comparisons are done in an external service instead of locally on the testing machine.

Why Gold

Gold has three main advantages over the traditional local image comparison historically used by Chromium:

1. Triage time can be much lower. Because triaging is handled by an external service, new golden images don't need to go through the CQ and wait for waterfall bots to pick up the CL. Once an image is triaged in Gold, it becomes immediately available for future test runs.
2. Gold supports multiple approved images per test. It is not uncommon for tests to produce images that are visually indistinguishable, but differ in a handful of pixels by a small RGB value. Fuzzy image diffing can solve this problem, but introduces its own set of issues such as possibly causing a test to erroneously pass. Since most tests that exhibit this behavior only actually produce 2 or 3 possible valid images, being able to say that any of those images is acceptable is simpler and less error-prone.
3. Better image storage. Traditionally, images had to either be included directly in the repository or uploaded to a Google Storage bucket and pulled in using the image's hash. The former allowed users to easily see which images were currently approved, but storing large or numerous binary files in git is generally discouraged due to the way git's history works. The latter worked around the git issues, but made it much more difficult to actually see what was being used since the only thing the user had to go on was a hash. Gold moves the images out of the repository, but provides a GUI for easily seeing which images are currently approved for a particular test.

How It Works

Gold consists of two main parts: the Gold instance/service and the goldctl binary. A Gold instance in turn consists of two parts: a Google Storage bucket that data is uploaded to and a server running on GCE that ingests the data and provides a way to triage diffs. goldctl simply provides a standardized way of interacting with Gold - uploading data to the correct place, retrieving baselines/golden information, etc.

In general, the following order of events occurs when running a Gold-enabled test:

1. The test produces an image and passes it to goldctl, along with some information about the hardware and software configuration that the image was produced on, the test name, etc.
2. goldctl checks whether the hash of the produced image is in the list of approved hashes.
   1. If it is, goldctl exits with a non-failing return code and nothing else happens. At this point, the test is finished.
   2. If it is not, goldctl uploads the image and metadata to the storage bucket and exits with a failing return code.
3. The server sees the new data in the bucket and ingests it, showing a new untriaged image in the GUI.
4. A user approves the new image in the GUI, and the server adds the image's hash to the baselines. See the Waterfall Bots and Trybots sections for specifics on this.
5. The next time the test is run, the new image is in the baselines, and assuming the test produces the same image again, the test passes.
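
The sequence of events above can be sketched as a small in-memory simulation. This is not goldctl's or Gold's implementation: `FakeGold`, `run_test`, and `approve` are hypothetical names, SHA-256 is only an assumed stand-in for whatever digest Gold computes, and Python dictionaries stand in for the storage bucket and the baselines.

```python
import hashlib


class FakeGold:
    """In-memory stand-in for a Gold instance (illustrative only)."""

    def __init__(self):
        self.baselines = {}  # test name -> set of approved image hashes
        self.untriaged = {}  # test name -> set of uploaded, unapproved hashes

    def run_test(self, test_name, image_bytes):
        """Returns 0 if the image hash is approved, else records it and returns 1."""
        digest = hashlib.sha256(image_bytes).hexdigest()  # assumed hash function
        if digest in self.baselines.get(test_name, set()):
            return 0
        # Stands in for uploading the image and metadata to the bucket.
        self.untriaged.setdefault(test_name, set()).add(digest)
        return 1

    def approve(self, test_name, digest):
        """Stands in for a user triaging the image as positive in the GUI."""
        self.untriaged.get(test_name, set()).discard(digest)
        self.baselines.setdefault(test_name, set()).add(digest)


gold = FakeGold()
image = b"fake image bytes"

first = gold.run_test("FooTest", image)  # unknown image: fails
for digest in list(gold.untriaged["FooTest"]):
    gold.approve("FooTest", digest)
second = gold.run_test("FooTest", image)  # same image, now approved: passes

print(first, second)  # prints "1 0"
```

The point of the sketch is that a test only starts passing after a human approval, not because the test itself changed.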

While this is the general order of events, there are several differences between waterfall/CI bots and trybots.

Waterfall Bots

Waterfall bots are the simpler of the two bot types. There is only a single set of baselines to worry about, which is whatever baselines were approved for a git revision. Additionally, any new images that are produced on waterfalls are all lumped into the same group of "untriaged images on master", and any images that are approved from here will immediately be added to the set of baselines for master.

Since not all waterfall bots have a trybot counterpart that can be relied upon to catch newly produced images before a CL is committed, it is likely that a change that produces new goldens on the CQ will end up making some of the waterfall bots red for a bit, particularly those on chromium.gpu.fyi. They will remain red until the new images are triaged as positive or the tests stop producing the untriaged images. So, it is best to keep an eye out for a few hours after your CL is committed for any new images from the waterfall bots that need triaging.

Trybots

Trybots are a little more complicated when it comes to retrieving and approving images. First, the set of baselines that are provided when requested by a test is the union of the master baselines for the current revision and any baselines that are unique to the CL. For example, if an image with the hash abcd is in the master baselines for FooTest and the CL being tested has also approved an image with the hash abef for FooTest, then the provided baselines will contain both abcd and abef for FooTest.

When an image associated with a CL is approved, the approval only applies to that CL until the CL is merged. Once this happens, any baselines produced by the CL are automatically merged into the master baselines for whatever git revision the CL was merged as. In the above example, if the CL was merged as commit ffff, then both abcd and abef would be approved images on master from ffff onward.
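
The FooTest example can be written out as plain set operations. This is only a sketch of the described behavior, with baselines modeled as a dictionary from test name to a set of hashes; `combined_baselines` is a hypothetical helper, not a Gold API.

```python
def combined_baselines(master, cl_only):
    """Per-test union of the master baselines and the CL-specific approvals."""
    tests = set(master) | set(cl_only)
    return {t: master.get(t, set()) | cl_only.get(t, set()) for t in tests}


master = {"FooTest": {"abcd"}}
cl_approved = {"FooTest": {"abef"}}

# What a trybot run for the CL is given as its baselines.
provided = combined_baselines(master, cl_approved)

# Once the CL lands (as commit ffff in the example), the same union
# becomes the master baselines from that revision onward.
master_from_ffff = combined_baselines(master, cl_approved)

print(sorted(provided["FooTest"]))  # prints "['abcd', 'abef']"
```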

Triaging Less Common Failures

Triaging Images Without A Specific Build

You can see all untriaged images that are currently being produced on ToT on the GPU Gold instance's main page. To see the untriaged images for a CL, substitute the Gerrit CL number into https://chrome-gold.skia.org/search?issue=[CL Number]&unt=true&master=true.
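
If you look these pages up often, the substitution can be scripted. The helper below is a minimal sketch: `untriaged_url_for_cl` is a hypothetical name and `1234567` is a placeholder CL number, while the URL and query parameters are the ones given above.

```python
from urllib.parse import urlencode

GOLD_SEARCH_URL = "https://chrome-gold.skia.org/search"


def untriaged_url_for_cl(cl_number):
    """Builds the Gold search URL for untriaged images on a Gerrit CL."""
    query = urlencode({"issue": cl_number, "unt": "true", "master": "true"})
    return f"{GOLD_SEARCH_URL}?{query}"


url = untriaged_url_for_cl(1234567)  # placeholder CL number
print(url)
# prints "https://chrome-gold.skia.org/search?issue=1234567&unt=true&master=true"
```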

It's possible, particularly if a test is regularly producing multiple images, for an image to be untriaged but not show up on the front page of the Gold instance (for details, see this crbug comment). To see all such images, visit this link.

Finding A Failed Build

If for some reason you know that a test run produced a bad image, but do not have a direct link to the failed build (e.g. you found a bad image using the untriaged non-ToT link from above), you may want to find the failed Swarming task to help debug the issue. Gold currently provides a list of CLs that were under test when a particular image was produced, but does not provide a link to the build that produced it, so the following workaround can be used.

Assuming the failure is relatively recent (within the past month or so), you can use the test history view to help find the failed run. To do so, search for the test name at https://ci.chromium.org/ui/search?t=TESTS and look through the history for the failed build (represented in red). Click on the group of builds and follow the link for the failing build, from which you can get to the Swarming task like normal by scrolling to the failed step and clicking on the link for the failed shard number.

Triaging A Specific Image

If for some reason an image is not showing up in Gold but you know the hash, you can manually navigate to the page for it by filling in the correct information to https://chrome-gold.skia.org/detail?test=[test_name]&digest=[hash]. From there, you should be able to triage it as normal.
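
As a small sketch of filling in that URL, the snippet below builds it from a test name and a hash. `detail_url` is a hypothetical helper, and the hash `abcd` is a placeholder rather than a real digest.

```python
from urllib.parse import urlencode


def detail_url(test_name, digest):
    """Builds the Gold detail page URL for one image of one test."""
    query = urlencode({"test": test_name, "digest": digest})
    return f"https://chrome-gold.skia.org/detail?{query}"


url = detail_url("Pixel_CSS3DBlueBox", "abcd")  # placeholder hash
print(url)
# prints "https://chrome-gold.skia.org/detail?test=Pixel_CSS3DBlueBox&digest=abcd"
```

Using `urlencode` rather than string concatenation means any characters in a test name that are not URL-safe get escaped.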

If this happens, please also file a bug in Skia's bug tracker so that the root cause can be investigated and fixed. It's likely that you will be unable to edit the owner, CC list, etc. directly, in which case ping kjlubick@ with a link to the filed bug to help speed up triaging. Include as much detail as possible, such as links to the failed Swarming task and the triage link for the problematic image.

Inexact Matching

By default, Gold uses exact matching with support for multiple baselines per test. This works well for most of the GPU tests, but there are a handful of tests such as Pixel_CSS3DBlueBox that are prone to noise which causes them to need additional triaging at times.

For cases like this, using inexact matching can help, as it allows a comparison to pass if there are only minor differences between the produced image and a known-good image. Images that pass in this way will be automatically approved in Gold, so there is still a record of exactly what was produced.
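
To make the idea of "minor differences" concrete, here is a simplified sketch of a fuzzy comparison. It is not Gold's algorithm, and the tolerances in Gold may be defined differently: the function name, the parameter names, and the exact pass/fail rule are assumptions chosen for illustration. Images are modeled as equal-length lists of RGBA tuples.

```python
def fuzzy_match(produced, golden, max_different_pixels, pixel_delta_threshold):
    """Returns True if the images differ only within the given tolerances.

    Assumed rule: a pixel counts as different if any channel differs. The
    comparison fails if more than max_different_pixels pixels differ, or if
    any single pixel's summed per-channel difference exceeds
    pixel_delta_threshold.
    """
    if len(produced) != len(golden):
        return False
    different = 0
    for p, g in zip(produced, golden):
        delta = sum(abs(a - b) for a, b in zip(p, g))
        if delta > pixel_delta_threshold:
            return False
        if delta > 0:
            different += 1
    return different <= max_different_pixels


golden = [(10, 10, 10, 255)] * 4
produced = [(10, 10, 10, 255), (11, 10, 10, 255),
            (10, 10, 10, 255), (10, 10, 10, 255)]

exact = produced == golden                   # False: one pixel is off by 1
loose = fuzzy_match(produced, golden, 1, 2)  # True: within both tolerances
strict = fuzzy_match(produced, golden, 0, 2) # False: no differing pixels allowed
print(exact, loose, strict)  # prints "False True False"
```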

To enable this functionality, simply add a matching_algorithm field to the PixelTestPage definition for the test (see other uses of this in the file for concrete examples).

In order to determine which values to use, you can use the script located at //content/test/gpu/gold_inexact_matching/determine_gold_inexact_parameters.py.

More complete documentation can be found in the --help output of the script, but in general:

Use the binary_search optimization algorithm if you only want to vary a single parameter, e.g. you only want to use a Sobel filter.
Use the local_minima optimization algorithm if you want to vary multiple parameters, such as using fuzzy diffing + a Sobel filter together.
The default boundaries and weights generally work and give good results, but you may need to tune them to better suit your particular test, e.g. increasing the maximum number of differing pixels if your image is large.
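
As a rough illustration of the single-parameter case, a binary search can find the smallest tolerance at which a comparison still passes, provided that passing is monotonic in that parameter. This is not the script's code: `smallest_passing_value` and the `passes` callback are hypothetical, and the "37 differing pixels" scenario is made up.

```python
def smallest_passing_value(passes, low, high):
    """Returns the smallest v in [low, high] for which passes(v) is True.

    Assumes passes is monotonic: once it is True, it stays True for every
    larger v. Returns None if even the largest value does not pass.
    """
    if not passes(high):
        return None
    while low < high:
        mid = (low + high) // 2
        if passes(mid):
            high = mid      # mid works, so the answer is mid or smaller
        else:
            low = mid + 1   # mid fails, so the answer is larger
    return low


# Made-up scenario: the images differ in exactly 37 pixels, so a comparison
# passes whenever the allowed number of differing pixels is at least 37.
differing_pixels = 37
result = smallest_passing_value(lambda v: v >= differing_pixels, 0, 1000)
print(result)  # prints "37"
```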

Working On Gold

Modifying Gold And goldctl

Although uncommon, changes to the Gold service and goldctl binary may be needed. To do so, simply get a checkout of the Skia infrastructure repo and go through the same steps as a Chromium CL (git cl upload, etc.).

The Gold service code is located in the //golden/ directory, while goldctl is located in //gold-client/. Once your change is merged, you will have to either contact kjlubick@google.com to roll the service version or follow the steps in Rolling goldctl to roll the goldctl version used by Chromium.

Rolling goldctl

goldctl is available as a CIPD package and is DEPSed in as part of gclient sync. To update the binary used in Chromium, perform the following steps:

1. (One-time only) get an infra checkout
2. Run infra $ eval `./go/env.py` to ensure that the environment in the terminal is correct
3. Run infra $ cd go/src/infra
4. Run infra/go/src/infra $ go get go.skia.org/infra
5. Run infra/go/src/infra $ go mod tidy
6. Upload the changelist (sample CL)
7. Once the CL is merged, the goldctl autoroller should automatically detect it and create Chromium CLs to roll the DEPS version.

If you want to make sure that goldctl builds after the update before committing (e.g. to ensure that no extra third party dependencies were added), run the following after the go mod tidy step:

infra/go/src/infra $ rm -f "$GOBIN/goldctl" to avoid accidentally checking a stale binary at the end
infra/go/src/infra $ go install -v go.skia.org/infra/gold-client/cmd/goldctl
infra/go/src/infra $ "$GOBIN/goldctl" to ensure that the binary runs
