This page describes various extra details of the Skia Gold service that the GPU pixel tests use. For information on running the tests locally, see this section. For common information on triaging, modification, or general pixel wrangling, see GPU Pixel Wrangling or these sections (1, 2) of the general GPU testing documentation.
[TOC]
Gold is an image diff service developed by the Skia team. It was originally developed solely for Skia's usage and only supported post-submit tests, but has been picked up by other projects such as Chromium and PDFium and now supports trybots. Unlike other image diff solutions in Chromium, comparisons are done in an external service instead of locally on the testing machine.
Gold has three main advantages over the traditional local image comparison historically used by Chromium:
- Triage time can be much lower. Because triaging is handled by an external service, new golden images don't need to go through the CQ and wait for waterfall bots to pick up the CL. Once an image is triaged in Gold, it becomes immediately available for future test runs.
- Gold supports multiple approved images per test. It is not uncommon for tests to produce images that are visually indistinguishable, but differ in a handful of pixels by a small RGB value. Fuzzy image diffing can solve this problem, but introduces its own set of issues such as possibly causing a test to erroneously pass. Since most tests that exhibit this behavior only actually produce 2 or 3 possible valid images, being able to say that any of those images are acceptable is simpler and less error-prone.
- Better image storage. Traditionally, images had to either be included directly in the repository or uploaded to a Google Storage bucket and pulled in using the image's hash. The former allowed users to easily see which images were currently approved, but storing large or numerous binary files in git is generally discouraged due to the way git's history works. The latter worked around the git issues, but made it much more difficult to see what was actually being used, since the only thing the user had to go on was a hash. Gold moves the images out of the repository, but provides a GUI for easily seeing which images are currently approved for a particular test.
Gold consists of two main parts: the Gold instance/service and the `goldctl`
binary. A Gold instance in turn consists of two parts: a Google Storage bucket
that data is uploaded to and a server running on GCE that ingests the data and
provides a way to triage diffs. `goldctl` simply provides a standardized way
of interacting with Gold - uploading data to the correct place, retrieving
baselines/golden information, etc.
In general, the following order of events occurs when running a Gold-enabled test:
- The test produces an image and passes it to `goldctl`, along with some information about the hardware and software configuration that the image was produced on, the test name, etc.
- `goldctl` checks whether the hash of the produced image is in the list of approved hashes.
  - If it is, `goldctl` exits with a non-failing return code and nothing else happens. At this point, the test is finished.
  - If it is not, `goldctl` uploads the image and metadata to the storage bucket and exits with a failing return code.
- The server sees the new data in the bucket and ingests it, showing a new untriaged image in the GUI.
- A user approves the new image in the GUI, and the server adds the image's hash to the baselines. See the Waterfall Bots and Trybots sections for specifics on this.
- The next time the test is run, the new image is in the baselines, and assuming the test produces the same image again, the test passes.
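The order of events above can be sketched as a small simulation. This is only an illustration of the control flow, not how `goldctl` is implemented: the function names, the in-memory `bucket` dictionary, and the use of MD5 as the image hash are all assumptions made for the example.

```python
import hashlib


def image_digest(image_bytes):
    # Assumption: MD5 stands in for whatever hash the real service uses.
    return hashlib.md5(image_bytes).hexdigest()


def run_gold_comparison(image_bytes, approved_hashes, bucket):
    """Hypothetical stand-in for the local check; returns an exit code."""
    digest = image_digest(image_bytes)
    if digest in approved_hashes:
        return 0  # Approved image: non-failing return code, nothing uploaded.
    bucket[digest] = image_bytes  # Unapproved image: upload for later triage.
    return 1  # Failing return code.


def triage_positive(digest, approved_hashes):
    """Hypothetical stand-in for a user approving the image in the GUI."""
    approved_hashes.add(digest)


approved = {image_digest(b"old image")}
bucket = {}
new_image = b"new image"

first_run = run_gold_comparison(new_image, approved, bucket)   # fails, uploads
triage_positive(image_digest(new_image), approved)             # user approves
second_run = run_gold_comparison(new_image, approved, bucket)  # now passes
```

The second run passes only because the test produced a byte-identical image again; any other output would be uploaded as a new untriaged image.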
While this is the general order of events, there are several differences between waterfall/CI bots and trybots.
Waterfall bots are the simpler of the two bot types. There is only a single set of baselines to worry about, which is whatever baselines were approved for a git revision. Additionally, any new images that are produced on waterfalls are all lumped into the same group of "untriaged images on master", and any images that are approved from here will immediately be added to the set of baselines for master.
Since not all waterfall bots have a trybot counterpart that can be relied upon to catch newly produced images before a CL is committed, it is likely that a change that produces new goldens on the CQ will end up making some of the waterfall bots red for a bit, particularly those on chromium.gpu.fyi. They will remain red until the new images are triaged as positive or the tests stop producing the untriaged images. So, it is best to keep an eye out for a few hours after your CL is committed for any new images from the waterfall bots that need triaging.
Trybots are a little more complicated when it comes to retrieving and approving
images. First, the set of baselines that are provided when requested by a test
is the union of the master baselines for the current revision and any baselines
that are unique to the CL. For example, if an image with the hash `abcd` is in
the master baselines for `FooTest` and the CL being tested has also approved
an image with the hash `abef` for `FooTest`, then the provided baselines will
contain both `abcd` and `abef` for `FooTest`.
When an image associated with a CL is approved, the approval only applies to
that CL until the CL is merged. Once this happens, any baselines produced by the
CL are automatically merged into the master baselines for whatever git revision
the CL was merged as. In the above example, if the CL was merged as commit
`ffff`, then both `abcd` and `abef` would be approved images on master from
`ffff` onward.
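The `FooTest` example can be written out as set operations. This is a minimal sketch assuming baselines are modeled as a mapping from test name to a set of approved hashes; the function names are hypothetical and do not correspond to anything in Gold.

```python
def trybot_baselines(master, cl_only):
    """Per-test union of master baselines and approvals unique to the CL."""
    tests = set(master) | set(cl_only)
    return {t: master.get(t, set()) | cl_only.get(t, set()) for t in tests}


def land_cl(master, cl_only):
    """Master baselines from the landing commit onward (same union)."""
    return trybot_baselines(master, cl_only)


master = {"FooTest": {"abcd"}}
cl_only = {"FooTest": {"abef"}}

# While the CL is in review, only its own trybot runs see abef.
provided = trybot_baselines(master, cl_only)

# After the CL lands as ffff, master itself contains both hashes.
master_after_ffff = land_cl(master, cl_only)
```

Neither function mutates `master`, which mirrors the fact that approvals made on a CL do not affect other CLs or the waterfall until the CL is merged.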
You can see all untriaged images that are currently being produced on ToT on
the GPU Gold instance's main page, and the untriaged images for a CL by
substituting the Gerrit CL number into
`https://chrome-gold.skia.org/search?issue=[CL Number]&unt=true&master=true`.
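If you look up CLs often, the substitution can be scripted. The helper below is a hypothetical convenience, not part of any Chromium or Skia tooling, and the CL number is made up for the example.

```python
from urllib.parse import urlencode

GOLD_SEARCH_URL = "https://chrome-gold.skia.org/search"


def untriaged_url_for_cl(cl_number):
    """Build the untriaged-images search URL for a Gerrit CL number."""
    query = urlencode({"issue": cl_number, "unt": "true", "master": "true"})
    return f"{GOLD_SEARCH_URL}?{query}"


url = untriaged_url_for_cl(1234567)  # 1234567 is a placeholder CL number.
print(url)
```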
It's possible, particularly if a test is regularly producing multiple images, for an image to be untriaged but not show up on the front page of the Gold instance (for details, see this crbug comment). To see all such images, visit this link.
If for some reason you know that a test run produced a bad image, but do not have a direct link to the failed build (e.g. you found a bad image using the untriaged non-ToT link from above), you may want to find the failed Swarming task to help debug the issue. Gold currently provides a list of CLs that were under test when a particular image was produced, but does not provide a link to the build that produced it, so the following workaround can be used.
Assuming the failure is relatively recent (within the past month or so), you
can use the test history view to help find the failed run. To do so, search for
the test name at `https://ci.chromium.org/ui/search?t=TESTS` and look through
the history for the failed build (represented in red). Click on the group of
builds and follow the link for the failing build, from which you can get to the
Swarming task like normal by scrolling to the failed step and clicking on the
link for the failed shard number.
If for some reason an image is not showing up in Gold but you know the hash, you
can manually navigate to the page for it by filling in the test name and hash in
`https://chrome-gold.skia.org/detail?test=[test_name]&digest=[hash]`.
From there, you should be able to triage it as normal.
If this happens, please also file a bug in Skia's bug tracker so that the root cause can be investigated and fixed. It's likely that you will be unable to edit the owner, CC list, etc. directly, in which case ping kjlubick@ with a link to the filed bug to help speed up triaging. Include as much detail as possible, such as links to the failed Swarming task and the triage link for the problematic image.
By default, Gold uses exact matching with support for multiple baselines per
test. This works well for most of the GPU tests, but there are a handful of
tests such as `Pixel_CSS3DBlueBox` that are prone to noise which causes them to
need additional triaging at times.
For cases like this, using inexact matching can help, as it allows a comparison to pass if there are only minor differences between the produced image and a known-good image. Images that pass in this way will be automatically approved in Gold, so there is still a record of exactly what was produced.
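To make the idea concrete, here is a simplified sketch of a fuzzy comparison. It is not Gold's implementation: the image representation (a flat list of RGBA tuples), the function name, and the two parameter names are assumptions chosen for illustration.

```python
def fuzzy_match(produced, golden, max_different_pixels, pixel_delta_threshold):
    """Return True if `produced` is close enough to `golden`.

    Both images are equal-length lists of (r, g, b, a) tuples. A pixel
    counts as different if any channel differs; its delta is the sum of
    the absolute per-channel differences.
    """
    if len(produced) != len(golden):
        return False
    different = 0
    for p, g in zip(produced, golden):
        if p == g:
            continue
        delta = sum(abs(a - b) for a, b in zip(p, g))
        if delta > pixel_delta_threshold:
            return False  # One pixel is too far off, regardless of count.
        different += 1
    return different <= max_different_pixels


golden = [(10, 10, 10, 255)] * 4
produced = [(10, 10, 10, 255)] * 3 + [(11, 10, 12, 255)]  # one pixel, delta 3

loose = fuzzy_match(produced, golden, max_different_pixels=1,
                    pixel_delta_threshold=3)
strict_delta = fuzzy_match(produced, golden, max_different_pixels=1,
                           pixel_delta_threshold=2)
strict_count = fuzzy_match(produced, golden, max_different_pixels=0,
                           pixel_delta_threshold=3)
```

The looser the two limits, the more noise is tolerated, but also the more likely a real regression slips through, which is why the limits should be kept as tight as the test allows.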
To enable this functionality, simply add a `matching_algorithm` field to the
`PixelTestPage` definition for the test (see other uses of this in the file for
concrete examples).
In order to determine which values to use, you can use the script located at
`//content/test/gpu/gold_inexact_matching/determine_gold_inexact_parameters.py`.
More complete documentation can be found in the `--help` output of the script,
but in general:
- Use the `binary_search` optimization algorithm if you only want to vary a single parameter, e.g. you only want to use a Sobel filter.
- Use the `local_minima` optimization algorithm if you want to vary multiple parameters, such as using fuzzy diffing + a Sobel filter together.
- The default boundaries and weights generally work and give good results, but you may need to tune them to better suit your particular test, e.g. increasing the maximum number of differing pixels if your image is large.
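As intuition for why a binary search suits the single-parameter case, the sketch below finds the smallest parameter value at which every historical comparison passes. It is not the script's code: the function name, the `passes` callback, and the sample data are hypothetical, and it assumes that once a value passes, every larger value passes too.

```python
def smallest_passing_value(passes, low, high):
    """Smallest integer in [low, high] for which passes(value) is True.

    Returns None if even `high` does not pass. Assumes `passes` is
    monotonic: False up to some point, then True from there on.
    """
    if not passes(high):
        return None
    while low < high:
        mid = (low + high) // 2
        if passes(mid):
            high = mid       # mid works; try something tighter.
        else:
            low = mid + 1    # mid is too strict; loosen.
    return low


# Made-up data: differing pixel counts seen across past runs of one test.
observed_diffs = [3, 17, 9]


def all_runs_pass(max_different_pixels):
    return all(d <= max_different_pixels for d in observed_diffs)


best = smallest_passing_value(all_runs_pass, 0, 100)
```

With several interacting parameters there is no single monotonic axis to search along, which is the situation the `local_minima` option is described as handling.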
Although uncommon, changes to the Gold service and `goldctl` binary may be
needed. To do so, simply get a checkout of the
Skia infrastructure repo and go through the same steps as
a Chromium CL (`git cl upload`, etc.).
The Gold service code is located in the `//golden/` directory, while `goldctl`
is located in `//gold-client/`. Once your change is merged, you will have to
either contact [email protected] to roll the service version or follow the
steps in Rolling goldctl to roll the `goldctl` version used by Chromium.
`goldctl` is available as a CIPD package and is DEPSed in as part of `gclient sync`.
To update the binary used in Chromium, perform the following steps:
- (One-time only) get an infra checkout
- Run ``infra $ eval `./go/env.py` `` to ensure that the environment in the terminal is correct
- Run `infra $ cd go/src/infra`
- Run `infra/go/src/infra $ go get go.skia.org/infra`
- Run `infra/go/src/infra $ go mod tidy`
- Upload the changelist (sample CL)
- Once the CL is merged, the goldctl autoroller should automatically detect it and create Chromium CLs to roll the DEPS version.
If you want to make sure that `goldctl` builds after the update before
committing (e.g. to ensure that no extra third party dependencies were added),
run the following after the `go mod tidy` step:
- `infra/go/src/infra $ rm -f "$GOBIN/goldctl"` to avoid accidentally checking a stale binary in the last step
- `infra/go/src/infra $ go install -v go.skia.org/infra/gold-client/cmd/goldctl`
- `infra/go/src/infra $ "$GOBIN/goldctl"` to ensure that the binary runs