Merge cockroachdb#104634 cockroachdb#111049 cockroachdb#111074 cockroachdb#111088 cockroachdb#111211 cockroachdb#111240 cockroachdb#111252 cockroachdb#111277

104634: cloud,backupccl: add metric for time of last backup failure due to KMS r=rhu713 a=rhu713

Add a metric `backup.last-failed-time.kms-inaccessible` that tracks the
timestamp of the last backup failure due to a KMS error. This metric is only
updated by BACKUP statements that are executed with the option
`updates_cluster_monitoring_metrics` set.
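
For illustration, a minimal Go sketch of opting a backup into the new metric
(the connection string, destination, and KMS URI are placeholders, not part of
this change):

```
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol.
)

func main() {
	// Placeholder connection string for a local insecure node.
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Opt this backup into updating backup.last-failed-time.kms-inaccessible
	// on KMS failures; without the option the metric is left untouched.
	if _, err := db.Exec(
		`BACKUP INTO 'userfile:///demo' WITH OPTIONS (kms = $1, updates_cluster_monitoring_metrics)`,
		"aws-kms:///my-key?AUTH=implicit&REGION=us-east-1",
	); err != nil {
		log.Printf("backup failed: %v", err)
	}
}
```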

Release note (sql change): adds the `updates_cluster_monitoring_metrics`
backup option that allows an operator to opt in to tracking the timestamp of
the last backup failure due to a KMS error.

111049: c2c: add external connection tests r=stevendanna a=msbutler

This patch increases test coverage for replication streams created with an external connection to the source. Specifically, it adds a dedicated external connection client test and metamorphically runs the c2c e2e tests with an external connection.
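
For reference, the external-connection indirection these tests exercise follows this pattern (a hedged sketch: the names and URIs are placeholders, and the exact statement for starting the stream varies by version):

```
package c2csketch

import (
	"database/sql"

	_ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol.
)

// setupStreamViaExternalConnection registers the source address once under a
// name, then references it as 'external://<name>' when creating the stream.
func setupStreamViaExternalConnection(db *sql.DB) error {
	// Placeholder address; in the tests this is the source cluster's pgurl.
	if _, err := db.Exec(
		`CREATE EXTERNAL CONNECTION replication_source AS 'postgresql://user@source-host:26257'`,
	); err != nil {
		return err
	}
	// Hedged: the replication-start statement below is illustrative; the
	// point is the 'external://' indirection in place of a raw URI.
	_, err := db.Exec(
		`CREATE VIRTUAL CLUSTER app FROM REPLICATION OF app ON 'external://replication_source'`,
	)
	return err
}
```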

Fixes: cockroachdb#110449

Release note: None

111074: bulk: add histograms to ingestion performance stats r=stevendanna a=adityamaru

This change adds two histograms to IngestionPerformanceStats: one that tracks the batch wait, and one that tracks the SST data size. It also adds a count of the number of SSTs that are ingested as write batches.
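
As a rough sketch of the shape of these stats (field names are illustrative; the real definitions in `pkg/kv/bulk/bulkpb` fold samples into histogram buckets rather than retaining them):

```
package bulksketch

import "time"

// ingestionStats is an illustrative stand-in for the new fields on
// IngestionPerformanceStats: two histograms and a counter.
type ingestionStats struct {
	batchWait      []time.Duration // samples feeding the batch-wait histogram
	sstDataSize    []int64         // samples feeding the SST data-size histogram
	asWriteBatches int64           // SSTs small enough to ingest as write batches
}

// recordFlush records one flushed SST.
func (s *ingestionStats) recordFlush(wait time.Duration, sstBytes int64, asWriteBatch bool) {
	s.batchWait = append(s.batchWait, wait)
	s.sstDataSize = append(s.sstDataSize, sstBytes)
	if asWriteBatch {
		s.asWriteBatches++
	}
}
```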

Release note: None
Informs: cockroachdb#108374

111088: kvflowcontrol: signaling bug fix and per-stream stats r=aadityasondhi a=sumeerbhola

The signaling logic had a bug, which is demonstrated by
TestBucketSignalingBug: if an elastic waiter consumed the entry in
signalCh, a regular waiter would not get unblocked if regular tokens
became positive.
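
A minimal sketch of the failure mode and of the general re-signal idea behind the fix, with illustrative names (the actual change reworks `bucket` internals and is more careful than this):

```
package flowsketch

// bucket shares one capacity-1 signal channel between regular and elastic
// waiters, mirroring the shape of the bug described above.
type bucket struct {
	signalCh chan struct{}
}

func newBucket() *bucket { return &bucket{signalCh: make(chan struct{}, 1)} }

// signal wakes at most one waiter; signaling while a signal is already
// pending is a no-op, which is what makes a swallowed wakeup unrecoverable.
func (b *bucket) signal() {
	select {
	case b.signalCh <- struct{}{}:
	default:
	}
}

// wait blocks until signaled and until the caller's work class has tokens.
// Pre-fix behavior: a waiter that woke but had no tokens for its class went
// straight back to waiting, consuming the only pending signal, so a waiter
// of the other class stayed blocked even once its tokens became positive.
func (b *bucket) wait(myClassHasTokens func() bool) {
	for {
		<-b.signalCh
		if myClassHasTokens() {
			return
		}
		b.signal() // pass the wakeup along instead of swallowing it
	}
}
```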

Per-stream stats are added for the duration during which there were no
tokens and for the number of tokens deducted. These are reset every time
the stats are logged, so the caller can interpret each log entry as the
delta since the previous one. This gives us visibility into per-stream
behavior at a 30s granularity.
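
The reset-on-read delta pattern described here, sketched with illustrative names (the real stats live alongside the token buckets):

```
package flowsketch

import (
	"sync/atomic"
	"time"
)

// streamStats accumulates per-stream counters that are reset each time they
// are logged, so every log line reads as a delta over the ~30s interval.
type streamStats struct {
	noTokenNanos   atomic.Int64 // cumulative time spent with zero tokens
	tokensDeducted atomic.Int64
}

func (s *streamStats) onDeduct(tokens int64) { s.tokensDeducted.Add(tokens) }

func (s *streamStats) onNoTokens(d time.Duration) { s.noTokenNanos.Add(d.Nanoseconds()) }

// logAndReset atomically swaps the counters to zero and returns the deltas
// accumulated since the previous call.
func (s *streamStats) logAndReset() (noTokens time.Duration, deducted int64) {
	return time.Duration(s.noTokenNanos.Swap(0)), s.tokensDeducted.Swap(0)
}
```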

The code abstractions are reworked: more of the logic now lives inside
`bucket` and `bucketPerWorkClass` instead of exposing internals to
`Controller`.

Epic: none

Release note: None

111211: backupccl: simplify test removing unnecessary params r=arulajmani,stevendanna a=andrewbaptist

Remove the disabling of many of the jobs to simplify TestRestoreErrorPropagates. The test previously attempted to match any key written to the range; now it matches only values containing RESTORE.
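
The narrowed matching can be pictured as a request filter along these lines (a hedged sketch, not the test's actual code):

```
package restoretest

import (
	"bytes"
	"errors"
)

// errInjected stands in for the failure the test propagates.
var errInjected = errors.New("injected error")

// maybeInjectError fires only for batches whose values contain "RESTORE",
// so unrelated jobs writing to the same ranges no longer trip the injected
// failure and no longer need to be disabled in test setup.
func maybeInjectError(values [][]byte) error {
	for _, v := range values {
		if bytes.Contains(v, []byte("RESTORE")) {
			return errInjected
		}
	}
	return nil
}
```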

Epic: none

Release note: None

111240: oidc: use relative paths in db-console r=knz a=dhartunian

Previous changes to make all DB Console paths relative overlooked the OIDC URLs.

Epic: None
Part of: cockroachdb#91429

Release note (ui change): DB Console instances that are proxied at different subpaths and use OIDC will now point to the correct relative path when attempting OIDC login.

111252: sql: use precise capacity for row container in a few places r=yuzefovich a=yuzefovich

This commit audits all callers of `newContainerValuesNode` to specify the precise capacity of the underlying row container, since in all places we know up front how many rows the valuesNode will have. (I just happened to notice this while working around one of those places.)
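
The underlying idea in plain Go terms (a generic sketch, not the valuesNode code itself): when the row count is known up front, allocate the container once instead of letting it grow by repeated doubling:

```
package rowsketch

// buildRows materializes n rows. Sizing the container once up front avoids
// the repeated reallocation and copying that append-driven growth incurs.
func buildRows(n int, mkRow func(i int) []byte) [][]byte {
	rows := make([][]byte, 0, n) // precise capacity, known before the loop
	for i := 0; i < n; i++ {
		rows = append(rows, mkRow(i))
	}
	return rows
}
```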

Epic: None

Release note: None

111277: go.mod: bump Pebble to 660450027a8c r=RaduBerinde a=jbowens

```
66045002 sharedcache: tolerate small sizes
895ffed5 db: add SkipPoint iterator option
a48ac0f4 db: use uncompensated scores to prioritize levels for compaction
```

Epic: none
Release note (performance improvement): Improved compaction heuristics mitigate read amplification growth and admission control throttling when processing large deletes (e.g., during decommissions, replica rebalancing, or DROP TABLE operations).

Co-authored-by: Rui Hu <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
9 people committed Sep 26, 2023
9 parents 617f581 + 2ad9e79 + 370c781 + 34eb57a + 4507acc + d814d1f + d0be0f1 + 724e1f1 + b85c926 commit 23dda82
Showing 62 changed files with 1,555 additions and 404 deletions.
6 changes: 3 additions & 3 deletions DEPS.bzl
@@ -1599,10 +1599,10 @@ def go_deps():
patches = [
"@com_github_cockroachdb_cockroach//build/patches:com_github_cockroachdb_pebble.patch",
],
sha256 = "f85cd2d80b88bc9a43f02879b855205de580417e3d2250b0e3db86b130d630d1",
strip_prefix = "github.com/cockroachdb/[email protected]20230922144958-86593692e09f",
sha256 = "ae597247d1467f844be951b446bb125919418c4efeb581534968f82ae5f2df17",
strip_prefix = "github.com/cockroachdb/[email protected]20230926003129-660450027a8c",
urls = [
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230922144958-86593692e09f.zip",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230926003129-660450027a8c.zip",
],
)
go_repository(
2 changes: 1 addition & 1 deletion build/bazelutil/distdir_files.bzl
@@ -321,7 +321,7 @@ DISTDIR_FILES = {
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/go-test-teamcity/com_github_cockroachdb_go_test_teamcity-v0.0.0-20191211140407-cff980ad0a55.zip": "bac30148e525b79d004da84d16453ddd2d5cd20528e9187f1d7dac708335674b",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/gostdlib/com_github_cockroachdb_gostdlib-v1.19.0.zip": "c4d516bcfe8c07b6fc09b8a9a07a95065b36c2855627cb3514e40c98f872b69e",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/logtags/com_github_cockroachdb_logtags-v0.0.0-20230118201751-21c54148d20b.zip": "ca7776f47e5fecb4c495490a679036bfc29d95bd7625290cfdb9abb0baf97476",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230922144958-86593692e09f.zip": "f85cd2d80b88bc9a43f02879b855205de580417e3d2250b0e3db86b130d630d1",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230926003129-660450027a8c.zip": "ae597247d1467f844be951b446bb125919418c4efeb581534968f82ae5f2df17",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/redact/com_github_cockroachdb_redact-v1.1.5.zip": "11b30528eb0dafc8bc1a5ba39d81277c257cbe6946a7564402f588357c164560",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/returncheck/com_github_cockroachdb_returncheck-v0.0.0-20200612231554-92cdbca611dd.zip": "ce92ba4352deec995b1f2eecf16eba7f5d51f5aa245a1c362dfe24c83d31f82b",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/sentry-go/com_github_cockroachdb_sentry_go-v0.6.1-cockroachdb.2.zip": "fbb2207d02aecfdd411b1357efe1192dbb827959e36b7cab7491731ac55935c9",
2 changes: 1 addition & 1 deletion build/teamcity/cockroach/nightlies/cloud_unit_tests.sh
@@ -8,6 +8,6 @@ source "$dir/teamcity-support.sh" # For $root
source "$dir/teamcity-bazel-support.sh" # For run_bazel

tc_start_block "Run cloud unit tests"
-BAZEL_SUPPORT_EXTRA_DOCKER_ARGS="-e GITHUB_API_TOKEN -e GITHUB_REPO -e TC_BUILDTYPE_ID -e TC_BUILD_BRANCH -e TC_BUILD_ID -e TC_SERVER_URL -e GOOGLE_EPHEMERAL_CREDENTIALS -e GOOGLE_KMS_KEY_NAME -e GOOGLE_LIMITED_KEY_ID -e ASSUME_SERVICE_ACCOUNT -e GOOGLE_LIMITED_BUCKET -e ASSUME_SERVICE_ACCOUNT_CHAIN -e AWS_DEFAULT_REGION -e AWS_SHARED_CREDENTIALS_FILE -e AWS_CONFIG_FILE -e AWS_S3_BUCKET -e AWS_ASSUME_ROLE -e AWS_ROLE_ARN_CHAIN -e AWS_KMS_KEY_ARN -e AWS_S3_ENDPOINT -e AWS_KMS_ENDPOINT -e AWS_KMS_REGION -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AZURE_ACCOUNT_KEY -e AZURE_ACCOUNT_NAME -e AZURE_CONTAINER -e AZURE_CLIENT_ID -e AZURE_CLIENT_SECRET -e AZURE_TENANT_ID -e AZURE_VAULT_NAME -e AZURE_KMS_KEY_NAME -e AZURE_KMS_KEY_VERSION -e BUILD_VCS_NUMBER" \
+BAZEL_SUPPORT_EXTRA_DOCKER_ARGS="-e GITHUB_API_TOKEN -e GITHUB_REPO -e TC_BUILDTYPE_ID -e TC_BUILD_BRANCH -e TC_BUILD_ID -e TC_SERVER_URL -e GOOGLE_EPHEMERAL_CREDENTIALS -e GOOGLE_KMS_KEY_NAME -e GOOGLE_LIMITED_KEY_ID -e ASSUME_SERVICE_ACCOUNT -e GOOGLE_LIMITED_BUCKET -e ASSUME_SERVICE_ACCOUNT_CHAIN -e AWS_DEFAULT_REGION -e AWS_SHARED_CREDENTIALS_FILE -e AWS_CONFIG_FILE -e AWS_S3_BUCKET -e AWS_ASSUME_ROLE -e AWS_ROLE_ARN_CHAIN -e AWS_KMS_KEY_ARN -e AWS_S3_ENDPOINT -e AWS_KMS_ENDPOINT -e AWS_KMS_REGION -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AZURE_ACCOUNT_KEY -e AZURE_ACCOUNT_NAME -e AZURE_CONTAINER -e AZURE_CLIENT_ID -e AZURE_CLIENT_SECRET -e AZURE_TENANT_ID -e AZURE_VAULT_NAME -e AZURE_LIMITED_VAULT_NAME -e AZURE_KMS_KEY_NAME -e AZURE_KMS_KEY_VERSION -e BUILD_VCS_NUMBER" \
run_bazel build/teamcity/cockroach/nightlies/cloud_unit_tests_impl.sh "$@"
tc_end_block "Run cloud unit tests"
build/teamcity/cockroach/nightlies/cloud_unit_tests_impl.sh
@@ -51,6 +51,7 @@ bazel_test_env=(--test_env=GO_TEST_WRAP_TESTV=1 \
--test_env=AZURE_CLIENT_SECRET="$AZURE_CLIENT_SECRET" \
--test_env=AZURE_TENANT_ID="$AZURE_TENANT_ID" \
--test_env=AZURE_VAULT_NAME="$AZURE_VAULT_NAME" \
+--test_env=AZURE_LIMITED_VAULT_NAME="$AZURE_LIMITED_VAULT_NAME" \
--test_env=AZURE_KMS_KEY_NAME="$AZURE_KMS_KEY_NAME" \
--test_env=AZURE_KMS_KEY_VERSION="$AZURE_KMS_KEY_VERSION")
exit_status=0
1 change: 1 addition & 0 deletions docs/generated/metrics/metrics.html
@@ -718,6 +718,7 @@
<tr><td>STORAGE</td><td>txnwaitqueue.query.waiting</td><td>Number of transaction status queries waiting for an updated transaction record</td><td>Waiting Queries</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>valbytes</td><td>Number of bytes taken up by values</td><td>Storage</td><td>GAUGE</td><td>BYTES</td><td>AVG</td><td>NONE</td></tr>
<tr><td>STORAGE</td><td>valcount</td><td>Count of all values</td><td>MVCC Values</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>backup.last-failed-time.kms-inaccessible</td><td>The unix timestamp of the most recent failure of backup due to errKMSInaccessible by a backup specified as maintaining this metric</td><td>Jobs</td><td>GAUGE</td><td>TIMESTAMP_SEC</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>changefeed.admit_latency</td><td>Event admission latency: a difference between event MVCC timestamp and the time it was admitted into changefeed pipeline; Note: this metric includes the time spent waiting until event can be processed due to backpressure or time spent resolving schema descriptors. Also note, this metric excludes latency during backfill</td><td>Nanoseconds</td><td>HISTOGRAM</td><td>NANOSECONDS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>changefeed.aggregator_progress</td><td>The earliest timestamp up to which any aggregator is guaranteed to have emitted all values for</td><td>Unix Timestamp Nanoseconds</td><td>GAUGE</td><td>TIMESTAMP_NS</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>changefeed.backfill_count</td><td>Number of changefeeds currently executing backfill</td><td>Count</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
2 changes: 2 additions & 0 deletions docs/generated/sql/bnf/backup_options.bnf
@@ -9,3 +9,5 @@ backup_options ::=
| 'INCREMENTAL_LOCATION' '=' string_or_placeholder_opt_list
| 'EXECUTION' 'LOCALITY' '=' string_or_placeholder
| 'INCLUDE_ALL_VIRTUAL_CLUSTERS' '=' a_expr
+| 'UPDATES_CLUSTER_MONITORING_METRICS'
+| 'UPDATES_CLUSTER_MONITORING_METRICS' '=' a_expr
4 changes: 4 additions & 0 deletions docs/generated/sql/bnf/stmt_block.bnf
@@ -1452,6 +1452,7 @@ unreserved_keyword ::=
| 'UNSPLIT'
| 'UNTIL'
| 'UPDATE'
+| 'UPDATES_CLUSTER_MONITORING_METRICS'
| 'UPSERT'
| 'USE'
| 'USERS'
@@ -2324,6 +2325,8 @@ backup_options ::=
| 'INCREMENTAL_LOCATION' '=' string_or_placeholder_opt_list
| 'EXECUTION' 'LOCALITY' '=' string_or_placeholder
| include_all_clusters '=' a_expr
+| 'UPDATES_CLUSTER_MONITORING_METRICS'
+| 'UPDATES_CLUSTER_MONITORING_METRICS' '=' a_expr

c_expr ::=
d_expr
@@ -4054,6 +4057,7 @@ bare_label_keywords ::=
| 'UNSPLIT'
| 'UNTIL'
| 'UPDATE'
+| 'UPDATES_CLUSTER_MONITORING_METRICS'
| 'UPSERT'
| 'USE'
| 'USER'
2 changes: 1 addition & 1 deletion go.mod
@@ -113,7 +113,7 @@ require (
github.com/cockroachdb/go-test-teamcity v0.0.0-20191211140407-cff980ad0a55
github.com/cockroachdb/gostdlib v1.19.0
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b
-github.com/cockroachdb/pebble v0.0.0-20230922144958-86593692e09f
+github.com/cockroachdb/pebble v0.0.0-20230926003129-660450027a8c
github.com/cockroachdb/redact v1.1.5
github.com/cockroachdb/returncheck v0.0.0-20200612231554-92cdbca611dd
github.com/cockroachdb/stress v0.0.0-20220803192808-1806698b1b7b
4 changes: 2 additions & 2 deletions go.sum
@@ -489,8 +489,8 @@ github.com/cockroachdb/gostdlib v1.19.0/go.mod h1:+dqqpARXbE/gRDEhCak6dm0l14AaTy
github.com/cockroachdb/logtags v0.0.0-20211118104740-dabe8e521a4f/go.mod h1:Vz9DsVWQQhf3vs21MhPMZpMGSht7O/2vFW2xusFUVOs=
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b h1:r6VH0faHjZeQy818SGhaone5OnYfxFR/+AzdY3sf5aE=
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b/go.mod h1:Vz9DsVWQQhf3vs21MhPMZpMGSht7O/2vFW2xusFUVOs=
-github.com/cockroachdb/pebble v0.0.0-20230922144958-86593692e09f h1:yD41mae3ywx2xyQm7RhdHBX2EXoDGWxdpylFddICz1M=
-github.com/cockroachdb/pebble v0.0.0-20230922144958-86593692e09f/go.mod h1:nindLFinxeDPjP4qI9LtVHAwDef57/0s5KMfWgdktQc=
+github.com/cockroachdb/pebble v0.0.0-20230926003129-660450027a8c h1:Iuggsg0vBsoRPicjCofNRpw72hmpQ3jv54GaiKk4gKA=
+github.com/cockroachdb/pebble v0.0.0-20230926003129-660450027a8c/go.mod h1:nindLFinxeDPjP4qI9LtVHAwDef57/0s5KMfWgdktQc=
github.com/cockroachdb/redact v1.1.3/go.mod h1:BVNblN9mBWFyMyqK1k3AAiSxhvhfK2oOZZ2lK+dpvRg=
github.com/cockroachdb/redact v1.1.5 h1:u1PMllDkdFfPWaNGMyLD1+so+aq3uUItthCFqzwPJ30=
github.com/cockroachdb/redact v1.1.5/go.mod h1:BVNblN9mBWFyMyqK1k3AAiSxhvhfK2oOZZ2lK+dpvRg=
2 changes: 2 additions & 0 deletions pkg/BUILD.bazel
@@ -190,6 +190,7 @@ ALL_TESTS = [
"//pkg/jobs:jobs_test",
"//pkg/keys:keys_test",
"//pkg/keyvisualizer/spanstatsconsumer:spanstatsconsumer_test",
"//pkg/kv/bulk/bulkpb:bulkpb_test",
"//pkg/kv/bulk:bulk_test",
"//pkg/kv/kvclient/kvcoord:kvcoord_disallowed_imports_test",
"//pkg/kv/kvclient/kvcoord:kvcoord_test",
@@ -1269,6 +1270,7 @@ GO_TARGETS = [
"//pkg/keyvisualizer/spanstatskvaccessor:spanstatskvaccessor",
"//pkg/keyvisualizer:keyvisualizer",
"//pkg/kv/bulk/bulkpb:bulkpb",
"//pkg/kv/bulk/bulkpb:bulkpb_test",
"//pkg/kv/bulk:bulk",
"//pkg/kv/bulk:bulk_test",
"//pkg/kv/kvbase:kvbase",
3 changes: 1 addition & 2 deletions pkg/ccl/backupccl/BUILD.bazel
@@ -7,6 +7,7 @@ go_library(
"alter_backup_planning.go",
"alter_backup_schedule.go",
"backup_job.go",
"backup_metrics.go",
"backup_planning.go",
"backup_planning_tenant.go",
"backup_processor.go",
@@ -239,7 +240,6 @@
"//pkg/jobs/jobsprotectedts",
"//pkg/jobs/jobstest",
"//pkg/keys",
"//pkg/keyvisualizer",
"//pkg/kv",
"//pkg/kv/bulk",
"//pkg/kv/kvclient/kvcoord",
@@ -288,7 +288,6 @@ go_test(
"//pkg/sql/sem/eval",
"//pkg/sql/sem/tree",
"//pkg/sql/sessiondata",
"//pkg/sql/sqlstats",
"//pkg/sql/stats",
"//pkg/storage",
"//pkg/testutils",
3 changes: 3 additions & 0 deletions pkg/ccl/backupccl/alter_backup_schedule.go
@@ -382,6 +382,9 @@ func processOptionsForArgs(inOpts tree.BackupOptions, outOpts *tree.BackupOption
outOpts.IncrementalStorage = inOpts.IncrementalStorage
}
}
+if inOpts.UpdatesClusterMonitoringMetrics != nil {
+outOpts.UpdatesClusterMonitoringMetrics = inOpts.UpdatesClusterMonitoringMetrics
+}
return nil
}

50 changes: 50 additions & 0 deletions pkg/ccl/backupccl/backup_cloud_test.go
@@ -19,11 +19,13 @@ import (
"github.com/cockroachdb/cockroach/pkg/cloud"
"github.com/cockroachdb/cockroach/pkg/cloud/amazon"
"github.com/cockroachdb/cockroach/pkg/cloud/azure"
"github.com/cockroachdb/cockroach/pkg/jobs"
"github.com/cockroachdb/cockroach/pkg/testutils/skip"
"github.com/cockroachdb/cockroach/pkg/testutils/sqlutils"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/timeutil"
"github.com/stretchr/testify/require"
)

// The tests in this file talk to remote APIs which require credentials.
@@ -230,3 +232,51 @@ func TestCloudBackupRestoreAzure(t *testing.T) {
}
}
}

+// TestCloudBackupRestoreKMSInaccessibleMetric tests that backup statements
+// update the KMSInaccessibleError metric.
+func TestCloudBackupRestoreKMSInaccessibleMetric(t *testing.T) {
+defer leaktest.AfterTest(t)()
+defer log.Scope(t).Close(t)
+
+azureVaultName := os.Getenv("AZURE_VAULT_NAME")
+if azureVaultName == "" {
+skip.IgnoreLintf(t, "AZURE_VAULT_NAME env var must be set")
+}
+
+for _, tt := range []struct {
+name string
+uri string
+}{
+{
+name: "s3",
+uri: "aws-kms:///non-existent-key?AUTH=implicit&REGION=us-east-1",
+},
+{
+name: "gcs",
+uri: "gcp-kms:///non-existent-key?AUTH=implicit",
+},
+{
+name: "azure",
+uri: fmt.Sprintf("azure-kms:///non-existent-key/000?AUTH=implicit&AZURE_VAULT_NAME=%s", azureVaultName),
+},
+} {
+tc, sqlDB, _, cleanupFn := backupRestoreTestSetup(t, 1, 1, InitManualReplication)
+defer cleanupFn()
+
+t.Run(tt.name, func(t *testing.T) {
+testStart := timeutil.Now().Unix()
+bm := tc.Server(0).JobRegistry().(*jobs.Registry).MetricsStruct().Backup.(*BackupMetrics)
+
+// LastKMSInaccessibleErrorTime should not be set without
+// updates_cluster_monitoring_metrics despite a KMS error.
+sqlDB.ExpectErr(t, "failed to encrypt data key", "BACKUP INTO 'userfile:///foo' WITH OPTIONS (kms = $1)", tt.uri)
+require.Equal(t, bm.LastKMSInaccessibleErrorTime.Value(), int64(0))
+
+// LastKMSInaccessibleErrorTime should be set with
+// updates_cluster_monitoring_metrics.
+sqlDB.ExpectErr(t, "failed to encrypt data key", "BACKUP INTO 'userfile:///foo' WITH OPTIONS (kms = $1, updates_cluster_monitoring_metrics)", tt.uri)
+require.GreaterOrEqual(t, bm.LastKMSInaccessibleErrorTime.Value(), testStart)
+})
+}
+}
17 changes: 13 additions & 4 deletions pkg/ccl/backupccl/backup_job.go
@@ -1204,9 +1204,7 @@ func (b *backupResumer) maybeNotifyScheduledJobCompletion(
}

scheduleID := int64(tree.MustBeDInt(datums[0]))
-if err := jobs.NotifyJobTermination(
-ctx, txn, env, b.job.ID(), jobStatus, b.job.Details(), scheduleID,
-); err != nil {
+if err := jobs.NotifyJobTermination(ctx, txn, env, b.job.ID(), jobStatus, b.job.Details(), scheduleID); err != nil {
return errors.Wrapf(err,
"failed to notify schedule %d of completion of job %d", scheduleID, b.job.ID())
}
@@ -1226,15 +1224,26 @@

p := execCtx.(sql.JobExecContext)
cfg := p.ExecCfg()
+details := b.job.Details().(jobspb.BackupDetails)

b.deleteCheckpoint(ctx, cfg, p.User())
if err := cfg.InternalDB.Txn(ctx, func(ctx context.Context, txn isql.Txn) error {
-details := b.job.Details().(jobspb.BackupDetails)
pts := cfg.ProtectedTimestampProvider.WithTxn(txn)
return releaseProtectedTimestamp(ctx, pts, details.ProtectedTimestampRecord)
}); err != nil {
return err
}

+// If this backup should update cluster monitoring metrics, update the
+// metrics.
+if details.UpdatesClusterMonitoringMetrics {
+metrics := p.ExecCfg().JobRegistry.MetricsStruct().Backup.(*BackupMetrics)
+if cloud.IsKMSInaccessible(jobErr) {
+now := timeutil.Now()
+metrics.LastKMSInaccessibleErrorTime.Update(now.Unix())
+}
+}

// This should never return an error unless resolving the schedule that the
// job is being run under fails. This could happen if the schedule is dropped
// while the job is executing.
40 changes: 40 additions & 0 deletions pkg/ccl/backupccl/backup_metrics.go
@@ -0,0 +1,40 @@
+// Copyright 2023 The Cockroach Authors.
+//
+// Licensed as a CockroachDB Enterprise file under the Cockroach Community
+// License (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+// https://github.com/cockroachdb/cockroach/blob/master/licenses/CCL.txt
+
+package backupccl
+
+import (
+"time"
+
+"github.com/cockroachdb/cockroach/pkg/jobs"
+"github.com/cockroachdb/cockroach/pkg/util/metric"
+)
+
+type BackupMetrics struct {
+LastKMSInaccessibleErrorTime *metric.Gauge
+}
+
+// MetricStruct implements the metric.Struct interface.
+func (b BackupMetrics) MetricStruct() {}
+
+// MakeBackupMetrics instantiates the metrics for backup.
+func MakeBackupMetrics(time.Duration) metric.Struct {
+m := &BackupMetrics{
+LastKMSInaccessibleErrorTime: metric.NewGauge(metric.Metadata{
+Name: "backup.last-failed-time.kms-inaccessible",
+Help: "The unix timestamp of the most recent failure of backup due to errKMSInaccessible by a backup specified as maintaining this metric",
+Measurement: "Jobs",
+Unit: metric.Unit_TIMESTAMP_SEC,
+}),
+}
+return m
+}
+
+func init() {
+jobs.MakeBackupMetricsHook = MakeBackupMetrics
+}
43 changes: 28 additions & 15 deletions pkg/ccl/backupccl/backup_planning.go
@@ -196,9 +196,10 @@ func resolveOptionsForBackupJobDescription(
}

newOpts := tree.BackupOptions{
-CaptureRevisionHistory: opts.CaptureRevisionHistory,
-Detached: opts.Detached,
-ExecutionLocality: opts.ExecutionLocality,
+CaptureRevisionHistory: opts.CaptureRevisionHistory,
+Detached: opts.Detached,
+ExecutionLocality: opts.ExecutionLocality,
+UpdatesClusterMonitoringMetrics: opts.UpdatesClusterMonitoringMetrics,
}

if opts.EncryptionPassphrase != nil {
@@ -532,6 +533,7 @@ func backupTypeCheck(
exprutil.Bools{
backupStmt.Options.CaptureRevisionHistory,
backupStmt.Options.IncludeAllSecondaryTenants,
+backupStmt.Options.UpdatesClusterMonitoringMetrics,
}); err != nil {
return false, nil, err
}
@@ -658,6 +660,16 @@
}
}

+var updatesClusterMonitoringMetrics bool
+if backupStmt.Options.UpdatesClusterMonitoringMetrics != nil {
+updatesClusterMonitoringMetrics, err = exprEval.Bool(
+ctx, backupStmt.Options.UpdatesClusterMonitoringMetrics,
+)
+if err != nil {
+return nil, nil, nil, false, err
+}
+}

fn := func(ctx context.Context, _ []sql.PlanNode, resultsCh chan<- tree.Datums) error {
// TODO(dan): Move this span into sql.
ctx, span := tracing.ChildSpan(ctx, stmt.StatementTag())
@@ -751,18 +763,19 @@
}

initialDetails := jobspb.BackupDetails{
-Destination: jobspb.BackupDetails_Destination{To: to, IncrementalStorage: incrementalStorage},
-EndTime: endTime,
-RevisionHistory: revisionHistory,
-IncludeAllSecondaryTenants: includeAllSecondaryTenants,
-IncrementalFrom: incrementalFrom,
-FullCluster: backupStmt.Coverage() == tree.AllDescriptors,
-ResolvedCompleteDbs: completeDBs,
-EncryptionOptions: &encryptionParams,
-AsOfInterval: asOfInterval,
-Detached: detached,
-ApplicationName: p.SessionData().ApplicationName,
-ExecutionLocality: executionLocality,
+Destination: jobspb.BackupDetails_Destination{To: to, IncrementalStorage: incrementalStorage},
+EndTime: endTime,
+RevisionHistory: revisionHistory,
+IncludeAllSecondaryTenants: includeAllSecondaryTenants,
+IncrementalFrom: incrementalFrom,
+FullCluster: backupStmt.Coverage() == tree.AllDescriptors,
+ResolvedCompleteDbs: completeDBs,
+EncryptionOptions: &encryptionParams,
+AsOfInterval: asOfInterval,
+Detached: detached,
+ApplicationName: p.SessionData().ApplicationName,
+ExecutionLocality: executionLocality,
+UpdatesClusterMonitoringMetrics: updatesClusterMonitoringMetrics,
}
if backupStmt.CreatedByInfo != nil && backupStmt.CreatedByInfo.Name == jobs.CreatedByScheduledJobs {
initialDetails.ScheduleID = backupStmt.CreatedByInfo.ID