From fa73f1e7d10148e39b689d5b64f41a2238a3f6db Mon Sep 17 00:00:00 2001 From: Rich Loveland Date: Mon, 22 Jan 2024 16:57:58 -0500 Subject: [PATCH] Add guidance on freeing up disk space more quickly Fixes: - DOC-8242 - DOC-9641 --- .../healthy-storage-capacity.md | 2 +- .../v23.2/storage/free-up-disk-space.md | 1 + src/current/v23.2/cockroach-debug-ballast.md | 2 +- src/current/v23.2/common-issues-to-monitor.md | 2 + src/current/v23.2/delete.md | 2 + src/current/v23.2/import.md | 2 +- src/current/v23.2/monitoring-and-alerting.md | 2 + src/current/v23.2/operational-faqs.md | 74 +++++++++++++++++++ .../v23.2/query-replication-reports.md | 2 + .../v23.2/recommended-production-settings.md | 4 + src/current/v23.2/restore.md | 2 +- src/current/v23.2/ui-cluster-overview-page.md | 2 + src/current/v23.2/ui-storage-dashboard.md | 2 + 13 files changed, 95 insertions(+), 4 deletions(-) create mode 100644 src/current/_includes/v23.2/storage/free-up-disk-space.md diff --git a/src/current/_includes/v23.2/prod-deployment/healthy-storage-capacity.md b/src/current/_includes/v23.2/prod-deployment/healthy-storage-capacity.md index af6253c932d..bd8c44e1a31 100644 --- a/src/current/_includes/v23.2/prod-deployment/healthy-storage-capacity.md +++ b/src/current/_includes/v23.2/prod-deployment/healthy-storage-capacity.md @@ -1 +1 @@ -**Expected values for a healthy cluster**: Used capacity should not persistently exceed 80% of the total capacity. \ No newline at end of file +**Expected values for a healthy cluster**: Used capacity should not persistently exceed 80% of the total capacity. diff --git a/src/current/_includes/v23.2/storage/free-up-disk-space.md b/src/current/_includes/v23.2/storage/free-up-disk-space.md new file mode 100644 index 00000000000..c63b70b766e --- /dev/null +++ b/src/current/_includes/v23.2/storage/free-up-disk-space.md @@ -0,0 +1 @@ +For instructions on how to free up disk space as quickly as possible after deleting data, see [How can I free up disk space quickly?]({% link {{ page.version.version }}/operational-faqs.md %}#how-can-i-free-up-disk-space-quickly) diff --git a/src/current/v23.2/cockroach-debug-ballast.md b/src/current/v23.2/cockroach-debug-ballast.md index 0a5889cc786..72411851b2f 100644 --- a/src/current/v23.2/cockroach-debug-ballast.md +++ b/src/current/v23.2/cockroach-debug-ballast.md @@ -11,7 +11,7 @@ The `cockroach debug ballast` [command]({% link {{ page.version.version }}/cockr - Do not run `cockroach debug ballast` with a unix `root` user. Doing so brings the risk of mistakenly affecting system directories or files. - `cockroach debug ballast` now refuses to overwrite the target ballast file if it already exists. This change is intended to prevent mistaken uses of the `ballast` command. Consider adding an `rm` command to scripts that integrate `cockroach debug ballast`, or provide a new file name every time and then remove the old file. -- In addition to placing a ballast file in each node's storage directory, it is important to actively [monitor remaining disk space]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#events-to-alert-on). +- In addition to placing a ballast file in each node's storage directory, it is important to actively [monitor remaining disk space]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#node-is-running-low-on-disk-space). - Ballast files may be created in many ways, including the standard `dd` command. `cockroach debug ballast` uses the `fallocate` system call when available, so it will be faster than `dd`. 
## Subcommands diff --git a/src/current/v23.2/common-issues-to-monitor.md b/src/current/v23.2/common-issues-to-monitor.md index f402339097d..44553ce578f 100644 --- a/src/current/v23.2/common-issues-to-monitor.md +++ b/src/current/v23.2/common-issues-to-monitor.md @@ -281,6 +281,8 @@ CockroachDB requires disk space in order to accept writes and report node livene Ensure that you [provision sufficient storage]({% link {{ page.version.version }}/recommended-production-settings.md %}#storage). If storage is correctly provisioned and is running low, CockroachDB automatically creates an emergency ballast file that can free up space. For details, see [Disks filling up]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#disks-filling-up). {{site.data.alerts.end}} +{% include {{page.version.version}}/storage/free-up-disk-space.md %} + #### Disk IOPS Insufficient disk I/O can cause [poor SQL performance](#service-latency) and potentially [disk stalls]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#disk-stalls). diff --git a/src/current/v23.2/delete.md b/src/current/v23.2/delete.md index dcceb2d0cb5..5f6f952415d 100644 --- a/src/current/v23.2/delete.md +++ b/src/current/v23.2/delete.md @@ -58,6 +58,8 @@ the zone by setting `gc.ttlseconds` to a lower value, which will cause garbage collection to clean up deleted objects (rows, tables) more frequently. +{% include {{page.version.version}}/storage/free-up-disk-space.md %} + ## Select performance on deleted rows Queries that scan across tables that have lots of deleted rows will diff --git a/src/current/v23.2/import.md b/src/current/v23.2/import.md index 73ca9108d24..baa94313182 100644 --- a/src/current/v23.2/import.md +++ b/src/current/v23.2/import.md @@ -157,7 +157,7 @@ Imported tables are treated as new tables, so you must [`GRANT`]({% link {{ page - All nodes are used during the import job, which means all nodes' CPU and RAM will be partially consumed by the `IMPORT` task in addition to serving normal traffic. - To improve performance, import at least as many files as you have nodes (i.e., there is at least one file for each node to import) to increase parallelism. - To further improve performance, order the data in the imported files by [primary key]({% link {{ page.version.version }}/primary-key.md %}) and ensure the primary keys do not overlap between files. -- An import job will pause if a node in the cluster runs out of disk space. See [Viewing and controlling import jobs](#viewing-and-controlling-import-jobs) for information on resuming and showing the progress of import jobs. +- An import job will pause if a node in the cluster runs out of disk space. See [Viewing and controlling import jobs](#viewing-and-controlling-import-jobs) for information on resuming and showing the progress of import jobs. {% include {{page.version.version}}/storage/free-up-disk-space.md %} - An import job will [pause]({% link {{ page.version.version }}/pause-job.md %}) instead of entering a `failed` state if it continues to encounter transient errors once it has retried a maximum number of times. Once the import has paused, you can either [resume]({% link {{ page.version.version }}/resume-job.md %}) or [cancel]({% link {{ page.version.version }}/cancel-job.md %}) it. For more detail on optimizing import performance, see [Import Performance Best Practices]({% link {{ page.version.version }}/import-performance-best-practices.md %}). 
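+
+If an import job has paused because a node ran out of disk space, you can [resume]({% link {{ page.version.version }}/resume-job.md %}) it after freeing up space. The following is a sketch; the job ID shown is a placeholder for an ID returned by `SHOW JOBS`:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+-- Find paused IMPORT jobs, then resume one by its ID (placeholder value shown).
+SELECT job_id, status, error FROM [SHOW JOBS] WHERE job_type = 'IMPORT' AND status = 'paused';
+RESUME JOB 123456789012345678;
+~~~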
diff --git a/src/current/v23.2/monitoring-and-alerting.md b/src/current/v23.2/monitoring-and-alerting.md index 694f356c278..53e110372f1 100644 --- a/src/current/v23.2/monitoring-and-alerting.md +++ b/src/current/v23.2/monitoring-and-alerting.md @@ -1095,6 +1095,8 @@ Currently, not all events listed have corresponding alert rule definitions avail - **Rule definition:** Use the `StoreDiskLow` alert from our pre-defined alerting rules. +{% include {{page.version.version}}/storage/free-up-disk-space.md %} + #### Node is not executing SQL - **Rule:** Send an alert when a node is not executing SQL despite having connections. diff --git a/src/current/v23.2/operational-faqs.md b/src/current/v23.2/operational-faqs.md index 1fb39e07c53..4150914043d 100644 --- a/src/current/v23.2/operational-faqs.md +++ b/src/current/v23.2/operational-faqs.md @@ -47,6 +47,78 @@ or about 6 GiB. With on-disk compression, the actual disk usage is likely to be However, depending on your usage of time-series charts in the [DB Console]({% link {{ page.version.version }}/ui-overview-dashboard.md %}), you may prefer to reduce the amount of disk used by time-series data. To reduce the amount of time-series data stored, or to disable it altogether, refer to [Can I reduce or disable the storage of time-series data?](#can-i-reduce-or-disable-the-storage-of-time-series-data) +## Why is my disk usage not decreasing after deleting data? + +{% comment %} +The below is a lightly edited version of https://stackoverflow.com/questions/74481018/why-is-my-cockroachdb-disk-usage-not-decreasing +{% endcomment %} + +There are several reasons why disk usage may not decrease right after deleting data: + +- [The data could be preserved for MVCC history](#the-data-could-be-preserved-for-mvcc-history) +- [The data could be in the process of being compacted](#the-data-could-be-in-the-process-of-being-compacted) + +{% include {{page.version.version}}/storage/free-up-disk-space.md %} + +### The data could be preserved for MVCC history + +CockroachDB implements [Multi-Version Concurrency Control (MVCC)]({% link {{ page.version.version }}/architecture/storage-layer.md %}#mvcc), which means that it maintains a history of all mutations to a row. This history is used for a wide range of functionality: transaction isolation, historical [`AS OF SYSTEM TIME`]({% link {{ page.version.version }}/as-of-system-time.md %}) queries, [incremental backups]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}), [changefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}), [cluster replication]({% link {{ page.version.version }}/architecture/replication-layer.md %}), and so on. The requirement to preserve history means that CockroachDB "soft deletes" data: the data is marked as deleted with a tombstone record so that CockroachDB no longer surfaces the deleted rows to queries, but the old data is still present on disk. + +The length of history preserved by MVCC is determined by two things: the [`gc.ttlseconds`]({% link {{ page.version.version }}/configure-replication-zones.md %}#gc-ttlseconds) of the zone that contains the data and whether any [protected timestamps]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps) exist. You can check a range's statistics to observe its `key_bytes`, `value_bytes`, and `live_bytes` values. The `live_bytes` metric reflects data that's not garbage. The value of (`key_bytes` + `value_bytes`) - `live_bytes` tells you how much MVCC garbage is resident within a range. + +You can access this information in the following ways: + +- Using the [`SHOW RANGES`]({% link {{ page.version.version }}/show-ranges.md %}) SQL statement, which lists the above values under the names `live_bytes`, `key_bytes`, and `val_bytes`. +- In the DB Console, under [**Advanced Debug Page > Even more Advanced Debugging**]({% link {{ page.version.version }}/ui-debug-pages.md %}#even-more-advanced-debugging), click the **Range Status** link, which takes you to a page where the values are displayed in a tabular format like the following: `MVCC Live Bytes/Count | 2.5 KiB / 62 count`.
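+
+For example, the following query sketches one way to estimate how much MVCC garbage is resident in each range of a table (here a placeholder table named `t`). It assumes that the `key_bytes`, `val_bytes`, and `live_bytes` columns are returned by `SHOW RANGES ... WITH DETAILS`, as described above:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+-- Sketch: assumes the size columns listed above are present in the output.
+SELECT range_id,
+       (key_bytes + val_bytes) - live_bytes AS mvcc_garbage_bytes
+FROM [SHOW RANGES FROM TABLE t WITH DETAILS]
+ORDER BY mvcc_garbage_bytes DESC;
+~~~
+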
+When data has been deleted for at least the duration specified by [`gc.ttlseconds`]({% link {{ page.version.version }}/configure-replication-zones.md %}#gc-ttlseconds), CockroachDB considers it eligible for garbage collection. Asynchronously, CockroachDB garbage collects ranges that contain significant quantities of garbage and deletes that garbage. Note that backups or other processes that have not yet completed but still require the data may prevent its garbage collection by setting a protected timestamp until they finish. + +For more information about how MVCC works, see [MVCC]({% link {{ page.version.version }}/architecture/storage-layer.md %}#mvcc). + +### The data could be in the process of being compacted + +When MVCC garbage is deleted by garbage collection, the data is not yet physically removed from the filesystem by the [storage layer]({% link {{ page.version.version }}/architecture/storage-layer.md %}). Removing data from the filesystem requires rewriting the files that contain the data using a process known as [compaction]({% link {{ page.version.version }}/architecture/storage-layer.md %}#compaction), which can be expensive. The storage engine has heuristics to compact data and remove deleted rows when enough garbage has accumulated to warrant a compaction. It strives to restrict the overhead of obsolete data (known as space amplification) to at most 10%. If a lot of data was just deleted, it may take the storage engine some time to compact the files and restore this property. + +{% include {{page.version.version}}/storage/free-up-disk-space.md %} + +## How can I free up disk space quickly? + +If you've noticed that [your disk space is not freeing up quickly enough after deleting data](#why-is-my-disk-usage-not-decreasing-after-deleting-data), you can take the following steps to free it up more quickly. This example assumes a table named `t`.
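+
+Before lowering the GC TTL in the first step below, you may want to check the value currently in effect for the table's zone. The following is a sketch using the example table `t`; if no zone configuration has been set on the table itself, the statement shows the configuration the table inherits:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+-- Show the effective zone configuration, including gc.ttlseconds, for table t.
+SHOW ZONE CONFIGURATION FROM TABLE t;
+~~~
+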
+1. Lower the [`gc.ttlseconds` parameter]({% link {{ page.version.version }}/configure-replication-zones.md %}#gc-ttlseconds) to 10 minutes. + +    {% include_cached copy-clipboard.html %} +    ~~~ sql +    ALTER TABLE t CONFIGURE ZONE USING gc.ttlseconds = 600; +    ~~~ + +1. Find the IDs of the [ranges]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-range) storing the table data using [`SHOW RANGES`]({% link {{ page.version.version }}/show-ranges.md %}): + +    {% include_cached copy-clipboard.html %} +    ~~~ sql +    SELECT range_id FROM [SHOW RANGES FROM TABLE t]; +    ~~~ + +    ~~~ +    range_id +    ------------ +    68 +    69 +    70 +    ... +    ~~~ + +1. Drop the table using [`DROP TABLE`]({% link {{ page.version.version }}/drop-table.md %}): + +    {% include_cached copy-clipboard.html %} +    ~~~ sql +    DROP TABLE t; +    ~~~ + +1. Visit the [Advanced Debug page]({% link {{ page.version.version }}/ui-debug-pages.md %}) and click the **Run a range through an internal queue** link to open the **Manually enqueue range in a replica queue** page. On this page, select **mvccGC** from the **Queue** dropdown and enter each of the range IDs from the previous step. Check the **SkipShouldQueue** checkbox to speed up the MVCC [garbage collection]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection) process. + +1. Monitor GC progress in the DB Console by watching the [MVCC GC Queue]({% link {{ page.version.version }}/ui-queues-dashboard.md %}#mvcc-gc-queue) and the overall disk space used as shown on the [Overview Dashboard]({% link {{ page.version.version }}/ui-overview-dashboard.md %}). + ## What is the `internal-delete-old-sql-stats` process and why is it consuming my resources? When a query is executed, a process records query execution statistics on system tables. This is done by recording [SQL statement fingerprints]({% link {{ page.version.version }}/ui-statements-page.md %}). @@ -148,6 +220,8 @@ For more information about troubleshooting disk usage issues, see [storage issue In addition to using ballast files, it is important to actively [monitor remaining disk space]({% link {{ page.version.version }}/common-issues-to-monitor.md %}#storage-capacity). {{site.data.alerts.end}} +{% include {{page.version.version}}/storage/free-up-disk-space.md %} + ## Why would increasing the number of nodes not result in more operations per second? If queries operate on different data, then increasing the number of nodes should improve the overall throughput (transactions/second or QPS). diff --git a/src/current/v23.2/query-replication-reports.md b/src/current/v23.2/query-replication-reports.md index 59c0e409c61..475624517b8 100644 --- a/src/current/v23.2/query-replication-reports.md +++ b/src/current/v23.2/query-replication-reports.md @@ -513,6 +513,8 @@ SELECT DISTINCT * FROM report; To give another example, let's say your cluster were similar to the one shown above, but configured with [tiered localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) such that you had split `us-east1` into `{region=us-east1,dc=dc1, region=us-east1,dc=dc2, region=us-east1,dc=dc3}`. In that case, you wouldn't expect any DC to be critical, because the cluster would "diversify" each range's location as much as possible across data centers. In such a situation, if you were to see a DC identified as a critical locality, you'd be surprised and you'd take some action. For example, perhaps the diversification process is failing because some localities are filled to capacity. If there is no disk space free in a locality, your cluster cannot move replicas there. +{% include {{page.version.version}}/storage/free-up-disk-space.md %} + ## See also - [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %}) diff --git a/src/current/v23.2/recommended-production-settings.md b/src/current/v23.2/recommended-production-settings.md index 88730b6cf88..24b1198cbf2 100644 --- a/src/current/v23.2/recommended-production-settings.md +++ b/src/current/v23.2/recommended-production-settings.md @@ -146,6 +146,10 @@ We recommend provisioning volumes with {% include {{ page.version.version }}/pro Under-provisioning storage leads to node crashes when the disks fill up. Once this has happened, it is difficult to recover from.
To prevent your disks from filling up, provision enough storage for your workload, monitor your disk usage, and use a [ballast file]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#automatic-ballast-files). For more information, see [capacity planning issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#capacity-planning-issues) and [storage issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#storage-issues). {{site.data.alerts.end}} +{{site.data.alerts.callout_success}} +{% include {{page.version.version}}/storage/free-up-disk-space.md %} +{{site.data.alerts.end}} + ##### Disk I/O Disks must be able to achieve {% include {{ page.version.version }}/prod-deployment/provision-disk-io.md %}. diff --git a/src/current/v23.2/restore.md b/src/current/v23.2/restore.md index 6a4f9b16660..f20e4fecf48 100644 --- a/src/current/v23.2/restore.md +++ b/src/current/v23.2/restore.md @@ -254,7 +254,7 @@ CockroachDB does **not** support incremental-only restores. - The `RESTORE` process minimizes its impact to the cluster's performance by distributing work to all nodes. Subsets of the restored data (known as ranges) are evenly distributed among randomly selected nodes, with each range initially restored to only one node. Once the range is restored, the node begins replicating it others. - When a `RESTORE` fails or is canceled, partially restored data is properly cleaned up. This can have a minor, temporary impact on cluster performance. -- A restore job will pause if a node in the cluster runs out of disk space. See [Viewing and controlling restore jobs](#viewing-and-controlling-restore-jobs) for information on resuming and showing the progress of restore jobs. +- A restore job will pause if a node in the cluster runs out of disk space. See [Viewing and controlling restore jobs](#viewing-and-controlling-restore-jobs) for information on resuming and showing the progress of restore jobs. {% include {{page.version.version}}/storage/free-up-disk-space.md %} - A restore job will [pause]({% link {{ page.version.version }}/pause-job.md %}) instead of entering a `failed` state if it continues to encounter transient errors once it has retried a maximum number of times. Once the restore has paused, you can either [resume]({% link {{ page.version.version }}/resume-job.md %}) or [cancel]({% link {{ page.version.version }}/cancel-job.md %}) it. ## Restoring to multi-region databases diff --git a/src/current/v23.2/ui-cluster-overview-page.md b/src/current/v23.2/ui-cluster-overview-page.md index 2a0fe1907cc..9e2049c02cd 100644 --- a/src/current/v23.2/ui-cluster-overview-page.md +++ b/src/current/v23.2/ui-cluster-overview-page.md @@ -43,6 +43,8 @@ If a node is currently unavailable, the last-known capacity usage will be shown, {% include {{ page.version.version }}/misc/available-capacity-metric.md %} {{site.data.alerts.end}} +{% include {{page.version.version}}/storage/free-up-disk-space.md %} + ## Node List The **Node List** groups nodes by locality. The lowest-level locality tier is used to organize the Node List. Hover over a locality to see all localities for the group of nodes. 
diff --git a/src/current/v23.2/ui-storage-dashboard.md b/src/current/v23.2/ui-storage-dashboard.md index 1c105d63d8d..a1033b92715 100644 --- a/src/current/v23.2/ui-storage-dashboard.md +++ b/src/current/v23.2/ui-storage-dashboard.md @@ -29,6 +29,8 @@ Metric | Description {% include {{ page.version.version }}/prod-deployment/healthy-storage-capacity.md %} +{% include {{page.version.version}}/storage/free-up-disk-space.md %} + ### Capacity metrics The **Capacity** graph displays disk usage by CockroachDB data in relation to the maximum [store]({% link {{ page.version.version }}/architecture/storage-layer.md %}) size, which is determined as follows: