docs: query from deep storage (apache#14609)
* cold tier wip

* wip

* copyedits

* wip

* copyedits

* copyedits

* wip

* wip

* update rules page

* typo

* typo

* update sidebar

* moves durable storage info to its own page in operations

* update screenshots

* add apache license

* Apply suggestions from code review

Co-authored-by: Victoria Lim <[email protected]>

* add query from deep storage tutorial stub

* address some of the feedback

* revert screenshot update. handled in separate pr

* load rule update

* wip tutorial

* reformat deep storage endpoints

* rest of tutorial

* typo

* cleanup

* screenshot and sidebar for tutorial

* add license

* typos

* Apply suggestions from code review

Co-authored-by: Victoria Lim <[email protected]>

* rest of review comments

* clarify where results are stored

* update api reference for durablestorage context param

* Apply suggestions from code review

Co-authored-by: Karan Kumar <[email protected]>

* comments

* incorporate apache#14720

* address rest of comments

* missed one

* Update docs/api-reference/sql-api.md

* Update docs/api-reference/sql-api.md

---------

Co-authored-by: Victoria Lim <[email protected]>
Co-authored-by: demo-kratia <[email protected]>
Co-authored-by: Karan Kumar <[email protected]>
4 people authored Aug 4, 2023
1 parent d31c04c commit 3b5b6c6
Showing 10 changed files with 1,453 additions and 61 deletions.
813 changes: 812 additions & 1 deletion docs/api-reference/sql-api.md

Large diffs are not rendered by default.

23 changes: 15 additions & 8 deletions docs/design/architecture.md
@@ -70,12 +70,20 @@ Druid uses deep storage to store any data that has been ingested into the system. Deep storage is shared file
storage accessible by every Druid server. In a clustered deployment, this is typically a distributed object store like S3 or
HDFS, or a network mounted filesystem. In a single-server deployment, this is typically local disk.

Druid uses deep storage only as a backup of your data and as a way to transfer data in the background between
Druid processes. Druid stores data in files called _segments_. Historical processes cache data segments on
local disk and serve queries from that cache as well as from an in-memory cache.
This means that Druid never needs to access deep storage
during a query, helping it offer the best query latencies possible. It also means that you must have enough disk space
both in deep storage and across your Historical servers for the data you plan to load.
Druid uses deep storage for the following purposes:

- To store all the data you ingest. Segments that get loaded onto Historical processes for low latency queries are also kept in deep storage for backup purposes. Additionally, segments that are only in deep storage can be used for [queries from deep storage](../querying/query-from-deep-storage.md).
- As a way to transfer data in the background between Druid processes. Druid stores data in files called _segments_.

Historical processes cache data segments on local disk and serve queries from that cache as well as from an in-memory cache.
Segments on disk for Historical processes provide the low latency querying performance Druid is known for.

You can also query directly from deep storage. When you query segments that exist only in deep storage, you trade some performance for the ability to query more of your data without necessarily having to scale your Historical processes.

When determining sizing for your storage, keep the following in mind:

- Deep storage needs to be able to hold all the data that you ingest into Druid.
- Disk storage for Historical processes needs to be able to accommodate the data you want to load onto them to run queries. The data on Historical processes should be data that you access frequently and need to query with low latency.

Deep storage is an important part of Druid's elastic, fault-tolerant design. Druid bootstraps from deep storage even
if every single data server is lost and re-provisioned.
@@ -210,8 +218,7 @@ available before they are published, since they are only published when the segment will not receive
any additional rows of data.
2. **Deep storage:** Segment data files are pushed to deep storage once a segment is done being constructed. This
happens immediately before publishing metadata to the metadata store.
3. **Availability for querying:** Segments are available for querying on some Druid data server, like a realtime task
or a Historical process.
3. **Availability for querying:** Segments are available for querying on some Druid data server, like a realtime task or a Historical process, or directly from deep storage, as sketched below.
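For instance, here is a minimal sketch of checking where a segment is served, using the `sys.segments` metadata table described below; the datasource name `wikipedia` is a hypothetical placeholder, and the payload is POSTed to the SQL endpoint `/druid/v2/sql`:

```json
{
  "query": "SELECT segment_id, is_published, is_available, is_realtime FROM sys.segments WHERE datasource = 'wikipedia'"
}
```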

You can inspect the state of currently active segments using the Druid SQL
[`sys.segments` table](../querying/sql-metadata-tables.md#segments-table). It includes the following flags:
26 changes: 19 additions & 7 deletions docs/design/deep-storage.md
@@ -23,9 +23,15 @@ title: "Deep storage"
-->


Deep storage is where segments are stored. It is a storage mechanism that Apache Druid does not provide. This deep storage infrastructure defines the level of durability of your data, as long as Druid processes can see this storage infrastructure and get at the segments stored on it, you will not lose data no matter how many Druid nodes you lose. If segments disappear from this storage layer, then you will lose whatever data those segments represented.
Deep storage is where segments are stored. It is a storage mechanism that Apache Druid does not provide. This deep storage infrastructure defines the level of durability of your data. As long as Druid processes can see this storage infrastructure and get at the segments stored on it, you will not lose data no matter how many Druid nodes you lose. If segments disappear from this storage layer, then you will lose whatever data those segments represented.

## Local
Besides being the backing store for segments, deep storage is also directly queryable: you can use [query from deep storage](#querying-from-deep-storage) to run queries against segments stored primarily in deep storage. The [load rules](../operations/rule-configuration.md#load-rules) you configure determine whether segments exist primarily in deep storage or in a combination of deep storage and Historical processes, as in the sketch below.
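For example, here is a hedged sketch of a load rule that keeps segments only in deep storage by assigning them no Historical replicas; the empty `tieredReplicants` object is an assumption for illustration, so check the load rules page for the authoritative fields:

```json
{
  "type": "loadForever",
  "tieredReplicants": {}
}
```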

## Deep storage options

Druid supports multiple options for deep storage, including blob storage from major cloud providers. Select the one that fits your environment.

### Local

Local storage is intended for use in the following situations:

Expand Down Expand Up @@ -55,22 +61,28 @@ druid.storage.storageDirectory=/tmp/druid/localStorage
The `druid.storage.storageDirectory` must be set to a different path than `druid.segmentCache.locations` or
`druid.segmentCache.infoDir`.

## Amazon S3 or S3-compatible
### Amazon S3 or S3-compatible

See [`druid-s3-extensions`](../development/extensions-core/s3.md).

## Google Cloud Storage
### Google Cloud Storage

See [`druid-google-extensions`](../development/extensions-core/google.md).

## Azure Blob Storage
### Azure Blob Storage

See [`druid-azure-extensions`](../development/extensions-core/azure.md).

## HDFS
### HDFS

See [druid-hdfs-storage extension documentation](../development/extensions-core/hdfs.md).

## Additional options
### Additional options

For additional deep storage options, please see our [extensions list](../configuration/extensions.md).

## Querying from deep storage

Although not as performant as querying segments stored on local disk for Historical processes, querying from deep storage lets you access segments that you don't need frequently or with the extremely low latency that Druid queries traditionally provide. You trade some performance for a lower total storage cost because you can access more of your data without increasing the number or capacity of your Historical processes.

For information about how to run queries, see [Query from deep storage](../querying/query-from-deep-storage.md).
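As a rough sketch of what such a query looks like, you can POST a payload like the following to the asynchronous SQL statements endpoint `/druid/v2/sql/statements`; the datasource name `wikipedia` is a hypothetical placeholder, and the authoritative payload is documented in the SQL API reference:

```json
{
  "query": "SELECT page, COUNT(*) AS edits FROM wikipedia GROUP BY page",
  "context": {
    "executionMode": "ASYNC"
  }
}
```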
65 changes: 24 additions & 41 deletions docs/multi-stage-query/reference.md
@@ -343,59 +343,42 @@ CLUSTERED BY user

The context parameter that sets `sqlJoinAlgorithm` to `sortMerge` is not shown in the above example.

## Durable Storage
## Durable storage

Using durable storage with your SQL-based ingestion can improve their reliability by writing intermediate files to a storage location temporarily.
SQL-based ingestion supports using durable storage to store intermediate files temporarily. Enabling it can improve reliability. For more information, see [Durable storage](../operations/durable-storage.md).

To prevent durable storage from getting filled up with temporary files in case the tasks fail to clean them up, a periodic
cleaner can be scheduled to clean the directories corresponding to which there isn't a controller task running. It utilizes
the storage connector to work upon the durable storage. The durable storage location should only be utilized to store the output
for cluster's MSQ tasks. If the location contains other files or directories, then they will get cleaned up as well.

Enabling durable storage also enables the use of local disk to store temporary files, such as the intermediate files produced
by the super sorter. Tasks will use whatever has been configured for their temporary usage as described in [Configuring task storage sizes](../ingestion/tasks.md#configuring-task-storage-sizes)
If the configured limit is too low, `NotEnoughTemporaryStorageFault` may be thrown.

### Enable durable storage

To enable durable storage, you need to set the following common service properties:

```
druid.msq.intermediate.storage.enable=true
druid.msq.intermediate.storage.type=s3
druid.msq.intermediate.storage.bucket=YOUR_BUCKET
druid.msq.intermediate.storage.prefix=YOUR_PREFIX
druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir
```

For detailed information about the settings related to durable storage, see [Durable storage configurations](#durable-storage-configurations).


### Use durable storage for queries

When you run a query, include the context parameter `durableShuffleStorage` and set it to `true`.

For queries where you want to use fault tolerance for workers, set `faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to `true`.

Set `selectDestination`:`durableStorage` for select queries that want to write the final results to durable storage instead of the task reports. Saving the results in the durable
storage allows users to fetch large result sets. The location where the workers write the intermediate results is different than the location where final results get stored. Therefore, `durableShuffleStorage`:`false` and
`selectDestination`:`durableStorage` is a valid configuration to use in the query context, that instructs the controller to persist only the final result in the durable storage, and not the
intermediate results.
## Durable storage configurations

The following common service properties control how durable storage behaves:

|Parameter |Default | Description |
|-------------------|----------------------------------------|----------------------|
|`druid.msq.intermediate.storage.bucket` | n/a | The bucket in S3 where you want to store intermediate files. |
|`druid.msq.intermediate.storage.chunkSize` | 100MiB | Optional. Defines the size of each chunk to temporarily store in `druid.msq.intermediate.storage.tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls made to the durable storage; however, it requires more disk space to store the temporary chunks. Druid uses a default of 100MiB if the value is not provided.|
|`druid.msq.intermediate.storage.enable` | true | Required. Whether to enable durable storage for the cluster.|
|`druid.msq.intermediate.storage.maxRetry` | 10 | Optional. Defines the maximum number of times to attempt S3 API calls to avoid failures due to transient errors. |
|`druid.msq.intermediate.storage.prefix` | n/a | S3 prefix to store intermediate stage results. Provide a unique value for the prefix. Don't share the same prefix between clusters. If the location includes other files or directories, then they will get cleaned up as well. |
|`druid.msq.intermediate.storage.tempDir`| n/a | Required. Directory path on the local disk to temporarily store intermediate stage results. |
|`druid.msq.intermediate.storage.type` | `s3` if your deep storage is S3 | Required. The type of storage to use. You can either set this to `local` or `s3`. |

### Durable storage configurations

The following common service properties control how durable storage behaves:

|Parameter |Default | Description |
|-------------------|----------------------------------------|----------------------|
|`druid.msq.intermediate.storage.enable` | true | Required. Whether to enable durable storage for the cluster. For more information about enabling durable storage, see [Durable storage](../operations/durable-storage.md).|
|`druid.msq.intermediate.storage.type` | `s3` for Amazon S3 | Required. The type of storage to use. `s3` is the only supported storage type. |
|`druid.msq.intermediate.storage.bucket` | n/a | The S3 bucket to store intermediate files. |
|`druid.msq.intermediate.storage.prefix` | n/a | S3 prefix to store intermediate stage results. Provide a unique value for the prefix. Don't share the same prefix between clusters. If the location includes other files or directories, then they will get cleaned up as well. |
|`druid.msq.intermediate.storage.tempDir`| n/a | Required. Directory path on the local disk to temporarily store intermediate stage results. |
|`druid.msq.intermediate.storage.maxRetry` | 10 | Optional. Defines the maximum number of times to attempt S3 API calls to avoid failures due to transient errors. |
|`druid.msq.intermediate.storage.chunkSize` | 100MiB | Optional. Defines the size of each chunk to temporarily store in `druid.msq.intermediate.storage.tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls made to the durable storage; however, it requires more disk space to store the temporary chunks. Druid uses a default of 100MiB if the value is not provided.|

In addition to the common service properties, there are certain properties that you configure on the Overlord specifically to clean up intermediate files:

|Parameter |Default | Description |
|-------------------|----------------------------------------|----------------------|
|`druid.msq.intermediate.storage.cleaner.enabled`| false | Optional. Whether the durable storage cleaner should be enabled for the cluster. |
|`druid.msq.intermediate.storage.cleaner.delaySeconds`| 86400 | Optional. The delay (in seconds) after the last run, after which the durable storage cleaner cleans up the outputs. |

86 changes: 86 additions & 0 deletions docs/operations/durable-storage.md
@@ -0,0 +1,86 @@
---
id: durable-storage
title: "Durable storage for the multi-stage query engine"
sidebar_label: "Durable storage"
---

<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

You can use durable storage to improve querying from deep storage and SQL-based ingestion.

> Note that only S3 is supported as a durable storage location.

Durable storage for queries from deep storage provides a location where you can write the results of deep storage queries. Durable storage for SQL-based ingestion is used to temporarily house intermediate files, which can improve reliability.

Enabling durable storage also enables the use of local disk to store temporary files, such as the intermediate files produced
while sorting the data. Tasks will use whatever has been configured for their temporary usage as described in [Configuring task storage sizes](../ingestion/tasks.md#configuring-task-storage-sizes).
If the configured limit is too low, Druid may throw a `NotEnoughTemporaryStorageFault` error.

## Enable durable storage

To enable durable storage, you need to set the following common service properties:

```
druid.msq.intermediate.storage.enable=true
druid.msq.intermediate.storage.type=s3
druid.msq.intermediate.storage.bucket=YOUR_BUCKET
druid.msq.intermediate.storage.prefix=YOUR_PREFIX
druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir
```

For detailed information about the settings related to durable storage, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations).


## Use durable storage for SQL-based ingestion queries

When you run a query, include the context parameter `durableShuffleStorage` and set it to `true`.

For queries where you want to use fault tolerance for workers, set `faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to `true`.
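For example, here is a minimal sketch of a SQL-based ingestion payload that enables durable shuffle storage through the query context; the INSERT statement and datasource names are placeholders, and the payload is POSTed to the MSQ task endpoint `/druid/v2/sql/task`:

```json
{
  "query": "INSERT INTO target_datasource SELECT * FROM source_datasource PARTITIONED BY DAY",
  "context": {
    "durableShuffleStorage": true
  }
}
```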

## Use durable storage for queries from deep storage

Depending on the size of the results you're expecting, you may need to save the final results for queries from deep storage to durable storage.

By default, Druid saves the final results for queries from deep storage to task reports. Generally, this is acceptable for smaller result sets but may lead to timeouts for larger result sets.

When you run a query, include the context parameter `selectDestination` and set it to `DURABLESTORAGE`:

```json
"context":{
...
"selectDestination": "DURABLESTORAGE"
}
```

You can also write intermediate results to durable storage (`durableShuffleStorage`) for better reliability. The location where workers write intermediate results differs from the location where final results get stored. This means that you can enable durable storage for final results even if you don't write intermediate results to durable storage.

If you write the results for queries from deep storage to durable storage, the results are cleaned up when the task is removed from the metadata store.

## Durable storage cleanup

To prevent durable storage from filling up with temporary files in case tasks fail to clean them up, you can schedule a periodic cleaner that cleans directories for which there is no controller task running. The cleaner uses the storage connector to operate on the durable storage location. Use the durable storage location only to store output for the cluster's MSQ tasks; if the location contains other files or directories, they get cleaned up as well.

Use `druid.msq.intermediate.storage.cleaner.enabled` and `druid.msq.intermediate.storage.cleaner.delaySeconds` to configure the cleaner. For more information, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations).
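A minimal sketch of the corresponding Overlord runtime properties, using the default delay from the configuration reference:

```
# Overlord runtime.properties: turn on the periodic durable storage cleaner
druid.msq.intermediate.storage.cleaner.enabled=true
# Clean up outputs a day (86400 seconds) after the last run
druid.msq.intermediate.storage.cleaner.delaySeconds=86400
```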

Note that if you choose to write query results to durable storage, the results are cleaned up when the task is removed from the metadata store.
