[8.x] [DOCS] Concept cleanup 2 - ES settings (#119373) (#119642)
shainaraskas authored Jan 10, 2025
1 parent 8a14c14 commit ae3db60
Showing 51 changed files with 959 additions and 856 deletions.
4 changes: 2 additions & 2 deletions docs/plugins/discovery-ec2.asciidoc
@@ -241,7 +241,7 @@ The `discovery-ec2` plugin can automatically set the `aws_availability_zone`
node attribute to the availability zone of each node. This node attribute
allows you to ensure that each shard has copies allocated redundantly across
multiple availability zones by using the
{ref}/modules-cluster.html#shard-allocation-awareness[Allocation Awareness]
{ref}/shard-allocation-awareness.html[Allocation Awareness]
feature.
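
For instance, once each node carries this attribute, a minimal `elasticsearch.yml` sketch could make {es} use it for allocation awareness (the setting name comes from the shard allocation awareness docs; its use here is illustrative):

[source,yaml]
----
cluster.routing.allocation.awareness.attributes: aws_availability_zone
----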

In order to enable the automatic definition of the `aws_availability_zone`
@@ -333,7 +333,7 @@ labelled as `Moderate` or `Low`.

* It is a good idea to distribute your nodes across multiple
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html[availability
zones] and use {ref}/modules-cluster.html#shard-allocation-awareness[shard
zones] and use {ref}/shard-allocation-awareness.html[shard
allocation awareness] to ensure that each shard has copies in more than one
availability zone.

2 changes: 1 addition & 1 deletion docs/reference/cat/nodeattrs.asciidoc
@@ -17,7 +17,7 @@ console. They are _not_ intended for use by applications. For application
consumption, use the <<cluster-nodes-info,nodes info API>>.
====

Returns information about custom node attributes.
Returns information about <<custom-node-attributes,custom node attributes>>.

[[cat-nodeattrs-api-request]]
==== {api-request-title}
2 changes: 1 addition & 1 deletion docs/reference/cluster.asciidoc
@@ -35,7 +35,7 @@ one of the following:
master-eligible nodes, all data nodes, all ingest nodes, all voting-only
nodes, all machine learning nodes, and all coordinating-only nodes.
* a pair of patterns, using `*` wildcards, of the form `attrname:attrvalue`,
which adds to the subset all nodes with a custom node attribute whose name
which adds to the subset all nodes with a <<custom-node-attributes,custom node attribute>> whose name
and value match the respective patterns. Custom node attributes are
configured by setting properties in the configuration file of the form
`node.attr.attrname: attrvalue`.
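+
For instance, a sketch using a hypothetical `rack` attribute (both the attribute name and value are illustrative):
+
[source,console]
----
GET /_nodes/rack:rack_one
----
+
This would match all nodes configured with `node.attr.rack: rack_one`.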
4 changes: 2 additions & 2 deletions docs/reference/commands/node-tool.asciidoc
@@ -23,8 +23,8 @@ bin/elasticsearch-node repurpose|unsafe-bootstrap|detach-cluster|override-versio
This tool has a number of modes:

* `elasticsearch-node repurpose` can be used to delete unwanted data from a
node if it used to be a <<data-node,data node>> or a
<<master-node,master-eligible node>> but has been repurposed not to have one
node if it used to be a <<data-node-role,data node>> or a
<<master-node-role,master-eligible node>> but has been repurposed not to have one
or other of these roles.

* `elasticsearch-node remove-settings` can be used to remove persistent settings
2 changes: 1 addition & 1 deletion docs/reference/data-management.asciidoc
@@ -43,7 +43,7 @@ Data older than this period can be deleted by {es} at a later time.

**Elastic Curator** is a tool that allows you to manage your indices and snapshots using user-defined filters and predefined actions. If ILM provides the functionality to manage your index lifecycle, and you have at least a Basic license, consider using ILM in place of Curator. Many stack components make use of ILM by default. {curator-ref-current}/ilm.html[Learn more].

NOTE: <<xpack-rollup,Data rollup>> is a deprecated Elasticsearch feature that allows you to manage the amount of data that is stored in your cluster, similar to the downsampling functionality of {ilm-init} and data stream lifecycle. This feature should not be used for new deployments.
NOTE: <<xpack-rollup,Data rollup>> is a deprecated {es} feature that allows you to manage the amount of data that is stored in your cluster, similar to the downsampling functionality of {ilm-init} and data stream lifecycle. This feature should not be used for new deployments.

[TIP]
====
@@ -2,7 +2,7 @@
[[migrate-index-allocation-filters]]
== Migrate index allocation filters to node roles

If you currently use custom node attributes and
If you currently use <<custom-node-attributes,custom node attributes>> and
<<shard-allocation-filtering, attribute-based allocation filters>> to
move indices through <<data-tiers, data tiers>> in a
https://www.elastic.co/blog/implementing-hot-warm-cold-in-elasticsearch-with-index-lifecycle-management[hot-warm-cold architecture],
8 changes: 7 additions & 1 deletion docs/reference/data-store-architecture.asciidoc
@@ -9,10 +9,16 @@ from any node.
The topics in this section provide information about the architecture of {es} and how it stores and retrieves data:

* <<nodes-shards,Nodes and shards>>: Learn about the basic building blocks of an {es} cluster, including nodes, shards, primaries, and replicas.
* <<node-roles-overview,Node roles>>: Learn about the different roles that nodes can have in an {es} cluster.
* <<docs-replication,Reading and writing documents>>: Learn how {es} replicates read and write operations across shards and shard copies.
* <<shard-allocation-relocation-recovery,Shard allocation, relocation, and recovery>>: Learn how {es} allocates and balances shards across nodes.
** <<shard-allocation-awareness,Shard allocation awareness>>: Learn how to use custom node attributes to distribute shards across different racks or availability zones.
* <<shard-request-cache,Shard request cache>>: Learn how {es} caches search requests to improve performance.
--
include::nodes-shards.asciidoc[]
include::node-roles.asciidoc[]
include::docs/data-replication.asciidoc[leveloffset=-1]
include::modules/shard-ops.asciidoc[]
include::modules/cluster/allocation_awareness.asciidoc[leveloffset=+1]
include::shard-request-cache.asciidoc[leveloffset=-1]
39 changes: 39 additions & 0 deletions docs/reference/data-streams/downsampling.asciidoc
@@ -72,6 +72,45 @@ the granularity of `cold` archival data to monthly or less.
.Downsampled metrics series
image::images/data-streams/time-series-downsampled.png[align="center"]

[discrete]
[[downsample-api-process]]
==== The downsampling process

The downsampling operation traverses the source TSDS index and performs the
following steps:

. Creates a new document for each value of the `_tsid` field and each
`@timestamp` value, rounded to the `fixed_interval` defined in the downsample
configuration.
. For each new document, copies all <<time-series-dimension,time
series dimensions>> from the source index to the target index. Dimensions in a
TSDS are constant, so this is done only once per bucket.
. For each <<time-series-metric,time series metric>> field, computes aggregations
for all documents in the bucket. Depending on the metric type of each metric
field, a different set of pre-aggregated results is stored:

** `gauge`: The `min`, `max`, `sum`, and `value_count` are stored; `value_count`
is stored as type `aggregate_metric_double`.
** `counter`: The `last_value` is stored.
. For all other fields, the most recent value is copied to the target index.
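
As an illustration, these steps run when the downsample API is invoked on a source index (a sketch; the index names and interval are hypothetical):

[source,console]
----
POST /my-tsds-index/_downsample/my-tsds-index-downsampled
{
  "fixed_interval": "1h"
}
----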

[discrete]
[[downsample-api-mappings]]
==== Source and target index field mappings

Fields in the target downsampled index are created based on fields in the
original source index, as follows:

. All fields mapped with the `time_series_dimension` parameter are created in
the target downsample index with the same mapping as in the source index.
. All fields mapped with the `time_series_metric` parameter are created
in the target downsample index with the same mapping as in the source
index. An exception is that for fields mapped as `time_series_metric: gauge`
the field type is changed to `aggregate_metric_double`.
. All other fields that are neither dimensions nor metrics (that is, label
fields) are created in the target downsample index with the same mapping
that they had in the source index.
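
For example, a gauge field's mapping might change as follows (a sketch with a hypothetical `cpu_usage` field; the exact `metrics` and `default_metric` values are illustrative):

[source,js]
----
// Source index mapping:
"cpu_usage": { "type": "double", "time_series_metric": "gauge" }

// Target downsample index mapping:
"cpu_usage": {
  "type": "aggregate_metric_double",
  "metrics": [ "min", "max", "sum", "value_count" ],
  "default_metric": "max",
  "time_series_metric": "gauge"
}
----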

[discrete]
[[running-downsampling]]
=== Running downsampling on time series data
2 changes: 1 addition & 1 deletion docs/reference/datatiers.asciidoc
@@ -190,7 +190,7 @@ tier].
[[configure-data-tiers-on-premise]]
==== Self-managed deployments

For self-managed deployments, each node's <<data-node,data role>> is configured
For self-managed deployments, each node's <<data-node-role,data role>> is configured
in `elasticsearch.yml`. For example, the highest-performance nodes in a cluster
might be assigned to both the hot and content tiers:
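
A minimal `elasticsearch.yml` sketch of such a node (the exact roles list is illustrative):

[source,yaml]
----
node.roles: [ data_hot, data_content ]
----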

10 changes: 5 additions & 5 deletions docs/reference/high-availability/cluster-design.asciidoc
@@ -87,7 +87,7 @@ the same thing, but it's not necessary to use this feature in such a small
cluster.

We recommend you set only one of your two nodes to be
<<master-node,master-eligible>>. This means you can be certain which of your
<<master-node-role,master-eligible>>. This means you can be certain which of your
nodes is the elected master of the cluster. The cluster can tolerate the loss of
the other master-ineligible node. If you set both nodes to master-eligible, two
nodes are required for a master election. Since the election will fail if either
@@ -164,12 +164,12 @@ cluster that is suitable for production deployments.
[[high-availability-cluster-design-three-nodes]]
==== Three-node clusters

If you have three nodes, we recommend they all be <<data-node,data nodes>> and
If you have three nodes, we recommend they all be <<data-node-role,data nodes>> and
every index that is not a <<searchable-snapshots,searchable snapshot index>>
should have at least one replica. Nodes are data nodes by default. You may
prefer for some indices to have two replicas so that each node has a copy of
each shard in those indices. You should also configure each node to be
<<master-node,master-eligible>> so that any two of them can hold a master
<<master-node-role,master-eligible>> so that any two of them can hold a master
election without needing to communicate with the third node. Nodes are
master-eligible by default. This cluster will be resilient to the loss of any
single node.
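
A sketch of how these two roles could be set explicitly in each node's `elasticsearch.yml` (nodes have these roles by default; listing `node.roles` explicitly disables any roles not listed):

[source,yaml]
----
node.roles: [ master, data ]
----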
@@ -188,8 +188,8 @@ service provides such a load balancer.

Once your cluster grows to more than three nodes, you can start to specialise
these nodes according to their responsibilities, allowing you to scale their
resources independently as needed. You can have as many <<data-node,data
nodes>>, <<ingest,ingest nodes>>, <<ml-node,{ml} nodes>>, etc. as needed to
resources independently as needed. You can have as many <<data-node-role,data
nodes>>, <<ingest,ingest nodes>>, <<ml-node-role,{ml} nodes>>, etc. as needed to
support your workload. As your cluster grows larger, we recommend using
dedicated nodes for each role. This allows you to independently scale resources
for each task.
2 changes: 1 addition & 1 deletion docs/reference/ilm/apis/migrate-to-data-tiers.asciidoc
@@ -11,7 +11,7 @@
For the most up-to-date API details, refer to {api-es}/group/endpoint-ilm[{ilm-cap} APIs].
--

Switches the indices, ILM policies, and legacy, composable and component templates from using custom node attributes and
Switches the indices, ILM policies, and legacy, composable and component templates from using <<custom-node-attributes,custom node attributes>> and
<<shard-allocation-filtering, attribute-based allocation filters>> to using <<data-tiers, data tiers>>, and
optionally deletes one legacy index template.
Using node roles enables {ilm-init} to <<data-tier-migration, automatically move the indices>> between
@@ -13,7 +13,7 @@ This setting corresponds to the data node roles:
* <<data-cold-node, data_cold>>
* <<data-frozen-node, data_frozen>>

NOTE: The <<data-node, data>> role is not a valid data tier and cannot be used
NOTE: The <<data-node-role, data>> role is not a valid data tier and cannot be used
with the `_tier_preference` setting. The frozen tier stores <<partially-mounted,partially
mounted indices>> exclusively.
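
For example, a tier preference could be applied to an index like this (a sketch; the index name is hypothetical):

[source,console]
----
PUT /my-index/_settings
{
  "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}
----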

4 changes: 2 additions & 2 deletions docs/reference/index-modules/allocation/filtering.asciidoc
@@ -6,7 +6,7 @@ a particular index. These per-index filters are applied in conjunction with
<<cluster-shard-allocation-filtering, cluster-wide allocation filtering>> and
<<shard-allocation-awareness, allocation awareness>>.

Shard allocation filters can be based on custom node attributes or the built-in
Shard allocation filters can be based on <<custom-node-attributes,custom node attributes>> or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id`, `_tier` and `_tier_preference`
attributes. <<index-lifecycle-management, Index lifecycle management>> uses filters based
on custom node attributes to determine how to reallocate shards when moving
@@ -114,7 +114,7 @@ The index allocation settings support the following built-in attributes:

NOTE: `_tier` filtering is based on <<modules-node, node>> roles. Only
a subset of roles are <<data-tiers, data tier>> roles, and the generic
<<data-node, data role>> will match any tier filtering.
<<data-node-role, data role>> will match any tier filtering.

You can use wildcards when specifying attribute values, for example:
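
A sketch of the kind of request this refers to (the index name and IP pattern are illustrative):

[source,console]
----
PUT /test/_settings
{
  "index.routing.allocation.include._ip": "192.168.2.*"
}
----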

44 changes: 3 additions & 41 deletions docs/reference/indices/downsample-data-stream.asciidoc
@@ -81,6 +81,8 @@ DELETE _index_template/*
////
// end::downsample-example[]

Check the <<downsampling,Downsampling>> documentation for an overview, details about the downsampling process, and examples of running downsampling manually and as part of an ILM policy.

[[downsample-api-request]]
==== {api-request-title}

@@ -121,44 +123,4 @@ to aggregate the original time series index. For example, `60m` produces a
document for each 60 minute (hourly) interval. This follows standard time
formatting syntax as used elsewhere in {es}.
+
NOTE: Smaller, more granular intervals take up proportionally more space.

[[downsample-api-process]]
==== The downsampling process

The downsampling operation traverses the source TSDS index and performs the
following steps:

. Creates a new document for each value of the `_tsid` field and each
`@timestamp` value, rounded to the `fixed_interval` defined in the downsample
configuration.
. For each new document, copies all <<time-series-dimension,time
series dimensions>> from the source index to the target index. Dimensions in a
TSDS are constant, so this is done only once per bucket.
. For each <<time-series-metric,time series metric>> field, computes aggregations
for all documents in the bucket. Depending on the metric type of each metric
field a different set of pre-aggregated results is stored:

** `gauge`: The `min`, `max`, `sum`, and `value_count` are stored; `value_count`
is stored as type `aggregate_metric_double`.
** `counter`: The `last_value` is stored.
. For all other fields, the most recent value is copied to the target index.

[[downsample-api-mappings]]
==== Source and target index field mappings

Fields in the target, downsampled index are created based on fields in the
original source index, as follows:

. All fields mapped with the `time-series-dimension` parameter are created in
the target downsample index with the same mapping as in the source index.
. All fields mapped with the `time_series_metric` parameter are created
in the target downsample index with the same mapping as in the source
index. An exception is that for fields mapped as `time_series_metric: gauge`
the field type is changed to `aggregate_metric_double`.
. All other fields that are neither dimensions nor metrics (that is, label
fields), are created in the target downsample index with the same mapping
that they had in the source index.

Check the <<downsampling,Downsampling>> documentation for an overview and
examples of running downsampling manually and as part of an ILM policy.
NOTE: Smaller, more granular intervals take up proportionally more space.
18 changes: 17 additions & 1 deletion docs/reference/modules/cluster.asciidoc
@@ -27,7 +27,23 @@ include::cluster/shards_allocation.asciidoc[]

include::cluster/disk_allocator.asciidoc[]

include::cluster/allocation_awareness.asciidoc[]
[[shard-allocation-awareness-settings]]
==== Shard allocation awareness settings

You can use <<custom-node-attributes,custom node attributes>> as _awareness attributes_ to enable {es}
to take your physical hardware configuration into account when allocating shards.
If {es} knows which nodes are on the same physical server, in the same rack, or
in the same zone, it can distribute the primary shard and its replica shards to
minimize the risk of losing all shard copies in the event of a failure. <<shard-allocation-awareness,Learn more about shard allocation awareness>>.

`cluster.routing.allocation.awareness.attributes`::
(<<dynamic-cluster-setting,Dynamic>>)
The node attributes that {es} should use as awareness attributes. For example, if you have a `rack_id` attribute that specifies the rack in which each node resides, you can set this value to `rack_id` to ensure that primary and replica shards are not allocated on the same rack. You can specify multiple attributes as a comma-separated list.

`cluster.routing.allocation.awareness.force.*`::
(<<dynamic-cluster-setting,Dynamic>>)
The shard allocation awareness values that must exist for shards to be reallocated in case of location failure. Learn more about <<forced-awareness,forced awareness>>.
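
A sketch of both settings applied dynamically (the `rack_id` attribute and its values are illustrative):

[source,console]
----
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_id",
    "cluster.routing.allocation.awareness.force.rack_id.values": "rack_one,rack_two"
  }
}
----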


include::cluster/allocation_filtering.asciidoc[]

15 changes: 5 additions & 10 deletions docs/reference/modules/cluster/allocation_awareness.asciidoc
@@ -1,18 +1,13 @@
[[shard-allocation-awareness]]
==== Shard allocation awareness
== Shard allocation awareness

You can use custom node attributes as _awareness attributes_ to enable {es}
to take your physical hardware configuration into account when allocating shards.
If {es} knows which nodes are on the same physical server, in the same rack, or
in the same zone, it can distribute the primary shard and its replica shards to
minimize the risk of losing all shard copies in the event of a failure.

When shard allocation awareness is enabled with the
<<dynamic-cluster-setting,dynamic>>
`cluster.routing.allocation.awareness.attributes` setting, shards are only
allocated to nodes that have values set for the specified awareness attributes.
If you use multiple awareness attributes, {es} considers each attribute
separately when allocating shards.
When shard allocation awareness is enabled with the `cluster.routing.allocation.awareness.attributes` setting, shards are only allocated to nodes that have values set for the specified awareness attributes. If you use multiple awareness attributes, {es} considers each attribute separately when allocating shards.

NOTE: The number of attribute values determines how many shard copies are
allocated in each location. If the number of nodes in each location is
@@ -22,11 +17,11 @@ unassigned.
TIP: Learn more about <<high-availability-cluster-design-large-clusters,designing resilient clusters>>.

[[enabling-awareness]]
===== Enabling shard allocation awareness
=== Enabling shard allocation awareness

To enable shard allocation awareness:

. Specify the location of each node with a custom node attribute. For example,
. Specify the location of each node with a <<custom-node-attributes,custom node attribute>>. For example,
if you want Elasticsearch to distribute shards across different racks, you might
use an awareness attribute called `rack_id`.
+
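For instance, a node's location could be declared in `elasticsearch.yml` (a sketch; the attribute value is illustrative):
+
[source,yaml]
----
node.attr.rack_id: rack_one
----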
@@ -94,7 +89,7 @@ copies of a particular shard from being allocated in the same location, you can
enable forced awareness.

[[forced-awareness]]
===== Forced awareness
=== Forced awareness

By default, if one location fails, {es} spreads its shards across the remaining
locations. This might be undesirable if the cluster does not have sufficient
6 changes: 3 additions & 3 deletions docs/reference/modules/cluster/allocation_filtering.asciidoc
@@ -6,7 +6,7 @@ allocates shards from any index. These cluster wide filters are applied in
conjunction with <<shard-allocation-filtering, per-index allocation filtering>>
and <<shard-allocation-awareness, allocation awareness>>.

Shard allocation filters can be based on custom node attributes or the built-in
Shard allocation filters can be based on <<custom-node-attributes,custom node attributes>> or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id` and `_tier` attributes.

The `cluster.routing.allocation` settings are <<dynamic-cluster-setting,dynamic>>, enabling live indices to
@@ -59,9 +59,9 @@ The cluster allocation settings support the following built-in attributes:

NOTE: `_tier` filtering is based on <<modules-node, node>> roles. Only
a subset of roles are <<data-tiers, data tier>> roles, and the generic
<<data-node, data role>> will match any tier filtering.
<<data-node-role, data role>> will match any tier filtering.


You can use wildcards when specifying attribute values, for example:
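
A sketch of the cluster-level form of such a request (the IP pattern is illustrative):

[source,console]
----
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._ip": "192.168.2.*"
  }
}
----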