Add metadata fields for mappings (content gap initiative) #6933
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,38 +14,57 @@ | |
|
||
You can define how documents and their fields are stored and indexed by creating a _mapping_. The mapping specifies the list of fields for a document. Every field in the document has a _field type_, which defines the type of data the field contains. For example, you may want to specify that the `year` field should be of type `date`. To learn more, see [Supported field types]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/). | ||
|
||
If you're just starting to build out your cluster and data, you may not know exactly how your data should be stored. In those cases, you can use dynamic mappings, which tell OpenSearch to dynamically add data and its fields. However, if you know exactly what types your data falls under and want to enforce that standard, then you can use explicit mappings. | ||
If you're starting to build out your cluster and data, you may not know exactly how your data should be stored. In those cases, you can use dynamic mappings, which tell OpenSearch to dynamically add data and its fields. However, if you know exactly what types your data falls under and want to enforce that standard, then you can use explicit mappings. | ||
|
||
|
||
For example, if you want to indicate that `year` should be of type `text` instead of an `integer`, and `age` should be an `integer`, you can do so with explicit mappings. By using dynamic mapping, OpenSearch might interpret both `year` and `age` as integers. | ||
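As a minimal sketch of the explicit mapping described above, the following Python snippet builds a create-index request body that maps `year` as `text` and `age` as `integer`. The helper name `build_explicit_mapping` is hypothetical, and the snippet only constructs the JSON body; sending it with a `PUT <index>` request is left to whichever client you use.

```python
import json


def build_explicit_mapping(field_types):
    """Return a create-index body with one explicit mapping per field."""
    return {
        "mappings": {
            "properties": {
                name: {"type": field_type}
                for name, field_type in field_types.items()
            }
        }
    }


# Field names and types taken from the example in the text.
body = build_explicit_mapping({"year": "text", "age": "integer"})
print(json.dumps(body, indent=2))
```

Because the types are declared up front, dynamic mapping never gets the chance to infer `year` as a number.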
|
||
This section provides an example for how to create an index mapping and how to add a document to it that will get ip_range validated. | ||
This documentation provides an example of how to create an index mapping and how to add a document to it that will get `ip_range` validated. | ||
|
||
#### Table of contents | ||
1. TOC | ||
{:toc} | ||
|
||
|
||
--- | ||
## Dynamic mapping | ||
|
||
When you index a document, OpenSearch adds fields automatically with dynamic mapping. You can also explicitly add fields to an index mapping. | ||
|
||
#### Dynamic mapping types | ||
### Dynamic mapping types | ||
|
||
Type | Description | ||
:--- | :--- | ||
null | A `null` field can't be indexed or searched. When a field is set to null, OpenSearch behaves as if that field has no values. | ||
|
||
boolean | OpenSearch accepts `true` and `false` as boolean values. An empty string is equal to `false.` | ||
Boolean | OpenSearch accepts `true` and `false` as Boolean values. An empty string is equal to `false`. | ||
Review comment: It should be lowercase only i.e.
|
||
float | A single-precision 32-bit floating point number. | ||
double | A double-precision 64-bit floating point number. | ||
integer | A signed 32-bit number. | ||
object | Objects are standard JSON objects, which can have fields and mappings of their own. For example, a `movies` object can have additional properties such as `title`, `year`, and `director`. | ||
array | Arrays in OpenSearch can only store values of one type, such as an array of just integers or strings. Empty arrays are treated as though they are fields with no values. | ||
array | Arrays in OpenSearch can only store values of one type, such as an array of only integers or strings. Empty arrays are treated as though they are fields with no values. | ||
Review comment: We should highlight that there is no specific array type, and a set of values can be passed for fields.
Reply: Revised
|
||
text | A string sequence of characters that represent full-text values. | ||
keyword | A string sequence of structured characters, such as an email address or ZIP code. | ||
date detection string | Enabled by default, if new string fields match a date's format, then the string is processed as a `date` field. For example, `date: "2012/03/11"` is processed as a date. | ||
numeric detection string | If disabled, OpenSearch may automatically process numeric values as strings when they should be processed as numbers. When enabled, OpenSearch can process strings into `long`, `integer`, `short`, `byte`, `double`, `float`, `half_float`, `scaled_float`, and `unsigned_long`. Default is disabled. | ||
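The two detection behaviors in the table can be switched at the mapping level. The sketch below assumes the mapping parameters are named `date_detection` and `numeric_detection`; it flips both defaults purely for illustration and only builds the request body.

```python
import json

# Illustrative values only: date detection is turned off and numeric
# detection is turned on, the opposite of the defaults listed in the table.
detection_body = {
    "mappings": {
        "date_detection": False,    # date-like strings are kept as strings
        "numeric_detection": True,  # numeric strings are mapped as numbers
    }
}

print(json.dumps(detection_body, indent=2))
```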
|
||
Review comment: Can we cover all field types from here? https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/indices/IndicesModule.java#L151-L174
Reply: Should we link to the GitHub file instead of listing the field types in the documentation?
Reply: Linking to code can be tricky as the code can move due to refactoring, making the documentation point to old/unusable links. Hence, I'd prefer to keep it within the documentation if possible. |
||
### Dynamic templates | ||
|
||
Dynamic templates are used to define custom mappings for dynamically added fields based on data type, field name, or field path. They allow you to define a flexible schema for your data, which can automatically adapt to changes in the structure or format of the input data. | ||
|
||
|
||
The syntax for defining a dynamic mapping template in OpenSearch looks like the following: | ||
|
||
```json | ||
"dynamic_templates": [ | ||
{ | ||
"template_name": { | ||
... match conditions ... | ||
"mapping": { ... } | ||
} | ||
}, | ||
... | ||
] | ||
``` | ||
Review comment: I can share an actual example here. Let me know if that will be better.
Reply: Yes, please share an example.
Reply: The above will create a field like |
||
{% include copy-curl.html %} | ||
|
||
Note the following: | ||
|
||
- The template name can be any string value. | ||
- The match conditions can include any of the following: `match_mapping_type`, `match`, `match_pattern`, `unmatch`, `path_match`, or `path_unmatch`. | ||
Review comment: The supported params for matching conditions with
Reply: Added. |
||
- Specify the `mapping` to be used for the matched field. | ||
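The notes above can be sketched as a small builder that combines a template name, one or more match conditions, and a `mapping`. The helper `dynamic_template` and both template names are hypothetical; only the condition keys (`match_mapping_type`, `match`) come from the list above.

```python
import json


def dynamic_template(name, mapping, **conditions):
    """Build one entry of the form {name: {<match conditions>, "mapping": mapping}}."""
    entry = dict(conditions)
    entry["mapping"] = mapping
    return {name: entry}


templates_body = {
    "mappings": {
        "dynamic_templates": [
            # Hypothetical template: map every detected long to an integer.
            dynamic_template(
                "longs_as_integers",
                {"type": "integer"},
                match_mapping_type="long",
            ),
            # Hypothetical template: map fields whose names end in _id to keyword.
            dynamic_template(
                "ids_as_keywords",
                {"type": "keyword"},
                match="*_id",
            ),
        ]
    }
}

print(json.dumps(templates_body, indent=2))
```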
|
||
|
||
## Explicit mapping | ||
|
||
If you know exactly what your field data types need to be, you can specify them in your request body when creating your index. | ||
|
@@ -62,15 +81,17 @@ | |
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
### Response | ||
#### Response | ||
```json | ||
{ | ||
"acknowledged": true, | ||
"shards_acknowledged": true, | ||
"index": "sample-index1" | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
To add mappings to an existing index or data stream, you can send a request to the `_mapping` endpoint using the `PUT` or `POST` HTTP method: | ||
|
||
|
@@ -84,11 +105,30 @@ | |
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
You cannot change the mapping of an existing field; you can only modify the field's mapping parameters. | ||
{: .note} | ||
|
||
## Mapping parameters | ||
|
||
Mapping parameters are used to configure the behavior of fields in an index. See the [Mapping parameters](inert-link-to-page) page for more information. | ||
Review comment: Is this link in progress?
|
||
|
||
## Mapping limit settings | ||
|
||
OpenSearch has certain limits or settings related to mappings, such as the settings listed in the following table. Settings can be configured based on your requirements. | ||
|
||
|
||
| Setting | Default value | Allowed value | Type | Description | | ||
|-|-|-|-|-| | ||
| index.mapping.nested_fields.limit | 50 | [0,) | Dynamic | Limits the maximum number of nested fields that can be defined in an index mapping. | | ||
| index.mapping.nested_objects.limit | 10000 | [0,) | Dynamic | Limits the maximum number of nested objects that can be created within a single document. | | ||
| index.mapping.total_fields.limit | 1000 | [0,) | Dynamic | Limits the maximum number of fields that can be defined in an index mapping. | | ||
| index.mapping.depth.limit | 20 | [1,100] | Dynamic | Limits the maximum depth of nested objects and nested fields that can be defined in an index mapping. | | ||
| index.mapping.field_name_length.limit | 50000 | [1,50000] | Dynamic | Limits the maximum length of field names that can be defined in an index mapping. | | ||
| index.mapper.dynamic | true | {true,false} | Dynamic | Determines whether new fields should be added dynamically to the mapping when they are encountered in a document. | | ||
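Because the limit settings in the table are dynamic, they can be changed on an existing index through its settings endpoint. The following sketch builds such a settings body; the helper `mapping_limit_settings` and the values `2000` and `100` are illustrative assumptions, and the body would be sent with a `PUT <index>/_settings` request.

```python
import json


def mapping_limit_settings(**limits):
    """Expand short names such as total_fields into index.mapping.<name>.limit keys."""
    return {f"index.mapping.{name}.limit": value for name, value in limits.items()}


# Illustrative values: raise the total field and nested field limits.
settings_body = mapping_limit_settings(total_fields=2000, nested_fields=100)
print(json.dumps(settings_body, indent=2))
```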
|
||
|
||
--- | ||
|
||
## Mapping example usage | ||
|
||
The following example shows how to create a mapping to specify that OpenSearch should ignore any documents with malformed IP addresses that do not conform to the [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) data type. You accomplish this by setting the `ignore_malformed` parameter to `true`. | ||
|
@@ -110,6 +150,7 @@ | |
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
You can add a document that has a malformed IP address to your index: | ||
Review comment: Should examples like this be moved to mapping parameters documentation?
Reply: Yes, we can move them. |
||
|
||
|
@@ -119,6 +160,7 @@ | |
"ip_address" : "malformed ip address" | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
This indexed IP address does not throw an error because `ignore_malformed` is set to true. | ||
|
||
|
@@ -127,6 +169,7 @@ | |
```json | ||
GET /test-index/_search | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
The response shows that the `ip_address` field is ignored in the indexed document: | ||
|
||
|
@@ -162,6 +205,7 @@ | |
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## Get a mapping | ||
|
||
|
@@ -170,21 +214,24 @@ | |
```json | ||
GET <index>/_mapping | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
In the above request, `<index>` may be an index name or a comma-separated list of index names. | ||
In the previous request, `<index>` may be an index name or a comma-separated list of index names. | ||
|
||
To get all mappings for all indexes, use the following request: | ||
|
||
```json | ||
GET _mapping | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
To get a mapping for a specific field, provide the index name and the field name: | ||
|
||
```json | ||
GET _mapping/field/<fields> | ||
GET /<index>/_mapping/field/<fields> | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
Both `<index>` and `<fields>` can be specified as one value or a comma-separated list. | ||
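As a small illustration of the comma-separated form, the following sketch joins lists of index and field names into the request path. The function name `field_mapping_path` is hypothetical, and the snippet does not send the request.

```python
def field_mapping_path(indexes, fields):
    """Build the path for GET /<index>/_mapping/field/<fields>."""
    if not indexes or not fields:
        raise ValueError("at least one index and one field are required")
    return f"/{','.join(indexes)}/_mapping/field/{','.join(fields)}"


print(field_mapping_path(["sample-index1"], ["year", "age"]))
# prints /sample-index1/_mapping/field/year,age
```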
|
||
|
@@ -193,6 +240,7 @@ | |
```json | ||
GET sample-index1/_mapping/field/year,age | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
The response contains the specified fields: | ||
|
||
|
@@ -220,3 +268,33 @@ | |
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## Map string fields to `text` and `keyword` types | ||
|
||
This request creates an index named `movies1` with a dynamic template that maps all string fields to both `text` and `keyword` types. | ||
|
||
```json | ||
PUT movies1 | ||
{ | ||
"mappings": { | ||
"dynamic_templates": [ | ||
{ | ||
"strings": { | ||
"match_mapping_type": "string", | ||
"mapping": { | ||
"type": "text", | ||
"fields": { | ||
"keyword": { | ||
"type": "keyword", | ||
"ignore_above": 256 | ||
} | ||
} | ||
} | ||
} | ||
} | ||
] | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,41 @@ | ||||||
--- | ||||||
layout: default | ||||||
title: Field names | ||||||
|
||||||
nav_order: 10 | ||||||
has_children: false | ||||||
|
||||||
parent: Metadata fields | ||||||
--- | ||||||
|
||||||
# Field names | ||||||
|
||||||
The `field_names` field indexes the names of fields within a document that contain non-null values. This field supports the `exists` query, which identifies documents with or without non-null values for a specified field. | ||||||
Review comment: @mgodwan Please review this narrative for technical accuracy.
Reply: This should be
Reply: Also, term queries on this metadata field are deprecated.
Reply: Revised.
|
||||||
|
||||||
The `field_names` field only indexes field names when both `doc_values` and `norms` are disabled for those fields. If either `doc_values` or `norms` is enabled, the `exists` query remains functional but does not rely on `field_names`. | ||||||
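For reference, an `exists` query takes the name of the field to check. The sketch below builds the search body for a field named `title`; the helper `exists_query` is hypothetical, and the body would be sent with a `GET <index>/_search` request.

```python
import json


def exists_query(field):
    """Return a search body matching documents with a non-null value for `field`."""
    return {"query": {"exists": {"field": field}}}


print(json.dumps(exists_query("title"), indent=2))
```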
|
||||||
|
||||||
## Mapping example | ||||||
|
||||||
|
||||||
{ | ||||||
|
||||||
"mappings": { | ||||||
"properties": { | ||||||
"field_names": { | ||||||
"type": "keyword" | ||||||
}, | ||||||
|
||||||
"title": { | ||||||
"type": "text", | ||||||
"doc_values": false, | ||||||
"norms": false | ||||||
}, | ||||||
"description": { | ||||||
"type": "text", | ||||||
"doc_values": true, | ||||||
"norms": false | ||||||
}, | ||||||
"price": { | ||||||
"type": "float", | ||||||
"doc_values": false, | ||||||
"norms": true | ||||||
} | ||||||
} | ||||||
} | ||||||
} | ||||||
{% include copy-curl.html %} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
--- | ||
layout: default | ||
title: ID | ||
nav_order: 20 | ||
has_children: false | ||
|
||
parent: Metadata fields | ||
--- | ||
|
||
# ID | ||
|
||
Each document has an `_id` field that uniquely identifies it. This field is indexed, allowing documents to be retrieved either through the `GET` API or the [`ids` query]({{site.url}}{{site.baseurl}}/query-dsl/term/ids/). | ||
|
||
Review comment: Line above: Is "GET" intentionally in code font? |
||
The following example creates an index `test-index1` and adds two documents with different `_id` values: | ||
|
||
```json | ||
PUT test-index1/_doc/1 | ||
{ | ||
"text": "Document with ID 1" | ||
} | ||
|
||
PUT test-index1/_doc/2?refresh=true | ||
{ | ||
"text": "Document with ID 2" | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
Now, you can query the documents using the `_id` field: | ||
|
||
```json | ||
GET test-index1/_search | ||
{ | ||
"query": { | ||
"terms": { | ||
"_id": ["1", "2"] | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
The following response shows that this query returns both documents with `_id` values of `1` and `2`. | ||
|
||
```json | ||
{ | ||
"took": 10, | ||
"timed_out": false, | ||
"_shards": { | ||
"total": 1, | ||
"successful": 1, | ||
"skipped": 0, | ||
"failed": 0 | ||
}, | ||
"hits": { | ||
"total": { | ||
"value": 2, | ||
"relation": "eq" | ||
}, | ||
"max_score": 1, | ||
"hits": [ | ||
{ | ||
"_index": "test-index1", | ||
"_id": "1", | ||
"_score": 1, | ||
"_source": { | ||
"text": "Document with ID 1" | ||
} | ||
}, | ||
{ | ||
"_index": "test-index1", | ||
"_id": "2", | ||
"_score": 1, | ||
"_source": { | ||
"text": "Document with ID 2" | ||
} | ||
} | ||
] | ||
} | ||
} | ||
``` | ||
|
||
{% include copy-curl.html %} | ||
|
||
|
||
|
||
## Querying on the `_id` field | ||
|
||
While the `_id` field is accessible in various queries, it is restricted from use in aggregations, sorting, and scripting. See [IDs query]({{site.url}}{{site.baseurl}}/query-dsl/term/ids/) for an example of the field's usage. If you need to sort or aggregate on the `_id` field, it is recommended to duplicate the content of the `_id` field into another field that has `doc_values` enabled. |
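One way to follow the recommendation above is to copy the ID into a `keyword` field at index time and sort on that copy, because `keyword` fields have `doc_values` enabled by default. In this sketch the field name `doc_id` and the helper `with_id_copy` are hypothetical, and only the request bodies are built.

```python
import json

ID_COPY_FIELD = "doc_id"  # hypothetical field that holds a copy of _id

# Mapping for the copy field; keyword fields keep doc_values by default.
mapping_body = {"mappings": {"properties": {ID_COPY_FIELD: {"type": "keyword"}}}}


def with_id_copy(doc_id, source):
    """Return the document body with the ID duplicated into ID_COPY_FIELD."""
    return {**source, ID_COPY_FIELD: str(doc_id)}


doc = with_id_copy(1, {"text": "Document with ID 1"})

# Search body that sorts on the copy instead of on _id itself.
sort_body = {"query": {"match_all": {}}, "sort": [{ID_COPY_FIELD: "asc"}]}

print(json.dumps(doc, indent=2))
print(json.dumps(sort_body, indent=2))
```

The application is responsible for keeping the copy in sync with the ID it passes in the request path.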
Review comment: There are certain caveats of using the dynamic mappings (e.g. performance impact). I believe we should highlight the same and recommend to use explicit mappings
Reply: Revised