---
layout: default
title: Mapping Character Filter
parent: Character Filters
nav_order: 95
---

# Mapping character filter

The `mapping` character filter accepts a map of keys and values for character replacement. Whenever the filter encounters a string of characters matching a key, it replaces them with the corresponding value.

Matching is greedy: at each position in the text, the longest matching key is applied. A replacement value can also be an empty string, in which case the matched characters are removed.

The mapping character filter is useful when specific text replacements are required before tokenization.

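The following Python function is an illustrative sketch of greedy (longest-match) replacement, not OpenSearch's implementation. The function name `apply_mappings` is hypothetical, and the sketch ignores details such as character offset correction. It shows why `III` becomes `3` rather than `111`, and how an empty replacement value removes the matched text.

```python
# Illustrative sketch only: approximates greedy (longest-match) character
# mapping in pure Python. It is NOT OpenSearch's implementation and does not
# track token offsets.


def apply_mappings(text: str, mappings: dict[str, str]) -> str:
    """Replace each occurrence of a key with its value, preferring the
    longest key that matches at each position."""
    # Try longer keys first so that, for example, "III" wins over "I".
    keys = sorted(mappings, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if key and text.startswith(key, i):
                out.append(mappings[key])  # the value may be an empty string
                i += len(key)
                break
        else:
            # No key matches at this position: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


roman = {"I": "1", "II": "2", "III": "3", "IV": "4", "V": "5"}
print(apply_mappings("I have III apples and IV oranges", roman))
# 1 have 3 apples and 4 oranges
print(apply_mappings("a-b-c", {"-": ""}))
# abc
```

Sorting keys by length is the simplest way to guarantee longest-match behavior in a sketch; a production implementation would more likely use a trie or finite-state structure instead of testing every key at every position.
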
## Example of the mapping filter

The following example demonstrates a mapping filter that converts Roman numerals (I, II, III, IV, and V) into their corresponding Arabic numerals (1, 2, 3, 4, and 5):
```
GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "I => 1",
        "II => 2",
        "III => 3",
        "IV => 4",
        "V => 5"
      ]
    }
  ],
  "text": "I have III apples and IV oranges"
}
```
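
Because the `keyword` tokenizer emits the entire input as a single token, the response contains one token with the following text:
```
1 have 3 apples and 4 oranges
```
Note that the filter does not respect word boundaries: the standalone word `I` matches the `I => 1` mapping and is replaced as well.
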
## Configuring the mapping filter

You can configure the mappings using either of the following parameters:

1. `mappings`: An array of key-value pairs in the format `key => value`. Each occurrence of a key in the input text is replaced with the corresponding value.
2. `mappings_path`: The path to a UTF-8 encoded file containing key-value mappings, one mapping per line in the format `key => value`. The path can be absolute or relative to the OpenSearch configuration directory.

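As a sketch of the `mappings_path` option, the following Python helpers write mappings to a UTF-8 file, one `key => value` pair per line, and build an index creation request body that references that file. The helper names, the `roman_filter` and `roman_analyzer` names, and the file path are hypothetical. Because the file is read from the node's configuration directory, it typically needs to exist on every node before the index is created.

```python
from pathlib import Path

# Hypothetical mappings; each pair becomes one "key => value" line.
ROMAN_MAPPINGS = {"I": "1", "II": "2", "III": "3", "IV": "4", "V": "5"}


def format_mappings(mappings: dict[str, str]) -> str:
    """Render one `key => value` mapping per line."""
    return "".join(f"{key} => {value}\n" for key, value in mappings.items())


def write_mappings_file(path: Path, mappings: dict[str, str]) -> None:
    """Write the mappings to `path` as UTF-8 text, creating parent folders."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(format_mappings(mappings), encoding="utf-8")


def index_settings(mappings_path: str) -> dict:
    """Build an index creation body whose char filter uses `mappings_path`."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "roman_analyzer": {  # hypothetical analyzer name
                        "tokenizer": "keyword",
                        "char_filter": ["roman_filter"],
                    }
                },
                "char_filter": {
                    "roman_filter": {  # hypothetical filter name
                        "type": "mapping",
                        # A relative path is resolved against the config directory.
                        "mappings_path": mappings_path,
                    }
                },
            }
        }
    }
```

For example, `write_mappings_file(Path("<config-dir>/analysis/roman_numerals.txt"), ROMAN_MAPPINGS)` writes the file, and the dictionary returned by `index_settings("analysis/roman_numerals.txt")` can be sent as the body of a `PUT` request that creates the index.
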
### Using a custom mapping character filter

You can create a custom mapping character filter by defining your own set of mappings. The following request creates an index with a custom character filter that expands common abbreviations:
```
PUT /text-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_abbr_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "custom_abbr_filter"
          ]
        }
      },
      "char_filter": {
        "custom_abbr_filter": {
          "type": "mapping",
          "mappings": [
            "BTW => By the way",
            "IDK => I don't know",
            "FYI => For your information"
          ]
        }
      }
    }
  }
}
```

To test the custom character filter, reference it by name in an `_analyze` request against the index:
```
GET /text-index/_analyze
{
  "tokenizer": "keyword",
  "char_filter": [ "custom_abbr_filter" ],
  "text": "FYI, updates to the workout schedule are posted. IDK when it takes effect, but we have some details. BTW, the finalized schedule will be released Monday."
}
```

The response contains a single token with the following text:
```
For your information, updates to the workout schedule are posted. I don't know when it takes effect, but we have some details. By the way, the finalized schedule will be released Monday.
```

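If you prefer to run the test request from a script instead of the console, the following Python sketch sends the same `_analyze` request using only the standard library. It assumes an unsecured development cluster at `http://localhost:9200` (a hypothetical URL; real deployments typically require TLS and authentication), and the function names are hypothetical. It uses `POST`, which the `_analyze` API also accepts, to avoid sending a request body with `GET`.

```python
import json
import urllib.request

# Assumption: a local, unsecured development cluster. Adjust the URL and add
# TLS/authentication for a real deployment.
BASE_URL = "http://localhost:9200"


def analyze_request(index: str, text: str) -> urllib.request.Request:
    """Build a POST /<index>/_analyze request that runs custom_abbr_filter."""
    body = {
        "tokenizer": "keyword",
        "char_filter": ["custom_abbr_filter"],
        "text": text,
    }
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyzed_text(index: str, text: str) -> str:
    """Send the request and return the text of the single keyword token."""
    with urllib.request.urlopen(analyze_request(index, text)) as resp:
        tokens = json.load(resp)["tokens"]
    return tokens[0]["token"]
```

With the index from the previous example in place, calling `analyzed_text("text-index", "BTW, the schedule is posted.")` sends the request and returns the filtered text of the single token.
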