adding a page for mapping character filters

opensearch-project · Sep 20, 2024 · fdb2ddd · fdb2ddd
1 parent 9230b00
commit fdb2ddd
Showing 1 changed file with 89 additions and 0 deletions.
diff --git a/_analyzers/character-filters/mapping-character-filter.md b/_analyzers/character-filters/mapping-character-filter.md
@@ -0,0 +1,89 @@
+---
+layout: default
+title: Mapping Character Filter
+parent: Character Filters
+nav_order: 95
+---
+
+# Mapping character filter
+
+The `mapping` character filter accepts a map of key-value pairs for character replacements. Whenever the filter encounters a string of characters matching a key, it replaces them with the corresponding value.
+
+Matching is greedy: when more than one key matches at the same position in the text, the longest key is used. A replacement value can be an empty string, which removes the matched characters.
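+
+As a minimal sketch of greedy matching, the following request defines two overlapping keys, `A` and `AB`. The keys and sample text are chosen only for illustration:
+```
+GET /_analyze
+{
+  "tokenizer": "keyword",
+  "char_filter": [
+    {
+      "type": "mapping",
+      "mappings": [
+        "A => 1",
+        "AB => 2"
+      ]
+    }
+  ],
+  "text": "AB A"
+}
+```
+Because the longest key is matched first, `AB` should be replaced by `2` as a whole rather than by `1B`, so the resulting text should be `2 1`.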
+
+Because character filters run before the tokenizer, the mapping character filter is useful when specific text must be replaced before tokenization.
+
+## Example of the mapping filter
+The following request uses a mapping character filter to convert the Roman numerals I, II, III, IV, and V into the corresponding Arabic numerals 1, 2, 3, 4, and 5:
+```
+GET /_analyze
+{
+  "tokenizer": "keyword",
+  "char_filter": [
+    {
+      "type": "mapping",
+      "mappings": [
+        "I => 1",
+        "II => 2",
+        "III => 3",
+        "IV => 4",
+        "V => 5"
+      ]
+    }
+  ],
+  "text": "I have III apples and IV oranges"
+}
+```
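+The response contains a single token with the following text. Note that the standalone `I` at the start of the sentence also matches the `I => 1` rule:
+```
+1 have 3 apples and 4 oranges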
+```
+
+## Configuring the mapping filter
+You can specify the mappings using one of the following parameters:
+1. `mappings`: Provide an array of key-value pairs in the form `key => value`. For every key found, the corresponding value will replace it in the input text.
+2. `mappings_path`: Specify the path to a UTF-8 encoded file containing key-value mappings. Each mapping should be on a new line in the format `key => value`. The path can be absolute or relative to the OpenSearch configuration directory.
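+
+As a sketch of the `mappings_path` option, the following request assumes a hypothetical file named `analysis/abbreviations.txt` in the OpenSearch configuration directory, containing one rule per line, such as `BTW => By the way`. The index, analyzer, and filter names are also hypothetical:
+```
+PUT /text-index-from-file
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "file_abbr_analyzer": {
+          "tokenizer": "standard",
+          "char_filter": [
+            "file_abbr_filter"
+          ]
+        }
+      },
+      "char_filter": {
+        "file_abbr_filter": {
+          "type": "mapping",
+          "mappings_path": "analysis/abbreviations.txt"
+        }
+      }
+    }
+  }
+}
+```
+Keeping the rules in a file can be convenient when the list is long or shared by several indexes. This sketch assumes the file is readable at the same path on each node that hosts the index.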
+
+### Using a custom mapping character filter
+You can create a custom mapping character filter by defining your own set of mappings in the index settings. The following example creates an index with a custom character filter that expands common abbreviations:
+```
+PUT /text-index
+{
+  "settings": {
+    "analysis": {
+      "analyzer": {
+        "custom_abbr_analyzer": {
+          "tokenizer": "standard",
+          "char_filter": [
+            "custom_abbr_filter"
+          ]
+        }
+      },
+      "char_filter": {
+        "custom_abbr_filter": {
+          "type": "mapping",
+          "mappings": [
+            "BTW => By the way",
+            "IDK => I don't know",
+            "FYI => For your information"
+          ]
+        }
+      }
+    }
+  }
+}
+```
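+The following request applies the `custom_abbr_filter` character filter defined in the index to sample text. The `keyword` tokenizer returns the filtered text as a single token: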
+```
+GET /text-index/_analyze
+{
+  "tokenizer": "keyword",
+  "char_filter": [ "custom_abbr_filter" ],
+  "text": "FYI, updates to the workout schedule are posted. IDK when it takes effect, but we have some details. BTW, the finalized schedule will be released Monday."
+}
+```
+This filter will produce the following text:
+```
+For your information, updates to the workout schedule are posted. I don't know when it takes effect, but we have some details. By the way, the finalized schedule will be released Monday.
+```
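+
+The `custom_abbr_analyzer` defined in the index settings can also be applied directly. The following request is a minimal sketch with a shorter sample text chosen for illustration:
+```
+GET /text-index/_analyze
+{
+  "analyzer": "custom_abbr_analyzer",
+  "text": "BTW, see you Monday"
+}
+```
+In this case, the character filter runs first and the `standard` tokenizer then splits the expanded text, so the response should contain the tokens `By`, `the`, `way`, `see`, `you`, and `Monday`.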
+