From ef336a699ab2d6c10f15cbe8d2233507872277b2 Mon Sep 17 00:00:00 2001 From: Manfred Moser Date: Tue, 15 Oct 2024 17:44:03 -0700 Subject: [PATCH] Improve grammar and wording To enable better understanding of how to write routing rules. Co-authored-by: Will Morrison --- docs/routing-rules.md | 266 ++++++++++++++++++++++-------------------- 1 file changed, 138 insertions(+), 128 deletions(-) diff --git a/docs/routing-rules.md b/docs/routing-rules.md index 76c4d37ec..78def563c 100644 --- a/docs/routing-rules.md +++ b/docs/routing-rules.md @@ -3,25 +3,29 @@ Trino Gateway includes a routing rules engine. By default, Trino Gateway reads the `X-Trino-Routing-Group` request header to -route requests. If this header is not specified, requests are sent to default -routing group (adhoc). +route requests. If this header is not specified, requests are sent to the +default routing group called `adhoc`. -The routing rules engine feature enables you to either write custom logic to route -requests based on the request info such as any of the [request +The routing rules engine feature enables you to either write custom logic to +route requests based on request information, such as any of the [request headers](https://trino.io/docs/current/develop/client-protocol.html#client-request-headers), -or set a URL address to make an HTTP POST request and route based on the returned result. +or set a URL address to make an HTTP POST request and route based on the +returned result. -Routing rules are separated from Trino Gateway application code to a -configuration file or a separate service. This separate service is specified as a URL -and can implement any dynamic rule changes or other behavior. +Routing rules are defined in a configuration file or implemented in a separate, +custom service application. The connection to the separate service is configured +as a URL. The service can implement any dynamic rule changes or other behavior.
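The Python sketch below is for illustration only and is not Trino Gateway code. It models the default behavior described above: the gateway uses the `X-Trino-Routing-Group` header value when present and falls back to `adhoc` otherwise. It also shows a client attaching that header to a request. The gateway URL `http://gateway.example:8080` is a hypothetical placeholder, and the request is built but never sent.

```python
import urllib.request
from typing import Mapping

ROUTING_HEADER = "X-Trino-Routing-Group"
DEFAULT_ROUTING_GROUP = "adhoc"


def default_routing_group(headers: Mapping[str, str]) -> str:
    """Illustrative model: use the routing header if present, else `adhoc`."""
    # HTTP header names are case-insensitive, so compare them in lower case.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get(ROUTING_HEADER.lower(), DEFAULT_ROUTING_GROUP)


# A client opts into the "etl" group by setting the header on its request.
# The URL is a hypothetical placeholder; nothing is sent over the network.
request = urllib.request.Request(
    "http://gateway.example:8080/v1/statement",
    data=b"SELECT 1",
    headers={"X-Trino-User": "alice", ROUTING_HEADER: "etl"},
    method="POST",
)

print(default_routing_group(dict(request.header_items())))  # etl
print(default_routing_group({"X-Trino-User": "alice"}))     # adhoc
```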
### Enabling the routing rules engine -To enable the routing rules engine, find the following lines in `gateway-ha-config.yml`. +To enable the routing rules engine, find the following lines in +`gateway-ha-config.yml`: * Set `rulesEngineEnabled` to `true`, then `rulesType` as `FILE` or `EXTERNAL`. -* Then either add `rulesConfigPath` to the path to your rules config file or set `rulesExternalConfiguration` - to the URL of an external service for routing rules processing. +* If you set `rulesType: FILE`, then set `rulesConfigPath` to the path to your + rules config file. +* If you set `rulesType: EXTERNAL`, set `rulesExternalConfiguration` to the URL + of an external service for routing rules processing. * `rulesType` is by default `FILE` unless specified. ```yaml @@ -36,21 +40,24 @@ routingRules: - 'Accept-Encoding' ``` -* Redirect URLs are not supported -* Optionally add headers to the `excludeHeaders` list to exclude requests with corresponding header values - from being sent in the POST request. -* Check headers to exclude when making API requests, specifics depend on the network configuration. +* Redirect URLs are not supported. +* Optionally, add headers to the `excludeHeaders` list to exclude them, and + their values, from the body of the POST request. +* Check which headers to exclude when making API requests; the specifics depend + on the network configuration. -If there is error parsing the routing rules configuration file, an error is logged, -and requests are routed using the routing group header `X-Trino-Routing-Group` as default. +If there is an error parsing the routing rules configuration file, an error is +logged, and requests are routed using the routing group header +`X-Trino-Routing-Group` as default. ### Use an external service for routing rules You can use an external service for processing your routing by setting the `rulesType` to `EXTERNAL` and configuring the `rulesExternalConfiguration`.
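This section goes on to describe the external service contract. Trino Gateway POSTs the request headers, minus `excludeHeaders`, as a JSON map, and the service must answer with HTTP 200 and a JSON message. The following is a minimal standard-library Python sketch of such a service under stated assumptions, not a reference implementation:

* The `headers` key in the incoming body is hypothetical. Log one real request to confirm the actual shape.
* The response field names `routingGroup` and `errors` are assumptions. Check them against the response example in the full documentation for your Trino Gateway version.
* The port `8081` is arbitrary, and the Airflow-to-`etl` policy is only an example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, Mapping


def choose_routing_group(headers: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical policy: route Airflow queries to "etl", all else to "adhoc"."""
    normalized = {name.lower(): value for name, value in headers.items()}
    source = normalized.get("x-trino-source")
    # Header values may arrive as a list of strings; use the first one.
    if isinstance(source, list):
        source = source[0] if source else None
    group = "etl" if source == "airflow" else "adhoc"
    return {"routingGroup": group}  # ASSUMPTION: response field name


class RoutingHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            # ASSUMPTION: the forwarded request headers sit under a "headers" key.
            result = choose_routing_group(payload.get("headers") or {})
        except (ValueError, AttributeError) as exc:
            # ASSUMPTION: a non-null "errors" list makes the gateway fall back
            # to its default routing group.
            result = {"routingGroup": None, "errors": [str(exc)]}
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)  # the gateway expects HTTP 200 OK
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8081) -> None:
    """Run the service; configure its URL under rulesExternalConfiguration."""
    HTTPServer(("0.0.0.0", port), RoutingHandler).serve_forever()
```

Calling `serve()` blocks and handles requests until interrupted. Keeping the policy in a pure function such as `choose_routing_group` makes it testable without starting a server.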
-Trino Gateway then sends all headers as a map in the body of a POST request to the external service. -Headers specified in `excludeHeaders` are excluded. If `requestAnalyzerConfig.analyzeRequest` is set to `true`, +Trino Gateway then sends all headers, other than those specified in +`excludeHeaders`, as a map in the body of a POST request to the external +service. If `requestAnalyzerConfig.analyzeRequest` is set to `true`, `TrinoRequestUser` and `TrinoQueryProperties` are also included. Additionally, the following HTTP information is included: @@ -64,8 +71,8 @@ Additionally, the following HTTP information is included: - `remoteHost` - `parameterMap` -The external service can process the information in any way desired -and must return a result with the following criteria: +The external service can process the information in any way desired and must +return a result with the following criteria: * Response status code of OK (200) * Message in JSON format @@ -86,10 +93,10 @@ and must return a result with the following criteria: ### Configure routing rules with a file To express and fire routing rules, we use the -[easy-rules](https://github.com/j-easy/easy-rules) engine. These rules should be +[easy-rules](https://github.com/j-easy/easy-rules) engine. These rules must be stored in a YAML file. Rules consist of a name, description, condition, and list -of actions. If the condition of a particular rule evaluates to true, its actions -are fired. +of actions. If the condition of a particular rule evaluates to `true`, its +actions are fired. ```yaml --- @@ -112,15 +119,16 @@ object called `request`. Rules may also utilize [trinoRequestUser](#trinorequestuser) and [trinoQueryProperties](#trinoqueryproperties) objects, which provide information about the user and query respectively. -There should be at least one action of the form -`result.put(\"routingGroup\", \"foo\")` which says that if a request satisfies -the condition, it should be routed to `foo`. 
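The Python sketch below summarizes the evaluation model described on this page. It is a simplified stand-in, not easy-rules or MVEL:

* Conditions are Python callables over a plain header dict.
* Each rule carries a single `routing_group`, which models one `result.put("routingGroup", ...)` action.
* The `first_match_only` flag imitates the `ActivationRuleGroup` behavior described later.
* Ties in priority keep list order, which is a simplification.
* The `X-Trino-Client-Tags` value `label=special` is a hypothetical stand-in for the "airflow special" condition.

Following the page, every rule whose condition holds fires, lower `priority` values fire first, a missing priority sorts last, and a request with no `routingGroup` goes to `adhoc`.

```python
import sys
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Mapping

Request = Mapping[str, str]
DEFAULT_PRIORITY = sys.maxsize  # stand-in for INT_MAX when no priority is set


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Request], bool]
    routing_group: str  # models the action result.put("routingGroup", ...)
    priority: int = DEFAULT_PRIORITY


def route(request: Request, rules: List[Rule], first_match_only: bool = False) -> str:
    result: Dict[str, str] = {}
    # Lower priority values fire first; sorted() is stable, so ties keep list order.
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.condition(request):
            result["routingGroup"] = rule.routing_group
            if first_match_only:  # ActivationRuleGroup-style: stop at first match
                break
    # If no rule set a group, the default group applies.
    return result.get("routingGroup", "adhoc")


def is_airflow(req: Request) -> bool:
    return req.get("X-Trino-Source") == "airflow"


def has_special_label(req: Request) -> bool:
    return "label=special" in req.get("X-Trino-Client-Tags", "")


airflow = Rule("airflow", is_airflow, "etl", priority=0)
airflow_special = Rule(
    "airflow special",
    lambda req: is_airflow(req) and has_special_label(req),
    "etl-special",
    priority=1,
)

special = {"X-Trino-Source": "airflow", "X-Trino-Client-Tags": "label=special"}
plain = {"X-Trino-Source": "airflow"}

# Every matching rule fires in priority order, so the last write wins.
print(route(special, [airflow_special, airflow]))   # etl-special
print(route(plain, [airflow_special, airflow]))     # etl
print(route({"X-Trino-Source": "cli"}, [airflow]))  # adhoc

# Activation-style group: the specific rule gets the lower value and fires alone.
activation = [replace(airflow, priority=1), replace(airflow_special, priority=0)]
print(route(special, activation, first_match_only=True))  # etl-special
```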
+You must include an action of the form `result.put(\"routingGroup\", \"foo\")` +to trigger routing of a request that satisfies the condition to the specific +routing group. Without this action, the default adhoc group is used and the +whole routing rule is redundant. The condition and actions are written in [MVEL](http://mvel.documentnode.com/), -an expression language with Java-like syntax. In most cases, users can write -their conditions/actions in Java syntax and expect it to work. There are some -MVEL-specific operators that could be useful though. For example, instead of -doing a null-check before accessing the `String.contains` method like this: +an expression language with Java-like syntax. In most cases, you can write +conditions and actions in Java syntax and expect it to work. There are some +MVEL-specific operators. For example, instead of doing a null-check before +accessing the `String.contains` method like this: ```yaml condition: 'request.getHeader("X-Trino-Client-Tags") != null && request.getHeader("X-Trino-Client-Tags").contains("label=foo")' @@ -132,73 +140,77 @@ You can use the `contains` operator condition: 'request.getHeader("X-Trino-Client-Tags") contains "label=foo"' ``` -If no rules match, then request is routed to adhoc. +If no rules match, then the request is routed to the default `adhoc` routing +group. ### TrinoStatus -This class attempts to track the current state of Trino cluster. It is updated per every healthcheck. -There are three possible states +The `TrinoStatus` class attempts to track the current state of the configured +Trino clusters. The three possible states of these cluster are updated with +every healthcheck: -- PENDING - - A Trino cluster will show this state when it is still starting up. It will be treated as - unhealthy by RoutingManager, and therefore requests will not be routed to PENDING clusters -- HEALTHY - - A Trino cluster will show this state when healthchecks report clusters as healthy and ready. 
RoutingManager will only route requests to healthy clusters -- UNHEALTHY - - A Trino cluster will show this state when healthchecks report clusters as unhealthy. RoutingManager - will not route requests to unhealthy clusters. +- `PENDING`: A Trino cluster shows this state when it is still starting up. It + is treated as unhealthy by `RoutingManager`, and therefore requests are + not routed to these clusters. +- `HEALTHY`: A Trino cluster shows this state when healthchecks report + the cluster as healthy and ready. `RoutingManager` only routes requests to + healthy clusters. +- `UNHEALTHY`: A Trino cluster shows this state when healthchecks report the + cluster as unhealthy. `RoutingManager` does not route requests to unhealthy + clusters. ### TrinoRequestUser -This class attempts to extract the user from a request. In order, it attempts +The `TrinoRequestUser` class attempts to extract user information from a +request, in the following order: -1. The `X-Trino-User` header -2. The `Authorization: Basic` header -3. The `Authorization: Bearer` header. Requires configuring an OAuth2 User Info URL -4. The `Trino-UI-Token` or `__Secure-Trino-ID-Token` cookie +1. `X-Trino-User` header. +2. `Authorization: Basic` header. +3. `Authorization: Bearer` header. Requires configuring an OAuth2 User Info URL. +4. `Trino-UI-Token` or `__Secure-Trino-ID-Token` cookie. Kerberos and Certificate authentication are not currently supported. If the -request contains the `Authorization: Bearer` header, an attempt will be made to -treat the token as a JWT and deserialize it. If this is successful, the -value of the claim named in `requestAnalyzerConfig.tokenUserField` is used as -the username. By default, this is the `email` claim. If the token is not a valid -JWT, and `requestAnalyzerConfig.oauthTokenInfoUrl` is configured, then the token -will be exchanged with the Info URL. Responses are cached for 10 minutes to -avoid triggering rate limits. +request contains the `Authorization: Bearer` header, an attempt is made to treat +the token as a JWT and deserialize it. If this is successful, the value of the +claim named in `requestAnalyzerConfig.tokenUserField` is used as the username. +By default, this is the `email` claim. If the token is not a valid JWT, and +`requestAnalyzerConfig.oauthTokenInfoUrl` is configured, then the token is +exchanged with the Info URL. Responses are cached for 10 minutes to avoid +triggering rate limits.
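The lookup order above can be sketched in Python. This is a simplified illustration, not the Trino Gateway implementation, and it makes these assumptions:

* It reads the JWT payload without verifying the signature, which is acceptable only for a routing hint and never for authorization.
* It omits the `oauthTokenInfoUrl` exchange and its cache.
* It assumes the two UI cookies also carry a JWT.

The `user_field` parameter plays the role of `requestAnalyzerConfig.tokenUserField`, and `None` stands in for an empty `Optional`.

```python
import base64
import json
from typing import Mapping, Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments use unpadded base64url, so restore the padding first.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def user_from_jwt(token: str, user_field: str = "email") -> Optional[str]:
    """Read one claim from a JWT payload. The signature is NOT verified."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    try:
        claims = json.loads(_b64url_decode(parts[1]))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None
    value = claims.get(user_field) if isinstance(claims, dict) else None
    return value if isinstance(value, str) else None


def extract_user(headers: Mapping[str, str], cookies: Mapping[str, str],
                 user_field: str = "email") -> Optional[str]:
    h = {name.lower(): value for name, value in headers.items()}
    # 1. X-Trino-User header
    if h.get("x-trino-user"):
        return h["x-trino-user"]
    auth = h.get("authorization", "")
    # 2. Authorization: Basic base64("user:password")
    if auth.startswith("Basic "):
        try:
            decoded = base64.b64decode(auth[len("Basic "):]).decode("utf-8")
        except ValueError:
            decoded = ""
        if ":" in decoded:
            return decoded.split(":", 1)[0]
    # 3. Authorization: Bearer <JWT>; the oauthTokenInfoUrl exchange is omitted
    if auth.startswith("Bearer "):
        user = user_from_jwt(auth[len("Bearer "):].strip(), user_field)
        if user:
            return user
    # 4. UI cookies, ASSUMED here to carry a JWT as well
    for cookie in ("Trino-UI-Token", "__Secure-Trino-ID-Token"):
        user = user_from_jwt(cookies.get(cookie, ""), user_field)
        if user:
            return user
    return None
```

Because the checks run in the documented order, an explicit `X-Trino-User` header wins even when a bearer token is also present.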
+request contains the `Authorization: Bearer` header, an attempt is made to treat +the token as a JWT and deserialize it. If this is successful, the value of the +claim named in `requestAnalyzerConfig.tokenUserField` is used as the username. +By default, this is the `email` claim. If the token is not a valid JWT, and +`requestAnalyzerConfig.oauthTokenInfoUrl` is configured, then the token is +exchanged with the Info URL. Responses are cached for 10 minutes to avoid +triggering rate limits. You may call `trinoRequestUser.getUser()` and `trinoRequestUser.getUserInfo()` -in your routing rules. If a user was not successfully extracted, -`trinoRequestUser.getUser()` will return an empty -[Optional](https://docs.oracle.com/javase/8/docs/api/java/util/Optional.html). -`trinoRequestUser.getUserInfo()` will return an -[Optional\](https://www.javadoc.io/doc/com.nimbusds/oauth2-oidc-sdk/5.34/com/nimbusds/openid/connect/sdk/claims/UserInfo.html) +in your routing rules. If user information was not successfully extracted, +`trinoRequestUser.getUser()` returns an empty `Optional`. +`trinoRequestUser.getUserInfo()` returns an `Optional`, with an +[OpenID Connect UserInfo](https://www.javadoc.io/doc/com.nimbusds/oauth2-oidc-sdk/5.34/com/nimbusds/openid/connect/sdk/claims/UserInfo.html) if a token is successfully exchanged with the `oauthTokenInfoUrl`, and an empty `Optional` otherwise. `trinoRequestUser.userExistsAndEquals("usernameToTest")` can be used to check a -username against the extracted user. It will return `False` if a user has not -been extracted. +username against the extracted user. It returns `false` if a user has not been +extracted. User extraction is only available if enabled by configuring `requestAnalyzerConfig.analyzeRequest = True` ### TrinoQueryProperties -This class attempts to parse the body of a request as SQL. Note that only a -syntactic analysis is performed! 
If a query œreferences a view, then that -view will not be expanded, and tables referenced by the view will not be -recognized. Note that Views and Materialized Views are treated as tables and -added to the list of tables in all contexts, including statements such as -`CREATE VIEW`. +The `TrinoQueryProperties` class attempts to parse the body of a request to +determine the SQL statement and other information. Note that only a +syntactic analysis is performed. + +If a query references a view, then that view is not expanded, and tables +referenced by the view are not recognized. Views and materialized views are +treated as tables and added to the list of tables in all contexts, including +statements such as `CREATE VIEW`. A routing rule can call the following methods on the `trinoQueryProperties` object: -* `String errorMessage()`: the error message only if there was any error while - creating `trinoQueryProperties` object. -* `boolean isNewQuerySubmission()`: is the request a POST to the `v1/statement` - query endpoint. +* `String errorMessage()`: the error message, only if there was any error while + creating the `trinoQueryProperties` object. +* `boolean isNewQuerySubmission()`: boolean flag to indicate if the + request is a POST to the `v1/statement` query endpoint. * `String getQueryType()`: the class name of the `Statement`, e.g. `ShowCreate`. Note that these are not mapped to the `ResourceGroup` query types. For a full list of potential query types, see the classes in @@ -227,50 +239,51 @@ object: ### Configuration The `trinoQueryProperties` are configured under the `requestAnalyzerConfig` -configuration node. +configuration node. -#### analyzeRequest +`analyzeRequest`: -Set to `True` to make `trinoQueryProperties` and `trinoRequestUser` available +Set to `True` to make `trinoQueryProperties` and `trinoRequestUser` available. -#### maxBodySize +`maxBodySize`: By default, the max body size is 1,000,000 characters. 
This can be modified by configuring `maxBodySize`. If the request body is greater or equal to this -limit, Trino Gateway will not process the query. A buffer of length -`maxBodySize` will be allocated per query, so reduce this value if you observe -excessive GC. `maxBodySize` cannot be set to values larger than 2**31-1, the -maximum size of a Java String. +limit, Trino Gateway does not process the query. A buffer of length +`maxBodySize` is allocated per query. Reduce this value if you observe +excessive garbage collection at runtime. `maxBodySize` cannot be set to values +larger than 2**31-1, the maximum size of a Java String. -#### isClientsUseV2Format +`isClientsUseV2Format`: Some commercial extensions to Trino use the V2 Request Structure -[V2 style request structure](https://github.com/trinodb/trino/wiki/Trino-v2-client-protocol#submit-a-query). Support for V2-style requests can be enabled -by setting this property to true. If you use a commercial version of Trino, ask -your vendor how to set this configuration. +described in the [Trino v2 client protocol](https://github.com/trinodb/trino/wiki/Trino-v2-client-protocol#submit-a-query). +Support for V2-style requests can be enabled by setting this property to `true`. +If you use a commercial version of Trino, ask your vendor how to set this +configuration. -#### tokenUserField +`tokenUserField`: When extracting the user from a JWT token, this field is used as the username. By default, the `email` claim is used. -#### oauthTokenInfoUrl +`oauthTokenInfoUrl`: -If configured, then Trino will attempt to retrieve user info by exchanging +If configured, Trino Gateway attempts to retrieve the user info by exchanging potential authorization tokens with this URL. Responses are cached for 10 minutes to avoid triggering rate limits. -### Execution of Rules +### Execution of rules -All rules whose conditions are satisfied will fire.
For example, in the -"airflow" and "airflow special" example rules given above, a query with source -`airflow` and label `special` will satisfy both rules. The `routingGroup` is set -to `etl` and then to `etl-special` because of the order in which the rules of -defined. If we swap the order of the rules, then we would possibly get `etl` -instead, which is undesirable. +All rules whose conditions are satisfied fire. For example, in the "airflow" +and "airflow special" example rules from the following rule priority section, a +query with source `airflow` and label `special` satisfies both rules. The +`routingGroup` is set to `etl` and then to `etl-special` because of the order in +which the rules are defined. If you swap the order of the rules, then you get +`etl` instead. -One could solve this by writing the rules such that they're atomic (any query -will match exactly one rule). For example we can change the first rule to +You can avoid this ordering issue by writing atomic rules, so that any query matches +exactly one rule. For example, you can change the first rule to the following: ```yaml --- @@ -282,17 +295,16 @@ actions: --- ``` -This could be hard to maintain as we add more rules. To have better control over -the execution of rules, we could use rule priorities and composite rules. -Overall, with priorities, composite rules, and the constructs that MVEL support, -you should likely be able to express your routing logic. +This can be difficult to maintain with more rules. To have better control over the +execution of rules, you can use rule priorities and composite rules. Overall, +priorities, composite rules, and other constructs that MVEL supports allow +you to express your routing logic. -#### Rule Priority +#### Rule priority -We can assign an integer value `priority` to a rule. The lower this integer is, -the earlier it will fire. If the priority is not specified, the priority is -defaulted to INT_MAX.
We can add priorities to our airflow and airflow special -rule like so: +You can assign an integer value `priority` to a rule. The lower this integer is, +the earlier it fires. If the priority is not specified, the priority defaults to +`INT_MAX`. Following is an example with priorities: ```yaml --- @@ -311,25 +323,23 @@ actions: - 'result.put("routingGroup", "etl-special")' ``` -Note that both rules will still fire. The difference is that we've guaranteed +Note that both rules still fire. The difference is that you are guaranteed that the first rule (priority 0) is fired before the second rule (priority 1). Thus `routingGroup` is set to `etl` and then to `etl-special`, so the -`routingGroup` will always be `etl-special` in the end. +`routingGroup` is always `etl-special` in the end. -Above, the more specific rules have less priority since we want them to be the -last to set `routingGroup`. This is a little counterintuitive. To further -control the execution of rules, for example to have only one rule fire, we can -use composite rules. +More specific rules must have a larger `priority` value, so they fire last and +set the final `routingGroup`. To further control the execution of rules, for +example to have only one rule fire, you can use composite rules. -##### Composite Rules +##### Composite rules -First, please refer to easy-rule composite rules docs: -https://github.com/j-easy/easy-rules/wiki/defining-rules#composite-rules +First, please refer to the [easy-rules composite rules documentation](https://github.com/j-easy/easy-rules/wiki/defining-rules#composite-rules). -Above, we saw how to control the order of rule execution using priorities. In -addition to this, we could have only the first rule matched to be fired (the -highest priority one) and the rest ignored. We can use `ActivationRuleGroup` to -achieve this. +The preceding section covers how to control the order of rule execution using +priorities.
In addition, you can configure evaluation so that only the first +matching rule fires (the highest priority one) and the rest are ignored. You can +use `ActivationRuleGroup` to achieve this: ```yaml --- @@ -352,14 +362,14 @@ composingRules: ``` Note that the priorities have switched. The more specific rule has a higher -priority, since we want it to be fired first. A query coming from airflow with -special label is matched to the "airflow special" rule first, since it's higher +priority, since it should fire first. A query coming from airflow with the special +label is matched to the "airflow special" rule first, since it's higher priority, and the second rule is ignored. A query coming from airflow with no labels does not match the first rule, and is then tested and matched to the second rule. -We can also use `ConditionalRuleGroup` and `ActivationRuleGroup` to implement an -if/else workflow. The following logic in pseudocode: +You can also use `ConditionalRuleGroup` and `ActivationRuleGroup` to implement +an if/else workflow. Consider the following logic in pseudocode: ```text if source == "airflow": @@ -371,7 +381,7 @@ if source == "airflow": return "etl" ``` -Can be implemented with these rules: +This logic can be implemented with the following rules: ```yaml name: "airflow rule group" @@ -408,10 +418,10 @@ composingRules: ##### If statements (MVEL Flow Control) -Above, we saw how we can use `ConditionalRuleGroup` and `ActivationRuleGroup` to -implement and `if/else` workflow. We could also take advantage of the fact that -MVEL supports `if` statements and other flow control (loops, etc). The following -logic in pseudocode: +The preceding section shows how `ConditionalRuleGroup` and +`ActivationRuleGroup` are used to implement an `if/else` workflow. You can +also use MVEL support for `if` statements and other flow control.
Consider the following logic +in pseudocode: ```text if source == "airflow": @@ -423,7 +433,7 @@ if source == "airflow": return "etl" ``` -Can be implemented with these rules: +This logic can be implemented with the following rules: ```yaml ---