Documentation improvements for the aggregate processor. #5035

Open
wants to merge 1 commit into
base: main
from

Member

dlvenable commented Oct 8, 2024

Description

Adds property and class descriptions to the configurations and corrects the property order. Adds configuration classes with documentation for put_all and remove_duplicates, which allows these actions to be included. Corrects the use of enums by using @JsonValue so that these enums have usable documentation.

Continues the work done in #5026, #5025, #5019, and #5023.

Output

{
  "$schema" : "https://json-schema.org/draft/2020-12/schema",
  "type" : "object",
  "properties" : {
    "identification_keys" : {
      "description" : "An unordered list of keys by which to group events. Events with the same values for these keys are put into the same group. If an event does not contain one of the <code>identification_keys</code>, then the value of that key is considered to be equal to <code>null</code>. At least one <code>identification_key</code> is required. An example configuration is [\"sourceIp\", \"destinationIp\", \"port\"].",
      "minItems" : 1,
      "type" : "array",
      "items" : {
        "type" : "string"
      }
    },
    "action" : {
      "anyOf" : [ {
        "type" : "object",
        "properties" : {
          "rate_limiter" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "events_per_second" : {
                "type" : "integer",
                "description" : "The number of events allowed per second."
              },
              "when_exceeds" : {
                "type" : "string",
                "enum" : [ "drop", "block" ],
                "description" : "Indicates what action the <code>rate_limiter</code> takes when the number of events received is greater than the number of events allowed per second. The default value is <code>block</code>, which blocks the processor from running after the maximum number of events allowed per second is reached until the next second. Alternatively, the <code>drop</code> option drops the excess events received in that second."
              }
            },
            "required" : [ "events_per_second" ],
            "description" : "The <code>rate_limiter</code> action controls the number of events aggregated per second. By default, <code>rate_limiter</code> blocks the <code>aggregate</code> processor from running if it receives more events than the configured number allowed. You can change this behavior by using the <code>when_exceeds</code> configuration option."
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "percent_sampler" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "percent" : {
                "type" : "number",
                "description" : "The percentage of events to be processed during a one second interval. Must be greater than 0.0 and less than 100.0."
              }
            },
            "required" : [ "percent" ],
            "description" : "The <code>percent_sampler</code> action controls the number of events aggregated based on a percentage of events. The action drops any events not included in the percentage."
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "append" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "keys_to_append" : {
                "description" : "A list of keys to append for the aggregated result.",
                "type" : "array",
                "items" : {
                  "type" : "string"
                }
              }
            },
            "description" : "Appends multiple events into a single event."
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "histogram" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "key" : {
                "type" : "string",
                "description" : "The name of the field in the events whose values are used to generate the histogram."
              },
              "output_format" : {
                "type" : "string",
                "enum" : [ "otel_metrics", "raw" ],
                "description" : "Format of the aggregated event. <code>otel_metrics</code> is the default output format, which outputs in OTel metrics SUM type with count as value. The other option is <code>raw</code>, which generates a JSON object with the <code>count_key</code> field as a count value and the <code>start_time_key</code> field with the aggregation start time as value."
              },
              "units" : {
                "type" : "string",
                "description" : "The name of the units for the values in the key, for example, bytes or traces."
              },
              "metric_name" : {
                "type" : "string",
                "description" : "Metric name to be used when the OTel metrics format is used."
              },
              "generated_key_prefix" : {
                "type" : "string",
                "description" : "Key prefix used by all the fields created in the aggregated event. Having a prefix ensures that the names of the histogram event do not conflict with the field names in the event."
              },
              "buckets" : {
                "description" : "A list of buckets (values of type double) indicating the buckets in the histogram.",
                "type" : "array",
                "items" : {
                  "type" : "number"
                }
              },
              "record_minmax" : {
                "type" : "boolean",
                "description" : "A Boolean value indicating whether the histogram should include the min and max of the values in the aggregation."
              }
            },
            "required" : [ "key", "units", "buckets" ],
            "description" : "The <code>histogram</code> action aggregates events belonging to the same group and generates a new event with values of the <code>identification_keys</code> and histogram of the aggregated events based on a configured <code>key</code>. The histogram contains the number of events, sum, buckets, bucket counts, and optionally min and max of the values corresponding to the <code>key</code>. The action drops all events that make up the combined event."
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "remove_duplicates" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "description" : "The <code>remove_duplicates</code> action processes the first event for a group immediately and drops any events that duplicate the first event from the source."
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "put_all" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "description" : "The <code>put_all</code> action combines events belonging to the same group by overwriting existing keys and adding new keys, similarly to the Java `Map.putAll`. The action drops all events that make up the combined event."
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "count" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "output_format" : {
                "type" : "string",
                "enum" : [ "otel_metrics", "raw" ],
                "description" : "Format of the aggregated event. Specifying <code>otel_metrics</code> outputs aggregate events in OTel metrics SUM type with count as value. Specifying <code>raw</code> outputs aggregate events with the <code>count_key</code> field as a count value and includes the <code>start_time_key</code> and <code>end_time_key</code> keys."
              },
              "metric_name" : {
                "type" : "string",
                "description" : "Metric name to be used when the OTel metrics format is used. The default value is <code>count</code>."
              },
              "count_key" : {
                "type" : "string",
                "description" : "The key in the aggregate event that will have the count value. This is the count of events in the aggregation. Default name is <code>aggr._count</code>."
              },
              "start_time_key" : {
                "type" : "string",
                "description" : "The key in the aggregate event that will have the start time of the aggregation. Default name is <code>aggr._start_time</code>."
              },
              "end_time_key" : {
                "type" : "string",
                "description" : "The key in the aggregate event that will have the end time of the aggregation. Default name is <code>aggr._end_time</code>."
              },
              "unique_keys" : {
                "description" : "List of unique keys to count.",
                "type" : "array",
                "items" : {
                  "type" : "string"
                }
              }
            },
            "description" : "The <code>count</code> action counts events that belong to the same group and generates a new event with values of the <code>identification_keys</code> and the count, which indicates the number of new events."
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "tail_sampler" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "wait_period" : {
                "type" : "string",
                "format" : "duration",
                "description" : "Period to wait before considering that a trace event is complete"
              },
              "percent" : {
                "type" : "integer",
                "description" : "Percent value to use for sampling non error events. Must be greater than 0.0 and less than 100.0"
              },
              "condition" : {
                "type" : "string",
                "description" : "A <a href=\"https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/\">conditional expression</a>, such as '/some-key == \"test\"', that will be evaluated to determine whether the event is an error event or not"
              }
            },
            "required" : [ "wait_period", "percent" ],
            "description" : "The <code>tail_sampler</code> action samples OpenTelemetry traces after collecting spans for a trace."
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      } ]
    },
    "group_duration" : {
      "type" : "string",
      "format" : "duration",
      "description" : "The amount of time that a group should exist before it is concluded automatically. Supports ISO_8601 notation strings (\"PT20.345S\", \"PT15M\", etc.) as well as simple notation for seconds (\"60s\") and milliseconds (\"1500ms\"). Default value is 180s."
    },
    "local_mode" : {
      "type" : "boolean",
      "description" : "When <code>local_mode</code> is set to true, the aggregation is performed locally on each node instead of forwarding events to a specific node based on the <code>identification_keys</code> using a hash function. Default is false."
    },
    "output_unaggregated_events" : {
      "type" : "boolean",
      "description" : "A boolean indicating if the unaggregated events should be forwarded to the next processor or sink in the chain."
    },
    "aggregated_events_tag" : {
      "type" : "string",
      "description" : "Tag to be used for aggregated events to distinguish aggregated events from unaggregated events."
    },
    "aggregate_when" : {
      "type" : "string",
      "description" : "A <a href=\"https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/\">conditional expression</a>, such as '/some-key == \"test\"', that will be evaluated to determine whether the processor will be run on the event."
    }
  },
  "required" : [ "identification_keys", "action", "local_mode" ],
  "description" : "The <code>aggregate</code> processor groups events based on the values of identification_keys. Then, the processor performs an action on each group, helping reduce unnecessary log volume and creating aggregated logs over time.",
  "name" : "aggregate",
  "documentation" : "https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/aggregate/"
}
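
For reviewers who want a quick picture of the grouping semantics described in the schema above, here is a minimal sketch in Python. It is not the processor's Java implementation. The helper names `group_key` and `aggregate_put_all` and the sample events are hypothetical and exist only for illustration. The sketch assumes two behaviors taken from the descriptions: a missing identification key is treated as null, and `put_all` overwrites existing keys and adds new ones, like Java's `Map.putAll`. It ignores `group_duration`, `local_mode`, and forwarding between nodes.

```python
def group_key(event, identification_keys):
    """Build the grouping key for an event.

    A key that is missing from the event is treated as None, mirroring the
    "considered to be equal to null" wording in the schema description.
    """
    return tuple(event.get(key) for key in identification_keys)


def aggregate_put_all(events, identification_keys):
    """Merge the events of each group into one combined event.

    Later events overwrite existing keys and add new ones, which is the
    behavior the put_all description compares to Java's Map.putAll.
    """
    groups = {}
    for event in events:
        merged = groups.setdefault(group_key(event, identification_keys), {})
        merged.update(event)
    return list(groups.values())


# Hypothetical sample events, using the keys from the example configuration.
events = [
    {"sourceIp": "10.0.0.1", "destinationIp": "10.0.0.2", "port": 443, "status": 200},
    {"sourceIp": "10.0.0.1", "destinationIp": "10.0.0.2", "port": 443, "status": 500, "bytes": 1024},
    {"sourceIp": "10.0.0.9", "destinationIp": "10.0.0.2"},  # no "port": grouped under None
]

result = aggregate_put_all(events, ["sourceIp", "destinationIp", "port"])
for combined in result:
    print(combined)
```

The first two events share all three key values and collapse into one combined event in which `status` is overwritten and `bytes` is added. The third event has no `port`, so it forms its own group keyed on a null port.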

Issues Resolved

None

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: David Venable <[email protected]>