[BUG] Dots Discovered Key Names #4977

Open
Conklin-Spencer-bah opened this issue Sep 24, 2024 · 4 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@Conklin-Spencer-bah

Describe the bug
Keys that contain a dot "." cannot be processed.

When ingesting logs through Fluent Bit -> S3 -> SQS -> Data Prepper / OSIS -> OpenSearch, any key that contains a dot "." causes an error on ingestion; see the error from OSIS below. I believe this is because the Kubernetes metadata in labels contains dots.

2024-09-24T14:08:46.611 [s3-log-pipeline-sink-worker-2-thread-2] WARN  org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy - operation = Index, status = 400, error = can't merge a non object mapping [kubernetes.labels.app] with an object mapping

The JSON blob looks as such

    "labels": {
      "app": "fooservice",
      "app.kubernetes.io/component": "foo",
      "app.kubernetes.io/instance": "foo-in-cluster",
      "app.kubernetes.io/managed-by": "Helm",
      "app.kubernetes.io/name": "fooservice",
      "app.kubernetes.io/version": "somelonghash",

If these labels aren't in the log, ingestion succeeds. One challenge is that the labels vary from service to service, so predicting what they will be is difficult. It would be preferable to have a way to say: if a key contains a "." (or some other character), substitute it with "_" or whatever the user chooses.

It is possible that this can already be done and I am unaware of how to do so.

To Reproduce

Attempt to process and ingest a log file into OpenSearch with Data Prepper, where the log has keys containing a dot (".").

Such as:

    "labels": {
      "app": "fooservice",
      "app.kubernetes.io/component": "foo",
      "app.kubernetes.io/instance": "foo-in-cluster",
      "app.kubernetes.io/managed-by": "Helm",
      "app.kubernetes.io/name": "fooservice",
      "app.kubernetes.io/version": "somelonghash",

Expected behavior
The key in double quotes is processed as a key even when dots are present.

Environment (please complete the following information):

  • AWS Managed OpenSearch Ingestion Service

Additional context
The following issue seems related and had a fix merged, but it is unclear how to resolve this issue:

#450

@Conklin-Spencer-bah Conklin-Spencer-bah added bug Something isn't working untriaged labels Sep 24, 2024
@KarstenSchnitter
Collaborator

Thanks for reporting this issue. This is actually a conflict between different field types in OpenSearch, and the document is rejected during indexing because of it. The issue arises because OpenSearch interprets dots "." in field names as paths into nested JSON objects. Let me take your sample data and reduce it a little to illustrate the issue.

Let's say we want to index just the following document in OpenSearch:

{
  "labels": {
    "app": "fooservice",
    "app.kubernetes.io/component": "foo"
  }
}

OpenSearch expands the key app.kubernetes.io/component and gets a conflict:

{
  "labels": {
    "app": "fooservice",
    "app": {                            // Error, is "app" a string or an object?
      "kubernetes": {
        "io/component": "foo"
      }
    }
  }
}
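
The expansion described above can be imitated in a few lines of Python. This is only a rough sketch of the idea, not OpenSearch's actual mapping code; the names `expand_dotted` and `_merge` are made up for illustration, and the sketch assumes that only dots (not slashes) separate path segments.

```python
def _merge(target, source, path=""):
    """Merge source into target in place; raise on value/object conflicts."""
    for key, value in source.items():
        full = f"{path}.{key}" if path else key
        if key not in target:
            target[key] = value
        elif isinstance(target[key], dict) and isinstance(value, dict):
            _merge(target[key], value, full)
        else:
            # The same path would need two different shapes.
            raise ValueError(f"conflicting mappings for '{full}'")


def expand_dotted(doc):
    """Expand dotted keys into nested dicts (hypothetical helper)."""
    out = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            value = expand_dotted(value)
        # Wrap the value in one dict per dot-separated segment, innermost first.
        for part in reversed(key.split(".")):
            value = {part: value}
        _merge(out, value)
    return out
```

Calling `expand_dotted` on the reduced document above raises a `ValueError` for `app`, because `app` would have to be a string and an object at once. Without the plain `"app"` entry, the remaining dotted keys merge into one nested object.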

This issue happens a lot when logging K8s labels or annotations. It would also occur if Fluent Bit wrote to OpenSearch directly, so it is not a bug in Data Prepper per se. You can work around this issue by replacing the dots "." with underscores "_" using a small Lua script in Fluent Bit. We have developed such a snippet for our own use cases. This kind of transformation is usually known as dedotting, in case you want to search for it.
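
The Lua snippet itself is not reproduced in this thread. To show what dedotting does, here is a minimal Python sketch of the same transformation; it is an illustration, not the Fluent Bit script, and the function name `dedot` and its `replacement` parameter are assumptions.

```python
def dedot(value, replacement="_"):
    """Recursively replace dots in dict keys; values are left untouched."""
    if isinstance(value, dict):
        return {
            key.replace(".", replacement): dedot(child, replacement)
            for key, child in value.items()
        }
    if isinstance(value, list):
        return [dedot(item, replacement) for item in value]
    return value
```

Applied to the reduced document, the key `app.kubernetes.io/component` becomes `app_kubernetes_io/component`, so it no longer collides with the plain `app` key. One caveat of this simple version: two keys that differ only in "." versus "_" would collapse into one.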

Data Prepper faces a similar issue with OpenTelemetry attributes. There, its processors dedot the attribute names by replacing certain dots "." with "@". In that case, the dedotting is hard-coded into the OpenTelemetry processors of Data Prepper. I am not experienced enough with the generic Data Prepper processors to give an example using those. The main problem, to me, is that you would not want to list in the pipeline configuration every field name that should be dedotted. In your example, dedotting could be applied to all fields under labels, but it might be different for others.

Note that any dedotting procedure widens the gap between deployment and observability, because the names in OpenSearch no longer match those used in the cluster. Unfortunately, there is no easy way around this. The unfolding of dotted names is a major feature of OpenSearch.
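
One way to avoid listing every field is to dedot everything below a chosen parent, such as the labels object. The Python sketch below illustrates that idea only; it is not an existing Data Prepper processor, and `dedot_under`, `_dedot`, and the list-of-keys path format are hypothetical.

```python
def _dedot(value, replacement):
    """Replace dots in all dict keys below value."""
    if isinstance(value, dict):
        return {k.replace(".", replacement): _dedot(v, replacement)
                for k, v in value.items()}
    if isinstance(value, list):
        return [_dedot(item, replacement) for item in value]
    return value


def dedot_under(doc, path, replacement="_"):
    """Return a copy of doc with keys dedotted only below the given path.

    path is a list of keys, e.g. ["kubernetes", "labels"] (assumed format).
    Keys outside that subtree are left unchanged.
    """
    if not path:
        return _dedot(doc, replacement)
    head, rest = path[0], path[1:]
    result = dict(doc)
    if isinstance(doc.get(head), dict):
        result[head] = dedot_under(doc[head], rest, replacement)
    return result
```

With `path=["kubernetes", "labels"]`, only the label names are rewritten; a dotted key elsewhere in the event keeps its name.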

@dlvenable dlvenable added enhancement New feature or request question Further information is requested and removed bug Something isn't working untriaged labels Oct 1, 2024
@Conklin-Spencer-bah
Author

Conklin-Spencer-bah commented Oct 2, 2024

Thanks for the lead. For whatever reason, doing this fixed it? All the labels and the timestamp still show up in OpenSearch, so it is somewhat puzzling.

  - delete_entries:
        with_keys: ["/kubernetes/labels/app", "ts"]

@dlvenable
Member

@KarstenSchnitter, thank you for the detailed comment. Do you think having a dedot processor would help here? That could be a useful feature for situations like this, which are somewhat common.

@Conklin-Spencer-bah, I think deleting /kubernetes/labels/app works because you are deleting the string value. With it gone, I expect OpenSearch is creating documents with a structure similar to the following:

{
  "kubernetes": {
    "labels" : {
      "app" : {
        "kubernetes" : {
          "io/component" : "foo",
          "io/instance" : "foo-in-cluster",
          "io/managed-by" : "Helm",
          ...
        }
      }
    }
  }
}

This is also why you needed to delete app. OpenSearch had decided that app is an object, but one app value is a string.

@dlvenable
Member

Somewhat relatedly, we are working on dynamic key renaming in #4849. The approach in there is to support renaming keys by pattern. Still, dedotting seems a common enough pattern to possibly warrant its own processor.
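
To make renaming by pattern concrete, here is a small Python sketch that rewrites keys with a regular expression. It does not reflect the design or syntax proposed in #4849; the function `rename_keys` and its parameters are assumptions for illustration.

```python
import re


def rename_keys(value, pattern, replacement):
    """Recursively apply re.sub(pattern, replacement, key) to every dict key."""
    if isinstance(value, dict):
        return {
            re.sub(pattern, replacement, key): rename_keys(child, pattern, replacement)
            for key, child in value.items()
        }
    if isinstance(value, list):
        return [rename_keys(item, pattern, replacement) for item in value]
    return value
```

Dedotting is then just the special case `rename_keys(event, r"\.", "_")`, while a pattern such as `r"^app\.kubernetes\.io/"` could instead strip or shorten a common prefix.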


3 participants