Support for field with multiple names / hard link aliases #109954

Open
ruflin opened this issue Jun 20, 2024 · 7 comments
Labels: >enhancement, :Search Foundations/Mapping, Team:Search Foundations

Comments

ruflin (Contributor) commented Jun 20, 2024

With the move to ECS and semconv, field naming has become more and more standardised, and users are migrating to the standard fields. Part of the migration is defining the ECS field as a field alias of the existing field, so the data shipper does not have to be adjusted but queries against the data can be written in a standardised way. The challenge is that multiple shippers send data to the same data stream, using different names for the same field.

Let's go through an example where the host name is stored in 3 different fields:

  • host.name: The recommended ECS field, used for querying and eventually shipping data
  • host_name: The field that is used by some shippers
  • resource.attributes.host.name: Field that comes from some otel data

The user wants to write queries against host.name. First, an alias is created that points host.name to host_name:

PUT alias-challenge
{
  "mappings": {
    "properties": {
      "host_name": {
        "type": "keyword"
      },
      "host.name": {
        "type": "alias",
        "path": "host_name"
      },
      "resource.host.name": {
        "type": "keyword"
      }
    }
  }
}

Now the user can query both host_name and host.name. Unfortunately, resource.host.name is not covered yet, and it cannot be added because an alias path does not accept an array. The following mapping is not supported:

PUT alias-challenge
{
  "mappings": {
    "properties": {
      "host_name": {
        "type": "keyword"
      },
      "host.name": {
        "type": "alias",
        "path": ["host_name", "resource.host.name"]
      },
      "resource.host.name": {
        "type": "keyword"
      }
    }
  }
}
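What an array path would mean for queries can be approximated client-side today by expanding a term query on host.name into a bool query over every concrete field. A minimal Python sketch: ALIAS_TARGETS and expand_term_query are hypothetical names, and only the generated dict (a standard bool/should Query DSL body) is something Elasticsearch itself understands.

```python
import json

# Hypothetical client-side table: logical field -> concrete fields it covers.
# Elasticsearch never sees this table; only the generated query is sent.
ALIAS_TARGETS = {
    "host.name": ["host_name", "resource.host.name"],
}


def expand_term_query(field: str, value: str) -> dict:
    """Build a term query, expanding `field` to all of its concrete targets."""
    targets = ALIAS_TARGETS.get(field, [field])
    if len(targets) == 1:
        return {"term": {targets[0]: value}}
    # Match documents where any one of the concrete fields holds the value.
    return {
        "bool": {
            "should": [{"term": {t: value}} for t in targets],
            "minimum_should_match": 1,
        }
    }


if __name__ == "__main__":
    body = {"query": expand_term_query("host.name", "elastic.co")}
    print(json.dumps(body, indent=2))
```

The drawback is that every query author has to know the table, which is exactly what a mapping-level alias is meant to hide.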

But array support for alias paths would not be enough either. Eventually, some of the shippers migrate to host.name, which means data is sent to the alias itself:

POST alias-challenge/_doc
{
  "host.name": "elastic.co"
}

This fails with the error "reason": "[2:16] Cannot write to a field alias [host.name].". Ideally, a single field could have multiple names, comparable to hard links in the Linux file system: it is possible to query and ingest through any of the names, and the field exists until the last reference is removed.
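The hard-link semantics (any name works for ingest and query, and the field lives until its last name is removed) can be modeled with reference counting, the same way a file system keeps an inode alive while links to it remain. A minimal in-memory Python sketch; HardLinkedFields and its methods are hypothetical and say nothing about how Elasticsearch would store such a field.

```python
class HardLinkedFields:
    """Hypothetical model: several names share one underlying field."""

    def __init__(self) -> None:
        self._name_to_id: dict[str, int] = {}    # name -> underlying field id
        self._refcount: dict[int, int] = {}      # field id -> number of names
        self._values: dict[int, list[str]] = {}  # field id -> ingested values
        self._next_id = 0

    def create(self, name: str) -> None:
        """Create a new field with a single name (reference count 1)."""
        fid = self._next_id
        self._next_id += 1
        self._name_to_id[name] = fid
        self._refcount[fid] = 1
        self._values[fid] = []

    def link(self, existing: str, new_name: str) -> None:
        """Add another name for an existing field, like `ln existing new_name`."""
        if new_name in self._name_to_id:
            raise ValueError(f"name already in use: {new_name}")
        fid = self._name_to_id[existing]
        self._name_to_id[new_name] = fid
        self._refcount[fid] += 1

    def ingest(self, name: str, value: str) -> None:
        # Writing through any name lands in the same underlying field.
        self._values[self._name_to_id[name]].append(value)

    def query(self, name: str) -> list[str]:
        # Reading through any name sees every value, whichever name wrote it.
        return list(self._values[self._name_to_id[name]])

    def unlink(self, name: str) -> None:
        """Remove one name; drop the field only when no names remain."""
        fid = self._name_to_id.pop(name)
        self._refcount[fid] -= 1
        if self._refcount[fid] == 0:
            del self._refcount[fid]
            del self._values[fid]

    def field_count(self) -> int:
        return len(self._values)
```

With host.name created first and host_name and resource.host.name linked to it, removing host.name later leaves the data reachable through the two remaining names.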

Having support for hard links would simplify the migration to ECS / semconv for users.

Doing the standardisation in an ingest pipeline is not a solution, as the old fields still have to be queried. One workaround that is sometimes used is to duplicate the data into each field, but that is not simple and has a negative impact on storage.

Implementation ideas

Below are two ideas for how this could look, but I'm sure better solutions can be found.

Idea 1:

PUT alias-challenge
{
  "mappings": {
    "properties": {
      "host.name": {
        "type": "keyword",
        "hard_links": ["host_name", "resource.host.name"]
      }
    }
  }
}

Idea 2:

PUT alias-challenge
{
  "mappings": {
    "properties": {
      "host_name": {
        "type": "hard_link",
        "path": "host.name"
      },
      "host.name": {
        "type": "keyword"
      },
      "resource.host.name": {
        "type": "hard_link",
        "path": "host.name"
      }
    }
  }
}
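Both ideas describe the same resolution step: every name maps to one concrete field, and values written under any name end up there. A minimal Python sketch of that step, assuming flat documents keyed by dotted field names and treating the proposed hard_links, hard_link and path keys exactly as in the two examples; canonical_names and normalize are hypothetical helpers, not Elasticsearch APIs.

```python
def canonical_names(properties: dict) -> dict[str, str]:
    """Map every field name, including hard links, to its concrete field.

    Accepts both proposed shapes: a `hard_links` list on the concrete field
    (Idea 1) or separate entries of type `hard_link` with a `path` (Idea 2).
    """
    canon: dict[str, str] = {}
    for name, spec in properties.items():
        if spec.get("type") == "hard_link":
            canon[name] = spec["path"]       # Idea 2: the link names its target
        else:
            canon[name] = name               # a concrete field names itself
            for link in spec.get("hard_links", []):
                canon[link] = name           # Idea 1: links listed on the target
    return canon


def normalize(doc: dict, canon: dict[str, str]) -> dict[str, list]:
    """Group the values of a flat document under their concrete field names."""
    out: dict[str, list] = {}
    for name, value in doc.items():
        out.setdefault(canon.get(name, name), []).append(value)
    return out
```

Under this reading the two ideas are equivalent and differ only in where the link is declared: on the target field (Idea 1) or on each link (Idea 2).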

Related links / discussions

elasticsearchmachine added the needs:triage label Jun 20, 2024
felixbarny (Member) commented Jun 20, 2024

Another idea:

PUT alias-challenge
{
  "mappings": {
    "properties": {
      "host_name": {
        "type": "keyword"
      },
      "host.name": {
        "type": "keyword"
      },
      "resource.attributes.host.name": {
        "type": "keyword"
      }
    }
  },
  "aliases": [
    ["host.name", "host_name", "resource.attributes.host.name"]
  ]
}

This has the advantage that the field types don't need to be changed (which isn't possible for an existing index). The aliases section could behave similarly to the runtime section, in the sense that it can be changed dynamically on an existing index mapping.

While we would expect only one of the fields to be present, there's the edge case where both host.name and host_name are present in the document. The order in which the aliases are defined then determines which field to fetch from (first one wins).
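The first-one-wins rule amounts to a lookup over an ordered alias group. A minimal Python sketch; fetch is a hypothetical helper, and documents are assumed to be flat dicts keyed by dotted field names.

```python
from typing import Any, Optional


def fetch(doc: dict[str, Any], alias_group: list[str]) -> Optional[Any]:
    """Return the value of the first field in `alias_group` present in `doc`.

    The order of `alias_group` encodes priority: earlier names win when a
    document carries the same logical field under several names.
    """
    for name in alias_group:
        if name in doc:
            return doc[name]
    return None  # none of the aliased names is present
```

Reordering the group changes which value wins when several names are set, so the definition order carries meaning and would have to be preserved when the aliases section is updated.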

henningandersen added the >enhancement and :Search Foundations/Mapping labels and removed the needs:triage label Jun 20, 2024
elasticsearchmachine (Collaborator)

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine added the Team:Search label Jun 20, 2024
javanna added the Team:Search Foundations label and removed the Team:Search label Jul 16, 2024
elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-foundations (Team:Search Foundations)

dbrimley self-assigned this Jul 19, 2024
flash1293 (Contributor) commented Jan 13, 2025

@felixbarny For the use case we discussed, I'm not sure about one semantic detail. Under the existing OTel aliasing logic, resource.attributes.my.attribute and my.attribute are the same field, and body.structured.my.attribute and my.attribute are the same field, but resource.attributes.my.attribute and body.structured.my.attribute are not.

How would this look with your API example from above? I could imagine the following:

PUT alias-challenge
{
  "mappings": {
    "properties": {
      "body.structured.my.attribute": {
        "type": "keyword"
      },
      "my.attribute": {
        "type": "keyword"
      },
      "resource.attributes.my.attribute": {
        "type": "keyword"
      }
    }
  },
  "aliases": [
    ["my.attribute", "resource.attributes.my.attribute"],
    ["my.attribute", "body.structured.my.attribute"]
  ]
}

Then, when I search

  • my.attribute, it will consider data in my.attribute, resource.attributes.my.attribute and body.structured.my.attribute (because both alias "groups" are matched)
  • resource.attributes.my.attribute, it will consider data in my.attribute and resource.attributes.my.attribute, but not body.structured.my.attribute
  • body.structured.my.attribute, it will consider data in my.attribute and body.structured.my.attribute, but not resource.attributes.my.attribute
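The lookup rule in the list above is a per-name union over the alias groups that contain the name, deliberately without a transitive closure. A minimal Python sketch; fields_for is a hypothetical helper operating on the proposed aliases array.

```python
def fields_for(name: str, groups: list[list[str]]) -> set[str]:
    """Fields a query on `name` should consider.

    Takes the union of every alias group that contains `name`, but does not
    follow groups transitively through other members.
    """
    fields = {name}
    for group in groups:
        if name in group:
            fields.update(group)
    return fields


GROUPS = [
    ["my.attribute", "resource.attributes.my.attribute"],
    ["my.attribute", "body.structured.my.attribute"],
]
```

A transitive closure (for example union-find over all groups) would instead merge all three names into one, making resource.attributes.my.attribute and body.structured.my.attribute equivalent, which is the case the comment wants to avoid.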

Does this make sense?

felixbarny (Member)

Given some of the recent changes in OTel regarding body vs attributes for events (open-telemetry/semantic-conventions#1651), I think we should not map body.structured as a passthrough field and instead map it as a flattened field. This would also simplify the aliasing issue you mentioned, because body.structured.my.attribute and attributes.my.attribute are not equivalent.

flash1293 (Contributor)

@felixbarny would we still have the same problem with attributes.* vs resource.attributes.*?

felixbarny (Member)

Yeah, you're right.

7 participants