[APM] Avoid using _source for OTel compatibility #189947

gregkalapos · 2024-08-06T08:21:14Z

As we work towards OTel native support, we expect data to be stored in a more OTel-native format in Elasticsearch. E.g. see: elastic/elasticsearch#111091

The result of this is that the shape of the data will be different compared to what we currently have in the APM data streams.

At the same time, we also add a compatibility layer to make sure the current UI works with the new data. This layer is mainly based on aliases and passthrough fields.

Where this currently breaks is that the UI in some cases uses _source to access data. That is a blocker for the compatibility layer, because some of these fields are not directly available under _source.

Specific example:

On the service summary page, the UI reads fields from _source to populate the icons in this part of the code:

kibana/x-pack/plugins/observability_solution/apm/server/routes/services/get_service_metadata_icons.ts

Line 75 in 7167096

    
           _source: [KUBERNETES, CLOUD_PROVIDER, CONTAINER_ID, AGENT_NAME, CLOUD_SERVICE_NAME],

In this example we have a field that stores the agent name.

Here is what OTel native data in Elasticsearch will look like:

{
  "@timestamp": "2024-08-05T18:31:19.828218000Z",
  "attributes": {
    "metricset.interval": "1m",
    "metricset.name": "service_transaction",
    "processor.event": "metric",
    "transaction.root": true,
    "transaction.type": "unknown"
  },
  "data_stream": {
    "dataset": "generic.otel",
    "namespace": "default",
    "type": "metrics"
  },
  "metrics": {
    "transaction.duration.histogram": {
      "counts": [
        1
      ],
      "values": [
        12500
      ]
    }
  },
  "resource": {
    "attributes": {
      "metricset.interval": "1m",
      "service.name": "sendotlp",
      "some.resource.attribute": "resource.attr",
      "telemetry.sdk.language": "go",
      "telemetry.sdk.name": "opentelemetry",
      "telemetry.sdk.version": "1.28.0",
      "agent.name": "opentelemetry/go",
      "agent.name.text": "opentelemetry/go"
    },
    "dropped_attributes_count": 0,
    "schema_url": ""
  },
  "scope": {
    "name": "otelcol/spanmetricsconnectorv2"
  }
}

See the field resource.attributes.agent.name: that is how we store attributes in OTel native data. Everything under resource.attributes can be queried as a top-level field, but in _source those fields of course still live under resource.attributes.*. So in practice there is an alias from agent.name to resource.attributes.agent.name.

Currently the query above does something like this:

{
  "track_total_hits": 1,
  "size": 1,
  "_source": [
    "kubernetes",
    "cloud.provider",
    "container.id",
    "agent.name",
    "cloud.service.name"
  ],
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "processor.event": [
              "metric",
              "error",
              "metric"
            ]
          }
        }
      ],
      "must": [
        //... rest of the query
}

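With OTel native data, agent.name will not be returned here, because the query reads it from _source, where the value sits under resource.attributes.* rather than at the top level.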
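To make the failure mode concrete, here is a minimal client-side sketch of why a dotted name such as agent.name finds nothing in an OTel native _source. The helper `readFromSource` and the trimmed-down document are hypothetical and only illustrate the document shape shown earlier; this is not how Elasticsearch implements source filtering.

```typescript
// Hypothetical illustration only: resolves a dotted field name against a
// nested _source object, one path segment at a time.
type Json = { [key: string]: unknown };

function readFromSource(source: Json, path: string): unknown {
  let current: unknown = source;
  for (const segment of path.split('.')) {
    if (typeof current !== 'object' || current === null) {
      return undefined;
    }
    current = (current as Json)[segment];
  }
  return current;
}

// Trimmed-down version of the OTel native document from this issue.
const otelSource: Json = {
  resource: {
    attributes: {
      'service.name': 'sendotlp',
      'agent.name': 'opentelemetry/go',
    },
  },
};

// There is no top-level "agent" object in _source, so this lookup comes back empty.
const viaSourcePath = readFromSource(otelSource, 'agent.name');

// The value only exists under resource.attributes, stored under the dotted key.
const resourceAttributes = (otelSource.resource as Json).attributes as Json;
const viaActualLocation = resourceAttributes['agent.name'];
```

Here `viaSourcePath` is undefined while `viaActualLocation` holds the agent name, which mirrors the gap the alias covers for queries but not for _source.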

The question is: is using _source needed? If, for example, this were rewritten to use the fields API, it would work:

  "size": 1,
  "fields": [ // <--- use `fields` here instead of `_source`
    "kubernetes",
    "cloud.provider",
    "container.id",
    "agent.name",
    "cloud.service.name"
  ],
  "query": {
    //... rest of the query

Of course there may be other ways to do it, and using fields may have downsides that I'm not aware of.
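As a rough sketch of what the switch could mean on the Kibana side: the fields API returns each requested field as a flat, dotted key whose value is always an array, so the reading code changes as well as the request. The names `ICON_FIELDS`, `FieldsHit` and `firstValue` below are hypothetical and not taken from the APM codebase. The sketch only lists leaf fields; whether an object path such as kubernetes needs a wildcard pattern (for example kubernetes.*) is an assumption that should be verified.

```typescript
// Hypothetical list of leaf fields needed for the service icons.
const ICON_FIELDS = [
  'cloud.provider',
  'container.id',
  'agent.name',
  'cloud.service.name',
] as const;

type IconField = (typeof ICON_FIELDS)[number];

// Request body sketch: ask for `fields` and skip _source entirely.
const searchBody = {
  track_total_hits: 1,
  size: 1,
  _source: false,
  fields: [...ICON_FIELDS],
};

// Shape of a hit when `fields` is used: flat dotted keys, array values.
interface FieldsHit {
  fields?: Record<string, unknown[]>;
}

// Returns the first value of a field, or undefined when the field is absent.
function firstValue(hit: FieldsHit, field: IconField): unknown {
  return hit.fields?.[field]?.[0];
}

// Example hit, shaped like a `fields` response for the OTel document above.
const exampleHit: FieldsHit = {
  fields: { 'agent.name': ['opentelemetry/go'] },
};

const agentName = firstValue(exampleHit, 'agent.name');
```

Because the alias is resolved by Elasticsearch, the same `firstValue(hit, 'agent.name')` call should work for both the current APM data and the OTel native data.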

So the first proposal is to check whether using the fields API is acceptable; if it is, the APM UI should move to using that instead of _source. If that's not possible, we should discuss other options.

Non-exhaustive list of _source usages

get_service_metadata_icons.ts
get_trace_samples_hits
get_error_group_main_statistics

Sub tasks

Step 1: Replace _source with fields queries: [APM][Otel] PoC Otel data with APM UI: Replacing _source with fields #192606
Step 2a: Refactor the normalize function into per data structure serialization logic
Adapt all tests where needed
Step 3: Make stacktrace and span links work for OTel data (not a blocker for 8.16)

elasticmachine · 2024-08-06T15:00:53Z

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

cauemarcondes · 2024-08-06T15:38:15Z

From what I could see we can change to fields.

@gregkalapos There are other places on APM where we use the _source response. Do we need to change all those places too?

gregkalapos · 2024-08-06T15:43:24Z

From what I could see we can change to fields.

Nice 🎉 Great to hear that.

@gregkalapos There are other places on APM where we use the _source response. Do we need to change all those places too?

Yes, this issue is about considering completely moving away from _source in general as it can break our OTel effort. The one above was just one specific example to help understanding, but it's a general issue.

felixbarny · 2024-08-07T09:11:54Z

From an efficiency perspective particularly in combination with synthetic _source, using fields is preferable over using _source filtering. That's because the full _source first needs to be synthesized using all fields and then only a subset of _source is returned. With synthetic _source, there's an overhead proportional to the number of fields that are fetched. Re-constructing _source needs to fetch all fields.

carsonip · 2024-08-15T11:08:07Z

get_trace_samples_hits is also using _source: https://github.com/elastic/kibana/blob/74c9570258f9c58d6d84272c5c96d5b5b2282d6e/x-pack/plugins/observability_solution/apm/server/routes/transactions/trace_samples/index.ts#L112C1-L117C9

This causes an error in the UI.

carsonip · 2024-08-20T14:47:55Z

Same for get_error_group_main_statistics:

{
  "track_total_hits": false,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "processor.event": [
              "error"
            ]
          }
        }
      ],
      "must": [
        {
          "bool": {
            "filter": [
              {
                "term": {
                  "service.name": "sendotlp"
                }
              },
              {
                "range": {
                  "@timestamp": {
                    "gte": 1724164013052,
                    "lte": 1724164913052,
                    "format": "epoch_millis"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "error_groups": {
      "terms": {
        "field": "error.grouping_key",
        "size": 500,
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "sample": {
          "top_hits": {
            "size": 1,
            "_source": [
              "trace.id",
              "error.log.message",
              "error.exception.message",
              "error.exception.handled",
              "error.exception.type",
              "error.culprit",
              "error.grouping_key",
              "@timestamp"
            ],
            "sort": {
              "@timestamp": "desc"
            }
          }
        }
      }
    }
  }
}

I'm going to start a tasklist in this issue to capture the _source usages we've encountered.
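For the top_hits sample in the query above, the same swap might look like the sketch below. It assumes that top_hits accepts `fields` in the same way a regular search does (the Elasticsearch documentation lists search fields among the supported per-hit features). `ERROR_SAMPLE_FIELDS` and `buildSampleAgg` are hypothetical names, not existing APM code.

```typescript
// Fields currently read from _source in the error group sample.
const ERROR_SAMPLE_FIELDS: string[] = [
  'trace.id',
  'error.log.message',
  'error.exception.message',
  'error.exception.handled',
  'error.exception.type',
  'error.culprit',
  'error.grouping_key',
  '@timestamp',
];

// Builds the "sample" sub-aggregation using `fields` instead of `_source`.
function buildSampleAgg(fields: string[]) {
  return {
    top_hits: {
      size: 1,
      _source: false,
      fields,
      sort: { '@timestamp': 'desc' },
    },
  };
}

const sampleAgg = buildSampleAgg(ERROR_SAMPLE_FIELDS);
```

The consumer of the aggregation result would then read `hit.fields['error.culprit'][0]` and similar instead of `hit._source.error.culprit`.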

felixbarny · 2024-08-20T14:51:14Z

One usage of _source that we probably can't remove, but need to adjust for OTel, is span links. They're stored differently, but only _source will have the right ordering of the object array.

There are also other aspects of span links (in particular incoming links from other traces) that need adjustment for OTel.

bryce-b · 2024-08-22T21:56:33Z

I've run into a couple of queries that are a bit tricky, and I'm not sure what the best way to resolve them is.
For example:

kibana/x-pack/plugins/observability_solution/apm/server/routes/traces/get_trace_items.ts

Line 209 in 0c911c8

async function getTraceDocsPerPage({

This returns an entire transaction with a nested format e.g.:

{
   ....
   transaction : {
     id: "1234",
     type: "page-load",
     duration : { 
       us: 123456
     }
     ....
   }
   ....
}

whereas fields will return:

{
  "transaction.id" : ["1234"],
  "transaction.type" : ["page-load"],
  "transaction.duration.us" : [123456],
}

Should the new field responses be marshaled into the nested format, or should downstream dependencies be rebuilt to use the new format?
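For reference, the first option (marshaling the flat response back into the nested shape) could be sketched roughly as follows. `unflattenFields` is a hypothetical helper, not existing Kibana code. Collapsing single-element arrays to a scalar is an assumption of this sketch, and it loses the distinction between a scalar and a one-element array.

```typescript
type Nested = { [key: string]: unknown };

// Hypothetical helper: turns a flat `fields` response
// ({ "a.b.c": [value] }) into a nested object ({ a: { b: { c: value } } }).
function unflattenFields(fields: Record<string, unknown[]>): Nested {
  const result: Nested = {};
  for (const path of Object.keys(fields)) {
    const values = fields[path] ?? [];
    const segments = path.split('.');
    const leaf = segments[segments.length - 1];
    if (leaf === undefined) {
      continue;
    }
    let node = result;
    for (const segment of segments.slice(0, -1)) {
      const existing = node[segment];
      // Create an intermediate object when none exists yet. If a scalar was
      // already stored at this position it is overwritten.
      if (typeof existing !== 'object' || existing === null || Array.isArray(existing)) {
        node[segment] = {};
      }
      node = node[segment] as Nested;
    }
    // Assumption: a single-element array is collapsed to its only value.
    node[leaf] = values.length === 1 ? values[0] : values;
  }
  return result;
}

const nestedTransaction = unflattenFields({
  'transaction.id': ['1234'],
  'transaction.type': ['page-load'],
  'transaction.duration.us': [123456],
});
```

With the input above, `nestedTransaction.transaction` has the same `id`, `type` and `duration.us` layout as the nested _source example. The trade-off is that this reshaping has to run somewhere, which is why where it runs (server or browser) matters for the decision.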

bryce-b · 2024-08-29T15:57:09Z

I've got an initial PR covering a few APIs so far: #191647
I went with updating the downstream dependencies to avoid data processing in the browser, which is the preference of the UI team.

AlexanderWert · 2024-09-25T08:20:21Z

closing in favour of #192606

botelastic bot added the needs-team Issues missing a team label label Aug 6, 2024

AlexanderWert added the Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team label Aug 6, 2024

botelastic bot removed the needs-team Issues missing a team label label Aug 6, 2024

smith added technical debt Improvement of the software architecture and operational architecture apm OpenTelemetry apm:opentelemetry APM UI - OTEL Work needs-refinement A reason and acceptance criteria need to be defined for this issue v8.16.0 labels Aug 8, 2024

bryce-b self-assigned this Aug 13, 2024

smith removed the needs-refinement A reason and acceptance criteria need to be defined for this issue label Aug 31, 2024

crespocarlos mentioned this issue Sep 3, 2024

[APM][ECO] Order tabs accordingly based on the Signal types available #191935

Merged

jennypavlova mentioned this issue Sep 4, 2024

[APM][Otel] Add synthtrace scenarios to test with otel data #192115

Closed

AlexanderWert closed this as not planned Sep 25, 2024
