
Simplify non-blocking request handling #9069

Closed
marclop opened this issue Sep 8, 2022 · 4 comments

marclop commented Sep 8, 2022

In #8979, we introduced a new async flag that clients can set to request asynchronous handling of their events. We should improve how we handle asynchronous requests by:

  • Separating the BatchProcessor chain into two chains that communicate over a channel:
    • Fast chain (or frontend?): may briefly block for authorization and rate-limiting purposes.
      • If the channel is full, fail fast for async requests; otherwise block until the send completes (the channel has capacity) or ctx.Done().
    • Slow chain (or backend?): performs slower operations which may block or significantly slow down callers.
    • Channel size: we should explore what size makes sense in light of the sized semaphore that exists for the intake handler.
  • Modifying the context.Context to mark a request as asynchronous, so the batch processor chain can distinguish between blocking and non-blocking requests.
  • The stream processor no longer needs to differentiate between async/sync requests and can just call processor.ProcessBatch inline; the handling of the batch becomes an implementation detail.

axw commented Sep 8, 2022

One of the motivations for doing this is to ensure that authorization and rate-limiting errors are returned to the client. When we fix that, we should make sure to update https://github.com/elastic/apm-server/blob/main/docs/api-events.asciidoc#endpoint


simitt commented Sep 8, 2022

@axw can you elaborate on this? Afaics authorization errors are also returned in the current async implementation.

What exactly would the user facing changes be with this simplification? Does it make sense to label the async flag as tech preview until changes are implemented?


axw commented Sep 9, 2022

@simitt all of these model processors will be run in the background goroutine:

```go
// Add pre-processing batch processors to the beginning of the chain,
// applying only to the events that are decoded from agent/client payloads.
preBatchProcessors := modelprocessor.Chained{
	// Add a model processor that rate limits, and checks authorization for the
	// agent and service for each event. These must come at the beginning of the
	// processor chain.
	model.ProcessBatchFunc(rateLimitBatchProcessor),
	model.ProcessBatchFunc(authorizeEventIngestProcessor),
	// Pre-process events before they are sent to the final processors for
	// aggregation, sampling, and indexing.
	modelprocessor.SetHostHostname{},
	modelprocessor.SetServiceNodeName{},
	modelprocessor.SetMetricsetName{},
	modelprocessor.SetGroupingKey{},
	modelprocessor.SetErrorMessage{},
	modelprocessor.SetUnknownSpanType{},
}
if s.config.DefaultServiceEnvironment != "" {
	preBatchProcessors = append(preBatchProcessors, &modelprocessor.SetDefaultServiceEnvironment{
		DefaultServiceEnvironment: s.config.DefaultServiceEnvironment,
	})
}
serverParams.BatchProcessor = append(preBatchProcessors, serverParams.BatchProcessor)
```

That includes authorization. Authentication is still done synchronously, as that happens in HTTP middleware, prior to the processor/stream code being reached. There's also request-level rate limiting happening in middleware.

The parts that will now happen asynchronously are the per-event authorization checks and per-event rate limiting. They still happen; it's just that any authorization error (which, in the case of Lambda, is highly unlikely) will only be logged in the server and won't surface to the client. Similarly, rate limiting isn't likely to affect Lambda; it's only relevant when auth is disabled, which is usually only the case for RUM. None of the other processors are likely to fail.

> What exactly would the user facing changes be with this simplification? Does it make sense to label the async flag as tech preview until changes are implemented?

IMO that's not necessary. If we already had support for service.name constraints on API Keys, then I might have a different opinion, because then per-event authorization checks would be more relevant.

@simitt simitt added this to the 8.6 milestone Sep 22, 2022
@simitt simitt added the v8.6.0 label Oct 4, 2022
@simitt simitt removed this from the 8.6 milestone Feb 24, 2023
@kruskall

async parameter for non-blocking request handling was removed in elastic/apm-data#197

This can be closed
