
Simplify non-blocking request handling #9069

Closed
marclop opened this issue Sep 8, 2022 · 4 comments

marclop commented Sep 8, 2022

In #8979, we introduced a new async flag that clients can set to request asynchronous handling of their events. We should improve how we handle asynchronous requests by:

  • Separating the BatchProcessor chain into two chains that communicate over a channel:
    • Fast chain (or frontend?): may briefly block for authorization and rate-limiting purposes.
      • If the channel is full, fail fast for async requests; otherwise block until the send completes (the channel has capacity) or ctx.Done().
    • Slow chain (or backend?): performs slower operations which may block or significantly slow down callers.
    • Channel size: we should explore what size makes sense in light of the sized semaphore that exists for the intake handler.
  • Modifying the context.Context to mark a request as asynchronous, so the batch processor chain can distinguish between blocking and non-blocking requests.
  • The stream processor no longer needs to differentiate between async/sync requests and can just call processor.ProcessBatch inline; the handling of the batch becomes an implementation detail.

axw commented Sep 8, 2022

One of the motivations for doing this is to ensure that authorization and rate-limiting errors are returned to the client. When we fix that, we should make sure to update https://github.com/elastic/apm-server/blob/main/docs/api-events.asciidoc#endpoint


simitt commented Sep 8, 2022

@axw can you elaborate on this? Afaics authorization errors are also returned in the current async implementation.

What exactly would the user facing changes be with this simplification? Does it make sense to label the async flag as tech preview until changes are implemented?


axw commented Sep 9, 2022

@simitt all of these model processors will be run in the background goroutine:

```go
// Add pre-processing batch processors to the beginning of the chain,
// applying only to the events that are decoded from agent/client payloads.
preBatchProcessors := modelprocessor.Chained{
	// Add a model processor that rate limits, and checks authorization for the
	// agent and service for each event. These must come at the beginning of the
	// processor chain.
	model.ProcessBatchFunc(rateLimitBatchProcessor),
	model.ProcessBatchFunc(authorizeEventIngestProcessor),
	// Pre-process events before they are sent to the final processors for
	// aggregation, sampling, and indexing.
	modelprocessor.SetHostHostname{},
	modelprocessor.SetServiceNodeName{},
	modelprocessor.SetMetricsetName{},
	modelprocessor.SetGroupingKey{},
	modelprocessor.SetErrorMessage{},
	modelprocessor.SetUnknownSpanType{},
}
if s.config.DefaultServiceEnvironment != "" {
	preBatchProcessors = append(preBatchProcessors, &modelprocessor.SetDefaultServiceEnvironment{
		DefaultServiceEnvironment: s.config.DefaultServiceEnvironment,
	})
}
serverParams.BatchProcessor = append(preBatchProcessors, serverParams.BatchProcessor)
```

That includes authorization. Authentication is still done synchronously, as that happens in HTTP middleware, prior to the processor/stream code being reached. There's also request-level rate limiting happening in middleware.

The parts that will now happen asynchronously are the per-event authorization checks and per-event rate limiting. They still happen; it's just that any authorization error (which, in the case of Lambda, is highly unlikely) will only be logged in the server and won't surface to the client. Similarly, rate limiting isn't likely to affect Lambda; it's only relevant when auth is disabled, which is usually only the case for RUM. None of the other processors are likely to fail.

> What exactly would the user facing changes be with this simplification? Does it make sense to label the async flag as tech preview until changes are implemented?

IMO that's not necessary. If we already had support for service.name constraints on API Keys, then I might have a different opinion, because then per-event authorization checks would be more relevant.

@simitt simitt added this to the 8.6 milestone Sep 22, 2022
@simitt simitt added the v8.6.0 label Oct 4, 2022
@simitt simitt removed this from the 8.6 milestone Feb 24, 2023
@kruskall

async parameter for non-blocking request handling was removed in elastic/apm-data#197

This can be closed
