
Runtime-Error: invalid memory address or nil pointer dereference when agent is dropping spans #11655

Closed
SecTex opened this issue Sep 18, 2023 · 1 comment


SecTex commented Sep 18, 2023

APM Server version (apm-server version): 8.9.2

Description of the problem including expected versus actual behavior: An "invalid memory address or nil pointer dereference" error is triggered when an APM agent drops spans, e.g. due to a very high number of spans combined with a low TransactionMaxSpans value.
In my case, the APM agent was configured with the default value of TransactionMaxSpans. After adding instrumentation via Elastic.Apm.StackExchange.Redis, the agent had to drop spans, which triggered this error. Increasing TransactionMaxSpans works around the error temporarily.

Steps to reproduce:

  1. Set up an Elastic Agent
  2. Add the APM integration and enable API key authentication
  3. Create a new API key for authentication
  4. Send the following request to the APM server (replace <YOUR-API-KEY> and <APM-SERVER-HOST>):
    curl -i -s -k -X $'POST' \
        -H $'Authorization: ApiKey <YOUR-API-KEY>' \
        -H $'Content-Type: application/x-ndjson; charset=utf-8' \
        --data-binary $'{\"metadata\":{\"service\":{\"agent\":{\"name\":\"dotnet\",\"version\":\"1.0.0\"},\"name\":\"Service-Name\"}}}\x0a{\"transaction\":{\"dropped_spans_stats\":[{\"destination_service_resource\":\"redis\",\"service_target_type\":\"redis\",\"duration.count\":0,\"duration.sum.us\":0}],\"duration\":0,\"id\":\"bde2ae8f01da883a\",\"sampled\":false,\"span_count\":{\"dropped\":25,\"started\":500},\"timestamp\":1695039052901393,\"trace_id\":\"4844c038b6e7bf7ce1fb89e10ee9ed01\",\"type\":\"request\"}}\x0a' \
    $'https://<APM-SERVER-HOST>/intake/v2/events'
  5. Observe an HTTP 500 response
  6. The same request without the dropped_spans_stats array does not trigger the error, suggesting that this field is the cause of the problem

Provide logs (if relevant):

goroutine 2670 [running]:
runtime/debug.Stack()
	runtime/debug/stack.go:24 +0x65
github.com/elastic/apm-server/internal/beater/middleware.RecoverPanicMiddleware.func1.1.1()
	github.com/elastic/apm-server/internal/beater/middleware/recover_panic_middleware.go:55 +0x12f
panic({0x55c4b8a17ee0, 0x55c4b9c7bd40})
	runtime/panic.go:884 +0x212
github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics.(*Aggregator).processDroppedSpanStats(0xc00025e900, 0xc05abd06e0, 0xc00023e070)
	github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics/aggregator.go:294 +0x6b
github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics.(*Aggregator).ProcessBatch(0xc00025e900, {0xc?, 0xc05c2ca80a?}, 0xc0601d0000?)
	github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics/aggregator.go:220 +0x27a
github.com/elastic/apm-data/model/modelprocessor.Chained.ProcessBatch(...)
	github.com/elastic/[email protected]/model/modelprocessor/chained.go:33
github.com/elastic/apm-data/model/modelprocessor.Chained.ProcessBatch(...)
	github.com/elastic/[email protected]/model/modelprocessor/chained.go:33
github.com/elastic/apm-data/input/elasticapm.(*Processor).processBatch(0x55c4b9ca7400?, {0x55c4b8cc0d80?, 0xc05c7966f0?}, {0x55c4b8cb64a0?, 0xc000013de8?}, 0xc05a4fdea8?)
	github.com/elastic/[email protected]/input/elasticapm/processor.go:373 +0x78
github.com/elastic/apm-data/input/elasticapm.(*Processor).handleStream(0xc000347cc0, {0x55c4b8cc0d80?, 0xc05c7966f0}, 0x0, 0x0?, 0x0?, 0x0?, {0x55c4b8cb64a0?, 0xc000013de8},     0xc059fde580, ...)
	github.com/elastic/[email protected]/input/elasticapm/processor.go:357 +0x33b
github.com/elastic/apm-data/input/elasticapm.(*Processor).HandleStream(0xc000347cc0, {0x55c4b8cc0d80, 0xc05c7966f0}, 0x0, 0xc0003ffc80?, {0x55c4b8cb1680, 0xc05c0f8f20},     0x14?, {0x55c4b8cb64a0, 0xc000013de8}, ...)
	github.com/elastic/[email protected]/input/elasticapm/processor.go:294 +0x3f0
github.com/elastic/apm-server/internal/beater/api/intake.Handler.func1(0xc05c0f8f20)
	github.com/elastic/apm-server/internal/beater/api/intake/handler.go:101 +0x2d0
github.com/elastic/apm-server/internal/beater/middleware.AnonymousRateLimitMiddleware.func1.1(0x55c4b8cc0d80?)
	github.com/elastic/apm-server/internal/beater/middleware/rate_limit_middleware.go:49 +0x19a
github.com/elastic/apm-server/internal/beater/middleware.AuthMiddleware.func1.1(0xc05c0f8f20)
	github.com/elastic/apm-server/internal/beater/middleware/auth_middleware.go:70 +0x2af
github.com/elastic/apm-server/internal/beater/middleware.ResponseHeadersMiddleware.func1.1(0xc05c0f8f20)
	github.com/elastic/apm-server/internal/beater/middleware/response_headers_middleware.go:33 +0xdb
github.com/elastic/apm-server/internal/beater/middleware.MonitoringMiddleware.func1.2(0xc05c0f8f20)
	github.com/elastic/apm-server/internal/beater/middleware/monitoring_middleware.go:40 +0x51
github.com/elastic/apm-server/internal/beater/middleware.RecoverPanicMiddleware.func1.1(0x55c4b8cadb50?)
	github.com/elastic/apm-server/internal/beater/middleware/recover_panic_middleware.go:60 +0x54
github.com/elastic/apm-server/internal/beater/middleware.TimeoutMiddleware.func1.1(0xc05c0f8f20)
	github.com/elastic/apm-server/internal/beater/middleware/timeout_middleware.go:35 +0x38
github.com/elastic/apm-server/internal/beater/middleware.LogMiddleware.func1.1(0xc05c0f8f20)
	github.com/elastic/apm-server/internal/beater/middleware/log_middleware.go:47 +0xbc
github.com/elastic/apm-server/internal/beater/request.(*ContextPool).HTTPHandler.func1({0x55c4b8cbfcd0, 0xc05afbee00}, 0x7f4c775e0d28?)
	github.com/elastic/apm-server/internal/beater/request/context_pool.go:47 +0x10a
net/http.HandlerFunc.ServeHTTP(0x100?, {0x55c4b8cbfcd0?, 0xc05afbee00?}, 0x0?)
	net/http/server.go:2109 +0x2f
go.elastic.co/apm/module/apmhttp/v2.(*handler).ServeHTTP(0xc05c448f00?, {0x55c4b8cbfcd0?, 0xc05afbee00?}, 0x55c4b8a8eb80?)
	go.elastic.co/apm/module/apmhttp/[email protected]/handler.go:74 +0xc2
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000672540, {0x55c4b8cbfcd0, 0xc05afbee00}, 0xc05c448e00)
	github.com/gorilla/[email protected]/mux.go:210 +0x1cf
github.com/elastic/gmux.(*mux).withGRPCInsecure.func1({0x55c4b8cbfcd0?, 0xc05afbee00?}, 0xc05c796570?)
	github.com/elastic/[email protected]/mux.go:114 +0x5af
net/http.HandlerFunc.ServeHTTP(0x0?, {0x55c4b8cbfcd0?, 0xc05afbee00?}, 0x55c4b70a3b74?)
	net/http/server.go:2109 +0x2f
net/http.serverHandler.ServeHTTP({0x55c4b8cba410?}, {0x55c4b8cbfcd0, 0xc05afbee00}, 0xc05c448e00)
	net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc05a302140, {0x55c4b8cc0d80, 0xc0004aabd0})
	net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
	net/http/server.go:3102 +0x4db
SecTex added the bug label Sep 18, 2023
carsonip (Member) commented

Hi @SecTex, we are aware of this issue. The root cause is that the .NET agent flattens the duration fields, which apm-server does not expect. The correct way to send the duration field is as a nested object.

Handling for this unexpected flattened duration field is missing in 8.9 and has been fixed in 8.10. However, although apm-server no longer panics, the flattened duration field is still ignored; the .NET agent must fix the root cause for the field to be parsed properly.

A fix in the .NET agent is in progress: elastic/apm-agent-dotnet#2178

Closing the issue as the handling is already fixed in apm-server.
