-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix NPE due to reslice only initializing new slice elements #327
Conversation
input/elasticapm/internal/modeldecoder/modeldecoderutil/slice.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: As PopulateNil allows chaining, wdyt about collapsing all the calls to e.g. out = PopulateNil(Reslice(out, len(h)), modelpb.HTTPHeaderFromVTPool)
? I think it looks slightly better than calling it inside mapTo*
e.g.
out.Stacktrace = modeldecoderutil.Reslice(out.Stacktrace, len(from.Stacktrace))
mapToStracktraceModel(from.Stacktrace, modeldecoderutil.PopulateNil(out.Stacktrace, modelpb.StacktraceFrameFromVTPool))
this might be similar to #165 |
@kruskall Yup, you are right they are doing the same thing. Do you want to take that forward instead? I think they are mostly same so we can merge either one... both should fix the NPE edge case. |
@kruskall While I like the other PR for removing the util method for populate nil, I think the behavior of Reslice would be different if there are elements in the slice already added. Since this is active now, let's go with this PR unless you have any concerns. |
input/elasticapm/internal/modeldecoder/modeldecoderutil/slice.go
Outdated
Show resolved
Hide resolved
✅ test-plan-ok Tested manually by checking out to 8.15.0 tag and 8.15 branch Adding the following test (supplied in original issue) to func TestBadReslice(t *testing.T) {
metadata := `{"metadata": {"service": {"name": "1234_service-12a3","node": {"configured_name": "node-123"},"version": "5.1.3","environment": "staging","language": {"name": "ecmascript","version": "8"},"runtime": {"name": "node","version": "8.0.0"},"framework": {"name": "Express","version": "1.2.3"},"agent": {"name": "elastic-node","version": "3.14.0"}},"user": {"id": "123user", "username": "bar", "email": "[email protected]"}, "labels": {"tag0": null, "tag1": "one", "tag2": 2}, "process": {"pid": 1234,"ppid": 6789,"title": "node","argv": ["node","server.js"]},"system": {"hostname": "prod1.example.com","architecture": "x64","platform": "darwin", "container": {"id": "container-id"}, "kubernetes": {"namespace": "namespace1", "pod": {"uid": "pod-uid", "name": "pod-name"}, "node": {"name": "node-name"}}},"cloud":{"account":{"id":"account_id","name":"account_name"},"availability_zone":"cloud_availability_zone","instance":{"id":"instance_id","name":"instance_name"},"machine":{"type":"machine_type"},"project":{"id":"project_id","name":"project_name"},"provider":"cloud_provider","region":"cloud_region","service":{"name":"lambda"}}}}`
errstacktrace := `{"error": {"id": "cdefab0123456789", "trace_id": null, "timestamp": 1533826745999000,"exception": {"message": "Cannot read property 'baz' no defined", "stacktrace": [{"classname": "cl1", "abs_path": "abs_path1"}, {"classname": "cl2", "abs_path": "abs_path2"}, {"classname": "cl3", "abs_path": "abs_path3"}, {"classname": "cl4", "abs_path": "abs_path4"}, {"classname": "cl5", "abs_path": "abs_path5"}]}}}`
batchProcessor := modelpb.ProcessBatchFunc(func(ctx context.Context, batch *modelpb.Batch) error {
return nil
})
p := NewProcessor(Config{
MaxEventSize: 100 * 1024,
Semaphore: semaphore.NewWeighted(1),
})
// Below, we try to emulate a case where all of the capacity of Error#Exception#Stacktrace slice
// is not populated with non-nil values. In practical cases, this could happen with OTLP handling
// of exceptions: https://github.com/elastic/apm-data/blob/cee5ac3fc5f2ee66f156133992882cae758e2b66/input/otlp/exceptions.go#L165
event1 := modelpb.APMEventFromVTPool()
event1.Error = modelpb.ErrorFromVTPool()
event1.Error.Exception = modelpb.ExceptionFromVTPool()
for i := 0; i < 3; i++ {
// Add 3 so that reallocation makes the capacity 4
// this will cause the 4th element to be nil
event1.Error.Exception.Stacktrace = append(event1.Error.Exception.Stacktrace, modelpb.StacktraceFrameFromVTPool())
}
require.Equal(t, 4, cap(event1.Error.Exception.Stacktrace))
event1.ReturnToVTPool()
event2 := modelpb.APMEventFromVTPool()
payload2 := strings.Join([]string{metadata, errstacktrace}, "\n")
err := p.HandleStream(
context.Background(), event2,
strings.NewReader(payload2), 10, batchProcessor,
&Result{},
)
require.NoError(t, err)
} Confirmed that it fails in v1.9.0 and passes in v1.9.1. On a side note, I cannot wrap my head around how this would manifest practically. After analyzing the code statically, there is no code in apm-server other than otlp code that would possibly insert nil to stacktrace slice, but otlp APMEvents are not pooled. But investigation into that is a separate matter. This PR is test-plan-ok as it defends against bad pooled slices containing nils. |
Fixes #326
Run the test case in #326 (comment) to understand the root issue. Note that the test case is not added here since it is too specific for reproducing the bug and, IMO, will not be useful as a test for long-term maintenance.