Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NPE due to reslice only initializing new slice elements #327

Merged
merged 5 commits into from
Jul 24, 2024

Conversation

lahsivjar
Copy link
Contributor

Fixes #326

Run the test case in #326 (comment) to understand the root issue. Note that the test case is not added here since it is too specific for reproducing the bug and, IMO, will not be useful as a test for long-term maintenance.

@lahsivjar lahsivjar requested a review from a team as a code owner July 23, 2024 17:14
@obltmachine obltmachine added the safe-to-test Changes are safe to run in the CI label Jul 23, 2024
@lahsivjar lahsivjar requested a review from a team July 23, 2024 18:10
Copy link
Member

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: As PopulateNil allows chaining, wdyt about collapsing all the calls to e.g. out = PopulateNil(Reslice(out, len(h)), modelpb.HTTPHeaderFromVTPool)? I think it looks slightly better than calling it inside mapTo* e.g.

out.Stacktrace = modeldecoderutil.Reslice(out.Stacktrace, len(from.Stacktrace))
mapToStracktraceModel(from.Stacktrace, modeldecoderutil.PopulateNil(out.Stacktrace, modelpb.StacktraceFrameFromVTPool))

carsonip
carsonip previously approved these changes Jul 23, 2024
@kruskall
Copy link
Member

this might be similar to #165

@lahsivjar
Copy link
Contributor Author

lahsivjar commented Jul 24, 2024

@kruskall Yup, you are right they are doing the same thing. Do you want to take that forward instead? I think they are mostly same so we can merge either one... both should fix the NPE edge case.

@lahsivjar
Copy link
Contributor Author

lahsivjar commented Jul 24, 2024

@kruskall While I like the other PR for removing the util method for populate nil, I think the behavior of Reslice would be different if there are elements in the slice already added. Since this is active now, let's go with this PR unless you have any concerns.

@lahsivjar lahsivjar enabled auto-merge (squash) July 24, 2024 10:33
@lahsivjar lahsivjar requested a review from carsonip July 24, 2024 11:02
@lahsivjar lahsivjar merged commit eaed618 into elastic:main Jul 24, 2024
5 checks passed
@lahsivjar lahsivjar deleted the issue-326 branch July 24, 2024 11:38
@carsonip carsonip self-assigned this Sep 2, 2024
@carsonip
Copy link
Member

carsonip commented Sep 2, 2024

✅ test-plan-ok

Tested manually by checking out to 8.15.0 tag and 8.15 branch

Adding the following test (supplied in original issue) to input/elasticapm/processor_test.go:

func TestBadReslice(t *testing.T) {
	metadata := `{"metadata": {"service": {"name": "1234_service-12a3","node": {"configured_name": "node-123"},"version": "5.1.3","environment": "staging","language": {"name": "ecmascript","version": "8"},"runtime": {"name": "node","version": "8.0.0"},"framework": {"name": "Express","version": "1.2.3"},"agent": {"name": "elastic-node","version": "3.14.0"}},"user": {"id": "123user", "username": "bar", "email": "[email protected]"}, "labels": {"tag0": null, "tag1": "one", "tag2": 2}, "process": {"pid": 1234,"ppid": 6789,"title": "node","argv": ["node","server.js"]},"system": {"hostname": "prod1.example.com","architecture": "x64","platform": "darwin", "container": {"id": "container-id"}, "kubernetes": {"namespace": "namespace1", "pod": {"uid": "pod-uid", "name": "pod-name"}, "node": {"name": "node-name"}}},"cloud":{"account":{"id":"account_id","name":"account_name"},"availability_zone":"cloud_availability_zone","instance":{"id":"instance_id","name":"instance_name"},"machine":{"type":"machine_type"},"project":{"id":"project_id","name":"project_name"},"provider":"cloud_provider","region":"cloud_region","service":{"name":"lambda"}}}}`
	errstacktrace := `{"error": {"id": "cdefab0123456789", "trace_id": null, "timestamp": 1533826745999000,"exception": {"message": "Cannot read property 'baz' no defined", "stacktrace": [{"classname": "cl1", "abs_path": "abs_path1"}, {"classname": "cl2", "abs_path": "abs_path2"}, {"classname": "cl3", "abs_path": "abs_path3"}, {"classname": "cl4", "abs_path": "abs_path4"}, {"classname": "cl5", "abs_path": "abs_path5"}]}}}`

	batchProcessor := modelpb.ProcessBatchFunc(func(ctx context.Context, batch *modelpb.Batch) error {
		return nil
	})
	p := NewProcessor(Config{
		MaxEventSize: 100 * 1024,
		Semaphore:    semaphore.NewWeighted(1),
	})

	// Below, we try to emulate a case where all of the capacity of Error#Exception#Stacktrace slice
	// is not populated with non-nil values. In practical cases, this could happen with OTLP handling
	// of exceptions: https://github.com/elastic/apm-data/blob/cee5ac3fc5f2ee66f156133992882cae758e2b66/input/otlp/exceptions.go#L165
	event1 := modelpb.APMEventFromVTPool()
	event1.Error = modelpb.ErrorFromVTPool()
	event1.Error.Exception = modelpb.ExceptionFromVTPool()
	for i := 0; i < 3; i++ {
		// Add 3 so that reallocation makes the capacity 4
		// this will cause the 4th element to be nil
		event1.Error.Exception.Stacktrace = append(event1.Error.Exception.Stacktrace, modelpb.StacktraceFrameFromVTPool())
	}
	require.Equal(t, 4, cap(event1.Error.Exception.Stacktrace))

	event1.ReturnToVTPool()
	event2 := modelpb.APMEventFromVTPool()
	payload2 := strings.Join([]string{metadata, errstacktrace}, "\n")
	err := p.HandleStream(
		context.Background(), event2,
		strings.NewReader(payload2), 10, batchProcessor,
		&Result{},
	)
	require.NoError(t, err)
}

Confirmed that it fails in v1.9.0 and passes in v1.9.1.

On a side note, I cannot wrap my head around how this would manifest practically. After analyzing the code statically, there is no code in apm-server other than otlp code that would possibly insert nil to stacktrace slice, but otlp APMEvents are not pooled. But investigation into that is a separate matter. This PR is test-plan-ok as it defends against bad pooled slices containing nils.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
safe-to-test Changes are safe to run in the CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic in stacktrace mapping
4 participants