Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consume APM configuration over the elastic-agent control protocol #11381

Closed
joshdover opened this issue Aug 11, 2023 · 20 comments
Closed

Consume APM configuration over the elastic-agent control protocol #11381

joshdover opened this issue Aug 11, 2023 · 20 comments

Comments

@joshdover
Copy link

In order to enable APM tracing when APM Server is managed by Elastic Agent, APM Server needs to apply APM configuration settings when they're passed over the control protocol. This is necessary to enable tracing of APM Server in ESS and ECE.

These settings should be used to enable APM Server's self tracing functionality, and send the traces to the configured APM Server passed via the control protocol This may require that we change how the apm.Tracer object is created and passed to internal objects as the config may change during runtime.

References:

@simitt simitt added this to the 8.11 milestone Aug 11, 2023
@simitt
Copy link
Contributor

simitt commented Oct 2, 2023

elastic/elastic-agent#2612 has been merged; moving to 8.12-candidate to fit timelines.

@simitt simitt removed this from the 8.11 milestone Oct 2, 2023
@simitt simitt added this to the 8.13 milestone Jan 4, 2024
@kruskall
Copy link
Member

kruskall commented Feb 6, 2024

It seems beats is already using the agent control protocol under the hood (#12508 (comment)) so we can't add an additional client.

We can either stop using the beats manager and implement the feature in apm-server or try to work with beats (I'm not sure if this is possible since beats is hardcoding the values passed to the control protocol).

Another issue is that some of the libraries we use hardcode the tracer (e.g. go-docappender) so we can only stop and recreate the indexer or implement an option to retrieve the indexer dynamically.

Any feedback, opinion or alternative solution ?

@axw
Copy link
Member

axw commented Feb 7, 2024

We can either stop using the beats manager and implement the feature in apm-server or try to work with beats (I'm not sure if this is possible since beats is hardcoding the values passed to the control protocol).

Could we update libbeat to consume the TriggeredAPMChange, translate it to YAML settings, and merge that into a config change and update the beat config?

Another issue is that some of the libraries we use hardcode the tracer (e.g. go-docappender) so we can only stop and recreate the indexer or implement an option to retrieve the indexer dynamically.

We already hot-reload everything on config changes, so if we do the above then we can construct a new tracer with the new config.

@axw
Copy link
Member

axw commented Feb 7, 2024

@joshdover @cmacknz does my suggest above (to consume TriggeredAPMChange, update the beat config, and trigger a reload) sound sensible? How do you intend to deal with TriggeredAPMChange for Beats? I think this should probably be solved in libbeat, and then apm-server can consume updates from there. Is that something the control plane team can take on?

@cmacknz
Copy link
Member

cmacknz commented Feb 7, 2024

Taking a closer look at this, the APM configuration is being received in libbeat but it is not exposed in any way.

The APMConfig we need is attached to each unit that may have changed already as part of the expected state, we just don't do anything with it.

https://github.com/elastic/elastic-agent-client/blob/16361c4c4895b633926dcf784ed90f92c985a1a3/pkg/client/unit.go#L156-L163

// Expected contains the expected state, log level, features and config for a unit.
type Expected struct {
	Config    *proto.UnitExpectedConfig
	Features  *proto.Features
	LogLevel  UnitLogLevel
	State     UnitState
	APMConfig *proto.APMConfig
}

Here it is in our management code at the point where we reload the Beat configurations.
https://github.com/elastic/beats/blob/7546ae1ceefa19c9a4886ab012fa661eaeaccf4b/x-pack/libbeat/management/managerV2.go#L568-L569

	for _, unit := range units {
		expected := unit.Expected()

The question I have is what we should do with the APM configuration at this point? In what form can APM server actually use it? Our agent client is quite isolated from the rest of Beats, so at the point where we receive the APM configuration we could change a Beat global tracing configuration or add a way to watch changes to it externally.

With respect to TriggeredAPMChange the intent is to allow us to tell when the APM configuration has actually changed when we receive a new configuration from agent. This is something we would want to be looking at in the block below we wanted to build an API for externally observing changes to the tracing configuration outside of the management client:

https://github.com/elastic/beats/blob/7546ae1ceefa19c9a4886ab012fa661eaeaccf4b/x-pack/libbeat/management/managerV2.go#L494-L507

			switch change.Type {
			// Within the context of how we send config to beats, I'm not sure if there is a difference between
			// A unit add and a unit change, since either way we can't do much more than call the reloader
			case client.UnitChangedAdded:
				cm.addUnit(change.Unit)
				// reset can be called here because `<-t.C` is handled in the same select
				t.Reset(cm.changeDebounce)
			case client.UnitChangedModified:
				cm.modifyUnit(change.Unit)
				// reset can be called here because `<-t.C` is handled in the same select
				t.Reset(cm.changeDebounce)

This Beat management code was originally written before the change triggers existed so it doesn't make use of them yet.

@axw
Copy link
Member

axw commented Feb 7, 2024

The question I have is what we should do with the APM configuration at this point? In what form can APM server actually use it? Our agent client is quite isolated from the rest of Beats, so at the point where we receive the APM configuration we could change a Beat global tracing configuration or add a way to watch changes to it externally.

What do you think about introducing a new libbeat/common/reload registry key (say "apm"), and calling any registered reloader(s) when TriggeredAPMChange is received? We would need to marshal it to a config.C, and I think we could just use the original Elastic Agent config YAML structure for that.

In APM Server we would update

if err := reload.RegisterV2.RegisterList(reload.InputRegName, reloadableListFunc(r.reloadInputs)); err != nil {
return nil, fmt.Errorf("failed to register inputs reloader: %w", err)
}
if err := reload.RegisterV2.Register(reload.OutputRegName, reload.ReloadableFunc(r.reloadOutput)); err != nil {
return nil, fmt.Errorf("failed to register output reloader: %w", err)
}
to also watch for APM config changes, and hot-reload the process like we already do for input & output config changes.

@simitt simitt added the blocked label Feb 8, 2024
@simitt
Copy link
Contributor

simitt commented Feb 8, 2024

Moving this out of 8.13 scope due to the uncovered issues. @cmacknz is it reasonable for the EA team to make the required changes (once there is agreement on a path forward) early enough in 8.14 dev cycle so that we can ensure this works gets into apm-server 8.14?

@simitt simitt modified the milestones: 8.13, 8.14 Feb 8, 2024
@cmacknz
Copy link
Member

cmacknz commented Feb 8, 2024

Hooking into the existing libbeat configuration reload API makes sense to me as a way to expose this.

Our team could do this, answering when we could do it I'll have to leave with @pierrehilbert.

@simitt
Copy link
Contributor

simitt commented Mar 6, 2024

@pierrehilbert could you please provide an update when this could be done by the Elastic Agent team? We are eager of enabling apm tracing when in managed mode.

@simitt
Copy link
Contributor

simitt commented Apr 24, 2024

@pierrehilbert any update on this please? We are really eager to finally get this in.

@pierrehilbert
Copy link

Really sorry @simitt I missed the first mentions.
@pchila will probably be the best candidate to spend time on this as he already spend some time on the APM topic.
What would be the ideal realistic target to have this done?
cc @ycombinator as @pchila will be in his team with the new org on Agent side.

@simitt
Copy link
Contributor

simitt commented Apr 30, 2024

What would be the ideal realistic target to have this done?

Not certain if the question is directed towards @pchila or me, but from our end, the sooner the better. If you can aim for providing functionality in 8.15, so we can do the follow up in 8.16, it would be good.

@ycombinator
Copy link
Contributor

Created elastic/beats#39606 to summarize the discussion between @cmacknz and @axw and track this request in the Ingest team's roadmap.

@simitt simitt assigned kyungeunni and unassigned kruskall Jun 12, 2024
@simitt simitt removed this from the 8.14 milestone Jun 13, 2024
@blakerouse
Copy link

I have PR elastic/beats#40030 up for review. You can start with that PR as a base until it is merged, to implement the feature you need in APM server. If you have any questions just let me know and I can help.

@kyungeunni
Copy link
Contributor

I've done the manual testing by running the APM Server in managed mode with locally built Elastic Agent image and left the result in the comment.

@simitt
Copy link
Contributor

simitt commented Jul 3, 2024

@kyungeunni please ensure with your testing that the self instrumentation can be used in the ESS setup. That is the main use case for us to be able to leverage apm self instrumentation when troubleshooting SDHs.

@kyungeunni
Copy link
Contributor

I'll test this in ESS.

Just sharing some notes:

Fleet-server uses the same APMConfig and seems to have the tracing enabled on their side, not relying on instrumentation package but having their own. Here's the link to the doc for the APM Tracing.

The configs used to make the tracing work on fleet-server are as follows:

  • Hosts
  • SecretToken
  • Global Labels

In the current state, we will be able to send the tracing based on what I've tested locally, but we won't be able to set the Global Labels so we need extra work on this part.

I see three options to support global labels:

  1. set ELASTIC_APM_GLOBAL_LABELS env var before creating a new tracer using instrumentation package from beats (needs to be verified if it works or no)
  2. update beats' instrumentation package to be somewhat similar to what is implemented in fleet-server
  3. have our own implementation of instrumentation like fleet-server does so we don't need to rely on beats.

1 is quick and short, but it's not declarative and a bit hacky. if we choose to go with 2 or 3, we will be able to parse and support more config values coming from APMConfig (e.g. TLS)

@axw
Copy link
Member

axw commented Jul 4, 2024

1 is quick and short, but it's not declarative and a bit hacky. if we choose to go with 2 or 3, we will be able to parse and support more config values coming from APMConfig (e.g. TLS)

If (3) is feasible without significantly more effort, then I would be in favour of that, to reduce our dependency on libbeat.

@simitt
Copy link
Contributor

simitt commented Jul 12, 2024

Required follow up tasks to make this fully work are created and assigned:

@1pkg
Copy link
Member

1pkg commented Jul 18, 2024

Validated as a part of #13604

@1pkg 1pkg unassigned kyungeunni and 1pkg Aug 13, 2024
@1pkg 1pkg closed this as completed Oct 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests