Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send fleet-server elasticsearch config under new bootstrap attribute #4643

Merged
merged 21 commits into from
May 25, 2024

Conversation

michel-laterman
Copy link
Contributor

@michel-laterman michel-laterman commented Apr 30, 2024

What does this PR do?

Alter the fleet-server bootstrap component modifier to insert all (elasticsearch)
output configuration options specified by enrollment args under a new elasticsearch.boostrap key
instead of overwriting any existing keys. This will allow elastic-agent to send the list of hosts
(and other config options) retrieved from a policy to fleet server as well as the config needed
to form the initial connection to elasticsearch used to collect policy information.
Fleet-server will be altered to use the bootstrap config that is passed if the policy attributes
are unspecified or fail.

Why is it important?

Fleet-server instances running under an elastic-agent are not able to connect to multiple ES hosts, or use any output settings that update local configuration that has been specified during enrolment.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test this PR locally

See description for: elastic/fleet-server#3506

Related issues

@michel-laterman michel-laterman added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team labels Apr 30, 2024
Copy link
Contributor

mergify bot commented Apr 30, 2024

This pull request does not have a backport label. Could you fix it @michel-laterman? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@cmacknz
Copy link
Member

cmacknz commented May 1, 2024

Approach LGTM.

Do we have an automated test to ensure the agent can bootstrap a real fleet server already? If not we are going to need one.

@michel-laterman
Copy link
Contributor Author

We don't have a full e2e test where we run a compiled agent with a fleet-server. This can be added in a followup PR as this change requires fleet-server's pr to also be merged.

We have partial integration tests in fleet-server where we fake agent communications to validate that the bootstrap sequence works as expected (i.e., state is degraded if no agent id is provided and goes healthy after): https://github.com/elastic/fleet-server/blob/main/internal/pkg/server/agent_integration_test.go

And fleet-server also has some disabled e2e tests (that never ran properly through ci) to test agent bootstrapping.

@blakerouse tells me that we should be able to add a fleet-server bootstrap test as part of the integration test suite in elastic-agent as long as we don't use containers.

@cmacknz
Copy link
Member

cmacknz commented May 2, 2024

We probably need a test that fleet server can be bootstrapped in both repositories, so we can prevent merging changes that would break it before they hit the snapshot image.

@michel-laterman
Copy link
Contributor Author

Currently; when I try to use a binarry built with this PR in ESS I get

{"log.level":"error","@timestamp":"2024-05-09T16:47:16.527Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":608},"message":"Spawned new component fleet-server-default: input not supported","log":{"source":"elastic-agent"},"component":{"id":"fleet-server-default","state":"FAILED"},"ecs.version":"1.6.0"}

@cmacknz
Copy link
Member

cmacknz commented May 9, 2024

The spec file declaring the fleet-server input type is missing.

@michel-laterman
Copy link
Contributor Author

OK, using a VM I was finally able to build and push an image for testing.
The image includes the corresponding fleet-server binary from elastic/fleet-server#350

I have confirmed that:

  • the agent running fleet-server starts and appears healthy in the UI.
  • log levels can be changed on the ESS fleet-server instance, and the UI can be used to gather diagnostics bundles
  • We are able to enroll a self-managed fleet-server instance, it also appears as healthy in the UI.
  • a self-managed fleet-server instance can change log levels and collect diagnostics bundles
  • the diagnostics bundle of a self-managed fleet-server can show multiple hosts in the output specification.

There are still some other automated tests we need to verify before merging these 2 PRs

@michel-laterman michel-laterman marked this pull request as ready for review May 14, 2024 18:34
@michel-laterman michel-laterman requested a review from a team as a code owner May 14, 2024 18:34
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@michel-laterman michel-laterman requested review from AndersonQ and pchila and removed request for faec and leehinman May 14, 2024 19:12
Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple change, like it!

Fix the changelog before merging.

@cmacknz
Copy link
Member

cmacknz commented May 21, 2024

Does this need to be merged together with elastic/fleet-server#3506?

I think there needs to be an automated test that Elastic Agent can bootstrap Fleet Server in one of the repositories before we merge either of these.

Edit: If the coordination of the two PRs with the test is annoying enough I'd be fine with the test being added a separate PR, but not closing the implementation issue until it exists.

@michel-laterman
Copy link
Contributor Author

I've merged the changes to fleet-server.
I'll merge this tomorrow to ensure the resulting artifacts/image is built with both changes as the agent will fail to bootstrap if the fleet-server changes are missing. Fleet-server will not fail if only those changes are present.

@michel-laterman
Copy link
Contributor Author

Here are the issues to add bootstrap tests:

@cmacknz
Copy link
Member

cmacknz commented May 22, 2024

I would like to see #4769 done as part of this PR. You are going to have to do this test manually without it, may as well make the test framework do the work for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
5 participants