Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of custom endpoints in AWS input #41504

Merged
merged 5 commits into from
Nov 6, 2024

Conversation

faec
Copy link
Contributor

@faec faec commented Nov 4, 2024

Fix custom endpoint selection in the S3/SQS input (#39718) by porting @strawgate's 8.14 fix (#39709) to main.

In addition to the previous fixes, this simplifies the logic for detecting queue region, since the 8.14 version still had some broken cases caused by requiring over-strict endpoint matching, and it was concluded (talking to @strawgate) that there's no advantage to rejecting standard region format from queue URLs just because the endpoint URL is different (if there is a genuine mismatch in the queue and endpoint we'll learn it from the connection attempt, not from getRegionFromQueueURL).

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

@faec faec added bug Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team backport-8.x Automated backport to the 8.x branch with mergify backport-8.16 Automated backport with mergify labels Nov 4, 2024
@faec faec requested review from zmoog and strawgate November 4, 2024 12:51
@faec faec self-assigned this Nov 4, 2024
@faec faec requested a review from a team as a code owner November 4, 2024 12:51
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Nov 4, 2024
@strawgate
Copy link
Contributor

strawgate commented Nov 4, 2024

Question for reviewers and especially @strawgate: (*s3InputManager).Create sets awsConfig.EndpointResolverWithOptions, but o.EndpointResolver is also set directly in s3ConfigModifier and sqsConfigModifier (matching the 8.14 fix) so it looks like the former assignment is redundant. Does it still have an effect?

It looks like s3ConfigModifier only sets the endpoint resolver if the non aws bucket name is configured? So it applies in a different situation from the (*s3InputManager).Create which sets awsConfig.EndpointResolverWithOptions whenever endpoint alone is provided?

Definitely a good question on who "Wins" if both are provided

@faec
Copy link
Contributor Author

faec commented Nov 4, 2024

It looks like s3ConfigModifier only sets the endpoint resolver if the non aws bucket name is configured?

That's to cover the bucket polling use case, but there's also the subsequent check covering SQS:

	} else if c.QueueURL != "" && c.AWSConfig.Endpoint != "" {
		//nolint:staticcheck // haven't migrated to the new interface yet
		o.EndpointResolver = s3.EndpointResolverFromURL(c.AWSConfig.Endpoint)
	}

and there's a similar check in sqsConfigModifier. It looks to me like these take precedence (EndpointResolverWithOptions, set in (*s3InputManager).Create, takes precendence over EndpointResolver in the AWS config struct itself, but the EndpointResolver set in the config modifiers seems to set a raw endpoint resolver during actual client creation that overrides previous options). I don't have a non-AWS service to test against, so would appreciate any help confirming if there's really any difference or if the endpoint override in (*s3InputManager).Create can be safely removed.

@strawgate
Copy link
Contributor

and there's a similar check in sqsConfigModifier

That check was added in this PR though?

// For backwards compat:
// If the endpoint does not start with S3, we will use the endpoint resolver to make all SDK requests use the specified endpoint
// If the endpoint does start with S3, we will use the default resolver uses the endpoint field but can replace s3 with the desired service name like sqs
if !strings.HasPrefix(endpointUri.Hostname(), "s3") {
Copy link
Contributor

@strawgate strawgate Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should check to see if s3 is in the URL at all -- as we probably want the same behavior (avoid the EndpointResolver for AWS URLs) for vpce endpoints like vpce.s3....

Copy link
Contributor Author

@faec faec Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that check was removed then it seems like the only fundamental change in this function is adding Source: awssdk.EndpointSourceCustom, to the existing resolver. The big question for me is still whether this whole block can be removed in lieu of the existing endpoint handling in the config modifiers (or vice versa).

Copy link
Contributor

@strawgate strawgate Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is what results in a mutable endpoint when using an endpoint with s3 in the leaf of the endpoint url.

I dont think we should remove the check, I do think it should check for s3 in other parts of the domain name thugh.

I think that both svc.EndpointResolverFromURL and o.endpoint = ... have the same end effect and the awssdk.EndpointResolverWithOptionsFunc force returns the exact endpoint value.

So my guess is as long as those two use-cases remain distinct (when to allow a mutable endpoint hostname), the logic can be combined?

@faec
Copy link
Contributor Author

faec commented Nov 4, 2024

That check was added in this PR though?

It is added in this PR but it's derived from an identical check in the 8.14 fix https://github.com/elastic/beats/pull/39709/files (input.go line 230), the only difference here is that the config modifier at head has been moved to its own helper function instead of appearing inline.

@faec
Copy link
Contributor Author

faec commented Nov 4, 2024

Following offline discussion with @strawgate, simplified the custom endpoint resolvers. There were multiple redundant ones, and the only important question was which ones should set an immutable hostname. For now, the compromise that works for most scenarios is: leave the hostname mutable in SQS mode, make it immutable in bucket polling mode. This fixes the major past bugs, and leaves room to make bucket polling mode more permissive in the future as we confirm new working configurations (or to just expose the immutable field directly in the config so users can adjust it if needed).

@zmoog
Copy link
Contributor

zmoog commented Nov 5, 2024

I am running a quick test collecting logs from a non-AWS bucket using the aws-s3 input in S3 polling mode.

I did something similar in the past; see the zmoog/public-notes#46 public note. However, since Akamai acquired Linode, I want to rerun it.

I logged into https://cloud.linode.com/object-storage/buckets and created a new s3-compatible bucket named mbranca-test.

I used the "IT, Milan" region, so the bucket URL is https://mbranca-test.it-mil-1.linodeobjects.com

Relevant docs:

I also created access keys to get access/secret keys and the S3 endpoint hostname:

  • The endpoint is https://LINODE_S3_REGION.linodeobjects.com
  • IT, Milan: it-mil-1.linodeobjects.com

A quick test with the official AWS CLI:

$ aws configure --profile linode
AWS Access Key ID [None]: [redacted]
AWS Secret Access Key [None]: [redacted]
Default region name [None]: it-mil-1
Default output format [None]: json

aws s3 ls s3://mbranca-test --profile linode --endpoint-url https://it-mil-1.linodeobjects.com

There are no errors and no output (the bucket is empty).

$ echo "this is a test" > test.log

$ aws s3 cp test.log s3://mbranca-test --profile linode --endpoint-url https://it-mil-1.linodeobjects.com
upload: ./test.log to s3://mbranca-test/test.log

$ aws s3 ls s3://mbranca-test --profile linode --endpoint-url https://it-mil-1.linodeobjects.com
2024-11-05 22:28:44         15 test.log

CleanShot 2024-11-05 at 22 12 03@2x

Let's try with Filebeat using the following config:

# filebeat.yml

filebeat.inputs:

- type: aws-s3
  access_key_id: [redacted]
  secret_access_key: [redacted]
  non_aws_bucket_name: mbranca-test
  endpoint: 'https://it-mil-1.linodeobjects.com'
  number_of_workers: 5
  bucket_list_interval: 60s

I started Filebeat, and here is the log message from Linode S3-compatible object storage using the aws-s3 input (S3 polling mode)

CleanShot 2024-11-05 at 22 48 41@2x

@strawgate, do you have any suggestions for making sure the backport of your original fix is successful?

@zmoog zmoog added the aws Enable builds in the CI for aws cloud testing label Nov 5, 2024
@zmoog
Copy link
Contributor

zmoog commented Nov 5, 2024

/test

@faec faec merged commit cf13781 into elastic:main Nov 6, 2024
20 of 22 checks passed
mergify bot pushed a commit that referenced this pull request Nov 6, 2024
Fix custom endpoint selection in the S3/SQS input (#39718) by porting @strawgate's 8.14 fix (#39709) to main.

In addition to the previous fixes, this simplifies the logic for detecting queue region, since the 8.14 version still had some broken cases caused by requiring over-strict endpoint matching, and it was concluded (talking to @strawgate) that there's no advantage to rejecting standard region format from queue URLs just because the endpoint URL is different (if there is a genuine mismatch in the queue and endpoint we'll learn it from the connection attempt, not from `getRegionFromQueueURL`).

(cherry picked from commit cf13781)
mergify bot pushed a commit that referenced this pull request Nov 6, 2024
Fix custom endpoint selection in the S3/SQS input (#39718) by porting @strawgate's 8.14 fix (#39709) to main.

In addition to the previous fixes, this simplifies the logic for detecting queue region, since the 8.14 version still had some broken cases caused by requiring over-strict endpoint matching, and it was concluded (talking to @strawgate) that there's no advantage to rejecting standard region format from queue URLs just because the endpoint URL is different (if there is a genuine mismatch in the queue and endpoint we'll learn it from the connection attempt, not from `getRegionFromQueueURL`).

(cherry picked from commit cf13781)
@strawgate
Copy link
Contributor

@strawgate, do you have any suggestions for making sure the backport of your original fix is successful?

I'd run a test with just S3 bucket polling from S3, SQS + S3 from S3, and a test with a third-party S3 compatible solution.

faec added a commit that referenced this pull request Nov 6, 2024
…#41537)

* Fix handling of custom endpoints in AWS input (#41504)

Fix custom endpoint selection in the S3/SQS input (#39718) by porting @strawgate's 8.14 fix (#39709) to main.

In addition to the previous fixes, this simplifies the logic for detecting queue region, since the 8.14 version still had some broken cases caused by requiring over-strict endpoint matching, and it was concluded (talking to @strawgate) that there's no advantage to rejecting standard region format from queue URLs just because the endpoint URL is different (if there is a genuine mismatch in the queue and endpoint we'll learn it from the connection attempt, not from `getRegionFromQueueURL`).

(cherry picked from commit cf13781)

* Update CHANGELOG.next.asciidoc

auto-merge picked up an unrelated change

---------

Co-authored-by: Fae Charlton <[email protected]>
faec added a commit that referenced this pull request Nov 6, 2024
Fix custom endpoint selection in the S3/SQS input (#39718) by porting @strawgate's 8.14 fix (#39709) to main.

In addition to the previous fixes, this simplifies the logic for detecting queue region, since the 8.14 version still had some broken cases caused by requiring over-strict endpoint matching, and it was concluded (talking to @strawgate) that there's no advantage to rejecting standard region format from queue URLs just because the endpoint URL is different (if there is a genuine mismatch in the queue and endpoint we'll learn it from the connection attempt, not from `getRegionFromQueueURL`).

(cherry picked from commit cf13781)

Co-authored-by: Fae Charlton <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Enable builds in the CI for aws cloud testing backport-8.x Automated backport to the 8.x branch with mergify backport-8.16 Automated backport with mergify bug Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS S3 Input custom endpoint handling broken in 8.15 Issues with SQS custom endpoint setup on Main
4 participants