Upgrade azure-eventhub to the new Event Hub SDK #39796

Merged: 41 commits merged on Aug 7, 2024

@zmoog zmoog commented Jun 4, 2024

Proposed commit message

Restructure the azure-eventhub input, rebranding the current version as processor v1. Add a brand new processor v2, allowing users to select which version to use in the config:

  • processor v1: uses the legacy Event Hub SDK (default processor, at least for 8.15)
  • processor v2: uses the modern Event Hub SDK

Why are we introducing a processor v2?

Notes for reviewers

Overview

To help with the review, here is an overview of the main flow of the processor v2-based input.

  • The processor v2 starts a new consumer for each event hub partition.
  • Each consumer creates a pipeline client.
  • When a consumer receives an event, it decodes it and sends it to the pipeline client.
  • When the pipeline successfully processes the event, it acknowledges with the consumer.
  • The consumer stores the sequence number of the last successful event in the partition blob in the storage account container.
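The flow above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the input's actual code: the types `event`, `checkpointStore`, and `pipelineClient` and the functions `runConsumer` and `runProcessor` are hypothetical names, and an in-memory map stands in for the partition blobs in the storage account container.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// event is a hypothetical stand-in for an event received from a partition.
type event struct {
	sequenceNumber int64
	body           string
}

// checkpointStore is a hypothetical in-memory stand-in for the partition
// blobs kept in the storage account container.
type checkpointStore struct {
	mu   sync.Mutex
	last map[string]int64 // partition ID -> last acknowledged sequence number
}

func (s *checkpointStore) update(partitionID string, seq int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.last[partitionID] = seq
}

// pipelineClient is a hypothetical stand-in for the publishing pipeline:
// it processes a decoded event and then invokes the ACK callback.
type pipelineClient struct {
	onACK func(seq int64)
}

func (p *pipelineClient) publish(seq int64, decoded string) {
	_ = decoded // a real pipeline would ship the event to the output here
	p.onACK(seq)
}

// runConsumer handles one partition: it decodes each event, publishes it,
// and lets the ACK callback record the sequence number.
func runConsumer(partitionID string, events []event, store *checkpointStore) {
	client := &pipelineClient{onACK: func(seq int64) { store.update(partitionID, seq) }}
	for _, ev := range events {
		client.publish(ev.sequenceNumber, strings.TrimSpace(ev.body))
	}
}

// runProcessor starts one consumer per partition and waits for all of them.
func runProcessor(partitions map[string][]event) map[string]int64 {
	store := &checkpointStore{last: make(map[string]int64)}
	var wg sync.WaitGroup
	for id, events := range partitions {
		wg.Add(1)
		go func(id string, events []event) {
			defer wg.Done()
			runConsumer(id, events, store)
		}(id, events)
	}
	wg.Wait()
	return store.last
}

func main() {
	last := runProcessor(map[string][]event{
		"0": {{1, " a "}, {2, "b"}},
		"1": {{7, "c"}},
	})
	fmt.Println(last["0"], last["1"])
}
```

In this sketch the checkpoint only moves when the ACK callback fires, which mirrors the ordering described in the list: publish first, store the sequence number afterwards.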

New features

  • Replace the legacy SDK with the new modern and supported SDK
  • Add support for publishing ACKs
  • Add a migration assistant to migrate checkpoint v1 information to the v2 format

Replace the legacy SDK with the new modern and supported SDK

The new SDK is more flexible and allows us to implement new features and configuration options.

Add support for publishing ACKs

Now, the processor v2 updates the sequence number only when the events have been successfully delivered to Elasticsearch.

Add a migration assistant to migrate checkpoint v1 information to the v2 format

On the first start of processor v2, the migration assistant (enabled by default) checks whether checkpoint information from processor v1 exists and, if so, migrates it to the v2 format.

See "Scenario 001: Migration" at x-pack/filebeat/input/azureeventhub/README.md for more details.
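As a rough illustration of what such a migration step involves, here is a minimal sketch that reads a v1 checkpoint and converts its string offset to a number. The JSON field names, the `checkpointV1` and `checkpointV2` layouts, and the `migrateCheckpoint` function are assumptions made for this example; the review thread in this PR only shows that the v1 offset is a string parsed with `strconv.ParseInt`. The README remains the reference for the real behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// checkpointV1 mirrors an ASSUMED layout of the v1 checkpoint blob.
// The JSON field names are illustrative, not taken from the input's code.
type checkpointV1 struct {
	PartitionID string `json:"partitionID"`
	Checkpoint  struct {
		Offset         string `json:"offset"` // stored as a string in v1
		SequenceNumber int64  `json:"sequenceNumber"`
	} `json:"checkpoint"`
}

// checkpointV2 is a hypothetical v2 representation with a numeric offset.
type checkpointV2 struct {
	PartitionID    string
	Offset         int64
	SequenceNumber int64
}

// migrateCheckpoint decodes a v1 blob and returns the equivalent v2 values,
// wrapping every failure so the caller can log the whole error chain.
func migrateCheckpoint(blob []byte) (checkpointV2, error) {
	var v1 checkpointV1
	if err := json.Unmarshal(blob, &v1); err != nil {
		return checkpointV2{}, fmt.Errorf("failed to decode checkpoint v1 information: %w", err)
	}
	offset, err := strconv.ParseInt(v1.Checkpoint.Offset, 10, 64)
	if err != nil {
		return checkpointV2{}, fmt.Errorf("failed to parse offset: %w", err)
	}
	return checkpointV2{
		PartitionID:    v1.PartitionID,
		Offset:         offset,
		SequenceNumber: v1.Checkpoint.SequenceNumber,
	}, nil
}

func main() {
	blob := []byte(`{"partitionID":"0","checkpoint":{"offset":"4096","sequenceNumber":12}}`)
	v2, err := migrateCheckpoint(blob)
	fmt.Println(v2, err)
}
```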

New configuration options

There are new configuration options for v2:

  • storage_account_connection_string (required) to authenticate with the storage account container.
  • migrate_checkpoint (optional, default: yes) controls whether processor v2 checks for and migrates checkpoint v1 information on start.
  • processor_version (optional, default: v1) selects which processor version to use.
  • processor_update_interval (optional, default: 10s) sets the time interval between checks for new partitions.
  • processor_start_position (optional, default: earliest) controls whether the processor starts from the earliest or the latest event in the event hub retention period.
  • partition_receive_timeout (optional, default: 5s)
  • partition_receive_count (optional, default: 100)
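To make the defaults concrete, the options above can be modeled as a small struct with a validation step. This is only a sketch: the `config` type, `defaultConfig`, and `validate` are hypothetical and are not the input's actual configuration code. In particular, it assumes that `storage_account_connection_string` is required only when `processor_version` is `v2`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// config is a hypothetical struct holding the options listed above; it is
// not the input's actual configuration type.
type config struct {
	StorageAccountConnectionString string
	MigrateCheckpoint              bool
	ProcessorVersion               string
	ProcessorUpdateInterval        time.Duration
	ProcessorStartPosition         string
	PartitionReceiveTimeout        time.Duration
	PartitionReceiveCount          int
}

// defaultConfig returns the defaults stated in the option list.
func defaultConfig() config {
	return config{
		MigrateCheckpoint:       true,
		ProcessorVersion:        "v1",
		ProcessorUpdateInterval: 10 * time.Second,
		ProcessorStartPosition:  "earliest",
		PartitionReceiveTimeout: 5 * time.Second,
		PartitionReceiveCount:   100,
	}
}

// validate ASSUMES the connection string is only required for processor v2.
func (c config) validate() error {
	if c.ProcessorVersion != "v1" && c.ProcessorVersion != "v2" {
		return fmt.Errorf("invalid processor_version %q: want v1 or v2", c.ProcessorVersion)
	}
	if c.ProcessorVersion == "v2" && c.StorageAccountConnectionString == "" {
		return errors.New("storage_account_connection_string is required for processor v2")
	}
	if c.ProcessorStartPosition != "earliest" && c.ProcessorStartPosition != "latest" {
		return fmt.Errorf("invalid processor_start_position %q: want earliest or latest", c.ProcessorStartPosition)
	}
	return nil
}

func main() {
	cfg := defaultConfig()
	cfg.ProcessorVersion = "v2"
	fmt.Println(cfg.validate()) // connection string still missing

	cfg.StorageAccountConnectionString = "<connection string>"
	fmt.Println(cfg.validate())
}
```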

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

See "Test Scenarios" section in the x-pack/filebeat/input/azureeventhub/README.md file.

@zmoog zmoog self-assigned this Jun 4, 2024
@botelastic botelastic bot added the needs_team label (indicates that the issue/PR needs a Team:* label) Jun 4, 2024
Contributor

mergify bot commented Jun 4, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @zmoog? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, add the backport labels for the needed branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch, where /d is the minor version digit

@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team label Jun 5, 2024
@pierrehilbert pierrehilbert added the Team:obs-ds-hosted-services, needs_team, Team:Cloud-Monitoring, and Team:Obs-InfraObs labels Jun 5, 2024
@elasticmachine
Collaborator

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@botelastic botelastic bot removed the needs_team label Jun 5, 2024
@botelastic

botelastic bot commented Jun 5, 2024

This pull request doesn't have a Team:<team> label.

@zmoog zmoog force-pushed the zmoog/azure-eventhub-sdk-upgrade branch from 88abdd1 to 9afb38a Compare June 28, 2024 06:06
@@ -93,58 +99,6 @@ func TestProcessEvents(t *testing.T) {
assert.Equal(t, message, single)
}

func TestParseMultipleRecords(t *testing.T) {
Contributor

@gizas gizas Jul 5, 2024

Why did you remove those? Is it because you think v1 is not going to be used anymore?
And do you have the decoder_test for this?

Contributor Author

I moved the message parsing to the decoder to share it with the v1 and v2 processors. The decoder has its own tests based on the original v1 processor tests.
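For readers unfamiliar with the refactoring described in this reply, a shared decoder could look roughly like the sketch below. It rests on assumptions that do not come from this PR: that a message may wrap several events in a top-level `records` array, and that anything else is passed through as a single event. The `decode` function name is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decode splits one message into individual records.
// ASSUMPTION: multi-record messages look like {"records":[...]}; any other
// payload (including invalid JSON) is returned unchanged as a single record.
func decode(message []byte) [][]byte {
	var envelope struct {
		Records []json.RawMessage `json:"records"`
	}
	if err := json.Unmarshal(message, &envelope); err != nil || len(envelope.Records) == 0 {
		return [][]byte{message}
	}
	out := make([][]byte, 0, len(envelope.Records))
	for _, record := range envelope.Records {
		out = append(out, []byte(record))
	}
	return out
}

func main() {
	for _, record := range decode([]byte(`{"records":[{"a":1},{"b":2}]}`)) {
		fmt.Println(string(record))
	}
}
```

Because a function like this depends only on the message bytes, both processor versions can call it and a single test suite can cover it.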

Contributor

mergify bot commented Jul 5, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b zmoog/azure-eventhub-sdk-upgrade upstream/zmoog/azure-eventhub-sdk-upgrade
git merge upstream/main
git push upstream zmoog/azure-eventhub-sdk-upgrade

consumerGroup string) error {

// v2 checkpoint information path
// mbranca-general.servicebus.windows.net/sdh4552/$Default/checkpoint/0
Contributor

Delete this line?

Contributor Author

Uhm, I need to rephrase this comment to make it more useful.


size, err := cln.DownloadBuffer(ctx, buff[:], nil)
if err != nil {
return fmt.Errorf("failed to download checkpoint v1 information for partition %s: %w", partitionID, err)
Contributor

@gizas gizas Jul 5, 2024

Will this error be printed? I think you have to introduce a log.Error here

Contributor Author

If the migrationAssistant fails to migrate a partition, the run() function should print all the wrapped errors.

I will add a test to double-check it happens.
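The point about wrapped errors can be illustrated with a small standalone sketch. The functions `downloadCheckpoint` and `migrate` and the sentinel `errBlobUnavailable` are hypothetical and only mimic the error message quoted in the diff; the sketch shows that wrapping with `%w` keeps the inner message in the final error text, so a single log call at the top level prints the full chain.

```go
package main

import (
	"errors"
	"fmt"
)

// errBlobUnavailable is a hypothetical low-level failure.
var errBlobUnavailable = errors.New("blob unavailable")

// downloadCheckpoint always fails in this sketch, wrapping the cause with %w.
func downloadCheckpoint(partitionID string) error {
	return fmt.Errorf("failed to download checkpoint v1 information for partition %s: %w", partitionID, errBlobUnavailable)
}

// migrate stops at the first partition that fails and wraps the error again.
func migrate(partitionIDs []string) error {
	for _, id := range partitionIDs {
		if err := downloadCheckpoint(id); err != nil {
			return fmt.Errorf("failed to migrate checkpoint: %w", err)
		}
	}
	return nil
}

func main() {
	err := migrate([]string{"0"})
	fmt.Println(err)                                // full chain in one message
	fmt.Println(errors.Is(err, errBlobUnavailable)) // the cause is still reachable
}
```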


offset, err := strconv.ParseInt(checkpointV1.Checkpoint.Offset, 10, 64)
if err != nil {
return fmt.Errorf("failed to parse offset: %w", err)
Contributor

Same as above

Contributor Author

I'll test this as well.

zmoog and others added 16 commits August 2, 2024 12:40
Alongside the partition ID, users can optionally send events with a
partition key.

Add an (optional) partition key to the event hub metadata.
The new migrate_checkpoint config option controls if the input v2
should perform a migration check on start.

If migrate_checkpoint is true, the input checks and performs the
migration (if v1 info exists) on the very first v2 run.

If migrate_checkpoint is false, the input will skip the migration
assistant and will not perform any checks or migration.
Expand processor options by adding a new `start_position`
configuration.

Possible values for `start_position` are:

- "earliest" to start from the beginning of the event hub retention
  period.
- "latest" to start from new events.

The input uses the 'start_position' option when checkpoint information
from the storage account container is unavailable (on the input's
first start).
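
The precedence described in this commit message (checkpoint first, `start_position` only as a fallback) can be sketched as below. The `checkpoint` type and `resolveStart` function are hypothetical, and the returned strings are just labels for this example rather than SDK values.

```go
package main

import "fmt"

// checkpoint is a hypothetical stand-in for stored checkpoint information.
type checkpoint struct {
	sequenceNumber int64
}

// resolveStart prefers an existing checkpoint; start_position is consulted
// only when no checkpoint is available (for example, on the first start).
func resolveStart(cp *checkpoint, startPosition string) (string, error) {
	if cp != nil {
		return fmt.Sprintf("after sequence number %d", cp.sequenceNumber), nil
	}
	switch startPosition {
	case "earliest", "latest":
		return startPosition, nil
	default:
		return "", fmt.Errorf("invalid start_position %q", startPosition)
	}
}

func main() {
	fmt.Println(resolveStart(nil, "latest"))
	fmt.Println(resolveStart(&checkpoint{sequenceNumber: 42}, "latest"))
}
```
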
Makes the receive configuration settings available for customization
on the input settings.

The current default values (receive_timeout: 5s, receive_count: 100)
are probably fine, but it is better to make these options available
to users.
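
One way to picture how a receive timeout and a receive count interact is the sketch below: a batch is returned as soon as `count` events have arrived or the timeout expires, whichever comes first. This is an illustration built on a plain Go channel, not the SDK call the input uses, and `receiveBatch` is a hypothetical name.

```go
package main

import (
	"fmt"
	"time"
)

// receiveBatch collects up to count events, returning early when the
// timeout expires or the channel is closed.
func receiveBatch(events <-chan string, count int, timeout time.Duration) []string {
	deadline := time.After(timeout)
	batch := make([]string, 0, count)
	for len(batch) < count {
		select {
		case ev, ok := <-events:
			if !ok {
				return batch // channel closed: return what we have
			}
			batch = append(batch, ev)
		case <-deadline:
			return batch // timeout: return a partial (possibly empty) batch
		}
	}
	return batch
}

func main() {
	events := make(chan string, 3)
	events <- "a"
	events <- "b"
	events <- "c"

	// Full batch: two events are already waiting.
	fmt.Println(len(receiveBatch(events, 2, 50*time.Millisecond)))
	// Partial batch: only one event is left, so the timeout ends the wait.
	fmt.Println(len(receiveBatch(events, 5, 50*time.Millisecond)))
}
```
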
Co-authored-by: Andrew Gizas <[email protected]>
Co-authored-by: subham sarkar <[email protected]>
Also update the option description, adding the default value.
It's better to check that the private data in the event has the expected
type.
@zmoog zmoog force-pushed the zmoog/azure-eventhub-sdk-upgrade branch from 52b3fb8 to feb52b7 Compare August 2, 2024 10:41
Adding more details to the message logged on successful store:

- sequence_number
- offset
- enqueued_time
The teardown() function is responsible for releasing all the resources
allocated in the setup() function.
@zmoog zmoog requested a review from shmsr August 2, 2024 11:13
Member

@shmsr shmsr left a comment

Echoing Fae's comment:

Great work and I'm happy to see this quality of code going in :-)

I agree. The code is well-structured and idiomatically written, accompanied by clear and informative comments. LGTM — approving!

@zmoog zmoog merged commit b95a8a0 into elastic:main Aug 7, 2024
122 checks passed
@zmoog zmoog deleted the zmoog/azure-eventhub-sdk-upgrade branch August 7, 2024 23:47
@zmoog zmoog added the backport-8.15 label Aug 8, 2024
mergify bot pushed a commit that referenced this pull request Aug 8, 2024
Restructure the `azure-eventhub` input, rebranding the current version as processor v1. Add a brand new processor v2, allowing users to select which version to use in the config:

- processor v1: uses the [legacy](https://github.com/azure/azure-event-hubs-go) Event Hub SDK (default processor, at least for 8.15)
- processor v2: uses the [modern](https://github.com/azure/azure-sdk-for-go/blob/main/sdk/messaging/azeventhubs/) Event Hub SDK

Why are we introducing a processor v2?

- processor v1 uses deprecated libraries
  - [github.com/Azure/azure-event-hubs-go](http://github.com/Azure/azure-event-hubs-go) (legacy)
  - [github.com/Azure/azure-storage-blob-go](http://github.com/Azure/azure-storage-blob-go) (legacy, [retiring](https://azure.microsoft.com/en-gb/updates/retirement-notice-the-legacy-azure-storage-go-client-libraries-will-be-retired-on-13-september-2024/) on Sep 2024)
- processor v1 does not support publishing acks (mostly due to lack of hooks; the legacy SDK is a black box)

---------

Co-authored-by: Tiago Queiroz <[email protected]>
Co-authored-by: Andrew Gizas <[email protected]>
Co-authored-by: subham sarkar <[email protected]>
(cherry picked from commit b95a8a0)

# Conflicts:
#	go.mod
#	go.sum
zmoog pushed a commit that referenced this pull request Aug 12, 2024
…DK (#40455)

* Upgrade azure-eventhub to the new Event Hub SDK (#39796)

Labels
  • backport-8.15: Automated backport to the 8.15 branch with mergify
  • input:azure-eventhub
  • Team:Cloud-Monitoring: Label for the Cloud Monitoring team
  • Team:Elastic-Agent-Data-Plane: Label for the Agent Data Plane team
  • Team:obs-ds-hosted-services: Label for the Observability Hosted Services team
  • Team:Obs-InfraObs: Label for the Observability Infrastructure Monitoring team
Development

Successfully merging this pull request may close these issues.

[Azure] Migrate the azure-eventhub input to the new Azure Event Hubs Client Module for Go
9 participants