
Add documentation for Kinesis source in Data Prepper #8252

Merged
17 commits merged into opensearch-project:main on Oct 23, 2024

Conversation

sb2k16
Member

@sb2k16 sb2k16 commented Sep 13, 2024

Description

This PR adds documentation for the new `kinesis` source in Data Prepper. It includes the following:

  • Example configuration
  • Required configuration attributes along with their descriptions
  • Required permissions to run the `kinesis` source
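
A rough sketch of the kind of pipeline configuration this documentation covers; the pipeline name, stream name, AWS settings, and sink below are illustrative placeholders, not values taken from this PR:

```yaml
# Hypothetical Data Prepper pipeline that reads from a Kinesis data stream.
# All names, regions, and ARNs here are placeholders.
kinesis-pipeline:
  source:
    kinesis:
      streams:
        - stream_name: "example-stream"
      codec:
        newline:
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/example-pipeline-role"
  sink:
    - stdout:
```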

Issues Resolved

N/A

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.


Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

@sb2k16 sb2k16 force-pushed the kinesis-documentation-update branch from 52890fe to 36b39e3 Compare September 13, 2024 16:11
@vagimeli vagimeli added 3 - Tech review PR: Tech review in progress data-prepper labels Sep 13, 2024
@vagimeli
Contributor

@dlvenable This PR is tagged as a first-time contributor. Will you or your team provide a technical review? Once that review is done, I'll do a doc review. Thank you.

@dlvenable
Member

@sb2k16 , @vagimeli , This is a new feature for Data Prepper 2.10, which has not yet been released. We should target this to a new branch. I just created one named data-prepper-2.10.

@sb2k16 , Can you edit your PR to target the data-prepper-2.10 branch?

@vagimeli
Contributor

@sb2k16 Is this PR ready for a doc review? I can give it a first review and then give it a final review before the Data Prepper 2.10 release.

Option | Required | Type | Description
:--- | :--- |:--------| :---
`max_polling_records` | No | Integer | The number of records to fetch from Kinesis during a single call to get Kinesis stream records.
`idle_time_between_reads` | No | Integer | The time duration to sleep in between calls to get Kinesis stream records.
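
A hedged sketch of how these options might be nested in the source configuration (the values and nesting shown are illustrative only):

```yaml
# Illustrative fragment: kinesis source using the polling consumer strategy.
source:
  kinesis:
    consumer_strategy: "polling"
    polling:
      max_polling_records: 100          # records fetched per call
      idle_time_between_reads: "250ms"  # pause between calls
```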
Member

This is a Duration, not an Integer.

Member Author

Thanks @dlvenable. I have made the changes.


# kinesis

You can use `kinesis` source in Data Prepper to ingest records from one or more [Amazon Kinesis](https://aws.amazon.com/kinesis/) Data Streams.
Member

Should we link here?

https://aws.amazon.com/kinesis/data-streams/

And then update the anchor to surround all of "Amazon Kinesis Data Streams?"

Member Author

Thanks @dlvenable. I have made the changes.

`buffer_timeout` | No | Duration | The amount of time allowed for writing events to the Data Prepper buffer before timeout occurs. Any events that the source cannot write to the buffer during the specified amount of time are discarded. Default is `1s`.
`records_to_accumulate` | No | Integer | The number of messages that accumulate before being written to the buffer. Default is `100`.
`consumer_strategy` | No | String | The consumer strategy to use for ingesting Kinesis data streams. Default is `fan-out`. However, `polling` can also be used. If `polling` is enabled, additional configuration for `polling` needs to be added.
`polling` | No | String | If `consumer_strategy` is set to `polling`, this configuration needs to be added. Refer to [polling](#polling).
Member

The type here is polling. Please update the type column and have it link to the polling section below.

Member Author

Thanks @dlvenable. I have made the changes.

@sb2k16 sb2k16 changed the base branch from main to data-prepper-2.10 September 30, 2024 23:08
@sb2k16
Member Author

sb2k16 commented Sep 30, 2024

> @sb2k16 , @vagimeli , This is a new feature for Data Prepper 2.10, which has not yet been released. We should target this to a new branch. I just created one named data-prepper-2.10.
>
> @sb2k16 , Can you edit your PR to target the data-prepper-2.10 branch?

I have edited the PR to target the data-prepper-2.10 branch.

Signed-off-by: Souvik Bose <[email protected]>
@vagimeli vagimeli added 4 - Doc review PR: Doc review in progress and removed 3 - Tech review PR: Tech review in progress labels Oct 8, 2024
Two outdated review comments on CONTRIBUTING.md were marked resolved.
Signed-off-by: Melissa Vagi <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
An outdated review comment on _analyzers/index.md was marked resolved.
Signed-off-by: Melissa Vagi <[email protected]>
@vagimeli
Contributor

vagimeli commented Oct 9, 2024

> @sb2k16 , @vagimeli , This is a new feature for Data Prepper 2.10, which has not yet been released. We should target this to a new branch. I just created one named data-prepper-2.10.
>
> @sb2k16 , Can you edit your PR to target the data-prepper-2.10 branch?
>
> I have edited the PR to target the data-prepper-2.10 branch.

@dlvenable Once you approve, I'll move this PR forward for publishing.

@sb2k16 sb2k16 changed the base branch from data-prepper-2.10 to main October 21, 2024 16:26
Member

@dlvenable dlvenable left a comment

Thank you @sb2k16 !

@dlvenable
Member

@vagimeli , Can you push this PR forward? We'd like to get this documentation out now that Data Prepper 2.10 is released.


The `codec` determines how the `kinesis` source parses each Amazon Kinesis Record. For increased and more efficient performance, you can use [codec combinations]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/codec-processor-combinations/) with certain processors.

### `newline` codec


Can we add JSON codec here?

Doc review complete. Edits made for clarity and style. No technical changes made. Moving to editorial. 

Signed-off-by: Melissa Vagi <[email protected]>
@vagimeli vagimeli added 5 - Editorial review PR: Editorial review in progress and removed 4 - Doc review PR: Doc review in progress labels Oct 21, 2024
Collaborator

@natebower natebower left a comment

@sb2k16 @vagimeli Please see my changes and let me know if you have any questions. Thanks!


# kinesis

You can use `kinesis` source in Data Prepper to ingest records from one or more [Amazon Kinesis Data Streams](https://aws.amazon.com/kinesis/data-streams/).
Collaborator

Suggested change
You can use `kinesis` source in Data Prepper to ingest records from one or more [Amazon Kinesis Data Streams](https://aws.amazon.com/kinesis/data-streams/).
You can use the Data Prepper `kinesis` source to ingest records from one or more [Amazon Kinesis Data Streams](https://aws.amazon.com/kinesis/data-streams/).

Option | Required | Type | Description
:--- |:---------| :--- | :---
`stream_name` | Yes | String | Defines the name of each Kinesis stream.
`initial_position` | No | String | Sets the `initial_position` to determine where the `kinesis` source starts reading stream records. Use `LATEST` to start from the most recent record or `EARLIEST` to start from the beginning of the stream. Default is `LATEST`.
Collaborator

Suggested change
`initial_position` | No | String | Sets the `initial_position` to determine where the `kinesis` source starts reading stream records. Use `LATEST` to start from the most recent record or `EARLIEST` to start from the beginning of the stream. Default is `LATEST`.
`initial_position` | No | String | Sets the `initial_position` to determine at what point the `kinesis` source starts reading stream records. Use `LATEST` to start from the most recent record or `EARLIEST` to start from the beginning of the stream. Default is `LATEST`.

`stream_name` | Yes | String | Defines the name of each Kinesis stream.
`initial_position` | No | String | Sets the `initial_position` to determine where the `kinesis` source starts reading stream records. Use `LATEST` to start from the most recent record or `EARLIEST` to start from the beginning of the stream. Default is `LATEST`.
`checkpoint_interval` | No | Duration | Configure the `checkpoint_interval` to periodically checkpoint Kinesis streams and avoid duplication of record processing. Default is `PT2M`.
`compression` | No | String | Specifies the compression format. To decompress records added by the [CloudWatch subscription filter](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html) to Kinesis, use the `gzip` compression format.
Collaborator

Suggested change
`compression` | No | String | Specifies the compression format. To decompress records added by the [CloudWatch subscription filter](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html) to Kinesis, use the `gzip` compression format.
`compression` | No | String | Specifies the compression format. To decompress records added by a [CloudWatch Logs subscription filter](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html) to Kinesis, use the `gzip` compression format.


## codec

The `codec` determines how the `kinesis` source parses each Amazon Kinesis Record. For increased and more efficient performance, you can use [codec combinations]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/codec-processor-combinations/) with certain processors.
Collaborator

Suggested change
The `codec` determines how the `kinesis` source parses each Amazon Kinesis Record. For increased and more efficient performance, you can use [codec combinations]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/codec-processor-combinations/) with certain processors.
The `codec` determines how the `kinesis` source parses each Kinesis stream record. For increased and more efficient performance, you can use [codec combinations]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/codec-processor-combinations/) with certain processors.


### `newline` codec

The newline codec parses each Kinesis stream record as a single log event, making it ideal for processing single-line records. It also works well with the [`parse_json` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse-json/) to parse each line.
Collaborator

Suggested change
The newline codec parses each Kinesis stream record as a single log event, making it ideal for processing single-line records. It also works well with the [`parse_json` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse-json/) to parse each line.
The `newline` codec parses each Kinesis stream record as a single log event, making it ideal for processing single-line records. It also works well with the [`parse_json` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse-json/) to parse each line.
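
The pairing described above, the `newline` codec feeding the `parse_json` processor, can be sketched roughly as follows; the pipeline and stream names are placeholders, not values from the documentation under review:

```yaml
# Illustrative fragment: newline codec paired with the parse_json processor.
kinesis-json-pipeline:
  source:
    kinesis:
      streams:
        - stream_name: "example-stream"
      codec:
        newline:
  processor:
    - parse_json:
  sink:
    - stdout:
```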

Three outdated review comments on _data-prepper/pipelines/configuration/sources/kinesis.md were marked resolved.

* `recordsProcessed`: Counts the number of stream records processed from Kinesis streams.
* `recordProcessingErrors`: Counts the number of processing errors for stream records from Kinesis streams.
* `acknowledgementSetSuccesses`: Tracks the total number stream records processed that were successfully added to sink.
Collaborator

Suggested change
* `acknowledgementSetSuccesses`: Tracks the total number stream records processed that were successfully added to sink.
* `acknowledgementSetSuccesses`: Counts the number of processed stream records that were successfully added to the sink.

An outdated review comment on _data-prepper/pipelines/configuration/sources/kinesis.md was marked resolved.
Signed-off-by: Souvik Bose <[email protected]>
@sb2k16 sb2k16 force-pushed the kinesis-documentation-update branch from 077e12f to 4cb7caf Compare October 22, 2024 22:49
@vagimeli vagimeli merged commit ae8e763 into opensearch-project:main Oct 23, 2024
5 checks passed
@kolchfa-aws kolchfa-aws added the backport 2.17 Backport for version 2.17 label Oct 31, 2024
@opensearch-trigger-bot
Contributor

The backport to 2.17 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

```sh
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.17 2.17
# Navigate to the new working tree
pushd ../.worktrees/backport-2.17
# Create a new branch
git switch --create backport/backport-8252-to-2.17
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ae8e7632d4e09a4d553e51ca745d26fa4f3102e8
# Push it to GitHub
git push --set-upstream origin backport/backport-8252-to-2.17
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.17
```

Then, create a pull request where the base branch is 2.17 and the compare/head branch is backport/backport-8252-to-2.17.

@kolchfa-aws kolchfa-aws mentioned this pull request Oct 31, 2024
Labels: 5 - Editorial review, backport 2.17, data-prepper
7 participants