Add documentation for Kinesis source in Data prepper #8252
Conversation
Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged. Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer. When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.
Signed-off-by: Souvik Bose <[email protected]>
Force-pushed from 52890fe to 36b39e3.
@dlvenable This PR is tagged as a first-time contributor. Will you or your team provide a technical review? Once that review is done, I'll do a doc review. Thank you.
Signed-off-by: Souvik Bose <[email protected]>
@sb2k16 Is this PR ready for a doc review? I can give it a first review and then give it a final review before the Data Prepper 2.10 release.
Option | Required | Type | Description
:--- | :--- | :--- | :---
`max_polling_records` | No | Integer | The number of records to fetch from Kinesis during a single call to get Kinesis stream records.
`idle_time_between_reads` | No | Integer | The time duration to sleep in between calls to get Kinesis stream records.
This is a `Duration`, not an `Integer`.
Thanks @dlvenable. I have made the changes.
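As a rough illustration of how the polling options above might appear in a pipeline definition (a hypothetical YAML sketch; the pipeline and stream names are placeholders, and `idle_time_between_reads` is shown as a duration per the review comment above):

```yaml
# Hypothetical pipeline fragment -- names and values are illustrative only.
kinesis-pipeline:
  source:
    kinesis:
      streams:
        - stream_name: my-example-stream
      consumer_strategy: polling
      polling:
        max_polling_records: 1000        # records fetched per call to get stream records
        idle_time_between_reads: 250ms   # sleep between calls (a duration)
```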
# kinesis
You can use `kinesis` source in Data Prepper to ingest records from one or more [Amazon Kinesis](https://aws.amazon.com/kinesis/) Data Streams.
Should we link here instead: https://aws.amazon.com/kinesis/data-streams/? And then update the anchor to surround all of "Amazon Kinesis Data Streams"?
Thanks @dlvenable. I have made the changes.
`buffer_timeout` | No | Duration | The amount of time allowed for writing events to the Data Prepper buffer before a timeout occurs. Any events that the source cannot write to the buffer during the specified amount of time are discarded. Default is `1s`.
`records_to_accumulate` | No | Integer | The number of messages that accumulate before being written to the buffer. Default is `100`.
`consumer_strategy` | No | String | The consumer strategy to use for ingesting Kinesis data streams. Default is `fan-out`. However, `polling` can also be used. If `polling` is enabled, additional configuration for `polling` must be added.
`polling` | No | String | If `consumer_strategy` is set to `polling`, this configuration must be added. Refer to [polling](#polling).
The type here is `polling`. Please update the type column and have it link to the `polling` section below.
Thanks @dlvenable. I have made the changes.
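A minimal sketch of how the source options above might fit together, assuming placeholder stream and value choices (not taken from the PR itself):

```yaml
# Hypothetical example -- the stream name and values are placeholders.
source:
  kinesis:
    streams:
      - stream_name: my-example-stream
    buffer_timeout: 1s           # default
    records_to_accumulate: 100   # default
    consumer_strategy: polling   # default is fan-out
    polling:
      max_polling_records: 1000
      idle_time_between_reads: 250ms
```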
I have edited the PR to target
Signed-off-by: Souvik Bose <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
@dlvenable Once you approve, I'll move this PR forward for publishing.
Signed-off-by: Souvik Bose <[email protected]>
Thank you @sb2k16 !
@vagimeli, Can you push this PR forward? We'd like to get this documentation out since Data Prepper 2.10 is released.
The `codec` determines how the `kinesis` source parses each Amazon Kinesis Record. For increased and more efficient performance, you can use [codec combinations]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/codec-processor-combinations/) with certain processors.
### `newline` codec
Can we add JSON codec here?
Doc review complete. Edits made for clarity and style. No technical changes made. Moving to editorial.
Signed-off-by: Melissa Vagi <[email protected]>
# kinesis
You can use `kinesis` source in Data Prepper to ingest records from one or more [Amazon Kinesis Data Streams](https://aws.amazon.com/kinesis/data-streams/).
Suggested change:
```diff
- You can use `kinesis` source in Data Prepper to ingest records from one or more [Amazon Kinesis Data Streams](https://aws.amazon.com/kinesis/data-streams/).
+ You can use the Data Prepper `kinesis` source to ingest records from one or more [Amazon Kinesis Data Streams](https://aws.amazon.com/kinesis/data-streams/).
```
Option | Required | Type | Description
:--- | :--- | :--- | :---
`stream_name` | Yes | String | Defines the name of each Kinesis stream.
`initial_position` | No | String | Sets the `initial_position` to determine where the `kinesis` source starts reading stream records. Use `LATEST` to start from the most recent record or `EARLIEST` to start from the beginning of the stream. Default is `LATEST`.
Suggested change:
```diff
- `initial_position` | No | String | Sets the `initial_position` to determine where the `kinesis` source starts reading stream records. Use `LATEST` to start from the most recent record or `EARLIEST` to start from the beginning of the stream. Default is `LATEST`.
+ `initial_position` | No | String | Sets the `initial_position` to determine at what point the `kinesis` source starts reading stream records. Use `LATEST` to start from the most recent record or `EARLIEST` to start from the beginning of the stream. Default is `LATEST`.
```
`checkpoint_interval` | No | Duration | Configure the `checkpoint_interval` to periodically checkpoint Kinesis streams and avoid duplication of record processing. Default is `PT2M`.
`compression` | No | String | Specifies the compression format. To decompress records added by the [CloudWatch subscription filter](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html) to Kinesis, use the `gzip` compression format.
Suggested change:
```diff
- `compression` | No | String | Specifies the compression format. To decompress records added by the [CloudWatch subscription filter](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html) to Kinesis, use the `gzip` compression format.
+ `compression` | No | String | Specifies the compression format. To decompress records added by a [CloudWatch Logs subscription filter](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html) to Kinesis, use the `gzip` compression format.
```
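A hedged sketch of a `streams` entry using the options above (the stream name is a placeholder; `gzip` matches the CloudWatch subscription filter use case described in the table):

```yaml
# Hypothetical streams entry -- the stream name is illustrative only.
streams:
  - stream_name: cloudwatch-logs-stream
    initial_position: EARLIEST   # default is LATEST
    checkpoint_interval: PT2M    # default
    compression: gzip            # decompress CloudWatch subscription filter records
```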
## codec
The `codec` determines how the `kinesis` source parses each Amazon Kinesis Record. For increased and more efficient performance, you can use [codec combinations]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/codec-processor-combinations/) with certain processors.
Suggested change:
```diff
- The `codec` determines how the `kinesis` source parses each Amazon Kinesis Record. For increased and more efficient performance, you can use [codec combinations]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/codec-processor-combinations/) with certain processors.
+ The `codec` determines how the `kinesis` source parses each Kinesis stream record. For increased and more efficient performance, you can use [codec combinations]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/codec-processor-combinations/) with certain processors.
```
### `newline` codec
The newline codec parses each Kinesis stream record as a single log event, making it ideal for processing single-line records. It also works well with the [`parse_json` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse-json/) to parse each line.
Suggested change:
```diff
- The newline codec parses each Kinesis stream record as a single log event, making it ideal for processing single-line records. It also works well with the [`parse_json` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse-json/) to parse each line.
+ The `newline` codec parses each Kinesis stream record as a single log event, making it ideal for processing single-line records. It also works well with the [`parse_json` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse-json/) to parse each line.
```
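A minimal sketch pairing the `newline` codec with the `parse_json` processor, assuming placeholder pipeline and stream names:

```yaml
# Hypothetical pipeline -- names are placeholders.
log-pipeline:
  source:
    kinesis:
      streams:
        - stream_name: json-lines-stream
      codec:
        newline:
  processor:
    - parse_json:
```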
* `recordsProcessed`: Counts the number of stream records processed from Kinesis streams.
* `recordProcessingErrors`: Counts the number of processing errors for stream records from Kinesis streams.
* `acknowledgementSetSuccesses`: Tracks the total number stream records processed that were successfully added to sink.
Suggested change:
```diff
- * `acknowledgementSetSuccesses`: Tracks the total number stream records processed that were successfully added to sink.
+ * `acknowledgementSetSuccesses`: Counts the number of processed stream records that were successfully added to the sink.
```
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: Souvik Bose <[email protected]>
Signed-off-by: Souvik Bose <[email protected]>
Force-pushed from 077e12f to 4cb7caf.
The backport to `2.17` failed. To backport manually, run these commands in your terminal:

```sh
# Fetch latest updates from GitHub
git fetch

# Create a new working tree
git worktree add ../.worktrees/backport-2.17 2.17

# Navigate to the new working tree
pushd ../.worktrees/backport-2.17

# Create a new branch
git switch --create backport/backport-8252-to-2.17

# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ae8e7632d4e09a4d553e51ca745d26fa4f3102e8

# Push it to GitHub
git push --set-upstream origin backport/backport-8252-to-2.17

# Go back to the original working tree
popd

# Delete the working tree
git worktree remove ../.worktrees/backport-2.17
```

Then, create a pull request where the base branch is `2.17`.
Description
This PR adds documentation for a new `kinesis` source in Data Prepper. It includes the following:

* `kinesis` source

Issues Resolved
N/A
Checklist
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.