-
-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ref: Move rust-arroyo from snuba to arroyo #323
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Experimental project that will allow us to develop and test Rust consumers alongside the existing Python ones. This is entirely based off @dbanda's branch `dbanda/rust-consumer` except with all of the code except `rust_arroyo` removed. The `rust_arroyo` crate is an exact copy https://github.com/getsentry/rust_arroyo, which will allow us to use it here without publishing a separate library yet.
Arroyo is a bit finnicky about the types of configuration you are supposed to use with it, let's use the `new_consumer_config` way
`Position` struct is no longer needed. It was a hack used for the legacy Snuba consumer, and has been removed in Python Arroyo: #165. We are removing it here to bring the two libraries in line.
…3885) Same as we did in Python Arroyo #134. The processing strategy interface needs to support messages that represent a batch or a filter message - i.e. not necessarily a 1:1 relationship between messages from the Kafka broker and a message passed to the processing strategy. This means a message can be associated with a number of partitions and offsets, not just a single one.
Follw up to getsentry/snuba#3883. commit_positions and stage_positions should be called commit_offsets/ stage_offset like in Python's Arroyo library.
* merge in produce example * rename hash password to reverse string. more accurately describes strategy * add using futures in the queue * handle future * use tokio runtime
Tests are currently broken - this fixes them and runs in CI so we don't break them again
I think this test was flaky because it used the same topic as the test that preceded it. Seems to do a lot better with a different name.
Currently the consumer does not do any batching, add a reduce strategy to help with this.
Now it passes batches to the fake ClickHouse writer step instead of one message at a time. Depends on getsentry/snuba#4061
* unlock mutex to avoid deadlock * use take to make linter happy
Now ClickHouse is available in Rust CI so we can test writing to ClickHouse functionality. We are also running Rust CI runs on all branches now. This is needed since Rust consumer calls Python so a change in Python can potentially break the Rust consumer.
feat(rust-consumer): Add basic clickhouse strategy Same idea as getsentry/snuba#4437 with a few differences: - Does not switch the arroyo interface to async - Does not write to ClickHouse by default (only if --no-skip-write is passed) - Batches inserted rows into fewer writes with multiple rows
There's no point having in the clickhouse_client living in the arroyo package since it's only by the ClickHouse strategy in Snuba which is not part of Arroyo.
Break the task scheduling code out of the Clickhouse writer strategy so it can be used for parallel execution of any functions, such as custom message processors. The ClickHouse strategy now uses RunTaskInThreads under the hood.
Also refactor rust CI setup to work with Snuba installed
Fixes failing CI since Arroyo dependencies are not currently pinned.
* feat(rust): Multiprocessing with Python message processors Spawn multiple subprocesses in Rust, and run the Python message processor in each subprocess By default there is only one process. * add tests and fix a ton of bugs * fix typing * fix rust linting * just use a real processor lol
…4655)" This reverts commit bb1e799dcb8a6b03ece9ef49051737105bf5c8ba. Co-authored-by: lynnagara <[email protected]>
…tempt (#4673) * Revert "Revert "feat(rust): Multiprocessing with Python message processors (#4655)"" This reverts commit 2a45cdd4a55d84c5b9bfbf684b85aa214cdb37f9. * run without process overhead if only one process * do not commit broken code to master, impossible challenge * attempt to fix keyerror in CI * fix tests on macos * fix python typing and rustfmt
Add statsd and testing metric backends to rust_snuba. Remove the metric implementation and dependency on cadence from Arroyo - users of Arroyo should provide the trait implementation. Pass the dogstatsd host and port from Python to Rust via config. Also start logging one metric ("arroyo.consumer.join.time") to verify it works.
* wip * fix up tests and remove useless seek * clippy and fmt
* ref(rust): Remove explicit buffering of gauges statsdproxy should be able to buffer gauges efficiently * fmt
…5294)" This reverts commit 0409b3f45fd65f89885d9be6bf048b16416fb70d. Co-authored-by: untitaker <[email protected]>
* add testcase * fix assertion
Especially in benchmarks, creating a fresh runtime for every run leads to thousands of threads being spawned which makes local profiling difficult
That function is interning the topic name, and thus aquiring a Mutex. If we keep a list of subscribed topics locally, we avoid that and can also guard against errors.
…og implementation (#5311) * ref(devserver): It's all Rust now Move all consumers to use Rust consumers, be it either using the Rust message processors or using the hybrid mode to execute the existing processing functions written in Python. * fixup * remove debug logging too * fix benchmark * also run full sentry testsuite when devserver changes * add no-skip-write * fix commitlog implementation * formatting and fix benchmarks * fmt * fix arroyo test
* fix: Remove any panics in threads This causes a lot of double-erroring, and at least within RunTaskInThreads we can carry forward errors normally. We still have to remove panics in the main thread, but this should already cut down a few duplicate errors we are seeing today. * fix tests
* Add Metrics impl based on `merni` I pretty much copy-pasted the core `merni` code, which is not that much. Also built shims to statsdproxy, and updated all usages of the previous `Metrics` trait and `get_metrics` to use the global macros. * Update rust_snuba/rust_arroyo/src/metrics/globals.rs Co-authored-by: Sebastian Zivota <[email protected]> --------- Co-authored-by: Sebastian Zivota <[email protected]>
…ion tag (#5346) * ref(metrics): Refactor how global tags work, and introduce min_partition tag Port the min_partition tag from snuba's python consumer. In order to do that, we need a way to mutate global tags. So we're now using a custom global tags implementation based on statsdproxy instead of cadence's global tags support. This might be more efficient than cadence's global tags because adding them to a metric happens after aggregation, but i haven't measured it. Additionally, I found that our strategy factory's "create" function doesn't receive partitions. I don't want to add that argument though because doing so has been really painful for optimizing rebalancing performance in the past. Instead I propose to split up the API into two parts: a callback to recreate strategies, and a callback to update the currently assigned partitions. depends on getsentry/statsdproxy#42 * bump statsdproxy * fix formatting and benchmarks * bump statsdproxy to crates.io version * add basic test * better test, copypasted from statsdproxy * make private
* add rust concurrency metric * share metrics prefix * move metric strategy_name to self
untitaker
changed the title
ref/add rust arroyo
ref: Move rust-arroyo from snuba to arroyo
Jan 17, 2024
I would agree with not squashing this. |
loewenheim
approved these changes
Jan 17, 2024
lynnagara
approved these changes
Jan 17, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Move
./rust_snuba/rust_arroyo/
from snuba into arroyo, preserving history.Steps used to preserve history:
maybe we should not squash-merge this one? unsure.
this PR doesn't contain any other code changes. followup PRs include: