fix: Default input block size should be larger
We assumed that we could pick an arbitrarily low value here, since the
block is resized automatically. However, when a single message is larger
than the block size, the consumer can still crash.

This is too hard to fix right now and not worth it, so hardcode the
default to 16 MiB (which is more than the max Kafka message size), and
hope it never happens again.

The original purpose of dynamically resizing input blocks was to make
performance tuning simpler, and IMO we have still achieved that.
untitaker committed Nov 21, 2023
1 parent 03697f3 commit bdbf495
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions arroyo/processing/strategies/run_task_with_multiprocessing.py
@@ -37,8 +37,8 @@
TResult = TypeVar("TResult")
TBatchValue = TypeVar("TBatchValue")

-DEFAULT_INPUT_BLOCK_SIZE = 16 * 1024
-DEFAULT_OUTPUT_BLOCK_SIZE = 16 * 1024
+DEFAULT_INPUT_BLOCK_SIZE = 16 * 1024 * 1024
+DEFAULT_OUTPUT_BLOCK_SIZE = 16 * 1024 * 1024

LOG_THRESHOLD_TIME = 20 # In seconds
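
The failure mode described in the commit message can be illustrated with a toy model. This is only a sketch and not arroyo's actual implementation: `pack_into_blocks` and `BlockTooSmall` are hypothetical names, and real input blocks are not plain Python lists. It assumes messages are packed greedily into fixed-size blocks. Starting a new block helps when a batch is too large, but it cannot help once a single message exceeds the block size.

```python
from typing import List

OLD_BLOCK_SIZE = 16 * 1024           # 16 KiB (16384 bytes), the previous default
NEW_BLOCK_SIZE = 16 * 1024 * 1024    # 16 MiB (16777216 bytes), the new default


class BlockTooSmall(Exception):
    """Hypothetical error: one message does not fit even into an empty block."""


def pack_into_blocks(messages: List[bytes], block_size: int) -> List[List[bytes]]:
    """Greedily pack messages into blocks of at most block_size bytes.

    Illustrative only; this is an assumption about the general shape of the
    problem, not a copy of the library's logic.
    """
    blocks: List[List[bytes]] = []
    current: List[bytes] = []
    used = 0
    for msg in messages:
        if len(msg) > block_size:
            # No amount of re-batching fixes this case.
            raise BlockTooSmall(
                f"message of {len(msg)} bytes exceeds block size {block_size}"
            )
        if used + len(msg) > block_size:
            # The current block is full: close it and start a new one.
            blocks.append(current)
            current, used = [], 0
        current.append(msg)
        used += len(msg)
    if current:
        blocks.append(current)
    return blocks
```

With the old default, two 10,000-byte messages are split across two blocks, while a single 20,000-byte message raises `BlockTooSmall`. With the new default, all three fit into one block.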
