
Archive of non-existent streams #107

Open
alexeyzimarev opened this issue Jun 14, 2022 · 2 comments

Comments

@alexeyzimarev
Contributor

Aiming to solve the following scenario:

  • We are importing data from a legacy system directly to the events archive (Elastic)
  • There's one event per aggregate in the archive, its version is zero
  • When executing a command on an aggregate, events can only be found in the archive
  • When appending the event to the hot store (ESDB), we provide the expected version = 1
  • The append fails as ESDB doesn't have the stream
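The failure mode above can be illustrated with a minimal sketch (Python, with a toy in-memory store standing in for ESDB; the class and its count-based version semantics are assumptions for illustration, not Eventuous or ESDB API):

```python
class WrongExpectedVersion(Exception):
    """Raised when the expected stream version does not match reality."""

class HotStore:
    """Toy stand-in for ESDB: stream name -> list of events."""
    def __init__(self):
        self._streams = {}

    def append(self, stream, events, expected_version):
        # In this sketch, "version" is simply the number of events already
        # in the stream; a non-existent stream has 0.
        current = len(self._streams.get(stream, []))
        if current != expected_version:
            raise WrongExpectedVersion(
                f"{stream}: expected {expected_version}, actual {current}"
            )
        self._streams.setdefault(stream, []).extend(events)

# The archive (Elastic) holds one imported event, so the command handler
# appends to the hot store with expected version 1...
store = HotStore()
try:
    store.append("Booking-123", [{"type": "RoomBooked"}], expected_version=1)
except WrongExpectedVersion as e:
    # ...but the stream doesn't exist in the hot store, so the append fails
    print(e)
```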

This doesn't work because the archive use case expects all events to be appended to the hot store (ESDB) first and then archived in real time. The hot store stream gets truncated by time or size, so the stream version is preserved even though the events themselves are gone.

One thing we tried was putting the expected version in the metadata. However, that effectively disables the native optimistic concurrency checks provided by ESDB, or by any other store that keeps the event number per stream as a unique constraint. The concurrency-check complexity then moves into Eventuous code, which is not desirable.

Proposed solutions:

  • Append dummy events to the imported streams when importing to the archive, then delete those streams. Pro: everything works as it should; no code changes are required in Eventuous. Cons: the ESDB data set will include metadata for all the deleted streams. We are talking about hundreds of millions of streams, and most of them will never get new events, so keeping all those streams in ESDB is inefficient.
  • When loading the aggregate, read events from the archive (as it works today) and append them to ESDB right after. Basically, it's an "unarchive on load" feature. Pros: little code change needed; everything will work as it should. Cons: there's no guarantee that any append will happen after the load. It would work quite well if the aggregate state were only loaded for executing commands, and not for queries.
  • Unarchive the stream when appending new events. Pros: events will only be appended to the hot store if the command execution produced new events. Cons: more code changes are required, as the aggregate needs to keep the collection of original events for unarchiving purposes.
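The two unarchive strategies from the list above could be sketched like this (Python; the store shapes, function names, and the copy-then-append flow are all hypothetical, not Eventuous code):

```python
def load_events(hot, archive, stream):
    """Read from the hot store; fall back to the archive (as it works today)."""
    return hot.get(stream) or archive.get(stream, [])

def unarchive_on_append(hot, archive, stream, new_events):
    """Option 3: when a command produces new events for a stream that only
    exists in the archive, copy the original events into the hot store
    together with the new ones."""
    if stream not in hot:
        originals = archive.get(stream, [])
        # The aggregate has to carry `originals` around until the append,
        # which is the extra complexity mentioned as a con of this option.
        hot[stream] = list(originals)
    hot[stream].extend(new_events)

hot, archive = {}, {"Booking-123": [{"v": 0, "type": "Imported"}]}
unarchive_on_append(hot, archive, "Booking-123", [{"v": 1, "type": "RoomBooked"}])
print(len(hot["Booking-123"]))  # 2: the imported event plus the new one
```

An "unarchive on load" variant would instead call the copy step inside `load_events`, which is why it can fire on reads that never produce an append.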

Both unarchiving strategies have two potential issues:

  • Unarchived events will reach all subscriptions, potentially producing undesired side effects
  • The connector will try to archive those events too, causing data duplication; the next read would then get the archived events twice (unless the archive write fails on a duplicate message id)

Potential solutions:

  • Force delete stream after unarchive
  • Annotate specific (import) events so they aren't archived. With this, we'll have a more generic function to prevent some events from being archived at all, which might be useful (but dangerous). It will also spread out to the connector.
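On the connector side, the annotation idea could look roughly like this (a sketch; the metadata key name `archive-ignore` and the filter hook are assumptions, not existing Eventuous connector API):

```python
def should_archive(event):
    # Events annotated at import/unarchive time carry a marker in their
    # metadata; the connector skips them so they are never written to the
    # archive a second time.
    return not event.get("metadata", {}).get("archive-ignore", False)

events = [
    {"type": "Imported", "metadata": {"archive-ignore": True}},
    {"type": "RoomBooked", "metadata": {}},
]
to_archive = [e for e in events if should_archive(e)]
print([e["type"] for e in to_archive])  # ['RoomBooked']
```

This is the "dangerous" part mentioned above: a generic skip-archiving predicate can silently drop events the archive was supposed to keep.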
@alexeyzimarev
Contributor Author

I have an idea: use pre-defined metadata keys with boolean values, like:

  • ProjectionIgnore: if true, the event is ignored by read model projections
  • ConnectorIgnore: if true, the event is ignored by any connector
  • SubscriptionIgnore: set the value to a subscription id, or to all, to have that subscription ignore the event
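Those keys could be evaluated in a subscription pipeline along these lines (a sketch; the key names come from the comment above, but the function shape and the `kind` parameter are assumptions):

```python
def should_ignore(metadata, subscription_id, kind):
    """Decide whether a consumer should skip an event.

    kind is 'projection', 'connector', or any other plain subscription.
    """
    if kind == "projection" and metadata.get("ProjectionIgnore"):
        return True
    if kind == "connector" and metadata.get("ConnectorIgnore"):
        return True
    # SubscriptionIgnore holds a subscription id, or "all" for everyone.
    sub = metadata.get("SubscriptionIgnore")
    return sub is not None and (sub == "all" or sub == subscription_id)

meta = {"ProjectionIgnore": True, "SubscriptionIgnore": "all"}
print(should_ignore(meta, "read-model-1", "projection"))  # True
print(should_ignore({}, "read-model-1", "connector"))     # False
```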

@paulopez78
Collaborator

IMHO, that's the best trade-off of all the options proposed above:

Unarchive the stream when appending new events. Pros: events will be appended to the hot store if the command execution produced new events. Cons: more code changes are required, as the aggregate needs to keep the collection of original events for unarchiving purposes.

It requires some changes to the Aggregate or other components, but we don't have to deal with a lot of complexity when subscribing to the stream of events. It's also the option that creates a real hot/archive separation without adding extra pressure to the hot storage.

Labels: none yet · Status: Hold · No branches or pull requests · 2 participants