
Archive of non-existent streams #107

Open
alexeyzimarev opened this issue Jun 14, 2022 · 2 comments

Comments

@alexeyzimarev
Contributor

Aiming to solve the following scenario:

  • We are importing data from a legacy system directly to the events archive (Elastic)
  • There's one event per aggregate in the archive, its version is zero
  • When executing a command on an aggregate, events can only be found in the archive
  • When appending the event to the hot store (ESDB), we provide the expected version = 1
  • The append fails as ESDB doesn't have the stream
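The failure mode above can be illustrated with a minimal sketch (Python, with a toy in-memory store standing in for ESDB; the class and its count-based version semantics are assumptions for illustration, not Eventuous or ESDB API):

```python
class WrongExpectedVersion(Exception):
    """Raised when the expected stream version does not match reality."""

class HotStore:
    """Toy stand-in for ESDB: stream name -> list of events."""
    def __init__(self):
        self._streams = {}

    def append(self, stream, events, expected_version):
        # In this sketch, "version" is simply the number of events already
        # in the stream; a non-existent stream has 0.
        current = len(self._streams.get(stream, []))
        if current != expected_version:
            raise WrongExpectedVersion(
                f"{stream}: expected {expected_version}, actual {current}"
            )
        self._streams.setdefault(stream, []).extend(events)

# The archive (Elastic) holds one imported event, so the command handler
# appends to the hot store with expected version 1...
store = HotStore()
try:
    store.append("Booking-123", [{"type": "RoomBooked"}], expected_version=1)
except WrongExpectedVersion as e:
    # ...but the stream doesn't exist in the hot store, so the append fails
    print(e)
```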

This doesn't work because the archive use case expects all events to be appended to the hot store (ESDB) first and then archived in real time. The hot store stream gets truncated by time or size, so the stream version is preserved even though the events themselves are gone.

One thing we tried was putting the expected version in the metadata. However, that effectively disables the native optimistic concurrency checks provided by ESDB, or by any other store that keeps the event number per stream as a unique constraint. The concurrency-check complexity then moves into Eventuous code, which is not desirable.

Proposed solutions:

  • Append dummy events to the imported streams when importing to the archive, then delete those streams. Pro: everything works as it should; no code changes are required in Eventuous. Cons: the ESDB data set will include metadata for all the deleted streams. We are talking about hundreds of millions of streams, and most of them will never get new events, so keeping all those streams in ESDB is inefficient.
  • When loading the aggregate, read events from the archive (as it works today) and append them to ESDB right after. Basically, it's an "unarchive on load" feature. Pros: little code change needed; everything will work as it should. Cons: there's no guarantee that any append will happen after the load. It would work quite well if the aggregate state were only loaded for executing commands, and not for queries.
  • Unarchive the stream when appending new events. Pros: events will only be appended to the hot store if the command execution produced new events. Cons: more code changes are required, as the aggregate needs to keep the collection of original events for unarchiving purposes.
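The two unarchive strategies from the list above could be sketched like this (Python; the store shapes, function names, and the copy-then-append flow are all hypothetical, not Eventuous code):

```python
def load_events(hot, archive, stream):
    """Read from the hot store; fall back to the archive (as it works today)."""
    return hot.get(stream) or archive.get(stream, [])

def unarchive_on_append(hot, archive, stream, new_events):
    """Option 3: when a command produces new events for a stream that only
    exists in the archive, copy the original events into the hot store
    together with the new ones."""
    if stream not in hot:
        originals = archive.get(stream, [])
        # The aggregate has to carry `originals` around until the append,
        # which is the extra complexity mentioned as a con of this option.
        hot[stream] = list(originals)
    hot[stream].extend(new_events)

hot, archive = {}, {"Booking-123": [{"v": 0, "type": "Imported"}]}
unarchive_on_append(hot, archive, "Booking-123", [{"v": 1, "type": "RoomBooked"}])
print(len(hot["Booking-123"]))  # 2: the imported event plus the new one
```

An "unarchive on load" variant would instead call the copy step inside `load_events`, which is why it can fire on reads that never produce an append.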

Both unarchiving strategies have two potential issues:

  • Unarchived events will reach all subscriptions, potentially producing undesired side effects
  • The connector will try to archive those events too, causing data duplication; the next read would then get the archived events twice (unless the archive write fails on a duplicate message id)

Potential solutions:

  • Force delete stream after unarchive
  • Annotate specific (import) events so they aren't archived. With this, we'll have a more generic function to prevent some events from being archived at all, which might be useful (but dangerous). It will also spread out to the connector.
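On the connector side, the annotation idea could look roughly like this (a sketch; the metadata key name `archive-ignore` and the filter hook are assumptions, not existing Eventuous connector API):

```python
def should_archive(event):
    # Events annotated at import/unarchive time carry a marker in their
    # metadata; the connector skips them so they are never written to the
    # archive a second time.
    return not event.get("metadata", {}).get("archive-ignore", False)

events = [
    {"type": "Imported", "metadata": {"archive-ignore": True}},
    {"type": "RoomBooked", "metadata": {}},
]
to_archive = [e for e in events if should_archive(e)]
print([e["type"] for e in to_archive])  # ['RoomBooked']
```

This is the "dangerous" part mentioned above: a generic skip-archiving predicate can silently drop events the archive was supposed to keep.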
@alexeyzimarev
Contributor Author

I have an idea: use pre-defined metadata keys with boolean values, like:

  • ProjectionIgnore: if true, the event is ignored by read model projections
  • ConnectorIgnore: if true, the event is ignored by any connector
  • SubscriptionIgnore: set the value to a subscription id, or to all, to have that subscription ignore the event
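Those keys could be evaluated in a subscription pipeline along these lines (a sketch; the key names come from the comment above, but the function shape and the `kind` parameter are assumptions):

```python
def should_ignore(metadata, subscription_id, kind):
    """Decide whether a consumer should skip an event.

    kind is 'projection', 'connector', or any other plain subscription.
    """
    if kind == "projection" and metadata.get("ProjectionIgnore"):
        return True
    if kind == "connector" and metadata.get("ConnectorIgnore"):
        return True
    # SubscriptionIgnore holds a subscription id, or "all" for everyone.
    sub = metadata.get("SubscriptionIgnore")
    return sub is not None and (sub == "all" or sub == subscription_id)

meta = {"ProjectionIgnore": True, "SubscriptionIgnore": "all"}
print(should_ignore(meta, "read-model-1", "projection"))  # True
print(should_ignore({}, "read-model-1", "connector"))     # False
```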

@paulopez78
Collaborator

IMHO, that's the best trade-off of all the options proposed above:

Unarchive the stream when appending new events. Pros: events will be appended to the hot store if the command execution produced new events. Cons: more code changes are required, as the aggregate needs to keep the collection of original events for unarchiving purposes.

It requires some changes to the Aggregate or other components, but we don't have to deal with a lot of complexity when subscribing to the stream of events. It's also the option that creates a real hot/archive separation without adding extra pressure to the hot storage.

Labels: none yet · Status: Hold · No branches or pull requests · 2 participants