WIP Snapshot Load Batch Queue #434

Draft: wants to merge 10 commits into base: dev
Conversation

@to11mtm (Member) commented Aug 11, 2024

This will not fix #432, but it will probably help on some level.

Changes

This PR changes the default snapshot store's load operation for the 'newest' snapshot (i.e. a load with no upper bound on sequence number or timestamp) from being a multitude of individual snapshot requests into a queue, similar to how we handle write requests on journals.

Checklist

For significant changes, please ensure that the following have been completed (delete if not relevant):

Latest dev Benchmarks

We don't really have benchmarks against the snapshot store. If someone wants to add some, I'm happy to rebase/etc. and test.

This PR's Benchmarks

See above.

Stuff I still need to validate:

The main difference, of course, is that this is a -read- and more complex than a sequence-number lookup [0], so I did take the -small- step of splitting deserialization out of the main read logic as a separate stage.

This may have a minor performance impact in cases where only a small number of actors are recovering; however, it should greatly improve recovery performance under load. The overall intent is to have the numbers be configurable for a given case (maybe we have a fallback switch for those who want the old behavior?).

[0] - Hint hint ;)

@to11mtm (Member, Author) commented Aug 11, 2024

This doesn't pipeline writes, but it does pipeline reads which...

  1. Is very useful for performance in the real world. Alas, writing a benchmark for truly 'contentious' states is not fun... but similar to how write batching helps there, this -will- help via read batching; it's mostly a matter of right-sizing the stream params for your overall case (i.e. latency vs throughput depends on the size and number of snapshots). The big thing here is we may not be faster in all cases, but we should be more stable, and still tunable to be faster than before in almost all cases.

tl;dr - this is a change to reads that will most likely help overall perf in a real-world system.

  2. Is an example of how one -could- take the pattern further for other reads (e.g. expand this concept to other snapshot queries, or perhaps to journal sequence-number lookups; I would say 'other reads' in general, but general journal reads are tricky for lots of reasons...).

tl;dr - we can apply this pattern, or a variant of it, to other reads, and it will help with performance under load.

  3. Is admittedly not optimized for all SQL cases, i.e. depending on dialect there are better ways to 'do' this, and they aren't that ugly/hard to wire up. I didn't do them because I am trying to get better about not overdoing first revisions. >_< But I know PG, for instance, can do -way- better using DISTINCT ON, see https://stackoverflow.com/a/34715134. Also, the 'what do I need to Option.None' logic could use improvement.

tl;dr - this PR is a basic implementation and may require some optimization to do the 'best thing' for each database, or in general.

@to11mtm (Member, Author) left a comment

Added my self-comments, just in case I get bussed/whatever.

Comment on lines +26 to +33

    public static class SubFlowExtensions
    {
        public static Source<TOut, TMat> MergeSubStreamsAsSource<TOut, TMat, TClosed>(
            this SubFlow<TOut, TMat, TClosed> subFlow)
        {
            return (Source<TOut, TMat>)subFlow.MergeSubstreams();
        }
    }

Happy for feedback on this method, but TBH it would be nice if some form of it (even if it's not returning Source<TOut, TMat>) were put into Akka.Streams proper. It's really useful.
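For context, a rough usage sketch of why this is handy (illustrative only; this pipeline is not part of the PR):

    // Illustrative only; assumes: using System.Linq; using Akka.Streams.Dsl;
    // MergeSubStreamsAsSource hands back a typed Source after GroupBy, avoiding
    // the cast of MergeSubstreams() at every call site.
    var merged = Source.From(Enumerable.Range(1, 100))
        .GroupBy(8, i => i % 8)            // SubFlow<int, NotUsed, ...>
        .Select(i => i * 2)
        .MergeSubStreamsAsSource();        // Source<int, NotUsed>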

Comment on lines +47 to +64

    public readonly record struct SnapshotReadGroup
    {
        public SnapshotReadGroup(QueryLatestSnapSet a, List<LongSnapshotRow> b, Exception? err)
        {
            this.a = a;
            this.b = b;
            this.err = err;
        }

        public QueryLatestSnapSet a { get; }
        public List<LongSnapshotRow> b { get; }
        public Exception? err { get; }

        public void Deconstruct(out QueryLatestSnapSet a, out List<LongSnapshotRow> b, out Exception? err)
        {
            a = this.a;
            b = this.b;
            err = this.err;
        }
    }

Needs cleanup; also not sure whether a struct is actually a good idea here or not :)
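One possible cleaned-up shape, as a sketch only (a positional record generates the constructor and Deconstruct; the property names below are illustrative and not part of this PR):

    // Sketch: positional record struct with illustrative names; not the PR's code.
    public readonly record struct SnapshotReadGroup(
        QueryLatestSnapSet Requests,
        List<LongSnapshotRow> Rows,
        Exception? Error);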

Comment on lines 66 to 78

    public class QueryLatestSnapSet
    {
        public readonly Dictionary<string, List<TaskCompletionSource<Option<SelectedSnapshot>>>> Entries = new();

        public void Add(LatestSnapRequestEntry entry)
        {
            if (Entries.TryGetValue(entry.PersistenceId, out var item) == false)
            {
                item = Entries[entry.PersistenceId] = new List<TaskCompletionSource<Option<SelectedSnapshot>>>();
            }

            item.Add(entry.TCS);
        }
    }

May or may not be better to return this from .Add() for cosmetics/fluent chaining, etc.
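i.e. something like this sketch (not in the PR):

    // Sketch: returning the set from Add() would allow fluent chaining at call sites.
    public QueryLatestSnapSet Add(LatestSnapRequestEntry entry)
    {
        if (!Entries.TryGetValue(entry.PersistenceId, out var item))
            item = Entries[entry.PersistenceId] = new List<TaskCompletionSource<Option<SelectedSnapshot>>>();
        item.Add(entry.TCS);
        return this;
    }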

Comment on lines 111 to 112
int maxSubStreamsForReads = 8; // TODO: Configurable
int maxRequestsPerBatch = 50;

These should be configurable
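For example, something along these lines, assuming access to the snapshot store's Akka.Configuration Config instance (the config keys below are hypothetical and not defined by this PR):

    // Hypothetical config keys; illustrative only.
    var maxSubStreamsForReads = config.GetInt("read-batch-max-substreams", 8);
    var maxRequestsPerBatch = config.GetInt("read-batch-max-requests", 50);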

    .GroupBy(maxSubStreamsForReads, a => a.PersistenceId.GetHashCode() % maxSubStreamsForReads)
    .BatchWeighted(
        maxRequestsPerBatch,
        a => 1,

I know in ShotGlass I had/have a Batcher that could assign a 'zero weight' for cases where a persistence ID was already queued...

FBOW, in this first go I instead decided a flat weight of 1 would be a lot less 'change', but it also has -some- benefits for forward progress (really mostly because a custom stream stage felt like overkill for the first intro of this).
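Roughly, the flat-weight batching amounts to something like this (a sketch of the shape; the seed/aggregate lambdas are my illustration, not necessarily the PR's exact code):

    // Sketch: every request costs 1, and requests accumulate into a QueryLatestSnapSet
    // until maxRequestsPerBatch is reached or downstream is ready to pull.
    .GroupBy(maxSubStreamsForReads, a => a.PersistenceId.GetHashCode() % maxSubStreamsForReads)
    .BatchWeighted(
        maxRequestsPerBatch,
        _ => 1L,
        e => { var set = new QueryLatestSnapSet(); set.Add(e); return set; },
        (set, e) => { set.Add(e); return set; })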

{
    if (connection.UseDateTime)
    {
        //TODO: Consolidate/fixup different rowtype issues.

I am admittedly unsure how we got to having separate categories here; we -should- be able to set all of this up such that either L2DB is doing the 'right thing' for a given db config, and/or we sanely do our queries against whatever mapping is there. (well, actually it's a bit of both... still...)

}
}
}
}).SelectAsync(1,

IDK if this should -really- be SelectAsync; alas, I don't have a clean way to do .Async() on a SubFlow ;_;

In general the desire here is to ensure that we can fire off results without blocking the next read. There is probably a cleaner/better way to do this; open to suggestions. <3
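For illustration, the intent is roughly this shape (CompleteWaiters is a hypothetical helper, not in the PR; the point is that completing the TaskCompletionSources runs off the read loop so the next batch can start):

    // Sketch only: push TCS completion onto the thread pool so the stream stage
    // is free to pull the next batched read. CompleteWaiters is hypothetical.
    .SelectAsync(1, group => Task.Run(() =>
    {
        CompleteWaiters(group);
        return NotUsed.Instance;
    }))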

Comment on lines 235 to 280

    foreach (var result in b)
    {
        if (a.Entries.TryGetValue(result.PersistenceId, out var toSet))
        {
            try
            {
                var res = _longSerializer.Deserialize(result);
                if (res.IsSuccess)
                {
                    foreach (var taskCompletionSource in toSet)
                    {
                        taskCompletionSource.TrySetResult(res.Success);
                    }
                }
                else
                {
                    foreach (var taskCompletionSource in toSet)
                    {
                        taskCompletionSource.TrySetException(res.Failure.Value);
                    }
                }
            }
            catch (Exception e)
            {
                foreach (var taskCompletionSource in toSet)
                {
                    taskCompletionSource.TrySetException(e);
                }
            }
        }
        else
        {
            tempSet.Add(result.PersistenceId);
        }

        foreach (var se in tempSet)
        {
            if (a.Entries.TryGetValue(se, out var setNo))
            {
                foreach (var taskCompletionSource in setNo)
                {
                    taskCompletionSource.TrySetResult(Option<SelectedSnapshot>.None);
                }
            }
        }
    }

I hate this and would love cleaner suggestions. <3
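One possible cleaner shape, as a sketch only (assuming the intent is: resolve each match as it deserializes, then resolve any requested persistence ID that got no row back to Option.None; this is not the PR's code):

    // Sketch: complete matches first, then fall back to Option.None for anything
    // that was requested but not returned by the query.
    var found = new HashSet<string>();
    foreach (var result in b)
    {
        found.Add(result.PersistenceId);
        if (!a.Entries.TryGetValue(result.PersistenceId, out var waiters))
            continue;

        try
        {
            var res = _longSerializer.Deserialize(result);
            foreach (var tcs in waiters)
            {
                if (res.IsSuccess)
                    tcs.TrySetResult(res.Success);
                else
                    tcs.TrySetException(res.Failure.Value);
            }
        }
        catch (Exception e)
        {
            foreach (var tcs in waiters)
                tcs.TrySetException(e);
        }
    }

    foreach (var kvp in a.Entries)
    {
        if (found.Contains(kvp.Key))
            continue;
        foreach (var tcs in kvp.Value)
            tcs.TrySetResult(Option<SelectedSnapshot>.None);
    }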

        string persistenceId,
        CancellationToken cancellationToken = default)
    {
        var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken, _shutdownCts.Token);

This should probably pass the CTS (or its token) into the request, for the sake of allowing the token to do the needful.
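i.e. roughly this (a sketch; a LatestSnapRequestEntry constructor that accepts a token is hypothetical and not part of this PR):

    // Sketch: hand the linked token to the request so cancellation can actually
    // complete the pending TaskCompletionSource. The constructor overload is hypothetical.
    var entry = new LatestSnapRequestEntry(persistenceId, cts.Token);
    cts.Token.Register(() => entry.TCS.TrySetCanceled(cts.Token));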

Development

Successfully merging this pull request may close these issues.

Bug: can't support multiple snapshots taken at same seqNr and persistentId