2nd Edition #2

Draft: wants to merge 142 commits into main

@idg10 idg10 commented Jun 8, 2023

No description provided.

@HowardvanRooijen HowardvanRooijen changed the title Spike/idg modifications 2nd Edition Jun 20, 2023

Data is not always valuable in its raw form. Sometimes we need to consolidate, collate, combine or condense the mountains of data we receive into more consumable bite-sized chunks. Consider fast-moving data from domains like instrumentation, finance, signal processing and operational intelligence. This kind of data can change at a rate of over ten values per second. Can a person actually consume this? Perhaps for human consumption, aggregate values like averages, minimums and maximums can be of more use.
Data is not always tractable in its raw form. Sometimes we need to consolidate, collate, combine or condense the mountains of data we receive. This might just be a case of reducing the volume of data to a manageable level. For example, consider fast-moving data from domains like instrumentation, finance, signal processing and operational intelligence. This kind of data can change at a rate of over ten values per second for individual sources, and much higher rates if we're observing multiple sources. Can a person actually consume this? For human consumption, aggregate values like averages, minimums and maximums can be of more use.
Member

Feels like you're talking about the 5 Vs: velocity, volume, value, variety and veracity

Author

I believe I'm talking about two of them. Or possibly three, because two of the Vs might actually be the same thing in Rx.

I'm not sure there's a distinction between volume and velocity in an event-driven world. With data at rest, Velocity is really just the rate at which Volume changes. But what's Volume in Rx? Streaming event processing is inherently not about storing data. So you could argue that volume is in fact a concept completely outside of Rx's world view. I wouldn't go that far, not least because I used the word "volume" in this excerpt, but in the context of the 5 Vs, the thing I'm calling "volume" in Rx is what that nomenclature calls "velocity".

I'm also talking about value (and I use that word in the paragraph after this one).

But as for the remaining two, I don't see it.

Veracity: I tend to think of that as being concerned with data cleansing (or at least the need for it). I don't think aggregation is particularly applicable to that (any more or less than any other Rx capability).

Variety: nothing in this chapter is especially well-suited to managing variety (any more or less than anything else in Rx). In practice, you'd likely use a combination of the Transformation and Combining Sequences operators to manage variety.

Member

In my "headcanon", Volume in Rx is about the volume of incoming data. Possibly woven into "signal to noise", with the volume being the noise, and Rx allowing you to perform Signals Intelligence: the ability to ingest and process huge volumes of raw data, rather than, say, an analytical or even observability perspective, which is about aggregating / (map) reducing to make everything human understandable. Whereas in Rx we want to process the data at a machine-readable fidelity (i.e. think about broadband telemetry processing).

Author

> Volume in Rx is about the volume of incoming data

Well nobody can argue with that. Volume is definitely all about the volume.

But it still doesn't clarify which of the two meanings you have in mind for "volume".

Consider this:

  1. 1,000 messages per second for 1,000 seconds
  2. 100,000 messages per second for 10 seconds

That's 1 million messages either way, right?

In the classic 5 Vs, the volume (1 million messages) is the same in both cases. But the velocity is different.

In Rx, this classic 5 Vs interpretation of volume could be thought of as:

source.Sum(msg => msg.Size)

or maybe even just:

source.Count()

In the 5 Vs, volume is just: how much data is there in total? There's no time component. And that's inherently not a reactive concept. With both those queries you only get an answer after the input stops. If you're in a "there's always more" streaming world, you never actually get an answer. You could do a running total:

var volumeSoFar = source.Scan(0, (vol, msg) => vol + msg.Size);

...but it feels like an awkward thing to do. I just don't think there's really a natural place for the classic "volume" (how much is in my data lake?) from the 5 Vs in Rx.
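
To make the difference between those queries concrete, here is a minimal sketch assuming a hypothetical `Message` record with a `Size` property and a small finite source (neither is part of the book text). With a finite source, `Sum` produces one value at the end, while `Scan` produces a value per message:

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;

// Hypothetical message type; only the Size property comes from the discussion.
public record Message(int Size);

public static class VolumeDemo
{
    public static (List<int> Totals, List<int> Running) Run()
    {
        // A finite source of three messages, so that Sum is able to complete.
        IObservable<Message> source =
            new[] { new Message(10), new Message(20), new Message(30) }.ToObservable();

        var totals = new List<int>();
        var running = new List<int>();

        // Sum emits a single value, and only once the source completes.
        source.Sum(msg => msg.Size).Subscribe(v => totals.Add(v));

        // Scan emits the accumulated total after every message.
        source.Scan(0, (vol, msg) => vol + msg.Size).Subscribe(v => running.Add(v));

        return (totals, running);
    }
}
```

Here `Totals` ends up holding just `60`, whereas `Running` holds `10`, `30`, `60`. On a source that never completes, only the `Scan` version would ever report anything.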

Reading what you've written I don't think that's how you're thinking about volume in Rx. It sounds to me more like in your headcanon, volume and velocity are indeed the same thing: they're basically the rate at which data arrives. E.g.:

// Every second, this will report the number of messages received in the last second.
source.Buffer(TimeSpan.FromSeconds(1)).Select(b => b.Count);

| Measure | Category |
| --- | --- |
| 1 million messages | Volume |
| 100 messages per second | Velocity |

You can achieve 1 million messages of volume either by running for about 2 hours and 47 minutes (10,000 seconds) at 100 messages per second, or by running for 10 seconds at 100,000 messages per second.

If, in those examples, you think "Yes, the 100 messages per second example is exactly the same volume as the 100,000 messages per second example" then you're in the 5 Vs world (but it seems like a strange point of view in Rx to me). And if you think "No, obviously the 100,000 messages per second is a higher volume scenario than 100 messages per second" then you're using volume as a synonym for velocity.
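
A rough sketch of that comparison, scaled down to 20 messages so it is easy to follow. It assumes a synthetic source built with `Observable.Timer` and runs on a `HistoricalScheduler` so that it executes in virtual time; `RateDemo` and `Measure` are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Concurrency;
using System.Reactive.Linq;

public static class RateDemo
{
    // Emits `total` messages at `perSecond` messages per (virtual) second and
    // reports both the 5 Vs "volume" and the per-second message counts.
    public static (int Volume, List<int> PerSecond) Measure(int total, int perSecond)
    {
        var scheduler = new HistoricalScheduler();
        var period = TimeSpan.FromTicks(TimeSpan.TicksPerSecond / perSecond);

        // The first message is offset by half a period so that no message
        // lands exactly on a one-second buffer boundary.
        var source = Observable
            .Timer(TimeSpan.FromTicks(period.Ticks / 2), period, scheduler)
            .Take(total);

        var volume = 0;
        var perSecondCounts = new List<int>();

        // "Volume" in the 5 Vs sense: only known once the source completes.
        using var countSub = source.Count().Subscribe(c => volume = c);

        // "Velocity": the number of messages seen in each one-second window.
        using var rateSub = source
            .Buffer(TimeSpan.FromSeconds(1), scheduler)
            .Select(b => b.Count)
            .Subscribe(n => perSecondCounts.Add(n));

        // Run virtual time far enough forward for the source to complete.
        scheduler.AdvanceBy(TimeSpan.FromSeconds(total / perSecond + 1));
        return (volume, perSecondCounts);
    }
}
```

`Measure(20, 10)` and `Measure(20, 100)` should both report a `Volume` of 20, but the per-second counts differ: two windows of 10 in the first case, a single window of 20 in the second.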

Member

Or is there another interesting angle? You can process historic data without being tied to real time: pivot towards consuming more resources to process that volume of data, while maintaining its temporal semantics (using virtual time). That way you are maximising both volume and velocity, which sounds like quite a unique combination in the data processing space. It's certainly a USP for Reaqtor.
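
One way this could look in Rx.NET, as a sketch only: recorded events are replayed through a `HistoricalScheduler`, so a time-based operator such as `Buffer` still sees the original timestamps while the replay itself runs as fast as the machine allows. The recorded offsets, the start date and the `ReplayDemo` name are all made up for illustration, and this says nothing about how Reaqtor does it:

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class ReplayDemo
{
    public static List<int> Run()
    {
        // Virtual clock starting at an arbitrary (assumed) point in the past.
        var start = new DateTimeOffset(2023, 6, 8, 0, 0, 0, TimeSpan.Zero);
        var scheduler = new HistoricalScheduler(start);

        // Hypothetical recorded events, as millisecond offsets from `start`.
        var recordedOffsetsMs = new[] { 100, 400, 900, 1500 };

        var source = new Subject<int>();
        var perSecondCounts = new List<int>();

        // The same per-second query as before, but driven by virtual time.
        using var sub = source
            .Buffer(TimeSpan.FromSeconds(1), scheduler)
            .Select(b => b.Count)
            .Subscribe(n => perSecondCounts.Add(n));

        // Schedule each recorded event at its original timestamp.
        foreach (var ms in recordedOffsetsMs)
        {
            scheduler.Schedule(start.AddMilliseconds(ms), () => source.OnNext(ms));
        }

        // Replay two seconds of history without waiting two real seconds.
        scheduler.AdvanceTo(start.AddSeconds(2));
        return perSecondCounts;
    }
}
```

With these offsets the query should report 3 messages for the first virtual second and 1 for the second, even though the whole replay takes a negligible amount of wall-clock time.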

@HowardvanRooijen HowardvanRooijen left a comment

B_Disposables.md Reviewed

@HowardvanRooijen HowardvanRooijen left a comment

C_UsageGuidelines.md Reviewed

Inexplicably, this chapter was written mostly in HTML in a `.md` file, which confused some parts of our tooling. I've recast it all as markdown.

I've also written a short note to say you can use most of these disposable helpers in non-Rx code too.