2nd Edition #2

idg10 · 2023-06-08T16:45:52Z

No description provided.

content/06_Transformation.md

HowardvanRooijen · 2023-06-20T17:57:58Z

content/07_Aggregation.md


-Data is not always valuable is its raw form. Sometimes we need to consolidate, collate, combine or condense the mountains of data we receive into more consumable bite sized chunks. Consider fast moving data from domains like instrumentation, finance, signal processing and operational intelligence. This kind of data can change at a rate of over ten values per second. Can a person actually consume this? Perhaps for human consumption, aggregate values like averages, minimums and maximums can be of more use.
+Data is not always tractable is its raw form. Sometimes we need to consolidate, collate, combine or condense the mountains of data we receive. This might just be a case of reducing the volume of data to a manageable level. For example, consider fast moving data from domains like instrumentation, finance, signal processing and operational intelligence. This kind of data can change at a rate of over ten values per second for individual sources, and much higher rates if we're observing multiple sources. Can a person actually consume this? For human consumption, aggregate values like averages, minimums and maximums can be of more use.


Feels like you're talking about the 5 Vs: velocity, volume, value, variety and veracity

I believe I'm talking about two of them. Or possibly three, because two of the Vs might actually be the same thing in Rx.

I'm not sure there's a distinction between volume and velocity in an event-driven world. With data at rest, Velocity is really just the rate at which Volume changes. But what's Volume in Rx? Streaming event processing is inherently not about storing data. So you could argue that volume is in fact a concept completely outside of Rx's world view. I wouldn't go that far, not least because I used the word "volume" in this excerpt, but in the context of the 5 Vs, the thing I'm calling "volume" in Rx is what that nomenclature calls "velocity".

I'm also talking about value (and I use that word in the paragraph after this one).

But as for the remaining two, I don't see it.

Veracity: I tend to think of that as being concerned with data cleansing (or at least the need for it). I don't think aggregation is particularly applicable to that (any more or less than any other Rx capability).

Variety: nothing in this chapter is especially well-suited to managing variety (any more or less than anything else in Rx). In practice, you'd likely use a combination of the Transformation and Combining Sequences operators to manage variety.

In my "headcanon", volume in Rx is about the volume of incoming data. Possibly this is woven into "signal to noise", with the volume being the noise, and Rx allowing you to perform Signals Intelligence: the ability to ingest and process huge volumes of raw data. That contrasts with, say, an analytical or even observability perspective, which is about aggregating / (map) reducing to make everything human understandable. With Rx, we want to process the data at machine-readable fidelity (e.g. think about broadband telemetry processing).

> Volume in Rx is about the volume of incoming data

Well nobody can argue with that. Volume is definitely all about the volume.

But it still doesn't clarify which of the two meanings you have in mind for "volume".

Consider this:

1,000 messages per second for 1,000 seconds

100,000 messages per second for 10 seconds

That's 1 million messages either way, right?

In the classic 5 Vs, the volume (1 million messages) is the same in both cases. But the velocity is different.

In Rx, this classic 5 Vs interpretation of volume could be thought of as:

source.Sum(msg => msg.Size)

or maybe even just:

source.Count()

In the 5 Vs, volume is just: how much data is there in total? There's no time component. And that's inherently not a reactive concept. With both those queries you only get an answer after the input stops. If you're in a "there's always more" streaming world, you never actually get an answer. You could do a running total:

var volumeSoFar = source.Scan(0, (vol, msg) => vol + msg.Size);

...but it feels like an awkward thing to do. I just don't think there's really a natural place for the classic "volume" (how much is in my data lake?) from the 5 Vs in Rx.

Reading what you've written I don't think that's how you're thinking about volume in Rx. It sounds to me more like in your headcanon, volume and velocity are indeed the same thing: they're basically the rate at which data arrives. E.g.:

// Every second, this will report the number of messages received in the last second.
source.Buffer(TimeSpan.FromSeconds(1)).Select(b => b.Count);

| Measure | Category |
|---|---|
| 1 million messages | Volume |
| 100 messages per second | Velocity |

You can achieve 1 million messages of volume either by running for about 2 hours and 47 minutes (10,000 seconds) at 100 messages per second, or by running for 10 seconds at 100,000 messages per second.

If, in those examples, you think "Yes, the 100 messages per second example is exactly the same volume as the 100,000 messages per second example" then you're in the 5 Vs world (but it seems like a strange point of view in Rx to me). And if you think "No, obviously the 100,000 messages per second is a higher volume scenario than 100 messages per second" then you're using volume as a synonym for velocity.

Or is there another interesting angle: that you can process historic data, ignoring the temporal aspects and instead pivoting towards consuming more resources to process that volume of data, while still maintaining the temporal semantics (using virtual time)? You are then maximising both volume and velocity, which sounds like quite a unique combination in the data processing space. It's certainly a USP for Reaqtor.

content/11_SchedulingAndThreading.md

content/A_IoStreams.md

HowardvanRooijen

B_Disposables.md Reviewed

content/B_Disposables.md

HowardvanRooijen

C_UsageGuidelines.md Reviewed

content/C_UsageGuidelines.md

content/D_AlgebraicUnderpinnings.md

idg10 added 14 commits May 5, 2023 07:22

Add experimental notebook

8a08f2e

Decent first draft of ch01

7243a4f

WIP on ch02

ef67a52

More progress on Ch02

026b062

Ch01 and ch02 structure now in place

9f6cf6b

Moved some stuff into placeholders in later chapters.

Lifetime Management chapter content moved into other chapters

ba2b914

Work in progress on Creating Observable Sequences

ce3ef9e

First full draft of Creating Observable Sequences

8371a36

Add hot/cold section to Key Types

32fa630

Add PART 2 break

ee34c18

First full draft of Filtering

b90523a

Remove Inspection chapter. Start on Transformation

3e429b4

Transformation WIP

83674a9

Feature complete draft of Transformation chapter

5dec03d

HowardvanRooijen changed the title ~~Spike/idg modifications~~ 2nd Edition Jun 20, 2023

HowardvanRooijen reviewed Jun 20, 2023

content/06_Transformation.md (outdated, resolved)

HowardvanRooijen reviewed Jun 20, 2023

content/06_Transformation.md (outdated, resolved)

HowardvanRooijen reviewed Jun 20, 2023

content/06_Transformation.md (outdated, resolved)

Aggregation WIP

6c9ce41

HowardvanRooijen reviewed Jun 20, 2023

idg10 added 7 commits June 21, 2023 16:27

Feature-complete draft of Aggregation chapter

ecc076b

Structurally complete Partitioning chapter

884fb83

Reworked Concat in Combining Sequences

ebf6761

Basic structure of Combining Sequences

924cd6e

Re-order part 3 content

897f1cc

Partial update of scheduling chapter

25224d5

More scheduler updates

7b96789

HowardvanRooijen reviewed Jul 25, 2023

content/11_SchedulingAndThreading.md (outdated, resolved)

idg10 added 2 commits August 2, 2023 14:31

Feature-complete Leaving Observable chapter

09098c9

Feature complete Timing chapter

ac46387

idg10 added 5 commits December 6, 2023 16:05

Add thanks to .NET Foundation and Richard Lander

19a14ca

Remove spurious '.' in bullet list

24d92f6

Ch03 responding to mwa's feedback

2bcaf83

Ch03 fix another typo

442dfd5

Change "ownership" to "stewards"

5abacf6

HowardvanRooijen reviewed Dec 10, 2023

content/A_IoStreams.md (outdated, resolved)

HowardvanRooijen reviewed Dec 10, 2023

content/B_Disposables.md (outdated, resolved)

content/B_Disposables.md (resolved)

content/B_Disposables.md (outdated, resolved)

content/B_Disposables.md (outdated, resolved)

HowardvanRooijen reviewed Dec 10, 2023

content/C_UsageGuidelines.md (resolved)

content/C_UsageGuidelines.md (outdated, resolved)

HowardvanRooijen reviewed Dec 10, 2023

idg10 added 8 commits December 13, 2023 16:24

Add sequence diagram to Ch03

e852897

Ch03 Add short section on long-term state in operators

8eb026a

Appendix A - fix awkward wording

2a4d81b

Use markdown in Appendix B.

907b9f0

Inexplicably, this chapter was written mostly in HTML in a `.md` file, which confused some parts of our tooling. I've recast it all as markdown. I've also written a short note to say you can use most of these disposable helpers in non-Rx code too.

Replace a tab with spaces

7e30713

Appendix D: fix some typos

bde5476

Ch12 - fix markdown snafu

9128d94

Ch09 add some missing diagram descriptions.

65223bf

HowardvanRooijen reviewed Dec 14, 2023

content/B_Disposables.md (outdated, resolved)

idg10 added 12 commits December 14, 2023 09:17

Ch12 remove backticks in links because the toolchain is apparently confused

eaca880

Ch12 trying Felix's suggestion for links to see if it renders OK on GitHub

471da5f

Ch12 put links back how they were since neither GitHub nor VS Code liked Felix's suggestion

9c58a3c

AppB: Remove spurious use of HTML escaping

4f4ec2c

Ch09 add remaining diagram descriptions and fix more typos

d30a624

Ch11 updates after review feedback

8a63658

Ch13 updates after feedback

7815053

Ch15 updates after feedback

aba1724

Ch16 fix typo

ca9a0ac

Ch02 Split example across multiple lines to help ebook rendering

62dab17

Try to clarify use of @this again

0fc7b20

Ch15 remove word degenerate because apparently it's not as widely understood as I thought

37f206a

idg10 commented Jun 8, 2023

HowardvanRooijen Jun 20, 2023

idg10 Dec 5, 2023

HowardvanRooijen Dec 6, 2023

idg10 Dec 7, 2023

HowardvanRooijen Dec 9, 2023

HowardvanRooijen left a comment

HowardvanRooijen left a comment
