Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial draft of scenario interface. #70

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
116 changes: 116 additions & 0 deletions proto/scenario.proto
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
syntax = "proto3";
package proto;

message Scenario {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we document each field? (there are no comments)

I'd suggest erring on the side of each field having a comment, even if some seem more clear. It will help follow what is what (e.g. 'attack', 'sustain', 'decay')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely.

Load load = 1;
Cluster cluster = 2;
Workload workload = 3;
repeated Plugin plugin = 4;
}

message Load {
oneof arrival_rate_oneof {
LoadConstant constant = 1;
LoadPeriodic periodic = 3;
LoadCustom custom = 3;
}
oneof request_weight_oneof {
RequestConstant constant = 4;
}
}

message LoadConstant {
int32 arrivals_per_second = 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between 'arrival' and 'load'? Would it make sense to stick to one term, e.g. load_rate / load_per_second ?

Alternatively, I'd go with query_per_second nomenclature, as QPS is a better known term (I think)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So an "arrival" is a "request". But I wanted to avoid http oriented language to allow for pub-sub use cases. "load" is the pattern of arrival / request.

How about sticking to the terms "load" and "operation"?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen QPS being used in pubsub context, so I wouldn't worry about that. That said, I don't feel strongly, if you prefer 'arrivals', that sounds fine to me.

}

message LoadPeriodic {
oneof shape_oneof {
ShapeSine sine = 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't be sure how to interpret semantics of Sine and Asdr, e.g. how exactly should I interpret width_seconds / amplitude for Sin, will they repeat, for how long etc. I hope comments will clarify these things.

P.S. Might be better to define Sine fields directly for clarity, rather than trying to reuse LoadInterval structure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it seems awkward. I'll change sine to a frequency / amplitude struct. Or I could drop it entirely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sine is still a useful approximation for load patterns with seasonality (which is basically all of them). It has the nice property that you can more easily look for resonant frequencies at which an autoscaler misbehaves.

ShapeAsdr asdr = 2;
}
}

message ShapeSine {
LoadInterval interval = 1;
}

message ShapeAsdr {
jchesterpivotal marked this conversation as resolved.
Show resolved Hide resolved
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ASDR case so common and helpful? Couldn't I express the same by putting a bunch of LoadInterval into LoadCustom? (seems simpler and more generic to me, and I wouldn't be constrained to ASDR model, and I couldn't abuse ASDR model by defining, for example, decay higher than attack)

P.S. Is it ASDR or ADSR? Wikipedia calls it ADSR per Jacques's link: https://en.wikipedia.org/wiki/Envelope_(music).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was shooting for a middle ground between a constant value and fully custom values. ADSR is periodic, so you define the wave shape and it repeats. It's nice for describing something simple like "I have a spike every hour on the hour."

With custom, you have a lay out as many points as the simulation is long. This is the escape hatch for all load patterns we can't express.

ADSR subsumes step and ramp patterns currently in Skenario.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. If it's periodic, I can see how it can be more convenient than custom.

LoadInterval initial = 1;
LoadInterval attack = 2;
LoadInterval sustain = 3;
LoadInterval decay = 4;
LoadInterval release = 5;
}

message LoadCustom {
repeated LoadInterval interval = 1;
}

message LoadInterval {
int32 width_seconds = 1;
int32 amplitude_aps = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APS? Is this a typo for QPS (the google shorthand as I understand it)?

If so, should it be in per-second units? I can see that it's easier to approach from how folks usually express load, but it also means that we'll be internally doing a int32 / int32 to derive the correct value and will need to be careful of overflows into (I presume) float64.

I feel like it would be easier to give a total number of load units (requests, queries). That does put it back on the user to perform a calculation but it means that they don't have to perform any calculations to recover the total number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was calling it "arrivals per second" because I want to remain open to pub-sub use-cases. But "operations per second" would also work. Would that be more intuitive?

(QPS - "queries per second" is probably not the best choice)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think you're right. Arrivals/sec is a better unit. Perhaps rename as amplitude_arrivals_per_second, arrivals_per_sec or similar.

}

message RequestConstant {
int32 cpu_milliseconds = 1;
int32 io_milliseconds = 2;
int32 memory_bytes = 3;
}

message Cluster {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also skip for now, per https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it We can always add later once we know what exactly we need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Touche. I'll drop it like a hot potato.

// infinite capacity
}

message Workload {
int32 readiness_delay_seconds = 1;
int32 termination_delay_seconds = 2;
}

message Plugin {
string type = 1;
int32 version = 2;
PluginPoint point = 3;
Any configuration = 4;
}

enum PluginPoint {
HORIZONTAL_AUTOSCALING = 0;
VERTICAL_AUTOSCALING = 1;
REQUEST_ROUTING = 2;
}

message Result {
repeated ResultMetric metric = 1;
}

message Metric {
MetricType type = 1;
MetricResource resource = 2
repeated MetricPoint point = 3;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by combination of statistics (average, P50, P90) and time series ("repeated MetricPoint"). For example, what would it mean to result 10 MetricPoint with type P99?

Also, how does the system know which statistics are requested?

The way I'd see it is that you should only have a timeseries in the result, maybe with requested resolution. The caller can then do whatever they want with these timeseries, e.g. draw, compute custom percentiles etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MetricPoints vary the time and value. For example P50 latency might be 1 second at T0, 2 seconds at T1, etc... Metric is a typed metric stream covering the duration of the simulation.

}

enum MetricType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest we call this MetricStatisticType or MetricSummaryType.

COUNT = 0;
AVERAGE = 1;
jchesterpivotal marked this conversation as resolved.
Show resolved Hide resolved
P50 = 2;
P90 = 3;
P99 = 4;
P100 = 5;
}

enum MetricResource {
CPU_USAGE_MILLIS = 0;
CPU_CAPACITY_MILLIS = 1;
MEMORY_USAGE_BYTES = 2;
MEMORY_CAPACITY_BYTES = 3;
WORK_ARRIVAL = 3;
WORK_SUCCESS = 4;
WORK_FAILURE = 5;
LATENCY_MILLIS = 6;
POD_COUNT = 7;
}

message MetricPoint {
int32 time_millis = 1;
float64 value = 2;
}