-
Notifications
You must be signed in to change notification settings - Fork 0
/
TODO
88 lines (71 loc) · 3.26 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
2021-01-31 TODO:
* integration test w/ rabbitmq (docker-compose)
- test producer/consumer both autocreate queue
- handle rabbitmq disconnects/restarts
- need to test this manually, how to automate?
* producer has optional timeout
2021-01-18 TODO:
- how to handle gRPC ?
- unit tests and end2end tests
2021-01-15 TODO:
- More clear task breakdown of dependencies to run prototype in staging
- infra needs/dependencies
- split the workstreams: cluster autoscaling, deployment autoscaling, separate task-lists
- how to test in staging w/ real maestro codebase
- clarify roadmap for gRPC
- lets start w/ shared doc
2021-01-12 TODO:
- R. will bootstrap k8s for staging2, v1.16 or v1.17 because keda2.0 needs that.
we'll sync again on friday on how the cluster is
- x Miguel will work on multi-web deployments w/ different Hostnames (like maestro deployments)
x change namespace for prototype code not collide w/ prod code
x multi deployments: pt-en, en-pt (keda), en-es (knative)
- x Miguel will try again knative, just one more time
- x Knative, single slide of downsides and problems found
/k8s/knative.md (and /k8s/keda.md)
2021-01-10 TODO:
x simplified sidecar working
x [OK] test simplified sidecar latency profile
- multiple web deployments w/ different "Host"
x scale from 1 to 100
[was a ch.Qos() not being set, we need to set prefetch]
- allow for concurrency on receiver side.
- and tune the prefetch accordingly
- test for very fast web endpoints
go full on k8s:
x rabbitmq
x put "web" on k8s, it has 2 endpoints already
x put "sidecar" on k8s, next to "web"
x put "sender" on k8s
x add sender + http support
x put "web" w/ concurrency=1 (using channel and single go routine)
x test vanila sender + web w/ http, get latencies (at 2-3rps)
x put sender w/ callback support and block for callback to track latencies
x test sender + sidecar + web, get latencies (at 2-3rps)
-- CHECKPOINT: baseline numbers, concurrency=1
x DONE, perf is indistinguishable without autoscaling and such heavy requests
- install KEDA and configure on k8s
- configure KEDA for "web", scale to zero!
- add delays to web: (startup 10secs)
- validate basic behaviour.
-- CHECKPOINT: basic behaviour
- time to scale to zero?
as per cooldown parameter, we see some spurious, non-intended scaledowns too.
needs further investigation.
- time to scale to 1?
pooling loop, which can be 5 secs, which is great!
- how long it took to get it working?
very fast, single evening installing and creating yaml definition for rabbitMQ
- for later: what about HPA on CPU ?
there was short attempt at HPA based on queue messages, not successfull, needs investigation
- for later: sidecar for gRPC?
nothing was done
NOTES FROM J.:
- business value clarified
- execution plan for rollout
NOTES from B.:
- will work on horizontal cluster scaling, new EKS staging deployment (this will be simpler and better than doing this on the current cluster)
- better to keep it simple and bet on KEDA, enough things to work.
NOTE ON LATENCY TESTS:
what we want to have is some crude baseline of the performance profile without keda/knative w/ http and w/ sidecar
we want to limit concurrency=1 because that is the ideal case for maestro, so we want to test how these systems behave in that situation