The General Behavior CI (see #1634) needs more thought before it can scale: it compares the whole Merlin output, and the Merlin output of some samples is very large. As lower-hanging fruit, we can first implement a concrete Behavior Regression CI.
Purpose
Some concrete changes in behavior can almost certainly be considered a regression when they happen. Two examples:
Merlin now errors where before it would succeed (error-regression CI).
The Merlin server now crashes where before it wouldn't (crash-regression CI).
So in addition to the General Behavior CI #1634, we can also add concrete Regression CI workflows that monitor these concrete changes. Keeping the output this simple will make the CI easier to scale in terms of size and help avoid missing concrete regressions in behavior.
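To make the error-regression idea concrete, here is a minimal sketch of the comparison. It is not how merl-an implements its error-regression command; the function name, the sample ids and the outcome labels ("return", "error") are assumptions chosen for illustration.

```python
def find_error_regressions(before, after):
    """Return the ids of samples that succeeded before and error now.

    `before` and `after` map a hypothetical sample id to an outcome label.
    """
    return sorted(
        sample
        for sample, outcome in after.items()
        if outcome == "error" and before.get(sample) == "return"
    )


# Hypothetical outcomes on the base branch and on the PR branch.
before = {"s1": "return", "s2": "return", "s3": "error"}
after = {"s1": "return", "s2": "error", "s3": "error"}

print(find_error_regressions(before, after))
```

Only the return-to-error transition is reported, so the result stays small no matter how big the full Merlin output of a sample is.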
Implementation
Let's focus on the error-regression CI for now.
merl-an already has a command merl-an error-regression for the error-regression CI. What's missing is the integration into a CI workflow.
Data to take into account
For the first and most direct PoC, @3Rafal has already gathered data:
Running 7 different queries, each on 1 sample per file on the whole of Irmin (700-800 files):
In CI: 38m 12s
Running 7 different queries, each on 30 samples per file on the whole of Irmin (700-800 files):
Locally on a laptop with an 11th-gen Intel i7: 203m 8.524s
This is the data for the first and most direct PoC. We can optimize the CI in terms of time on different levels (see below).
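As a rough aid for reading these numbers, the sketch below turns each total into an average time per query. It assumes every file contributes exactly the stated number of samples per query and uses both ends of the 700-800 file range; the helper name is made up for this example. The two runs were done on different machines, so the results are not directly comparable.

```python
def seconds_per_query(total_seconds, queries, samples_per_file, files):
    """Average wall-clock time per single query run."""
    return total_seconds / (queries * samples_per_file * files)


ci_total = 38 * 60 + 12          # 38m 12s, 1 sample per file, in CI
local_total = 203 * 60 + 8.524   # 203m 8.524s, 30 samples per file, on the laptop

for files in (700, 800):
    ci = seconds_per_query(ci_total, 7, 1, files)
    local = seconds_per_query(local_total, 7, 30, files)
    print(f"{files} files: CI {ci:.3f} s/query, local {local:.3f} s/query")
```

Under these assumptions the CI run averages roughly 0.4 to 0.5 seconds per query and the local run under 0.1 seconds per query.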
Next action points
@3Rafal has already written a PoC for the error-regression CI. There are several things we'd need to improve to make the CI more useful. The next action points are:
Discuss and decide if these CIs are really useful and worth the effort.
For the error-regression CI: Find out if there have been any changes in the past that made Merlin return an error where before it would return successfully. If so, which ones? And how likely is it that this will happen again in the future?
Similarly for the crash-regression CI: Have there been any changes in the past which made Merlin crash where before it wouldn't? If so, which ones?
If we think it might be worth it, optimize the CI in terms of time:
On the CI-side: e.g. cache the set-up.
On the merl-an-side: Make sure we use the Merlin cache as much as possible (e.g. if the traversal is done in query -> file order, do it in file -> query order instead).
On the sample-side:
For the Merlin cache, it is better to have many samples in few files than few samples in many files.
Do we want to run it on all mli-files as well?
Have a look at how long the optimized CI takes and decide a good sample set and workflow for it.
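The traversal-order point above can be illustrated with a toy model. It assumes, purely for illustration, a cache that only remembers the most recently processed file; Merlin's real cache may behave differently. The file names and the choice of queries are hypothetical.

```python
def count_cache_misses(order):
    """Count how often a toy one-entry cache has to reload a file."""
    cached_file = None
    misses = 0
    for file, _query in order:
        if file != cached_file:
            misses += 1
            cached_file = file
    return misses


# Hypothetical sample set: 3 files and 3 queries.
files = ["a.ml", "b.ml", "c.ml"]
queries = ["type-enclosing", "locate", "complete-prefix"]

# query -> file order: the outer loop is over queries.
query_then_file = [(f, q) for q in queries for f in files]
# file -> query order: the outer loop is over files.
file_then_query = [(f, q) for f in files for q in queries]

print(count_cache_misses(query_then_file))  # 9
print(count_cache_misses(file_then_query))  # 3
```

In this model the query -> file order reloads a file for every single query run, while the file -> query order reloads each file once. The same reasoning is why many samples in few files should be cheaper than few samples in many files.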