-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
repro: ignore --downstream if used together with --pipeline #9692
Conversation
pipelines = get_pipelines(active) | ||
used_pipelines = [get_pipeline(pipelines, stage) for stage in stages] | ||
# create a disjointed union of all the pipelines | ||
active = nx.compose_all(used_pipelines) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am building graph for all cases, except --single-item
. This might be useful if we implement #8647 and parallel repro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was it doing before? I don't quite follow how the code changes relate to
--downstream
.
@dberenbaum, if you have a dag like following:
graph TD
A-->B
B-->C
When you do dvc repro --pipeline B
, it'd run in order [A, B, C]
as expected. But on --downstream
, it would just run C
, the last node.
DVC was searching for the leaf node of each pipeline, which in this case is C
. By default, it finds all upstream nodes of that leaf node and runs them, which is a pipeline.
As a side effect of this logic, on the --downstream
, it'd try to run downstream of C
node.
Which means it only ran leaf nodes.
pipelines = get_pipelines(active)
pipelines = [get_pipeline(pipelines, stage) for stage in stages]
for pline in pipelines:
leaves.extend(node for node in pline if pline.in_degree(node) == 0)
This was the previous code, that 1) finds all the pipelines, 2) filter pipelines that only have those nodes/stages, and 3) tries to find all the leaf node in that pipeline. (in_degree == 0
check means that it has no arc pointing towards it, but our repo.graph
is reversed, so take it as having no arc pointing out.)
Regarding the code changes, we don't need to compute leaf node, then find upstream nodes at all. We can walk over the pipeline directly, so we filter like before:
used_pipelines = [get_pipeline(pipelines, stage) for stage in stages]
The other changes here is that there may be other disconnected pipelines, which are merged together, so that we can work with a single graph.
active = nx.compose_all(used_pipelines)
Then, we just walk like before, but we don't pass downstream
at all.
return get_steps(active)
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #9692 +/- ##
==========================================
+ Coverage 90.54% 90.63% +0.09%
==========================================
Files 480 480
Lines 36385 36725 +340
Branches 5227 5345 +118
==========================================
+ Hits 32944 33286 +342
- Misses 2851 2853 +2
+ Partials 590 586 -4
☔ View full report in Codecov by Sentry. |
What was it doing before? I don't quite follow how the code changes relate to |
c1d959e
to
aec0110
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the docs, it seems --pipeline should reproduce from the start of the pipeline, and ignore --downstream altogether.
There is also a bug here, when both are combined, repro runs the leaf nodes (stages that no one depend on).
Can we add an explicit test for the change?
They are exclusive, and more of an edgecase. I don't think we need to test that. |
@skshetry any docs updates after all these changes to the pipelines execution? |
Not really, I only refactored the graph building logic, and most of the work is pre-requisite for #9705. |
Also fixed 3-4 bugs along the way. But that's it. :) |
yep, it's very usual thing these days, unfortunately (in all products). If it's 3-4 bugs, I would also think about adding tests though, as @daavoo mentioned. Since we had them it means we are not covering the behavior, right? |
Testing for this specific case is not necessary, as it's noop here (we can't test everything). Besides that, I have also added this as a living document for what should happen in the case with --all-pipeline/--pipeline. Lines 176 to 178 in 63de16c
|
On top of #9691.
Looking at the docs, it seems
--pipeline
should reproduce from the start of the pipeline, and ignore--downstream
altogether.There is also a bug here, when both are combined,
repro
runs the leaf nodes (stages that no one depend on).