Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GOBBLIN-1868] Introduce FlowCompilationValidationHelper & SharedFlowMetricsSingleton for sharing between Orchestrator & DagManager #3731

Merged
merged 6 commits into from
Aug 7, 2023

Conversation

umustafi
Copy link
Contributor

@umustafi umustafi commented Aug 4, 2023

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):
    Create a Helper class FlowCompilationValidationHelper to contain functionality re-used between the DagManager and Orchestrator when launching executions of a flow spec. In the common case, the Orchestrator receives a flow to orchestrate, performs necessary validations, and forwards the execution responsibility to the DagManager. The DagManager's responsibility is to carry out any flow action requests. However, with launch executions now being stored in the DagActionStateStore, on restart or leadership change the DagManager has to perform validations before executing any launch actions the previous leader was unable to complete.
    We tried to have the DagManager refer to the Orchestrator to avoid duplicating code in a previous commit, but introduced a circular dependency between the DagManager and Orchestrator and was unable to work in runtime. This class is utilized to store the common functionality. It is stateful, requiring all stateful pieces to be passed as input from the caller upon instantiating the helper.
    Note: We expect further refactoring to be done to the DagManager in later stage of multi-active development so we do not attempt major reorganization as abstractions may change.

Also creates a class, SharedFlowMetricsSingleton to store shared metrics between the DagManager and Orchestrator.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    Deploys locally and service starts

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Copy link
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit on PR title, you're not refactoring common utils, but rather introducing them... maybe "introduce XYZ for sharing between Orch and DM" or "extract XYZ from Orchestrator for sharing w/ DM", etc...

* Note: We expect further refactoring to be done to the DagManager in later stage of multi-active development so we do
* not attempt major reorganization as abstractions may change.
*/
public final class ExecutionChecksUtil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FlowCompilationValidationHelper?

also, I completely agree w/ the statelessness (and finality) of this abstraction, but just for comprehensibility I'd suggest, creating a ctor merely to separate "the args that stay the same" from "the changing ones", to put the latter front and center.

e.g. so if not called like this:

this.fcHelper = new FlowCompilationValidationHelper(
    sharedFlowMetricsContainer, specCompiler, quotaManager,
    optEventSubmitter, flowStatusGenerator, log, isFlowConcurrencyEnabled);

... // way later on
Optional<Dag<JobExecutionPlan>> optDag = fcHelper.createExecutionPlanIfValid(flowSpec);

then at least like this:

Optional<Dag<JobExecutionPlan>> optDag = new FlowCompilationValidationHelper(
    sharedFlowMetricsContainer, specCompiler, quotaManager,
    optEventSubmitter, flowStatusGenerator, log, isFlowConcurrencyEnabled)
    .createExecutionPlanIfValid(flowSpec);

p.s. populating the compilation failure message could remain static

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, the parameter list to these functions was getting quite long. I went with the first method above.

@umustafi umustafi changed the title [GOBBLIN-1868] Refactor Common Utils between Orchestrator & DagManager [GOBBLIN-1868] Introduce FlowExecutionUtil & SharedFlowMetricsContainer for sharing between Orchestrator & DagManager Aug 4, 2023
Copy link
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good work--nice pragmatic solution... and very close now too!

good PR title as well... although the class names there have drifted from those in the latest commit

Comment on lines 56 to 61
private SharedFlowMetricsSingleton sharedFlowMetricsSingleton;
private SpecCompiler specCompiler;
private UserQuotaManager quotaManager;
private Optional<EventSubmitter> eventSubmitter;
private FlowStatusGenerator flowStatusGenerator;
private boolean isFlowConcurrencyEnabled;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are all of these really mutable, or could any be final?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing all to be final and using @DaTa constructor instead.

Comment on lines +93 to +96
addFlowExecutionIdIfAbsent(flowMetadata, jobExecutionPlanDagOptional.get());
if (flowCompilationTimer.isPresent()) {
flowCompilationTimer.get().stop(flowMetadata);
}
Copy link
Contributor

@phet phet Aug 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this timing still meaningful, given that validateAndHandleConcurrentExecution was called before even starting it? also, is it OK that it's never stopped in the error case?

Copy link
Contributor Author

@umustafi umustafi Aug 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the start/stop of this to surround the call to validateAndHandleConcurrentExecution where the compilation is done. In the Orchestrator error case we never call stop so the event is never submitted unless the compilation is successful, but instead mark this flowOrchestrationFailedMeter so I added those calls to mark that meter when the optional is empty (or other failure cases) - original PR here.

Comment on lines 80 to 97
if (!jobExecutionPlanDagOptional.isPresent()) {
return Optional.absent();
}

Optional<TimingEvent> flowCompilationTimer =
eventSubmitter.transform(submitter -> new TimingEvent(submitter, TimingEvent.FlowTimings.FLOW_COMPILED));
Map<String, String> flowMetadata = TimingEventUtils.getFlowMetadata(flowSpec);

if (jobExecutionPlanDagOptional.get() == null || jobExecutionPlanDagOptional.get().isEmpty()) {
populateFlowCompilationFailedEventMessage(eventSubmitter, flowSpec, flowMetadata);
return Optional.absent();
}

addFlowExecutionIdIfAbsent(flowMetadata, jobExecutionPlanDagOptional.get());
if (flowCompilationTimer.isPresent()) {
flowCompilationTimer.get().stop(flowMetadata);
}
return jobExecutionPlanDagOptional;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

granted, the timer makes it a bit tricky (I have thoughts on reworking if you want to discuss.), but this begins to look to me like:

import org.apache.commons.lang3.tuple.Pair;
...
return jobDagOptional
    .map(dag -> Pair.of(dag, TimingEventUtils.getFlowMetadata(flowSpec)))
    .filter(p -> {
       Dag dag = p.getLeft()
        if (dag != null && !dag.isEmpty()) {
           return true;
        } else {
           populate(...)
           return false;
       }
    }).map(p -> addFlowExecutionIdIfAbsent(p.getRight(), p.getLeft()));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I find this also a bit unintuitive to read especially since we're working with one dag at a time the if/else blocks with the timer are easier to handle in the current state.

@codecov-commenter
Copy link

codecov-commenter commented Aug 5, 2023

Codecov Report

Merging #3731 (41de9f6) into master (2087426) will increase coverage by 0.00%.
Report is 1 commits behind head on master.
The diff coverage is 50.89%.

@@            Coverage Diff            @@
##             master    #3731   +/-   ##
=========================================
  Coverage     47.08%   47.09%           
- Complexity    10859    10860    +1     
=========================================
  Files          2144     2146    +2     
  Lines         84748    84774   +26     
  Branches       9410     9409    -1     
=========================================
+ Hits          39907    39926   +19     
- Misses        41221    41227    +6     
- Partials       3620     3621    +1     
Files Changed Coverage Δ
.../modules/scheduler/GobblinServiceJobScheduler.java 70.41% <ø> (+0.82%) ⬆️
...modules/utils/FlowCompilationValidationHelper.java 15.38% <15.38%> (ø)
...in/service/modules/orchestration/Orchestrator.java 50.82% <70.58%> (+3.46%) ⬆️
...blin/service/modules/orchestration/DagManager.java 80.00% <75.00%> (+0.12%) ⬆️
...vice/modules/utils/SharedFlowMetricsSingleton.java 90.00% <90.00%> (ø)
...ervice/modules/core/GobblinServiceGuiceModule.java 82.81% <100.00%> (+0.13%) ⬆️

... and 6 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@umustafi umustafi changed the title [GOBBLIN-1868] Introduce FlowExecutionUtil & SharedFlowMetricsContainer for sharing between Orchestrator & DagManager [GOBBLIN-1868] Introduce FlowCompilationValidationHelper & SharedFlowMetricsSingleton for sharing between Orchestrator & DagManager Aug 7, 2023
@Will-Lo Will-Lo merged commit 01036cc into apache:master Aug 7, 2023
6 checks passed
phet pushed a commit to phet/gobblin that referenced this pull request Aug 15, 2023
…MetricsSingleton for sharing between Orchestrator & DagManager (apache#3731)

* refactoring skeleton, builds

* deploy works

* add javadocs

* fix failing test

* respond to comments

* Clean up in response to next review

---------

Co-authored-by: Urmi Mustafi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants