Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail a run due to step command failure #22

Open
samrose opened this issue Dec 16, 2021 · 4 comments
Open

Fail a run due to step command failure #22

samrose opened this issue Dec 16, 2021 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@samrose
Copy link
Owner

samrose commented Dec 16, 2021

A feature that we still need to account for is to respond to the stdout, stderr status returned by Rambo, so that if one pipeline depends on another, but the first has failed with status 1, then the run of pipelines should stop

example

Rambo.run("mix", "compile",log: &IO.inspect/1)
{:ok, %Rambo{err: "", out: "", status: 0}}
iex(5)> Rambo.run("mix", "compil0",log: &IO.inspect/1)
{:stderr, "** (Mix) The task \"compil0\" could not be found. Did you mean \"co"}
{:stderr, "mpile\"?\n"}
{:error,
 %Rambo{
   err: "** (Mix) The task \"compil0\" could not be found. Did you mean \"compile\"?\n",
   out: "",
   status: 1
 }}

Right now in a test ei.json if the 2nd commend is put into the ei.json as a step in a pipeline, and the command fails with status 1, then ex_integrate still proceeds with the rest of the run.

We should just react to all the possible OS or Command errors or feedback from Rambo as one handling (the running of steps) or we could set a different error if ex_integrate errors itself. But any error for the Rambo-originated errors just stops the build.

@garrettmichaelgeorge
Copy link
Collaborator

garrettmichaelgeorge commented Dec 21, 2021

@samrose yes agreed. Currently, a full run using an ei.json will not halt on Rambo failure – we need to implement this feature.

FWIW what is possible today is:

  1. StepRunner catches Rambo errors and returns an {:error, failed_step} tuple.

case Rambo.run(command_path, step.args, log: log) do
{:ok, results} -> {:ok, save_results(step, results)}
{:error, results} -> {:error, save_results(step, results)}
end

  1. PipelineRunner matches on the error response from StepRunner and stops the running of steps immediately upon step failure.

@impl GenServer
def handle_info({_ref, {:error, step}}, {state, config}) do
Logger.info("Step errored. #{inspect(step)}")
{:stop, :step_failure, state}
end
@impl GenServer
def handle_info({:DOWN, ref, _, _, reason}, {state, config}) do
Logger.info("Step task #{inspect(ref)} terminated unexpectedly. Reason: #{inspect(reason)}")
{:stop, :step_failure, state}
end

So I believe what is left is the OTP implementation to manage the full run, prevent a failed pipelines's dependent pipelines from running, and log the error or else respond to it at the highest level, e.g. for end user feedback.

@samrose
Copy link
Owner Author

samrose commented Dec 21, 2021

@garrettmichaelgeorge good assessment thank you where you wrote

else respond to it at the highest level, e.g. for end user feedback.

I recommend we put that last piece only on the back burner, and return to it, because the rest of it is a complete basic system from the OTP step runner perspective I believe

@garrettmichaelgeorge
Copy link
Collaborator

@samrose yes that makes sense – the end user part probably belongs in the interface.

So I guess we have to define the boundary – what, exactly, should ExIntegrate do at the highest level if a run is unsuccessful?

I can think of two things: signal, somehow, that the run was unsuccessful, and provide a way to obtain the results of the run, including the full logs of each command output. A hypothetical interface, e.g. ExIntegrateWeb, could then detect the result of the run and notify GitHub or some other service, or else display the results on its own web interface.

As for signaling the completion of the run, I imagine something like throwing a nonzero exit status (communicating outside of the BEAM), or responding with an error tuple (within the BEAM). Or would other applications (e.g. ExIntegrateWeb) subscribe to ExIntegrate and receive notifications upon completion?

As for sharing the run results, I suppose this connects to our discussion about persistence. I recall we have considered at different times both a graph DB (i.e. rocksdb) that could store the Elixir data structure, as well as object storage for the logs. Would we use both? Or one or the other? How would an interface application access the results?

@garrettmichaelgeorge
Copy link
Collaborator

As for sharing the run results, I suppose this connects to our discussion about persistence. I recall we have considered at different times both a graph DB (i.e. rocksdb) that could store the Elixir data structure, as well as object storage for the logs. Would we use both? Or one or the other? How would an interface application access the results?

Related to #30.

@garrettmichaelgeorge garrettmichaelgeorge changed the title Stopping run based on the error and stdout/stderr state of a step Fail a run based on the error and stdout/stderr state of a step Jan 29, 2022
@garrettmichaelgeorge garrettmichaelgeorge changed the title Fail a run based on the error and stdout/stderr state of a step Fail a run due to step command failure Jan 29, 2022
@garrettmichaelgeorge garrettmichaelgeorge moved this from Todo to In Progress in https://github.com/ex_integrate Jan 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Development

No branches or pull requests

2 participants