How is run status handled? #3176

gpascale · 2024-06-25T15:05:39Z

❓Question

It's extremely unclear to me how run status (active, finished, failed, etc.) is determined, and specifically how a run is judged to be active. In my code, I call report_successful_finish once my model has finished training and testing and I've uploaded the figures I want, but I can't tell whether this actually affects the state. Most of my runs automatically transition to the finished state, but not always. Does this happen automatically when the process exits? When the run object is destroyed?

My dashboard is littered with week-old runs that still show as in progress. In some cases, maybe the processes crashed? I can't tell. I've tried using the CLI to "close" them with little success - usually it reports no errors but the run still shows as in progress.

I've searched extensively through the documentation but I hardly see anything about this.

mihran113 · 2024-07-10T23:54:49Z

Hey @gpascale! Sorry for the delayed response, and thanks for the question. We try to automatically transition the run to the finished state when the process exits (even if exceptions are thrown). But there are cases where the process hangs or is killed, and in those cases we can't do much from inside the process itself.
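To illustrate why exit-time finalization covers exceptions but not killed or hung processes, here is a minimal sketch using Python's standard `atexit` module. `ToyRun` and its `status` field are hypothetical stand-ins invented for this example; this is not Aim's `Run` class or its actual implementation.

```python
import atexit


class ToyRun:
    """Hypothetical stand-in for a tracked run (not Aim's Run class)."""

    def __init__(self):
        self.status = "active"
        # atexit hooks run on normal interpreter exit, including exit caused
        # by an uncaught exception. They do NOT run if the process is killed
        # by SIGKILL, calls os._exit(), or hangs forever.
        atexit.register(self.finish)

    def finish(self):
        # Idempotent: safe to call explicitly and again from the exit hook.
        if self.status == "active":
            self.status = "finished"


run = ToyRun()
try:
    raise RuntimeError("simulated training failure")
except RuntimeError:
    pass  # the run is still "active" until finish() is called
status_before = run.status
run.finish()
status_after = run.status
```

Under this model, a run left in the active state means the hook never fired, which points to a hard kill or a hang rather than an ordinary crash.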

However, as a backup, the aim up command also runs a background task that checks for runs that are still in the active state while no other process holds locks for them (which is what happens when the process is killed). So the only unhandled case should be when the process hangs. If you can provide more details on how exactly these cases happen, I may be able to offer more help or try to reproduce the problem on my end to see what's going wrong.
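The backup check described above can be sketched roughly as follows. This is only an illustration of the idea under stated assumptions, not Aim's code: `holds_no_lock`, `sweep`, the `runs` dictionary layout, and the lock-file paths are all hypothetical, and the sketch uses POSIX-only `fcntl.flock`. The operating system releases such a lock when the owning process dies, so a killed process leaves an active run with no lock holder, while a hung process keeps the lock and is skipped.

```python
import fcntl
import os
import tempfile


def holds_no_lock(lock_path):
    """Return True if no open file currently holds an exclusive lock on lock_path."""
    with open(lock_path, "a") as probe:
        try:
            # Non-blocking attempt: fails immediately if the lock is taken.
            fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # a live (possibly hung) process still holds it
        fcntl.flock(probe, fcntl.LOCK_UN)
        return True


def sweep(runs):
    """Mark active runs as finished when nothing holds their lock.

    `runs` maps a run id to {"status": str, "lock_path": str} (hypothetical layout).
    """
    for run in runs.values():
        if run["status"] == "active" and holds_no_lock(run["lock_path"]):
            run["status"] = "finished"
    return runs


# Usage: one run whose lock is still held, one whose owner is gone.
tmp_dir = tempfile.mkdtemp()
live_path = os.path.join(tmp_dir, "live.lock")
dead_path = os.path.join(tmp_dir, "dead.lock")
holder = open(live_path, "w")
fcntl.flock(holder, fcntl.LOCK_EX)   # simulates a running (or hung) process
open(dead_path, "w").close()         # simulates a killed process: no holder

runs = {
    "live": {"status": "active", "lock_path": live_path},
    "dead": {"status": "active", "lock_path": dead_path},
}
sweep(runs)
```

After the sweep, only the run without a lock holder changes state. This matches the behaviour described in the reply: a hung process is indistinguishable from a healthy one by this check, so its run stays in progress.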

gpascale added the type / question Issue type: question label Jun 25, 2024
