when runSimulation encounters an error, it's not displayed #7406

Open
robnagler opened this issue Dec 22, 2024 · 3 comments
@robnagler (Member)

15:22:50 wsreq#20 send: /run-simulation {forceRun: true, report: 'multiElectronAnimation', models: {…}, simulationType: 'srw', simulationId: 'M4LvkaPE'}
15:22:50 wsreq#20 reply: {state: 'error', error: 'another browser is running the simulation'}

@moellep it would be awesome if you could help figure this one out. This was SRW on NERSC, and the browser was displaying "simulation canceled".
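To illustrate the problem, here is a minimal sketch (in Python, with hypothetical names; this is not Sirepo's actual client code) of mapping a /run-simulation reply to a user-visible message, so that a `state: 'error'` reply surfaces its `error` text instead of a generic "simulation canceled":

```python
def handle_run_reply(reply: dict) -> str:
    """Map a /run-simulation reply to the message the UI should display.

    Hypothetical helper: the reply shape {state, error} matches the log
    above, but the function name and behavior are an assumption.
    """
    state = reply.get("state")
    if state == "error":
        # Prefer the server-provided error text, e.g.
        # "another browser is running the simulation".
        return reply.get("error") or "simulation failed (no details)"
    if state == "canceled":
        return "simulation canceled"
    return state or "unknown"

# The reply from the log above would then display its actual error text:
print(handle_run_reply(
    {"state": "error", "error": "another browser is running the simulation"}
))
```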

@robnagler (Member Author)

From @e-carlin #7404 (review):

  • If I'm running a sim (it doesn't need to be under sbatch) and I kill -9 it from the terminal, the GUI reports it as canceled. It seems like it should be an error.
  • If I kill -9 an agent (again, it doesn't need to be under sbatch), then the GUI continues to report "running: awaiting output", even after a refresh.

@robnagler (Member Author) commented Dec 28, 2024

@e-carlin With PR #7404, this is what I see when I kill the agent on NERSC, with the UI prompting to relogin:

[screenshot: nersc-agent-kill]

After relogging in, this is what is shown:

[screenshot: nersc-relogin]

After the refresh, it picks up the current sbatch state. The UI needs to be able to "reconnect" to the simulation, which it currently can't. It basically has to call runStatus, which it won't do once it gets an error. The UI needs quite a bit of help in error handling.
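A minimal sketch of the "keep calling runStatus" idea (hypothetical names; `run_status` stands in for however the client polls the server, and the retry policy is an assumption, not Sirepo's actual behavior): instead of stopping permanently on the first error reply, tolerate a few transient errors before giving up.

```python
import time

def poll_run_status(run_status, max_retries: int = 3, delay: float = 0.0) -> dict:
    """Poll run_status() until a terminal state is reached.

    Unlike a poller that stops on the first error reply, this one
    retries up to max_retries consecutive errors, so a transient
    failure (e.g. an agent restart) does not orphan the simulation.
    """
    retries = 0
    while True:
        reply = run_status()
        state = reply.get("state")
        if state == "error":
            retries += 1
            if retries > max_retries:
                return reply  # give up and surface the error
            time.sleep(delay)
            continue
        retries = 0  # any non-error reply resets the budget
        if state in ("completed", "canceled"):
            return reply
        time.sleep(delay)
```

For example, a reply sequence of error, running, completed would end with the completed reply rather than stopping at the initial error.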

@robnagler (Member Author)

Just tested the local agent with kill -9, and the UI shows "Simulation Canceled", which is technically correct: nothing is actually known about what happened. The agent simply went away, as far as the supervisor is concerned. parameters.py is still running, which is problematic in the local case, but with Docker it would disappear with the container.

We have larger UI problems (ordinary errors getting suppressed). This isn't parameters.py running out of memory; that's caught and reported. This is the agent stopping abruptly (or the network partitioning, which can only happen in the Docker case, and even then is highly unusual). I don't know that we can chase this case.
