Run Status
View the status of your experiments
A run can be in one of the following states:
Running: The run is in progress.
Completed: The run finished successfully and shut down gracefully.
Terminated: The run was terminated by the user, e.g. by pressing Ctrl+C twice.
Failed: The run failed unexpectedly, which can happen for many reasons, e.g. a power loss.
Canceled: The run was stopped by the user before completion.
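As a mental model of the states above, the following minimal sketch encodes them as a Python enum. The class `RunStatus` and its `is_final` and `stopped_by_user` properties are hypothetical names chosen for illustration; this is not mlop's API.

```python
from enum import Enum


class RunStatus(Enum):
    """Hypothetical model of the run states listed above (not mlop's API)."""

    RUNNING = "Running"
    COMPLETED = "Completed"
    TERMINATED = "Terminated"
    FAILED = "Failed"
    CANCELED = "Canceled"

    @property
    def is_final(self) -> bool:
        # Every state except Running means the run has stopped.
        return self is not RunStatus.RUNNING

    @property
    def stopped_by_user(self) -> bool:
        # Terminated (e.g. Ctrl+C pressed twice) and Canceled (stopped before
        # completion) are the two user-initiated outcomes.
        return self in (RunStatus.TERMINATED, RunStatus.CANCELED)


if __name__ == "__main__":
    for status in RunStatus:
        print(f"{status.value:<10} final={status.is_final} by_user={status.stopped_by_user}")
```

Only Running is non-final; of the four final states, Completed ends normally, Failed ends unexpectedly, and Terminated and Canceled are both user-initiated.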
Early Termination
Many of our users work at institutions that use HPC (High Performance Computing) clusters to run their experiments.
If you are running an experiment on a remote machine and notice an anomaly in any of the run metrics, you can terminate the run immediately with the Cancel Run button, when it is available.
This stops the run on a best-effort basis to avoid further compute charges, and the run is marked as Canceled.
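How mlop delivers a cancel request to a remote machine is not described here. The following is only a hypothetical sketch of why stopping a remote job is best effort in general: a job can react to a cancel request only at the points where it checks for one. The function `train` and the `cancel_requested` flag are illustrative names, and a `threading.Event` stands in for whatever signal actually reaches the job.

```python
import threading
import time


def train(cancel_requested: threading.Event, num_steps: int, step_seconds: float = 0.0) -> str:
    """Hypothetical training loop that honors a cancel request between steps.

    Returns the final status string. The request is only noticed at step
    boundaries, so a step that is already running finishes first.
    """
    for _ in range(num_steps):
        if cancel_requested.is_set():
            # Stop before starting another step so no further compute is spent.
            return "Canceled"
        time.sleep(step_seconds)  # stand-in for one unit of real work
    return "Completed"


if __name__ == "__main__":
    cancel = threading.Event()
    # Simulate the user requesting cancellation shortly after the run starts.
    timer = threading.Timer(0.05, cancel.set)
    timer.start()
    print(train(cancel, num_steps=100, step_seconds=0.01))
    timer.cancel()
```

The finer-grained the checks, the sooner a cancel request takes effect; a long, uninterruptible step delays it.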
Run Status Alerts
mlop also periodically checks runs for anomalies to help prevent unnecessary compute charges.
If mlop detects a period of inactivity in a run (for example, due to a power loss), it proactively sends you an email alert, on a best-effort basis, before marking the run as Failed.
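The exact inactivity rules mlop uses are not documented here. The following is a hypothetical sketch of one common approach, heartbeat-based detection, that follows the order described above: alert first, then mark the run as Failed. `InactivityMonitor`, `alert_after`, and `fail_after` are illustrative names, and the threshold values are arbitrary assumptions.

```python
from dataclasses import dataclass


@dataclass
class InactivityMonitor:
    """Hypothetical heartbeat-based inactivity check (not mlop's implementation).

    alert_after and fail_after are assumed thresholds in seconds, chosen only
    for illustration.
    """

    alert_after: float = 300.0
    fail_after: float = 900.0
    alert_sent: bool = False

    def check(self, last_heartbeat: float, now: float) -> str:
        """Return the action to take: 'none', 'send_alert', or 'mark_failed'."""
        idle = now - last_heartbeat
        if idle < self.alert_after:
            # The run is active (or has recovered), so re-arm the alert.
            self.alert_sent = False
            return "none"
        if not self.alert_sent:
            # Always alert before failing, even if the first check is already late.
            self.alert_sent = True
            return "send_alert"
        if idle >= self.fail_after:
            return "mark_failed"
        return "none"
```

A scheduler would call `check` periodically with the time of the run's last heartbeat. Because the alert is always emitted before failure, a run is never marked as Failed without a prior alert attempt.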