It appears that the 3am UTC loader job "got stuck" and was still running as of this morning. We've manually killed it so normal operations can resume, and the catchup job is currently running.
Our monitoring catches jobs that fail, including those that fail because of a timeout. This job, however, stayed in a running state without failing or timing out. We are continuing to investigate the exact circumstances that led to that.