Training cluster is down due to a provider failure

Incident Report for Baseten

Resolved

The issue has been resolved.
Posted May 09, 2026 - 20:24 PDT

Monitoring

We are monitoring the fix. Most nodes are back online, but not all.
Posted May 09, 2026 - 20:18 PDT

Identified

The root cause has been identified and a fix is underway. Inference is not affected.
Posted May 09, 2026 - 17:30 PDT
This incident affected: Training.