A few A100 nodes are failing. Recovery is in progress.

Incident Report for Baseten

Resolved

The failover is complete.
Posted Apr 24, 2025 - 12:04 PDT

Identified

The failures are caused by an unplanned outage at a single cloud provider, which is impacting our A100 fleet. Recovery onto other clouds is in progress.
Posted Apr 24, 2025 - 11:31 PDT

This incident affected: Model Inference.