Elevated GPU device failures in us-east

Incident Report for Baseten

Resolved

This incident has been resolved.
Posted Apr 03, 2025 - 17:36 PDT

Monitoring

We have noticed a limited increase in GPU device failures on L4 nodes in one of our US-east clusters. We have identified the issue and have a fix in place, and we are monitoring for any impact to models.
Posted Apr 03, 2025 - 17:25 PDT
This incident affected: Model Inference.