UPDATE: Cloud AI Platform and Vertex AI Training elevated error rates for GPU jobs in us-central1, us-east1, and europe-west3

Incident began at 2023-03-03 21:56 (all times are US/Pacific).

Summary: Cloud AI Platform and Vertex AI Training are experiencing elevated error rates for GPU jobs in us-central1, us-east1, and europe-west3.

Description: Our engineering team is currently working on mitigation.

At this time, we believe the issue has been resolved for the us-central1 region and are working to confirm.

We do not have an ETA for mitigation in us-east1 and europe-west3 at this point.

We will provide more information by Friday, 2023-03-03 23:30 US/Pacific.

Diagnosis: Cloud AI Platform and Vertex AI Training GPU jobs may experience elevated failure rates in us-central1, us-east1, and europe-west3.

Workaround: None at this time.


Affected products: Vertex AI Training, Cloud Machine Learning

Affected locations: Frankfurt (europe-west3), Iowa (us-central1), South Carolina (us-east1)
