RESOLVED: Issues with GKE 1.20 (lower than 1.20.9-gke.2100) node pools using Docker as runtime.

Incident began at 2021-07-27 00:00 and ended at 2021-09-23 18:53 (all times are US/Pacific).

We apologize for the inconvenience this service disruption may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Support by opening a case using https://cloud.google.com/support

(All Times US/Pacific)

Incident Start: 27 July 2021, when GKE clusters began being upgraded from 1.19 to 1.20 in the REGULAR release channel.
Incident End: 23 September 2021, when the incident was mitigated by pausing automatic upgrades.

Duration: 59 days

Affected Services and Features:

Google Kubernetes Engine (GKE) – Pods on nodes running affected versions restart when Docker restarts. Clusters on the REGULAR release channel were automatically upgraded to versions affected by this issue.

Regions/Zones: Global

Description:

Containers in GKE cluster node pools that use Docker as the runtime are restarted whenever Docker restarts. This issue affects the following node versions:

  • All 1.20 versions below 1.20.9-gke.2100
  • All 1.21 versions below 1.21.3-gke.1600

The engineering team has halted the rollout from 1.19 to 1.20 in release channels to prevent any new impact to our customers.

Customer Impact:

Pods in GKE clusters restart when Docker restarts. This issue affects the following node versions:

  • All 1.20 versions below 1.20.9-gke.2100
  • All 1.21 versions below 1.21.3-gke.1600

Additional details:

Customer action required: To fix this issue, either switch the affected node pools to containerd or upgrade the nodes to one of the following versions:

  • 1.20: 1.20.9-gke.2100 or higher
  • 1.21: 1.21.3-gke.1600 or higher
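As a rough aid for checking node pools against the versions above, the comparison can be sketched as follows. This is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes that version strings follow the MAJOR.MINOR.PATCH-gke.BUILD pattern used in this report and that, within a patch release, a higher gke build number is a later build.

```python
import re

# Minimum fixed versions per minor release, taken from the list above.
FIXED_VERSIONS = {
    (1, 20): (9, 2100),  # 1.20.9-gke.2100
    (1, 21): (3, 1600),  # 1.21.3-gke.1600
}

# Assumed version format: MAJOR.MINOR.PATCH-gke.BUILD
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version):
    """Split 'MAJOR.MINOR.PATCH-gke.BUILD' into a tuple of four ints."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized GKE version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version, uses_docker=True):
    """Return True if a node on this version and runtime is affected."""
    if not uses_docker:
        # The report lists containerd as a fix, so such pools are skipped.
        return False
    major, minor, patch, build = parse_gke_version(version)
    fixed = FIXED_VERSIONS.get((major, minor))
    if fixed is None:
        # Only 1.20 and 1.21 are listed as affected in this report.
        return False
    # Tuple comparison: patch number first, then gke build number.
    return (patch, build) < fixed


if __name__ == "__main__":
    # "1.20.8-gke.900" is an illustrative version string, not from the report.
    for v in ("1.20.8-gke.900", "1.20.9-gke.2100", "1.20.10-gke.301"):
        print(v, "affected" if is_affected(v) else "not affected")
```

Comparing (patch, build) as a tuple avoids the pitfall of comparing version strings lexically, where "1.20.10" would sort before "1.20.9".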

The recommended actions for clusters, by release channel, are:

  • STATIC – Upgrade to 1.20.10-gke.301 or higher
  • RAPID – N/A – All available versions have the fix
  • REGULAR – Upgrade to 1.20.10-gke.301
  • STABLE – Downgrade affected nodepools to a 1.19 version

Our engineering team is currently releasing a fixed 1.20 version for the STABLE release channel. This release is scheduled to be available by 8 October 2021.


Affected products: Google Kubernetes Engine
