HTTP/2 performance revisited

By Timo Tijhof

Hello, HTTP/2!

In 2016, the Wikimedia Foundation deployed HTTP/2 (or “H2”) support to our CDN. At the time, we used Nginx for TLS termination and two layers of Varnish for caching. We anticipated a possible speed-up as part of the transition, and also identified opportunities to leverage H2 in our architecture.

The HTTP/2 protocol was standardized through the IETF, with Google Chrome shipping support for the experimental SPDY protocol ahead of the standard. Brandon Black (SRE Traffic) led the deployment and had to choose between SPDY and H2. We launched with SPDY in 2015, as H2 support was still lacking in many browsers and Nginx could not serve both at the same time. By May 2016, browser support had picked up and we switched to H2.

Goodbye domain sharding?

We leverage HTTP/2 through domain consolidation. The following improvements were achieved by effectively undoing domain sharding:

  • Faster delivery of static CSS/JS assets. We changed ResourceLoader to no longer use the dedicated cookieless “bits.wikimedia.org” domain, and folded our asset entrypoint into the main wiki cluster through faster requests local to each wiki (T95448, T107430).
  • Faster mobile page loads, specifically mobile-device redirects. The canonical and mobile domains were consolidated behind the scenes, through DNS. This allowed the browser to reuse the same connection across the cross-domain “m-dot” redirect (T124482).
  • Faster Geo service and faster localized fundraising banner rendering. The service was moved from geoiplookup.wikimedia.org to /geoiplookup on each wiki. The service was later removed entirely in favor of an even faster zero-roundtrip solution (0-RTT): an edge-injected cookie within the Wikimedia CDN (T100902, patch). This transfers the information directly alongside the pageview without the delay of a JavaScript payload requesting it after the fact.
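
To illustrate the last point: once the location arrives as a cookie on the pageview itself, the client only has to parse a string it already holds instead of making another request. The following is a minimal sketch of that parsing step, written in Python for brevity. The cookie name `GeoIP` and the colon-separated `country:region:city` layout are assumptions made for this example, not a description of the actual cookie format.

```python
def read_cookie(cookie_header, name):
    """Return the value of cookie `name` from a Cookie header string, or None."""
    for part in cookie_header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key == name:
            return value
    return None


def parse_geo(value):
    """Split a hypothetical 'country:region:city' cookie value into fields.

    Missing trailing fields come back as empty strings.
    """
    fields = value.split(":")
    country, region, city = (fields + ["", "", ""])[:3]
    return {"country": country, "region": region, "city": city}


# Hypothetical header as it might look after the edge injected the cookie.
header = "session=abc123; GeoIP=NL:NH:Amsterdam"
geo = parse_geo(read_cookie(header, "GeoIP"))
```

No network round trip appears anywhere in this path, which is the whole point of the change.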

Could HTTP/2 be slower than HTTP/1?

During the SPDY experiment, Peter Hedenskog noticed early on that H2 has a very real risk of being slower than HTTP/1. We observed this through our synthetic testing infrastructure.

In HTTP/1, all resources are considered equal. When your browser navigates to an article, it creates a dedicated connection and starts downloading HTML from the server. The browser streams, parses, and renders in real time as each chunk arrives. The browser creates additional connections to fetch stylesheets and images when it encounters references to them. For a typical article, MediaWiki’s stylesheets are notably smaller than the body content. This means that, despite naturally being discovered from within (and thus after the start of) the HTML download, the CSS download generally finishes first, while chunks from the HTML continue to trickle in. This is good, because it means we can achieve the First Paint and Visually Complete milestones (above-the-fold) on page views before the HTML has fully downloaded in the background.

Page load over HTTP/1. 

In SPDY (and H2), resources share a single connection, and the browser assigns a bandwidth priority to each resource. This is different from HTTP/1, where each resource has its own connection, and lower-level networks and routers treat these as seemingly unrelated connections and divide the bandwidth equally between them. While the HTML and CSS downloads overlapped, each HTTP/1 connection enjoyed about half the available bandwidth. This was enough for the CSS to slip through without any apparent delay. With SPDY, we observed that Chrome was not getting any CSS response until after the HTML was mostly done.

Page load over SPDY/H2.

The H2 priority feature can solve a similar issue in reverse. If a webpage suffers from large amounts of JavaScript code and below-the-fold images being downloaded during the page load, under HTTP/1 those low-priority resources would compete for bandwidth and starve the critical HTML and CSS downloads. The H2 priority system allows the browser and server to agree on what matters, and to give more bandwidth to the important resources first. In our case, however, a bug in Chrome caused CSS to effectively have a lower priority relative to HTML (chromium #586938).
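
The difference between the two behaviors can be sketched with a toy model. The code below is only an illustration under simplifying assumptions: it ignores latency, TCP slow start, and the delay before the browser discovers the stylesheet, and the sizes and bandwidth are made-up numbers. `fair_share` approximates HTTP/1, where parallel connections split the bandwidth equally, and `strict_priority` approximates the SPDY behavior we observed, where one stream is served to completion before the next.

```python
def fair_share(sizes_kb, bandwidth_kb_per_s):
    """Finish times when all active downloads split the bandwidth equally."""
    remaining = dict(sizes_kb)
    now = 0.0
    finished = {}
    while remaining:
        rate = bandwidth_kb_per_s / len(remaining)
        # The smallest remaining download is the next one to complete.
        name = min(remaining, key=remaining.get)
        step = remaining[name] / rate
        now += step
        for other in remaining:
            remaining[other] -= rate * step
        finished[name] = now
        del remaining[name]
    return finished


def strict_priority(sizes_kb, bandwidth_kb_per_s, order):
    """Finish times when downloads are served one at a time, in `order`."""
    now = 0.0
    finished = {}
    for name in order:
        now += sizes_kb[name] / bandwidth_kb_per_s
        finished[name] = now
    return finished


# Made-up sizes: a large HTML body and a small stylesheet on a 100 KB/s link.
sizes = {"html": 150, "css": 25}
http1 = fair_share(sizes, 100)                      # css: 0.5 s, html: 1.75 s
spdy = strict_priority(sizes, 100, ["html", "css"])  # html: 1.5 s, css: 1.75 s
```

In both cases the last byte arrives at 1.75 seconds, but the stylesheet, which gates First Paint, moves from 0.5 seconds to 1.75 seconds when it is queued behind the HTML.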

A graph of SPDY usage vs time to first paint
First paint regression correlated with SPDY rollout. (Ori Livneh, T96848#2199791)

We confirmed the hypothesis by disabling SPDY support on the Wikimedia CDN for a week (T125979). When we transitioned from SPDY to H2 (T166129, T193221), improvements manifested both in the way web browsers give signals to the server, and the way Nginx handles those signals.

As it stands today, overall page load time is faster on H2, and the CSS often finishes before the HTML. We thus achieve the same great early First Paint and Visually Complete milestones as with HTTP/1. But there remain edge cases where HTTP/2 is not able to re-negotiate priorities quickly enough, so CSS is needlessly held back by HTML chunks that have already filled up the network pipes for that connection (chromium #849106, still unresolved four years later as of this writing).

Lessons learned

These difficulties in controlling bandwidth prioritization taught us that domain consolidation isn’t a cure-all. We decided to keep operating our thumbnail service at upload.wikimedia.org through its own connection for now (T116132).

Browsers may reuse connections for multiple domains if an existing HTTPS connection carries a TLS certificate that includes the other domain in its Subject Alternative Name (SAN) list, even when this connection is to an IP address that only corresponds to one of the two domains in DNS. Under certain conditions, this can lead to a surprising HTTP 404 error (mozilla #1363451, mozilla #1222136, T207340). Emanuele Rocca from the SRE Traffic team mitigated this by implementing HTTP 421 response codes in compliance with the spec. This way, visitors affected by non-compliant browsers and middleware will automatically recover and reconnect accordingly.
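
The two halves of this interaction can be sketched as follows. This is a simplified illustration, not our actual CDN configuration: `cert_covers` mimics the certificate check a browser performs before reusing a connection for another domain, and `status_for` mimics a server answering 421 (Misdirected Request) for a hostname it does not serve. The hostname sets are hypothetical, and the authority parsing ignores IPv6 literals.

```python
from http import HTTPStatus


def cert_covers(host, san_names):
    """Return True if any certificate name matches `host`.

    Supports exact names and wildcards that replace exactly one
    leftmost label (for example '*.example.org').
    """
    host = host.lower()
    label, _, rest = host.partition(".")
    for name in san_names:
        name = name.lower()
        if name == host:
            return True
        if name.startswith("*.") and label and rest == name[2:]:
            return True
    return False


def status_for(authority, served_hosts):
    """Pick a status for a request that arrived on this connection.

    Answers 421 when the requested host is not served here, which tells
    a compliant client to retry on a different connection.
    """
    host = authority.lower().partition(":")[0]  # drop an optional port
    if host in served_hosts:
        return HTTPStatus.OK
    return HTTPStatus.MISDIRECTED_REQUEST


# Hypothetical setup: one certificate for both domains, but this
# endpoint only serves the text domain.
sans = ["*.wikipedia.org", "upload.wikimedia.org"]
served_here = {"en.wikipedia.org"}

reusable = cert_covers("upload.wikimedia.org", sans)
status = status_for("upload.wikimedia.org", served_here)
```

Here the certificate check passes, so a browser may send the request over the existing connection, and the 421 response is what steers it back to opening a proper connection instead of receiving a 404.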
