Golang HTTP/2 connections can become blocked for extremely long periods of time

I've recently had cause to try and explain this issue to a few people, so I figured it was probably worth logging a synopsis in my Gitlab for ease of future reference. I had intended to create something more like this, but in the process of writing it up, it's ended up turning into more of a blog post - so here we are.

There is an issue in Golang's HTTP/2 client which can lead to connections becoming blocked for a prolonged period of time. When this happens, subsequent requests fail, because broken connections continue to be selected from the connection pool.

I originally logged this as upstream bug Go#59690, but because it was aimed at an audience familiar with both the code and the underlying fundamentals, the description there makes a few assumptions about the knowledge of the reader.

The intent of this blog post is to provide a broader and higher(ish)-level overview of the issue, as well as details of how to mitigate it until such time as it's permanently fixed in Go's net packages.

Copies of the repro code used in this post can be found in my article-scripts repo.


Background

Golang's HTTP/2 support is implemented in net/http2. However, when Go releases are made, a bundle (h2_bundle.go) is created in net/http, allowing applications to transparently use HTTP/2 (if the upstream server has support).

So, although the issue actually exists in net/http2, most who encounter it will likely do so despite only having imported the net/http package.
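
As a quick aside, it's easy to confirm that a plain net/http client really is speaking HTTP/2, because the negotiated protocol is exposed via resp.Proto. The following is a minimal sketch (my site's URL is just an example upstream):

package main

import (
    "fmt"
    "net/http"
)

func main() {
    // Place a request with the default client - note that there's no
    // explicit http2 import needed
    resp, err := http.Get("https://www.bentasker.co.uk/")
    if err != nil {
        fmt.Println(err)
        return
    }
    defer resp.Body.Close()

    // resp.Proto reports the protocol that was actually used: "HTTP/2.0"
    // if the server negotiated h2, otherwise "HTTP/1.1"
    fmt.Println(resp.Proto)
}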


The Issue

In a nutshell, the issue is that broken connections cannot reliably be closed, leaving them available for attempted reuse and causing request-orientated applications to become blocked.

If we take the following simple codebase (a copy of this code can be found here):

package main

import (
    "context"
    "fmt"
    "net/http"
    "time"
    "strings"
)

func main() {

    // The url to place requests to
    url := "https://www.bentasker.co.uk/"

    // Create an empty context
    ctx := context.Background()

    // The string which will be used as the request body
    bodystring := "foo\nbar\nsed"

    // Instantiate client
    client := http.Client{
        Timeout: 5 * time.Second,
    }

    // Set up an infinite loop to place requests from
    for {
        // Create a fresh reader each iteration: a single reader would be
        // drained by the first request, leaving later requests with an empty body
        myReader := strings.NewReader(bodystring)

        req, err := http.NewRequest("POST", url, myReader)
        if err != nil {
            fmt.Println("boo")
        }
        req.Header.Set("Content-Type", "text/plain; charset=utf-8")

        // Place the request
        resp, err := client.Do(req.WithContext(ctx))

        if err != nil {
            // Attempt to tear down the connection
            fmt.Println("Timed out")
            client.CloseIdleConnections()
        } else {
            fmt.Println(resp.StatusCode)
        }

        time.Sleep(time.Second)
    }
}

The code is supposed to

  • Build a POST request
  • Place the request, applying a 5 second timeout
  • If a timeout occurs:
    • Print Timed out,
    • Tear down the connection (allowing a new one to be established on the next request)
  • else:
    • print the HTTP status code (which will be 403 because my site doesn't allow POST)
  • Sleep for a second and then place the next request

So, assuming the application places 5 requests and only the 4th request fails, we would expect to see the following output:

403
403
403
Timed out
403

However, whether this actually happens depends on why the 5 second timeout was hit.

  1. If the remote server is just being slow to respond, the code will generally work as expected
  2. If the underlying TCP connection has broken, the code will not work as expected and subsequent requests will also fail

Repro

We can easily force scenario 2 by running the code and then using the system's firewall to prevent the upstream server's responses from reaching the application.

Start the code

go run main.go

Use netstat to see which local port is being used

$ netstat -anp | grep main
tcp        0   2185 192.168.3.217:50446     143.244.38.137:443      ESTABLISHED 2389361/./main   

Then break the connection, using iptables to block any packets coming back from the server to the local port

iptables -I INPUT -p tcp --dport 50446 -j DROP

We block the local port, rather than the remote IP because we only want to break the current connection - a new connection will use a new ephemeral port and so will sidestep the firewall rule, allowing the application to recover.

At this point, what's supposed to happen is

  • Our in-flight request fails
  • Go prints Timed out and closes the connection
  • On the next request, a new connection is established
  • The next request will work as it should

What actually happens though is

  • Our in-flight request fails
  • Go prints Timed out
  • The call to CloseIdleConnections() completes but doesn't close the connection
  • On the next request, the broken connection is re-selected
  • Subsequent requests fail (and will continue to do so for the next ~16 minutes)

So, rather than recovering, the code will actually print the following

403
403
403
Timed out
Timed out
Timed out
Timed out

If netstat is run, it'll show that Go is still trying to use that same connection

$ netstat -anp | grep main
tcp        0   2185 192.168.3.217:50446     143.244.38.137:443      ESTABLISHED 2389361/./main   

16 Minutes to Recovery

16 minutes might sound like a fairly arbitrary number, so it's worth explaining what's actually going on under the hood.

We're going to be talking about the kernel's TCP settings, so it's probably helpful to start by introducing a simplified version of the TCP flow into our expected output:

client -> server: SYN
server -> client: SYN,ACK
client -> server: ACK
client -> server: PSH
server -> client: ACK, PSH
403
client -> server: ACK
client -> server: PSH
server -> client: ACK, PSH
403

What this shows is

  1. a TCP 3-way handshake (SYN, SYN-ACK, ACK) as a result of the client establishing a connection to the server
  2. The client sending a packet containing the request to the server (the PSH)
  3. The server acknowledging receipt of the TCP frame (the ACK) and supplying its response (PSH)
  4. Because the connection's already established, there's no need for a further handshake, so the next request can just be sent

In practice there's a lot more back and forth (with SSL handshakes, large payloads etc), but those don't really need to be included in order to understand the issue.

If we add a timeout to the end of this flow, it might look something like this:

client -> server: ACK
client -> server: PSH
server -> client: ACK, PSH
403
client -> server: ACK
client -> server: PSH
client -> server: PSH (retransmission)
client -> server: PSH (retransmission)
Timed out

As before, the client sent its request payload but this time it received no response (what we're looking for is an ACK). Eventually the client system retransmits its packet, waits a while and then does so again.

Retransmissions are triggered by the kernel rather than the application, and shouldn't be confused with an application retrying an HTTP request.

Now that we've got an understanding of the flow, we can come back to the initial question:


Why 16 minutes?

The kernel, obviously, cannot sit and retransmit packets indefinitely: there has to be some sort of a limit. In Linux, that limit is set via the sysctl net.ipv4.tcp_retries2:

$ sysctl net.ipv4.tcp_retries2
net.ipv4.tcp_retries2 = 15

So (assuming that the application doesn't close the connection first), the kernel will retransmit 15 times before considering the connection dead.

The time between each retry increases exponentially, with the minimum and maximum wait time being defined by a couple of settings within the kernel itself:

#define TCP_RTO_MAX ((unsigned)(120*HZ))
#define TCP_RTO_MIN ((unsigned)(HZ/5))

On most systems, these tick counts equate to the following values

  • TCP_RTO_MIN : 200ms
  • TCP_RTO_MAX : 120s

So, 200ms after our PSH goes unacknowledged, a retransmission will be sent. Then, 400ms after that, the next one goes out and so on:

client -> server: PSH
wait 200ms
client -> server: PSH (retransmission)
wait 400ms
client -> server: PSH (retransmission)
wait 800ms
client -> server: PSH (retransmission)

By the time that we reach the 11th retransmission, TCP_RTO_MAX has been hit and we'll be waiting 2 minutes between each subsequent re-transmission.

Eventually net.ipv4.tcp_retries2 is hit, and the kernel abandons the connection. But, with a value of 15, slightly more than a quarter of an hour has now passed since the first attempt was made.
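
To make the arithmetic a little more concrete, here's a rough back-of-the-envelope calculation in Go. It uses the simplified model above (first retransmission after TCP_RTO_MIN, doubling until TCP_RTO_MAX is hit); in reality the initial retransmission timeout is derived from the measured round-trip time, so the exact figure will vary:

package main

import (
    "fmt"
    "time"
)

func main() {
    rtoMin := 200 * time.Millisecond // TCP_RTO_MIN
    rtoMax := 120 * time.Second      // TCP_RTO_MAX
    retries := 15                    // net.ipv4.tcp_retries2

    var total time.Duration
    wait := rtoMin

    // One wait before each retransmission, plus a final wait before the
    // kernel gives up and declares the connection dead
    for i := 0; i <= retries; i++ {
        total += wait

        // Exponential backoff, capped at TCP_RTO_MAX
        wait *= 2
        if wait > rtoMax {
            wait = rtoMax
        }
    }

    // Prints 15m24.6s - slightly more than a quarter of an hour
    fmt.Println(total)
}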

Obviously, the vast majority of connections don't exhibit this kind of long-lived blocking behaviour.

Normally, the application would hit a timeout of its own and tear down the connection. As we will see, the reason that we see this at all is because Go sometimes fails to tear down broken connections, leaving the connection waiting for the kernel to exhaust its retransmission attempts.


HTTP/2 is Multiplexed

Just before we go charging into an explanation of the issue, it's probably worth conveying a little bit of additional theory/terminology.

In HTTP/1.1 the relationship between connection & request was quite simple: there's a single request active on a connection at any one time (HTTP Pipelining technically broke that model, but it was never widely adopted). HTTP keep-alives allow a connection to be reused for subsequent requests, but there can still only be one request in flight per connection at a time.

HTTP/2, however, changed that, because it uses multiplexed connections: after establishing a TCP connection, requests are passed over streams within that connection, allowing multiple requests to be in flight at any one time.

So where HTTP/1.x's model was

TCP Connection 1
  Request 1
TCP Connection 2
  Request 2

With HTTP/2 the model is:

TCP Connection 1
  Stream 1
     Request 1
  Stream 2
     Request 2

You can only have one HTTP request in flight per stream, but because that limit is now per stream rather than per connection, it's possible to have multiple requests in flight per connection (avoiding the application-level head-of-line blocking that HTTP/1.x was known for).

What follows doesn't require an in-depth understanding of HTTP/2's multiplexing; it's enough to understand that a single connection can carry multiple streams and that, as a result, it's important to consider whether any streams are active before closing the underlying connection.
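
As an aside, it's possible to observe this connection sharing from the client side using net/http/httptrace, which reports whether each request was placed onto a newly dialled connection or an existing one. It doesn't demonstrate concurrent streams, but it does show multiple requests riding over a single underlying connection. A rough sketch (again, my site is just an example upstream and the request count is arbitrary):

package main

import (
    "fmt"
    "net/http"
    "net/http/httptrace"
)

func main() {
    // GotConn fires once per request and reports whether the underlying
    // connection was newly dialled or re-used from the pool
    trace := &httptrace.ClientTrace{
        GotConn: func(info httptrace.GotConnInfo) {
            fmt.Printf("connection re-used: %v\n", info.Reused)
        },
    }

    for i := 0; i < 3; i++ {
        req, err := http.NewRequest("GET", "https://www.bentasker.co.uk/", nil)
        if err != nil {
            fmt.Println(err)
            return
        }
        req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            fmt.Println(err)
            return
        }
        resp.Body.Close()
    }

    // The first request should report re-used: false (a new connection is
    // dialled), while the subsequent requests should report re-used: true
}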


Raciness

At the Go level, the underlying issue is that closeIdleConnections() is often not effective in this sort of failure scenario.

As the name suggests, it will only close connections that are considered idle, with the following conditional doing that work

if len(cc.streams) > 0 || cc.streamsReserved > 0 {
        cc.mu.Unlock()
        return
}

If there are any streams in cc.streams (or if there are reserved streams), the connection (cc) is not considered idle and so closeIfIdle() will return without closing. This results in the connection being left in the connection cache, available for attempted re-use by the application's next HTTP request (which, in turn, will be doomed to fail because the connection it's relying on is actually broken).

Unfortunately, when network failures occur the broken stream is often still present in cc.streams, and so the conditional evaluates true and the connection is left in place.

Frustratingly, the broken stream does actually get removed from cc.streams; it's just that this quite often happens a little too late: after the calling application has had the opportunity to call CloseIdleConnections().


The Fix

There isn't yet an agreed fix.

The underlying issue is that, even if it can't be closed, Go needs to refuse to re-use a connection that's failed.

I had originally thought that the answer should be quite simple:

  • Check if this was the only stream in the connection (to avoid non-network failures impacting working streams)
  • If so, set the doNotReuse attribute to prevent it being drawn from the pool again

Which looked a little like this

case <-cs.abort:
    // If this was the only active stream, mark the connection
    // as not for re-use in order to address raciness if the caller
    // tries to call closeIdleConnections() before the stream has been
    // removed
    if len(cc.streams) == 1 {
        cc.doNotReuse = true
    }
    return cs.abortErr
case <-ctx.Done():
    // If this was the only active stream, mark the connection
    // as not for re-use in order to address raciness if the caller
    // tries to call closeIdleConnections() before the stream has been
    // removed
    if len(cc.streams) == 1 {
        cc.doNotReuse = true
    }
    return ctx.Err()

Not very DRY, but effective.

The Golang team re-imagined this into a DRYer fix, which was ultimately included in x/net v0.10.0.

Unfortunately, that change caused a regression for some use-cases and so was reverted. Frustrating, but understandable.


Mitigation

Although (at the time of writing) the issue still exists in Go's net/http2 and, by extension, net/http, it is possible for applications to mitigate it in a number of ways, which we'll look at here.


Dirty Hack

The simplest is effective but fugly - it's possible to force net/http2 to set doNotReuse by placing a second request with a Connection: close header

resp, err := client.Do(req.WithContext(ctx))

if err != nil {
    fmt.Println("Timed out")

    // prompt the handler into marking the connection as not
    // being reusable
    req.Header.Set("Connection", "close")
    client.Do(req.WithContext(ctx))

    // No longer required
    // client.CloseIdleConnections()
}

(The change can be seen in context here).

This is... unpleasant... but works because, when preparing to place a request, net/http2 checks for a Connection: close header and sets doNotReuse if one is present.

It's simple to add, but (aesthetics aside) should be used with care: if the connection isn't actually broken, this might lead to the server receiving a duplicate request (which really isn't great if the requests you're placing are not idempotent).

Luckily, there is another way, which is actually alluded to in the ticket which led to the commit being reverted.


HTTP/2 Healthchecks

net/http2 includes support for healthchecks: periodically sending an HTTP/2 PING to the remote server and tearing down the connection if it fails.

As we'll see, enabling these adds a little bit of complexity to the code, but (for busy applications) doesn't add much routine overhead because pings are only sent if the gap between server responses is too large.

Healthchecks are enabled by setting ReadIdleTimeout on the http2 transport object: the timeout ticker is reset every time a frame is received from the server.

It's worth noting here that this means that the gap that we care about is the time between the server's responses rather than the time between request and response. If you're only placing a request every 30s you want ReadIdleTimeout to be greater than that, otherwise healthchecks will trigger during the gap between requests (which is ostensibly harmless, but a little inefficient and noisy at scale).

Going back to our earlier flow, with healthchecks enabled, we might expect to see the following:

403
<timer starts>
403
<timer resets>
403
<timer resets>
Timed out
<timer hits threshold>
H2 PING
<ping timeout hit>
<connection closed>
403
<timer starts>

Sounds ideal, right? However, there is a small catch.

The attribute ReadIdleTimeout needs to be set on the HTTP2 transport. This is easy if you're building requests using net/http2 directly, but most applications import net/http which doesn't provide access to the underlying http2 transport object.

Luckily the net/http2 package provides a function (ConfigureTransports()) which can be used to gain access to the underlying transport (though it does of course mean adding an additional import).

To do so, we need to change our repro code to do the following

  • Also import golang.org/x/net/http2
  • Define a transport
  • Pass that transport to net/http2
  • Set ReadIdleTimeout on the result to enable the health checks
  • Pass the original transport in when creating the HTTP client

Which (import aside) looks a little like the following

// Create a transport
transport := &http.Transport{}

// Get a copy of the HTTP2 transport
http2Transport, err := http2.ConfigureTransports(transport)
if err == nil {
    // Enable healthchecks by setting the idle timeout
    http2Transport.ReadIdleTimeout = time.Second * 10
    // Change from the default of 15s
    http2Transport.PingTimeout = time.Second * 2
}

// build client this time passing the transport in
client := http.Client{
    Timeout: 5 * time.Second,
    Transport: transport,
}

(The changes can be seen in context here).

With this change, a ping should be sent 10 seconds after the last frame is received from the server. If that ping times out, the connection will be closed and a new one established by the next request.

This can be verified by enabling net/http2's debug output and running the code

export GODEBUG=http2debug=2
go run main.go

This generates a lot of noise (in fact, it's worth teeing into a file), but after breaking the connection you should ultimately see something like the following logged

2023/07/22 12:29:37 http2: Transport sending health check
2023/07/22 12:29:37 http2: Framer 0xc0002342a0: wrote PING len=8 ping="\x11&\xe8:\xd7\xc2-\x1b"
2023/07/22 12:29:39 http2: Transport health check failure: context deadline exceeded
2023/07/22 12:29:39 http2: Framer 0xc0002342a0: wrote RST_STREAM stream=17 len=4 ErrCode=CANCEL
2023/07/22 12:29:39 http2: Transport readFrame error on conn 0xc0001c2180: (*net.OpError) read tcp 192.168.3.20:49766->143.244.38.136:443: use of closed network connection
2023/07/22 12:29:39 RoundTrip failure: http2: client connection lost
Timed out
2023/07/22 12:29:40 http2: Transport failed to get client conn for www.bentasker.co.uk:443: http2: no cached connection was available
2023/07/22 12:29:40 http2: Transport creating client conn 0xc0001c2900 to 143.244.38.136:443

Walking through this

  • The healthcheck fires (http2: Transport sending health check)
  • 2 seconds later the ping times out (http2: Transport health check failure: context deadline exceeded) and the connection is closed (http2: Framer 0xc0002342a0: wrote RST_STREAM stream=17 len=4 ErrCode=CANCEL).
  • The next request can't find an existing connection in the pool (http2: no cached connection was available),
  • A new one is established (http2: Transport creating client conn 0xc0001c2900 to 143.244.38.136:443)

After this, the application recovers.

There is one final catch: not all HTTP/2 servers support H2 Ping, so if you can, you should check your upstreams for support (or, at least, set ReadIdleTimeout to a high enough level that it's unlikely to lead to unnecessary connection cycling). My article scripts repo contains a Python script which can be used to see if H2 pings are responded to.
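
If you'd prefer to stay in Go, a similar check can be sketched out using golang.org/x/net/http2's Framer directly: it negotiates h2 via ALPN, sends the client connection preface, an empty SETTINGS frame and a PING, then waits a few seconds for the ack. The hostname, ping payload and timeout below are purely illustrative:

package main

import (
    "crypto/tls"
    "fmt"
    "time"

    "golang.org/x/net/http2"
)

func main() {
    host := "www.bentasker.co.uk"

    // Establish a TLS connection, offering h2 via ALPN
    conn, err := tls.Dial("tcp", host+":443", &tls.Config{
        ServerName: host,
        NextProtos: []string{"h2"},
    })
    if err != nil {
        fmt.Println(err)
        return
    }
    defer conn.Close()

    if conn.ConnectionState().NegotiatedProtocol != "h2" {
        fmt.Println("server did not negotiate HTTP/2")
        return
    }

    // Send the client connection preface followed by an empty SETTINGS frame
    if _, err := conn.Write([]byte(http2.ClientPreface)); err != nil {
        fmt.Println(err)
        return
    }

    framer := http2.NewFramer(conn, conn)
    if err := framer.WriteSettings(); err != nil {
        fmt.Println(err)
        return
    }

    // Send a PING and wait (up to 5 seconds) for the server to ack it
    if err := framer.WritePing(false, [8]byte{'p', 'i', 'n', 'g', 't', 'e', 's', 't'}); err != nil {
        fmt.Println(err)
        return
    }
    conn.SetReadDeadline(time.Now().Add(5 * time.Second))

    // Read frames until we see the PING ack (or the deadline is hit)
    for {
        frame, err := framer.ReadFrame()
        if err != nil {
            fmt.Println("no PING ack received:", err)
            return
        }
        if p, ok := frame.(*http2.PingFrame); ok && p.IsAck() {
            fmt.Println("server responded to H2 PING")
            return
        }
    }
}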


Conclusion

This isn't quite the short synopsis that I originally intended to write, but hopefully it provides an accessible explanation of the issue, filling in the "they'll know that bit" gaps that I skipped over when raising the original bug report.

Essentially, if you're using Go and net/http or net/http2 to place HTTP requests to an upstream server and you're sometimes finding that those requests break or block for long periods of time, it's quite likely that it's caused by this bug and has been triggered by issues with the underlying network connectivity.

Perhaps it's my CDN background skewing my perspective, but this really was quite an interesting bug to dig into, not least because it causes prolonged breakage at a per-connection granularity, allowing for some weird failure modes.

For example if you have 5 connections to a single upstream host, you might find that 1 becomes persistently blocked, despite the others continuing to function without issue. This would likely present as roughly 20% of requests failing, with the upstream server claiming to have no knowledge of ever having received the requests at all and neither end necessarily showing any sign of excess load.

Although an upstream fix isn't yet available, the issue can quite easily be mitigated - whether through use of a nasty hack, or by enabling HTTP/2 health-checks.