Raspberry Pi WiFi losing connectivity? Check your AP.

A couple of weeks ago I plugged an old TP-LINK WiFi adapter into my Raspberry Pi 2 and found that it would fall off my WiFi network after a couple of hours. Connecting over a serial port showed that the connection was still associated but no IP traffic could flow; pings to the default gateway failed.

One of the things I learned when working on Google Fiber WiFi is that this is a sign of a bug. wpa_supplicant doesn’t actually drop your connection like this when things are working correctly.

So I started investigating.

Raspberry Pi 2 recipes on the Internet tend to contain workarounds for flaky WiFi. I found several when I was setting up my Pi.

One thing they mention is regularly pinging a remote host to defeat power save mode. This didn’t prevent the bug for me. It also didn’t seem like a productive avenue to investigate further, since iw dev wlan0 get power_save showed that power save was already off.

But manually bringing the client interface down and then up over serial would indeed restore my WiFi connection.

$ sudo /sbin/ifdown --force wlan0 ; sudo /sbin/ifup wlan0

This is an extreme step, and I wasn’t happy with the idea of using this as a workaround. (It’s only one step short of rebooting yourself.) I looked at the system logs and saw only routine log messages from wpa_supplicant.

The last log messages before they stopped referenced a group key update operation, which seemed familiar, though I didn’t recognize its significance at the time.
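To line up the last group key update with the moment traffic stopped, it helps to pull just those entries out of the logs. A minimal sketch, assuming wpa_supplicant writes a line containing "WPA: Group rekeying completed" after each group key update (the exact wording may vary between versions) to a syslog-style file such as /var/log/syslog; the function name recent_group_rekeys is hypothetical.

```shell
#!/bin/sh
# Sketch: print the most recent group rekey events logged by wpa_supplicant.
# Assumption: the log line contains "WPA: Group rekeying completed"; the
# wording may differ between wpa_supplicant versions.
# Usage: recent_group_rekeys [LOGFILE|-] [COUNT]
recent_group_rekeys() {
    log="${1:-/var/log/syslog}"   # "-" makes grep read standard input
    count="${2:-5}"
    grep -F -- 'WPA: Group rekeying completed' "$log" | tail -n "$count"
}
```

For example, recent_group_rekeys /var/log/syslog 5, or journalctl -b | recent_group_rekeys - 5 on a journald-based system. If the last entry sits just before the point where traffic stopped, that points at the rekey rather than at power save.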

Then I started looking for a simpler command that would recover connectivity, and found that:

$ wpa_cli reassociate

worked. Google searches turned up a thread on the hostap mailing list, ‘lost connectivity until “wpa_cli reassociate” is issued’, that matched the symptoms I was seeing and showed the same “everything is fine” status I’d seen from wpa_cli status:

$ wpa_cli status
Selected interface 'wlan0'
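The "everything is fine" status is what makes this failure easy to miss: wpa_cli reports the connection as established while the gateway is unreachable. A minimal sketch of a check that separates the two cases, assuming the wpa_state=COMPLETED line that wpa_cli status prints for an established connection, and that iproute2, awk and ping are installed; check_wifi and its helpers are hypothetical names.

```shell
#!/bin/sh
# Sketch: tell "associated but no IP traffic" apart from "not associated".
# Assumes wpa_cli, ip (iproute2), awk and ping are installed.

# Read `wpa_cli status` output on stdin; print the wpa_state value.
wpa_state() {
    sed -n 's/^wpa_state=//p'
}

# Print the default gateway for interface $1, if there is one.
default_gw() {
    ip route show default dev "$1" 2>/dev/null | awk '$1 == "default" { print $3; exit }'
}

# Return 0 if the gateway answers pings, 1 if associated but the gateway
# is unreachable (the symptom described above), 2 if not associated.
check_wifi() {
    iface="$1"
    state=$(wpa_cli -i "$iface" status 2>/dev/null | wpa_state)
    if [ "$state" != "COMPLETED" ]; then
        return 2
    fi
    gw=$(default_gw "$iface")
    if [ -n "$gw" ] && ping -c 3 -W 2 "$gw" >/dev/null 2>&1; then
        return 0
    fi
    return 1
}
```

A cron job could call check_wifi wlan0 and run wpa_cli -i wlan0 reassociate only when it returns 1. Like the ifdown/ifup step, that treats the symptom rather than the cause.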

A few messages later in that thread, Jouni Malinen mentions:

There have been bugs in drivers where group rekeying mess up something in the key configuration either for the group key itself or maybe even for the pairwise keys. In other words, this does not really sound anything new, but obviously the exact reason in the driver is likely to be specific to the driver.

which sounds familiar from my work on Google Fiber. Group key rotation is complicated and tends to tickle state machine bugs in client WiFi drivers. I found the lines in our hostapd config generator, wifi/configs.py, that disable group key rotation entirely:

# Disable all rekeying.  This may slightly reduce security but might be
# useful if there are rekeying bugs.

I added them to hostapd.conf on my AP, restarted hostapd, reassociated, and walked away.
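A minimal sketch of the same change, assuming the stock hostapd.conf options wpa_group_rekey, wpa_gmk_rekey and wpa_ptk_rekey, where a value of 0 turns the corresponding rekey timer off. These are not necessarily the exact lines generated by wifi/configs.py, and the function name disable_rekeying is hypothetical. It drops any existing values for those options and appends the new ones under the same comment.

```shell
#!/bin/sh
# Hypothetical helper: disable_rekeying CONF
# Assumption: hostapd treats 0 for these options as "no periodic rekeying".
disable_rekeying() {
    conf="$1"
    if [ ! -f "$conf" ]; then
        echo "disable_rekeying: $conf: no such file" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # Keep every line except existing rekey settings; grep exits 1 when
    # nothing is left, which is not an error here.
    grep -v -E '^[[:space:]]*wpa_(group|gmk|ptk)_rekey=' "$conf" >"$tmp" || :
    cat >>"$tmp" <<'EOF'
# Disable all rekeying.  This may slightly reduce security but might be
# useful if there are rekeying bugs.
wpa_group_rekey=0
wpa_gmk_rekey=0
wpa_ptk_rekey=0
EOF
    # Copy back instead of mv so the file keeps its owner and permissions.
    status=0
    cat "$tmp" >"$conf" || status=1
    rm -f "$tmp"
    return "$status"
}
```

For example, disable_rekeying /etc/hostapd/hostapd.conf (a common location, but your AP may keep its config elsewhere), then restart hostapd so the new settings take effect and reassociate the client.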

Two days later my Raspberry Pi still had a solid WiFi connection.

I’m pretty sure this is a previously undiagnosed bug in the ath9k_htc driver used by this WiFi adapter. Judging from the blog posts I found while investigating, there is probably a similar bug in the rt8192cu driver used by Edimax WiFi adapters.

I haven’t been able to root cause it further. USB WiFi adapters always have a general-purpose computer on board, so there’s a lot that can go wrong between the driver on the host and the firmware on the adapter.
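To see which driver your own adapter is bound to, sysfs exposes the driver behind each network interface. A minimal sketch, assuming a Linux host with sysfs mounted at /sys; wifi_driver is a hypothetical name. For a USB adapter it reports the driver of the USB interface, such as ath9k_htc.

```shell
#!/bin/sh
# Sketch: print the kernel driver bound to network interface $1 via sysfs.
# Returns 1 if the interface does not exist or has no backing device
# (virtual interfaces such as lo have none).
wifi_driver() {
    link="/sys/class/net/$1/device/driver"
    [ -e "$link" ] || return 1
    # The symlink points into /sys/bus/<bus>/drivers/<name>; keep <name>.
    basename "$(readlink -f "$link")"
}
```

For example, wifi_driver wlan0.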

I’m not working on WiFi any more, but I can’t escape working on WiFi.

Thanks to Avery Pennarun for comments and corrections.