> TCP keepalives would be perfect for this... if routers didn't silently drop them.
There are lots of middleboxes that don't pass along empty TCP packets. TCP keepalive is in a similar situation to IPsec: great for an Intranet, or for two public-Internet static peers with a clear layer-3 path between them; but everything falls apart in B2C scenarios.
Plus, to add to this problem: HTTP has gateways (proxies et al.). Doing TCP keepalive on the server end only tells you whether the last gateway in the chain before the server is still connected to the server, not whether the client is still connected to the server.
Unless you can get every gateway in the chain to "propagate" keepalive (i.e. to push keepalive down to its client connection, iff the server pushes keepalive down onto it), silent undetected TCP disconnections will still happen—and even worse, you'll have false confidence that they aren't happening, as all your sockets will look like they're actively alive.
For what I'm doing, the client end isn't likely to have any gateways, so TCP keepalives "would be" workable for my use-case if not for the middlebox thing. But in full generality, TCP keepalives aren't workable, because there are always those corporate L7 caching proxies + outbound WAFs messing things up, even when L4 middleboxes aren't.
Keep your TCP keepalives for running connection-oriented stream protocols within your VPC. For HTTP on the open web, they're pretty unsuited. You need L7 keepalives. (If you've ever wondered, this is why websockets have their own L7 keepalives, a.k.a. "ping and pong" frames.)
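For the in-VPC case where TCP keepalives do make sense, here's a rough sketch of turning them on per socket in Python rather than system-wide. `enable_keepalive` and its default timings are my own placeholder choices, not anything from a real codebase; the `TCP_KEEP*` options are platform-specific (they exist on Linux), so the sketch guards each one.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive on one socket (hypothetical helper, illustrative defaults)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket counterparts of net.ipv4.tcp_keepalive_time / _intvl / _probes.
    # Not every platform exposes these constants, so skip any that are missing.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before the connection is dropped
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

Note this only changes what your end sends. It does nothing about middleboxes or gateways between you and the real client.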
> and trying to read
An HTTP client can legally half-close its connection (i.e. close the output end) when it's done sending its last request; and this will result in a read(2) on the server's socket returning EOF. But this doesn't mean that the client's input end is closed! You have to do a write(2) to the server's socket to detect that.
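To make the half-close point concrete, here's a small Python sketch. It uses `socket.socketpair()` as a stand-in for a real client/server TCP connection (an assumption for the sake of a self-contained demo), and the request and response bytes are placeholders.

```python
import socket

# socketpair() stands in for an actual client <-> server connection.
client, server = socket.socketpair()

client.sendall(b"GET / HTTP/1.1\r\nHost: example.test\r\n\r\n")
client.shutdown(socket.SHUT_WR)   # half-close: the client is done sending

request = server.recv(4096)       # the request bytes
eof = server.recv(4096)           # b"" means EOF on the server's read side

# EOF on read says nothing about the other direction: the server can
# still write, and the client can still receive the response.
server.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
reply = client.recv(4096)

client.close()
server.close()
```

So a read returning EOF tells you the client stopped talking, not that it stopped listening.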
And, since empty TCP packets aren't guaranteed to make the trip, that means you need to write a nonzero number of bytes of ...something, without that actually messing up the state machine of your L7 protocol.
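WebSocket ping frames are one example of such "something": bytes the L7 protocol itself defines as a liveness check, so they don't disturb its state machine. A minimal sketch of building one in Python, following the RFC 6455 frame layout (`ws_ping_frame` is a hypothetical helper name; this covers only the unmasked server-to-client case):

```python
def ws_ping_frame(payload: bytes = b"") -> bytes:
    """Build an unmasked (server-to-client) WebSocket ping frame."""
    if len(payload) > 125:
        # RFC 6455 limits control-frame payloads to 125 bytes.
        raise ValueError("ping payload must be 125 bytes or fewer")
    FIN = 0x80          # final fragment; control frames can't be fragmented
    OPCODE_PING = 0x9   # a peer answers with a pong, opcode 0xA
    # Byte 0: FIN bit + opcode. Byte 1: mask bit clear + 7-bit payload length.
    return bytes([FIN | OPCODE_PING, len(payload)]) + payload
```

Even with an empty payload the frame is two bytes on the wire, so it's a real write rather than an empty TCP segment.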
Wouldn't setting appropriate net.ipv4.tcp_keepalive_* and trying to read work?