I was recently led to the following excellent, humorous article about the current state of Internet protocols – and the winding road that brought us here:

The world in which IPv6 was a good design

I agree with Avery when he identifies a future necessity: replacing TCP with an encrypted, UDP-based protocol like QUIC that no longer identifies sessions by the 4-tuple (clientIP, clientPort, serverIP, serverPort), but by a random session ID. This would allow clients to change their IP address, e.g. between WiFi connections, while preserving session state. This is not currently possible with TCP, with or without IPv6.
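The idea can be illustrated with a minimal sketch. This is a hypothetical toy, not QUIC's actual wire format: a server keeps a table keyed by a random session ID carried in each datagram, so the client's source address plays no part in session lookup:

```python
import os

class Session:
    def __init__(self, session_id):
        self.session_id = session_id
        self.bytes_received = 0

class Server:
    def __init__(self):
        self.sessions = {}          # random session ID -> Session

    def open_session(self):
        sid = os.urandom(16)        # 128-bit random session ID
        self.sessions[sid] = Session(sid)
        return sid

    def receive(self, src_addr, sid, payload):
        # src_addr is deliberately ignored for lookup: the client may
        # have moved to a new IP (e.g. a different WiFi network), and
        # the session continues regardless.
        session = self.sessions[sid]
        session.bytes_received += len(payload)
        return session

server = Server()
sid = server.open_session()
server.receive(("198.51.100.7", 40000), sid, b"hello")    # original address
s = server.receive(("203.0.113.9", 51234), sid, b"world") # client moved IPs
print(s.bytes_received)   # the session survived the address change
```

With a 4-tuple key, the second `receive` would have missed the table and reset the connection; with a random ID, the address change is invisible to the session.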

The Secure Shell protocol is built on top of TCP. This creates a number of problems for SSH:
  1. Anyone can send a TCP RST in your name (spoofing your IP and port; the sequence number can be brute-forced), which breaks your connection. Routers that unilaterally decide your connection is "taking too long" are in a special position to do so.

  2. If there's a data transmission error (particularly common on WiFi), TCP's 16-bit checksum fails to detect it in about 1 out of 65536 cases. If you're sending gigabytes of data, the session will eventually collapse: the SSH MAC will detect the error that TCP missed, and SSH will disconnect because it has no mechanism to correct it.

  3. TCP implementations in operating systems tend to sport bad congestion control. It's typical to see speeds of 30% or less of what a connection permits, especially if there's any loss at all. A packet loss rate of 0.1% shouldn't affect bandwidth, especially when it's unavoidable, as on WiFi. But it affects speed, badly. Unscrupulous file transfer clients "handle" this by making it trivial for users to run 10 simultaneous transfers, which "works around" the TCP deficiency by performing 10 key exchanges, burdening the server with 10 sessions, and elbowing bandwidth away from other traffic flows.

    At IETF 97 in Seoul (2016), Google presented BBR, a congestion control algorithm that appears much better than what widespread operating systems currently use. But applications can do nothing to migrate: if we build on TCP, migration can only happen in OS kernels, which in many cases are upgraded only every 10 years.

  4. TCP connections aren't mobile. Switch a phone over to a neighboring WiFi, and you lose your SSH connections and their states. Terminal windows close, file transfers need to be resumed, port-forwarded connections are interrupted.
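Problem 2 above can be put in rough numbers. The figures here are illustrative assumptions, not measurements: a 10 GB transfer in 1460-byte segments, 0.01% of segments corrupted in flight, and TCP's 16-bit checksum passing a corrupted segment undetected with probability roughly 1/65536:

```python
# Probability that TCP's 16-bit checksum misses a corruption (rough model)
undetected_prob = 1 / 65536

# Assumed transfer: 10 GB in 1460-byte segments, 0.01% corruption rate
segments = (10 * 10**9) // 1460
corrupted = segments * 0.0001

# Expected number of corrupted segments TCP accepts as valid. Even one
# such segment fails SSH's MAC check and kills the session.
expected_undetected = corrupted * undetected_prob

# Probability that at least one bad segment slips through the transfer
p_at_least_one = 1 - (1 - undetected_prob) ** corrupted

print(f"{segments} segments, ~{corrupted:.0f} corrupted")
print(f"expected undetected errors: {expected_undetected:.4f}")
print(f"P(session killed by undetected error): {p_at_least_one:.4f}")
```

Under these assumptions, roughly one transfer in a hundred dies to an error TCP waved through; on a lossier link, or with many transfers, the failures stop being theoretical.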

All of this could be fixed by defining new SSH transport and connection layers over UDP, perhaps similar to QUIC. A connection layer too, because a major win would be the ability to seamlessly forward UDP traffic over this, which cannot be done well by pretending the underlying flow is a stream. The main things such a protocol could keep from SSH are, essentially, a similar interface and top-down view, and compatibility with the same public keys for client and server authentication.

In 2014, I wrote a draft specification for a protocol like this. This was before I even knew QUIC existed. I even wrote a reference implementation in pure C++! Then... I shelved it. I had several insecurities:
  • I was unsure if the design was sound. To improve on it, I'd need feedback, but I was afraid of getting feedback of the "A je to!" variety. That would give a false sense of assurance, only for fatal problems to come up later. This would need expert reviewers, but I didn't imagine there was interest (since I didn't even know there was QUIC then).

  • Standardizing new SSH transport and connection layers cannot be done properly by one person. It takes multiple years, and involves a huge amount of tedium. Even if others were interested, I wasn't sure I was willing. I subsequently did RFC 8308 and RFC 8332 – comparatively small things, done with the help of the awesome group at Curdle. The process for those was decidedly painful. A new SSH protocol is 5x or 10x the amount of work.

  • It does not seem easy to fit into an existing SSH implementation. For users, it may seem like a drop-in replacement, but at an implementation level, many things change. It would take years to iron out new issues that arise.

  • The problems being solved seem to be the kind that's just small enough to ignore. SSH doesn't recover if TCP fails to fix errors, but on a good underlying network, there might not be errors. Connections are vulnerable to rogue TCP resets, but those don't happen much outside of rude routers. TCP congestion control sucks, but implementations can eventually solve this. TCP doesn't provide IP address mobility, but we usually use SSH from one spot.

Yet, the situation could use improvement. It's not going to improve itself, and if we don't improve it, the little problems will continue indefinitely. There will always be these flaws.

SSH over UDP would be the solution. But is this important enough to spend, like, 5 years of our lives on? Do people want to form an IETF working group?

If we don't, someone may do it independently, as I considered in 2014. If someone does, users might come to prefer it. If that happens, the transition might be messier for existing SSH developers and users than it could have been; or we might have two solutions for a long time, where one could have worked for everything.