Speed of H3 — What We Learned So Far

HTTP/3 removes one of the most famous limitations of HTTP/2 over TCP: transport-level head-of-line blocking. In a lossy network, one lost TCP segment can delay the delivery of unrelated HTTP/2 streams because all streams share the same ordered byte stream. QUIC, the transport protocol underneath HTTP/3, changes that model. Streams are independent at the transport layer, so packet loss on one stream does not have to block data delivery on another.
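The difference between the two delivery models can be made concrete with a toy simulation. The sketch below is only an illustration under simplifying assumptions: each packet carries one chunk of exactly one stream, retransmissions are ignored, and the function names are hypothetical rather than taken from any real stack.

```python
def deliverable_shared_stream(packets, lost):
    """TCP-like model: one ordered byte stream shared by all HTTP streams.

    Nothing after the first missing packet can be handed to the application.
    """
    delivered = []
    for index, stream_id in enumerate(packets):
        if index in lost:
            break  # every later packet waits for the retransmission
        delivered.append((index, stream_id))
    return delivered


def deliverable_independent_streams(packets, lost):
    """QUIC-like model: a gap stalls only the stream it belongs to."""
    blocked = set()
    delivered = []
    for index, stream_id in enumerate(packets):
        if index in lost:
            blocked.add(stream_id)  # this stream now has a hole
            continue
        if stream_id not in blocked:
            delivered.append((index, stream_id))
    return delivered


# Two streams, A and B, interleaved on the wire; packet 2 (stream A) is lost.
packets = ["A", "B", "A", "B", "A", "B"]
lost = {2}

print(deliverable_shared_stream(packets, lost))
print(deliverable_independent_streams(packets, lost))
```

In this toy model the shared stream delivers two chunks before stalling, while the independent-stream model keeps delivering stream B and hands over four chunks in total.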

That can easily lead to a tempting conclusion: if HTTP/3 removes a TCP limitation, then HTTP/3 should simply be faster.

Our measurements do not support that simplified conclusion.

On the other hand, there is an opposite expectation that is also common among engineers: HTTP/2 should be faster on a well-behaved network path because it runs over battle-hardened kernel TCP. TCP benefits from decades of operating-system tuning, mature congestion-control implementations, and segmentation and receive offloads such as TSO and GRO, along with optimized send paths. HTTP/3, by contrast, runs over QUIC in user space. So the intuitive expectation is that HTTP/3 should pay a performance penalty on a clean connection.

Our measurements do not support that simplified conclusion either.

The robust finding from our current tests is more nuanced:

On a real WAN path, HTTP/2 and HTTP/3 can reach practical throughput parity for single-connection downloads. HTTP/3 is not reliably faster, but it is also not inherently slower — if the client and server implementations interact well.

The important word is if.

Scope of These Measurements

This article is about controlled download throughput, not full page-load performance.

We are not measuring browser cache behavior, JavaScript execution, rendering, resource prioritization across complex pages, CDN routing differences, or mobile radio behavior. We are looking at controlled single-connection downloads over a real WAN path and comparing HTTP/2 and HTTP/3 behavior under similar conditions.

That scope matters. HTTP/3 can still provide important benefits for interactive web traffic, multi-stream behavior, connection migration, and lossy networks. But those benefits do not automatically translate into higher raw throughput on a clean single download.

Why Parity Is Already an Important Result

At first glance, HTTP/3 parity may sound unexciting. It is not.

HTTP/2 uses TCP, and TCP implementations have been optimized for decades inside operating-system kernels. Congestion control, pacing, segmentation, receive aggregation, retransmission behavior, and buffer management are deeply integrated into mature networking stacks.

HTTP/3 uses QUIC over UDP. QUIC moves many transport responsibilities into user space: packet number spaces, loss detection, congestion control, pacing, retransmission decisions, stream flow control, connection migration, and cryptographic packet protection. This gives QUIC more flexibility, but it also means that details formerly hidden inside the kernel are now part of the browser, server, library, CDN, proxy, or gateway implementation.

So the interesting result is not that HTTP/3 is magically faster. The interesting result is that a well-implemented HTTP/3 stack can keep up with HTTP/2 on a real WAN path despite running much more of the transport logic in user space.

Why the “If” Is Bigger for HTTP/3

In TCP, congestion control lives in the kernel. The major algorithms have been tuned for decades and tested at enormous scale. Sender and receiver behavior is not identical across all systems, but the interaction model is mature and familiar.

In QUIC, congestion control and loss detection move into the QUIC implementation. RFC 9002 defines a baseline loss-detection and congestion-control design, based on long-established TCP ideas such as NewReno. But real-world QUIC stacks can combine different congestion controllers, pacing strategies, ACK handling, packetization choices, flow-control strategies, and loss-detection details.

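CUBIC, standardized in RFC 9438, is the default congestion controller in widely deployed TCP stacks such as Linux, and several QUIC implementations use it as well. BBR and other variants also exist. Each of these algorithms is designed to share a path with other flows, but none of them operates in isolation: its behavior depends on the ACK stream, the pacer, and the loss detector around it.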

A QUIC sender observes ACKs and loss signals. A QUIC receiver decides when to send ACKs. A pacer decides when packets actually leave the application. The sender cannot fully see the receiver's ACK strategy, and the receiver cannot fully see the sender's pacing decisions. Their interaction forms a control loop.

When that control loop works well, HTTP/3 can reach parity with HTTP/2. When it works badly, performance can collapse.

We observed three classes of problems that illustrate this point.

1. Handshake Correctness: When QUIC Cannot Even Start

One failure class appears before throughput can even be measured.

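QUIC has specific packet-size requirements during connection establishment. In particular, RFC 9000 requires the client to pad every UDP datagram that carries an Initial packet to at least 1200 bytes, and a server must discard Initial packets that arrive in smaller datagrams. The rule supports QUIC's anti-amplification design. This is not a generic "UDP is unreliable" issue. It is a QUIC handshake correctness requirement.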
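The size rule can be written down as a small check. RFC 9000 sets the minimum UDP payload for datagrams carrying Initial packets at 1200 bytes. The sketch below is only an illustration: the function names are hypothetical, and a real stack applies the rule to every transmission of an Initial packet, including retransmissions.

```python
MIN_INITIAL_DATAGRAM_SIZE = 1200  # bytes of UDP payload, per RFC 9000


def initial_datagram_acceptable(udp_payload_len: int) -> bool:
    """Server-side view: would a datagram of this size be processed?"""
    return udp_payload_len >= MIN_INITIAL_DATAGRAM_SIZE


def padding_needed(udp_payload_len: int) -> int:
    """Client-side view: bytes of padding required to reach the minimum."""
    return max(0, MIN_INITIAL_DATAGRAM_SIZE - udp_payload_len)


# A first transmission padded to the minimum is fine.
print(initial_datagram_acceptable(1200))  # True

# A hypothetical retransmission that lost its padding is not.
print(initial_datagram_acceptable(350))   # False
print(padding_needed(350))                # 850
```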

During our tests, we observed a widely used browser retransmitting an Initial packet after perceived loss, but the retransmitted packet was smaller than required. In some environments, that can prevent the HTTP/3 connection from establishing at all.

This is a QUIC-specific failure mode with no direct TCP equivalent. TCP connection establishment has its own possible failures, but it does not depend on this particular combination of Initial packet padding, packet-size rules, and anti-amplification constraints.

For a user, the symptom may look simple: HTTP/3 does not work, while HTTP/2 does. But the root cause can be a low-level transport correctness issue.

2. Control-Loop Interaction: When ACK Timing Shapes Sender Pacing

The second class is more subtle. The connection establishes successfully, but throughput becomes dramatically lower than expected.

In one configuration we tested, the QUIC implementation combined CUBIC with a pacing strategy that depended on early delivery-rate samples. If the receiver's ACK pattern caused each ACK to cover only a small number of packets, the sender derived a very low delivery-rate estimate. The sender then paced the transfer at that low rate for the remainder of the download, well below what the path could support.

The result was not a small difference. We observed throughput differences of a factor of 30 or more between configurations that differed only in ACK timing, on the same path, with the same server.

This is not an argument that CUBIC is broken. It is an example of how congestion control, pacing, ACK strategy, and delivery-rate estimation can interact in unexpected ways inside QUIC implementations.

The key lesson is that ACK behavior is not just a receiver-side detail. In QUIC, it can strongly influence the sender's model of the path.
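To see how ACK granularity can leak into a rate estimate, consider a deliberately naive estimator. This is not the implementation we tested and not how any particular QUIC stack works. It is a hypothetical sketch that takes the bytes newly acknowledged by a single ACK and divides them by one round-trip time. The packet size, RTT, and ACK batch sizes are assumed values chosen for illustration, not measurements.

```python
PACKET_SIZE = 1200  # bytes per packet (assumed)
RTT_MS = 50         # round-trip time in milliseconds (assumed)


def naive_rate_sample(packets_per_ack: int) -> float:
    """Hypothetical per-ACK sample: newly acknowledged bytes per RTT, in bytes/s.

    A robust estimator would measure delivered bytes over elapsed time
    across many ACKs; this one looks at a single ACK only.
    """
    return packets_per_ack * PACKET_SIZE * 1000 / RTT_MS


sparse = naive_rate_sample(2)    # receiver acknowledges every 2 packets
batched = naive_rate_sample(64)  # receiver acknowledges every 64 packets

print(f"sparse ACKs:  {sparse:.0f} B/s")
print(f"batched ACKs: {batched:.0f} B/s")
print(f"ratio: {batched / sparse:.0f}x")
```

If a pacer locked onto an early sample like the first one, it would hold the transfer far below the path capacity even though nothing about the path itself had changed.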

3. Packetization Mistakes: When MTU Turns into False Loss

The third class concerns packet size and path MTU.

QUIC packets are carried in UDP datagrams. If a sender emits UDP datagrams that are too large for part of the path and the packets are allowed to fragment at the IP layer, the sender may believe that it is successfully sending full-sized QUIC packets while the network is actually turning them into fragile IP fragments.

That is dangerous. Fragment loss, delayed reassembly, or reordering can surface at the QUIC layer as apparent packet loss. From the QUIC sender's perspective, packet-number gaps appear. The loss detector reacts. The congestion window may be reduced — even though the original application data was not lost in the way the sender assumes.

In our measurements, this kind of MTU mistake could also produce throughput reductions by a factor of 30 or more.

The robust implementation lesson is straightforward: HTTP/3 stacks should not rely on IP fragmentation as a fallback. QUIC implementations need sound packetization behavior, proper path MTU discovery, and conservative handling of maximum datagram size. Datagram Packetization Layer Path MTU Discovery (DPLPMTUD, RFC 8899) or an equivalent mechanism is important, and setting packet sizes blindly near the local interface MTU can be a serious mistake on real paths.
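The arithmetic behind this is simple. A minimal sketch, assuming IPv4 headers without options and IPv6 headers without extension headers, shows how much UDP payload fits into a given path MTU. The 1400-byte tunnel MTU is a hypothetical example value, and the function names are illustrative.

```python
IPV4_HEADER = 20  # bytes, without options
IPV6_HEADER = 40  # bytes, fixed header without extension headers
UDP_HEADER = 8    # bytes


def max_udp_payload(path_mtu: int, ipv6: bool = False) -> int:
    """Largest UDP payload that fits in one unfragmented IP packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return path_mtu - ip_header - UDP_HEADER


def exceeds_path_mtu(udp_payload: int, path_mtu: int, ipv6: bool = False) -> bool:
    """True if a datagram of this size cannot cross the path in one piece."""
    return udp_payload > max_udp_payload(path_mtu, ipv6)


# Sized for a local Ethernet interface with a 1500-byte MTU.
local_payload = max_udp_payload(1500)
print(local_payload)                          # 1472

# The same datagram on a path with a hypothetical 1400-byte tunnel MTU.
print(exceeds_path_mtu(local_payload, 1400))  # True

# The QUIC minimum of 1200 bytes still fits on that path.
print(exceeds_path_mtu(1200, 1400))           # False
```

A sender that sizes packets from the local interface alone lands in the second case: every full-sized datagram has to be fragmented or dropped somewhere on the path.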

What This Means for SSE Gateways

For a Security Service Edge (SSE) gateway, HTTP/3 is not just another protocol label.

A gateway may terminate, inspect, proxy, downgrade, block, or tunnel traffic. Any of these steps can influence transport behavior. Even when the gateway does not intentionally modify application data, it may change buffering, timing, packetization, ACK visibility, connection reuse, MTU exposure, or the point where congestion control decisions are made.

That means an SSE product can accidentally turn a healthy HTTP/3 path into a poor one — or hide a poor HTTP/3 path by falling back to HTTP/2.

This is why testing matters. It is not enough to ask whether HTTP/3 is allowed or blocked. It is also important to understand whether HTTP/3 works well through the full path that real users take: browser, endpoint, gateway, network, server, and back.

When Things Work Well

When the sender and receiver implementations interact cleanly, HTTP/2 and HTTP/3 can deliver comparable throughput for single-connection downloads over a low-loss WAN path.

In that case, the famous HTTP/3 advantage — avoiding TCP head-of-line blocking between independent streams — does not necessarily show up as a raw speed advantage. On a clean single download, there is little or no transport-level head-of-line blocking to remove.

At the same time, HTTP/2's kernel TCP foundation does not automatically make it faster. A mature HTTP/3 implementation can compensate for the user-space transport overhead well enough to reach parity.

The result is less dramatic than "HTTP/3 is faster" and more useful than "HTTP/3 is slower":

HTTP/3 is competitive when implemented well, but it is more sensitive to transport implementation details than many people expect.

What We Learned So Far

The lesson is not that HTTP/3 is always faster.

The lesson is not that HTTP/3 is always slower.

The lesson is that HTTP/3 moves more transport behavior into implementation-specific user-space code. That creates flexibility and enables important protocol improvements, but it also increases the importance of details such as loss detection, pacing, ACK timing, flow control, and packetization.

On a clean path, HTTP/3 can match HTTP/2. Under packet loss and multi-stream traffic, HTTP/3 can avoid some limitations that are inherent to TCP-based multiplexing. But when implementation details interact badly, HTTP/3 can fail completely or perform dramatically worse than expected.

That makes HTTP/3 performance a practical test topic, not just a protocol-theory topic.

Try It Yourself

The SSE TestCenter Download page is intended to make these differences visible.

Try the same download over HTTP/2 and HTTP/3 from your own environment. If the results are close, that is a good sign: the browser, network path, gateway, and server are interacting well enough to reach parity.

If you see large and consistent differences — especially if HTTP/3 is much slower or fails while HTTP/2 works — the causes above are good places to start investigating:

  • Does the HTTP/3 connection establish reliably?
  • Does the receiver ACK behavior interact badly with the sender's pacing or congestion controller?
  • Is the path MTU being discovered correctly?
  • Are large UDP datagrams being fragmented somewhere on the path?
  • Is an SSE gateway, proxy, firewall, or endpoint product changing timing, buffering, or packetization behavior?
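When comparing your own runs, it helps to repeat each download several times and compare medians rather than single samples. The following is a minimal sketch of such a comparison. The function name, the labels, and the 0.8 parity band are assumptions made for illustration, not thresholds used by the TestCenter.

```python
from statistics import median


def compare_runs(h2_mbps, h3_mbps, parity_band=0.8):
    """Classify repeated throughput samples (Mbit/s) for HTTP/2 and HTTP/3.

    Returns a (label, ratio) tuple, where ratio is median H3 / median H2.
    An empty HTTP/3 list means the HTTP/3 download never completed.
    """
    if not h2_mbps:
        raise ValueError("need at least one HTTP/2 sample as a baseline")
    if not h3_mbps:
        return ("h3-failed", None)
    ratio = median(h3_mbps) / median(h2_mbps)
    if ratio < parity_band:
        return ("h3-slower", ratio)
    if ratio > 1 / parity_band:
        return ("h3-faster", ratio)
    return ("parity", ratio)


# Hypothetical sample values, not measurements from this article.
print(compare_runs([100, 95, 105], [98, 102, 97]))
print(compare_runs([100, 100, 100], [3, 3, 4]))
print(compare_runs([100], []))
```

A result outside the parity band on repeated runs is the signal to work through the questions above.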

Please share your findings with us at testcenters@nocloudfellows.de — especially when the same browser and server behave differently across gateways or network paths.

References