Latency is the gap between what is happening and what the viewer sees. For live sports, betting, interactive broadcasts, and auction-style commerce, that gap determines whether the experience works or fails. The three main approaches to shrinking it — WebRTC, Low-Latency HLS, and low-latency CMAF — each solve the problem differently, and the right choice depends on your scale requirements, device matrix, and CDN architecture.
This guide breaks down how each protocol actually works at the transport layer, where they fit in a production delivery pipeline, and which tradeoffs matter most when you are shipping to real devices.
What “low latency” actually means in practice
Standard HLS and DASH operate at 15-30 seconds of glass-to-glass latency. That is fine for most VOD-like live content: news broadcasts, concert streams, religious services. Nobody notices if the stream is 20 seconds behind.
Below 5 seconds, you enter the territory where viewers can interact with the content in near-real-time. Sports betting needs sub-3-second latency to feel fair. Live commerce wants sub-5 seconds so chat responses make sense. Interactive gaming and gambling need sub-1-second.
The protocols map roughly to these latency bands:
- WebRTC: 200ms–800ms (real-time)
- LL-HLS: 2–4 seconds (near-real-time)
- Low-latency CMAF (LL-CMAF/LL-DASH): 2–5 seconds (near-real-time)
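As a rough decision aid, the bands above can be expressed as a selection function. This is a deliberate simplification — the LL-HLS and LL-CMAF bands overlap, and the function name and thresholds are illustrative, not from any standard:

```python
def pick_protocol(target_latency_s: float, needs_smart_tv: bool) -> str:
    """Map a target glass-to-glass latency (and device reach) onto the
    protocol bands above. Simplified: real decisions also weigh CDN
    architecture and cost."""
    if target_latency_s < 1.0:
        # Sub-second is WebRTC territory, which rules out most
        # connected-TV platforms (see the device-support sections).
        if needs_smart_tv:
            raise ValueError("sub-second latency is not reachable on smart TVs")
        return "WebRTC"
    if target_latency_s <= 4.0:
        return "LL-HLS"            # 2-4 s, HTTP/CDN-native
    if target_latency_s <= 5.0:
        return "LL-CMAF"           # 2-5 s, HTTP/CDN-native
    return "Standard HLS/DASH"     # 15-30 s is acceptable

# pick_protocol(0.5, needs_smart_tv=False) -> "WebRTC"
# pick_protocol(3.0, needs_smart_tv=True)  -> "LL-HLS"
```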
Each comes with its own scaling ceiling, CDN requirements, and device compatibility story.
WebRTC: real-time but hard to scale
WebRTC was designed for peer-to-peer communication — video calls, not broadcasts. It uses UDP-based transport (SRTP, with encryption keys negotiated via a DTLS handshake), does not rely on HTTP, and achieves sub-second latency by design.
How it works
The sender encodes video and pushes RTP packets to a Selective Forwarding Unit (SFU) or media server. The SFU routes packets to connected viewers without transcoding. Each viewer maintains a persistent connection to the SFU.
There is no manifest file, no segment-based delivery, and no HTTP caching. Adaptive bitrate works differently: the SFU can switch between simulcast layers (the encoder sends multiple quality levels simultaneously) based on receiver feedback.
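The layer-switching logic an SFU runs can be sketched in a few lines. The ladder and the 20% headroom factor below are assumptions for illustration; real SFUs drive this from congestion-control feedback such as transport-cc or REMB:

```python
# Hypothetical simulcast ladder: (rid, bitrate in kbps), highest first.
LAYERS = [("f", 2500), ("h", 1200), ("q", 400)]

def select_layer(estimated_bandwidth_kbps: float) -> str:
    """Forward the highest simulcast layer that fits the receiver's
    estimated bandwidth, keeping ~20% headroom so the selection does
    not oscillate at the congestion boundary."""
    budget = estimated_bandwidth_kbps * 0.8
    for rid, kbps in LAYERS:
        if kbps <= budget:
            return rid
    return LAYERS[-1][0]  # always forward at least the lowest layer
```

Because the SFU only forwards packets, switching layers is a routing decision, not a transcode — which is what keeps per-viewer cost low.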
Scaling challenges
Because WebRTC does not use HTTP, traditional CDNs cannot cache or distribute WebRTC streams. Each viewer needs a stateful connection to an SFU. Scaling to thousands of viewers requires a mesh of SFUs, and scaling to hundreds of thousands requires purpose-built infrastructure like media relay networks.
Some vendors solve this with WebRTC-to-HLS fallback: the first few hundred viewers get WebRTC, and overflow viewers get LL-HLS. Others use tree-based SFU topologies where SFUs relay to downstream SFUs. Both approaches add architectural complexity.
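The scaling math behind tree-based SFU topologies is worth making concrete. The per-SFU numbers below are illustrative assumptions, not vendor figures:

```python
def tree_capacity(fanout_per_sfu: int, relay_slots: int, depth: int) -> int:
    """Viewers reachable by a tree of SFUs `depth` levels deep, where
    each SFU reserves `relay_slots` connections for downstream SFUs
    and serves `fanout_per_sfu` viewers directly."""
    sfus_at_level, total_viewers = 1, 0
    for _ in range(depth):
        total_viewers += sfus_at_level * fanout_per_sfu
        sfus_at_level *= relay_slots
    return total_viewers

# Assuming 500 direct viewers and 10 relay slots per SFU:
# 3 levels reach 500 + 5,000 + 50,000 = 55,500 viewers.
```

Each extra tree level multiplies reach but also adds a relay hop of latency and another layer of stateful infrastructure to operate — the complexity the paragraph above refers to.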
Device support
WebRTC works natively in Chrome, Firefox, Safari, and Edge on desktop and mobile. On smart TVs, support is limited. Roku does not support WebRTC. Tizen and webOS have partial support through their embedded Chromium engines, but it is not reliable across model years. Connected TV platform constraints mean WebRTC is primarily a browser and mobile protocol.
When to choose WebRTC
WebRTC is the right choice when you need sub-second latency and your audience is on web browsers or mobile apps. It is the wrong choice when you need to reach smart TVs, Fire TV sticks, or other connected devices at scale, or when you need CDN-based distribution for cost efficiency.
LL-HLS: Apple’s answer to low latency
Low-Latency HLS extends standard HLS with partial segments and blocking playlist requests to reduce latency from 15-30 seconds down to 2-4 seconds while keeping the HTTP-based delivery model intact.
How it works
Instead of waiting for a full segment (typically 6 seconds) to be encoded and published, LL-HLS splits each segment into partial segments (typically 200ms–1 second each). The server publishes partial segments as they become available, and the player requests them incrementally.
The player uses blocking playlist requests: it asks the server for the latest version of the playlist and the server holds the connection open until a new partial segment is available. This eliminates polling overhead and reduces latency from segment-boundary waits.
Preload hints in the playlist tell the player where the next partial segment will be, enabling speculative prefetch. The CDN needs to support chunked transfer encoding so partial segments can begin transferring before the full segment is complete.
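Putting the pieces together, here is a trimmed LL-HLS media playlist showing the tags described above — `EXT-X-SERVER-CONTROL` advertising blocking reload, `EXT-X-PART-INF` and `EXT-X-PART` for partial segments, and `EXT-X-PRELOAD-HINT` for speculative prefetch. Filenames, sequence numbers, and durations are hypothetical:

```
#EXTM3U
#EXT-X-VERSION:9
#EXT-X-TARGETDURATION:4
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0
#EXT-X-PART-INF:PART-TARGET=0.333
#EXT-X-MEDIA-SEQUENCE:266
#EXTINF:4.00000,
fileSequence266.mp4
#EXT-X-PART:DURATION=0.33334,URI="filePart267.0.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart267.1.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="filePart267.2.mp4"
```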
CDN requirements
LL-HLS requires CDN support for:
- Chunked transfer encoding (partial segments stream as they are produced)
- Blocking playlist reload (holding connections open at the edge)
- Quick purge or pass-through for live edge playlists
Most major CDNs support LL-HLS now, but the edge configuration is more involved than standard HLS. If your CDN does not support blocking playlist reloads natively, you may need to bypass cache for the live playlist, which increases origin load.
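The blocking-reload mechanic itself is simple: the client appends the `_HLS_msn` and `_HLS_part` delivery directives to the playlist URL (e.g. `playlist.m3u8?_HLS_msn=267&_HLS_part=2`), and the server holds the response until the playlist contains that partial segment. A minimal sketch of the server-side decision, under the assumption that playlist state is tracked as a (media sequence number, part index) pair:

```python
def can_serve_now(cur_msn: int, cur_part: int,
                  want_msn: int, want_part: int) -> bool:
    """True if the playlist already contains the partial segment the
    client asked for via _HLS_msn/_HLS_part; otherwise the edge must
    hold the connection open until it does."""
    return (cur_msn, cur_part) >= (want_msn, want_part)
```

An edge that cannot implement this hold-open behavior has to forward every such request to the origin — which is exactly where the extra origin load comes from.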
Device support
Apple supports LL-HLS natively across its ecosystem: Safari, iOS, tvOS, and macOS all play it through AVPlayer. On the broader device matrix — including Roku, Samsung Tizen, and Google TV — LL-HLS support depends on the player library. Shaka Player and hls.js both support LL-HLS, which covers most smart TV browsers and Android-based devices.
When to choose LL-HLS
LL-HLS is the right choice when you need broad device reach, CDN-native delivery, and 2-4 second latency. It is the default recommendation for most consumer-facing live streaming services that need to reach Apple devices and want a single low-latency protocol that works across the device matrix.
Low-latency CMAF: the standards-based approach
Low-latency CMAF uses chunked encoding within CMAF segments to enable progressive delivery over standard HTTP. Combined with DASH’s availability time offset mechanism or LL-HLS partial segment signaling, it achieves comparable latency to LL-HLS.
How it works
CMAF segments are fragmented MP4. In low-latency mode, each segment is encoded with multiple CMAF chunks inside it. Each chunk can be transferred independently via chunked HTTP transfer encoding. The player begins decoding the first chunk of a segment before the full segment is complete.
On the DASH side, the MPD uses availabilityTimeOffset to tell the player how soon chunks within a segment become available. On the HLS side, CMAF chunks map to partial segments.
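In practice, availabilityTimeOffset is roughly segment duration minus chunk duration: a segment becomes requestable as soon as its first chunk exists. A hypothetical SegmentTemplate for 4-second segments cut into 0.8-second chunks (attribute values are illustrative) would look like:

```
<!-- availabilityTimeOffset = 4.0 - 0.8 = 3.2 s: each segment may be
     requested 3.2 s before it is fully written, because its first
     chunk is already transferable. -->
<SegmentTemplate timescale="90000" duration="360000"
                 media="video_$Number$.m4s"
                 initialization="video_init.mp4"
                 availabilityTimeOffset="3.2"
                 availabilityTimeComplete="false"/>
```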
The key advantage of CMAF is that the same media segments can serve both HLS and DASH manifests. You encode once, package once, and generate two manifest formats. For services that need both HLS and DASH delivery, this reduces encoding and storage costs.
CDN and packaging requirements
Like LL-HLS, low-latency CMAF requires chunked transfer encoding at the CDN edge. The packager must produce CMAF segments with chunk boundaries aligned to decode-friendly points (typically every 200ms-1s).
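One practical constraint on those chunk boundaries: a chunk must contain a whole number of frames, which rules out some round chunk durations at fractional frame rates. A small validation sketch (the function is illustrative, not from any packager's API):

```python
from fractions import Fraction

def frames_per_chunk(fps: Fraction, chunk_s: Fraction) -> int:
    """Validate a candidate CMAF chunk duration against the stream's
    frame rate; a chunk must hold a whole number of frames."""
    frames = fps * chunk_s
    if frames.denominator != 1:
        raise ValueError(
            f"{chunk_s}s is not a whole number of frames at {fps} fps")
    return frames.numerator

# 500 ms at 30 fps is exactly 15 frames per chunk, but 500 ms does
# not divide evenly at 29.97 (30000/1001) fps and would be rejected.
```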
Packaging tools like Shaka Packager, AWS Elemental MediaPackage, and Unified Streaming support low-latency CMAF output. The CDN configuration is essentially the same as LL-HLS since both rely on HTTP chunked transfer.
When to choose LL-CMAF
Choose low-latency CMAF when you are already using CMAF segments for standard-latency delivery and want to enable low-latency without changing your segment format. It is also the right choice if your video delivery pipeline already supports both HLS and DASH and you want low-latency for both from a single encoded source.
Protocol comparison matrix
| Characteristic | WebRTC | LL-HLS | LL-CMAF |
|---|---|---|---|
| Glass-to-glass latency | 200ms–800ms | 2–4s | 2–5s |
| Transport | SRTP/UDP | HTTP | HTTP |
| CDN compatible | No (needs SFU mesh) | Yes | Yes |
| Manifest format | None (SDP signaling) | M3U8 + partial segments | MPD + CMAF chunks or M3U8 |
| DRM support | Transport encryption only (DTLS-SRTP; no mainstream DRM) | FairPlay, Widevine | Widevine, FairPlay via CMAF |
| Smart TV support | Minimal | Good (via hls.js/Shaka) | Good (via Shaka/dash.js) |
| Scale ceiling | Thousands (SFU-dependent) | Millions (CDN-native) | Millions (CDN-native) |
| Encoding overhead | Simulcast layers | Standard ABR ladder | Standard ABR ladder |
Choosing for your device matrix
If your application targets connected TVs — Roku, Samsung, LG, Google TV — WebRTC is effectively off the table for the primary playback path. You need an HTTP-based protocol that works with the media players available on those platforms.
For most OTT services shipping to a broad device matrix, the recommendation in 2026 is:
- LL-HLS as the primary low-latency protocol for Apple devices and any platform using hls.js
- LL-CMAF with DASH as a secondary path for Android and smart TV devices using Shaka Player or dash.js
- WebRTC only for specific use cases requiring sub-second latency on browser and mobile
If your service does not need sub-second latency, LL-HLS alone covers the widest device range with the least operational complexity.
Operational considerations
Monitoring. Low-latency streams are more sensitive to CDN edge latency, origin throughput, and encoder output timing. Monitor the latency from encoder output to player render, not just CDN response times. A 200ms delay at the origin that is invisible in standard HLS becomes significant in LL-HLS.
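One common way to measure encoder-to-render latency is to stamp wall-clock time into the stream (for HLS, via EXT-X-PROGRAM-DATE-TIME) and subtract it from the player's clock at render time. A minimal sketch, assuming encoder and player clocks are NTP-synced — clock skew shows up directly as measurement error:

```python
from datetime import datetime, timezone

def glass_to_glass_latency_s(rendered_frame_pdt: datetime,
                             now: datetime) -> float:
    """Estimate end-to-end latency as player wall clock minus the
    program-date-time stamped on the currently rendered frame."""
    return (now - rendered_frame_pdt).total_seconds()

# A frame stamped 00:00:00.0 rendered at 00:00:03.2 -> 3.2 s latency.
```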
Failover. Have a standard-latency fallback. If the LL-HLS edge becomes overloaded or a CDN node fails, falling back to standard HLS (with higher latency but stable playback) is better than buffering or errors.
Encoder configuration. Low-latency encoding means shorter GOPs and more frequent IDR frames, which reduces compression efficiency by 10-20%. Budget for higher bitrates or accept slightly lower quality at the same bitrate compared to standard-latency encoding.
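Budgeting for that efficiency loss can be as simple as scaling the existing ABR ladder. The 15% default below is an assumed midpoint of the 10-20% range; tune it per codec and content:

```python
def low_latency_ladder(ladder_kbps: list[int],
                       overhead: float = 0.15) -> list[int]:
    """Scale a standard-latency ABR ladder upward to hold quality
    roughly constant under short-GOP, frequent-IDR encoding."""
    return [round(b * (1 + overhead)) for b in ladder_kbps]

# low_latency_ladder([800, 2500, 5000]) -> [920, 2875, 5750]
```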
Testing across devices. Low-latency behavior varies by player and platform. Test LL-HLS on actual Samsung Tizen and Roku hardware, not just in a browser. The system media pipeline on each platform handles chunked segment delivery differently, and edge cases appear that do not reproduce in browser-based testing.
For a broader look at testing methodology across devices, see our device QA guide.