A CDN for live video events faces different challenges than a CDN for VOD or general web content. The traffic pattern is concentrated: millions of viewers requesting the same content at the same time, all watching the live edge. The cache window is tiny — segments that are seconds old are the freshest content. And failure is not an option — there is no retry button for a live event.
This guide covers how to build and operate CDN infrastructure that handles live events at scale, from sports broadcasts to global premieres.
Live CDN architecture fundamentals
The live delivery chain
- Encoder/packager produces live HLS/DASH segments and manifests in real time
- Origin stores the live segments and serves the manifest
- Mid-tier cache (shield) absorbs repeated requests from edge nodes
- Edge PoPs serve viewers from cache, falling back to mid-tier on miss
- Player fetches manifest and segments, managing ABR and playback
Each layer has specific requirements for live:
- Origin must publish segments with minimal latency. A 500ms delay at the origin propagates to every viewer.
- Mid-tier must cache segments immediately and serve them to edge nodes with minimal added latency.
- Edge must handle the thundering herd: the first request for a new segment triggers a cache miss, and all subsequent requests must be held until the segment is cached (request coalescing).
Request coalescing
When a new live segment is published, hundreds of edge PoPs simultaneously request it from the mid-tier or origin. Without request coalescing, each concurrent request triggers a separate origin fetch.
Request coalescing (also called request collapsing or collapsed forwarding) ensures that only the first request for a segment goes to origin. All subsequent requests for the same segment are queued and served from the same response.
Most CDNs support this natively, but it must be explicitly enabled and tested for live content. Misconfigured coalescing causes either origin overload (no coalescing) or increased latency (overly aggressive hold times).
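To make the mechanism concrete, here is a minimal coalescing cache in Python. This is a sketch, assuming fetch_from_origin is a coroutine that performs the actual origin request; a production implementation would add TTL handling, eviction, and error-path policies.

```python
import asyncio

class SegmentCache:
    """Minimal request coalescing: one origin fetch per segment URL,
    with concurrent requesters awaiting the same in-flight task."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # assumed coroutine: url -> bytes
        self._cache: dict[str, bytes] = {}
        self._inflight: dict[str, asyncio.Task] = {}

    async def get(self, url: str) -> bytes:
        if url in self._cache:            # hit: serve from cache
            return self._cache[url]
        task = self._inflight.get(url)
        if task is None:                  # first miss: start the one fetch
            task = asyncio.create_task(self._fetch_and_store(url))
            self._inflight[url] = task
        return await task                 # later misses share this result

    async def _fetch_and_store(self, url: str) -> bytes:
        try:
            body = await self._fetch(url)  # the single origin request
            self._cache[url] = body
            return body
        finally:
            self._inflight.pop(url, None)
```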
Capacity planning for live events
Estimating bandwidth requirements
Start with:
peak_bandwidth = concurrent_viewers × average_bitrate × (1 + overhead_factor)
Where:
- concurrent_viewers is the expected peak (estimate from marketing, pre-registrations, or historical data)
- average_bitrate is the weighted average across ABR rungs (typically 60-70% of the top rung, since not all viewers have enough bandwidth for max quality)
- overhead_factor accounts for manifest requests, retries, and ABR switches (typically 10-15%)
A 5-million-viewer live event with an average bitrate of 4 Mbps needs:
5,000,000 × 4 Mbps × 1.12 = ~22.4 Tbps of edge throughput
No single CDN PoP can serve that volume; the load must be distributed across hundreds of edge locations globally.
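The estimate is simple enough to script. A small sketch of the formula above, with the worked example's 12% overhead as the default:

```python
def peak_bandwidth_tbps(concurrent_viewers: int,
                        average_bitrate_mbps: float,
                        overhead_factor: float = 0.12) -> float:
    """Peak edge throughput in Tbps: viewers x bitrate x (1 + overhead)."""
    total_mbps = concurrent_viewers * average_bitrate_mbps * (1 + overhead_factor)
    return total_mbps / 1_000_000  # Mbps -> Tbps

print(peak_bandwidth_tbps(5_000_000, 4.0))  # ~22.4
```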
Geographic distribution
Estimate viewer distribution by geography. A US-focused event concentrates traffic on US edge PoPs. A global event distributes across all regions. Communicate the expected geographic distribution to your CDN provider so they can position capacity appropriately.
CDN provider coordination
For events above 1 million concurrent viewers, engage your CDN provider’s event engineering team at least 2 weeks in advance. They can:
- Pre-position cache capacity at high-traffic PoPs
- Configure origin shield regions to match your ingest location
- Set up dedicated origin connections if needed
- Provide a dedicated support contact during the event
Multi-tier caching for live
Two-tier vs three-tier
Two-tier (origin + edge): simpler to configure. Works well for events under 1 million viewers or when using a CDN with a dense edge network. The risk is origin overload from edge cache misses.
Three-tier (origin + shield + edge): the shield layer absorbs edge cache misses. Only one request per segment reaches origin (from the shield), regardless of how many edge PoPs need the segment. This is essential for large events.
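A back-of-envelope comparison makes the difference concrete. Assuming 300 edge PoPs (an illustrative figure), 4-second segments, and perfect coalescing within each tier:

```python
edge_pops = 300           # assumption for illustration
segment_duration_s = 4

# Two-tier: every edge PoP misses once per new segment.
two_tier_origin_rps = edge_pops / segment_duration_s    # 75 req/s
# Three-tier: only the shield fetches each new segment.
three_tier_origin_rps = 1 / segment_duration_s          # 0.25 req/s
print(two_tier_origin_rps, three_tier_origin_rps)
```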
Shield placement
Place the CDN shield in the same region as your origin. If your origin is in AWS us-east-1, your shield should be in a CDN PoP in the US East region. This minimises the shield-to-origin round trip.
For global events with multiple origin regions (for redundancy), use region-specific shields, each paired with its local origin.
Manifest caching strategy
Live manifests update every segment duration (2-6 seconds). Cache the manifest at the shield and edge with short TTLs: roughly half the segment duration at the edge, and half that again at the shield. For a 4-second segment duration:
- Manifest TTL at edge: 2 seconds
- Manifest TTL at shield: 1 second
This ensures viewers get a fresh manifest within one segment duration while reducing origin manifest request volume.
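Deriving both TTLs from the segment duration keeps the caching configuration consistent when packaging settings change. A sketch matching the 4-second example above:

```python
def manifest_ttls(segment_duration_s: float) -> dict[str, float]:
    """Live-manifest TTLs derived from segment duration:
    half at the edge, a quarter at the shield."""
    return {
        "edge_ttl_s": segment_duration_s / 2,
        "shield_ttl_s": segment_duration_s / 4,
    }

print(manifest_ttls(4))  # {'edge_ttl_s': 2.0, 'shield_ttl_s': 1.0}
```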
For low-latency streams, manifest caching is tighter. LL-HLS blocking playlist reloads require the CDN to hold the connection until the manifest updates. Not all CDN configurations support this — verify with your CDN provider.
Segment caching strategy
Live segments are immutable once published. Cache them with a long TTL (hours or longer). The segment will never change, so there is no freshness concern. The only reason to limit segment TTL is storage capacity at the edge, which is rarely a constraint for live content (the segment count is bounded by the DVR window).
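One way to express both the manifest and segment policies is in the origin's response headers. The extensions and TTLs below are illustrative, not a universal rule:

```python
def cache_control_for(path: str, segment_duration_s: int = 4) -> str:
    """Illustrative origin Cache-Control policy for live streaming."""
    if path.endswith((".m3u8", ".mpd")):
        # Manifests: short TTL, half the segment duration.
        return f"public, max-age={max(1, segment_duration_s // 2)}"
    if path.endswith((".ts", ".m4s", ".mp4")):
        # Segments never change once published: cache for a day.
        return "public, max-age=86400, immutable"
    return "no-store"

print(cache_control_for("/live/chunk_001.m4s"))
```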
Origin protection
The origin is the single point of failure in a live CDN architecture. Protecting it is critical.
Rate limiting
Configure origin-side rate limits per edge PoP connection. The CDN shield should be the only entity making frequent requests to origin. Direct origin requests from outside the CDN should be blocked or severely rate-limited.
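A sketch of that origin-side gate, assuming a known set of shield addresses (the IPs below are placeholders; real deployments use the CDN provider's published ranges):

```python
import time
from collections import defaultdict

SHIELD_IPS = {"203.0.113.10", "203.0.113.11"}  # placeholder addresses

class OriginGate:
    """Shield traffic passes freely; everyone else gets a strict
    per-IP token bucket."""

    def __init__(self, rate_per_s: float = 1.0, burst: int = 5):
        self.rate, self.burst = rate_per_s, burst
        self.tokens = defaultdict(lambda: float(burst))
        self.last = defaultdict(time.monotonic)

    def allow(self, client_ip: str) -> bool:
        if client_ip in SHIELD_IPS:
            return True
        now = time.monotonic()
        elapsed = now - self.last[client_ip]
        self.last[client_ip] = now
        self.tokens[client_ip] = min(self.burst,
                                     self.tokens[client_ip] + elapsed * self.rate)
        if self.tokens[client_ip] >= 1.0:
            self.tokens[client_ip] -= 1.0
            return True
        return False
```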
Health checks and failover
Run active health checks against the origin from the CDN shield. If the primary origin fails health checks, automatically failover to a secondary origin in a different region or availability zone.
Health check interval: every 5-10 seconds for live. Failover threshold: 2-3 consecutive failures. Failback: automatic once the primary origin passes health checks again.
Redundant origin
For high-value live events, run a redundant origin in a separate availability zone or region. Both origins receive the same encoder output (via redundant ingest paths). The CDN shield uses the primary origin and fails over to the secondary on health check failure.
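A minimal monitor implementing the interval, threshold, and failback policy above. The health URLs are placeholders, and where this sketch prints, a real shield would update its routing configuration:

```python
import time
import urllib.request

ORIGINS = ["https://origin-primary.example.com/health",    # placeholders
           "https://origin-secondary.example.com/health"]

def healthy(url: str, timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def monitor(interval_s: float = 5.0, fail_threshold: int = 3) -> None:
    active, failures = 0, 0
    while True:
        if healthy(ORIGINS[0]):
            failures, active = 0, 0       # automatic failback to primary
        else:
            failures += 1
            if failures >= fail_threshold:
                active = 1                # fail over to secondary
        print(f"serving from: {ORIGINS[active]}")
        time.sleep(interval_s)
```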
Real-time monitoring during live events
Key metrics to watch
During a live event, monitor these in real time (updated every 10-30 seconds):
- Concurrent viewers — is the audience tracking your expectations?
- Edge bandwidth per region — is any region approaching its capacity ceiling?
- Cache hit ratio at edge — should be 99%+ after the first segment. Drops indicate configuration issues.
- Origin request rate — should be low and stable. Spikes indicate shield or coalescing problems.
- Segment download time (P95) — rising P95 indicates CDN throughput pressure.
- Rebuffering ratio — the most important viewer-facing metric.
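These metrics lend themselves to automated threshold alerts. The limits below are illustrative starting points for a 4-second-segment stream, not universal targets:

```python
THRESHOLDS = {
    "edge_cache_hit_ratio":   ("min", 0.99),
    "origin_requests_per_s":  ("max", 500),   # assumption: sized to origin
    "segment_download_p95_s": ("max", 2.0),
    "rebuffering_ratio":      ("max", 0.005),
}

def check(metrics: dict[str, float]) -> list[str]:
    """Return the breached thresholds for one metrics snapshot."""
    alerts = []
    for name, (kind, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            continue
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            alerts.append(f"{name}={value} breaches {kind} {limit}")
    return alerts

print(check({"edge_cache_hit_ratio": 0.97, "rebuffering_ratio": 0.001}))
```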
War room protocol
For events above 500K concurrent viewers, run a war room during the event:
- Engineering team monitoring CDN, origin, and player metrics
- CDN provider on standby (dedicated Slack channel or phone bridge)
- Runbook with pre-defined actions for common scenarios:
  - CDN edge overload → shift traffic to backup CDN
  - Origin failure → verify failover and confirm secondary origin is serving
  - Regional outage → reroute affected viewers to nearest healthy region
  - Encoder failure → switch to backup encoder feed
Multi-CDN for live events
For events at scale, multi-CDN delivery provides both capacity and resilience:
- Active-active distribution across two CDNs ensures no single CDN failure affects all viewers
- DNS-based steering distributes viewers across CDNs based on geography and CDN health
- Client-side CDN switching provides the fastest failover: the player detects segment download failures and switches to an alternative CDN endpoint
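A sketch of that client-side switching logic. The hostnames are placeholders, and a real player would wire this into its segment loader and retry policy:

```python
class CdnSelector:
    """Switch to the next CDN after consecutive segment failures,
    rather than on a single transient error."""

    def __init__(self, hosts: list[str], max_failures: int = 2):
        self.hosts, self.max_failures = hosts, max_failures
        self.active, self.failures = 0, 0

    def segment_url(self, path: str) -> str:
        return f"https://{self.hosts[self.active]}{path}"

    def report_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.active = (self.active + 1) % len(self.hosts)
            self.failures = 0

    def report_success(self) -> None:
        self.failures = 0

selector = CdnSelector(["cdn-a.example.com", "cdn-b.example.com"])
print(selector.segment_url("/live/chunk_001.m4s"))
```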
The overhead of multi-CDN is operational complexity: managing configurations, monitoring, and contracts with multiple providers. But for events where a single widespread rebuffering incident costs hundreds of thousands of dollars in advertiser guarantees, the insurance is worth it.
Post-event analysis
After every live event, conduct a retrospective:
- QoE summary: startup time distribution, rebuffering ratio, failure rate, quality distribution
- CDN performance: cache hit ratio, origin offload, segment download times per CDN per region
- Incidents: what went wrong, how it was detected, how fast it was resolved
- Capacity validation: did actual viewership match estimates? Were there capacity-related issues?
- Improvement actions: what to fix before the next event
Each live event is a learning opportunity. The data from this event improves the architecture, monitoring, and operational readiness for the next one. For broader delivery optimisation, see our video delivery performance solutions.