Edge Computing for Video Streaming: Use Cases and Architectures

How edge computing improves video streaming delivery through origin offloading, real-time transcoding at the edge, personalised manifests, and regional cache strategies.

April 12, 2026

Edge network topology diagram showing CDN nodes distributing video segments to regional viewer clusters

Edge computing in video streaming is not a new concept — CDNs have been caching video segments at edge locations for decades. What is changing is the kind of compute available at the edge. Modern edge platforms can run transcoding, manifest manipulation, ad insertion, and authentication logic at the CDN edge, not just serve cached files. This shifts work away from centralised origins and closer to viewers, reducing latency and improving scalability.

This guide covers the practical edge computing use cases that matter for OTT video delivery in 2026, with specific architectures for each.

The edge continuum

“Edge” means different things in different contexts. For video streaming, the relevant layers are:

CDN edge PoPs: the traditional cache layer. Hundreds or thousands of locations globally. Primarily serve cached content with minimal compute. This is where most video segments are served from today.

Regional edge compute: a smaller number of locations (tens to low hundreds) with meaningful compute capacity. Can run containers, serverless functions, or specialised video processing. This is where edge transcoding and manifest manipulation happen.

Device edge: the viewer’s device itself. Limited compute, but useful for final-mile optimisations like client-side ABR decisions and adaptive playback.

The video delivery pipeline can leverage all three layers. The question is which work belongs where.

Use case 1: origin offloading

The simplest edge computing use case is reducing load on the origin server.

Standard CDN caching

A CDN edge cache stores video segments and manifests. When a viewer requests a segment, the edge serves it from cache. If the segment is not cached, the edge fetches it from the origin, caches it, and serves it. The cache hit ratio for popular VOD content is typically 95%+ after warm-up.

Edge-based manifest generation

A live stream manifest changes with every new segment (every 2-6 seconds). With standard caching, the manifest TTL must be short (1-2 seconds), and many manifest requests miss the cache and go to origin.

Edge compute can generate manifests at the edge instead. The edge receives segment availability notifications from the origin (via push or short-poll) and generates the manifest locally. This eliminates manifest cache misses and reduces origin manifest request volume by orders of magnitude.
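As a minimal sketch of this idea, the function below builds an HLS-style live media playlist from the segments the origin has announced to an edge node. The `Segment` type, the sliding window size, and the push mechanism are all assumptions for illustration; a production implementation would follow the packager's exact playlist semantics.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    sequence: int      # media sequence number announced by the origin
    duration: float    # segment duration in seconds
    uri: str

def build_live_manifest(segments: list[Segment], target_duration: int = 6,
                        window: int = 5) -> str:
    """Build an HLS media playlist locally from the last `window`
    segments this edge node knows about -- no origin round trip."""
    live = segments[-window:]
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        f"#EXT-X-MEDIA-SEQUENCE:{live[0].sequence}",
    ]
    for seg in live:
        lines.append(f"#EXTINF:{seg.duration:.3f},")
        lines.append(seg.uri)
    return "\n".join(lines) + "\n"
```

Each edge node updates its segment list when the origin pushes a new segment notification and regenerates the playlist in memory, so manifest requests never miss to origin.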

When origin offloading matters

Origin offloading is critical for live sports and large-scale events where millions of viewers request the same manifest simultaneously. Without edge-based manifest generation, the origin becomes the bottleneck during the thundering herd.

Use case 2: edge transcoding

Just-in-time transcoding

Instead of pre-encoding every ABR rung for every piece of content, encode only the most popular quality levels centrally and transcode additional rungs at the edge on demand.

Example: encode 1080p and 720p at the origin. When a viewer on a slow connection requests 360p, the edge transcodes from 720p to 360p and caches the result. Subsequent 360p requests are served from the edge cache.

This reduces origin encoding cost and storage for long-tail content that may never be requested at every quality level.
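The decision flow can be sketched as follows. The `transcode` callable, the rung list, and the in-memory cache are placeholders, not a real media API; the point is the cache-then-transcode-from-the-nearest-higher-rung logic.

```python
# Hypothetical just-in-time transcode decision at a regional edge.
PRE_ENCODED = [1080, 720]          # rungs encoded at the origin

def serve_rung(requested: int, cache: dict[int, bytes], transcode) -> bytes:
    """Serve a quality rung: cache hit, origin fetch, or edge transcode."""
    if requested in cache:
        return cache[requested]            # edge cache hit
    if requested in PRE_ENCODED:
        raise LookupError("fetch pre-encoded rung from origin")
    # Transcode down from the nearest pre-encoded rung above the request.
    source = min(r for r in PRE_ENCODED if r > requested)
    out = transcode(cache[source], source, requested)
    cache[requested] = out                 # cache for subsequent viewers
    return out
```

After the first 360p request pays the transcode cost, every later 360p request is a plain cache hit, which is what makes this economical for long-tail content.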

Per-viewer quality adaptation

Edge compute can adjust video quality parameters per viewer session without maintaining separate encodes for each configuration. This includes:

  • Reducing bitrate for viewers on metered connections
  • Applying different compression settings for mobile vs TV viewers
  • Transcoding from HEVC to H.264 at the edge for devices that do not support HEVC
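A per-session policy like the above might look like the sketch below. The session fields and threshold values are invented for illustration; the real policy would come from the service's device and network detection.

```python
def transcode_profile(session: dict) -> dict:
    """Pick per-session transcode parameters (hypothetical policy)."""
    profile = {"codec": "hevc", "max_bitrate_kbps": 8000}
    if not session.get("supports_hevc", True):
        profile["codec"] = "h264"          # transcode HEVC -> H.264 at the edge
    if session.get("metered"):
        profile["max_bitrate_kbps"] = min(profile["max_bitrate_kbps"], 1500)
    if session.get("device") == "mobile":
        profile["max_bitrate_kbps"] = min(profile["max_bitrate_kbps"], 3000)
    return profile
```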

Limitations

Edge transcoding requires significant compute at the edge (GPU or specialised media processing hardware). Not all edge locations have this capacity. The architecture typically involves a tiered approach: large regional edge locations handle transcoding, and smaller PoPs cache the transcoded output.

Use case 3: personalised manifest manipulation

Server-side ad insertion at the edge

SSAI requires per-session manifest manipulation: each viewer gets a unique manifest with their specific ad segments interleaved with content. Running this at a centralised origin means every manifest request goes to the origin, defeating CDN caching for manifests.

Edge-based SSAI runs the manifest manipulation logic at the edge. The edge receives the content manifest template and ad segment URLs from the ad decisioning service, stitches the personalised manifest locally, and serves it to the viewer. Content segments are still served from the CDN cache — only the manifest is personalised per session.
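The stitching step can be sketched as pure playlist assembly: content and ad segments are (duration, URI) pairs, and the ad break is spliced in with discontinuity markers. This is an illustrative helper, not a real SSAI API; production stitchers also handle timing metadata, multiple breaks, and tracking beacons.

```python
def stitch_ads(content_segs: list[tuple[float, str]],
               ad_segs: list[tuple[float, str]],
               break_after: int) -> str:
    """Stitch per-session ad segments into a content playlist at the edge."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3",
             "#EXT-X-TARGETDURATION:6", "#EXT-X-MEDIA-SEQUENCE:0"]

    def emit(segs):
        for dur, uri in segs:
            lines.append(f"#EXTINF:{dur:.3f},")
            lines.append(uri)

    emit(content_segs[:break_after])
    lines.append("#EXT-X-DISCONTINUITY")   # timeline/codec may change at the ad
    emit(ad_segs)
    lines.append("#EXT-X-DISCONTINUITY")
    emit(content_segs[break_after:])
    return "\n".join(lines) + "\n"
```

Only this small string-assembly step is per-session; the segment bytes it references stay cacheable at every PoP.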

Personalised content recommendations

Some services personalise the manifest to include viewer-specific preview segments, trailers, or pre-roll content. Edge compute can assemble these personalised manifests without routing every request to a central recommendation engine.

Access control at the edge

Instead of routing authentication and entitlement checks to a central API, edge workers can validate session tokens, check entitlement caches, and enforce geographic restrictions at the CDN edge. Invalid requests never reach the origin.

This is particularly relevant for token-based content authentication — the CDN edge validates signed URLs without a round trip to the origin auth service.

Use case 4: real-time analytics processing

Edge-side quality metrics

Edge nodes see every segment request and can compute quality metrics in real time:

  • Segment download times (first byte, complete)
  • Request patterns that indicate rebuffering (rapid sequential segment requests at low quality)
  • Geographic distribution of viewers
  • ABR quality distribution (what percentage of viewers are on each rung)

Processing these at the edge rather than shipping raw logs to a central analytics pipeline reduces data volume and provides faster insight.
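For example, the ABR quality distribution reduces to a small aggregation over the request log, so only the summary needs to leave the PoP. The request-record shape here is an assumption.

```python
from collections import Counter

def rung_distribution(requests: list[dict]) -> dict[int, float]:
    """Share of segment requests per ABR rung, computed edge-side so the
    central pipeline receives the aggregate rather than raw logs."""
    counts = Counter(r["rung"] for r in requests)
    total = sum(counts.values())
    return {rung: counts[rung] / total for rung in counts}
```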

Anomaly detection

Edge compute can detect anomalies locally: a sudden spike in 404 errors for a specific content path, a regional surge in rebuffering, or an unexpected drop in cache hit ratio. The edge can alert the central monitoring system and potentially take corrective action (switch to a backup origin, redirect traffic to a healthier PoP) before the issue affects many viewers.
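A minimal local detector for the 404-spike case might compare the current window's error rate against a rolling baseline, as sketched below. The multiplier and minimum-sample thresholds are illustrative values, not recommendations.

```python
def is_error_spike(window_requests: int, window_404s: int,
                   baseline_rate: float, factor: float = 5.0,
                   min_requests: int = 100) -> bool:
    """Flag a 404 spike on this PoP when the current error rate exceeds
    the rolling baseline by `factor`. Thresholds are illustrative."""
    if window_requests < min_requests:
        return False                       # too few samples to judge
    return window_404s / window_requests > baseline_rate * factor
```

A positive result would trigger an alert to central monitoring and, optionally, a local failover action such as switching this PoP to a backup origin.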

Architecture patterns

Pattern 1: edge workers for light compute

Use serverless edge functions (Cloudflare Workers, AWS CloudFront Functions, Fastly Compute) for:

  • Token validation and access control
  • Manifest URL rewriting and personalisation
  • Request routing and A/B testing
  • Header manipulation and CORS handling

These run on every request with minimal latency overhead (sub-millisecond) and are available at every CDN edge location.

Pattern 2: regional edge containers for heavy compute

Use containerised workloads at regional edge locations (AWS Wavelength, Cloudflare Workers with Durable Objects, Akamai Compute) for:

  • Just-in-time transcoding
  • SSAI manifest stitching
  • Real-time analytics aggregation
  • Ad decisioning with local caching

These require more compute than serverless functions but are still closer to viewers than a central origin.

Pattern 3: hybrid edge-origin

Most production architectures are hybrid:

  • The origin handles encoding, packaging, and content management
  • Regional edge handles transcoding overflow, SSAI, and personalisation
  • CDN edge handles caching, token validation, and request routing
  • The device handles ABR decisions and playback

Each layer does what it does best. The video delivery pipeline is distributed across the continuum.

Cost considerations

Edge compute is priced per request or per compute-second, and the costs add up at video streaming scale. A service with 100 million manifest requests per day running edge functions on each request generates meaningful compute costs.

Optimise by:

  • Caching edge function results where possible (e.g., cache the personalised manifest for 1-2 seconds)
  • Running compute-heavy tasks only at regional edge locations, not every PoP
  • Using edge compute selectively for the use cases that need it, not as a default for every request
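The first optimisation, a short-TTL micro-cache in front of an edge function, can be sketched like this. It is a single-node, non-thread-safe illustration of the idea, assuming the per-session result (e.g. a personalised manifest) stays valid for a second or two.

```python
import time

class MicroCache:
    """Short-TTL cache for edge function results, e.g. a personalised
    manifest held for 1-2 seconds to absorb repeat requests."""

    def __init__(self, ttl: float = 2.0):
        self.ttl = ttl
        self._store: dict[str, tuple[float, str]] = {}

    def get_or_build(self, key: str, build) -> str:
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and now - hit[0] < self.ttl:
            return hit[1]                  # serve cached result, skip compute
        value = build()                    # pay the compute cost once per TTL
        self._store[key] = (now, value)
        return value
```

Within the TTL, repeat requests for the same key never invoke the expensive function, which directly cuts the per-request compute bill.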

Edge computing for video streaming is a tool, not a blanket solution. Apply it where it solves specific problems — origin bottlenecks, manifest personalisation, latency-sensitive access control — and let standard CDN caching handle the rest.
