The video pipeline is the backbone of every OTT service. It transforms source video into the multi-format, multi-codec, DRM-protected, CDN-distributed streams that land on viewers’ screens. Every stage — from ingest through encoding, packaging, encryption, and delivery — has decisions that affect quality, cost, latency, and device compatibility.
This guide walks through each stage of a modern OTT video pipeline, covering the best practices and tradeoffs relevant in 2026.
Stage 1: ingest
Ingest is where source content enters the pipeline. For VOD, this means receiving mezzanine files from content providers. For live, this means receiving encoder output from a broadcast chain or cloud encoder.
VOD ingest
Mezzanine files should be the highest quality available: uncompressed or lightly compressed (ProRes, DNxHR, or high-bitrate H.264/HEVC). The quality of your output is bounded by the quality of your input.
Accept common mezzanine formats:
- Apple ProRes 422/4444
- Avid DNxHR HQ/HQX
- XDCAM/MXF
- High-bitrate H.264 or HEVC (> 50 Mbps for 1080p)
Validate mezzanine files on ingest:
- Verify container integrity (no truncation, correct atom structure)
- Check codec parameters (profile, level, color space)
- Verify audio tracks (channel count, sample rate, language tags)
- Confirm duration matches expected metadata
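The ingest checks above can be expressed as an automated gate. The sketch below assumes probe output (e.g. from a tool like ffprobe) has already been normalised into a plain dict; the field names and the 50 Mbps floor are illustrative, not a canonical schema:

```python
# Sketch of a VOD mezzanine validation gate. The dict shape (codec,
# bitrate_mbps, duration_s, audio_tracks) is an assumed normalisation
# of probe metadata, not a real tool's output format.

ACCEPTED_CODECS = {"prores", "dnxhd", "h264", "hevc"}

def validate_mezzanine(meta: dict, expected_duration_s: float,
                       tolerance_s: float = 1.0) -> list:
    """Return a list of validation errors; an empty list means the file passes."""
    errors = []
    if meta.get("codec") not in ACCEPTED_CODECS:
        errors.append(f"unsupported codec: {meta.get('codec')}")
    # High-bitrate H.264/HEVC mezzanines should exceed ~50 Mbps at 1080p
    if meta.get("codec") in {"h264", "hevc"} and meta.get("bitrate_mbps", 0) < 50:
        errors.append("bitrate below 50 Mbps mezzanine floor")
    if not meta.get("audio_tracks"):
        errors.append("no audio tracks found")
    for track in meta.get("audio_tracks", []):
        if "language" not in track:
            errors.append("audio track missing language tag")
    if abs(meta.get("duration_s", 0) - expected_duration_s) > tolerance_s:
        errors.append("duration does not match catalogue metadata")
    return errors
```

A failed asset should be rejected at ingest, before any encoding spend, with the error list surfaced back to the content provider.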
Live ingest
For live, the standard ingest protocols are:
- SRT (Secure Reliable Transport): low-latency, encrypted contribution over the public internet. Becoming the standard for remote contribution.
- RTMP: legacy but still widely used for ingest from software encoders (OBS, Wirecast). Being replaced by SRT for new deployments.
- RIST (Reliable Internet Stream Transport): interoperable alternative to SRT, standardised by the VSF.
- Direct MPEG-TS over UDP/RTP: used for facility-internal contribution where latency and reliability are controlled at the network level.
Stage 2: encoding
Encoding transforms the source video into the ABR ladder that will be delivered to viewers. This is the most compute-intensive stage of the pipeline.
Codec selection
In 2026, the practical codec options are:
- H.264: universal baseline. Every device decodes it. Use for the lowest ABR rungs and as the fallback codec.
- HEVC: 30-40% more efficient than H.264. Standard for 4K/HDR. Use for modern devices.
- AV1: 40-50% more efficient than H.264. Royalty-free. Use for VOD on devices that support it.
Most services encode a multi-codec ladder and let the player select based on device support.
ABR ladder design
The ABR ladder defines the quality levels available to the player. Key decisions:
Resolution/bitrate pairs. Match bitrate to resolution based on content complexity. Use per-title or AI-driven analysis to optimise the ladder for each piece of content.
Number of rungs. More rungs give the player finer granularity for ABR switching but increase encoding cost and storage. 5-7 rungs per codec is typical.
GOP structure. For standard delivery, use 2-second GOPs aligned with segment boundaries. For low-latency delivery, use 1-2 second GOPs with CMAF chunk boundaries for progressive delivery.
Frame rate. Match the source frame rate. Do not encode 24fps source at 30fps or vice versa without a good reason. Include 50/60fps renditions for sports content.
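A simple way to sketch per-title ladder optimisation is to scale a reference ladder's bitrates by a content-complexity factor from an analysis pass. The reference rungs and complexity range below are hypothetical, chosen only to illustrate the shape of the decision:

```python
# Hypothetical per-title ladder sketch: scale a reference ladder's bitrates
# by a complexity factor (e.g. derived from a fast pre-encode analysis).

REFERENCE_LADDER = [  # (height, kbps) for medium-complexity content
    (270, 400), (360, 800), (540, 1600), (720, 3000), (1080, 5500), (1080, 7500),
]

def build_ladder(complexity: float, max_height: int = 1080):
    """complexity runs roughly 0.5 (flat animation) to 1.5 (grain, sports)."""
    return [(h, round(kbps * complexity))
            for h, kbps in REFERENCE_LADDER if h <= max_height]
```

Easy content gets the same perceptual quality at lower bitrates, so the whole ladder shifts down; complex content shifts up, or drops the top rung if it would exceed a delivery cap.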
Encoding profiles
- H.264: High Profile for 1080p+, Main Profile for SD. Level 4.0 for 1080p30, Level 4.2 for 1080p60 (Level 4.1 tops out at 1080p30).
- HEVC: Main Profile for SDR, Main 10 Profile for HDR. Level 4.1 for 1080p, Level 5.1 for 4K.
- AV1: Main Profile for both SDR and HDR (Main already supports 10-bit 4:2:0); High Profile is only needed for 4:4:4 chroma.
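The profile/level rules above can be folded into a small lookup helper. This is a sketch: the SD level choice (3.1) is an assumption, AV1 levels are omitted, and note that AV1's Main Profile already carries 10-bit HDR content:

```python
# Encodes the profile/level selection rules as a lookup helper.
# The H.264 SD level (3.1) is an assumption; AV1 levels are omitted.

def encoder_profile(codec: str, height: int, fps: float, hdr: bool):
    if codec == "h264":
        profile = "High" if height >= 1080 else "Main"
        level = "3.1" if height < 1080 else ("4.2" if fps > 30 else "4.0")
    elif codec == "hevc":
        profile = "Main 10" if hdr else "Main"
        level = "5.1" if height > 1080 else "4.1"
    elif codec == "av1":
        profile = "Main"  # Main supports 8- and 10-bit 4:2:0, so HDR included
        level = None      # AV1 level selection left out of this sketch
    else:
        raise ValueError(f"unknown codec: {codec}")
    return profile, level
```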
Stage 3: packaging
Packaging wraps encoded segments into the container format and generates manifests for streaming delivery.
CMAF (Common Media Application Format)
CMAF is the industry-standard container for OTT delivery. It uses fragmented MP4 (fMP4) segments that can be served by both HLS and DASH manifests. This means:
- Encode once
- Package into CMAF segments once
- Generate HLS (M3U8) and DASH (MPD) manifests from the same segments
- Store one set of segments, not two
Segment duration
Standard segment duration is 2-6 seconds. Shorter segments reduce latency and enable faster ABR switching but increase manifest size and CDN request volume. Longer segments improve compression efficiency but increase startup time and ABR switch latency.
2-second segments are a good default for services that value low startup time and responsive ABR. 6-second segments are better for bandwidth-constrained delivery where compression efficiency matters.
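The request-volume side of this tradeoff is easy to put numbers on. A back-of-envelope sketch, assuming each viewer fetches one video and one audio segment per segment interval:

```python
# Back-of-envelope arithmetic for the segment-duration tradeoff: shorter
# segments multiply CDN request volume. Assumes one video + one audio
# fetch per viewer per segment interval (no manifest refreshes counted).

def segment_requests_per_sec(viewers: int, segment_s: float,
                             tracks_per_viewer: int = 2) -> float:
    return viewers * tracks_per_viewer / segment_s
```

At 100,000 concurrent viewers, moving from 6-second to 2-second segments triples the CDN's steady-state request rate, which matters for edge capacity planning and per-request CDN pricing.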
Subtitle and audio packaging
Package subtitles as sidecar files (WebVTT for HLS, TTML for DASH) or as embedded tracks within CMAF segments. WebVTT sidecars are the most compatible approach across connected TV platforms.
Package multiple audio tracks (languages, descriptive audio) as separate CMAF track files referenced in the manifest. This allows the player to switch audio tracks without restarting playback.
Stage 4: content protection (DRM and encryption)
Encryption
Encrypt CMAF segments using the CBCS scheme (AES-CBC with pattern encryption). CBCS is supported by Widevine, FairPlay, and PlayReady, enabling multi-DRM delivery from a single encrypted source.
Encrypt during packaging, not as a separate post-processing step. The packager generates encrypted segments and includes PSSH (Protection System Specific Header) boxes for each DRM system.
Key management
Use a key management system that:
- Generates unique content keys per asset (or per quality level for multi-key scenarios)
- Stores keys securely with HSM-backed encryption
- Serves keys to DRM license servers on demand
- Supports key rotation for live streams (rotate keys every N hours or segments)
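Key rotation for live reduces to mapping each segment to a key period, so that every segment in the same rotation window uses the same content key. A minimal sketch, in which the deterministic key-ID derivation is purely hypothetical (a real KMS would issue and store key IDs itself):

```python
# Sketch of key-period selection for live key rotation. Every segment in
# a rotation window shares one content key; rotating every N segments is
# equivalent to rotating every N * segment_duration seconds.
import hashlib

def key_period(segment_index: int, segments_per_period: int) -> int:
    return segment_index // segments_per_period

def derive_key_id(asset_id: str, period: int) -> str:
    """Hypothetical deterministic key-ID scheme; real KMS systems issue these."""
    return hashlib.sha256(f"{asset_id}:{period}".encode()).hexdigest()[:32]
```

With 2-second segments, `segments_per_period=1800` rotates keys hourly; the packager signals each new key in the manifest so players fetch the next license before the boundary.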
DRM signal in manifests
The HLS manifest includes #EXT-X-KEY or #EXT-X-SESSION-KEY tags pointing to the FairPlay license server. The DASH MPD includes ContentProtection elements for each DRM system (Widevine, PlayReady) with the PSSH data.
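The shape of those signals can be sketched as string templates. The URLs and PSSH payload below are placeholders; the FairPlay `KEYFORMAT` value and the Widevine system UUID are the real, well-known identifiers:

```python
# Illustrative DRM signaling for manifests. License URL and PSSH payload
# are placeholders; KEYFORMAT and the Widevine system ID are standard values.

WIDEVINE_SYSTEM_ID = "edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"

def hls_key_tag(license_url: str) -> str:
    """FairPlay signaling in an HLS media playlist (SAMPLE-AES + skd:// URI)."""
    return (f'#EXT-X-KEY:METHOD=SAMPLE-AES,URI="{license_url}",'
            f'KEYFORMAT="com.apple.streamingkeydelivery",KEYFORMATVERSIONS="1"')

def dash_content_protection(pssh_b64: str) -> str:
    """Widevine ContentProtection element for a DASH MPD AdaptationSet."""
    return (f'<ContentProtection schemeIdUri="urn:uuid:{WIDEVINE_SYSTEM_ID}">'
            f'<cenc:pssh>{pssh_b64}</cenc:pssh></ContentProtection>')
```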
Stage 5: CDN delivery
Origin to CDN
Push packaged content from the origin to the CDN or let the CDN pull on demand:
- Push-based: origin pushes segments to CDN origin storage as they are produced. Lower first-request latency. Higher origin-to-CDN bandwidth usage.
- Pull-based: CDN fetches segments from origin on first request. Origin bandwidth scales with CDN cache miss rate. Simpler to configure.
For VOD, pull-based is standard. For live, push-based or aggressive pre-warming reduces first-viewer latency.
CDN configuration
- Cache headers: set Cache-Control: public, max-age=31536000 on CMAF segments (they are immutable). Set short TTLs on live manifests (1-3 seconds).
- CORS headers: configure appropriate CORS headers for browser-based players.
- Range requests: ensure the CDN supports HTTP range requests for byte-range addressing within CMAF segments.
- Compression: do not gzip-compress video segments (they are already compressed). Do gzip-compress manifests (text-based M3U8 and MPD).
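The cache and compression rules above amount to a small policy function keyed on path type. The file-extension patterns and the VOD manifest TTL are assumptions about a typical path layout:

```python
# Sketch of the CDN caching/compression policy described above. Extension
# patterns and the 60 s VOD-manifest TTL are assumptions, not standards.

def cache_policy(path: str, live: bool = False):
    """Return (Cache-Control value, should_gzip) for a delivery path."""
    if path.endswith((".m3u8", ".mpd")):
        max_age = 2 if live else 60  # live manifests refresh every segment
        return f"public, max-age={max_age}", True  # manifests are text: gzip
    if path.endswith((".m4s", ".cmfv", ".cmfa", ".mp4")):
        # CMAF segments are immutable and already compressed: long TTL, no gzip
        return "public, max-age=31536000, immutable", False
    return "no-store", False
```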
For live events, see our guide on building a scalable video CDN.
Stage 6: playback
The final stage is the player on the viewer’s device. The player is responsible for:
- Fetching and parsing the manifest
- Selecting the initial quality level
- Downloading segments and feeding them to the decoder
- Managing the ABR algorithm during playback
- Handling DRM license acquisition
- Rendering subtitles and managing audio track switching
- Reporting QoE metrics back to your analytics service
Player choices by platform
- Roku: native Video node with built-in HLS/DASH support
- Samsung Tizen: AVPlay (Samsung-specific) or MSE-based player (Shaka, hls.js)
- Google TV: ExoPlayer (Jetpack Media3)
- LG webOS: MSE-based player (Shaka, hls.js, dash.js)
- Web: Shaka Player, hls.js, dash.js, or Video.js
- iOS/tvOS: AVPlayer (native HLS) or custom MSE-based player
ABR algorithm tuning
The player’s ABR algorithm decides when to switch quality levels. Tune it for your content and audience:
- Conservative: prioritise stability, minimise quality switches, accept lower average quality. Good for devices with limited decoder switching capability.
- Aggressive: maximise quality, switch up quickly when bandwidth allows, accept more switches. Good for devices with fast decoder switching.
- Bandwidth estimation: use a combination of segment download throughput and TCP connection metrics. On constrained devices, prefer segment-level estimation over connection-level estimation.
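A minimal sketch of segment-level estimation is an exponentially weighted moving average over per-segment throughput, plus a safety margin when picking a rung. The alpha and safety values are illustrative; real players (e.g. dash.js, Shaka) layer buffer-based logic on top of this:

```python
# Segment-level throughput estimation with an EWMA, plus rung selection
# with a safety margin. Alpha and safety values here are illustrative.

class ThroughputEstimator:
    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha        # higher alpha reacts faster ("aggressive")
        self.estimate_kbps = None

    def on_segment(self, bytes_downloaded: int, seconds: float) -> float:
        sample = bytes_downloaded * 8 / 1000 / seconds  # kbps for this segment
        if self.estimate_kbps is None:
            self.estimate_kbps = sample
        else:
            self.estimate_kbps = (self.alpha * sample
                                  + (1 - self.alpha) * self.estimate_kbps)
        return self.estimate_kbps

def pick_rung(ladder_kbps: list, estimate_kbps: float,
              safety: float = 0.8) -> int:
    """Highest bitrate under safety * estimate; fall back to the lowest rung."""
    usable = [b for b in ladder_kbps if b <= estimate_kbps * safety]
    return max(usable) if usable else min(ladder_kbps)
```

A conservative tune lowers alpha and safety (slower to switch up, rarely forced down); an aggressive tune raises both.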
Pipeline automation
VOD workflow
Automate the full VOD pipeline: ingest → validate → encode → package → encrypt → publish → verify.
Use a workflow orchestrator (Step Functions, Temporal, Airflow) to manage the stages. Each stage should be idempotent and retriable. If encoding fails, retry the encode without re-ingesting. If packaging fails, retry packaging without re-encoding.
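The resume-from-failure behaviour is easiest to see as a checkpointed stage loop. This is a toy sketch of the pattern, not any particular orchestrator's API; a real system would persist the checkpoint set durably:

```python
# Sketch of a stage-checkpointed VOD workflow: each stage records its
# completion, so a retry resumes from the failed stage rather than
# re-running the whole pipeline. Checkpoint storage is in-memory here.

STAGES = ["ingest", "validate", "encode", "package", "encrypt", "publish", "verify"]

def run_pipeline(asset_id: str, handlers: dict, completed: set) -> set:
    """handlers maps stage name -> callable(asset_id); completed is the checkpoint."""
    for stage in STAGES:
        if stage in completed:
            continue  # idempotent: skip stages that already succeeded
        handlers[stage](asset_id)  # raises on failure; the caller retries later
        completed.add(stage)
    return completed
```

If `package` raises, the next retry passes the same `completed` set back in and the run starts at `package`, never repeating ingest, validation, or the expensive encode.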
Live workflow
The live pipeline runs continuously. Automate:
- Encoder health monitoring and automatic failover to backup encoder
- Packaging verification (segment continuity, manifest correctness)
- CDN push verification (segments arrive at CDN within expected latency)
- Alerting on any stage failure or degradation
Quality gates
Add automated quality gates between pipeline stages:
- Post-encode: verify VMAF/SSIM meets threshold. Reject and re-encode if quality is too low.
- Post-package: verify manifest correctness (valid segment references, correct DRM signals, proper timing).
- Post-publish: verify segments are accessible from CDN edge. Run a synthetic playback test.
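The post-encode gate can be sketched as a per-resolution VMAF floor. The threshold values below are illustrative placeholders; real floors should be calibrated against your content mix and viewing conditions:

```python
# Sketch of a post-encode quality gate: reject renditions whose VMAF falls
# below a per-resolution floor. Threshold values are illustrative only.

VMAF_FLOOR = {2160: 93, 1080: 90, 720: 85, 540: 80, 360: 75, 270: 70}

def passes_quality_gate(height: int, vmaf: float) -> bool:
    floor = VMAF_FLOOR.get(height)
    if floor is None:
        raise ValueError(f"no VMAF floor configured for {height}p")
    return vmaf >= floor

def gate_ladder(scores: dict) -> list:
    """scores maps height -> measured VMAF. Returns rungs needing re-encode."""
    return sorted(h for h, v in scores.items() if not passes_quality_gate(h, v))
```

Failed rungs go back to the encoder, typically with a higher bitrate or slower preset, before the asset is allowed to publish.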
A well-designed video pipeline is invisible to the viewer. They see content playing smoothly on their screen. The pipeline’s job is to make that happen reliably, efficiently, and at scale. For broader architecture considerations, see our streaming app architecture solutions.