The video pipeline is the backbone of every OTT service. It transforms source video into the multi-format, multi-codec, DRM-protected, CDN-distributed streams that land on viewers’ screens. Every stage — from ingest through encoding, packaging, encryption, and delivery — has decisions that affect quality, cost, latency, and device compatibility.
This guide walks through each stage of a modern OTT video pipeline, covering the best practices and tradeoffs relevant in 2026.
Stage 1: ingest
Ingest is where source content enters the pipeline. For VOD, this means receiving mezzanine files from content providers. For live, this means receiving encoder output from a broadcast chain or cloud encoder.
VOD ingest
Mezzanine files should be the highest quality available: uncompressed or lightly compressed (ProRes, DNxHR, or high-bitrate H.264/HEVC). The quality of your output is bounded by the quality of your input.
Accept common mezzanine formats:
- Apple ProRes 422/4444
- Avid DNxHR HQ/HQX
- XDCAM/MXF
- High-bitrate H.264 or HEVC (> 50 Mbps for 1080p)
Validate mezzanine files on ingest:
- Verify container integrity (no truncation, correct atom structure)
- Check codec parameters (profile, level, color space)
- Verify audio tracks (channel count, sample rate, language tags)
- Confirm duration matches expected metadata
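The ingest checks above can be expressed as an automated gate. The sketch below assumes probe output (e.g. from a tool like ffprobe) has already been normalised into a plain dict; the field names and the 50 Mbps floor are illustrative, not a canonical schema:

```python
# Sketch of a VOD mezzanine validation gate. The dict shape (codec,
# bitrate_mbps, duration_s, audio_tracks) is an assumed normalisation
# of probe metadata, not a real tool's output format.

ACCEPTED_CODECS = {"prores", "dnxhd", "h264", "hevc"}

def validate_mezzanine(meta: dict, expected_duration_s: float,
                       tolerance_s: float = 1.0) -> list:
    """Return a list of validation errors; an empty list means the file passes."""
    errors = []
    if meta.get("codec") not in ACCEPTED_CODECS:
        errors.append(f"unsupported codec: {meta.get('codec')}")
    # High-bitrate H.264/HEVC mezzanines should exceed ~50 Mbps at 1080p
    if meta.get("codec") in {"h264", "hevc"} and meta.get("bitrate_mbps", 0) < 50:
        errors.append("bitrate below 50 Mbps mezzanine floor")
    if not meta.get("audio_tracks"):
        errors.append("no audio tracks found")
    for track in meta.get("audio_tracks", []):
        if "language" not in track:
            errors.append("audio track missing language tag")
    if abs(meta.get("duration_s", 0) - expected_duration_s) > tolerance_s:
        errors.append("duration does not match catalogue metadata")
    return errors
```

A failed asset should be rejected at ingest, before any encoding spend, with the error list surfaced back to the content provider.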
Live ingest
For live, the standard ingest protocols are:
- SRT (Secure Reliable Transport): low-latency, encrypted contribution over the public internet. Becoming the standard for remote contribution.
- RTMP: legacy but still widely used for ingest from software encoders (OBS, Wirecast). Being replaced by SRT for new deployments.
- RIST (Reliable Internet Stream Transport): interoperable alternative to SRT, standardised by the VSF.
- Direct MPEG-TS over UDP/RTP: used for facility-internal contribution where latency and reliability are controlled at the network level.
Stage 2: encoding
Encoding transforms the source video into the ABR ladder that will be delivered to viewers. This is the most compute-intensive stage of the pipeline.
Codec selection
In 2026, the practical codec options are:
- H.264: universal baseline. Every device decodes it. Use for the lowest ABR rungs and as the fallback codec.
- HEVC: 30-40% more efficient than H.264. Standard for 4K/HDR. Use for modern devices.
- AV1: 40-50% more efficient than H.264. Royalty-free. Use for VOD on devices that support it.
Most services encode a multi-codec ladder and let the player select based on device support.
ABR ladder design
The ABR ladder defines the quality levels available to the player. Key decisions:
Resolution/bitrate pairs. Match bitrate to resolution based on content complexity. Use per-title or AI-driven analysis to optimise the ladder for each piece of content.
Number of rungs. More rungs give the player finer granularity for ABR switching but increase encoding cost and storage. 5-7 rungs per codec is typical.
GOP structure. For standard delivery, use 2-second GOPs aligned with segment boundaries. For low-latency delivery, use 1-2 second GOPs with CMAF chunk boundaries for progressive delivery.
Frame rate. Match the source frame rate. Do not encode 24fps source at 30fps or vice versa without a good reason. Include 50/60fps renditions for sports content.
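A simple way to sketch per-title ladder optimisation is to scale a reference ladder's bitrates by a content-complexity factor from an analysis pass. The reference rungs and complexity range below are hypothetical, chosen only to illustrate the shape of the decision:

```python
# Hypothetical per-title ladder sketch: scale a reference ladder's bitrates
# by a complexity factor (e.g. derived from a fast pre-encode analysis).

REFERENCE_LADDER = [  # (height, kbps) for medium-complexity content
    (270, 400), (360, 800), (540, 1600), (720, 3000), (1080, 5500), (1080, 7500),
]

def build_ladder(complexity: float, max_height: int = 1080):
    """complexity runs roughly 0.5 (flat animation) to 1.5 (grain, sports)."""
    return [(h, round(kbps * complexity))
            for h, kbps in REFERENCE_LADDER if h <= max_height]
```

Easy content gets the same perceptual quality at lower bitrates, so the whole ladder shifts down; complex content shifts up, or drops the top rung if it would exceed a delivery cap.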
Encoding profiles
- H.264: High Profile for 1080p+, Main Profile for SD. Level 4.0 for 1080p30, Level 4.2 for 1080p60 (Level 4.1 tops out at 1080p30).
- HEVC: Main Profile for SDR, Main 10 Profile for HDR. Level 4.1 for 1080p, Level 5.1 for 4K.
- AV1: Main Profile for both SDR and HDR (Main already supports 10-bit 4:2:0); High Profile is only needed for 4:4:4 chroma.
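The profile/level rules above can be folded into a small lookup helper. This is a sketch: the SD level choice (3.1) is an assumption, AV1 levels are omitted, and note that AV1's Main Profile already carries 10-bit HDR content:

```python
# Encodes the profile/level selection rules as a lookup helper.
# The H.264 SD level (3.1) is an assumption; AV1 levels are omitted.

def encoder_profile(codec: str, height: int, fps: float, hdr: bool):
    if codec == "h264":
        profile = "High" if height >= 1080 else "Main"
        level = "3.1" if height < 1080 else ("4.2" if fps > 30 else "4.0")
    elif codec == "hevc":
        profile = "Main 10" if hdr else "Main"
        level = "5.1" if height > 1080 else "4.1"
    elif codec == "av1":
        profile = "Main"  # Main supports 8- and 10-bit 4:2:0, so HDR included
        level = None      # AV1 level selection left out of this sketch
    else:
        raise ValueError(f"unknown codec: {codec}")
    return profile, level
```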
Stage 3: packaging
Packaging wraps encoded segments into the container format and generates manifests for streaming delivery.
CMAF (Common Media Application Format)
CMAF is the industry-standard container for OTT delivery. It uses fragmented MP4 (fMP4) segments that can be served by both HLS and DASH manifests. This means:
- Encode once
- Package into CMAF segments once
- Generate HLS (M3U8) and DASH (MPD) manifests from the same segments
- Store one set of segments, not two
Segment duration
Standard segment duration is 2-6 seconds. Shorter segments reduce latency and enable faster ABR switching but increase manifest size and CDN request volume. Longer segments improve compression efficiency but increase startup time and ABR switch latency.
2-second segments are a good default for services that value low startup time and responsive ABR. 6-second segments are better for bandwidth-constrained delivery where compression efficiency matters.
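The request-volume side of this tradeoff is easy to put numbers on. A back-of-envelope sketch, assuming each viewer fetches one video and one audio segment per segment interval:

```python
# Back-of-envelope arithmetic for the segment-duration tradeoff: shorter
# segments multiply CDN request volume. Assumes one video + one audio
# fetch per viewer per segment interval (no manifest refreshes counted).

def segment_requests_per_sec(viewers: int, segment_s: float,
                             tracks_per_viewer: int = 2) -> float:
    return viewers * tracks_per_viewer / segment_s
```

At 100,000 concurrent viewers, moving from 6-second to 2-second segments triples the CDN's steady-state request rate, which matters for edge capacity planning and per-request CDN pricing.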
Subtitle and audio packaging
Package subtitles as sidecar files (WebVTT for HLS, TTML for DASH) or as embedded tracks within CMAF segments. WebVTT sidecars are the most compatible approach across connected TV platforms.
Package multiple audio tracks (languages, descriptive audio) as separate CMAF track files referenced in the manifest. This allows the player to switch audio tracks without restarting playback.
Stage 4: content protection (DRM and encryption)
Encryption
Encrypt CMAF segments using the CBCS scheme (AES-CBC with pattern encryption). CBCS is supported by Widevine, FairPlay, and PlayReady, enabling multi-DRM delivery from a single encrypted source.
Encrypt during packaging, not as a separate post-processing step. The packager generates encrypted segments and includes PSSH (Protection System Specific Header) boxes for each DRM system.
Key management
Use a key management system that:
- Generates unique content keys per asset (or per quality level for multi-key scenarios)
- Stores keys securely with HSM-backed encryption
- Serves keys to DRM license servers on demand
- Supports key rotation for live streams (rotate keys every N hours or segments)
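Key rotation for live reduces to mapping each segment to a key period, so that every segment in the same rotation window uses the same content key. A minimal sketch, in which the deterministic key-ID derivation is purely hypothetical (a real KMS would issue and store key IDs itself):

```python
# Sketch of key-period selection for live key rotation. Every segment in
# a rotation window shares one content key; rotating every N segments is
# equivalent to rotating every N * segment_duration seconds.
import hashlib

def key_period(segment_index: int, segments_per_period: int) -> int:
    return segment_index // segments_per_period

def derive_key_id(asset_id: str, period: int) -> str:
    """Hypothetical deterministic key-ID scheme; real KMS systems issue these."""
    return hashlib.sha256(f"{asset_id}:{period}".encode()).hexdigest()[:32]
```

With 2-second segments, `segments_per_period=1800` rotates keys hourly; the packager signals each new key in the manifest so players fetch the next license before the boundary.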
DRM signal in manifests
The HLS manifest includes #EXT-X-KEY or #EXT-X-SESSION-KEY tags pointing to the FairPlay license server. The DASH MPD includes ContentProtection elements for each DRM system (Widevine, PlayReady) with the PSSH data.
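The shape of those signals can be sketched as string templates. The URLs and PSSH payload below are placeholders; the FairPlay `KEYFORMAT` value and the Widevine system UUID are the real, well-known identifiers:

```python
# Illustrative DRM signaling for manifests. License URL and PSSH payload
# are placeholders; KEYFORMAT and the Widevine system ID are standard values.

WIDEVINE_SYSTEM_ID = "edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"

def hls_key_tag(license_url: str) -> str:
    """FairPlay signaling in an HLS media playlist (SAMPLE-AES + skd:// URI)."""
    return (f'#EXT-X-KEY:METHOD=SAMPLE-AES,URI="{license_url}",'
            f'KEYFORMAT="com.apple.streamingkeydelivery",KEYFORMATVERSIONS="1"')

def dash_content_protection(pssh_b64: str) -> str:
    """Widevine ContentProtection element for a DASH MPD AdaptationSet."""
    return (f'<ContentProtection schemeIdUri="urn:uuid:{WIDEVINE_SYSTEM_ID}">'
            f'<cenc:pssh>{pssh_b64}</cenc:pssh></ContentProtection>')
```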
Stage 5: CDN delivery
Origin to CDN
Push packaged content from the origin to the CDN or let the CDN pull on demand:
- Push-based: origin pushes segments to CDN origin storage as they are produced. Lower first-request latency. Higher origin-to-CDN bandwidth usage.
- Pull-based: CDN fetches segments from origin on first request. Origin bandwidth scales with CDN cache miss rate. Simpler to configure.
For VOD, pull-based is standard. For live, push-based or aggressive pre-warming reduces first-viewer latency.
CDN configuration
- Cache headers: set Cache-Control: public, max-age=31536000 on CMAF segments (they are immutable). Set short TTLs on live manifests (1-3 seconds).
- CORS headers: configure appropriate CORS headers for browser-based players.
- Range requests: ensure the CDN supports HTTP range requests for byte-range addressing within CMAF segments.
- Compression: do not gzip-compress video segments (they are already compressed). Do gzip-compress manifests (text-based M3U8 and MPD).
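The cache and compression rules above amount to a small policy function keyed on path type. The file-extension patterns and the VOD manifest TTL are assumptions about a typical path layout:

```python
# Sketch of the CDN caching/compression policy described above. Extension
# patterns and the 60 s VOD-manifest TTL are assumptions, not standards.

def cache_policy(path: str, live: bool = False):
    """Return (Cache-Control value, should_gzip) for a delivery path."""
    if path.endswith((".m3u8", ".mpd")):
        max_age = 2 if live else 60  # live manifests refresh every segment
        return f"public, max-age={max_age}", True  # manifests are text: gzip
    if path.endswith((".m4s", ".cmfv", ".cmfa", ".mp4")):
        # CMAF segments are immutable and already compressed: long TTL, no gzip
        return "public, max-age=31536000, immutable", False
    return "no-store", False
```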
For live events, see our guide on building a scalable video CDN.
Stage 6: playback
The final stage is the player on the viewer’s device. The player is responsible for:
- Fetching and parsing the manifest
- Selecting the initial quality level
- Downloading segments and feeding them to the decoder
- Managing the ABR algorithm during playback
- Handling DRM license acquisition
- Rendering subtitles and managing audio track switching
- Reporting QoE metrics back to your analytics service
Player choices by platform
- Roku: native Video node with built-in HLS/DASH support
- Samsung Tizen: AVPlay (Samsung-specific) or MSE-based player (Shaka, hls.js)
- Google TV: ExoPlayer (Jetpack Media3)
- LG webOS: MSE-based player (Shaka, hls.js, dash.js)
- Web: Shaka Player, hls.js, dash.js, or Video.js
- iOS/tvOS: AVPlayer (native HLS) or custom MSE-based player
ABR algorithm tuning
The player’s ABR algorithm decides when to switch quality levels. Tune it for your content and audience:
- Conservative: prioritise stability, minimise quality switches, accept lower average quality. Good for devices with limited decoder switching capability.
- Aggressive: maximise quality, switch up quickly when bandwidth allows, accept more switches. Good for devices with fast decoder switching.
- Bandwidth estimation: use a combination of segment download throughput and TCP connection metrics. On constrained devices, prefer segment-level estimation over connection-level estimation.
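A minimal sketch of segment-level estimation is an exponentially weighted moving average over per-segment throughput, plus a safety margin when picking a rung. The alpha and safety values are illustrative; real players (e.g. dash.js, Shaka) layer buffer-based logic on top of this:

```python
# Segment-level throughput estimation with an EWMA, plus rung selection
# with a safety margin. Alpha and safety values here are illustrative.

class ThroughputEstimator:
    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha        # higher alpha reacts faster ("aggressive")
        self.estimate_kbps = None

    def on_segment(self, bytes_downloaded: int, seconds: float) -> float:
        sample = bytes_downloaded * 8 / 1000 / seconds  # kbps for this segment
        if self.estimate_kbps is None:
            self.estimate_kbps = sample
        else:
            self.estimate_kbps = (self.alpha * sample
                                  + (1 - self.alpha) * self.estimate_kbps)
        return self.estimate_kbps

def pick_rung(ladder_kbps: list, estimate_kbps: float,
              safety: float = 0.8) -> int:
    """Highest bitrate under safety * estimate; fall back to the lowest rung."""
    usable = [b for b in ladder_kbps if b <= estimate_kbps * safety]
    return max(usable) if usable else min(ladder_kbps)
```

A conservative tune lowers alpha and safety (slower to switch up, rarely forced down); an aggressive tune raises both.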
Pipeline automation
VOD workflow
Automate the full VOD pipeline: ingest → validate → encode → package → encrypt → publish → verify.
Use a workflow orchestrator (Step Functions, Temporal, Airflow) to manage the stages. Each stage should be idempotent and retriable. If encoding fails, retry the encode without re-ingesting. If packaging fails, retry packaging without re-encoding.
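The resume-from-failure behaviour is easiest to see as a checkpointed stage loop. This is a toy sketch of the pattern, not any particular orchestrator's API; a real system would persist the checkpoint set durably:

```python
# Sketch of a stage-checkpointed VOD workflow: each stage records its
# completion, so a retry resumes from the failed stage rather than
# re-running the whole pipeline. Checkpoint storage is in-memory here.

STAGES = ["ingest", "validate", "encode", "package", "encrypt", "publish", "verify"]

def run_pipeline(asset_id: str, handlers: dict, completed: set) -> set:
    """handlers maps stage name -> callable(asset_id); completed is the checkpoint."""
    for stage in STAGES:
        if stage in completed:
            continue  # idempotent: skip stages that already succeeded
        handlers[stage](asset_id)  # raises on failure; the caller retries later
        completed.add(stage)
    return completed
```

If `package` raises, the next retry passes the same `completed` set back in and the run starts at `package`, never repeating ingest, validation, or the expensive encode.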
Live workflow
The live pipeline runs continuously. Automate:
- Encoder health monitoring and automatic failover to backup encoder
- Packaging verification (segment continuity, manifest correctness)
- CDN push verification (segments arrive at CDN within expected latency)
- Alerting on any stage failure or degradation
Quality gates
Add automated quality gates between pipeline stages:
- Post-encode: verify VMAF/SSIM meets threshold. Reject and re-encode if quality is too low.
- Post-package: verify manifest correctness (valid segment references, correct DRM signals, proper timing).
- Post-publish: verify segments are accessible from CDN edge. Run a synthetic playback test.
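The post-encode gate can be sketched as a per-resolution VMAF floor. The threshold values below are illustrative placeholders; real floors should be calibrated against your content mix and viewing conditions:

```python
# Sketch of a post-encode quality gate: reject renditions whose VMAF falls
# below a per-resolution floor. Threshold values are illustrative only.

VMAF_FLOOR = {2160: 93, 1080: 90, 720: 85, 540: 80, 360: 75, 270: 70}

def passes_quality_gate(height: int, vmaf: float) -> bool:
    floor = VMAF_FLOOR.get(height)
    if floor is None:
        raise ValueError(f"no VMAF floor configured for {height}p")
    return vmaf >= floor

def gate_ladder(scores: dict) -> list:
    """scores maps height -> measured VMAF. Returns rungs needing re-encode."""
    return sorted(h for h, v in scores.items() if not passes_quality_gate(h, v))
```

Failed rungs go back to the encoder, typically with a higher bitrate or slower preset, before the asset is allowed to publish.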
A well-designed video pipeline is invisible to the viewer. They see content playing smoothly on their screen. The pipeline’s job is to make that happen reliably, efficiently, and at scale. For broader architecture considerations, see our streaming app architecture solutions.