Monitoring & Analytics: Metrics that Matter for Streaming Quality

The essential video streaming quality metrics for OTT services: what to measure, how to collect it, and how to use monitoring data to improve viewer experience across devices.

April 23, 2026

Streaming quality monitoring dashboard showing real-time playback metrics across viewer sessions

You cannot improve what you do not measure. For OTT streaming services, the metrics that matter are the ones that correlate with viewer behaviour: startup time, rebuffering, quality switches, playback failures, and errors. A dashboard full of server-side metrics (CPU utilisation, CDN cache hit ratio, origin response time) tells you about your infrastructure, but it does not tell you what the viewer is actually experiencing.

This guide covers the metrics that matter for streaming quality, how to collect them from real viewer sessions across connected TV platforms, and how to turn data into actionable improvements.

Quality of Experience (QoE) metrics

QoE metrics describe the viewer’s actual experience. They are measured at the player, not the server.

Video startup time (TTFF)

Time to First Frame (TTFF): the duration from when the viewer presses play to when the first video frame renders on screen. This includes:

  • Manifest fetch and parse time
  • DRM license acquisition time
  • First segment download time
  • Decoder initialisation time

Target: under 2 seconds for VOD, under 3 seconds for live.

Why it matters: startup time is the strongest predictor of session abandonment. A 1-second increase in startup time can increase abandonment by 5-10%.

What affects it: CDN performance, manifest size, segment duration, DRM license latency, device decoder warmup, and player configuration.
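
The component breakdown above can be expressed directly in code. A minimal sketch that sums the serial startup phases into a TTFF estimate; the field names are illustrative, not any particular player's API:

```python
from dataclasses import dataclass

# Hypothetical timing breakdown for one playback start attempt.
@dataclass
class StartupTiming:
    manifest_ms: float       # manifest fetch + parse
    drm_license_ms: float    # DRM license acquisition
    first_segment_ms: float  # first segment download
    decoder_init_ms: float   # decoder initialisation

    def ttff_ms(self) -> float:
        # TTFF as the sum of the serial startup phases. Real players
        # overlap some phases (e.g. DRM and first-segment fetch), so
        # treat this as an upper bound.
        return (self.manifest_ms + self.drm_license_ms
                + self.first_segment_ms + self.decoder_init_ms)

timing = StartupTiming(manifest_ms=180.0, drm_license_ms=220.0,
                       first_segment_ms=450.0, decoder_init_ms=120.0)
print(timing.ttff_ms())  # 970.0 — under the 2-second VOD target
```

A breakdown like this also shows where to optimise first: here the first segment download dominates, so shorter initial segments or a faster CDN edge would move TTFF most.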

Rebuffering ratio

The percentage of playback time spent rebuffering (waiting for data). Calculated as:

rebuffer_ratio = total_rebuffer_duration / (total_play_duration + total_rebuffer_duration)
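
A minimal sketch of the calculation, assuming durations are reported in seconds:

```python
def rebuffer_ratio(play_s: float, rebuffer_s: float) -> float:
    """Fraction of wall-clock session time spent rebuffering."""
    total = play_s + rebuffer_s
    return rebuffer_s / total if total > 0 else 0.0

# 3 seconds of stalls over a 10-minute session:
r = rebuffer_ratio(play_s=600.0, rebuffer_s=3.0)
print(f"{r:.2%}")  # 0.50% — right at the target for a well-performing service
```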

Target: under 0.5% for a well-performing service. Premium services target under 0.1%.

Why it matters: rebuffering is the most damaging quality event. A single rebuffer of more than 2 seconds during a movie significantly reduces completion rate and increases churn risk.

What affects it: CDN throughput, ABR algorithm quality, segment duration and format, network variability, and device buffer management.

Average bitrate and quality score

The average bitrate delivered during a session, weighted by play time at each quality level. Higher average bitrate generally means better visual quality, but the relationship is not linear (a 20% bitrate increase on a well-encoded stream may produce minimal visible improvement).

Some services compute a composite quality score that combines bitrate, resolution, and codec efficiency:

quality_score = f(bitrate, resolution, codec) / max_possible_score

Target: varies by service. The goal is to maximise average quality within the available bandwidth.
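
One way to concretise the f(bitrate, resolution, codec) function above. The weights and codec-efficiency factors here are assumptions for illustration, not an industry standard:

```python
# Illustrative codec-efficiency multipliers: newer codecs deliver more
# visual quality per bit. These values are assumptions for the sketch.
CODEC_EFFICIENCY = {"h264": 1.0, "hevc": 1.4, "av1": 1.7}

def quality_score(bitrate_kbps: float, height: int, codec: str,
                  max_bitrate_kbps: float = 8000,
                  max_height: int = 2160) -> float:
    # Normalise each dimension to [0, 1], weighting bitrate by codec
    # efficiency so an AV1 stream scores higher at the same bitrate.
    bitrate_term = min(bitrate_kbps * CODEC_EFFICIENCY.get(codec, 1.0)
                       / max_bitrate_kbps, 1.0)
    resolution_term = min(height / max_height, 1.0)
    return 0.6 * bitrate_term + 0.4 * resolution_term

print(round(quality_score(4000, 1080, "hevc"), 3))  # 0.62
```

The useful property of a composite score is comparability: a 4 Mbps HEVC 1080p session and a 5.6 Mbps H.264 1080p session score the same, reflecting similar visual quality at different delivery costs.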

Quality switches (ABR oscillation)

The number of ABR quality switches per minute of playback. Some switching is expected and desirable (stepping up to higher quality as bandwidth improves). But rapid oscillation (switching up and down every few seconds) creates a visible, distracting experience.

Target: fewer than 2 switches per minute during steady-state playback, with zero oscillation (immediate up-down-up patterns).
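
Both the switch rate and the oscillation check can be computed from a session's switch history. A sketch, assuming the player reports switch timestamps and post-switch bitrates:

```python
def switches_per_minute(switch_timestamps_s, play_duration_s):
    """Quality switches per minute of playback."""
    if play_duration_s <= 0:
        return 0.0
    return len(switch_timestamps_s) / (play_duration_s / 60.0)

def has_oscillation(post_switch_bitrates):
    """Detect an immediate up-down-up (or down-up-down) pattern in a
    sequence of post-switch bitrates."""
    for i in range(len(post_switch_bitrates) - 2):
        a, b, c = post_switch_bitrates[i:i + 3]
        if (a < b > c) or (a > b < c):
            return True
    return False

# Four switches over ten minutes is within target...
print(switches_per_minute([30, 95, 200, 410], 600))  # 0.4
# ...but this bitrate trace oscillates between two rungs:
print(has_oscillation([3000, 6000, 3000, 6000]))     # True
```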

Playback failure rate

The percentage of playback attempts that fail completely (no video rendered). Failures include:

  • Manifest fetch errors (404, timeout)
  • DRM license acquisition failures
  • Decoder errors (unsupported codec, hardware failure)
  • Network timeouts before first segment

Target: under 0.5% failure rate. Under 0.1% for premium services.

Error rate

The rate of errors during playback (distinct from complete failures). Errors include:

  • Segment download failures (recovered by retry or ABR switch)
  • DRM license renewal failures (may cause brief playback interruption)
  • Audio/video sync issues
  • Subtitle rendering errors

Not all errors are viewer-visible. Track both raw error counts and viewer-impacting errors (those that caused rebuffering, quality degradation, or playback interruption).

Infrastructure metrics

Infrastructure metrics do not directly describe viewer experience but help diagnose QoE problems.

CDN performance

  • Segment download time: P50, P95, P99 per CDN per region. A P99 spike indicates a CDN edge issue affecting a subset of viewers.
  • Cache hit ratio: percentage of segment requests served from CDN cache. Should be 95%+ for VOD, 90%+ for live.
  • Origin offload: percentage of requests that the CDN handles without going to origin. Track this per CDN if using multi-CDN.
  • Error rates: 4xx and 5xx responses from CDN edges.
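
Computing the per-CDN, per-region download-time percentiles above from beaconed timing samples might look like this sketch; the (cdn, region, download_ms) tuple shape is an assumption:

```python
from collections import defaultdict
from statistics import quantiles

def segment_latency_percentiles(samples):
    """P50/P95/P99 segment download time per (cdn, region).

    `samples` is an iterable of (cdn, region, download_ms) tuples.
    """
    by_key = defaultdict(list)
    for cdn, region, ms in samples:
        by_key[(cdn, region)].append(ms)
    result = {}
    for key, values in by_key.items():
        # quantiles(n=100) returns 99 cut points; indexes 49/94/98
        # correspond to P50/P95/P99.
        q = quantiles(sorted(values), n=100)
        result[key] = {"p50": q[49], "p95": q[94], "p99": q[98]}
    return result
```

Segmenting by both CDN and region is what makes a P99 spike actionable: a single (cdn, region) cell lighting up points at one edge deployment rather than a global problem.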

Origin and backend

  • Manifest generation latency: time to generate a manifest response. For live streams, this should be under 50ms.
  • DRM license latency: time to process a license request end-to-end (including the proxy and key server). Should be under 200ms.
  • API response time: for catalog, auth, and playback APIs. P95 under 100ms for playback-critical APIs.

Device-specific metrics

Different devices produce different quality outcomes from the same delivery infrastructure:

  • Startup time by device model: a Roku Express has slower startup than a Roku Ultra due to hardware differences.
  • Rebuffer ratio by platform: Samsung Tizen TVs may have different buffer management behaviour than Google TV devices.
  • Error rate by device firmware version: specific firmware versions may introduce regressions.

Segment all QoE metrics by device platform and model to identify device-specific issues.

Collection architecture

Client-side telemetry

The player sends QoE events to your analytics backend. Key events:

  1. Playback start attempt — timestamp, content ID, device info
  2. First frame rendered — timestamp (TTFF = this - start attempt)
  3. Rebuffer start — timestamp, buffer level, current bitrate
  4. Rebuffer end — timestamp, duration
  5. Quality switch — from bitrate/resolution, to bitrate/resolution
  6. Playback error — error code, error message, recovery action
  7. Playback end — total play time, total rebuffer time, average bitrate
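
A minimal sketch of what one such beacon might look like as JSON; the field names are illustrative, not a standard schema (CMCD defines a standardised alternative):

```python
import json
import time
import uuid

def make_event(session_id: str, event_type: str, **fields) -> dict:
    """Build one QoE beacon with a shared session ID and timestamp."""
    return {
        "session_id": session_id,
        "event": event_type,
        "ts_ms": int(time.time() * 1000),
        **fields,
    }

session = str(uuid.uuid4())
start = make_event(session, "play_attempt", content_id="movie-123",
                   device={"platform": "tizen", "model": "example-model"})
first_frame = make_event(session, "first_frame")

# TTFF is derived server-side from the two event timestamps:
ttff_ms = first_frame["ts_ms"] - start["ts_ms"]
print(json.dumps(start)[:40], "...")
```

Keeping the session ID on every event is what lets the backend reassemble a full session timeline from independently delivered beacons.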

Transport

Send events via a lightweight HTTP POST to an analytics endpoint. Batch events (send every 10-30 seconds) to reduce request volume. Use a separate analytics domain or path that does not compete with manifest and segment requests.

On connected TV platforms with limited concurrent connection capacity, batching is especially important. A Roku device that sends analytics events individually for every ABR switch or segment download will exhaust its connection pool.
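
A sketch of client-side batching along these lines, assuming a placeholder analytics endpoint; a real player would flush on a timer and on session end:

```python
import json
import threading
import urllib.request

class EventBatcher:
    """Buffer QoE events and send them in one POST per flush, so a
    session uses one analytics connection instead of one per event."""

    def __init__(self, endpoint: str, interval_s: float = 15.0):
        self.endpoint = endpoint      # placeholder analytics URL
        self.interval_s = interval_s  # intended flush cadence (10-30s)
        self._buffer = []
        self._lock = threading.Lock()

    def record(self, event: dict) -> None:
        with self._lock:
            self._buffer.append(event)

    def flush(self) -> int:
        """Send all buffered events in a single request; returns count."""
        with self._lock:
            batch, self._buffer = self._buffer, []
        if not batch:
            return 0
        req = urllib.request.Request(
            self.endpoint,
            data=json.dumps({"events": batch}).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=5)
        return len(batch)
```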

Backend processing

Ingest events into a stream processing pipeline (Kafka, Kinesis, or similar). Process in near-real-time for dashboarding and alerting. Store raw events for long-term analysis.

For real-time dashboards, compute rolling aggregates:

  • Median and P95 startup time over the last 5 minutes
  • Rebuffer ratio over the last 15 minutes, segmented by CDN and region
  • Playback failure rate over the last hour
  • Active concurrent sessions by platform
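
The first of these rolling aggregates could be computed like this sketch, assuming TTFF samples arrive as (unix timestamp, value) pairs; a stream processor would maintain the window incrementally rather than rescanning:

```python
import time
from statistics import median, quantiles

def rolling_startup_stats(ttff_samples, window_s=300, now=None):
    """Median and P95 TTFF over the last `window_s` seconds.

    `ttff_samples` is a list of (unix_ts, ttff_ms) tuples.
    """
    now = now if now is not None else time.time()
    recent = [ms for ts, ms in ttff_samples if now - ts <= window_s]
    if len(recent) < 2:
        return None  # not enough data in the window
    return {
        "median_ms": median(recent),
        # quantiles(n=20) returns 19 cut points; index 18 is P95.
        "p95_ms": quantiles(recent, n=20)[18],
    }
```

Tracking the median and the P95 together matters: a stable median with a climbing P95 means the tail (often one device class or one CDN region) is degrading while the typical session looks fine.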

From metrics to action

Alert on anomalies, not thresholds

Static threshold alerts (“alert if startup time exceeds 3 seconds”) produce false positives during normal variation and false negatives during gradual degradation. Use anomaly detection: alert when a metric deviates significantly from its recent baseline.
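
A deliberately simple baseline-deviation check illustrates the idea; production systems would use something more robust (seasonality-aware baselines, for example), but the shape is the same:

```python
from statistics import mean, stdev

def is_anomalous(current: float, baseline: list,
                 threshold_sigma: float = 3.0) -> bool:
    """Alert when `current` deviates more than `threshold_sigma`
    standard deviations from its recent baseline."""
    if len(baseline) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold_sigma

# Startup times hovering around 1.8s: a jump to 4.1s fires, while a
# normal reading does not. A slow drift to 2.8s would also fire here,
# even though a static "alert above 3s" rule would miss it.
baseline = [1.8, 1.75, 1.85, 1.9, 1.8, 1.78, 1.82]
print(is_anomalous(4.1, baseline))   # True
print(is_anomalous(1.83, baseline))  # False
```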

Root cause analysis workflow

When a QoE metric degrades:

  1. Segment by dimension: is it all platforms or one platform? All regions or one region? All CDNs or one CDN?
  2. Correlate with infrastructure metrics: does the CDN error rate spike at the same time? Did origin latency increase? Did a deployment happen?
  3. Check for platform-specific regressions: did a firmware update roll out? Did a specific device model start behaving differently?
  4. Reproduce on device: once you have narrowed the issue to a specific platform and condition, reproduce it on physical hardware and examine player-level diagnostics.

Long-term quality tracking

Track QoE metrics over weeks and months to identify trends:

  • Is average startup time slowly increasing as the catalog grows (larger manifests)?
  • Is rebuffer ratio improving after CDN configuration changes?
  • Are specific device models degrading as they age and run lower-performance firmware?

Quality improvement is iterative. The metrics tell you where to focus, and the device-level investigation tells you what to fix. For structured approaches to device-level diagnostics, see our video delivery performance solutions.
