Best Practice for Content Preparation - Technical recommendations

Preparing your media for OTT is important to enable efficient delivery and a good end-user experience.

Also, it is key for processing your content later for purposes like archiving, clipping and replay.

Having source content correctly formatted will make the processing by Unified Packager and Unified Origin easier and improve performance.


This section is about the format of the content at the media source. For dynamic delivery of VOD or Live, Unified Origin will repackage this source on-the-fly (i.e., ‘just-in-time’) for delivery in the requested output format (to support different end user devices). For static delivery of VOD, this source may be repackaged with Unified Packager into the intended delivery format.

Should Fix: source content is stored as (f)MP4 (preferably CMAF)

In general, source content should be stored as fragmented MP4 (fMP4), preferably CMAF. Some advantages:

  • Efficient for cloud storage

  • Used by DASH-IF Live ingest

  • Used in HLS and DASH protocols

  • Popular international standard

Single-track-per-file fMP4 that is not strictly CMAF compliant is also acceptable in many cases.

Must Fix: each video segment starts with an IDR frame

The first sample in a segment contains an Instantaneous Decoding Refresh (IDR) frame that is signaled as a sync sample, so that the segment can be considered discrete from a decoding perspective. This enables the player to switch between adaptive bitrate video components without significant degradation of the rendered video.
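
As an illustration of the sync-sample signaling, the sketch below tests the 32-bit sample_flags value that fragmented MP4 carries per sample (for example in a trun box). It assumes the flags have already been extracted from the file. The function names and the two example flag values are illustrative only. A sync-sample flag alone does not prove that the frame is an IDR frame; that still requires inspecting the NAL units.

```python
def is_sync_sample(sample_flags: int) -> bool:
    """Return True when the ISO BMFF sample_flags value marks a sync sample.

    Bit 16 of the 32-bit field is sample_is_non_sync_sample.
    """
    sample_is_non_sync_sample = (sample_flags >> 16) & 0x1
    return sample_is_non_sync_sample == 0


def segment_starts_with_sync_sample(flags_per_sample: list[int]) -> bool:
    """Check that the first sample of a segment is signaled as a sync sample."""
    return bool(flags_per_sample) and is_sync_sample(flags_per_sample[0])


# Illustrative flag values (assumptions, not taken from a real file):
KEYFRAME_FLAGS = 0x02000000      # sample_depends_on = 2, non-sync bit clear
NON_KEYFRAME_FLAGS = 0x01010000  # sample_depends_on = 1, non-sync bit set

print(segment_starts_with_sync_sample([KEYFRAME_FLAGS, NON_KEYFRAME_FLAGS]))
print(segment_starts_with_sync_sample([NON_KEYFRAME_FLAGS, KEYFRAME_FLAGS]))
```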

Must Fix: Audio track metadata includes language

If more than one audio track is used, the mdhd box and/or elng box shall contain the audio language.

Must Fix: Timed text metadata includes language

If one or more timed text tracks are used, the mdhd box and/or elng box shall contain the language of the timed text track.
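
For audio and timed text tracks alike, the language in the mdhd box is stored as a packed 15-bit ISO 639-2/T code (three 5-bit values, each offset by 0x60). The sketch below decodes and encodes that field. It assumes the 16-bit value has already been read from the box, and treating 'und' (undetermined) as "no language" is an assumption made for this example.

```python
def decode_mdhd_language(packed: int) -> str:
    """Decode the packed ISO 639-2/T language code stored in an mdhd box."""
    chars = [((packed >> shift) & 0x1F) + 0x60 for shift in (10, 5, 0)]
    return "".join(chr(c) for c in chars)


def encode_mdhd_language(code: str) -> int:
    """Pack a three-letter lowercase language code into the mdhd representation."""
    if len(code) != 3 or not code.isascii() or not code.isalpha() or not code.islower():
        raise ValueError("expected a three-letter lowercase code such as 'eng'")
    packed = 0
    for ch in code:
        packed = (packed << 5) | (ord(ch) - 0x60)
    return packed


def has_language(packed: int) -> bool:
    """Treat 'und' (undetermined) as a missing language (assumption)."""
    return decode_mdhd_language(packed) != "und"


print(decode_mdhd_language(encode_mdhd_language("eng")))
print(has_language(encode_mdhd_language("und")))
```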

Must Fix: Audio and video tracks shall contain a bitrate box

Audio and video tracks shall contain a bitrate box (‘btrt’) indicating the average and maximum bitrate. In case of MP4 audio (AAC), this may be omitted if the bitrate is signaled in the MP4 audio sample entry.
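
To make the content of the bitrate box concrete, the following sketch parses a standalone ‘btrt’ box, which carries three 32-bit fields: bufferSizeDB, maxBitrate and avgBitrate. It assumes the raw bytes of the box have already been located inside the sample entry; the function name and the example values are illustrative.

```python
import struct


def parse_btrt(box: bytes) -> dict:
    """Parse a complete 'btrt' box (8-byte header plus three 32-bit fields)."""
    if len(box) < 20:
        raise ValueError("btrt box is truncated")
    size, box_type = struct.unpack(">I4s", box[:8])
    if box_type != b"btrt" or size != 20:
        raise ValueError("not a btrt box")
    buffer_size_db, max_bitrate, avg_bitrate = struct.unpack(">III", box[8:20])
    return {
        "buffer_size_db": buffer_size_db,
        "max_bitrate": max_bitrate,  # bits per second
        "avg_bitrate": avg_bitrate,  # bits per second
    }


# Example box built by hand: 128 kbit/s maximum, 96 kbit/s average.
example = struct.pack(">I4sIII", 20, b"btrt", 0, 128000, 96000)
print(parse_btrt(example))
```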

Must Fix: AVC and HEVC video tracks shall signal framerate

To be able to calculate and signal the framerate of the video, the following must be signaled in an AVC video track (avc1/avc3/encv):

  • The timescale and number of units in a tick must be set in the respective VUI parameters (‘time_scale’ and ‘num_units_in_tick’)

  • In addition, the VUI flags ‘timing_info_present_flag’ and ‘fixed_frame_rate_flag’ must both be set to 1 (true), to signal that the timing info is present and that the framerate is fixed.

For HEVC tracks, constant_frame_rate shall be 1 or 2 in the HEVCDecoderConfigurationRecord in the sample entry of type hev1, hvc1 or encv. The average_frame_rate shall also be set.
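
The relationship between these values and the framerate can be sketched as follows. For AVC, the example assumes progressive, frame-coded video with a fixed framerate, where the framerate equals time_scale / (2 × num_units_in_tick). For HEVC, it assumes the average frame rate field is expressed in frames per 256 seconds. The function names are illustrative.

```python
from fractions import Fraction


def avc_frame_rate(time_scale: int, num_units_in_tick: int) -> Fraction:
    """Framerate implied by the AVC VUI timing info.

    Assumes progressive frame coding with fixed_frame_rate_flag set,
    where one frame spans two ticks.
    """
    if time_scale <= 0 or num_units_in_tick <= 0:
        raise ValueError("timing info must be positive")
    return Fraction(time_scale, 2 * num_units_in_tick)


def hevc_average_frame_rate_field(fps: Fraction) -> int:
    """Value for the HEVC average frame rate field (frames per 256 seconds)."""
    return round(fps * 256)


pal = avc_frame_rate(50, 1)            # 25 fps
ntsc = avc_frame_rate(60000, 1001)     # 30000/1001 fps
print(pal, float(ntsc))
print(hevc_average_frame_rate_field(pal))
```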

Should Fix: Media presentation start times shall be 0

All tracks shall start at the same media presentation time, and that time shall be zero.

Must Fix: stss box must be present

An stss box must be present. Its absence indicates that every sample is a sync sample. This must be fixed; otherwise, the HLS playlist generated by Origin would present the video tracks as keyframe-only tracks, which is not the desired outcome.

Must Fix: VP Codec Configuration Record Check

This check validates the content of the VP Codec Configuration Box. In particular, it checks that:

  • “version” is 1

  • “profile” has a legal and defined value

  • “bitDepth” has a legal value

  • “chromaSubSampling” has a legal value and is consistent with “matrixCoefficients”

  • “codecIntializationData” is not used for VP8 and VP9
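
The checks listed above could be sketched along the following lines. This is not the validator's actual implementation. The sets of legal values (profiles 0 to 3, bit depths 8, 10 and 12, chroma subsampling 0 to 3) and the rule that matrix coefficients 0 (RGB) require 4:4:4 subsampling are assumptions based on a reading of the VP codec ISO media file format binding, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VpCodecConfig:
    """Hypothetical container for already-parsed configuration record fields."""
    version: int
    profile: int
    bit_depth: int
    chroma_subsampling: int
    matrix_coefficients: int
    codec_initialization_data_size: int


def check_vp_codec_config(cfg: VpCodecConfig) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if cfg.version != 1:
        problems.append("version must be 1")
    if cfg.profile not in (0, 1, 2, 3):           # assumed legal profiles
        problems.append("profile is not a defined value")
    if cfg.bit_depth not in (8, 10, 12):          # assumed legal bit depths
        problems.append("bitDepth is not a legal value")
    if cfg.chroma_subsampling not in (0, 1, 2, 3):
        problems.append("chroma subsampling is not a legal value")
    elif cfg.matrix_coefficients == 0 and cfg.chroma_subsampling != 3:
        # Assumption: RGB (matrix coefficients 0) is only valid with 4:4:4.
        problems.append("RGB matrix coefficients require 4:4:4 subsampling")
    if cfg.codec_initialization_data_size != 0:
        problems.append("codec initialization data must not be used for VP8/VP9")
    return problems


print(check_vp_codec_config(VpCodecConfig(1, 0, 8, 1, 1, 0)))
print(check_vp_codec_config(VpCodecConfig(0, 0, 8, 1, 0, 4)))
```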

Must Fix: Fragmented mp4 must be indexed

Either a ‘sidx’ or an ‘mfra’ box must be present in the file.
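
A quick way to verify this is to list the top-level boxes of the file and look for either index box. The sketch below walks the top-level box headers of an in-memory byte string; the make_box helper only exists to build demo data, and all function names are illustrative. For real files you would typically read headers and seek instead of loading the whole file.

```python
import struct


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a minimal box (32-bit size, type, payload) for the demo below."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def top_level_box_types(data: bytes) -> list[str]:
    """Return the four-character types of all top-level boxes, in file order."""
    types = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header_size = 8
        if size == 1:  # 64-bit size follows the type
            if offset + 16 > len(data):
                raise ValueError("truncated 64-bit box header")
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header_size = 16
        elif size == 0:  # box extends to the end of the file
            size = len(data) - offset
        if size < header_size:
            raise ValueError("corrupt box size")
        types.append(box_type.decode("latin-1"))
        offset += size
    return types


def is_indexed(data: bytes) -> bool:
    """True when a top-level 'sidx' or 'mfra' box is present."""
    types = top_level_box_types(data)
    return "sidx" in types or "mfra" in types


demo = (make_box(b"ftyp", b"cmfc") + make_box(b"moov") + make_box(b"sidx")
        + make_box(b"moof") + make_box(b"mdat", b"\x00" * 4))
print(top_level_box_types(demo), is_indexed(demo))
```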

Should Fix: video fragments have an equal duration (except the last)

All video fragments except the last shall have an equal duration. In case of audio, small variation in durations may occur. If you are able to align audio segments to video segment boundaries, audio segment durations may also be constant.
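
A minimal sketch of this check, assuming the fragment durations of a video track have already been collected in track timescale units (the function name and sample values are illustrative):

```python
def unequal_fragments(durations: list[int]) -> list[int]:
    """Return indexes of fragments, excluding the last, whose duration
    differs from the duration of the first fragment."""
    if len(durations) < 2:
        return []
    expected = durations[0]
    return [i for i, d in enumerate(durations[:-1]) if d != expected]


# Durations in a 90000 timescale: 2-second fragments with a shorter last one.
print(unequal_fragments([180000, 180000, 180000, 45000]))
# The second fragment is longer than the others and is reported.
print(unequal_fragments([180000, 192000, 180000, 45000]))
```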

Should Fix: timescale of audio tracks matches their sample rate (48 kHz preferably)

To avoid potential timing issues, audio tracks should use a timescale that matches the sample rate. If the timescale does not match the sample rate (or is not an integer multiple of it), some samples will not be accurately addressable, which may cause discontinuities.
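
The arithmetic behind this recommendation can be illustrated with exact fractions. The sketch below computes the duration of a number of audio samples in timescale ticks and shows that the result is only guaranteed to be a whole number when the timescale is an integer multiple of the sample rate. The function names and the 1024-sample frame size (typical for AAC) are assumptions for the example.

```python
from fractions import Fraction


def duration_in_ticks(samples: int, sample_rate: int, timescale: int) -> Fraction:
    """Exact duration of `samples` audio samples, expressed in timescale ticks."""
    return Fraction(samples * timescale, sample_rate)


def is_sample_accurate(sample_rate: int, timescale: int) -> bool:
    """True when every individual sample falls on a whole tick."""
    return timescale % sample_rate == 0


# Timescale equal to the sample rate: a 1024-sample frame is exactly 1024 ticks.
print(duration_in_ticks(1024, 48000, 48000), is_sample_accurate(48000, 48000))

# 44.1 kHz audio in a 90000 timescale: the frame duration is not a whole
# number of ticks, so rounding errors accumulate from frame to frame.
print(duration_in_ticks(1024, 44100, 90000), is_sample_accurate(44100, 90000))
```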

Should Fix: codec parameters are carried out-of-band, instead of in-band

For codecs like AVC/H.264 and HEVC/H.265, codec parameters can be carried in the SampleEntry or in the NAL units in the samples. Carriage of codec parameters in the SampleEntry is preferred. This corresponds to codec configurations such as ‘avc1’ and ‘hvc1’ (i.e., as opposed to ‘avc3’ and ‘hev1’). In case of avc3 or hev1, the codec parameters shall also be present in the SampleEntry.

Note that as of version 1.10.28, Packager can convert avc3 content to avc1 when the --no_inband_parameter_sets option is used when packaging content.

Should Fix: pasp and clap boxes should not precede codec-specific boxes

The VisualSampleEntry box (ISO/IEC 14496-12:2020) includes optional CleanApertureBox (‘clap’) and PixelAspectRatioBox (‘pasp’) boxes.

These boxes are used, if present, to specify the pixel aspect ratio and clean aperture of the video. For maximum compatibility, these boxes should follow, not precede, any boxes defined in or required by derived specifications.
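
The ordering rule can be checked on the list of child box types of a sample entry, as in the sketch below. It assumes the child boxes have already been parsed into a list of four-character types in file order. The set of codec-specific boxes used here is an assumption for illustration and is not exhaustive.

```python
# Assumption: a few codec configuration boxes defined by derived specifications.
CODEC_SPECIFIC_BOXES = {"avcC", "hvcC", "vpcC"}
TRAILING_BOXES = {"pasp", "clap"}


def trailing_boxes_misplaced(child_types: list[str]) -> bool:
    """True when a pasp or clap box appears before a codec-specific box."""
    seen_trailing = False
    for box_type in child_types:
        if box_type in TRAILING_BOXES:
            seen_trailing = True
        elif seen_trailing and box_type in CODEC_SPECIFIC_BOXES:
            return True
    return False


print(trailing_boxes_misplaced(["avcC", "btrt", "pasp"]))  # recommended order
print(trailing_boxes_misplaced(["pasp", "avcC"]))          # pasp precedes avcC
```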