.. meta::
   :description lang=en: Unified Streaming Guidelines for Implementation
   :keywords: guidelines, implementation

.. |validator| replace:: *Validator*
.. _validator: https://validator.unified-streaming.com

Best Practice for Content Preparation - Technical recommendations
====================================================================

Preparing your media for OTT is important to enable efficient delivery and a
good end-user experience. It is also key for processing your content later for
purposes like archiving, clipping and replay. Having source content correctly
formatted will make the processing by Unified Packager and Unified Origin
easier and improve performance.

.. contents:: Table of Contents
   :local:
   :depth: 1

.. note::

   This section is about the format of the content at the media source. For
   dynamic delivery of VOD or Live, Unified Origin will repackage this source
   on-the-fly (i.e., 'just-in-time') for delivery in the requested output
   format (to support different end user devices). For static delivery of VOD,
   this source may be repackaged with Unified Packager into the intended
   delivery format.

Should Fix: MPD Manifest does not respect schema
--------------------------------------------------

The content of the DASH MPD Manifest is checked against its XML Schema. Any
non-conformance is reported as a mustFix. The applied schema can be downloaded
from the DASH-IF MPEG Conformance and reference source git repository and is
also attached for reference :download:`xsd `

Should Fix: source content is stored as (f)MP4 (preferably CMAF)
------------------------------------------------------------------

In general, source content must be stored as fragmented MP4 (preferably CMAF).
Some advantages:

- Efficient for cloud storage
- Used by DASH-IF Live ingest
- Used in HLS and DASH protocols (popular international standards)
- Single track per file

Fragmented MP4 that is not strictly CMAF compliant is OK in many cases, too. A
minimal check for whether a file is already fragmented is sketched below.

.. Should Fix: a suitable bitrate ladder (content dependent)
.. ----------------------------------------------------------
..
.. One should prepare video content as a set of different bitrate tracks, with each
.. of those tracks representing a different quality level. The selection of
.. different bitrates is called a bitrate ladder.
..
.. A potential ladder could be (this should only be regarded as an example of what
.. a ladder might look like, not as a recommendation to use this particular one):
..
.. ================= ==============
.. Resolution (16:9) Bitrate (H264)
.. ================= ==============
.. 416x234           145 kb/s
.. ----------------- --------------
.. 640x360           365 kb/s
.. ----------------- --------------
.. 768x432           730 kb/s
.. ----------------- --------------
.. 768x432           1100 kb/s
.. ----------------- --------------
.. 960x540           2000 kb/s
.. ----------------- --------------
.. 1280x720          4500 kb/s
.. ----------------- --------------
.. 1920x1080         6000 kb/s
.. ----------------- --------------
.. 1920x1080         7800 kb/s
.. ================= ==============
..
.. Choosing a good ladder is important for quality and efficient delivery. What is
.. 'good' depends on the capabilities of the end users' devices, network capacity,
.. the codec that is used, and the content itself. Ideally, you adjust the bitrate
.. ladder per asset in your library (as some content requires higher bitrates to
.. achieve the same quality, and other content can be encoded more efficiently).
..
.. .. note::
..
..    The aspect ratio must remain the same across your entire bitrate ladder.
..    CMAF Switching Sets provide requirements for tracks to be switchable.
..
.. .. note::
..
..    Audio may have a bit-rate ladder or multiple codecs as well, to support
..    different network conditions.
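
As a quick check that source content is indeed fragmented, the top-level boxes
of the file can be scanned for a 'moof' box. The following is a minimal sketch
using only the Python standard library; it is not part of Unified Packager or
Unified Origin, and it only detects fragmentation, it does not validate CMAF
conformance.

.. code-block:: python

   import struct
   import sys

   def top_level_boxes(path):
       """Yield (box type, size) for each top-level box in an ISO BMFF file."""
       with open(path, "rb") as f:
           while True:
               header = f.read(8)
               if len(header) < 8:
                   return
               size, box_type = struct.unpack(">I4s", header)
               header_size = 8
               if size == 1:  # 64-bit 'largesize' follows the box type
                   size = struct.unpack(">Q", f.read(8))[0]
                   header_size = 16
               yield box_type.decode("ascii", "replace"), size
               if size == 0:  # box extends to the end of the file
                   return
               if size < header_size:  # malformed size; stop scanning
                   return
               f.seek(size - header_size, 1)  # skip to the next top-level box

   if __name__ == "__main__":
       types = [t for t, _ in top_level_boxes(sys.argv[1])]
       # A fragmented MP4 contains one or more movie fragments ('moof' boxes).
       print("fragmented" if "moof" in types else "not fragmented", types)
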
Must Fix: each video segment starts with an IDR frame
-------------------------------------------------------

The first sample in a segment contains an Instantaneous Decoder Refresh (IDR)
frame that is signaled as being a sync-sample, so that the segment can be
considered discrete from a decoding perspective. This enables the player to
switch between adaptive bitrate video components without significant
degradation of the rendered video.

Must Fix: Audio track metadata includes language
--------------------------------------------------

If more than one audio track is used, the mdhd box and/or elng box shall
contain the language of each audio track.

Must Fix: Timed text metadata includes language
-------------------------------------------------

If one or more timed text tracks are used, the mdhd box and/or elng box shall
contain the language of each timed text track.

Must Fix: Audio and Video tracks shall contain bit-rate box
-------------------------------------------------------------

Audio and video tracks shall contain a bit-rate ('btrt') box indicating the
average and maximum bit-rate. For MP4 audio (AAC), this box may be omitted if
the bit-rate is signaled in the MP4 audio sample entry.

Should Fix: avc and hevc video tracks should include framerate information
-----------------------------------------------------------------------------

To be able to calculate and signal the framerate of the video, the following
must be signaled in an AVC video track (avc1/avc3/encv):

- The timescale and number of units in a tick must be set in the respective
  VUI parameters ('time_scale' and 'num_units_in_tick')
- In addition, the Boolean VUI parameters 'timing_info_present_flag' and
  'fixed_frame_rate_flag' must be set to 'true', to signal that the timing
  info is present and that the framerate is fixed.

For HEVC tracks, 'constant_frame_rate' should be 1 or 2 in the
HEVCDecoderConfigurationRecord in the sample entry of type hev1, hvc1 or encv.
The 'average_frame_rate' should also be set (not zero). If it is set to zero,
the frame rate is derived only from the HEVC VPS carried in the sample entry,
using 'vps_num_units_in_tick' and 'vps_time_scale' as
vps_frame_rate = vps_time_scale / vps_num_units_in_tick.

Should Fix: Media presentation start times shall be 0
--------------------------------------------------------

All tracks shall start at the same media presentation time, and that time
shall be zero.

.. Should Fix: Duration of tracks shall not differ more than one fragment duration
.. --------------------------------------------------------------------------------
..
.. The difference in duration of tracks shall not be more than one fragment duration.

Must Fix: stss box must be present
------------------------------------

An stss box must be present. Its absence indicates that every sample is a sync
sample. This must be fixed, or the HLS playlists generated by Origin would
only include keyframes for video tracks, which is most definitely not the
desired outcome.
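
As an illustration of how the presence of an stss box could be verified, the
sketch below recursively walks the box hierarchy of an MP4 file and reports
whether an 'stss' box is present anywhere under 'moov'. It uses only the
Python standard library, reads the whole file into memory for simplicity, and
relies on a small assumed set of container boxes; it is an illustration, not a
conformance checker.

.. code-block:: python

   import struct
   import sys

   # Container boxes whose payload is itself a sequence of boxes
   # (a minimal set for locating 'stss'; not exhaustive).
   CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}

   def find_boxes(data, wanted, found=None):
       """Recursively collect the types of boxes listed in 'wanted'."""
       if found is None:
           found = []
       pos = 0
       while pos + 8 <= len(data):
           size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
           if size < 8:  # 64-bit and to-end-of-file sizes are not handled here
               break
           if box_type in wanted:
               found.append(box_type.decode("ascii"))
           if box_type in CONTAINERS:
               find_boxes(data[pos + 8:pos + size], wanted, found)
           pos += size
       return found

   if __name__ == "__main__":
       with open(sys.argv[1], "rb") as f:
           data = f.read()
       print("stss present" if find_boxes(data, {b"stss"}) else "no stss box found")
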
Must Fix: VP Codec Configuration Record Check
-----------------------------------------------

A number of checks are performed on the content of the VP Codec Configuration
Box. See https://www.webmproject.org/vp9/mp4/ for reference. In particular, it
is checked that:

- "version" is 1
- "profile" has a legal and defined value
- "bitDepth" has a legal value
- "chromaSubsampling" has a legal value and is coherent with
  "matrixCoefficients"
- "codecIntializationData" is not used for VP8 and VP9

Must Fix: Fragmented mp4 must be indexed
------------------------------------------

Either a 'sidx' or an 'mfra' box must be present in the file.

.. _stream-alignment:

Should Fix: video fragments have an equal duration (except the last)
-----------------------------------------------------------------------

All video fragments except the last shall have an equal duration. In the case
of audio, small variations in duration may occur. If you are able to align
audio segments to video segment boundaries, audio segment durations may also
be constant; a sketch for picking such an aligned duration is included further
below.

.. Should Fix: fragment boundaries are aligned across all tracks (audio, video, text)
.. -----------------------------------------------------------------------------------
..
.. To ensure maximum compatibility, audio, video and text tracks should be
.. perfectly aligned at all fragment boundaries.
..
.. Generally speaking, this requires the following:
..
.. - Video with an integer frame rate (i.e., no dropped frame rate)
.. - Audio with a sample rate of 48 kHz (i.e., not 44.1 kHz)
..
.. What you want to achieve is a fragment duration that fits an integer number of audio
.. and video frames (fragmenting timed text is much more flexible). To calculate this
.. duration, it is important to know that an AAC audio frame consists of 1024
.. samples. Examples are 0.96, 1.92, ... second audio segments with AAC samples at
.. 48 kHz and 25 fps video with similar durations.

.. Should Fix: in case of video tracks with B-frames, negative composition offsets are used (and no edit lists)
.. -------------------------------------------------------------------------------------------------------------
..
.. The order in which frames need to be decoded (DTS) is not always equal to the
.. order in which they should be presented (PTS). That's why each frame has a
.. decode timestamp (DTS) and a presentation timestamp (PTS). CMAF recommends that
.. in such a case composition time offsets are used. Note that this requires trun
.. version 1.

Should Fix: timescale of audio tracks matches their sample rate (48 kHz preferably)
--------------------------------------------------------------------------------------

To avoid potential timing issues, audio tracks should use a timescale that
matches the sample rate. If the sample rate and timescale do not match (i.e.,
are not integer multiples of each other), some samples will not be accurately
addressable, which may cause discontinuities.

Should Fix: codec parameters are carried out-of-band, instead of in-band
---------------------------------------------------------------------------

For codecs like AVC/H.264 and HEVC/H.265, codec parameters can be carried in
the SampleEntry or in the NAL units in the samples. Carriage of codec
parameters in the SampleEntry is preferred. This corresponds to codec
configurations such as 'avc1' and 'hvc1' (i.e., as opposed to 'avc3' and
'hev1'). In the case of 'avc3' or 'hev1', the codec parameters shall also be
present in the SampleEntry.

Note that as of version 1.10.28, Packager can be used to convert avc3 content
to avc1 using the :ref:`option-no_inband_parameter_sets` option when packaging
content.
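
Relating to the recommendation above that audio segment durations may also be
kept constant when they are aligned to video segment boundaries: this works
out when the segment duration contains a whole number of video frames and a
whole number of audio frames. The sketch below computes such durations; it
assumes an integer video frame rate and AAC audio with 1024 samples per frame,
and the function name is purely illustrative.

.. code-block:: python

   from fractions import Fraction
   from math import gcd, lcm  # math.lcm requires Python 3.9 or newer

   def aligned_segment_durations(fps, sample_rate, samples_per_frame=1024, count=6):
       """Return segment durations (in seconds) that contain a whole number of
       video frames *and* a whole number of audio frames."""
       video_frame = Fraction(1, fps)                          # e.g. 1/25 s
       audio_frame = Fraction(samples_per_frame, sample_rate)  # e.g. 1024/48000 s
       # The least common multiple of two reduced fractions a/b and c/d
       # is lcm(a, c) / gcd(b, d).
       base = Fraction(lcm(video_frame.numerator, audio_frame.numerator),
                       gcd(video_frame.denominator, audio_frame.denominator))
       return [float(base * k) for k in range(1, count + 1)]

   # 25 fps video with 48 kHz AAC audio:
   # [0.32, 0.64, 0.96, 1.28, 1.6, 1.92]
   print(aligned_segment_durations(25, 48000))
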
Should Fix: pasp and clap boxes should not precede codec-specific boxes
--------------------------------------------------------------------------

The VisualSampleEntry box (14496-12:2020, 12.1.3.2) includes the optional
CleanApertureBox ('clap') and PixelAspectRatioBox ('pasp') boxes. These boxes
are used, if present, to specify the pixel aspect ratio and clean aperture of
the video. For maximum compatibility, these boxes should follow, not precede,
any boxes defined in or required by derived specifications. A sketch for
listing the child boxes of a sample entry, so that this ordering can be
checked, follows below.

.. Must Fix: additional IDR frames are present at splice points (SCTE 35 use cases only)
.. ---------------------------------------------------------------------------------------
..
.. A splice point is a specific timestamp in a stream, signaled by a SCTE 35 marker. In
.. the streams, this timestamp must correspond to an IDR frame in a video track (which
.. needs to be signaled as a sync-sample). If this is the case, the splice point offers
.. the opportunity to seamlessly switch the stream to a different clip. Splice points
.. can be used to cue:
..
.. - Content replacement and insertion opportunities (e.g., ads)
.. - Start and end point of a program
..
.. In Live streaming use cases the Live encoder is responsible for adding the
.. additional IDR frames at splice points. For VOD use cases this task can either
.. be fulfilled by the encoder itself, or by relying on the transcoding
.. functionality that is part of :ref:`remix-avod`.
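
To verify the ordering described above, the child boxes of a VisualSampleEntry
can be listed and compared: the codec-specific configuration box (e.g. 'avcC'
or 'hvcC') should appear before 'clap' and 'pasp'. The sketch below is a
minimal illustration in Python; it assumes it is given the complete bytes of a
sample entry with an 8-byte box header followed by the 78 bytes of fixed
VisualSampleEntry fields, and it is not a conformance checker.

.. code-block:: python

   import struct

   # The fixed VisualSampleEntry fields (ISO/IEC 14496-12) occupy 78 bytes
   # after the 8-byte box header; child boxes such as 'avcC'/'hvcC', 'btrt',
   # 'pasp' and 'clap' follow these fixed fields.
   FIXED_FIELDS = 8 + 78

   def sample_entry_children(entry_bytes):
       """Return the child box types of a VisualSampleEntry ('avc1', 'hvc1',
       ...) in the order in which they appear."""
       children = []
       pos = FIXED_FIELDS
       while pos + 8 <= len(entry_bytes):
           size, box_type = struct.unpack(">I4s", entry_bytes[pos:pos + 8])
           if size < 8:  # 64-bit and to-end-of-file sizes are not handled here
               break
           children.append(box_type.decode("ascii"))
           pos += size
       return children

   # For a compliant entry the configuration box comes first, for example
   # ['avcC', 'btrt', 'pasp'] rather than ['pasp', 'avcC', 'btrt'].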