CMAF Formatting

The Validator implements checks for conformance with ISO/IEC 23000-19 clause 7. This format (CMAF) is commonly used in streaming and is specified in an official ISO/IEC standard.

Should Fix: 23000-19 7.1 Include CMAF brands

This clause derives the CMAF track format from the ISO Base Media File Format and specifies structural brands. At this point, the cmfc and cmf2 CMAF structural brands are defined. The cmf2 brand further restricts the cmfc brand.
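As an illustration of a brand check, the following minimal sketch reads the brands from a FileTypeBox and reports which CMAF structural brands are signalled. The function names are hypothetical and not taken from the Validator, and the sketch assumes a 32-bit box size.

```python
import struct

CMAF_STRUCTURAL_BRANDS = {"cmfc", "cmf2"}

def read_ftyp_brands(data):
    """Return the major and compatible brands of a buffer that starts with ftyp."""
    size, box_type = struct.unpack_from(">I4s", data, 0)
    if box_type != b"ftyp":
        raise ValueError("buffer does not start with an ftyp box")
    payload = data[8:size]
    # Layout: major_brand (4), minor_version (4), then 4-byte compatible brands.
    codes = [payload[0:4]] + [payload[i:i + 4] for i in range(8, len(payload), 4)]
    return {code.decode("ascii") for code in codes}

def structural_brands(data):
    """Return the CMAF structural brands signalled in the ftyp box."""
    return read_ftyp_brands(data) & CMAF_STRUCTURAL_BRANDS

# Synthetic 24-byte ftyp: major brand iso6, compatible brands iso6 and cmfc.
example_ftyp = struct.pack(">I4s4sI4s4s", 24, b"ftyp", b"iso6", 0, b"iso6", b"cmfc")
print(structural_brands(example_ftyp))  # {'cmfc'}
```

An empty result would correspond to the "Should Fix" finding above.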

Must Fix: 23000-19 Check Table 3 — CMAF header boxes

The following boxes are (conditionally) required in a CMAF header; the Validator checks for these:

ftyp, moov, mvhd, trak, tkhd, edts, elst, mdia, mdhd, hdlr, elng, minf, vmhd,
smhd, sthd, dinf, dref, stbl, stsd, stts, stsc, stsz/stz2, stco, sgpd, stss,
udta, cprt, kind, mvex, mehd, trex, pssh
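To check for the presence of these boxes, a reader first has to enumerate the box types in the header. The sketch below walks a byte buffer and descends into the plain container boxes from the list above. It is an assumption-laden illustration: `iter_box_types` and `make_box` are hypothetical names, and boxes such as stsd and dref, which carry fields before their children, are deliberately not descended into.

```python
import struct

# Plain containers from the list above whose payload consists only of child boxes.
CONTAINERS = {b"moov", b"trak", b"edts", b"mdia", b"minf", b"dinf", b"stbl",
              b"udta", b"mvex"}

def iter_box_types(data, start=0, end=None):
    """Yield the four-character type of every box, descending into containers."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit largesize follows the type field
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box extends to the end of the enclosing buffer
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        yield box_type.decode("ascii", errors="replace")
        if box_type in CONTAINERS:
            yield from iter_box_types(data, pos + header, pos + size)
        pos += size

def make_box(box_type, payload=b""):
    """Hypothetical helper: serialize one box with a 32-bit size field."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

# Synthetic header skeleton with empty leaf boxes, for illustration only.
example_header = make_box(b"ftyp", b"cmfc\x00\x00\x00\x00cmfc") + make_box(
    b"moov",
    make_box(b"mvhd") + make_box(b"trak", make_box(b"tkhd"))
    + make_box(b"mvex", make_box(b"trex")),
)
found = list(iter_box_types(example_header))
```

The resulting list can then be compared against the required box types for the track in question.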

Must Fix: 23000-19 Check Table 4 — Header Protected Sample Entry

The following boxes are (conditionally) required in a protected sample entry; the Validator checks for these:

stsd, sinf, frma, schm, schi, tenc

Must Fix: 23000-19 CMAF chunk, CMAF fragment, CMAF segment, and CMAF track file

The following boxes are (conditionally) required in a CMAF chunk, CMAF fragment, or CMAF segment; the Validator checks for these:

styp, prft, emsg, moof, mfhd, traf, tfhd, tfdt, trun, senc, saio, saiz, sbgp,
sgpd, subs, mdat

Must Fix: 23000-19 7.3.2.1 CMAF header

The Validator checks the requirements of the CMAF header:

A CMAF header as defined in 3.1.2 conforms to the following constraints.

a) A CMAF header shall contain the set of boxes in Table 3 and Table 4 with the conditions and optionality listed.

b) Each CMAF header shall form a valid CMAF track, as specified in clause 7.3.2.2, when followed by a continuous sequence of associated CMAF fragments in decode order.

c) A CMAF header shall be conformant with ISO/IEC 14496-12 and the following additional constraints and requirements:

  1. The CMAF header shall start with a FileTypeBox.

  2. The CMAF header shall include exactly one MovieBox.

  3. The MovieBox shall start with a MovieHeaderBox, as constrained in clause 7.5.1.

  4. The MovieBox shall contain exactly one track containing media data as specified in clause 7.3.2.2.

  5. The MovieBox shall contain a MovieExtendsBox, as defined in ISO/IEC 14496-12, to indicate that the file contains MovieFragmentBoxes.

Condition 6) of item c) and item d) of the spec are not yet implemented.
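Items 1 to 5 under c) are structural and can be expressed over lists of box types. The following is a minimal sketch, assuming the top-level box types of the header and the direct children of the MovieBox have already been extracted. The function name and message strings are hypothetical.

```python
def check_cmaf_header(top_level, moov_children):
    """Return the violated constraints; an empty list means items 1 to 5 pass.

    top_level: box types at the top level of the CMAF header, in file order.
    moov_children: box types directly inside the MovieBox, in file order.
    """
    errors = []
    if not top_level or top_level[0] != "ftyp":
        errors.append("1: header does not start with a FileTypeBox")
    if top_level.count("moov") != 1:
        errors.append("2: header does not contain exactly one MovieBox")
    if not moov_children or moov_children[0] != "mvhd":
        errors.append("3: MovieBox does not start with a MovieHeaderBox")
    if moov_children.count("trak") != 1:
        errors.append("4: MovieBox does not contain exactly one track")
    if "mvex" not in moov_children:
        errors.append("5: MovieBox has no MovieExtendsBox")
    return errors

print(check_cmaf_header(["ftyp", "moov"], ["mvhd", "trak", "mvex"]))  # []
```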

Must Fix: 23000-19 7.3.2.2 CMAF track

The Validator checks CMAF track conformance based on clause 7.3.2.2:

a) A CMAF track shall conform to at least one structural CMAF brand and contain the set of boxes in Table 3, Table 4, and Table 5, with the conditions and optionality listed.

b) The concatenation of a CMAF header and all CMAF fragments in the CMAF track in consecutive decode order shall be a valid fragmented ISO BMFF file, with the exception that the first CMAF fragment in a CMAF track may have a non-zero baseMediaDecodeTime.

c) Each CMAF fragment in a CMAF track shall have baseMediaDecodeTime equal to the sum of all prior CMAF fragment durations added to the first fragment’s baseMediaDecodeTime. A CMAF fragment duration is the sum of the media sample durations documented in the TrackRunBox (trun) of its MovieFragmentBox (moof).

NOTE Valid CMAF tracks do not have media time discontinuities resulting from missing media samples or fragments. Gaps in decode time can result in audio-video synchronization errors. For recommendations on handling missing media samples and missing CMAF fragments, see Annex F.

d) Each CMAF track contains a single ISO BMFF track and TrackBox, as determined by CMAF header constraints specified in clause 7.3.2.1.
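The timing rule in item c) amounts to a running sum. The sketch below assumes each fragment has already been reduced to its baseMediaDecodeTime and the list of its sample durations; the function name is hypothetical.

```python
def find_decode_time_gaps(fragments):
    """Return the indices of fragments whose baseMediaDecodeTime is unexpected.

    fragments: list of (base_media_decode_time, [sample_duration, ...]) tuples
    in decode order, all in ticks of the track timescale.
    """
    bad = []
    if not fragments:
        return bad
    # The first fragment may start at a non-zero baseMediaDecodeTime.
    expected = fragments[0][0]
    for index, (base_time, durations) in enumerate(fragments):
        if base_time != expected:
            bad.append(index)
        # Follow the rule literally: first baseMediaDecodeTime plus the sum
        # of all prior fragment durations.
        expected += sum(durations)
    return bad

example = [(9000, [3000, 3000]), (15000, [3000, 3000]), (24000, [3000])]
print(find_decode_time_gaps(example))  # [2]
```

In the example the third fragment should start at 21000, so it is reported. Because the expected time is not resynchronized after a gap, every later fragment would be flagged as well.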

Must Fix: 23000-19 7.4.5 Event Message Box (emsg)

The enhanced DASHEventMessageBox (emsg) described in this clause is version 1 of the DASHEventMessageBox, specified in ISO/IEC 23009-1.

Version 1 of this box adds the field presentation_time, which makes event message timing independent of box location in the CMAF track. Version 1 should be used for event messages in CMAF fragments and addressable media objects. presentation_time provides the presentation time of the event measured on the CMAF track’s presentation timeline, in the timescale declared in its MediaHeaderBox. message_data is the body of the event message.

The syntax and semantics of this field are defined by the owner of the scheme identified in the scheme_id_uri field. Message schemes may be defined for specific applications and users, or standardized for global use, such as SCTE-35 advertisement and program segmentation markers.

DASHEventMessageBoxes can be included in CMAF fragments to indicate ad insertion points and similar events in the media stream; further DASHEventMessageBoxes can then be added to CMAF segments or CMAF chunks at time of delivery, e.g. to trigger manifest updates.

A DASHEventMessageBox in a CMAF track shall set its timescale field to the value of the timescale field in the MediaHeaderBox of the CMAF track that contains it.

If version 0 is used, then DASH defines the timing of an Event Message relative to the earliest media sample presentation time of a DASH segment using the field presentation_time_delta, which “provides the media presentation time delta of the media presentation time of the event and the earliest presentation time in this segment”. For CMAF fragments, the presentation_time_delta shall equal the media presentation time of the event minus the earliest presentation time of the following CMAF fragment.

The earliest decode time in a CMAF track file is zero (defragmented or not), and if an edit list is present, the earliest presentation time is the earliest media sample composition time adjusted by the edit list offset.
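The version 0 timing rule and the timescale constraint can be sketched as follows. The function names are hypothetical, and all times are assumed to be in ticks of the track timescale. The range check reflects that presentation_time_delta is an unsigned 32-bit field in ISO/IEC 23009-1.

```python
UINT32_MAX = 0xFFFFFFFF

def emsg_v0_delta(event_presentation_time, next_fragment_earliest_time):
    """Return presentation_time_delta for a version 0 emsg before a CMAF fragment."""
    delta = event_presentation_time - next_fragment_earliest_time
    if not 0 <= delta <= UINT32_MAX:
        raise ValueError("delta does not fit the unsigned 32-bit field")
    return delta

def emsg_timescale_matches(emsg_timescale, mdhd_timescale):
    """Check that the emsg timescale equals the track's MediaHeaderBox timescale."""
    return emsg_timescale == mdhd_timescale

# Timescale 90000: event at 10 s, following fragment starts presenting at 8 s.
print(emsg_v0_delta(900000, 720000))  # 180000, i.e. 2 s
```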

Should Fix: 23000-19 7.5.1 Movie Header Box (mvhd)

In the MovieHeaderBox, the value of the duration field should be set to zero to indicate that the MovieBox contains no media samples and therefore has no duration.

The duration field in the MediaHeaderBox (‘mdhd’) applies to the TrackBox (trak), which contains no media samples in a CMAF track. The duration of a CMAF track can optionally be stored in the fragment_duration field of the MovieExtendsHeaderBox (‘mehd’), which is equal to the sum of all CMAF fragment durations in the CMAF track. If the duration is unknown, this box is omitted.

The fields rate, volume, and matrix shall be set to their default values.
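A check of these fields might look like the sketch below. The default values are those defined for the MovieHeaderBox in ISO/IEC 14496-12 (rate 1.0 as 16.16 fixed point, full volume as 8.8 fixed point, and the unity matrix). The function name and the "should"/"shall" message prefixes are assumptions made for illustration.

```python
MVHD_DEFAULT_RATE = 0x00010000    # 1.0 in 16.16 fixed point
MVHD_DEFAULT_VOLUME = 0x0100      # 1.0 in 8.8 fixed point
UNITY_MATRIX = (0x00010000, 0, 0, 0, 0x00010000, 0, 0, 0, 0x40000000)

def check_mvhd(duration, rate, volume, matrix):
    """Return findings for the mvhd fields, prefixed with their severity."""
    findings = []
    if duration != 0:
        findings.append("should: duration is not 0")
    if rate != MVHD_DEFAULT_RATE:
        findings.append("shall: rate is not the default value")
    if volume != MVHD_DEFAULT_VOLUME:
        findings.append("shall: volume is not the default value")
    if tuple(matrix) != UNITY_MATRIX:
        findings.append("shall: matrix is not the default value")
    return findings

print(check_mvhd(0, 0x00010000, 0x0100, UNITY_MATRIX))  # []
```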

Must Fix: 23000-19 7.5.4 Track Header Box (tkhd)

CMAF TrackHeaderBoxes shall conform to ISO/IEC 14496-12 with the following additional constraints.

— The field duration shall be set to a value of zero (0), indicating no media samples are referenced from the TrackBox (trak).

— The field matrix shall be set to its default value as defined in ISO/IEC 14496-12, except to indicate video orientation (i.e. portrait or landscape orientation relative to the captured scene). See clause 9.2.3.

— The following fields shall be set to default values as defined in ISO/IEC 14496-12, unless specified otherwise in this document.

— The layer field should equal 0 or greater for normally presented video tracks.

— The layer field should equal −1 for subtitle tracks so they are normally presented over the video.

— The width and height fields for a non-visual track (e.g. audio) shall be 0.

— As defined in ISO/IEC 14496-12, the width and height fields for a CMAF video track shall specify the track’s normalized presentation size as fixed-point 16.16 values expressed in square pixels after decoder cropping, and in the case of video encoded with a non-square video spatial sample shape, after horizontal scaling has been applied. See 9.2.3 for normalized width and height calculation.

— Subtitle tracks may set width and height to an intended layout size, in which case the text layout engine or graphics engine can scale the width and height to match the video display aperture (player implementation dependent).

— As defined in ISO/IEC 14496-30, subtitle tracks encoded as text may use relative position coordinates and font sizes so that the text layout engine can adjust glyph and layout size to match the final video display aperture without relying on image scaling. For such tracks, the value of zero width and height should be used to indicate that the data can be rendered at any size, and the layout size may be determined by matching the size of the video display aperture.

— For scalable text and subtitle tracks, the flag track_size_is_aspect_ratio may also be used.
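The 16.16 fixed-point encoding of width and height stores the integer part in the upper 16 bits and the fraction in the lower 16 bits. The following sketch shows the conversion and the zero-size rule for audio. The function names are hypothetical, and treating only the soun handler type as non-visual is a simplifying assumption of this example.

```python
def to_fixed_16_16(value):
    """Encode a size in pixels as an unsigned 16.16 fixed-point integer."""
    return round(value * 65536)

def from_fixed_16_16(raw):
    """Decode an unsigned 16.16 fixed-point integer to a float."""
    return raw / 65536

def audio_size_is_zero(handler_type, width_raw, height_raw):
    """Return False only when an audio track carries a non-zero width or height."""
    if handler_type != "soun":
        return True  # other track types are out of scope for this sketch
    return width_raw == 0 and height_raw == 0

print(hex(to_fixed_16_16(1920)))     # 0x7800000
print(from_fixed_16_16(0x04380000))  # 1080.0
```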

Must Fix: 23000-19 7.5.5 Media Header Box (mdhd)

Implemented:

The CMAF MediaHeaderBoxes shall conform to ISO/IEC 14496-12 with the following additional constraints.

— The value of the duration field should be set to a value of zero (0) (see clause 7.5.1).

Not Implemented:

— Where possible, the value of the timescale field should be chosen such that when the frame rate is constant, the value of the media sample duration may also be constant.

All tracks that are language-specific should identify the language as precisely as possible (e.g. a text track whose language can be written in different scripts should identify which script is used). When the language is not relevant or not known, the ‘und’ (undetermined) language tag should be used.

Must Fix: 23000-19 7.5.7 Sound Media Header Box (smhd)

The SoundMediaHeaderBox shall conform to ISO/IEC 14496-12 and the following constraint. The field balance shall equal 0 (centre).

Must Fix: 23000-19 7.5.9 Data Reference Box (dref)

DataReferenceBoxes in a CMAF track shall conform to ISO/IEC 14496-12 with the following additional constraints. The DataReferenceBox shall contain a single entry with the entry_flags field set to 0x000001 (which means that the media data is in the same file as the MovieBox containing this data reference).

Must Fix: 23000-19 7.5.10 Sample Description Box

The SampleDescriptionBox in a CMAF track shall conform to version 0 as defined in ISO/IEC 14496-12 with the following additional constraints.

Must Fix: 23000-19 7.5.12 Track contained media sample information boxes

All boxes in the SampleTableBox have a sample count of 0 because CMAF does not reference media samples from the TrackBox. The mandatory boxes of ISO/IEC 14496-12 are mandatory, even if they document no samples.

The following boxes therefore shall have an entry_count of zero:

— TimeToSampleBox (stts);

— SampleToChunkBox (stsc);

— ChunkOffsetBox (stco);

— SampleSizeBox or CompactSampleSizeBox (stsz or stz2);

— SyncSampleBox (stss), if present.

The presence of an empty SyncSampleBox in a CMAF header indicates that not all media samples in the CMAF track are sync samples.

Media sample size, duration, and dependency information can be found in the TrackRunBox(es) in each CMAF fragment or CMAF chunk.
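The zero-count rule can be checked by reading one 32-bit field per box. In the sketch below, the offsets follow the ISO/IEC 14496-12 box layouts: entry_count directly after the 4-byte version and flags for stts, stsc, stco, and stss, and sample_count after one further 32-bit field for stsz and stz2. The function name is hypothetical, and the payload is assumed to exclude the 8-byte size and type header.

```python
import struct

# Byte offset of the count field within the box payload.
COUNT_OFFSET = {b"stts": 4, b"stsc": 4, b"stco": 4, b"stss": 4,
                b"stsz": 8, b"stz2": 8}

def sample_table_count(box_type, payload):
    """Return entry_count (or sample_count for stsz/stz2) of a sample table box."""
    return struct.unpack_from(">I", payload, COUNT_OFFSET[box_type])[0]

empty_stts = struct.pack(">II", 0, 0)        # version/flags, entry_count
filled_stsz = struct.pack(">III", 0, 0, 5)   # version/flags, sample_size, sample_count
print(sample_table_count(b"stts", empty_stts))   # 0
print(sample_table_count(b"stsz", filled_stsz))  # 5
```

A non-zero result for any of these boxes in a CMAF header would violate the constraint above.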

Must Fix: 23000-19 7.5.13 Edit List Box (elst)

If the Edit List Box (elst) is present, the following conditions apply:

— The EditBox shall contain a single EditListBox.

— The value of the entry_count field in the EditListBox shall be set to 1.

— The value of the media_rate_integer field shall be set to 1 and the value of the media_rate_fraction field shall be set to 0.

— The value of the segment_duration field shall be set to 0.

Such conditions define an offset edit or offset edit list.
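Expressed over already-parsed edit list entries, the offset edit conditions reduce to a few comparisons. This is a minimal sketch; `EditEntry` and `is_offset_edit_list` are hypothetical names.

```python
from typing import NamedTuple

class EditEntry(NamedTuple):
    segment_duration: int
    media_time: int
    media_rate_integer: int
    media_rate_fraction: int

def is_offset_edit_list(entries):
    """Return True if the EditListBox entries form a single offset edit."""
    if len(entries) != 1:
        return False
    entry = entries[0]
    return (entry.segment_duration == 0
            and entry.media_rate_integer == 1
            and entry.media_rate_fraction == 0)

# media_time carries the offset; its value is not constrained by this check.
print(is_offset_edit_list([EditEntry(0, 1024, 1, 0)]))  # True
```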

Must Fix: 23000-19 7.5.16 Track Fragment Header Box (tfhd)

A TrackFragmentHeaderBox in a CMAF track shall conform to ISO/IEC 14496-12 with the following additional constraints.

— The track_ID field shall contain the same value as the track_ID in the associated CMAF header.

— The base-data-offset-present flag (in the tf_flags field) shall be set to zero.

— The default-base-is-moof flag (in the tf_flags field) shall be set to one.

— Every TrackFragmentBox shall contain a TrackFragmentBaseMediaDecodeTimeBox, as defined in ISO/IEC 14496-12, to provide the decode time of the first media sample in the track fragment.

NOTE The baseMediaDecodeTime of the first available CMAF fragment in a CMAF track can be non-zero.
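The flag constraints are bit tests on tf_flags. The sketch below uses the flag values defined in ISO/IEC 14496-12 (0x000001 for base-data-offset-present and 0x020000 for default-base-is-moof); the function name and the error strings are hypothetical.

```python
BASE_DATA_OFFSET_PRESENT = 0x000001
DEFAULT_BASE_IS_MOOF = 0x020000

def check_tfhd(tf_flags, track_id, header_track_id, has_tfdt):
    """Return the violated tfhd and traf constraints as a list of strings."""
    errors = []
    if track_id != header_track_id:
        errors.append("track_ID differs from the CMAF header")
    if tf_flags & BASE_DATA_OFFSET_PRESENT:
        errors.append("base-data-offset-present is set")
    if not tf_flags & DEFAULT_BASE_IS_MOOF:
        errors.append("default-base-is-moof is not set")
    if not has_tfdt:
        errors.append("TrackFragmentBox has no TrackFragmentBaseMediaDecodeTimeBox")
    return errors

print(check_tfhd(0x020000, 1, 1, True))  # []
```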

Must Fix: 23000-19 7.5.17 Track Run Box (trun)

A TrackRunBox in a CMAF track shall conform to ISO/IEC 14496-12 with the following additional constraints.

— The version field shall be set to either 0 or 1.

— The data-offset-present flag (in the tr_flags field) shall be set to true, to indicate that the data_offset field is present and contains the byte offset from the start of this fragment’s MovieFragmentBox to the start of the first media sample in the following MediaDataBox.

Not Implemented:

— When the version field is set to 1, the sample_composition_time_offset of the first presented media sample in a CMAF fragment shall be such that its composition time is equal to the first media sample decode time (baseMediaDecodeTime).

This is called movie fragment relative addressing in ISO/IEC 14496-12.

— Within a CMAF track, any TrackRunBox that describes any non-sync media samples shall identify sample dependency within the CMAF chunk and CMAF fragment using a combination of the sample_flags and first_sample_flags fields and default values in the TrackFragmentHeaderBox:

  — sample_is_non_sync_sample shall be 0 for SAP type 1 or 2, and 1 otherwise;

  — an empty SyncSampleBox shall be present in the track.

ISO/IEC 14496-12 specifies that absence of the SyncSampleBox indicates that all media samples are sync samples in the track, which allows a reader to know that all subsequent CMAF fragments will also consist of sync samples. If a SyncSampleBox is present, then dependency flags in each CMAF fragment indicate which media samples are sync samples, since the header contains no media samples and the SyncSampleBox therefore lists no media samples.
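The trun flag and sync sample rules above are again bit tests. In the 32-bit sample_flags layout of ISO/IEC 14496-12, sample_is_non_sync_sample is bit 16, directly above the 16-bit sample_degradation_priority, and data-offset-present is 0x000001 in the flags of the TrackRunBox. The sketch below is illustrative only; the function names are hypothetical and the SAP type is assumed to be known from elsewhere.

```python
DATA_OFFSET_PRESENT = 0x000001  # flag in the TrackRunBox flags field

def has_data_offset(trun_flags):
    """Return True if the data_offset field is signalled as present."""
    return bool(trun_flags & DATA_OFFSET_PRESENT)

def sample_is_non_sync(sample_flags):
    """Extract sample_is_non_sync_sample (bit 16) from a sample_flags value."""
    return (sample_flags >> 16) & 1

def non_sync_flag_matches_sap(sample_flags, sap_type):
    """Check the rule: 0 for SAP type 1 or 2, and 1 otherwise."""
    expected = 0 if sap_type in (1, 2) else 1
    return sample_is_non_sync(sample_flags) == expected

print(sample_is_non_sync(0x02000000))  # 0, sample does not depend on others
print(sample_is_non_sync(0x01010000))  # 1, sample depends on others
```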

Must Fix: 23000-19 use the Structural CMAF Brand cmf2

A TrackRunBox in a CMAF track shall conform to the constraints in clause 7.5.17 with the following additional constraints: For video CMAF Tracks not contained in Track Files, Version 1 shall be used. default_sample_flags, sample_flags and first_sample_flags shall be set in the TrackFragmentHeaderBox and/or TrackRunBox to provide sample dependency information within each CMAF chunk and CMAF fragment. Default values or per sample values of sample duration and sample size shall be stored in each CMAF chunk’s TrackRunBox and/or TrackFragmentHeaderBox.

Default flags and sample parameters (duration, size, or sample description index) can be set and ignored in the TrackExtendsBox, as long as those values are also set in all CMAF chunks and CMAF fragments so each CMAF fragment is decodable without access to that track’s CMAF header.