CMAF Formatting

The validator implements checks for conformance to ISO/IEC 23000-19 (CMAF) clause 7. CMAF is an ISO standard format that is widely used for adaptive streaming.

Should Fix: 23000-19 7.1 Include CMAF brands

This clause derives the CMAF track format from the ISO Base Media File Format and specifies the CMAF structural brands. Currently, two structural brands are defined, ‘cmfc’ and ‘cmf2’; the ‘cmf2’ brand further restricts the ‘cmfc’ brand. If neither brand is found in the FileTypeBox (ftyp), a warning is triggered.
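A minimal Python sketch of this brand check follows. It assumes the ftyp payload (the bytes after the 8-byte box header) has already been extracted; `ftyp_brands` and `check_cmaf_brands` are hypothetical names, not the validator's own functions.

```python
# Structural brands defined in ISO/IEC 23000-19 clause 7
CMAF_STRUCTURAL_BRANDS = {b"cmfc", b"cmf2"}


def ftyp_brands(payload: bytes) -> list:
    """Return major_brand followed by the compatible_brands of an ftyp payload."""
    major = payload[0:4]
    # Bytes 4..8 hold minor_version; the rest is a list of 4-byte brands.
    compatible = [payload[i:i + 4] for i in range(8, len(payload) - 3, 4)]
    return [major] + compatible


def check_cmaf_brands(payload: bytes) -> list:
    """Return a warning if no CMAF structural brand is signalled (hypothetical helper)."""
    if not set(ftyp_brands(payload)) & CMAF_STRUCTURAL_BRANDS:
        return ["Should Fix: no CMAF structural brand (cmfc or cmf2) in ftyp"]
    return []
```

The major brand is included in the search because a file may carry ‘cmfc’ or ‘cmf2’ there rather than only in the compatible brands list.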

Must Fix: 23000-19 Check Table 3 — CMAF header boxes

The following boxes are (conditionally) required in a CMAF header; the validator checks for their presence:

ftyp, moov, mvhd, trak, tkhd, edts, elst, mdia, mdhd, hdlr, elng, minf, vmhd,
smhd, sthd, dinf, dref, stbl, stsd, stts, stsc, stsz/stz2, stco, sgpd, stss,
udta, cprt, kind, mvex, mehd, trex, pssh

If a box that is expected to be present is absent, a Must Fix error is reported.
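The presence check can be sketched as a recursive box walk that records the path of every box. This is an illustration under stated assumptions: `REQUIRED` is a hypothetical, unconditional subset of Table 3, and conditional boxes (elst, the media-specific header, stss, pssh, and so on) are left out because they depend on track context. All function names are hypothetical.

```python
import struct

# Boxes whose payload is itself a sequence of child boxes (subset relevant to Table 3).
CONTAINERS = {b"moov", b"trak", b"edts", b"mdia", b"minf", b"dinf", b"stbl", b"mvex", b"udta"}

# Hypothetical subset of unconditionally required paths.
REQUIRED = [
    "ftyp", "moov", "moov/mvhd", "moov/trak", "moov/trak/tkhd", "moov/trak/mdia",
    "moov/trak/mdia/mdhd", "moov/trak/mdia/hdlr", "moov/trak/mdia/minf",
    "moov/trak/mdia/minf/dinf", "moov/trak/mdia/minf/dinf/dref",
    "moov/trak/mdia/minf/stbl", "moov/trak/mdia/minf/stbl/stsd",
    "moov/trak/mdia/minf/stbl/stts", "moov/trak/mdia/minf/stbl/stsc",
    "moov/trak/mdia/minf/stbl/stco", "moov/mvex", "moov/mvex/trex",
]


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Serialize a box with a 32-bit size (handy for tests)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def iter_boxes(data: bytes, offset: int = 0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # 64-bit largesize follows the type field
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:  # box extends to the end of the enclosing range
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"malformed box {box_type!r} at offset {offset}")
        yield box_type, offset + header, offset + size
        offset += size


def box_paths(data: bytes, offset: int = 0, end=None, prefix: str = "") -> set:
    """Collect slash-separated paths of all boxes, descending into containers."""
    paths = set()
    for box_type, start, stop in iter_boxes(data, offset, end):
        path = prefix + box_type.decode("latin-1")
        paths.add(path)
        if box_type in CONTAINERS:
            paths |= box_paths(data, start, stop, path + "/")
    return paths


def missing_header_boxes(data: bytes) -> list:
    paths = box_paths(data)
    missing = [p for p in REQUIRED if p not in paths]
    stbl = "moov/trak/mdia/minf/stbl/"
    # Either SampleSizeBox variant satisfies the requirement.
    if stbl + "stsz" not in paths and stbl + "stz2" not in paths:
        missing.append(stbl + "stsz|stz2")
    return missing
```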

Must Fix: 23000-19 Check Table 4 — Header Protected Sample Entry

The following boxes are (conditionally) required in a protected sample entry; the validator checks for their presence:

stsd, sinf, frma, schm, schi, tenc

Must Fix: 23000-19 CMAF chunk, CMAF fragment, CMAF segment, and CMAF track file

The following boxes are (conditionally) required in a CMAF chunk, fragment, or segment; the validator checks for their presence:

styp, prft, emsg, moof, mfhd, traf, tfhd, tfdt, trun, senc, saio, saiz, sbgp,
sgpd, subs, mdat

If a box that is expected to be present is absent, a Must Fix error is reported.

Must Fix: 23000-19 7.3.2.1 CMAF header

The validator checks the following CMAF header requirements:

A CMAF header as defined in 3.1.2 conforms to the following constraints.

a) A CMAF header shall contain the set of boxes in Table 3 and Table 4 with the conditions and optionality listed.

b) Each CMAF header shall form a valid CMAF track, as specified in clause 7.3.2.2, when followed by a continuous sequence of associated CMAF fragments in decode order.

c) A CMAF header shall be conformant with ISO/IEC 14496-12 and the following additional constraints and requirements:

  1. The CMAF header shall start with a FileTypeBox.

  2. The CMAF header shall include exactly one MovieBox.

  3. The MovieBox shall start with a MovieHeaderBox, as constrained in clause 7.5.1.

  4. The MovieBox shall contain exactly one track containing media data as specified in clause 7.3.2.2.

  5. The MovieBox shall contain a MovieExtendsBox, as defined in ISO/IEC 14496-12, to indicate that the file contains MovieFragmentBoxes.

Condition c) 6) and condition d) of the clause are not yet implemented in the validator; for all other conditions, a Must Fix error is reported.
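Assuming a box parser has already produced the top-level box types and the child types of moov in file order, conditions c) 1) to 5) reduce to simple list checks. A sketch with a hypothetical function name; condition 4) is approximated by counting trak boxes, since a CMAF header carries a single track.

```python
def check_header_structure(top_level: list, moov_children: list) -> list:
    """top_level: box types of the CMAF header in file order;
    moov_children: box types directly inside moov, in order."""
    errors = []
    if not top_level or top_level[0] != "ftyp":
        errors.append("c) 1: the CMAF header shall start with a FileTypeBox")
    if top_level.count("moov") != 1:
        errors.append("c) 2: the CMAF header shall include exactly one MovieBox")
    if not moov_children or moov_children[0] != "mvhd":
        errors.append("c) 3: the MovieBox shall start with a MovieHeaderBox")
    if moov_children.count("trak") != 1:
        errors.append("c) 4: the MovieBox shall contain exactly one track")
    if "mvex" not in moov_children:
        errors.append("c) 5: the MovieBox shall contain a MovieExtendsBox")
    return errors
```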

Must Fix: 23000-19 7.3.2.2 CMAF track

The validator checks CMAF track conformance against clause 7.3.2.2:

a) A CMAF track shall conform to at least one structural CMAF brand and contain the set of boxes in Table 3, Table 4, and Table 5, with the conditions and optionality listed.

b) The concatenation of a CMAF header and all CMAF fragments in the CMAF track in consecutive decode order shall be a valid fragmented ISO BMFF file, with the exception that the first CMAF fragment in a CMAF track may have a non-zero baseMediaDecodeTime.

c) Each CMAF fragment in a CMAF track shall have a baseMediaDecodeTime equal to the first fragment’s baseMediaDecodeTime plus the sum of all prior CMAF fragment durations. A CMAF fragment duration is the sum of its media sample durations, as documented in the TrackRunBox(es) in its MovieFragmentBox.

NOTE Valid CMAF tracks do not have media time discontinuities resulting from missing media samples or fragments. Gaps in decode time can result in audio-video synchronization errors. For recommendations on handling missing media samples and missing CMAF fragments, see Annex F.

d) Each CMAF track contains a single ISO BMFF track and TrackBox, as determined by CMAF header constraints specified in clause 7.3.2.1.

The validator reports a Must Fix error if any of these conditions is violated.
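Condition c) is plain arithmetic on tfdt and trun values. The sketch below assumes each fragment has already been reduced to its baseMediaDecodeTime and its list of sample durations, all in the mdhd timescale; the function name is hypothetical.

```python
def check_decode_time_continuity(fragments: list) -> list:
    """fragments: (baseMediaDecodeTime, [sample durations]) per CMAF fragment, in decode order.
    Returns the indices of fragments that violate condition c)."""
    bad = []
    if not fragments:
        return bad
    # The first fragment may start at a non-zero baseMediaDecodeTime.
    expected = fragments[0][0]
    for index, (base_media_decode_time, durations) in enumerate(fragments):
        if base_media_decode_time != expected:
            bad.append(index)
        # Condition c) is relative to the first fragment, so keep accumulating from it.
        expected += sum(durations)
    return bad
```

For example, fragments starting at 1000 and 3048 with four 512-tick samples each are continuous, and a third fragment would be expected at 5096.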

Must Fix: 23000-19 7.4.5 Event Message Box (emsg)

The enhanced DASHEventMessageBox (emsg) described in this clause is version 1 of the DASHEventMessageBox, specified in ISO/IEC 23009-1.

Version 1 of this box adds the field presentation_time, which makes event message timing independent of box location in the CMAF track. Version 1 should be used for event messages in CMAF fragments and addressable media objects. presentation_time provides the presentation time of the event measured on the CMAF track’s presentation timeline, in the timescale declared in its MediaHeaderBox. message_data is the body of the event message.

The syntax and semantics of this field are defined by the owner of the scheme identified in the scheme_id_uri field. Message schemes may be defined for specific applications and users, or standardized for global use, such as SCTE-35 advertisement and program segmentation markers.

DASHEventMessageBoxes can be included in CMAF fragments to indicate ad insertion points and similar events in the media stream; other DASHEventMessageBoxes can be added to CMAF segments or CMAF chunks at the time of delivery, e.g. to trigger manifest updates. A DASHEventMessageBox in a CMAF track shall have a timescale field value equal to the timescale field value in the MediaHeaderBox of the CMAF track that contains it. If version 0 is used, DASH defines the timing of an event message relative to the earliest media sample presentation time of a DASH segment, using the field presentation_time_delta, which “provides the media presentation time delta of the media presentation time of the event and the earliest presentation time in this segment.” For CMAF fragments, presentation_time_delta shall equal the media presentation time of the event minus the earliest presentation time of the following CMAF fragment.

The validator checks that the timescale of each event message box equals the timescale in the mdhd box.
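The timescale field sits at a different position in version 0 and version 1 emsg boxes (see ISO/IEC 23009-1), so a check has to branch on the version. A sketch, assuming the emsg payload after the 8-byte box header and the mdhd timescale are already available; the function names are hypothetical.

```python
import struct


def emsg_timescale(payload: bytes) -> int:
    """Return the timescale field of an emsg payload (after the 8-byte box header)."""
    version = payload[0]
    if version == 1:
        # Version 1: timescale directly follows version/flags.
        return struct.unpack_from(">I", payload, 4)[0]
    if version == 0:
        # Version 0: null-terminated scheme_id_uri and value precede timescale.
        end_scheme = payload.index(b"\x00", 4)
        end_value = payload.index(b"\x00", end_scheme + 1)
        return struct.unpack_from(">I", payload, end_value + 1)[0]
    raise ValueError(f"unsupported emsg version {version}")


def check_emsg_timescale(payload: bytes, mdhd_timescale: int) -> list:
    timescale = emsg_timescale(payload)
    if timescale != mdhd_timescale:
        return [f"Must Fix: emsg timescale {timescale} differs from mdhd timescale {mdhd_timescale}"]
    return []
```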

The earliest decode time in a CMAF track file is zero, and if an edit list is present, the earliest presentation time is the earliest media sample composition time adjusted by the edit list offset.

Should Fix: 23000-19 7.5.1 Movie Header Box (mvhd)

In the MovieHeaderBox, the value of the duration field should be set to zero to indicate that the MovieBox contains no media samples and therefore has no duration.

The duration field in the MediaHeaderBox (‘mdhd’) applies to the TrackBox (trak), which contains no media samples in a CMAF track. The duration of a CMAF track can optionally be stored in the fragment_duration field of the MovieExtendsHeaderBox (‘mehd’), which is equal to the sum of all CMAF fragment durations in the CMAF track. If the duration is unknown, this box is omitted.

The fields rate, volume, and matrix shall be set to their default values.

The validator checks the duration fields in mvhd and mdhd against these constraints, but not the optional duration signalled in the mehd box.
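mvhd and mdhd share the same leading layout (version/flags, creation and modification time, timescale, duration), with 32-bit time fields in version 0 and 64-bit ones in version 1. A sketch of the zero-duration check under that layout, assuming the payload after the 8-byte box header; the helper names are hypothetical.

```python
import struct


def header_duration(payload: bytes) -> int:
    """Return the duration field of an mvhd or mdhd payload (after the 8-byte box header)."""
    if payload[0] == 1:
        # version 1: creation_time(8) modification_time(8) timescale(4) duration(8)
        return struct.unpack_from(">Q", payload, 4 + 8 + 8 + 4)[0]
    # version 0: creation_time(4) modification_time(4) timescale(4) duration(4)
    return struct.unpack_from(">I", payload, 4 + 4 + 4 + 4)[0]


def check_zero_duration(box_type: str, payload: bytes) -> list:
    duration = header_duration(payload)
    if duration != 0:
        return [f"{box_type} duration is {duration}, expected 0"]
    return []
```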

NOTE: Clauses 7.5.2 and 7.5.3, covering the metadata box and the kind box in udta, are not implemented in the validator.

Must Fix: 23000-19 7.5.4 Track Header Box (tkhd)

CMAF TrackHeaderBoxes shall conform to ISO/IEC 14496-12 with the following additional constraints.

— The field duration shall be set to a value of zero (0), indicating no media samples are referenced from the TrackBox (trak).

— The field matrix shall be set to its default values as defined in ISO/IEC 14496-12, except to indicate video orientation (i.e. portrait or landscape orientation relative to the captured scene). See clause 9.2.3.

— The following fields shall be set to default values as defined in ISO/IEC 14496-12, unless specified otherwise in this document.

— The layer field should equal 0 or greater for normally presented video tracks.

— The layer field should equal −1 for subtitle tracks so they are normally presented over the video.

— The width and height fields for a non-visual track (e.g. audio) shall be 0.

— As defined in ISO/IEC 14496-12, the width and height fields for a CMAF video track shall specify the track’s normalized presentation size as fixed-point 16.16 values expressed in square pixels after decoder cropping, and in the case of video encoded with a non-square video spatial sample shape, after horizontal scaling has been applied. See 9.2.3 for normalized width and height calculation.

— Subtitle tracks may set width and height to an intended layout size, in which case the text layout engine or graphics engine can scale the width and height to match the video display aperture (player implementation dependent).

— As defined in ISO/IEC 14496-30, subtitle tracks encoded as text may use relative position coordinates and font sizes so that the text layout engine can adjust glyph and layout size to match the final video display aperture without relying on image scaling. For such tracks, the value of zero width and height should be used to indicate that the data can be rendered at any size, and the layout size may be determined by matching the size of the video display aperture.

— For scalable text and subtitle tracks, the flag track_size_is_aspect_ratio may also be used.
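The duration and width/height constraints can be checked after parsing the fixed tkhd layout from ISO/IEC 14496-12. A sketch, assuming the payload after the 8-byte box header and a caller-supplied flag for non-visual (e.g. audio) tracks; `parse_tkhd` and `check_tkhd` are hypothetical names.

```python
import struct


def parse_tkhd(payload: bytes) -> dict:
    """Parse a tkhd payload (after the 8-byte box header), version 0 or 1."""
    version = payload[0]
    flags = int.from_bytes(payload[1:4], "big")
    if version == 1:
        # creation_time(8) modification_time(8) track_ID(4) reserved(4) duration(8)
        (track_id,) = struct.unpack_from(">I", payload, 20)
        (duration,) = struct.unpack_from(">Q", payload, 28)
        offset = 36
    else:
        # creation_time(4) modification_time(4) track_ID(4) reserved(4) duration(4)
        (track_id,) = struct.unpack_from(">I", payload, 12)
        (duration,) = struct.unpack_from(">I", payload, 20)
        offset = 24
    offset += 8  # reserved(8)
    layer, alternate_group, volume = struct.unpack_from(">hhh", payload, offset)
    offset += 8  # layer, alternate_group, volume, reserved(2)
    matrix = list(struct.unpack_from(">9I", payload, offset))
    offset += 36
    width, height = struct.unpack_from(">II", payload, offset)  # 16.16 fixed point
    return {"version": version, "flags": flags, "track_id": track_id, "duration": duration,
            "layer": layer, "alternate_group": alternate_group, "volume": volume,
            "matrix": matrix, "width": width, "height": height}


def check_tkhd(payload: bytes, is_non_visual: bool) -> list:
    tkhd = parse_tkhd(payload)
    errors = []
    if tkhd["duration"] != 0:
        errors.append("Must Fix: tkhd duration shall be 0")
    if is_non_visual and (tkhd["width"], tkhd["height"]) != (0, 0):
        errors.append("Must Fix: tkhd width and height shall be 0 for a non-visual track")
    return errors
```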

Must Fix: 23000-19 7.5.5 Media Header Box (mdhd)

The CMAF MediaHeaderBoxes shall conform to ISO/IEC 14496-12 with the following additional constraints.

— The value of the duration field should be set to a value of zero (0) (see clause 7.5.1). (Implemented in validator)

Not Implemented in validator:

— Where possible, the value of the timescale field should be chosen such that when the frame rate is constant, the value of the media sample duration may also be constant.
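To illustrate the timescale recommendation (not implemented in the validator): with a frame rate of 30000/1001 (29.97 fps), a timescale of 30000 gives a constant sample duration of 1001 ticks, whereas a timescale of 1000 cannot represent the frame duration exactly. A small sketch with a hypothetical helper name:

```python
from fractions import Fraction


def constant_sample_duration(timescale: int, frame_rate: Fraction):
    """Return the per-frame duration in timescale ticks if it is a whole number, else None."""
    duration = Fraction(timescale) / Fraction(frame_rate)
    return duration.numerator if duration.denominator == 1 else None
```

For example, a timescale of 90000 gives 3600 ticks at 25 fps and 3003 ticks at 30000/1001 fps, so it keeps sample durations constant for both rates.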

All tracks that are language-specific should identify the language as precisely as possible (e.g. a text track whose language can be written in different scripts should identify which script is used). When the language is not relevant or not known, the ‘und’ (undetermined) language tag should be used.

NOTE: 7.5.6 vmhd will be implemented once the validation of video CMAF is completed.

Must Fix: 23000-19 7.5.7 Sound Media Header Box (smhd)

The SoundMediaHeaderBox shall conform to ISO/IEC 14496-12 and the following constraint: The field balance shall equal 0 (centre).

NOTE: 7.5.8 can be validated using the timed text validation suite for timed text tracks.

Must Fix: 23000-19 7.5.9 Data Reference Box (dref)

DataReferenceBoxes in a CMAF track shall conform to ISO/IEC 14496-12 with the following additional constraints. The DataReferenceBox shall contain a single entry with the entry_flags field set to 0x000001 (which means that the media data is in the same file as the MovieBox containing this data reference).
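A sketch of the dref check, assuming the payload after the 8-byte box header; the first entry is read as a full box (‘url ’ or ‘urn ’) and its 24-bit flags are compared with 0x000001. The function name is hypothetical.

```python
import struct


def check_dref(payload: bytes) -> list:
    """Check a dref payload (after the 8-byte box header) against clause 7.5.9."""
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    errors = []
    if entry_count != 1:
        errors.append(f"Must Fix: dref entry_count is {entry_count}, expected 1")
    if entry_count == 0 or len(payload) < 20:
        return errors
    # First entry is a full box: size(4) type(4) version(1) flags(3).
    entry_type = payload[12:16].decode("latin-1")
    entry_flags = int.from_bytes(payload[17:20], "big")
    if entry_flags != 0x000001:
        errors.append(f"Must Fix: dref entry '{entry_type}' has flags 0x{entry_flags:06x}, expected 0x000001")
    return errors
```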

Must Fix: 23000-19 7.5.10 Sample Description Box

The SampleDescriptionBox in a CMAF track shall conform to version 0 as defined in ISO/IEC 14496-12 with the following additional constraints.

— Sample entries for encrypted tracks (those containing any encrypted media sample data) shall encapsulate the existing sample entry with the appropriate four-character-code listed in ISO/IEC 14496-12 and include a ProtectionSchemeInfoBox (‘sinf’) that conforms to ISO/IEC 14496-12 and clause ‎0.

— Constraints on visual sample entries are specified in clause 9.2.4. (not yet implemented)

— Constraints on audio sample entries are specified in clause 10.2.5. (not yet implemented)

— Constraints on subtitle sample entries are specified in ISO/IEC 14496-30. (implemented in timed text suite)

Must Fix: 23000-19 7.5.11 Protection Scheme Information Box (‘sinf’)

CMAF shall use Common Encryption (ISO/IEC 23001-7) for CMAF tracks containing one or more encrypted CMAF fragments and use Scheme Signalling as defined in ISO/IEC 23001-7. An encrypted CMAF track shall include at least one ProtectionSchemeInfoBox (‘sinf’) containing a TrackEncryptionBox (‘tenc’) identifying a scheme specified in ISO/IEC 23001-7.

The validator checks this using the Table 4 requirements for the sinf and tenc boxes.

Must Fix: 23000-19 7.5.12 Track contained media sample information boxes

All boxes in the SampleTableBox have a sample count of 0 because CMAF does not reference media samples from the TrackBox. The mandatory boxes of ISO/IEC 14496-12 are mandatory, even if they document no samples.

The following boxes therefore shall have an entry_count of zero:

— TimeToSampleBox (stts);

— SampleToChunkBox (stsc);

— ChunkOffsetBox (stco);

— SampleSizeBox or CompactSampleSizeBox (stsz or stz2);

— SyncSampleBox (stss), if present.

The presence of an empty SyncSampleBox in a CMAF header indicates that not all media samples in the CMAF track are sync samples.

Media sample size, duration, and dependency information can be found in the TrackRunBox(es) in each CMAF fragment or CMAF chunk.
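Because the header carries no samples, the check reduces to reading one count field per box. A sketch, assuming each box payload (after the 8-byte header) is available in a dict; for stsz and stz2 the zero-valued field is sample_count, which follows the sample_size or field_size word. The names are hypothetical.

```python
import struct

# Offset of the count field that must be zero, within each payload (after the 8-byte header).
# For stsz and stz2 this is sample_count, which follows sample_size / field_size.
COUNT_OFFSET = {"stts": 4, "stsc": 4, "stco": 4, "stss": 4, "stsz": 8, "stz2": 8}


def check_empty_sample_tables(payloads: dict) -> list:
    """payloads maps a box type from COUNT_OFFSET to its payload bytes."""
    errors = []
    for box_type, payload in payloads.items():
        (count,) = struct.unpack_from(">I", payload, COUNT_OFFSET[box_type])
        if count != 0:
            errors.append(f"Must Fix: {box_type} count is {count}, expected 0")
    return errors
```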

Must Fix: 23000-19 7.5.13 Edit List Box (elst)

If the Edit List Box (elst) is present, the following conditions apply:

— The EditBox shall contain a single EditListBox

— The value of entry_count field in the EditListBox shall be set to 1

— The value of the media_rate_integer field shall be set to 1 and the value of the media_rate_fraction field shall be set to 0

— The value of the segment_duration field shall be set to 0

Such conditions define an offset edit or offset edit list.
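A sketch of the offset-edit check on an elst payload (after the 8-byte box header), with 32-bit entry fields in version 0 and 64-bit segment_duration and media_time in version 1. It does not check that the EditBox holds a single EditListBox. The helper `earliest_presentation_time` illustrates the edit list offset described for track files, assuming the single edit starts at movie time 0. All names are hypothetical.

```python
import struct


def check_offset_edit(payload: bytes) -> list:
    """Check an elst payload against the CMAF offset-edit constraints of clause 7.5.13."""
    version = payload[0]
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    if entry_count != 1:
        return [f"Must Fix: elst entry_count is {entry_count}, expected 1"]
    entry_format = ">Qqhh" if version == 1 else ">Iihh"
    segment_duration, _media_time, rate_integer, rate_fraction = struct.unpack_from(entry_format, payload, 8)
    errors = []
    if segment_duration != 0:
        errors.append("Must Fix: segment_duration shall be 0")
    if (rate_integer, rate_fraction) != (1, 0):
        errors.append("Must Fix: media_rate_integer shall be 1 and media_rate_fraction 0")
    return errors


def earliest_presentation_time(earliest_composition_time: int, media_time: int) -> int:
    """Offset edit: presentation time = composition time - media_time (media timescale)."""
    return earliest_composition_time - media_time
```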

Must Fix: 23000-19 7.5.14 Track Extends Box (trex)

A TrackExtendsBox shall be present in a CMAF track since it is a fragmented file as defined in ISO/IEC 14496-12.

NOTE: Clause 7.5.15 is not checked because CMAF introduces no additional constraints in it.

Must Fix: 23000-19 7.5.16 Track Fragment Header Box (tfhd)

A TrackFragmentHeaderBox in a CMAF track shall conform to ISO/IEC 14496-12 with the following additional constraints.

— The track_ID field shall contain the same value as the track_ID in the associated CMAF header

— The base-data-offset-present flag (in the tf_flags field) shall be set to zero

— The default-base-is-moof flag (in the tf_flags field) shall be set to one

— Every TrackFragmentBox shall contain a TrackFragmentBaseMediaDecodeTimeBox, as defined in ISO/IEC 14496-12, to provide the decode time of the first media sample in the track fragment.

NOTE The baseMediaDecodeTime of the first available CMAF fragment in a CMAF track can be non-zero.
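The tf_flags constraints are bit tests on the tfhd flags (base-data-offset-present is 0x000001 and default-base-is-moof is 0x020000 in ISO/IEC 14496-12). A sketch, assuming the tfhd payload after the 8-byte box header, the track_ID from the CMAF header, and the child box types of the enclosing traf; the function name is hypothetical.

```python
import struct

BASE_DATA_OFFSET_PRESENT = 0x000001  # tf_flags bits from ISO/IEC 14496-12
DEFAULT_BASE_IS_MOOF = 0x020000


def check_tfhd(payload: bytes, header_track_id: int, traf_children: list) -> list:
    """payload: tfhd content after the 8-byte box header;
    traf_children: box types inside the enclosing traf."""
    tf_flags = int.from_bytes(payload[1:4], "big")
    (track_id,) = struct.unpack_from(">I", payload, 4)
    errors = []
    if track_id != header_track_id:
        errors.append(f"Must Fix: tfhd track_ID {track_id} != CMAF header track_ID {header_track_id}")
    if tf_flags & BASE_DATA_OFFSET_PRESENT:
        errors.append("Must Fix: base-data-offset-present shall be 0")
    if not (tf_flags & DEFAULT_BASE_IS_MOOF):
        errors.append("Must Fix: default-base-is-moof shall be 1")
    if "tfdt" not in traf_children:
        errors.append("Must Fix: traf has no tfdt")
    return errors
```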

Must Fix: 23000-19 7.5.17 Track Run Box (‘trun’)

A TrackRunBox in a CMAF track shall conform to ISO/IEC 14496-12 with the following additional constraints.

— The version field shall be set to either 0 or 1,

— The data-offset-present flag (in the tr_flags field) shall be set to true, to indicate that the data_offset field is present and contains the byte offset from the start of this fragment’s MovieFragmentBox to the start of the first media sample in the following MediaDataBox.
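A sketch of the version and flag checks on a trun payload (after the 8-byte box header). `expected_data_offset` is a hypothetical helper that only holds under a stated assumption: a single trun whose samples start at the payload of an mdat with an 8-byte header placed directly after the moof.

```python
import struct

DATA_OFFSET_PRESENT = 0x000001  # tr_flags bit


def check_trun(payload: bytes) -> list:
    """Check a trun payload (after the 8-byte box header) against clause 7.5.17."""
    version = payload[0]
    tr_flags = int.from_bytes(payload[1:4], "big")
    errors = []
    if version not in (0, 1):
        errors.append(f"Must Fix: trun version {version}, expected 0 or 1")
    if not (tr_flags & DATA_OFFSET_PRESENT):
        errors.append("Must Fix: trun data-offset-present flag is not set")
    return errors


def read_data_offset(payload: bytes) -> int:
    """data_offset (signed 32-bit) follows sample_count when the flag is set."""
    (data_offset,) = struct.unpack_from(">i", payload, 8)
    return data_offset


def expected_data_offset(moof_size: int, mdat_header_size: int = 8) -> int:
    """Assumes one trun, mdat directly after moof, samples starting at the mdat payload."""
    return moof_size + mdat_header_size
```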

Not Implemented:

— When the version field is set to 1, the sample_composition_time_offset of the first presented media sample in a CMAF fragment shall be such that its composition time is equal to the first media sample decode time (baseMediaDecodeTime)

This is called movie fragment relative addressing in ISO/IEC 14496-12.

— Within a CMAF track, any TrackRunBox that describes any non-sync media samples shall identify sample dependency within the CMAF chunk and CMAF fragment using a combination of the sample_flags and first_sample_flags fields and default values in the TrackFragmentHeaderBox:

  — sample_is_non_sync_sample shall be 0 for SAP type 1 or 2, and 1 otherwise;

  — an empty SyncSampleBox shall be present in the track.

ISO/IEC 14496-12 specifies that absence of the SyncSampleBox indicates that all media samples are sync samples in the track, which allows a reader to know that all subsequent CMAF fragments will also consist of sync samples. If a SyncSampleBox is present, then dependency flags in each CMAF fragment indicate which media samples are sync samples, since the header contains no media samples and the SyncSampleBox therefore lists no media samples.

NOTE: 7.5.18, 7.5.19 and 7.5.20 are not yet implemented in validator.

Must Fix: 23000-19 7.7. use the Structural CMAF Brand cmf2

A CMAF Track conforming to the CMAF structural brand ‘cmf2’ shall conform to constraints of the CMAF structural brand ‘cmfc’ and all remaining constraints in this clause 7.7.

For video CMAF Tracks, the EditBox and in particular the EditListBox shall not be present. For video CMAF Track files as well as any other media types, the EditListBox may be present following the constraints in clause 7.5.13.

A TrackRunBox in a CMAF track shall conform to the constraints in clause 7.5.17 with the following additional constraints:

  • For video CMAF Tracks not contained in Track Files, Version 1 shall be used.

  • default_sample_flags, sample_flags and first_sample_flags shall be set in the TrackFragmentHeaderBox and/or TrackRunBox to provide sample dependency information within each CMAF chunk and CMAF fragment.

  • Default values or per sample values of sample duration and sample size shall be stored in each CMAF chunk’s TrackRunBox and/or TrackFragmentHeaderBox.

NOTE: Default flags and sample parameters (duration, size, or sample description index) can be set and ignored in the TrackExtendsBox, as long as those values are also set in all CMAF chunks and CMAF fragments so each CMAF fragment is decodable without access to that track CMAF header.

Must Fix: vmhd box version shall be 0 according to 23000-19 clause 9.2.2.

VideoMediaHeaderBoxes in a video CMAF track shall conform to ISO/IEC 14496-12 with the following additional constraints. The following fields shall be set to their default values as defined in ISO/IEC 14496-12: version=0; graphicsmode=0; opcolor={0, 0, 0}.
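A sketch of the vmhd field check, assuming the payload after the 8-byte box header (version/flags, then graphicsmode and three opcolor values as 16-bit integers); the function name is hypothetical.

```python
import struct


def check_vmhd(payload: bytes) -> list:
    """Check a vmhd payload (after the 8-byte box header) against clause 9.2.2."""
    version = payload[0]
    graphicsmode, *opcolor = struct.unpack_from(">4H", payload, 4)
    errors = []
    if version != 0:
        errors.append("Must Fix: vmhd box version shall be 0")
    if graphicsmode != 0:
        errors.append("Must Fix: graphicsmode shall be 0")
    if opcolor != [0, 0, 0]:
        errors.append("Must Fix: opcolor shall be {0, 0, 0}")
    return errors
```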

Must Fix: graphicsmode shall be 0 according to 23000-19 clause 9.2.2

see above

Should Fix: the CleanApertureBox should not be present 9.3.2.1

ISO/IEC 14496-12 specifies that the values of width and height are set to the decoded image size on a uniformly sampled square grid. For NAL structured video CMAF tracks, the values are additionally constrained as follows. The values of width and height shall be normalized to the width and height of the NAL structured video, as defined below. The normalized presentation height shall be the number of vertical video spatial samples in the sequence parameter set NAL VUI after cropping parameters are applied.

NOTE 1 The height field of the visual sample entry is also the number of encoded vertical video spatial samples after cropping.

The normalized presentation width shall be the number of horizontal video spatial samples after sequence parameter set NAL VUI cropping parameters are applied, then multiplied by the video spatial sample aspect ratio. The sample aspect ratio is specified by aspect_ratio_idc (and if applicable by the sar_width/sar_height ratio) in SPS VUI parameters and the PixelAspectRatioBox if present in the sample entry.

NOTE 2 The width field of the visual sample entry is the number of encoded horizontal video spatial samples after cropping.

The CleanApertureBox should not be present, since the cropped aperture is defined to be active image (clean) in a video CMAF track.
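As a worked example of the width normalization: a 1440x1080 cropped picture with a 4:3 sample aspect ratio has a normalized presentation size of 1920x1080. The sketch below assumes the cropped dimensions and the sample aspect ratio have already been extracted from the SPS VUI (or the pasp box); the function names are hypothetical.

```python
from fractions import Fraction


def normalized_presentation_size(cropped_width: int, cropped_height: int,
                                 sar_width: int = 1, sar_height: int = 1):
    """Height is the cropped height; width is the cropped width scaled by sar_width:sar_height."""
    width = Fraction(cropped_width) * Fraction(sar_width, sar_height)
    return width, cropped_height


def tkhd_fixed_point(value) -> int:
    """Encode a size as 16.16 fixed point, as stored in tkhd width and height."""
    return round(value * 65536)
```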

Must Fix: tkhd flags shall be set to 0x000007 according to 23000-19 clause 9.2.3

For video tracks, the fields of the TrackHeaderBox shall be set to the values constrained below and specified in ISO/IEC 14496-12.

flags = 0x000007
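The value 0x000007 combines the track_enabled (0x000001), track_in_movie (0x000002) and track_in_preview (0x000004) flags of ISO/IEC 14496-12. A sketch, assuming the tkhd payload after the 8-byte box header; the function name is hypothetical.

```python
TRACK_ENABLED = 0x000001
TRACK_IN_MOVIE = 0x000002
TRACK_IN_PREVIEW = 0x000004


def check_video_tkhd_flags(payload: bytes) -> list:
    """Check the 24-bit flags of a video tkhd payload (after the 8-byte box header)."""
    flags = int.from_bytes(payload[1:4], "big")
    expected = TRACK_ENABLED | TRACK_IN_MOVIE | TRACK_IN_PREVIEW  # 0x000007
    if flags != expected:
        return [f"Must Fix: tkhd flags 0x{flags:06x}, expected 0x{expected:06x}"]
    return []
```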

Should Fix: First visual sample entry has width or height values that do not equal or exceed the largest values in the SPS entries. Check ISO/IEC 23000-19 9.3.2.2

The SampleDescriptionBox in a video track shall contain one or more visual sample entries conforming to ISO/IEC 14496-12 and ISO/IEC 14496-15. The first VisualSampleEntry in the SampleDescriptionBox:

  • shall include width and height field values that equal or exceed the largest cropped horizontal and vertical video spatial sample counts in any sequence parameter set referenced by a video slice in the NAL structured video track;

  • should contain no more than one NAL parameter set of each type, e.g., for AVC video, one SPS NAL with VUI and one PPS NAL in the AVCDecoderConfigurationRecord

  • and contain a decoder configuration record that:

  • should signal the lowest codec profile, level, height and width values that are equal to or greater than the values required for all the CMAF fragments in the CMAF track. See general constraints in subclause 9.2.4

  • should set LengthSizeMinusOne field to the value 3 (to indicate 4 bytes) to address large VCL NALs and simplify conversion of elementary streams between MPEG-2 TS bytestreams with startcodes and ISO/IEC 14496-15 with NAL length headers

NOTE 1 The size of the NAL header length field defined in video tracks conforming to ISO/IEC 14496-15 is stored in the field LengthSizeMinusOne in the corresponding decoder configuration record, e.g., for AVC video in the AVCDecoderConfigurationRecord.

  • shall contain one or more ColorInformationBoxes with sub-type ‘nclx’ and a PixelAspectRatioBox ‘pasp’, as documented in ISO/IEC 14496-12, if the first sample entry contains no SPS NAL with VUI in the decoder configuration record.

NOTE 2 A decoder configuration record without a parameter set is valid for a sample description such as ‘avc3’ or ‘hev1’ that can store the parameter set NALs necessary for decoding and display in each CMAF fragment.

Any subsequent sample entries in the SampleDescriptionBox:

  • should contain one parameter set including VUI in each visual sample entry’s decoder configuration box, e.g., for AVC, the ‘avcC’ box;

  • may be unreferenced by media samples in this CMAF track or other CMAF tracks in a CMAF switching set;

  • may be constrained by dependency on other CMAF tracks and CMAF headers in a CMAF switching set. See subclause 9.3.6.

NOTE 3 CMAF headers can contain VisualSampleEntries intended for use by other CMAF tracks in a CMAF switching set or for the purpose of initializing decoding and display, but not referenced by any slice header in a media sample.
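The parameter set counts and LengthSizeMinusOne can be read from the leading fields of the AVCDecoderConfigurationRecord in ISO/IEC 14496-15. A sketch that parses only those fields (any profile-specific extension that follows is ignored) and reports the corresponding Should Fix conditions; the function names are hypothetical.

```python
import struct


def _read_parameter_sets(record: bytes, pos: int, count: int):
    """Read `count` NAL units, each preceded by a 16-bit length, starting at pos."""
    units = []
    for _ in range(count):
        (length,) = struct.unpack_from(">H", record, pos)
        units.append(record[pos + 2:pos + 2 + length])
        pos += 2 + length
    return units, pos


def parse_avcc(record: bytes) -> dict:
    """Parse the leading fields of an AVCDecoderConfigurationRecord (avcC payload)."""
    sps, pos = _read_parameter_sets(record, 6, record[5] & 0x1F)  # numOfSequenceParameterSets
    pps, pos = _read_parameter_sets(record, pos + 1, record[pos])  # numOfPictureParameterSets
    return {
        "profile_idc": record[1],
        "level_idc": record[3],
        "length_size_minus_one": record[4] & 0x03,
        "sps": sps,
        "pps": pps,
    }


def check_first_avcc(record: bytes) -> list:
    config = parse_avcc(record)
    warnings = []
    if len(config["sps"]) > 1:
        warnings.append("Should Fix: more than one SPS NAL in the first video sample entry")
    if len(config["pps"]) > 1:
        warnings.append("Should Fix: more than one PPS NAL in the first video sample entry")
    if config["length_size_minus_one"] != 3:
        warnings.append("Should Fix: LengthSizeMinusOne should be 3 (4-byte NAL lengths)")
    return warnings
```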

Must Fix: tkhd matrix values as specified in ISO/IEC 23000-19 9.2.3.

The value of the matrix field signals the video orientation. Non-identity matrices shall be rotations in multiples of 90 degrees.

  • When video is not rotated, matrix shall be {0x00010000, 0, 0, 0, 0x00010000, 0, 0, 0, 0x40000000}.

  • When video should be rotated 90 degrees clockwise for display, matrix should be {0, 0x00010000, 0, 0xFFFF0000, 0, 0, height<<16, 0, 0x40000000}.

  • When video should be rotated 180 degrees for display, matrix should be {0xFFFF0000, 0, 0, 0, 0xFFFF0000, 0, width<<16, height<<16, 0x40000000}.

  • When video should be rotated 90 degrees counter-clockwise for display, matrix should be {0, 0xFFFF0000, 0, 0x00010000, 0, 0, 0, width<<16, 0x40000000}.
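The four allowed matrices can be generated and compared against the nine matrix values read from tkhd. A sketch, assuming the values were read as unsigned 32-bit integers and that width and height are integer pixel sizes (shifted by 16 as in the list above); the function names are hypothetical.

```python
ONE = 0x00010000        # 1.0 in 16.16 fixed point
MINUS_ONE = 0xFFFF0000  # -1.0 in 16.16 fixed point, read as unsigned 32-bit
W_ONE = 0x40000000      # 1.0 in 2.30 fixed point


def expected_matrix(rotation: int, width: int, height: int) -> list:
    """tkhd matrix for a clockwise display rotation of 0, 90, 180 or 270 degrees."""
    matrices = {
        0: [ONE, 0, 0, 0, ONE, 0, 0, 0, W_ONE],
        90: [0, ONE, 0, MINUS_ONE, 0, 0, height << 16, 0, W_ONE],
        180: [MINUS_ONE, 0, 0, 0, MINUS_ONE, 0, width << 16, height << 16, W_ONE],
        270: [0, MINUS_ONE, 0, ONE, 0, 0, 0, width << 16, W_ONE],
    }
    return matrices[rotation]


def matrix_rotation(matrix: list, width: int, height: int):
    """Return the rotation matched by matrix, or None if it is not an allowed value."""
    for rotation in (0, 90, 180, 270):
        if list(matrix) == expected_matrix(rotation, width, height):
            return rotation
    return None
```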

Should Fix: Media duration, greater than 2 seconds, requires SAP type 3 to be used. Did not find SAP reference of type 3 23000-19 9.2.9

When coded video sequences have durations longer than 2 seconds, pictures of SAP type 3 should be encoded every 2 seconds or less to provide additional random access video media samples. For longer coded video sequences and resulting CMAF fragment durations, additional type 3 SAPs (open GOP independently decodable pictures) enable independent picture decoding for fast forward, fast reverse, and resumption of normal playback, while improving visual uniformity and lowering bit rate relative to groups of pictures with type 1 or 2 SAPs limited to the equivalent random access duration.
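A sketch of the spacing check, assuming the presentation times (in seconds) and SAP types of the pictures in one coded video sequence are already known; the sequence start is treated as a random access point and type 3 SAPs as additional ones. The function name and the message wording are hypothetical.

```python
def check_sap3_spacing(sequence_start: float, sequence_end: float, saps: list,
                       max_interval: float = 2.0) -> list:
    """saps: (presentation_time_seconds, sap_type) pairs for pictures in one coded video sequence."""
    if sequence_end - sequence_start <= max_interval:
        return []  # sequences of 2 s or less need no additional SAPs
    # The sequence starts at a SAP of type 1 or 2; type 3 SAPs add further access points.
    points = [sequence_start] + sorted(t for t, sap_type in saps if sap_type == 3) + [sequence_end]
    largest_gap = max(b - a for a, b in zip(points, points[1:]))
    if largest_gap > max_interval:
        return [f"Should Fix: {largest_gap:.2f} s between random access points, no SAP type 3 within {max_interval} s"]
    return []
```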

Must Fix: ISO/IEC 23000-19 9.2.6 sample_is_non_sync_sample shall be 0 when SAP is type 1 or 2

Within a video CMAF track, any TrackRunBox that describes any non-sync pictures shall identify picture dependencies using a combination of the sample_flags and first_sample_flags fields, or default flags in the corresponding TrackFragmentHeaderBox:

  • sample_is_non_sync_sample shall be 0 for SAP type 1 or 2, and 1 otherwise;

  • sample_depends_on should be 1 or 2 (the value 2 identifies I pictures);

  • sample_is_depended_on should be 2 for disposable pictures.
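The dependency fields live at fixed bit positions in the 32-bit sample_flags word defined in ISO/IEC 14496-12. A sketch that decodes them and applies the sample_is_non_sync_sample and sample_depends_on checks, assuming the SAP type of each sample is known (0 for a non-SAP picture); the function names are hypothetical.

```python
def decode_sample_flags(flags: int) -> dict:
    """Split a 32-bit sample_flags value into its ISO/IEC 14496-12 fields."""
    return {
        "is_leading": (flags >> 26) & 0x3,
        "sample_depends_on": (flags >> 24) & 0x3,
        "sample_is_depended_on": (flags >> 22) & 0x3,
        "sample_has_redundancy": (flags >> 20) & 0x3,
        "sample_padding_value": (flags >> 17) & 0x7,
        "sample_is_non_sync_sample": (flags >> 16) & 0x1,
        "sample_degradation_priority": flags & 0xFFFF,
    }


def check_video_sample_flags(flags: int, sap_type: int) -> list:
    """sap_type: SAP type of the picture (1, 2, 3, ...) or 0 if it is not a SAP."""
    fields = decode_sample_flags(flags)
    issues = []
    expected_non_sync = 0 if sap_type in (1, 2) else 1
    if fields["sample_is_non_sync_sample"] != expected_non_sync:
        issues.append(f"Must Fix: sample_is_non_sync_sample shall be {expected_non_sync} for SAP type {sap_type}")
    if sap_type in (1, 2) and fields["sample_depends_on"] not in (1, 2):
        issues.append("Should Fix: sample_depends_on should be 1 or 2")
    return issues
```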

Should Fix: Sample_depends_on should be 1 or 2 when SAP is type 1 or 2 See ISO/IEC 23000-19 9.2.6

see above

Should Fix: More than one PPS NAL was found in the first video sample entry. There should only be one according to ISO/IEC 23000-19 9.3.2.2

See the clause 9.3.2.2 text quoted above under the first visual sample entry width and height check.

Should Fix: Non start sample entries should contain one parameter set including VUI. Found one instance at least where that is not the case. See ISO/IEC 23000-19 9.3.2.2

Any subsequent sample entries in the SampleDescriptionBox:

  • should contain one parameter set including VUI in each visual sample entry’s decoder configuration box, e.g., for AVC, the ‘avcC’ box;

  • may be unreferenced by media samples in this CMAF track or other CMAF tracks in a CMAF switching set;

  • may be constrained by dependency on other CMAF tracks and CMAF headers in a CMAF switching set. See subclause 9.3.6.

NOTE 3 CMAF headers can contain VisualSampleEntries intended for use by other CMAF tracks in a CMAF switching set or for the purpose of initializing decoding and display, but not referenced by any slice header in a media sample.

Must Fix: In-band parameter set NAL units found. ISO/IEC 14496-15 clause 5.2 disallows in-band parameter set NAL units for avc1 and avc2 sample entries.

Should Fix: ISO/IEC 23000-19 9.3.4: in-band parameter set NAL units that use the same indexes as parameter set NAL units in the header sample entries shall be equal to them.

Each CMAF fragment shall contain one or more complete coded video sequences, as specified by the video codec. Consequently, the first media sample in all NAL structured video CMAF fragments is SAP type 1 or 2, as specified in ISO/IEC 14496-12. NAL structured video sample descriptions that allow inband parameter sets (e.g., avc3 and hev1) shall contain all SPS and PPS NAL units referenced by a coded video sequence in the first access unit of that coded video sequence, immediately following the access unit delimiter NAL unit, if present; followed by SEI NAL units, if present; followed by the VCL NAL unit(s) of the first access unit. If sample entries also exist in the CMAF track’s CMAF header using the same sample entry and parameter set indexes as an inband parameter set, then they shall contain the same SPS and PPS NAL units in their decoder configuration record as the inband parameter set.
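For avc1/avc2 tracks, in-band SPS (nal_unit_type 7) and PPS (nal_unit_type 8) can be detected by walking the length-prefixed NAL units of each sample. A sketch, assuming 4-byte NAL length fields (LengthSizeMinusOne = 3) by default; the function names are hypothetical.

```python
SPS_NAL_TYPE, PPS_NAL_TYPE = 7, 8  # AVC nal_unit_type values


def nal_unit_types(sample: bytes, length_size: int = 4) -> list:
    """Return the nal_unit_type of each length-prefixed NAL unit in an AVC sample."""
    types, pos = [], 0
    while pos + length_size <= len(sample):
        length = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if length == 0 or pos + length > len(sample):
            raise ValueError(f"invalid NAL unit length {length} at offset {pos - length_size}")
        types.append(sample[pos] & 0x1F)  # low 5 bits of the NAL header byte
        pos += length
    return types


def check_inband_parameter_sets(sample_entry_type: str, sample: bytes, length_size: int = 4) -> list:
    inband = [t for t in nal_unit_types(sample, length_size) if t in (SPS_NAL_TYPE, PPS_NAL_TYPE)]
    if inband and sample_entry_type in ("avc1", "avc2"):
        return ["Must Fix: in-band parameter set NAL units found in an avc1/avc2 track"]
    return []
```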

Must Fix: frame_mbs_only_flag shall be set to 1 for AVC in CMAF as required by ISO/IEC 23000-19 9.4.2.2.1

Sequence parameter set NAL units that occur in an AVC video CMAF track shall conform to ISO/IEC 14496-10 with the following additional constraints.

  • The following fields have pre-determined values as follows:

  • frame_mbs_only_flag shall be set to 1;

  • vui_parameters_present_flag shall be set to 1;

  • gaps_in_frame_num_value_allowed_flag should be set to 0.

The values of the following fields shall not change throughout a CMAF track.
  • chroma_format_idc

  • bit_depth_luma_minus8

  • bit_depth_chroma_minus8

The maximum values of the following fields are specified by CMAF media profiles in A.2.
  • profile_idc

  • level_idc

  • pic_width_in_mbs_minus1

  • pic_height_in_map_units_minus1

The following fields may change per CMAF fragment within a CMAF track. Changes in these parameters shall require a different sample entry, if the SPS NALs referenced are stored in sample entries.
  • pic_width_in_mbs_minus1

  • pic_height_in_map_units_minus1

  • frame_crop_right_offset

  • frame_crop_bottom_offset

  • max_num_ref_frames
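The SPS checks above are straightforward once the SPS is parsed. The sketch below assumes a hypothetical parser has already turned each SPS in the track into a dict keyed by ISO/IEC 14496-10 syntax element names (Exp-Golomb parsing is not shown); absent chroma_format_idc and bit depth values are given the values ISO/IEC 14496-10 infers for them.

```python
# Values inferred by ISO/IEC 14496-10 when the syntax elements are absent
# (e.g. in Main profile SPS, which do not carry them).
DEFAULTS = {"chroma_format_idc": 1, "bit_depth_luma_minus8": 0, "bit_depth_chroma_minus8": 0}


def check_avc_sps(sps_list: list) -> list:
    """sps_list: one dict per SPS NAL in the CMAF track, keyed by syntax element name."""
    issues = []
    for i, sps in enumerate(sps_list):
        if sps.get("frame_mbs_only_flag") != 1:
            issues.append(f"Must Fix: SPS {i}: frame_mbs_only_flag shall be 1")
        if sps.get("vui_parameters_present_flag") != 1:
            issues.append(f"Must Fix: SPS {i}: vui_parameters_present_flag shall be 1")
        if sps.get("gaps_in_frame_num_value_allowed_flag", 0) != 0:
            issues.append(f"Should Fix: SPS {i}: gaps_in_frame_num_value_allowed_flag should be 0")
    # These fields shall keep one value throughout the CMAF track.
    for field, default in DEFAULTS.items():
        if len({sps.get(field, default) for sps in sps_list}) > 1:
            issues.append(f"Must Fix: {field} value shall not change in the track")
    return issues
```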

Must Fix: vui_parameters_present_flag shall be set to 1 for AVC in CMAF as required by ISO/IEC 23000-19 9.4.2.2.1

see above

Should Fix: gaps_in_frame_num_value_allowed_flag should be set to 0 for AVC in CMAF as required by ISO/IEC 23000-19 9.4.2.2.1

see above

Must Fix: avc chroma_format_idc value shall not change as required by ISO/IEC 23000-19 9.4.2.2.1.

see above

Must Fix: avc bit_depth_luma_minus8 value shall not change as required by ISO/IEC 23000-19 9.4.2.2.1.

see above

Must Fix: bit_depth_chroma_minus8 value shall not change as required by ISO/IEC 23000-19 9.4.2.2.1.

see above

Must Fix: brand cfsd, cfhd, chdf profile_idc shall be either 100 or 77 according to ISO/IEC 23000-19 9.4.2.2.1 and ISO/IEC 14496-10 A.2.4

see above

Must Fix: brand cfsd, cfhd, chdf level_idc shall be one of 31, 40 or 42 according to ISO/IEC 23000-19 9.4.2.2.1 and ISO/IEC 14496-10 A.3.1.

see above

Must Fix: pic_width_in_mbs_minus1 shall be either 53 or 119 according to ISO/IEC 23000-19 9.4.2.2.1 and ISO/IEC 14496-10 7.4.1.

see above

Must Fix: pic_height_in_map_units_minus1 has an incorrect value according to ISO/IEC 23000-19 9.4.2.2.1 and ISO/IEC 14496-10 7.4.1.

see above

Must Fix: aspect_ratio_info_present_flag is not set to True. It shall be True when present. See ISO/IEC 23000-19 9.4.2.2.2

VUI parameters that occur within a CMAF AVC video track shall conform to ISO/IEC 14496-10 with the following additional constraints. The following fields have pre-determined values as follows:

  • video_signal_type_present_flag should be set to 1, and the value of video_full_range_flag when not present shall be assumed to be 0.

NOTE 1 This indicates normal black setup, i.e., black level is 16 for 8-bit video.

  • aspect_ratio_info_present_flag shall be set to 1. aspect_ratio_idc shall not be set to 0 (unknown).

NOTE 2 These parameters refer to the video spatial sample aspect ratio, not picture aspect ratio. Always setting the value distinguishes between content that omitted the value because it intended the default or just failed to set it properly.

  • overscan_info_present_flag, if present, shall be set to 0.

NOTE 3 Exact scan encoding of the active image is used for reliable image framing by devices and precise adaptive scaling during adaptive switching.

  • aspect_ratio_idc shall be set to 1.

  • If video_signal_type_present_flag is set to 1, colour_description_present_flag should be set to 1.

NOTE 4 As defined in ISO/IEC 14496-10, if the colour_description_present_flag is set to 1, the colour_primaries, transfer_characteristics and matrix_coefficients fields are present.

  • If colour_description_present_flag is set to 1, then colour_primaries, transfer_characteristics and matrix_coefficients shall be set to the values used in the AVC video CMAF track.

  • If colour_description_present_flag is set to 0, the following values shall be assumed in AVC video CMAF tracks:

      • colour_primaries = 1;

      • transfer_characteristics = 1;

      • matrix_coefficients = 1.

  • The presence and values of the following fields shall not change in a CMAF track.

      • low_delay_hrd_flag

      • colour_primaries

      • transfer_characteristics

      • matrix_coefficients

Must Fix: aspect_ratio_idc shall not be zero. See ISO/IEC 23000-19 9.4.2.2.2

see above

Must Fix: overscan_info_present_flag shall be zero. See ISO/IEC 23000-19 9.4.2.2.2

see above

Must Fix: colour_primaries shall be one of 1, 5, 6 if colour_description_present_flag is 1. See ISO/IEC 23000-19 9.4.2.2.2

see above

Must Fix: matrix_coefficients shall be one of 1, 5, 6 if colour_description_present_flag is 1. See ISO/IEC 23000-19 9.4.2.2.2

see above

Must Fix: transfer_characteristics shall be 1 or 6 if colour_description_present_flag is 1. See ISO/IEC 23000-19 9.4.2.2.2

see above
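A minimal Python sketch of the VUI checks, assuming the VUI has already been parsed into a dict of field values (parser not shown). The allowed value sets are copied from the Must Fix messages above, and `effective_colour` applies the defaults to assume when colour_description_present_flag is 0; both function names are hypothetical.

```python
# Value sets copied from the Must Fix messages above.
ALLOWED_COLOUR_PRIMARIES = {1, 5, 6}
ALLOWED_TRANSFER_CHARACTERISTICS = {1, 6}
ALLOWED_MATRIX_COEFFICIENTS = {1, 5, 6}


def check_vui(vui):
    """Return Must Fix messages for one parsed VUI (a dict of field values)."""
    errors = []
    if vui.get("aspect_ratio_info_present_flag") != 1:
        errors.append("aspect_ratio_info_present_flag shall be 1")
    elif vui.get("aspect_ratio_idc") == 0:
        errors.append("aspect_ratio_idc shall not be 0")
    # A missing overscan_info_present_flag is treated as 0 here.
    if vui.get("overscan_info_present_flag", 0) != 0:
        errors.append("overscan_info_present_flag shall be 0")
    if vui.get("colour_description_present_flag") == 1:
        for name, allowed in (
            ("colour_primaries", ALLOWED_COLOUR_PRIMARIES),
            ("transfer_characteristics", ALLOWED_TRANSFER_CHARACTERISTICS),
            ("matrix_coefficients", ALLOWED_MATRIX_COEFFICIENTS),
        ):
            if vui.get(name) not in allowed:
                errors.append(f"{name} shall be one of {sorted(allowed)}")
    return errors


def effective_colour(vui):
    """(colour_primaries, transfer_characteristics, matrix_coefficients) to assume."""
    if vui.get("colour_description_present_flag") == 1:
        return (vui["colour_primaries"], vui["transfer_characteristics"], vui["matrix_coefficients"])
    return (1, 1, 1)  # defaults assumed when no colour description is present
```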

Must Fix: low_delay_hrd_flag value shall not change. See ISO/IEC 23000-19 9.4.2.2

see above

Must Fix: transfer_characteristics value shall not change. See ISO/IEC 23000-19 9.4.2.2

see above

Must Fix: matrix_coefficients value shall not change. See ISO/IEC 23000-19 9.4.2.2

see above

Must Fix: colour_primaries value shall not change. See ISO/IEC 23000-19 9.4.2.2

see above

Should Fix: The first visual sample entry should have LengthSizeMinusOne set to 3. See ISO/IEC 23000-19 9.3.2.2

The SampleDescriptionBox in a video track shall contain one or more visual sample entries conforming to ISO/IEC 14496-12 and ISO/IEC 14496-15. The first VisualSampleEntry in the SampleDescriptionBox:

  • shall include width and height field values that equal or exceed the largest cropped horizontal and vertical video spatial sample counts in any sequence parameter set referenced by a video slice in the NAL structured video track;

  • should contain no more than one NAL parameter set of each type, e.g., for AVC video, one SPS NAL with VUI and one PPS NAL in the AVCDecoderConfigurationRecord; and contain a decoder configuration record that:

  • should signal the lowest codec profile, level, height and width values that are equal to or greater than the values required for all the CMAF fragments in the CMAF track. See general constraints in subclause 9.2.4;

  • should set LengthSizeMinusOne field to the value 3 (to indicate 4 bytes) to address large VCL NALs and simplify conversion of elementary streams between MPEG-2 TS bytestreams with startcodes and ISO/IEC 14496-15 with NAL length headers;

NOTE 1 The size of the NAL header length field defined in video tracks conforming to ISO/IEC 14496-15 is stored in the field LengthSizeMinusOne in the corresponding decoder configuration record, e.g., for AVC video in the AVCDecoderConfigurationRecord.

  • shall contain one or more ColorInformationBoxes with sub-type ‘nclx’ and a PixelAspectRatioBox pasp, as documented in ISO/IEC 14496-12, if the first sample entry contains no SPS NAL with VUI in the decoder configuration record.

NOTE 2 A decoder configuration record without a parameter set is valid for a sample description such as ‘avc3’ or ‘hev1’ that can store the parameter set NALs necessary for decoding and display in each CMAF fragment.
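Setting LengthSizeMinusOne to 3 means every NAL unit in a sample is preceded by a 4-byte big-endian length instead of a start code. As an illustration of the conversion mentioned above, the following Python sketch turns an H.264 Annex B byte stream into length-prefixed NAL units. It is a hypothetical helper, assumes the whole stream fits in memory, and does not group NAL units into access units, which a real packager has to do to build samples.

```python
START_CODE = b"\x00\x00\x01"


def split_annexb(stream: bytes) -> list:
    """Split an H.264 Annex B byte stream into NAL unit payloads."""
    nals = []
    i = stream.find(START_CODE)
    while i != -1:
        start = i + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # H.264 requires the last byte of a NAL unit to be non-zero, so trailing
        # zero bytes belong to a 4-byte start code or to trailing_zero_8bits.
        nal = stream[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        i = nxt
    return nals


def annexb_to_length_prefixed(stream: bytes, length_size: int = 4) -> bytes:
    """Replace start codes with big-endian NAL length fields of `length_size` bytes.

    length_size = 4 corresponds to LengthSizeMinusOne = 3.
    """
    out = bytearray()
    for nal in split_annexb(stream):
        if len(nal) >= 1 << (8 * length_size):
            raise ValueError("NAL unit too large for the length field")
        out += len(nal).to_bytes(length_size, "big")
        out += nal
    return bytes(out)
```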

Any subsequent sample entries in the SampleDescriptionBox:

  • should contain one parameter set including VUI in each visual sample entry's decoder configuration box, e.g., for AVC, the ‘avcC’ box;

  • may be unreferenced by media samples in this CMAF track or other CMAF tracks in a CMAF switching set;

  • may be constrained by dependency on other CMAF tracks and CMAF headers in a CMAF switching set. See subclause 9.3.6.

NOTE 3 CMAF headers can contain VisualSampleEntries intended for use by other CMAF tracks in a CMAF switching set or for the purpose of initializing decoding and display, but not referenced by any slice header in a media sample.

Should Fix: The first visual sample entry shall contain an nclx ColourInformationBox and a pasp box when its decoder configuration record contains no SPS with VUI. See ISO/IEC 23000-19 9.3.2.2

see above

Should Fix: More than one SPS NAL was found in the first video sample entry. There should only be one according to ISO/IEC 23000-19 9.3.2.2

see above
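The two Should Fix checks above only need the fixed start of the AVCDecoderConfigurationRecord. A minimal Python sketch, assuming `payload` is the body of the avcC box (without its 8-byte box header); the function names are hypothetical and the High-profile extension fields that can follow the PPS array are ignored.

```python
def _read_sets(payload: bytes, pos: int, count: int):
    """Read `count` parameter sets, each prefixed by a 16-bit length."""
    sets = []
    for _ in range(count):
        if pos + 2 > len(payload):
            raise ValueError("truncated avcC")
        size = int.from_bytes(payload[pos:pos + 2], "big")
        pos += 2
        if pos + size > len(payload):
            raise ValueError("truncated avcC")
        sets.append(payload[pos:pos + size])
        pos += size
    return sets, pos


def parse_avcc(payload: bytes) -> dict:
    """Parse the start of an AVCDecoderConfigurationRecord (the avcC box payload)."""
    if len(payload) < 6:
        raise ValueError("truncated avcC")
    length_size_minus_one = payload[4] & 0x03  # low 2 bits after 6 reserved bits
    num_sps = payload[5] & 0x1F                # low 5 bits after 3 reserved bits
    sps, pos = _read_sets(payload, 6, num_sps)
    if pos >= len(payload):
        raise ValueError("truncated avcC")
    pps, pos = _read_sets(payload, pos + 1, payload[pos])  # 8-bit PPS count first
    return {
        "profile_indication": payload[1],
        "level_indication": payload[3],
        "length_size_minus_one": length_size_minus_one,
        "sps": sps,
        "pps": pps,
    }


def check_first_sample_entry(avcc: dict) -> list:
    """Should Fix messages for the first visual sample entry's avcC."""
    messages = []
    if avcc["length_size_minus_one"] != 3:
        messages.append("Should Fix: LengthSizeMinusOne should be 3 in the first visual sample entry")
    if len(avcc["sps"]) > 1:
        messages.append("Should Fix: more than one SPS NAL in the first visual sample entry")
    return messages
```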

Must Fix: Found an access unit delimiter after an in-band parameter set. ISO/IEC 23000-19 9.3.4 requires that parameter set NAL units come after the access unit delimiter NAL unit, if present

Each CMAF fragment shall contain one or more complete coded video sequences, as specified by the video codec. Consequently, the first media sample in all NAL structured video CMAF fragments is SAP type 1 or 2, as specified in ISO/IEC 14496-12. NAL structured video sample descriptions that allow inband parameter sets (e.g., avc3 and hev1) shall contain all SPS and PPS NAL units referenced by a coded video sequence in the first access unit of that coded video sequence, immediately following the access unit delimiter NAL unit, if present; followed by SEI NAL units, if present; followed by the VCL NAL unit(s) of the first access unit. If sample entries also exist in the CMAF track's CMAF header using the same sample entry and parameter set indexes as an inband parameter set, then they shall contain the same SPS and PPS NAL units in their decoder configuration record as the inband parameter set.

Must Fix: Found SEI NAL units before in-band parameter set NAL units. ISO/IEC 23000-19 9.3.4 requires that SEI NAL units come after parameter set NAL units

see above

Must Fix: Found VCL NAL units before SEI NAL units. ISO/IEC 23000-19 9.3.4 requires that VCL NAL units come after SEI NAL units

see above

Must Fix: Found VCL NAL units before in-band parameter set NAL units. ISO/IEC 23000-19 9.3.4 requires that VCL NAL units come after in-band parameter set NAL units

see above
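The ordering rule in 9.3.4 (access unit delimiter, then parameter sets, then SEI, then VCL NAL units) can be checked by ranking each nal_unit_type and reporting any step backwards. This Python sketch assumes 4-byte NAL length fields and is meant to be applied to the first access unit of each coded video sequence; other NAL unit types (for example filler data or SPS extensions) are skipped, and all names are hypothetical.

```python
AUD, SEI, SPS, PPS = 9, 6, 7, 8   # H.264 nal_unit_type values
VCL_TYPES = {1, 2, 3, 4, 5}
RANK_NAMES = ["access unit delimiter", "parameter set", "SEI", "VCL"]


def nal_types_in_sample(sample: bytes, length_size: int = 4) -> list:
    """Return the nal_unit_type of each length-prefixed NAL unit in a sample."""
    types, pos = [], 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated NAL length field")
        size = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if size == 0 or pos + size > len(sample):
            raise ValueError("malformed NAL unit length")
        types.append(sample[pos] & 0x1F)  # low 5 bits of the NAL header
        pos += size
    return types


def rank(nal_type: int):
    """Position in the required order; None for types this sketch does not check."""
    if nal_type == AUD:
        return 0
    if nal_type in (SPS, PPS):
        return 1
    if nal_type == SEI:
        return 2
    if nal_type in VCL_TYPES:
        return 3
    return None


def nal_order_errors(nal_types) -> list:
    """Report every NAL unit that appears after a NAL unit of a later rank."""
    errors, highest = [], -1
    for nal_type in nal_types:
        r = rank(nal_type)
        if r is None:
            continue
        if r < highest:
            errors.append(f"{RANK_NAMES[r]} NAL unit found after {RANK_NAMES[highest]} NAL unit")
        highest = max(highest, r)
    return errors
```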

Must Fix: In-band parameter set NAL units found. ISO/IEC 14496-15 clause 8.3.1 does not allow in-band parameter set NAL units in tracks with avc1 or avc2 sample entries.

text to be added
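A sketch of the avc1/avc2 check, in Python, assuming the nal_unit_type values of each sample have already been extracted; the function name is hypothetical. Only SPS (7) and PPS (8) are treated as parameter sets here.

```python
from typing import Iterable, Optional

SPS_TYPE, PPS_TYPE = 7, 8  # H.264 nal_unit_type values


def inband_parameter_set_error(sample_entry_type: str, nal_types: Iterable[int]) -> Optional[str]:
    """Return a Must Fix message if an avc1/avc2 track carries SPS/PPS in its samples."""
    if sample_entry_type not in ("avc1", "avc2"):
        return None  # avc3/avc4 sample entries may carry parameter sets in-band
    if any(t in (SPS_TYPE, PPS_TYPE) for t in nal_types):
        return "Must Fix: in-band parameter set NAL units found in an avc1/avc2 track"
    return None
```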

ISO/IEC 14496-12 8.16.3 Segment index box

timescale provides the timescale, in ticks per second, for the time and duration fields within this box; it is recommended that this match the timescale of the reference stream or track; for files based on this document, that is the timescale field of the media header box of the track.

Should Fix: the timescale in the segment index should match the mdhd timescale

earliest_presentation_time is the earliest presentation time of any content in the reference stream in the first subsegment, in the timescale indicated in the timescale field; the earliest presentation time is derived from media in access units, or parts of access units, that are not omitted by an edit list (if any);

Must Fix: minimum presentation time of segment does not match the sidx expected media presentation time. sidx expected presentation time:
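The timescale and earliest_presentation_time checks can be expressed with exact rational arithmetic, which also works when the two timescales differ. In this Python sketch `segment_min_pts` is assumed to be the earliest presentation time of the first subsegment in mdhd timescale units, already adjusted for any edit list; computing it from tfdt, trun and elst is not shown, and the function name is hypothetical.

```python
from fractions import Fraction


def check_sidx_timing(sidx_timescale, earliest_presentation_time, mdhd_timescale, segment_min_pts):
    """Return Should Fix / Must Fix messages for the sidx timing fields."""
    messages = []
    if sidx_timescale != mdhd_timescale:
        messages.append("Should Fix: sidx timescale does not match mdhd timescale")
    # Compare in seconds as exact fractions so different timescales still compare correctly.
    expected = Fraction(earliest_presentation_time, sidx_timescale)
    actual = Fraction(segment_min_pts, mdhd_timescale)
    if actual != expected:
        messages.append(
            f"Must Fix: min presentation time of segment ({float(actual)} s) does not match "
            f"sidx expected presentation time ({float(expected)} s)"
        )
    return messages
```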

first_offset is the distance in bytes, in the file containing media, from the anchor point, to the first byte of the indexed material. reference_count provides the number of referenced items; referenced_size: the distance in bytes from the first byte of the referenced item to the first byte of the next referenced item, or in the case of the last entry, the end of the referenced material;
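A minimal Python parser for the SegmentIndexBox fields discussed in this section, following the ISO/IEC 14496-12 sidx syntax (32-bit or 64-bit earliest_presentation_time and first_offset depending on the box version, then one 12-byte entry per reference). It is a sketch: the class and function names are hypothetical, `box` is assumed to start at the sidx box header, and 64-bit largesize boxes are rejected rather than handled.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class SidxReference:
    reference_type: int       # 1 = SegmentIndexBox, 0 = media
    referenced_size: int
    subsegment_duration: int
    starts_with_sap: int
    sap_type: int
    sap_delta_time: int


@dataclass
class Sidx:
    version: int
    reference_id: int
    timescale: int
    earliest_presentation_time: int
    first_offset: int
    references: List[SidxReference]


def parse_sidx(box: bytes) -> Sidx:
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"sidx":
        raise ValueError("not a sidx box")
    if size == 1:
        raise ValueError("64-bit largesize is not handled in this sketch")
    if size > len(box):
        raise ValueError("truncated sidx box")
    version = box[8]  # FullBox version; the 24-bit flags follow
    pos = 12
    reference_id, timescale = struct.unpack_from(">II", box, pos)
    pos += 8
    fmt = ">II" if version == 0 else ">QQ"
    earliest_pt, first_offset = struct.unpack_from(fmt, box, pos)
    pos += struct.calcsize(fmt)
    _reserved, reference_count = struct.unpack_from(">HH", box, pos)
    pos += 4
    references = []
    for _ in range(reference_count):
        word1, duration, word2 = struct.unpack_from(">III", box, pos)
        pos += 12
        references.append(SidxReference(
            reference_type=word1 >> 31,
            referenced_size=word1 & 0x7FFFFFFF,
            subsegment_duration=duration,
            starts_with_sap=word2 >> 31,
            sap_type=(word2 >> 28) & 0x7,
            sap_delta_time=word2 & 0x0FFFFFFF,
        ))
    return Sidx(version, reference_id, timescale, earliest_pt, first_offset, references)
```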

Should Fix: sidx box test: not all references point to a (sub-)segment. This indicates that the sidx box or the track may be corrupted (the corresponding top-level boxes were not found at the indicated byte offsets).

Should Fix: CMAF track, but found an index entry pointing to a box other than a MovieFragmentBox. This contradicts 23000-19 and 14496-12 (note that 23000-19 has no explicit sidx requirements, hence Should Fix).

Should Fix: CMAF segments should start with a SAP of type 1 or 2, but starts_with_SAP in the sidx is set to 0, indicating no SAP at the start.

Should Fix: SAP_delta_time is greater than zero, but CMAF segments start with a SAP of type 1 or 2.

CMAF segments start with a SAP of type 1 or 2; therefore starts_with_SAP should not be zero, and SAP_delta_time should be zero.
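Following the Should Fix messages above, the SAP checks reduce to three comparisons per sidx reference. A Python sketch, assuming each reference has been reduced to a (starts_with_SAP, SAP_type, SAP_delta_time) tuple; the function name is hypothetical.

```python
def sap_messages(references):
    """references: iterable of (starts_with_SAP, SAP_type, SAP_delta_time) tuples from a sidx."""
    messages = []
    for index, (starts_with_sap, sap_type, sap_delta_time) in enumerate(references):
        if starts_with_sap == 0:
            messages.append(f"Should Fix: reference {index}: starts_with_SAP is 0, no SAP at segment start")
        elif sap_type not in (1, 2):
            messages.append(f"Should Fix: reference {index}: SAP_type {sap_type} is not 1 or 2")
        if sap_delta_time > 0:
            messages.append(f"Should Fix: reference {index}: SAP_delta_time {sap_delta_time} is greater than zero")
    return messages
```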

reference_type: when set to 1 indicates that the reference is to a SegmentIndexBox; otherwise the reference is to media content (e.g., in the case of files based on this document, to a MovieFragmentBox). If a separate index segment is used, then entries with reference type 1 are in the index segment, and entries with reference type 0 are in the media file.

Must Fix: ISO/IEC 14496-12 8.16.3: for reference_type 0 (media), a sidx box references a moof box, but the moof box was not found.

Must Fix or Should Fix: ISO/IEC 23000-19: the sidx box references another sidx box. While this is allowed in 14496-12, it is not allowed in 23000-19 (implicitly derived).
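The reference checks walk the referenced byte ranges: the first item starts first_offset bytes after the anchor point, and each following item starts referenced_size bytes after the previous one. This Python sketch assumes the sidx and the media are in the same byte buffer, with the anchor being the first byte after the sidx box, and only reads the 4-character type of the box found at each offset. Names are hypothetical, and `media_types` defaults to moof only, so a segment that starts with a styp box would be reported unless the caller widens it.

```python
def referenced_box_types(data: bytes, anchor: int, first_offset: int, references):
    """Yield (offset, reference_type, box_type) for each sidx reference.

    references is an iterable of (reference_type, referenced_size) pairs.
    """
    offset = anchor + first_offset
    for reference_type, referenced_size in references:
        # A box header is a 32-bit size followed by a 4-character type.
        if offset + 8 <= len(data):
            box_type = data[offset + 4:offset + 8].decode("latin-1")
        else:
            box_type = None  # offset points past the end of the data
        yield offset, reference_type, box_type
        offset += referenced_size


def reference_messages(data, anchor, first_offset, references, media_types=("moof",)):
    """Report references that do not land on the expected box type."""
    messages = []
    for offset, reference_type, box_type in referenced_box_types(data, anchor, first_offset, references):
        if reference_type == 1:
            messages.append(f"sidx reference at byte {offset} points to another sidx box (not allowed in CMAF)")
            if box_type != "sidx":
                messages.append(f"no sidx box found at byte {offset} (found {box_type})")
        elif box_type not in media_types:
            messages.append(f"reference_type 0 at byte {offset}: expected moof, found {box_type}")
    return messages
```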