draft-ietf-avt-mpeg4-multisl-01.txt   draft-ietf-avt-mpeg4-multisl-02.txt 
Internet Engineering Task Force Avaro-France Telecom Internet Engineering Task Force Basso-AT&T
Internet Draft Basso-AT&T Internet Draft Civanlar-AT&T
Casner-Packet Design
Civanlar-AT&T
Gentric-Philips Gentric-Philips
Herpel-Thomson Herpel-Thomson
Lifshitz-Optibase Lifshitz-Optibase
Lim-mp4cast Lim-mp4cast
Perkins-ISI Perkins-ISI
van der Meer-Philips Van Der Meer-Philips
July 2001 September 2001
Expires Jan. 2002 Expires March 2002
Document: draft-ietf-avt-mpeg4-multisl-01.txt Document: draft-ietf-avt-mpeg4-multisl-02.txt
RTP Payload Format for MPEG-4 Streams RTP Payload Format for MPEG-4 Streams
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at line 53 skipping to change at line 51
This document contains a MIME type registration form that is This document contains a MIME type registration form that is
intended to be taken as-is and therefore makes reference to this intended to be taken as-is and therefore makes reference to this
document, using the temporary placeholder: <self-reference-to-this>. document, using the temporary placeholder: <self-reference-to-this>.
Abstract Abstract
This document describes a payload format for transporting MPEG-4 This document describes a payload format for transporting MPEG-4
encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for
the coding of natural and synthetic audio-visual data. Several the coding of natural and synthetic audio-visual data. Several
services provided by RTP are beneficial for MPEG-4 encoded data services provided by RTP are beneficial for MPEG-4 encoded data
Gentric et al. Expires January 2002 1
RTP Payload Format for MPEG-4 Streams July 2001
transport over the Internet. Additionally, the use of RTP makes it transport over the Internet. Additionally, the use of RTP makes it
possible to synchronize MPEG-4 data with other real-time data types. possible to synchronize MPEG-4 data with other real-time data types.
Gentric et al. Expires March 2002 1
RTP Payload Format for MPEG-4 Streams September 2001
1. Introduction 1. Introduction
MPEG-4 is a recent standard from ISO/IEC for the coding of natural MPEG-4 is a recent standard from ISO/IEC for the coding of natural
and synthetic audio-visual data in the form of audiovisual objects and synthetic audio-visual data in the form of audiovisual objects
that are arranged into an audiovisual scene by means of a scene that are arranged into an audiovisual scene by means of a scene
description [1][2][3][4]. This draft specifies an RTP [5] payload description [1][2][3][4]. This draft specifies an RTP [5] payload
format for transporting MPEG-4 encoded data streams. format for transporting MPEG-4 encoded data streams.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
skipping to change at line 86 skipping to change at line 83
ii. Monitoring MPEG-4 delivery performance through RTCP ii. Monitoring MPEG-4 delivery performance through RTCP
iii. Combining MPEG-4 and other real-time data streams received from iii. Combining MPEG-4 and other real-time data streams received from
multiple end-systems into a set of consolidated streams through RTP multiple end-systems into a set of consolidated streams through RTP
mixers mixers
iv. Converting data types, etc. through the use of RTP translators. iv. Converting data types, etc. through the use of RTP translators.
1.1 Overview of MPEG-4 End-System Architecture 1.1 Overview of MPEG-4 End-System Architecture
Fig. 1 below shows the layered architecture of a terminal which Fig. 1 below shows the layered architecture of a terminal, which
implements the complete MPEG-4 systems model. The Compression Layer implements the complete MPEG-4 systems model. The Compression Layer
processes individual audio-visual media streams. The MPEG-4 processes individual audio-visual media streams. The MPEG-4
compression schemes are defined in the ISO/IEC specifications 14496- compression schemes are defined in the ISO/IEC specifications 14496-
2 [2] and 14496-3 [3]. The compression schemes in MPEG-4 achieve 2 [2] and 14496-3 [3]. The compression schemes in MPEG-4 achieve
efficient encoding over a bandwidth ranging from several kbps to efficient encoding over a bandwidth ranging from several kbps to
many Mbps. The audio-visual content compressed by this layer is many Mbps. The audio-visual content compressed by this layer is
organized into Elementary Streams (ESs). organized into Elementary Streams (ESs).
The MPEG-4 standard specifies MPEG-4 compliant streams. Within the The MPEG-4 standard specifies MPEG-4 compliant streams. Within the
constraint of this compliance the compression layer is unaware of a constraint of this compliance the compression layer is unaware of a
specific delivery technology, but it can be made to react to the specific delivery technology, but it can be made to react to the
skipping to change at line 111 skipping to change at line 108
technologies that are different than the one it is specifically technologies that are different than the one it is specifically
designed to operate with. designed to operate with.
The hierarchical relations, location and properties of ESs in a The hierarchical relations, location and properties of ESs in a
presentation are described by a dynamic set of Object Descriptors presentation are described by a dynamic set of Object Descriptors
(ODs). Each OD groups one or more ES Descriptors referring to a (ODs). Each OD groups one or more ES Descriptors referring to a
single content item (audio-visual object). Hence, multiple single content item (audio-visual object). Hence, multiple
alternative or hierarchical representations of each content item are alternative or hierarchical representations of each content item are
possible. possible.
ODs are themselves conveyed through one or more ESs. A complete set ODs are themselves conveyed through one or more ESs. A complete set
of ODs can be seen as an MPEG-4 resource or session description at a of ODs can be seen as an MPEG-4 resource or session description at a
stream level. The resource description may itself be hierarchical, stream level. The resource description may itself be hierarchical,
i.e. an ES conveying an OD may describe other ESs conveying other i.e. an ES conveying an OD may describe other ESs conveying other
ODs. ODs.
The session description is accompanied by a dynamic scene The session description is accompanied by a dynamic scene
description, Binary Format for Scene (BIFS), again conveyed through description, Binary Format for Scene (BIFS), again conveyed through
one or more ESs. At this level, content is identified in terms of one or more ESs. At this level, content is identified in terms of
audio-visual objects. The spatio-temporal location of each object is audio-visual objects. The spatio-temporal location of each object is
defined by BIFS. The audio-visual content of those objects that are defined by BIFS. The audio-visual content of those objects that are
synthetic and static are described by BIFS also. Natural and synthetic and static are described by BIFS also. Natural and
skipping to change at line 166 skipping to change at line 164
media or control data (ODs, BIFS). Integer or fractional AUs are media or control data (ODs, BIFS). Integer or fractional AUs are
then encapsulated in SL packets and in the following we will then encapsulated in SL packets and in the following we will
describe this payload format as transporting SL packets, although in describe this payload format as transporting SL packets, although in
many cases SL packet payloads are actually (entire) Access Units many cases SL packet payloads are actually (entire) Access Units
payloads i.e. encoded media frames. All consecutive data from one payloads i.e. encoded media frames. All consecutive data from one
stream is called an SL-packetized stream at this layer. The stream is called an SL-packetized stream at this layer. The
interface between the compression layer and the SL is called the interface between the compression layer and the SL is called the
Elementary Stream Interface (ESI). The ESI is informative i.e. it is Elementary Stream Interface (ESI). The ESI is informative i.e. it is
extremely useful in order to define concepts and mechanisms but does extremely useful in order to define concepts and mechanisms but does
not have to be implemented. For the same reason this draft describes not have to be implemented. For the same reason this draft describes
the transport of SL packets i.e. Access Units or fragments thereof. the transport of SL packets i.e. Access Units or fragments thereof.
It is important to note however that a SL stream can be configured It is important to note however that a SL stream can be configured
so that SL packets are reduced to the media (compressed) data and in so that SL packets are reduced to the media (compressed) data and in
that case implementations do not need to be aware of the SL at all. that case implementations do not need to be aware of the SL at all.
The Delivery Layer in MPEG-4 consists of the Delivery Multimedia The Delivery Layer in MPEG-4 consists of the Delivery Multimedia
Integration Framework defined in ISO/IEC 14496-6 [4]. This layer is Integration Framework defined in ISO/IEC 14496-6 [4]. This layer is
media unaware but delivery technology aware. It provides transparent media unaware but delivery technology aware. It provides transparent
access to and delivery of content irrespective of the technologies access to and delivery of content irrespective of the technologies
used. The interface between the SL and DMIF is called the DMIF used. The interface between the SL and DMIF is called the DMIF
Application Interface (DAI). It offers content location independent Application Interface (DAI). It offers content location independent
procedures for establishing MPEG-4 sessions and access to transport procedures for establishing MPEG-4 sessions and access to transport
skipping to change at line 221 skipping to change at line 219
+-------------------------------------------+ +-------------------------------------------+
Figure 1: Conceptual MPEG-4 terminal architecture Figure 1: Conceptual MPEG-4 terminal architecture
1.2 MPEG-4 Elementary Stream Data Packetization 1.2 MPEG-4 Elementary Stream Data Packetization
The ESs from the encoders are fed into the SL with indications of AU The ESs from the encoders are fed into the SL with indications of AU
boundaries, random access points, desired composition time and the boundaries, random access points, desired composition time and the
current time. current time.
The Sync Layer fragments the ESs into SL packets, each containing a The Sync Layer fragments the ESs into SL packets, each containing a
header that encodes information conveyed through the ESI. If the AU header that encodes information conveyed through the ESI. If the AU
is larger than a SL packet, subsequent packets containing remaining is larger than a SL packet, subsequent packets containing remaining
parts of the AU are generated with subset headers until the complete parts of the AU are generated with subset headers until the complete
AU is packetized. AU is packetized.
The syntax of the Sync Layer is configurable and can be adapted to The syntax of the Sync Layer is configurable and can be adapted to
the needs of the stream to be transported. This includes the the needs of the stream to be transported. This includes the
possibility to select the presence or absence of individual syntax possibility to select the presence or absence of individual syntax
elements as well as configuration of their length in bits. The elements as well as configuration of their length in bits. The
configuration for each individual stream is conveyed in a configuration for each individual stream is conveyed in a
SLConfigDescriptor, which is an integral part of the ES Descriptor SLConfigDescriptor, which is an integral part of the ES Descriptor
for this stream. The MPEG-4 SLConfigDescriptor, being configuration for this stream. The MPEG-4 SLConfigDescriptor, being configuration
skipping to change at line 276 skipping to change at line 275
To avoid unnecessary overhead and potential interoperability risks To avoid unnecessary overhead and potential interoperability risks
when transporting MPEG-4 systems, it is desirable to remove the when transporting MPEG-4 systems, it is desirable to remove the
redundancy between the SL packet header and the RTP packet header. redundancy between the SL packet header and the RTP packet header.
To be independent of the use of MPEG-4 systems, synchronization can To be independent of the use of MPEG-4 systems, synchronization can
rely on the parameters provided in the RTP header. rely on the parameters provided in the RTP header.
In case SL headers are used, the redundant fields are removed from In case SL headers are used, the redundant fields are removed from
the SL header, producing "reduced SL headers". the SL header, producing "reduced SL headers".
The remaining information from the SL header, if any, is contained The remaining information from the SL header, if any, is contained
inside the RTP packet payload, together with the SL packet payload. inside the RTP packet payload, together with the SL packet payload.
The combination of RTP packet headers and reduced SL packet headers The combination of RTP packet headers and reduced SL packet headers
can be used to logically map the RTP packets to complete SL packets. can be used to logically map the RTP packets to complete SL packets.
Some of the information contained in the reduced SL headers is also Some of the information contained in the reduced SL headers is also
useful for transport over RTP when MPEG-4 systems is not used. useful for transport over RTP when MPEG-4 systems is not used.
For that reason the information in the "reduced" SL headers is split For that reason the information in the "reduced" SL headers is split
into "general useful information" and "MPEG-4 systems only into "general useful information" and "MPEG-4 systems only
information". information".
The "general useful information" hereinafter called Mapped SL Packet The "general useful information" hereinafter called Mapped SL Packet
Header (MSLH) is carried by a number of fields configurable using Header (MSLH) is carried by a number of fields configurable using
parameters defined in section 4.1; all receivers MUST parse these parameters defined in section 4.1; all receivers MUST parse these
skipping to change at line 331 skipping to change at line 329
Figure 2: Mapping of SL Packet into RTP packet Figure 2: Mapping of SL Packet into RTP packet
When the configuration is such that SL packet headers map directly When the configuration is such that SL packet headers map directly
to RTP headers this process of mapping SL packet headers is purely to RTP headers this process of mapping SL packet headers is purely
conceptual. For example this RTP payload format has been designed so conceptual. For example this RTP payload format has been designed so
that it is by default configured to be identical to RFC 3016 for the that it is by default configured to be identical to RFC 3016 for the
recommended MPEG-4 video configurations (see section 5.5). Hence recommended MPEG-4 video configurations (see section 5.5). Hence
receivers that comply with this payload specification can decode receivers that comply with this payload specification can decode
such RTP payload without knowledge about the Synch Layer (see the such RTP payload without knowledge about the Synch Layer (see the
example in Appendix.1). In a similar fashion MPEG-4 audio (see example in Appendix.1). In a similar fashion MPEG-4 audio (see
Appendix for examples) can be transported without explicit use of Appendix for examples) can be transported without explicit use of
the Synch Layer. the Synch Layer.
3. Payload Format 3. Payload Format
The RTP Payload corresponds to an integer number of SL packets. The RTP Payload corresponds to an integer number of SL packets.
If multiple SL packets are transported in each RTP packet, they MUST If multiple SL packets are transported in each RTP packet, they MUST
be in decoding order, i.e: be in decoding order, i.e:
i) decodingTimeStamp order, if present i) decodingTimeStamp order, if present
ii) packetSequenceNumber order, if present ii) packetSequenceNumber order, if present
iii) Implicit decoding order in all other cases. iii) Implicit decoding order in all other cases.
skipping to change at line 371 skipping to change at line 368
above). In case of interleaving the first SL packet of each RTP above). In case of interleaving the first SL packet of each RTP
packet is used as reference as in the following examples of RTP packet is used as reference as in the following examples of RTP
packets containing interleaved SL packets. packets containing interleaved SL packets.
This sequence is correct: [0,2,4][1,3,5] This sequence is correct: [0,2,4][1,3,5]
This sequence is correct: [0,3,6][1,2][4,5] This sequence is correct: [0,3,6][1,2][4,5]
This sequence is correct: [0,3,6][1,4][2,5] This sequence is correct: [0,3,6][1,4][2,5]
This sequence is prohibited: [0,4,2][1,5,3] This sequence is prohibited: [0,4,2][1,5,3]
This sequence is prohibited: [1,3,5][0,2,4] This sequence is prohibited: [1,3,5][0,2,4]
This sequence is prohibited: [0,3,6][2,5][1,4] This sequence is prohibited: [0,3,6][2,5][1,4]
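The six example sequences above can be checked mechanically. The
following is a minimal sketch, not part of either draft revision: the
function name is hypothetical, each RTP packet is modeled as a list of
SL packet indices, and index roll-over is ignored.

```python
def is_valid_interleaving(rtp_packets):
    """Return True if the SL packet ordering is allowed.

    rtp_packets is a list of RTP packets in transmission order; each
    RTP packet is a non-empty list of SL packet indices (hypothetical
    model, no roll-over handling).
    """
    previous_first = None
    for sl_indices in rtp_packets:
        # Inside one RTP packet, SL packets must be in decoding order.
        if any(b <= a for a, b in zip(sl_indices, sl_indices[1:])):
            return False
        # Across RTP packets, the first SL packet of each RTP packet
        # is the reference and must keep increasing.
        if previous_first is not None and sl_indices[0] <= previous_first:
            return False
        previous_first = sl_indices[0]
    return True
```

With this model, [[0, 2, 4], [1, 3, 5]] is accepted while
[[0, 4, 2], [1, 5, 3]] is rejected because 2 follows 4 inside the
first RTP packet.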
In the multiple-SL modes senders MUST make sure that no fields
undergo roll over inside one RTP packet. This may limit the number
of SL packets inside one RTP packet and, when interleaving, may
limit the interleaving period.
The size (or number) of the SL packet(s) SHOULD be adjusted such The size (or number) of the SL packet(s) SHOULD be adjusted such
that the resulting RTP packet is not larger than the path-MTU. To that the resulting RTP packet is not larger than the path-MTU. To
handle larger packets, this payload format relies on lower layers handle larger packets, this payload format relies on lower layers
for fragmentation, which may not be desirable. for fragmentation, which may not be desirable.
3.1 RTP Header Fields Usage 3.1 RTP Header Fields Usage
Payload Type (PT): The assignment of an RTP payload type for this Payload Type (PT): The assignment of an RTP payload type for this
new packet format is outside the scope of this document, and will new packet format is outside the scope of this document, and will
not be specified here. It is expected that the RTP profile for a not be specified here. It is expected that the RTP profile for a
particular class of applications will assign a payload type for this particular class of applications will assign a payload type for this
encoding, or if that is not done then a payload type in the dynamic encoding, or if that is not done then a payload type in the dynamic
range shall be chosen. range shall be chosen.
Marker (M) bit: The M bit is set to 1 when all SL packets in the RTP Marker (M) bit: The M bit is set to 1 when all SL packets in the RTP
packet are Access Unit ends, i.e. the M bit maps to the Synch Layer packet are Access Unit ends, i.e. the M bit maps to the Synch Layer
accessUnitEndFlag. accessUnitEndFlag.
Specifically the M bit is set to 0 when the RTP packet contains one Specifically the M bit is set to 0 when the RTP packet contains one
or more Access Unit fragments that are not Access Unit ends, and the or more Access Unit fragments that are not Access Unit ends, and the
M bit is set to 1 for RTP packets that contain either: M bit is set to 1 for RTP packets that contain either:
. A single complete Access Unit . A single complete Access Unit
. The last fragment of an Access Unit . The last fragment of an Access Unit
. Several complete Access Units . Several complete Access Units
. Several last fragments of Access Units . Several last fragments of Access Units
. A mix of complete Access Units and last fragments of Access Units . A mix of complete Access Units and last fragments of Access Units
Therefore for streams where all SL packets are complete Access Units Therefore for streams where all SL packets are complete Access Units
the M bit is 1 for all RTP packets. the M bit is 1 for all RTP packets.
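The M bit rule above reduces to a single test over the SL packets
carried in one RTP packet. A minimal sketch, assuming a hypothetical
helper that receives one boolean per SL packet (True when that SL
packet is a complete Access Unit or the last fragment of one):

```python
def marker_bit(ends_access_unit):
    """Compute the RTP M bit for one RTP packet.

    ends_access_unit holds one boolean per SL packet in the RTP
    packet; True means the SL packet ends an Access Unit. The name
    and the input model are assumptions made for illustration.
    """
    if not ends_access_unit:
        raise ValueError("an RTP packet carries at least one SL packet")
    # M is 1 only when every SL packet in the packet ends an AU.
    return 1 if all(ends_access_unit) else 0
```

A packet mixing a complete Access Unit with a non-final fragment,
marker_bit([True, False]), therefore gets M set to 0.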
Extension (X) bit: Defined by the RTP profile used. Extension (X) bit: Defined by the RTP profile used.
Sequence Number: The RTP sequence number should be generated by the Sequence Number: The RTP sequence number should be generated by the
sender with a constant random offset and does not have to be sender with a constant random offset and does not have to be
correlated to any (optional) MPEG-4 SL sequence numbers. correlated to any (optional) MPEG-4 SL sequence numbers.
Timestamp: Set to the value in the compositionTimeStamp field of the Timestamp: Set to the value in the compositionTimeStamp field of the
first SL packet in the RTP packet, if present. If first SL packet in the RTP packet, if present.
compositionTimeStamp has less than 32 bits length, the MSBs of
timestamp MUST be set to zero.
Although it is available from the SL configuration data, the
resolution of the timestamp may need to be conveyed explicitly
through some out-of-band means to be used by network elements that
are not MPEG-4 aware.
If compositionTimeStamp has more than 32 bits length, this payload If compositionTimeStamp has less than 32 bits length, the RTP
format cannot be used. timestamp is incremented to extend it out to 32 bits. If
compositionTimeStamp has more than 32 bits length, the RTP timestamp
uses the 32 LSB of it. The resolution of the timestamp
(timeStampLength) is available from the SL configuration data and
shall be used by receivers to reconstruct compositionTimeStamps with
the original bit length. When making SL streams specifically for
usage with this payload format it is RECOMMENDED to use
timeStampLength=32.
In all cases, the sender SHALL always make sure that RTP time stamps In all cases, the sender SHALL always make sure that RTP time stamps
are identical only for RTP packets transporting fragments of the are identical only for RTP packets transporting fragments of the
same Access Unit. same Access Unit.
In case compositionTimeStamp is not present in the current SL In case compositionTimeStamp is not present in the current SL
packet, but has been present in a previous SL packet the reason is packet, but has been present in a previous SL packet the reason is
that this is the same Access Unit that has been fragmented, that this is the same Access Unit that has been fragmented,
therefore the same timestamp value MUST be taken as RTP timestamp. therefore the same timestamp value MUST be taken as RTP timestamp.
If compositionTimeStamp is never present in SL packets for this If compositionTimeStamp is never present in SL packets for this
stream, the RTP packetizer SHOULD convey a reading of a local clock stream, the RTP packetizer SHOULD convey a reading of a local clock
at the time the RTP packet is created. at the time the RTP packet is created.
According to RFC1889 [5, Section 5.1] timestamps are recommended to According to RFC1889 [5, Section 5.1] timestamps are recommended to
start at a random value for security reasons. However then, a start at a random value for security reasons. However then, a
receiver is not in the general case able to reconstruct the original receiver is not in the general case able to reconstruct the original
MPEG-4 Time Stamps (CTS, DTS, OCR) which can be of use for MPEG-4 Time Stamps (CTS, DTS, OCR) which can be of use for
applications where streams from multiple sources are to be applications where streams from multiple sources are to be
synchronized. Therefore the usage of such a random offset SHOULD be synchronized. Therefore the usage of such a random offset SHOULD be
avoided. avoided.
Note that since RTP devices may re-stamp the stream, all time stamps Note that since RTP devices may re-stamp the stream, all time stamps
inside of the RTP payload (CTS and DTS in MSLH, OCR in RSLH) MUST be inside of the RTP payload (CTS and DTS in MSLH, OCR in RSLH) MUST be
expressed as difference to the RTP time stamp. Since this expressed as difference to the RTP time stamp. Since this
subtraction may lead to negative values, the offset MUST be encoded subtraction may lead to negative values, the offset MUST be encoded
as a two's complement signed integer in network byte order. Note as a two's complement signed integer in network byte order. Note
these offsets (delta) typically require much fewer bits to be these offsets (delta) typically require much fewer bits to be
encoded than the original length, which is another justification. encoded than the original length, which is another justification.
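The offset encoding described above can be sketched as follows. This
is an illustration only: the function names and the length_bits
argument are hypothetical, and wrap-around of the 32-bit RTP
timestamp is not handled.

```python
def encode_delta(timestamp, rtp_timestamp, length_bits):
    """Encode (timestamp - rtp_timestamp) as a two's complement field.

    Returns the unsigned integer whose length_bits-wide bit pattern,
    written most significant bit first, is the signed offset.
    """
    delta = timestamp - rtp_timestamp
    lowest = -(1 << (length_bits - 1))
    highest = (1 << (length_bits - 1)) - 1
    if not lowest <= delta <= highest:
        raise ValueError("offset does not fit in the field")
    # Masking a negative Python int yields its two's complement form.
    return delta & ((1 << length_bits) - 1)


def decode_delta(field, rtp_timestamp, length_bits):
    """Recover the original time stamp from the RTP time stamp."""
    if field & (1 << (length_bits - 1)):
        # Sign bit set: the field represents a negative offset.
        field -= 1 << length_bits
    return rtp_timestamp + field
```

Because the receiver adds the offset to whatever RTP time stamp
arrives, the reconstruction stays consistent with a re-stamped
stream, and a small field (8 bits in a quick test) is often enough.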
When startCompositionTimeStamp is signaled in the SLConfigDescriptor When startCompositionTimeStamp is signaled in the SLConfigDescriptor
the RTP time stamps MUST start with this value. the RTP time stamps MUST start with this value.
skipping to change at line 496 skipping to change at line 499
The Nth MSLH in the MSLHSection, the Nth RSLH in the RSLHSection and The Nth MSLH in the MSLHSection, the Nth RSLH in the RSLHSection and
the Nth SL packet payload in the SLPPSection correspond to the Nth the Nth SL packet payload in the SLPPSection correspond to the Nth
SL packet transported by the RTP packet. SL packet transported by the RTP packet.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC |M| PT | sequence number | |V=2|P|X| CC |M| PT | sequence number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp | | timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| synchronization source (SSRC) identifier | | synchronization source (SSRC) identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: contributing source (CSRC) identifiers : : contributing source (CSRC) identifiers :
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | | |
| MSLHSection (byte aligned) | | MSLHSection (byte aligned) |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | | | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | | |
skipping to change at line 553 skipping to change at line 556
block of bit-wise concatenated MSLHs. block of bit-wise concatenated MSLHs.
This size field is absent in the Single-SL mode not because it is This size field is absent in the Single-SL mode not because it is
not needed (which would be a minor gain) but for compatibility with not needed (which would be a minor gain) but for compatibility with
RFC 3016. RFC 3016.
This size field is also absent when the value would always be zero This size field is also absent when the value would always be zero
because the MSLH is always empty, which may happen when a constant because the MSLH is always empty, which may happen when a constant
size is signaled using ConstantSize. size is signaled using ConstantSize.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MSLH section size in bits | MSLH | etc | | MSLH section size in bits | MSLH | etc |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| as many bit-wise concatenated MSLHs | | as many bit-wise concatenated MSLHs |
| as SL packets in this RTP packet | | as SL packets in this RTP packet |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| : padding bits| | : padding bits|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
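The layout in the figure (a size field, the bit-wise concatenated
MSLHs, then padding up to a byte boundary) can be sketched as below.
This is an assumption-laden illustration: the 16-bit width of the
size field is NOT taken from the text shown here, and each MSLH is
modeled as an already assembled (value, length in bits) pair.

```python
def pack_mslh_section(mslhs, size_field_bits=16):
    """Build a byte-aligned MSLHSection.

    mslhs is a list of (value, length_bits) pairs, one per SL packet.
    size_field_bits is a hypothetical default, not a value confirmed
    by the draft text above.
    """
    bits = 0
    total_bits = 0
    for value, length_bits in mslhs:
        if value < 0 or value >> length_bits:
            raise ValueError("MSLH value does not fit its length")
        # Concatenate bit-wise, most significant bit first.
        bits = (bits << length_bits) | value
        total_bits += length_bits
    if total_bits >> size_field_bits:
        raise ValueError("MSLHs too long for the size field")
    # The size field counts the MSLH bits only, not the padding.
    section = (total_bits << total_bits) | bits
    section_bits = size_field_bits + total_bits
    padding = (-section_bits) % 8
    section <<= padding  # zero padding bits up to a byte boundary
    return section.to_bytes((section_bits + padding) // 8, "big")
```

For two MSLHs of 3 and 2 bits, the size field holds 5 and three
zero padding bits complete the last byte.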
skipping to change at line 606 skipping to change at line 608
by parsing it since for example the presence of CTSDelta is signaled by parsing it since for example the presence of CTSDelta is signaled
by the value of CTSFlag. by the value of CTSFlag.
3.4.1 Fields of MSLH 3.4.1 Fields of MSLH
PayloadSize: Indicates the size in bytes of the associated SL Packet PayloadSize: Indicates the size in bytes of the associated SL Packet
Payload, which can be found in the SLPPSection of the RTP packet. Payload, which can be found in the SLPPSection of the RTP packet.
The length in bits of this field is signaled by the SizeLength The length in bits of this field is signaled by the SizeLength
parameter (see section 4.1). parameter (see section 4.1).
There is an exception to that: when the RTP packet contains a single There is an exception to that. In the case that the RTP packet
SL packet the PayloadSize field SHALL contain the size of the entire contains only one SL packet in the "Multiple SL mode", the
corresponding Access Unit, for two reasons, firstly the size of the
fragment is not needed when there is only one fragment, secondly
this is useful in order to detect that a full Access Unit has been PayloadSize field SHALL contain the size of the entire corresponding
received after the loss of a packet carrying M bit set to 1. Access Unit. There are two reasons, firstly the size of the fragment
is not needed when there is only one fragment, secondly this is
useful in order to detect that a full Access Unit has been received
after the loss of a packet carrying a M bit set to 1.
Index, IndexDelta: Encodes the packetSequenceNumber (serial number) Index, IndexDelta: Encodes the packetSequenceNumber (serial number)
of the SL Packet. When making streams specifically for transport of the SL Packet. When making streams specifically for transport
with this payload format IndexDelta is useful for interleaving (see with this payload format IndexDelta is useful for interleaving (see
section 3.8). Since a mapping of packetSequenceNumber to RTP section 3.8). Since a mapping of packetSequenceNumber to RTP
sequence number is not possible in the Multiple-SL mode there is no sequence number is not possible in the Multiple-SL mode there is no
requirement for a correspondence. requirement for a correspondence.
Index is optional and -if present- appears for the first SL packet Index is optional and -if present- appears for the first SL packet
in a RTP packet. in a RTP packet.
The length in bits of the Index field is defined by the IndexLength The length in bits of the Index field is defined by the IndexLength
parameter (see section 4.1). parameter (see section 4.1).
IndexDelta is optional and -if present- appears for subsequent (non- IndexDelta is optional and -if present- appears for subsequent (non-
first) SL packets in a RTP packet. first) SL packets in a RTP packet.
The length in bits of the IndexDelta field is defined by the The length in bits of the IndexDelta field is defined by the
IndexDeltaLength parameter (see section 4.1). IndexDeltaLength parameter (see section 4.1).
Both Index and IndexDelta MUST be incremented so that 2 different SL
packets SHALL NOT have the same packetSequenceNumber. One exception
for Index is described in 3.8.1.
If the parameter IndexDeltaLength is defined, non-first SL packets If the parameter IndexDeltaLength is defined, non-first SL packets
inside a RTP packet have their packetSequenceNumber encoded as a inside a RTP packet have their packetSequenceNumber encoded as a
difference (thus the name IndexDelta). This difference is relative difference (thus the name IndexDelta). This difference is relative
to the previous SL packet in the RTP packet according to (with to the previous SL packet in the RTP packet according to (with
i>=0): i>=0):
packetSequenceNumber(0) = Index(0) packetSequenceNumber(0) = Index(0)
packetSequenceNumber(i+1) = packetSequenceNumber(i) + packetSequenceNumber(i+1) = packetSequenceNumber(i) +
IndexDelta(i+1) + 1 IndexDelta(i+1) + 1
If the parameter IndexDeltaLength is not defined the default value If the parameter IndexDeltaLength is not defined the default value
skipping to change at line 659 skipping to change at line 666
packetSequenceNumber is incremented by 1 for each SL packet in one packetSequenceNumber is incremented by 1 for each SL packet in one
RTP packet. RTP packet.
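The recurrence above can be sketched in a few lines of Python. This is only an illustration: the function name and its arguments are hypothetical, and rollover of the Index field is ignored. When IndexDeltaLength is not defined, passing zeros for the deltas gives the increment-by-one behaviour described above.

```python
def packet_sequence_numbers(index, index_deltas):
    """Rebuild packetSequenceNumber for every SL packet of one RTP packet.

    index        -- Index value carried by the first MSLH
    index_deltas -- IndexDelta values of the following MSLHs, in order
                    (all zeros when IndexDeltaLength is not defined)
    Assumption: no rollover handling, values are plain integers.
    """
    numbers = [index]
    for delta in index_deltas:
        # packetSequenceNumber(i+1) = packetSequenceNumber(i) + IndexDelta(i+1) + 1
        numbers.append(numbers[-1] + delta + 1)
    return numbers


print(packet_sequence_numbers(5, [0, 0, 2]))  # [5, 6, 7, 10]
```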
CTSFlag (1 bit): Indicates whether the CTSDelta field is present. A CTSFlag (1 bit): Indicates whether the CTSDelta field is present. A
value of 1 indicates that the CTSDelta field is present, a value of value of 1 indicates that the CTSDelta field is present, a value of
0 that it is not present. 0 that it is not present.
If CTSDeltaLength is not zero, CTSFlag is present in all MSLH If CTSDeltaLength is not zero, CTSFlag is present in all MSLH
regardless of whether the SL packet is an Access Unit start or not. regardless of whether the SL packet is an Access Unit start or not.
CTSDelta (CTSDeltaLength bits): Specifies the value of the CTS as a CTSDelta (CTSDeltaLength bits): Specifies the value of the CTS as a
2-complement offset (delta) from the timestamp in the RTP header of 2-complement offset (delta) from the timestamp in the RTP header of
the RTP packet. The length in bits of each CTSDelta field is the RTP packet. The length in bits of each CTSDelta field is
specified by the CTSDeltaLength parameter (see section 4.1). specified by the CTSDeltaLength parameter (see section 4.1).
The CTSDelta field is present if CTSFlag is 1. The CTSDelta field is present if CTSFlag is 1.
For the first MSLH of each RTP packet CTSFlag is always 0, since the For the first MSLH of each RTP packet CTSFlag is always 0, since the
composition time stamp of the first SL packet in the RTP packet is composition time stamp of the first SL packet in the RTP packet is
mapped to the RTP time stamp. In all cases the sender MUST remove mapped to the RTP time stamp. In all cases the sender MUST remove
the compositionTimeStamp from the RSLH. the compositionTimeStamp from the RSLH.
Senders MUST NOT assemble RTP packets for which CTSDelta rolls over
inside the RTP packet.
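A receiver-side sketch of the CTSDelta decoding follows. The helper names are hypothetical, and the sketch assumes that the CTS is expressed in RTP timestamp units and wraps at 32 bits like the RTP timestamp itself; neither assumption is stated in the text above.

```python
def decode_twos_complement(raw, length_bits):
    """Interpret an unsigned field of length_bits bits as a 2-complement value."""
    if length_bits <= 0:
        raise ValueError("length_bits must be positive")
    sign_bit = 1 << (length_bits - 1)
    return raw - (1 << length_bits) if raw & sign_bit else raw


def cts_from_delta(rtp_timestamp, raw_cts_delta, cts_delta_length):
    """CTS = RTP timestamp + signed CTSDelta (assumed modulo 2**32)."""
    delta = decode_twos_complement(raw_cts_delta, cts_delta_length)
    return (rtp_timestamp + delta) % (1 << 32)


# An 8-bit CTSDelta of 0xFE is -2, so the CTS is 2 ticks before the RTP timestamp.
print(cts_from_delta(1000, 0xFE, 8))  # 998
```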
DTSFlag (1 bit): Indicates whether the DTSDelta field is present. A DTSFlag (1 bit): Indicates whether the DTSDelta field is present. A
value of 1 indicates that DTSDelta is present, a value of 0 that it value of 1 indicates that DTSDelta is present, a value of 0 that it
is not present. is not present.
If DTSDeltaLength is not zero, DTSFlag is present in all MSLH If DTSDeltaLength is not zero, DTSFlag is present in all MSLH
regardless of whether the SL packet is an Access Unit start or not; regardless of whether the SL packet is an Access Unit start or not;
the receiver needs this flag in order to reconstruct the the receiver needs this flag in order to reconstruct the
decodingTimeStampFlag of SL Headers. decodingTimeStampFlag of SL Headers.
DTSDelta (DTSDeltaLength bits): encodes (compositionTimeStamp - DTSDelta (DTSDeltaLength bits): encodes (compositionTimeStamp -
decodingTimeStamp) for the same SL packet (always positive). The decodingTimeStamp) for the same SL packet (always positive). The
length in bits of each DTSDelta field is specified by the length in bits of each DTSDelta field is specified by the
DTSDeltaLength parameter (see section 4.1). DTSDeltaLength parameter (see section 4.1).
Senders MUST NOT assemble RTP packets for which the difference
between compositionTimeStamp and decodingTimeStamp cannot be
expressed on DTSDeltaLength bits.
The DTSDelta field appears when DTSFlag is 1. The sender MUST always The DTSDelta field appears when DTSFlag is 1. The sender MUST always
remove the decodingTimeStamp from the RSLH. remove the decodingTimeStamp from the RSLH.
If DTSDelta is zero i.e. if decodingTimeStamp equals If DTSDelta is zero i.e. if decodingTimeStamp equals
compositionTimeStamp then DTSFlag MUST be set to 0 and no DTSDelta compositionTimeStamp then DTSFlag MUST be set to 0 and no DTSDelta
field SHALL be present. field SHALL be present.
At the sender side the computation of DTSDelta MUST be performed by
taking into account roll over. For example for a SL stream with the
following (CTS, DTS) pairs (assuming timeStampLength=3):
(4,3), (5,4), (6,5), (7,6), (0,7); DTSDelta for the last pair is
logically (1) and not (-7) which would be illegal and could cause
receivers implemented following section 5.1 to fail.
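The roll-over rule can be illustrated with modular arithmetic. This is a minimal sketch with hypothetical function names; it assumes that both time stamps are unsigned values of timeStampLength bits.

```python
def dts_delta(cts, dts, time_stamp_length):
    """DTSDelta = CTS - DTS, computed modulo 2**timeStampLength."""
    return (cts - dts) % (1 << time_stamp_length)


def fits(delta, dts_delta_length):
    """True if delta can be expressed on dts_delta_length bits."""
    return 0 <= delta < (1 << dts_delta_length)


# The (CTS, DTS) pairs of the example above, with timeStampLength=3.
pairs = [(4, 3), (5, 4), (6, 5), (7, 6), (0, 7)]
print([dts_delta(c, d, 3) for c, d in pairs])  # [1, 1, 1, 1, 1]
```

The last pair gives 1 and not -7 because the subtraction is taken modulo 8.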
3.4.2 Relationship between sizes of MSLH fields and parameters 3.4.2 Relationship between sizes of MSLH fields and parameters
The relationship between a Mapped SL Packet Header and the related The relationship between a Mapped SL Packet Header and the related
parameters is as follows: parameters is as follows:
+===========================+=================================+ +===========================+=================================+
| Fields of MSLPH | Number of bits (parameters) | | Fields of MSLPH | Number of bits (parameters) |
+===========================+=================================+ +===========================+=================================+
| PayloadSize | SizeLength | | PayloadSize | SizeLength |
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
| Index | IndexLength | | Index | IndexLength |
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
| IndexDelta | IndexDeltaLength | | IndexDelta | IndexDeltaLength |
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
| CTSFlag | 1 If (CTSDeltaLength > 0) | | CTSFlag | 1 If (CTSDeltaLength > 0) |
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
skipping to change at line 722 skipping to change at line 747
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
| DTSFlag | 1 If (DTSDeltaLength > 0) | | DTSFlag | 1 If (DTSDeltaLength > 0) |
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
| DTSDelta | DTSDeltaLength If (DTSFlag==1) | | DTSDelta | DTSDeltaLength If (DTSFlag==1) |
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
Table 1: Relationship between MSLH field size and parameters Table 1: Relationship between MSLH field size and parameters
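The following sketch shows how a receiver might read one MSLH from the field sizes in Table 1. It is illustrative only: the BitReader class and the parse_mslh function are hypothetical, and the sketch assumes that fields appear in the order of the table and that bits are read most significant bit first.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (illustrative helper)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_mslh(reader, first, size_length, index_length,
               index_delta_length, cts_delta_length, dts_delta_length):
    """Parse one MSLH; 'first' selects Index (first MSLH) or IndexDelta."""
    header = {"PayloadSize": reader.read(size_length)}
    if first:
        header["Index"] = reader.read(index_length)
    else:
        header["IndexDelta"] = reader.read(index_delta_length)
    # CTSFlag is present only if CTSDeltaLength > 0
    header["CTSFlag"] = reader.read(1) if cts_delta_length > 0 else 0
    if header["CTSFlag"]:
        header["CTSDelta"] = reader.read(cts_delta_length)
    # DTSFlag is present only if DTSDeltaLength > 0
    header["DTSFlag"] = reader.read(1) if dts_delta_length > 0 else 0
    if header["DTSFlag"]:
        header["DTSDelta"] = reader.read(dts_delta_length)
    return header


# 17 header bits: PayloadSize=100, Index=2, CTSFlag=0, DTSFlag=1, DTSDelta=3
reader = BitReader(bytes([0x64, 0x49, 0x80]))
header = parse_mslh(reader, True, 8, 3, 3, 4, 4)
print(header, reader.pos)
```

A field whose length parameter is zero consumes no bits, which matches the convention that absent parameters default to zero.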
3.5 RSLHSection structure 3.5 RSLHSection structure
This section consists of a field (RSLHSectionSize) giving the size This section consists of a field (RSLHSectionSize) giving the size
in bits of the following block of bit-wise concatenated RSLHs. in bits of the following block of bit-wise concatenated RSLHs.
If the section consumes a non-integer number of bytes, up to 7 zero If the section consumes a non-integer number of bytes, up to 7 zero
padding bits MUST be inserted at the end in order to achieve byte- padding bits MUST be inserted at the end in order to achieve byte-
alignment. alignment.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLHSectionSize (RSLHSectionSizeLength bits)| RSLH (variable | | RSLHSectionSize (RSLHSectionSizeLength bits)| RSLH (variable |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
skipping to change at line 756 skipping to change at line 778
| : padding bits| | : padding bits|
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 7: RSLHSection structure Figure 7: RSLHSection structure
The length in bits of the RSLHSectionSize field is The length in bits of the RSLHSectionSize field is
RSLHSectionSizeLength and is specified with a default value of zero RSLHSectionSizeLength and is specified with a default value of zero
indicating that the whole RSLHSection is absent. Compatibility with indicating that the whole RSLHSection is absent. Compatibility with
RFC 3016 requires that the RSLHSection should be empty, including RFC 3016 requires that the RSLHSection should be empty, including
the RSLHSectionSize field. This is the reason why there is such a the RSLHSectionSize field. This is the reason why there is such a
variable length with a default value indicating absence of the variable length with a default value indicating absence of the
RSLHSectionSize field. RSLHSectionSize field.
+=================================+===============================+ +=================================+===============================+
| Fields of RSLHSection | Number of bits | | Fields of RSLHSection | Number of bits |
+=================================+===============================+ +=================================+===============================+
| RSLHSectionSize | RSLHSectionSizeLength | | RSLHSectionSize | RSLHSectionSizeLength |
+---------------------------------+-------------------------------+ +---------------------------------+-------------------------------+
| all bit-wise concatenated RSLHs | RSLHSectionSize | | all bit-wise concatenated RSLHs | RSLHSectionSize |
+---------------------------------+-------------------------------+ +---------------------------------+-------------------------------+
skipping to change at line 778 skipping to change at line 804
Parsing of the bit-wise concatenated RSLHs requires MPEG-4 system Parsing of the bit-wise concatenated RSLHs requires MPEG-4 system
awareness, specifically it requires to understand the MPEG-4 awareness, specifically it requires to understand the MPEG-4
Synchronization Layer (SL) syntax and the modifications to this Synchronization Layer (SL) syntax and the modifications to this
syntax described in the next section. syntax described in the next section.
However thanks to the RSLHSectionSize field non-MPEG-4-system However thanks to the RSLHSectionSize field non-MPEG-4-system
receivers MAY skip this part by rounding up RSLHSectionSize/8 to the next receivers MAY skip this part by rounding up RSLHSectionSize/8 to the next
integer number of bytes. integer number of bytes.
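The rounding can be written as integer arithmetic. The sketch below uses hypothetical helper names and takes a total bit count as input; whether the bits of the size field itself are counted before rounding depends on the section layout and is left to the caller.

```python
def padded_byte_count(bits):
    """Number of whole bytes needed to hold 'bits' bits (round up to a byte)."""
    return (bits + 7) // 8


def padding_bits(bits):
    """Number of zero padding bits (0..7) needed to reach byte alignment."""
    return (-bits) % 8


# 9 bits occupy 2 bytes, the last 7 bits being padding.
print(padded_byte_count(9), padding_bits(9))  # 2 7
```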
3.6 RSLH structure 3.6 RSLH structure
A Remaining SL Packet Header (RSLH) is what remains of an SL header A Remaining SL Packet Header (RSLH) is what remains of an SL header
after modifications for mapping into this payload format. after modifications for mapping into this payload format.
The following modifications of the SL packet header MUST be applied. The following modifications of the SL packet header MUST be applied.
The other fields of the SL packet header MUST remain unchanged but The other fields of the SL packet header MUST remain unchanged but
are bit-shifted to fill in the gaps left by the operations specified are bit-shifted to fill in the gaps left by the operations specified
below. below.
skipping to change at line 811 skipping to change at line 834
. AccessUnitEndFlag (in Single-SL mode only) . AccessUnitEndFlag (in Single-SL mode only)
The AccessUnitEndFlag, when present for a given stream, MUST be The AccessUnitEndFlag, when present for a given stream, MUST be
removed from every RSLH when using the Single-SL mode since it has removed from every RSLH when using the Single-SL mode since it has
the same meaning as the Marker bit (and for compatibility with RFC the same meaning as the Marker bit (and for compatibility with RFC
3016). However when using the Multiple-SL mode, AccessUnitEndFlag 3016). However when using the Multiple-SL mode, AccessUnitEndFlag
MUST NOT be removed since it is useful to signal individual AU ends. MUST NOT be removed since it is useful to signal individual AU ends.
3.6.2 Mapping of OCR 3.6.2 Mapping of OCR
Furthermore if the SL Packet header contains an OCR, then this field Furthermore if the SL Packet header contains an OCR, then this field
is encoded in the RSLH as a 2-complement difference (delta) exactly is encoded in the RSLH as a 2-complement difference (delta) exactly
like a compositionTimeStamp or a decodingTimeStamp in the MSLH. The like a compositionTimeStamp or a decodingTimeStamp in the MSLH. The
length in bit of this difference is indicated by the OCRDeltaLength length in bit of this difference is indicated by the OCRDeltaLength
parameter (see section 4.1). parameter (see section 4.1).
With this payload format OCRs MUST have the same clock resolution as With this payload format OCRs MUST have the same clock resolution as
Time Stamps. Time Stamps.
If compositionTimeStamp is not present for a SL packet that has OCR If compositionTimeStamp is not present for a SL packet that has OCR
then the OCR SHALL be encoded as a difference to the RTP time stamp. then the OCR SHALL be encoded as a difference to the RTP time stamp.
3.6.3 Degradation Priority 3.6.3 Degradation Priority
For streams that use the optional degradationPriority field in the For streams that use the optional degradationPriority field in the
SL Packet Headers, only SL packets with the same degradation SL Packet Headers, only SL packets with the same degradation
priority SHALL be transported by one RTP packet so that components priority SHALL be transported by one RTP packet so that components
may dispatch the RTP packets according to appropriate QOS or may dispatch the RTP packets according to appropriate QoS or
protection schemes. Furthermore only the first RSLH of one RTP protection schemes. Furthermore only the first RSLH of one RTP
packet SHALL contain the degradationPriority field since it would be packet SHALL contain the degradationPriority field since it would be
otherwise redundant. otherwise redundant.
3.7 SLPPSection structure 3.7 SLPPSection structure
The SLPPSection (SL Packet Payload Section) contains the The SLPPSection (SL Packet Payload Section) contains the
concatenated SL Packet Payloads. By definition SL Packet Payloads concatenated SL Packet Payloads. By definition SL Packet Payloads
are byte aligned. are byte aligned.
For efficiency SL packets do not carry their own payload size. This For efficiency SL packets do not carry their own payload size. This
is not an issue for RTP packets that contain a single SL Packet. is not an issue for RTP packets that contain a single SL Packet.
However in the Multiple-SL mode the size of each SL packet payload However in the Multiple-SL mode the size of each SL packet payload
MUST be available to the receiver. MUST be available to the receiver.
skipping to change at line 866 skipping to change at line 889
bits on which this PayloadSize field is encoded MUST be indicated bits on which this PayloadSize field is encoded MUST be indicated
using the SizeLength parameter (see section 4.1). using the SizeLength parameter (see section 4.1).
The absence of either ConstantSize or SizeLength indicates the The absence of either ConstantSize or SizeLength indicates the
Single-SL mode i.e. that a single SL packet is transported in each Single-SL mode i.e. that a single SL packet is transported in each
RTP packet for that stream. RTP packet for that stream.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SLPP (variable number of bytes) | | SLPP (variable number of bytes) |
| | | |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | SLPP (variable number of bytes) | | | SLPP (variable number of bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | | |
| | | |
| | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| etc | | etc |
| as many byte-wise concatenated SLPPs | | as many byte-wise concatenated SLPPs |
| as SL Packets in this RTP packet | | as SL Packets in this RTP packet |
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 8: SLPPSection structure Figure 8: SLPPSection structure
3.8 Interleaving 3.8 Interleaving
SL Packets MAY be interleaved. Senders MAY perform interleaving. SL Packets MAY be interleaved. Senders MAY perform interleaving.
Receivers MUST support interleaving. Receivers MUST support interleaving.
The AUSequenceNumber field of the SL header MUST NOT be used for
interleaving since firstly it may collide with the Scene Description
Carousel usage described in section 5.2 and secondly it is not
visible to non-MPEG-4 system receivers.
When interleaving of SL packets is used it SHALL be implemented When interleaving of SL packets is used it SHALL be implemented
using the Index and IndexDelta fields of MSLH. using the IndexDelta fields of MSLH. Senders MUST use properly large
values for IndexDeltaLength, as required by the interleaving
algorithm.
Senders SHALL use non zero values of IndexDeltaLength only for
streams that MAY exhibit interleaving, so that this CAN be
interpreted by receivers as an indication that interleaving may be
present.
The conjunction of RTP sequence number and Index, IndexDelta can There are, based on this, two ways for a receiver to implement de-
interleaving, using either Index or timestamps. This is signaled
using mime parameters as in the following table, where TSBI and IBI
stand respectively for Time-Stamp-Based-Interleaving (see section
3.8.1) and Index-Based-Interleaving (see section 3.8.2).
==================================================================
| | IndexDeltaLength = 0 | IndexDeltaLength != 0 |
------------------------------------------------------------------
| IndexLength=0 | no interleaving | TSBI |
------------------------------------------------------------------
| IndexLength!=0 | no interleaving, | Index=0 | Index!=0 |
| | SL.packetSeqNum |-------------------------
| | transport | TSBI | IBI |
==================================================================
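The table can be read as a small decision function. The sketch below is one possible reading; the function name and the returned strings are hypothetical, and index_value stands for the Index value actually observed in the received MSLHs.

```python
def deinterleaving_mode(index_length, index_delta_length, index_value=0):
    """Return "none", "TSBI" or "IBI" following the table above."""
    if index_delta_length == 0:
        # No IndexDelta field: the stream is not interleaved.
        return "none"
    if index_length == 0 or index_value == 0:
        # No Index field, or Index always zero: use time stamps.
        return "TSBI"
    # Non-zero Index values: use Index and IndexDelta.
    return "IBI"


print(deinterleaving_mode(0, 3))     # TSBI
print(deinterleaving_mode(8, 3, 5))  # IBI
```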
3.8.1 Time stamp based interleaving
The conjunction of RTP time stamp, IndexDelta and CTS may allow a
receiver to un-ambiguously re-order SL packets based on their time
stamps (CTS).
This is possible and efficient for streams where SL packets
transport complete Access Units and receivers can always compute the
CTS of each Access Unit.
In case of Access Units of constant duration (e.g. audio streams)
the explicit presence of CTS in MSLH is not even required.
Indeed then we have (i being the index of SL packets in one RTP
packet):
CTS(0) = RTP-TS
for (i >= 1): CTS(i) = CTS(i-1) + (IndexDelta(i)+1)*AU-duration
AU-duration, when constant, can be either signaled in SLConfig or be
deduced from the decoder configuration (see the config MIME
parameter).
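The computation above can be sketched as follows. The function name is hypothetical, time stamp rollover is ignored, and the AU duration of 1024 ticks in the example is only an illustrative value.

```python
def cts_for_packets(rtp_timestamp, index_deltas, au_duration):
    """Compute the CTS of each SL packet in one RTP packet.

    rtp_timestamp -- RTP time stamp, which is the CTS of the first SL packet
    index_deltas  -- IndexDelta values of the non-first SL packets, in order
    au_duration   -- constant Access Unit duration, in time stamp ticks
    """
    cts = [rtp_timestamp]
    for delta in index_deltas:
        # CTS(i) = CTS(i-1) + (IndexDelta(i) + 1) * AU-duration
        cts.append(cts[-1] + (delta + 1) * au_duration)
    return cts


print(cts_for_packets(0, [0, 2, 0], 1024))  # [0, 1024, 4096, 5120]
```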
Senders MUST use either IndexLength=0 or set all Index values in all
packets to zero so that receivers CAN detect this as an indication
that de-interleaving SHOULD be performed using time stamps.
In cases where CTS is transported in MSLH senders MUST use properly
large values for SL.timeStampLength when interleaving (in order to
prevent the CTS from rolling over). Pre-existing SL streams that do
not comply with this requirement cannot be interleaved using this
payload format (or by using 3.8.2)
3.8.2 Index based interleaving
If the AU duration is not constant (SLConfigDescriptor.durationFlag
= 0) and CTS is not signaled (SLConfigDescriptor.useTimeStampsFlag=
0) or SL packets transport AU fragments, then the timestamp-based
interleaving algorithm described in 3.8.1. would not work because a
CTS cannot always be computed for all SL packets (for example after
a packet loss).
When interleaving, senders of such streams MUST use the index-based
technique described in this section.
The conjunction of RTP sequence number, Index and IndexDelta can
produce a quasi-unique identifier for each SL packet so that a produce a quasi-unique identifier for each SL packet so that a
receiver can unambiguously reconstruct the original order even in receiver can unambiguously reconstruct the original order even in
case of out-of-order packets, packet loss or duplication. case of out-of-order packets, packet loss or duplication (see the
pseudo code in 3.4.1 and 5.1).
However implementors of receivers must take care that when This requires, however, that IndexLength is not too small. For that
IndexLength is small, Index will rollover often; for that reason reason senders MUST use properly large values for IndexLength when
timestamps SHOULD be used as a basis for implementation of de- interleaving in this fashion. Pre-existing SL streams that do not
interleaving, i.e. the reordering algorithm should consider comply with this requirement (specifically if SL.packetSeqNumLength
timestamps and IndexDelta first and use Index only when CTS are not
available. Symmetrically senders MUST either use properly large
values for IndexLength or use small values only when CTS are either
present in MSLH or can be otherwise unambiguously computed for each
SL packet (for example audio streams as in Appendix.5).
The AUSequenceNumber field of the SL header MUST NOT be used for
interleaving since firstly it may collide with the Scene Description
Carousel usage described in section 4.1 and secondly it is not
visible to non-MPEG-4 system receivers. is too small) cannot be interleaved using this payload format (or by
using 3.8.1).
Receivers CAN interpret non-zero values in the Index field as an
indication that de-interleaving CAN be performed using Index and
IndexDelta and CANNOT be performed using timestamps.
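A very reduced sketch of index-based re-ordering is given below. It assumes that the packetSequenceNumber of every received SL packet has already been reconstructed from Index and IndexDelta and that no rollover occurs within the re-ordering window. The function name is hypothetical, and a real receiver would also have to bound its buffering.

```python
def reorder_by_sequence_number(received):
    """Return payloads in original order, dropping duplicates.

    received -- iterable of (packetSequenceNumber, payload) in arrival order
    Assumption: sequence numbers do not roll over inside this window.
    """
    unique = {}
    for psn, payload in received:
        unique.setdefault(psn, payload)  # keep the first copy of a duplicate
    return [unique[psn] for psn in sorted(unique)]


arrived = [(7, b"c"), (5, b"a"), (7, b"c"), (6, b"b")]
print(reorder_by_sequence_number(arrived))  # [b'a', b'b', b'c']
```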
3.8.3 SL streams that cannot be interleaved
SL streams for which both SL.timeStampLength and
SL.packetSeqNumLength are too small cannot be interleaved with this
payload format.
3.9 Fragmentation Rules 3.9 Fragmentation Rules
This section specifies rules for senders in order to prevent media This section specifies rules for senders in order to prevent media
decoding difficulties at the receiver end. decoding difficulties at the receiver end.
MPEG-4 Access Units are the default fragments for MPEG-4 bitstreams MPEG-4 Access Units are the default fragments for MPEG-4 bitstreams
and SHOULD be mapped directly into RTP packets of this format with and SHOULD be mapped directly into RTP packets of this format with
two exceptions: two exceptions:
- Access Units larger than the MTU - Access Units larger than the MTU
skipping to change at line 945 skipping to change at line 1050
Therefore encoders and decoders are both aware whether they are Therefore encoders and decoders are both aware whether they are
operating in such a mode or not (however since this codec operating in such a mode or not (however since this codec
configuration is an opaque data block this is not explicitly configuration is an opaque data block this is not explicitly
signaled by this payload format). signaled by this payload format).
If not operating in such a mode it is obvious that the decoder has If not operating in such a mode it is obvious that the decoder has
to skip packets after a loss until an Access Unit start is received. to skip packets after a loss until an Access Unit start is received.
Similarly decoder implementations that do not implement robust Similarly decoder implementations that do not implement robust
decoding of Access Units fragments have to discard all packets after decoding of Access Units fragments have to discard all packets after
a packet loss until an Access Unit start is received. In the same a packet loss until an Access Unit start is received. In the same
way decoder implementations that do not implement re-synchronization way decoder implementations that do not implement re-synchronization
at any Access Units start have to discard all packets after a packet at any Access Units start have to discard all packets after a packet
loss until a Random Access Point Access Unit is received. These are loss until a Random Access Point Access Unit is received. These are
all obvious things that a good implementation would do. all obvious things that a good implementation would do.
However serious problems would arise for decoder implementations However serious problems would arise for decoder implementations
that try to restart decoding after a packet loss if independently that try to restart decoding after a packet loss if independently
decodable fragments are signaled (in the decoder configuration) but decodable fragments are signaled (in the decoder configuration) but
the fragments actually received are not independently decodable the fragments actually received are not independently decodable
because the RTP sender has made RTP packets on different boundaries because the RTP sender has made RTP packets on different boundaries
than the fragments provided by the encoder (so this issue applies to than the fragments provided by the encoder (so this issue applies to
the interface between the encoder and the RTP sender and to the RTP the interface between the encoder and the RTP sender and to the RTP
sender component itself), because the decoder has in general no way sender component itself), because the decoder has in general no way
to detect such a faulty fragment. to detect such a faulty fragment.
For this reason the following rules must apply to SL streams that For this reason the following rules must apply to SL streams that
are specifically made for transport with this payload format: are specifically made for transport with this payload format:
skipping to change at line 1002 skipping to change at line 1107
the same SL packet. the same SL packet.
4. Types and Names 4. Types and Names
This section describes the MIME types and names associated with this This section describes the MIME types and names associated with this
payload format. Section 4.1 is intended for registration with IANA payload format. Section 4.1 is intended for registration with IANA
as in RFC 2048. as in RFC 2048.
This format may require additional information about the mapping to This format may require additional information about the mapping to
be made available to the receiver. This is done using parameters be made available to the receiver. This is done using parameters
described in the next section. The absence of any of these fields is described in the next section. The absence of any of these fields is
equivalent to a field set to the default value, which is always equivalent to a field set to the default value, which is always
zero. The absence of any such parameters resolves into a default zero. The absence of any such parameters resolves into a default
"basic" configuration compatible with RFC3016 for MPEG-4 video. "basic" configuration compatible with RFC3016 for MPEG-4 video.
In the MPEG-4 framework the SL stream configuration information is In the MPEG-4 framework the SL stream configuration information is
carried using the Object Descriptor. For compatibility with carried using the Object Descriptor. For compatibility with
receivers that do not implement the full MPEG-4 system specification receivers that do not implement the full MPEG-4 system specification
this information MAY also be signaled using parameters described this information MAY also be signaled using parameters described
here. When such information is present both in an Object Descriptor here. When such information is present both in an Object Descriptor
and as a parameter of this payload format it MUST be exactly the and as a parameter of this payload format it MUST be exactly the
same. same.
For transport of MPEG-4 audio and video without the use of MPEG-4 For transport of MPEG-4 audio and video without the use of MPEG-4
systems, as well as to support non-MPEG-4 system receivers, it is systems, as well as to support non-MPEG-4 system receivers, it is
also possible to transport information on the profile and level of also possible to transport information on the profile and level of
the stream and on the decoder configuration. This is also described the stream and on the decoder configuration. This is also described
skipping to change at line 1059 skipping to change at line 1164
Required parameters: none Required parameters: none
Optional parameters: Optional parameters:
Mode: Mode:
The mode in which this specification is used. This specification The mode in which this specification is used. This specification
itself defines only the default mode (Mode=default). When the mode itself defines only the default mode (Mode=default). When the mode
parameter is not present the default mode SHALL be assumed. In the parameter is not present the default mode SHALL be assumed. In the
default mode all parameters are optional and as defined here. Other default mode all parameters are optional and as defined here. Other
modes may be defined as needed in other RFCs. A mode MUST be a modes may be defined as needed in other RFCs. A mode MUST be a
subset of this specification. Specifically when defining a mode care subset of this specification. Specifically when defining a mode care
MUST be taken that an implementation of this specification can MUST be taken that an implementation of this specification can
decode the payload format corresponding to this new mode. For this decode the payload format corresponding to this new mode. For this
reason a mode MUST NOT specify new default values for MIME reason a mode MUST NOT specify new default values for MIME
parameters and MIME parameters MUST be present (unless they have the parameters and MIME parameters MUST be present (unless they have the
default value) even if this is redundant because the mode assigns default value) even if this is redundant because the mode assigns
fixed values. A mode may define additionally that some MIME fixed values. A mode may define additionally that some MIME
parameters are required instead of optional, that some MIME parameters are required instead of optional, that some MIME
parameters have fixed values (or ranges), and that there are rules parameters have fixed values (or ranges), and that there are rules
restricting the usage (for example forbidding the carriage of restricting the usage (for example forbidding the carriage of
multiple AU fragments in the same RTP packet). multiple AU fragments in the same RTP packet).
Profile: Profile:
The meaning of this parameter may be defined by a mode. This is The meaning of this parameter may be defined by a mode. This is
meant to be used in order to define sub-configurations of a given meant to be used in order to define sub-configurations of a given
mode, for example the maximum delay (and therefore the size of mode, for example the maximum delay (and therefore the size of
buffers) induced by the usage of interleaving. Implementations of buffers) induced by the usage of interleaving. Implementations of
this specification can ignore this parameter. this specification can ignore this parameter.
DTSDeltaLength: DTSDeltaLength:
The number of bits on which the DTSDelta field is encoded in MSLH. The number of bits on which the DTSDelta field is encoded in MSLH.
The default value is zero and indicates the absence of DTSFlag and The default value is zero and indicates the absence of DTSFlag and
DTSDelta in MSLH (the stream does not transport decodingTimeStamps). DTSDelta in MSLH (the stream does not transport decodingTimeStamps).
A value larger than zero indicates that there is a DTSFlag in each A value larger than zero indicates that there is a DTSFlag in each
MSLH. Since decodingTimeStamp -if present- must be encoded as a MSLH. Since decodingTimeStamp, if present, must be encoded as a
difference to the RTP time stamp, the DTSDeltaLength parameter MUST difference to the RTP time stamp, the DTSDeltaLength parameter MUST
be present in order to transport decodingTimeStamps with this be present in order to transport decodingTimeStamps with this
payload format. payload format.
CTSDeltaLength: CTSDeltaLength:
The number of bits on which the CTSDelta field is encoded in (non- The number of bits on which the CTSDelta field is encoded in (non-
first) MSLH. The default value is zero and indicates the absence of first) MSLH. The default value is zero and indicates the absence of
the CTSFlag and CTSDelta fields in MSLH. Non-zero values MUST NOT be the CTSFlag and CTSDelta fields in MSLH. Non-zero values MUST NOT be
signaled in the Single-SL mode. Since compositionTimeStamps if signaled in the Single-SL mode. Since compositionTimeStamps, if
present- must be encoded as a difference to the RTP time stamp, the present, must be encoded as a difference to the RTP time stamp, the
CTSDeltaLength parameter MUST be present in order to transport CTSDeltaLength parameter MUST be present in order to transport
compositionTimeStamps using this payload format (in the Multiple-SL compositionTimeStamps using this payload format (in the Multiple-SL
mode). However CTSDeltaLength SHOULD be set to zero (or not mode). However CTSDeltaLength SHOULD be set to zero (or not
signaled) for streams that have a constant Access Unit duration signaled) for streams that have a constant Access Unit duration
(which can be explicitly signaled using the DurationFlag and (which can be explicitly signaled using the DurationFlag and
AccessUnitDuration field of SLConfigDescriptor). AccessUnitDuration field of SLConfigDescriptor).
OCRDeltaLength: OCRDeltaLength:
The number of bits on which the OCRDelta field is encoded in RSLH. The number of bits on which the OCRDelta field is encoded in RSLH.
The default value is zero and indicates the absence of OCR for this The default value is zero and indicates the absence of OCR for this
stream. Since objectClockReference, if present, must be encoded as a stream. Since objectClockReference, if present, must be encoded as a
difference to the RTP time stamp, the OCRDeltaLength parameter MUST difference to the RTP time stamp, the OCRDeltaLength parameter MUST
be present in order to transport objectClockReferences with this be present in order to transport objectClockReferences with this
payload format. payload format.
SizeLength: SizeLength:
The number of bits on which the PayloadSize field of MSLH is The number of bits on which the PayloadSize field of MSLH is
encoded. The default value is zero and indicates the Single-SL mode encoded. The default value is zero and indicates the Single-SL mode
(unless ConstantSize is present). Simultaneous presence of this (unless ConstantSize is present). Simultaneous presence of this
parameter and ConstantSize is illegal. Either the SizeLength or parameter and ConstantSize is illegal. Either the SizeLength or
ConstantSize parameter MUST be present in order to signal the ConstantSize parameter MUST be present in order to signal the
Multiple-SL mode of this payload format. Multiple-SL mode of this payload format.
ConstantSize: ConstantSize:
The constant size in bytes of each SL Packet Payload for this The constant size in bytes of each SL Packet Payload for this
stream. The default value is zero and indicates variable SL Packet stream. The default value is zero and indicates variable SL Packet
Payload size (or the Single-SL mode if SizeLength is absent). Payload size (or the Single-SL mode if SizeLength is absent).
Simultaneous presence of this parameter and SizeLength is illegal. Simultaneous presence of this parameter and SizeLength is illegal.
Either the SizeLength or ConstantSize parameter MUST be present in Either the SizeLength or ConstantSize parameter MUST be present in
order to signal the Multiple-SL mode of this payload format. When order to signal the Multiple-SL mode of this payload format. When
ConstantSize is present the PayloadSize of MSLH in the RTP packets ConstantSize is present the PayloadSize of MSLH in the RTP packets
MUST NOT be present. MUST NOT be present.
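As an illustration of the SizeLength and ConstantSize rules above, the following Python sketch derives the payload mode from the two parameters. The function name payload_mode and the dictionary representation of the parameters are assumptions made for this example; they are not defined by this specification.

```python
def payload_mode(params):
    """Return 'single-sl' or 'multiple-sl' for a dict of MIME parameters.

    Hypothetical helper: absent parameters take the default value zero.
    """
    if "SizeLength" in params and "ConstantSize" in params:
        # Simultaneous presence of both parameters is illegal.
        raise ValueError("SizeLength and ConstantSize are mutually exclusive")
    size_length = int(params.get("SizeLength", 0))
    constant_size = int(params.get("ConstantSize", 0))
    if size_length > 0 or constant_size > 0:
        # Either parameter signals the Multiple-SL mode.
        return "multiple-sl"
    return "single-sl"
```

For instance, payload_mode({"SizeLength": "12"}) selects the Multiple-SL mode, while an empty parameter set falls back to the Single-SL mode.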
IndexLength: IndexLength:
skipping to change at line 1173 skipping to change at line 1278
ISO/IEC 14496-1 [1]. For video this parameter indicates which MPEG-4 ISO/IEC 14496-1 [1]. For video this parameter indicates which MPEG-4
Visual tool subsets are applied to encode the video stream and is Visual tool subsets are applied to encode the video stream and is
defined in Table G-1 of ISO/IEC 14496-2 [2]. This parameter MAY be defined in Table G-1 of ISO/IEC 14496-2 [2]. This parameter MAY be
used in the capability exchange or session setup procedure to used in the capability exchange or session setup procedure to
indicate the MPEG-4 Profile and Level combination of which the relevant indicate the MPEG-4 Profile and Level combination of which the relevant
MPEG-4 media codec is capable. If this parameter is not specified MPEG-4 media codec is capable. If this parameter is not specified
its default value is 1 (Simple Profile/Level 1) for video (for its default value is 1 (Simple Profile/Level 1) for video (for
compatibility with RFC 3016) and otherwise 0xFE (defined in ISO/IEC compatibility with RFC 3016) and otherwise 0xFE (defined in ISO/IEC
14496-1 [1] as being the generic default value). 14496-1 [1] as being the generic default value).
Config: Config:
A hexadecimal representation of an octet string that expresses the A hexadecimal representation of an octet string that expresses the
media payload configuration. Configuration data is mapped onto the media payload configuration. Configuration data is mapped onto the
octet string on an MSB-first basis. The first bit of the octet string on an MSB-first basis. The first bit of the
configuration data SHALL be located at the MSB of the first octet. configuration data SHALL be located at the MSB of the first octet.
In the last octet, zero-valued padding bits, if necessary, shall In the last octet, zero-valued padding bits, if necessary, shall
follow the configuration data. For audio streams, config is the follow the configuration data. For audio streams, config is the
audio object type specific decoder configuration data audio object type specific decoder configuration data
AudioSpecificConfig() as defined in ISO/IEC 14496-3 [3]. For video AudioSpecificConfig() as defined in ISO/IEC 14496-3 [3]. For video
this expresses the MPEG-4 Visual configuration information, as this expresses the MPEG-4 Visual configuration information, as
defined in subclause 6.2.1 Start codes of ISO/IEC 14496-2 [2] and the defined in subclause 6.2.1 Start codes of ISO/IEC 14496-2 [2] and the
configuration information indicated by this parameter SHALL be the configuration information indicated by this parameter SHALL be the
same as the configuration information in the corresponding MPEG-4 same as the configuration information in the corresponding MPEG-4
Visual stream, except for first-half-vbv-occupancy and latter-half- Visual stream, except for first-half-vbv-occupancy and latter-half-
vbv-occupancy, if they exist, which may vary in the repeated vbv-occupancy, if they exist, which may vary in the repeated
configuration information inside an MPEG-4 Visual stream (See 6.2.1 configuration information inside an MPEG-4 Visual stream (See 6.2.1
skipping to change at line 1203 skipping to change at line 1309
StreamType: StreamType:
The integer value that indicates the type of MPEG-4 stream that is The integer value that indicates the type of MPEG-4 stream that is
carried; its coding corresponds to the values of the streamType as carried; its coding corresponds to the values of the streamType as
defined for the DecoderConfigDescriptor in ISO/IEC 14496-1. defined for the DecoderConfigDescriptor in ISO/IEC 14496-1.
Encoding considerations: Encoding considerations:
System bitstreams MUST be generated according to MPEG-4 System System bitstreams MUST be generated according to MPEG-4 System
specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
bitstreams MUST be generated according to MPEG-4 Visual bitstreams MUST be generated according to MPEG-4 Audio
specifications (ISO/IEC 14496-3). All SL streams MUST be generated specifications (ISO/IEC 14496-3). All SL streams MUST be generated
according to MPEG-4 Sync Layer specifications (ISO/IEC 14496-1 according to MPEG-4 Sync Layer specifications (ISO/IEC 14496-1
section 10); in order to read this format the SLConfigDescriptor may section 10); in order to read this format the SLConfigDescriptor may
be required. These bitstream are binary data and MUST be encoded for be required. These bitstreams are binary data and MUST be encoded
non-binary transport (for Email, the Base64 encoding is sufficient). for non-binary transport (for Email, the Base64 encoding is
This type is also defined for transfer via RTP. The RTP packets sufficient). This type is also defined for transfer via RTP. The
MUST be packetized according to the RTP payload format defined in RTP packets MUST be packetized according to the RTP payload format
RFC <self-reference-to-this>. defined in RFC <self-reference-to-this>.
Security considerations: Security considerations:
As in RFC <self-reference-to-this>. As in RFC <self-reference-to-this>.
Interoperability considerations: Interoperability considerations:
MPEG-4 provides a large and rich set of tools for the coding of MPEG-4 provides a large and rich set of tools for the coding of
visual objects. For effective implementation of the standard, visual objects. For effective implementation of the standard,
subsets of the MPEG-4 tool sets have been provided for use in subsets of the MPEG-4 tool sets have been provided for use in
specific applications. These subsets, called 'Profiles', limit the specific applications. These subsets, called 'Profiles', limit the
size of the tool set a decoder is required to implement. In order to size of the tool set a decoder is required to implement. In order to
restrict computational complexity, one or more 'Levels' are set for restrict computational complexity, one or more 'Levels' are set for
each Profile. A Profile@Level combination allows: each Profile. A Profile@Level combination allows:
. a codec builder to implement only the subset of the standard he . a codec builder to implement only the subset of the standard he
needs, while maintaining interoperability with other MPEG-4 devices needs, while maintaining interoperability with other MPEG-4 devices
included in the same combination, and included in the same combination, and
. checking whether MPEG-4 devices comply with the standard . checking whether MPEG-4 devices comply with the standard
('conformance testing'). ('conformance testing').
A stream SHALL be compliant with the MPEG-4 Profile@Level specified A stream SHALL be compliant with the MPEG-4 Profile@Level specified
by the parameter "profile-level-id". Interoperability between a by the parameter "profile-level-id". Interoperability between a
sender and a receiver may be achieved by specifying the parameter sender and a receiver may be achieved by specifying the parameter
"profile-level-id" in MIME content, or by arranging in the "profile-level-id" in MIME content, or by arranging in the
capability exchange/announcement procedure to set this parameter capability exchange/announcement procedure to set this parameter
mutually to the same value. mutually to the same value.
Published specification: Published specification:
The specifications for MPEG-4 streams are presented in ISO/IEC The specifications for MPEG-4 streams are presented in ISO/IEC
14496-1, 14496-2, and 14496-3. The RTP payload format is described 14496-1, 14496-2, and 14496-3. The RTP payload format is described
in RFC <self-reference-to-this>. in RFC <self-reference-to-this>.
Applications that use this media type: Applications that use this media type:
Multimedia streaming and conferencing tools, Internet messaging and Multimedia streaming and conferencing tools, Internet messaging and
Email applications. Also trans-galactic supra-relativistic Email applications.
elementary particle hyperspace tunneling communication devices :-)
Additional information: none Additional information: none
Magic number(s): none Magic number(s): none
File extension(s): File extension(s):
None. A file format with the extension .mp4 has been defined for None. A file format with the extension .mp4 has been defined for
MPEG-4 content but is not directly correlated with this MIME type, MPEG-4 content but is not directly correlated with this MIME type,
whose sole purpose is RTP transport. whose sole purpose is RTP transport.
skipping to change at line 1273 skipping to change at line 1377
Intended usage: COMMON Intended usage: COMMON
Author/Change controller: Author/Change controller:
Authors of RFC <self-reference-to-this>. Authors of RFC <self-reference-to-this>.
4.2 Concatenation of parameters 4.2 Concatenation of parameters
Multiple parameters SHOULD be expressed as a MIME media type string, Multiple parameters SHOULD be expressed as a MIME media type string,
in the form of a semicolon-separated list of parameter=value pairs in the form of a semicolon-separated list of parameter=value pairs
(see examples in Appendix). (see examples below).
4.3 Usage of SDP 4.3 Usage of SDP
4.3.1 The a=fmtp keyword 4.3.1 The a=fmtp keyword
It is assumed that one typical way to transport the above-described It is assumed that one typical way to transport the above-described
parameters associated with this payload format is via an SDP [10] parameters associated with this payload format is via an SDP [10]
message, for example transported to the client in reply to an RTSP message, for example transported to the client in reply to an RTSP
[13] DESCRIBE message or via SAP [14]. In that case the (a=fmtp) [13] DESCRIBE message or via SAP [14]. In that case the (a=fmtp)
keyword MUST be used as described in RFC 2327 [10, section 6]. The keyword MUST be used as described in RFC 2327 [10, section 6]. The
syntax is then: syntax is then:
a=fmtp:<format> <parameter name>=<value> a=fmtp:<format> <parameter name>=<value>
4.3.2 SDP example 4.3.2 SDP example
The following is an example of SDP syntax for the description of a The following is an example of SDP syntax for the description of a
session containing one MPEG-4 audio stream, one MPEG-4 video and session containing one MPEG-4 video, one MPEG-4 audio stream and
three MPEG-4 system streams, the first one being BIFS, the second three MPEG-4 system streams, the first one being BIFS, the second
one OD and the third one IPMP. All are transported using this format one OD and the third one IPMP. All are transported using this format
and the AVP profile [12]. Note that the video stream DTSDelta are and the AVP profile [12]. Note the usage of some MIME parameters:
encoded on 4 bits in this example. See the Appendix for more all streams display their StreamType; the video stream uses DTS with
examples. DTSDelta encoded on 4 bits; the audio stream uses the multiple-SL
mode with 12 bits to describe the size of each SL packet payload.
See the Appendix for more examples.
o= .... o= ....
i= .... i= ....
c=IN IP4 123.234.71.112 c=IN IP4 123.234.71.112
m=video 1034 RTP/AVP 97 m=video 1034 RTP/AVP 97
a=fmtp:97 StreamType=4;DTSDeltaLength=4 a=fmtp:97 StreamType=4;DTSDeltaLength=4
a=rtpmap:97 mpeg4-sl a=rtpmap:97 mpeg4-generic
m=audio 810 RTP/AVP 98 m=audio 1810 RTP/AVP 98
a=fmtp:98 StreamType=5; profile-level-id=1; config=7866E7E6EF a=fmtp:98 StreamType=5; SizeLength=12; profile-level-id=1;
a=rtpmap:98 mpeg4-sl config=7866E7E6EF
a=rtpmap:98 mpeg4-generic
m=application 1234 RTP/AVP 99 m=application 1234 RTP/AVP 99
a=rtpmap:99 mpeg4-sl a=rtpmap:99 mpeg4-generic
a=fmtp:99 StreamType=3; a=fmtp:99 StreamType=3;
m=application 1236 RTP/AVP 99 m=application 1236 RTP/AVP 99
a=rtpmap:99 mpeg4-sl a=rtpmap:99 mpeg4-generic
a=fmtp:99 StreamType=1; a=fmtp:99 StreamType=1;
m=application 1238 RTP/AVP 99 m=application 1238 RTP/AVP 99
a=rtpmap:99 mpeg4-sl a=rtpmap:99 mpeg4-generic
a=fmtp:99 StreamType=7; a=fmtp:99 StreamType=7;
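To make the a=fmtp syntax and the semicolon-separated parameter list concrete, the following Python sketch splits one a=fmtp line into its format (payload type) and a dictionary of parameters. The function name parse_fmtp is an assumption made for this example, and the sketch only covers the simple name=value form shown in the SDP example above; it is not a complete SDP parser.

```python
def parse_fmtp(line):
    """Split an 'a=fmtp:<format> name=value;...' line into (format, params)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    # The format (RTP payload type) is separated from the parameters by a space.
    fmt, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon, as in the example above
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("parameter without a value: " + item)
        params[name.strip()] = value.strip()
    return fmt, params
```

Applied to the video stream of the example, parse_fmtp("a=fmtp:97 StreamType=4;DTSDeltaLength=4") yields the format "97" with StreamType and DTSDeltaLength both set to the string "4".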
5. Other issues 5. Other issues
5.1 SL packetized stream reconstruction 5.1 SL packetized stream reconstruction
The purpose of this section is to document how a receiver can The purpose of this section is to document how a receiver can
reconstruct a valid SL packetized stream. Since this format directly reconstruct a valid SL packetized stream. Since this format directly
transports SL packets this reconstruction is performed by reversing transports SL packets this reconstruction is performed by reversing
the payload structure rules (section 3). We explicitly describe here the payload structure rules (section 3). We explicitly describe here
the most complex transformations. the most complex transformations.
In the following let (i) be the index of SL packets inside one RTP In the following let (i) be the index of SL packets inside one RTP
packet (starting at zero for each RTP packet), let SLPacketHeader.x packet (starting at zero for each RTP packet), let SLPacketHeader.x
denote field x of the reconstructed SL packet header, let MSLH.x denote field x of the reconstructed SL packet header, let MSLH.x
denote field x of the received MSLH, etc. denote field x of the received MSLH, etc.
SLPacketHeader.packetSequenceNumber is restored from MSLH.Index and SLPacketHeader.packetSequenceNumber is restored from MSLH.Index and
MSLH.IndexDelta using: MSLH.IndexDelta using:
if ( IndexLength == 0) { // or is absent if ( IndexLength == 0) { // or is absent
if ( SLConfig.packetSeqNumLength == 0 ) { if ( SLConfig.packetSeqNumLength == 0 ) {
// this stream does not have SL packet sequence number // this stream does not have SL packet sequence number
} }
else { else {
// illegal, normally the sender MUST map // illegal, normally the sender MUST map
// SLPacketHeader.packetSequenceNumber in MSLH // SLPacketHeader.packetSequenceNumber in MSLH
// and set a relevant IndexLength value; // and set a relevant IndexLength value;
// otherwise it is unfortunately impossible for the receiver // otherwise it is unfortunately impossible for the receiver
// to reconstruct the correct sequence // to reconstruct the correct sequence
} }
} }
else { // IndexLength is not zero else { // IndexLength is not zero
skipping to change at line 1380 skipping to change at line 1489
SLPacketHeader.packetSequenceNumber(i+1)= SLPacketHeader.packetSequenceNumber(i+1)=
SLPacketHeader.packetSequenceNumber(i) SLPacketHeader.packetSequenceNumber(i)
+ MSLH.IndexDelta(i+1) + MSLH.IndexDelta(i+1)
+1; +1;
} }
} }
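The recurrence above can be sketched in Python as follows. The function name restore_sequence_numbers is hypothetical, and the sketch assumes that the sequence number of the first SL packet in the RTP packet is taken from MSLH.Index; wrap-around at SLConfig.packetSeqNumLength bits is deliberately not handled here.

```python
def restore_sequence_numbers(first_index, index_deltas):
    """Rebuild SL packet sequence numbers for one RTP packet.

    first_index: value used for the first SL packet (assumed here to come
    from MSLH.Index).
    index_deltas: MSLH.IndexDelta values of the following SL packets.
    """
    numbers = [first_index]
    for delta in index_deltas:
        # packetSequenceNumber(i+1) =
        #     packetSequenceNumber(i) + IndexDelta(i+1) + 1
        numbers.append(numbers[-1] + delta + 1)
    return numbers
```

An IndexDelta of zero therefore denotes consecutive SL packets, and a larger value indicates that sequence numbers were skipped between the two packets.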
All time stamps (CTS, DTS, OCR), when present, are restored from the All time stamps (CTS, DTS, OCR), when present, are restored from the
delta values. Time stamp flags (CTSFlag, DTSFlag) in MSLH are used delta values. Time stamp flags (CTSFlag, DTSFlag) in MSLH are used
to reconstruct respectively the compositionTimeStampFlag and to reconstruct respectively the compositionTimeStampFlag and
decodingTimeStampFlag of SLPacketHeader. decodingTimeStampFlag of SLPacketHeader. The function corrected(x)
for the RTP time stamp transformation is the mapping from 32 bits to
SLConfig.timeStampLength, which may be smaller or larger than 32
bits:
if (timeStampLength < 32) { // short SL time stamps
corrected(x) = LSB(x); // only the timeStampLength LSBits of x
}
else if (timeStampLength > 32) { // long SL time stamps
if ( x(i) < x(i-1) ) { // 32-bit RTP time stamp roll-over has occurred
m += 2^32; // m starts at 0
}
corrected(x) = x + m;
}
else { // timeStampLength == 32, recommended value
corrected(x) = x; // direct mapping
}
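A possible reading of the corrected(x) mapping is sketched below in Python. The class name TimeStampCorrector is hypothetical. The sketch assumes that RTP time stamps are passed in increasing order, so that any decrease is treated as a 32-bit roll-over, and that the roll-over offset is applied to the very packet on which the wrap is detected; both are assumptions about the intent rather than statements of this specification.

```python
class TimeStampCorrector:
    """Map 32-bit RTP time stamps onto SLConfig.timeStampLength bits."""

    def __init__(self, time_stamp_length):
        self.length = time_stamp_length
        self.m = 0            # accumulated roll-over offset, starts at 0
        self.previous = None  # last RTP time stamp seen

    def corrected(self, x):
        if self.length < 32:
            # Short SL time stamps: keep only the timeStampLength LSBs.
            return x & ((1 << self.length) - 1)
        if self.length == 32:
            return x  # direct mapping (recommended value)
        # Long SL time stamps: add 2^32 each time the RTP time stamp wraps.
        if self.previous is not None and x < self.previous:
            self.m += 1 << 32
        self.previous = x
        return x + self.m
```

With a timeStampLength of 40, for example, an RTP time stamp of 0x10 that follows 0xFFFFFF00 is mapped to 0x10 plus 2^32.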
if ( CTSDeltaLength == 0) { // or CTSDeltaLength is absent if ( CTSDeltaLength == 0) { // or CTSDeltaLength is absent
// CTS is not transported for this RTP stream // CTS is not transported for this RTP stream
if (i == 0){ // first SL packet in RTP packet if (i == 0){ // first SL packet in RTP packet
if ( SLConfig.useTimeStamps == 1 ) { if ( SLConfig.useTimeStamps == 1 ) {
if ( SLPacketHeader.accessUnitStartFlag == 1 ) { if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
SLPacketHeader.compositionTimeStampFlag(0) = 1; SLPacketHeader.compositionTimeStampFlag(0) = 1;
SLPacketHeader.compositionTimeStamp(0) = RTP TimeStamp; SLPacketHeader.compositionTimeStamp(0) =
corrected(RTP TimeStamp);
} }
else { else {
// ignore // ignore
} }
} }
else { else {
// empty // empty
} }
} }
else { // non-first SL packets in RTP packet else { // non-first SL packets in RTP packet
if ( SLConfig.useTimeStamps == 1 ) { if ( SLConfig.useTimeStamps == 1 ) {
if ( SLPacketHeader.accessUnitStartFlag == 1 ) { if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
SLPacketHeader.compositionTimeStampFlag(i) = 0; SLPacketHeader.compositionTimeStampFlag(i) = 0;
} }
else { else {
// ignore // ignore
} }
} }
else { else {
skipping to change at line 1423 skipping to change at line 1550
} }
} }
} }
else { // CTSDeltaLength is not zero else { // CTSDeltaLength is not zero
// CTS is transported for this stream // CTS is transported for this stream
if ( SLConfig.useTimeStamps == 1 ) { if ( SLConfig.useTimeStamps == 1 ) {
if ( SLPacketHeader.accessUnitStartFlag == 1 ) { if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
SLPacketHeader.compositionTimeStampFlag(i) = SLPacketHeader.compositionTimeStampFlag(i) =
MSLH.CTSFlag(i); MSLH.CTSFlag(i);
SLPacketHeader.compositionTimeStamp(i) = SLPacketHeader.compositionTimeStamp(i) =
RTP TimeStamp + MSLH.CTSDelta(i); corrected(RTP TimeStamp) + MSLH.CTSDelta(i);
} }
else { else {
// ignore CTSFlag (which must be zero) // ignore CTSFlag (which must be zero)
} }
} }
else { else {
// this is strange and sub-optimal at best // this is strange and sub-optimal at best
// a receiver should ignore this // a receiver should ignore this
} }
} }
if ( DTSDeltaLength == 0) { // or DTSDeltaLength is absent if ( DTSDeltaLength == 0) { // or DTSDeltaLength is absent
// DTS is not transported for this stream // DTS is not transported for this stream
if ( SLConfig.useTimeStamps == 1 ) { if ( SLConfig.useTimeStamps == 1 ) {
if ( SLPacketHeader.accessUnitStartFlag == 1 ) { if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
SLPacketHeader.decodingTimeStampFlag(i) = 0; SLPacketHeader.decodingTimeStampFlag(i) = 0;
} }
else { else {
// ignore // ignore
} }
} }
else { else {
// empty // empty
} }
} }
else { else {
// DTS is transported for this stream // DTS is transported for this stream
if ( SLConfig.useTimeStamps == 1 ) { if ( SLConfig.useTimeStamps == 1 ) {
if ( SLPacketHeader.accessUnitStartFlag == 1 ) { if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
SLPacketHeader.decodingTimeStampFlag(i) = SLPacketHeader.decodingTimeStampFlag(i) =
MSLH.DTSFlag(i); MSLH.DTSFlag(i);
SLPacketHeader.decodingTimeStamp(i) = SLPacketHeader.decodingTimeStamp(i) =
RTP TimeStamp + MSLH.DTSDelta(i); SLPacketHeader.compositionTimeStamp(i)
- MSLH.DTSDelta(i); // DTS <= CTS always
} }
else { else {
// ignore DTSFlag (which must be zero) // ignore DTSFlag (which must be zero)
} }
} }
else { else {
// this is strange and sub-optimal at best // this is strange and sub-optimal at best
// a receiver should ignore this // a receiver should ignore this
} }
} }
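The time stamp arithmetic above can be summarized by the following Python sketch for an SL packet that starts an Access Unit. It follows the September 2001 text, in which the decoding time stamp is obtained by subtracting DTSDelta from the composition time stamp. The function name restore_time_stamps and the use of None for a time stamp that is not transported are assumptions made for this example.

```python
def restore_time_stamps(corrected_rtp_ts, cts_delta=0, dts_delta=None):
    """Return (CTS, DTS) for an SL packet that starts an Access Unit.

    corrected_rtp_ts: RTP time stamp after the corrected() mapping.
    cts_delta: MSLH.CTSDelta, or 0 when the CTS equals the RTP time stamp.
    dts_delta: MSLH.DTSDelta, or None when no DTS is transported.
    """
    cts = corrected_rtp_ts + cts_delta
    # DTS <= CTS always, hence the subtraction.
    dts = None if dts_delta is None else cts - dts_delta
    return cts, dts
```

All values are expressed in the same time stamp resolution; no unit conversion is attempted in this sketch.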
skipping to change at line 1491 skipping to change at line 1619
} }
else { else {
if ( SLConfig.OCRLength == 0 ) { if ( SLConfig.OCRLength == 0 ) {
// this is strange and sub-optimal at best // this is strange and sub-optimal at best
// a receiver should ignore this // a receiver should ignore this
} }
else { else {
SLPacketHeader.OCRflag(i) = RSLH.OCRFlag(i); SLPacketHeader.OCRflag(i) = RSLH.OCRFlag(i);
if ( SLPacketHeader.OCRflag(i) == 1) { if ( SLPacketHeader.OCRflag(i) == 1) {
SLPacketHeader.objectClockReference(i) = SLPacketHeader.objectClockReference(i) =
RTP TimeStamp + RSLH.OCRDelta(i); corrected(RTP TimeStamp) + RSLH.OCRDelta(i);
} }
} }
} }
In the Single-SL mode the AccessUnitEndFlag, if needed, is restored In the Single-SL mode the AccessUnitEndFlag, if needed, is restored
from the M bit, as follows: from the M bit, as follows:
if ( SLConfig.useAccessUnitEndFlag == 0 ) { if ( SLConfig.useAccessUnitEndFlag == 0 ) {
// this SL stream does not signal access unit ends // this SL stream does not signal access unit ends
} }
else { else {
SLPacketHeader.AccessUnitEndFlag = M bit; SLPacketHeader.AccessUnitEndFlag = M bit;
} }
In the Multiple-SL mode the AccessUnitEndFlag is untouched in RSLH. In the Multiple-SL mode the AccessUnitEndFlag is untouched in RSLH.
The other SL packet header fields SHALL remain as found in RSLH. The other SL packet header fields SHALL remain as found in RSLH.
It is obvious that in the general case the reconstruction of the It is obvious that in the general case the reconstruction of the
original SL packetized stream requires SL-awareness. However this original SL packetized stream requires SL-awareness. However this
payload format allows in all cases a receiver that does not know payload format allows in all cases a receiver that does not know
about the SL syntax to reconstruct the semantics of SL for the about the SL syntax to reconstruct the semantics of SL for the
following very useful features: following very useful features:
- Packet order (decoding order) - Packet order (decoding order)
- Access Unit boundaries (using the M bit) - Access Unit boundaries (using the M bit)
- Access Unit fragments (i.e. SL packet boundaries using - Access Unit fragments (i.e. SL packet boundaries using
MSLH.PayloadSize) MSLH.PayloadSize)
- Composition Time Stamps (using the RTP Time Stamp and - Composition Time Stamps (using the RTP Time Stamp and
MSLH.CTSDelta) MSLH.CTSDelta)
- Decoding Time Stamps (using the RTP Time Stamp and MSLH.DTSDelta) - Decoding Time Stamps (using the RTP Time Stamp and MSLH.DTSDelta)
skipping to change at line 1545 skipping to change at line 1672
a rather new concept and for that reason some specific comments are a rather new concept and for that reason some specific comments are
needed. needed.
Typically scene descriptions are encoded in such a way that Typically scene descriptions are encoded in such a way that
information loss would in the general case cripple the presentation information loss would in the general case cripple the presentation
beyond any hope of repair by the receiver. Still this is well suited beyond any hope of repair by the receiver. Still this is well suited
for a number of multimedia applications where the scene is first made for a number of multimedia applications where the scene is first made
available via reliable channels to the client and then played. This available via reliable channels to the client and then played. This
payload format is not intended for this type of application, for payload format is not intended for this type of application, for
which download of MPEG-4 interchange (.mp4) files is typical. which download of MPEG-4 interchange (.mp4) files is typical.
However it can also be used if the RTP packets are transported using However this payload format can also be used. It is then RECOMMENDED
TCP or any other reliable protocol. that the RTP packets should be transported using TCP (for example
inside RTSP as described in [13, section 10.12]) or any other
reliable protocol.
On the other hand MPEG-4 has introduced the possibility to On the other hand MPEG-4 has introduced the possibility to
dynamically change the scene description by sending animation dynamically change the scene description by sending animation
information (changes in parameters) and structural change information (changes in parameters) and structural change
information (updates). Since this information has to be sent in a information (updates). Since this information has to be sent in a
timely fashion MPEG-4 has defined a number of techniques in order to timely fashion MPEG-4 has defined a number of techniques in order to
encode the scene description in a manner that makes it behave encode the scene description in a manner that makes it behave
similarly to other temporal encoding schemes such as audio and similarly to other temporal encoding schemes such as audio and
video. This payload format is intended for this usage. video. This payload format is intended for this usage.
Note that in many cases the application will consist of first the Note that in many cases the application will consist of first the
reliable transmission of a static initial scene followed by the reliable transmission of a static initial scene followed by the
streaming of animations and updates. For this reason the usage of streaming of animations and updates. For this reason the usage of
this payload format is attractive since it offers a unique solution. this payload format is attractive since it offers a unique solution.
Senders must be aware that suitable schemes should be used when Senders must be aware that suitable schemes should be used when
scene description streams transport sensitive configuration scene description streams transport sensitive configuration
information. For example, if the RTP packet transporting an OD- information. For example, if the RTP packet transporting an OD-
update command is lost, the corresponding media stream will update command is lost, the corresponding media stream will
not be accessible by the receiver. not be accessible by the receiver.
Redundancy is a possibility and may either be added by tools Redundancy is a possibility and may either be added by tools
hierarchically higher than this payload format, e.g. by packet based hierarchically higher than this payload format, e.g. by packet based
FEC, re-transmission, or similar tools. In such a case, the general FEC, re-transmission, or similar tools. In such a case, the general
congestion control principles have to be observed. congestion control principles have to be observed.
Since BIFS and OD streams may be modified during the session with Since BIFS and OD streams may be modified during the session with
update commands, there is a need to send both update commands and update commands, there is a need to send both update commands and
full BIFS/OD refresh. For that reason MPEG-4 defines Random Access full BIFS/OD refresh. For that reason MPEG-4 defines Random Access
Points (RAP) for scene description streams (OD and BIFS) where by Points (RAP) for scene description streams (OD and BIFS) where by
definition a decoder can restart decoding i.e. receives a "full definition a decoder can restart decoding i.e. receives a "full
update" of the scene. This mechanism is called Scene and Object update" of the scene. This mechanism is called Scene and Object
Description Carrousel. The AU Sequence Number field of SL Packet Description Carousel. The AU Sequence Number field of SL Packet
Header is used to support this behavior at the Synchronization Header is used to support this behavior at the Synchronization
Layer. When two access units are sent consecutively with the same AU Layer. When two access units are sent consecutively with the same AU
Sequence Number, the second one is assumed to be a semantic Sequence Number, the second one is assumed to be a semantic
repetition of the first. If a receiver starts to listen in the repetition of the first. If a receiver starts to listen in the
middle of a session or has detected losses, it can skip all received middle of a session or has detected losses, it can skip all received
Access Units until such a RAP. The periodicity of transmission of Access Units until such a RAP. The periodicity of transmission of
these RAPs should be chosen/adjusted depending on the application these RAPs should be chosen/adjusted depending on the application
and the network it is deployed on; i.e. exactly like Intra-coded and the network it is deployed on; i.e. exactly like Intra-coded
frames for video, it is the responsibility of the sender to make frames for video, it is the responsibility of the sender to make
sure the periodicity of RAPs is suitable. sure the periodicity of RAPs is suitable.
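The receiver behavior described above can be sketched as follows. This is a minimal illustration, not a normative algorithm: the AccessUnit record and its field names are hypothetical, and the sketch assumes the receiver already knows, for each Access Unit, its AU Sequence Number and whether it is a RAP.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class AccessUnit:
    au_sequence_number: int  # hypothetical field name
    is_rap: bool             # True if this Access Unit is a Random Access Point
    payload: bytes


def usable_access_units(units: Iterable[AccessUnit],
                        synced: bool = False) -> Iterator[AccessUnit]:
    """Yield the Access Units a decoder should process.

    Until synchronized, everything is skipped up to the next RAP.
    Once synchronized, an Access Unit with the same AU Sequence Number
    as the previous one is treated as a repetition and dropped.
    """
    last_seq: Optional[int] = None
    for au in units:
        if not synced:
            if not au.is_rap:
                continue  # still waiting for a RAP
            synced = True
        elif au.au_sequence_number == last_seq:
            continue  # semantic repetition of the previous Access Unit
        last_seq = au.au_sequence_number
        yield au


if __name__ == "__main__":
    stream = [
        AccessUnit(1, False, b"update"),
        AccessUnit(2, True, b"full scene"),
        AccessUnit(2, True, b"full scene"),  # carousel repetition
        AccessUnit(3, False, b"update"),
    ]
    print([au.au_sequence_number for au in usable_access_units(stream)])
```

After a detected loss, a receiver following this sketch would restart the generator with synced=False so that decoding resumes at the next RAP.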
skipping to change at line 1606 skipping to change at line 1736
An advanced MPEG-4 session may involve a large number of objects An advanced MPEG-4 session may involve a large number of objects
that may be as many as a few hundred; transporting each ES as an that may be as many as a few hundred; transporting each ES as an
individual RTP stream may not always be practical. Allocating and individual RTP stream may not always be practical. Allocating and
controlling hundreds of destination addresses for each MPEG-4 controlling hundreds of destination addresses for each MPEG-4
session may pose insurmountable session administration problems. session may pose insurmountable session administration problems.
The input/output processing overhead at the end-points will be The input/output processing overhead at the end-points will be
extremely high also. Additionally, low delay transmission of low extremely high also. Additionally, low delay transmission of low
bitrate data streams, e.g. facial animation parameters, results in bitrate data streams, e.g. facial animation parameters, results in
extremely high header overheads. extremely high header overheads.
To solve these problems, MPEG-4 data transport requires a To solve these problems, MPEG-4 data transport requires a
multiplexing scheme that allows selective bundling of several ESs. multiplexing scheme that allows selective bundling of several ESs.
This is beyond the scope of the payload format defined here. This is beyond the scope of the payload format defined here.
The MPEG-4 Flexmux multiplexing scheme may be used for this The MPEG-4 Flexmux multiplexing scheme may be used for this
purpose and a specific RTP payload format is being developed [11]. purpose and a specific RTP payload format is being developed [11].
Another approach may be to develop a generic RTP multiplexing scheme Another approach may be to develop a generic RTP multiplexing scheme
usable for MPEG-4 data. The multiplexing scheme reported in [8] may usable for MPEG-4 data. The multiplexing scheme reported in [8] may
be a candidate for this approach. be a candidate for this approach.
For MPEG-4 applications, the multiplexing technique needs to address For MPEG-4 applications, the multiplexing technique needs to address
the following requirements: the following requirements:
i. The ESs multiplexed in one stream can change frequently during a i. The ESs multiplexed in one stream can change frequently during a
session. Consequently, the coding type, individual packet size and session. Consequently, the coding type, individual packet size and
temporal relationships between the multiplexed data units must be temporal relationships between the multiplexed data units must be
handled dynamically. handled dynamically.
ii. The multiplexing scheme should have a mechanism to determine the ii. The multiplexing scheme should have a mechanism to determine the
ES identifier (ES_ID) for each of the multiplexed packets. ES_ID is ES identifier (ES_ID) for each of the multiplexed packets. ES_ID is
not a part of the SL header. not a part of the SL header.
iii. In general, an SL packet does not contain information about its iii. In general, an SL packet does not contain information about its
size. The multiplexing scheme should be able to delineate the size. The multiplexing scheme should be able to delineate the
multiplexed packets whose lengths may vary from a few bytes to close multiplexed packets whose lengths may vary from a few bytes to close
to the path-MTU. to the path-MTU.
5.5 Overlap with RFC 3016 5.5 Overlap with RFC 3016
skipping to change at line 1662 skipping to change at line 1792
sequence-end-code transported in-band. sequence-end-code transported in-band.
6. Security Considerations 6. Security Considerations
RTP packets using the payload format defined in this specification RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP are subject to the security considerations discussed in the RTP
specification [5]. This implies that confidentiality of the media specification [5]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be with this payload format is applied end-to-end, encryption may be
performed on the compressed data so there is no conflict between the performed on the compressed data so there is no conflict between the
two operations. The packet processing complexity of this payload two operations. The packet processing complexity of this payload
type (i.e. excluding media data processing) does not exhibit any type (i.e. excluding media data processing) does not exhibit any
significant non-uniformity in the receiver side to cause a denial- significant non-uniformity in the receiver side to cause a denial-
of-service threat. of-service threat.
However, it is possible to inject non-compliant MPEG streams (Audio, However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers which Video, and Systems) to overload the receiver/decoder's buffers which
might compromise the functionality of the receiver or even crash it. might compromise the functionality of the receiver or even crash it.
This is especially true for end-to-end systems like MPEG where the This is especially true for end-to-end systems like MPEG where the
buffer models are precisely defined. buffer models are precisely defined.
MPEG-4 Systems supports stream types including commands that are MPEG-4 Systems supports stream types including commands that are
executed on the terminal like OD commands, BIFS commands, etc. and executed on the terminal like OD commands, BIFS commands, etc. and
programmatic content like MPEG-J (Java(TM) Byte Code) and programmatic content like MPEG-J (Java(TM) Byte Code) and
ECMAScript. It is possible to use one or more of the above in a ECMAScript. It is possible to use one or more of the above in a
manner non-compliant to MPEG to crash or temporarily make the manner non-compliant to MPEG to crash or temporarily make the
receiver unavailable. receiver unavailable.
Authentication mechanisms can be used to validate the sender and Authentication mechanisms can be used to validate the sender and
the data to prevent security problems due to non-compliant malignant the data to prevent security problems due to non-compliant malignant
MPEG-4 streams. MPEG-4 streams.
A security model is defined in MPEG-4 Systems streams carrying MPEG- A security model is defined in MPEG-4 Systems streams carrying MPEG-
J access units which comprise Java(TM) classes and objects. MPEG-J J access units which comprise Java(TM) classes and objects. MPEG-J
defines a set of Java APIs and a secure execution model. MPEG-J defines a set of Java APIs and a secure execution model. MPEG-J
content can call this set of APIs and Java(TM) methods from a set of content can call this set of APIs and Java(TM) methods from a set of
Java packages supported in the receiver within the defined security Java packages supported in the receiver within the defined security
model. According to this security model, downloaded byte code is model. According to this security model, downloaded byte code is
forbidden to load libraries, define native methods, start programs, forbidden to load libraries, define native methods, start programs,
read or write files, or read system properties. read or write files, or read system properties.
Receivers can implement intelligent filters to validate the buffer Receivers can implement intelligent filters to validate the buffer
requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J, requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
ECMAScript) commands in the streams. However, this can increase the ECMAScript) commands in the streams. However, this can increase the
complexity significantly. complexity significantly.
7. Acknowledgements 7. Acknowledgements
This document evolved across several years thanks to contributions This document evolved across several years thanks to contributions
from a large number of people since it is based on work within the from a large number of people since it is based on work within the
IETF AVT working group and various ISO MPEG working groups, IETF AVT working group and various ISO MPEG working groups,
especially the 4-on-IP ad-hoc group in the last stages. The authors especially the 4-on-IP ad-hoc group. The authors wish to thank
wish to thank Guido Fransceschini, Art Howarth, Dave Mackie, Dave Olivier Avaro, Stephen Casner, Guido Fransceschini, Art Howarth,
Singer, and Stephan Wenger for their valuable comments. Dave Mackie, Dave Singer, and Stephan Wenger for their valuable
comments and support. Attentive readers and early implementers also
found flaws and bugs, thank you all.
8. References 8. References
[1] ISO/IEC 14496-1:2001 MPEG-4 Systems [1] ISO/IEC 14496-1:2001 MPEG-4 Systems
[2] ISO/IEC 14496-2:2001 MPEG-4 Visual [2] ISO/IEC 14496-2:2001 MPEG-4 Visual
[3] ISO/IEC 14496-3:2001 MPEG-4 Audio [3] ISO/IEC 14496-3:2001 MPEG-4 Audio
[4] ISO/IEC 14496-6:2001 Delivery Multimedia Integration Framework. [4] ISO/IEC 14496-6:2001 Delivery Multimedia Integration Framework.
[5] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: A [5] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: A
Transport Protocol for Real Time Applications, RFC 1889, Internet Transport Protocol for Real Time Applications, RFC 1889, Internet
Engineering Task Force, January 1996. Engineering Task Force, January 1996.
[6] S. Bradner, Key words for use in RFCs to Indicate Requirement [6] S. Bradner, Key words for use in RFCs to Indicate Requirement
Levels, RFC 2119, Internet Engineering Task Force, March 1997. Levels, RFC 2119, Internet Engineering Task Force, March 1997.
[7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP [7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
payload format for MPEG-4 Audio/Visual streams, Internet Engineering payload format for MPEG-4 Audio/Visual streams, Internet Engineering
Task Force, RFC 3016. Task Force, RFC 3016.
[8] B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed [8] B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed
RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-02.txt, RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-04.txt, July
November 2000. 2001.
[9] D. Singer, Y Lim, A Framework for the delivery of MPEG-4 over [9] D. Singer, Y Lim, A Framework for the delivery of MPEG-4 over
IP-based Protocols, work in progress, draft-singer-mpeg4-ip-02.txt, IP-based Protocols, work in progress, draft-singer-mpeg4-ip-02.txt,
May 2001. May 2001.
[10] M. Handley, V. Jacobson, SDP: Session Description Protocol, RFC [10] M. Handley, V. Jacobson, SDP: Session Description Protocol, RFC
2327, Internet Engineering Task Force, April 1998. 2327, Internet Engineering Task Force, April 1998.
[11] C.Roux & al, RTP Payload Format for MPEG-4 FlexMultiplexed [11] C.Roux & al, RTP Payload Format for MPEG-4 FlexMultiplexed
Streams, work in progress, draft-curet-avt-rtp-mpeg4-flexmux-00.txt, Streams, work in progress, draft-curet-avt-rtp-mpeg4-flexmux-00.txt,
skipping to change at line 1760 skipping to change at line 1894
January 1996. January 1996.
[13] H. Schulzrinne, A. Rao, R. Lanphier, Real Time Streaming [13] H. Schulzrinne, A. Rao, R. Lanphier, Real Time Streaming
Protocol, RFC 2326, Internet Engineering Task Force, April 1998. Protocol, RFC 2326, Internet Engineering Task Force, April 1998.
[14] M. Handley, C. Perkins, E. Whelan, Session Announcement [14] M. Handley, C. Perkins, E. Whelan, Session Announcement
Protocol, RFC 2974, Internet Engineering Task Force, October 2000. Protocol, RFC 2974, Internet Engineering Task Force, October 2000.
9. Authors' Addresses 9. Authors' Addresses
Olivier Avaro
France Telecom
35 A Schutzenhuttenweg
60598 Frankfurt am Main
Deutschland
e-mail: olivier.avaro@francetelecom.fr
Andrea Basso Andrea Basso
AT&T Labs Research AT&T Labs Research
200 Laurel Avenue 200 Laurel Avenue
Middletown, NJ 07748 Middletown, NJ 07748
USA USA
e-mail: basso@research.att.com e-mail: basso@research.att.com
Stephen L. Casner
Packet Design, Inc.
66 Willow Place
Menlo Park, CA 94025
USA
e-mail: casner@acm.org
M. Reha Civanlar M. Reha Civanlar
AT&T Labs - Research AT&T Labs - Research
100 Schultz Drive 200 Laurel Ave. South, A5 4D04
Red Bank, NJ 07701
Middletown, NJ 07748
USA USA
e-mail: civanlar@research.att.com e-mail: civanlar@research.att.com
Philippe Gentric Philippe Gentric
Philips Digital Networks, MP4Net
Philips Digital Networks MP4Net
51 rue Carnot 51 rue Carnot
92156 Suresnes 92156 Suresnes
France France
e-mail: philippe.gentric@philips.com e-mail: philippe.gentric@philips.com
Carsten Herpel Carsten Herpel
THOMSON multimedia THOMSON multimedia
Karl-Wiechert-Allee 74 Karl-Wiechert-Allee 74
30625 Hannover 30625 Hannover
Germany Germany
skipping to change at line 1829 skipping to change at line 1949
Colin Perkins Colin Perkins
USC Information Sciences Institute USC Information Sciences Institute
4350 N. Fairfax Drive #620 4350 N. Fairfax Drive #620
Arlington, VA 22203 Arlington, VA 22203
USA USA
e-mail : csp@isi.edu e-mail : csp@isi.edu
Jan van der Meer Jan van der Meer
Philips Digital Networks Philips Digital Networks
Cederlaan 4 Building WDB-1
5600 JB Eindhoven Prof Holstlaan 4
5656 AA Eindhoven
Netherlands Netherlands
e-mail : jan.vandermeer@philips.com e-mail : jan.vandermeer@philips.com
APPENDIX: Examples of usage APPENDIX: Examples of usage
This payload format has been designed to transport efficiently a This payload format has been designed to transport efficiently a
very versatile packetization scheme: the MPEG-4 Synch Layer; as a very versatile packetization scheme: the MPEG-4 Synch Layer; as a
result its complexity is larger than the average RTP payload format. result its complexity is larger than the average RTP payload format.
For this reason this section describes a number of key examples of For this reason this section describes a number of key examples of
how this payload format can be used. how this payload format can be used.
A C++-like syntax called SDL (Syntactic Description Language) A C++-like syntax called SDL (Syntactic Description Language)
defined in [1, section 14] is used to economically describe MPEG-4 defined in [1, section 14] is used to economically describe MPEG-4
system data structures. system data structures.
However, as discussed in section 2, this payload format can also be However, as discussed in section 2, this payload format can also be
used without explicit knowledge of SL (logically equivalent to used without explicit knowledge of SL (logically equivalent to
configuring the SL headers as being empty), several examples configuring the SL headers as being empty), several examples
skipping to change at line 1891 skipping to change at line 2011
bit(8) timeStampLength; = 0 bit(8) timeStampLength; = 0
bit(8) OCRLength; = 0 bit(8) OCRLength; = 0
bit(8) AU_Length; = 0 bit(8) AU_Length; = 0
bit(8) instantBitrateLength; = 0 bit(8) instantBitrateLength; = 0
bit(4) degradationPriorityLength; = 0 bit(4) degradationPriorityLength; = 0
bit(5) AU_seqNumLength; = 0 bit(5) AU_seqNumLength; = 0
bit(5) packetSeqNumLength; = 0 bit(5) packetSeqNumLength; = 0
bit(2) reserved=0b11; bit(2) reserved=0b11;
} }
if (durationFlag) { if (durationFlag) {
bit(32) timeScale; // NOT USED bit(32) timeScale; // NOT USED
bit(16) accessUnitDuration; // NOT USED bit(16) accessUnitDuration; // NOT USED
bit(16) compositionUnitDuration; // NOT USED bit(16) compositionUnitDuration; // NOT USED
} }
if (!useTimeStampsFlag) { if (!useTimeStampsFlag) {
bit(timeStampLength) startDecodingTimeStamp; = 0 bit(timeStampLength) startDecodingTimeStamp; = 0
bit(timeStampLength) startCompositionTimeStamp; = 0 bit(timeStampLength) startCompositionTimeStamp; = 0
} }
} }
SL Packet Header structure SL Packet Header structure
With this configuration we have the following SL packet header With this configuration we have the following SL packet header
structure: structure:
skipping to change at line 1948 skipping to change at line 2068
In this example we have an RTP overhead of 40 bytes for 1400 bytes In this example we have an RTP overhead of 40 bytes for 1400 bytes
of payload i.e. 3 % overhead. of payload i.e. 3 % overhead.
Appendix.2 MPEG-4 Video with SL Appendix.2 MPEG-4 Video with SL
Let us consider the case of a 30 frames per second MPEG-4 video Let us consider the case of a 30 frames per second MPEG-4 video
stream whose bit rate is high enough that Access Units have to be stream whose bit rate is high enough that Access Units have to be
split in several SL packets (typically above 300 kb/s). split in several SL packets (typically above 300 kb/s).
Let us assume also that the video codec generates in that case Video Let us assume also that the video codec generates in that case Video
Packets suitable to fit in one SL packet i.e. that the video codec is Packets suitable to fit in one SL packet i.e. that the video codec is
MTU aware and the MTU is 1500 bytes. We assume furthermore that this MTU aware and the MTU is 1500 bytes. We assume furthermore that this
stream contains B frames and that decodingTimeStamps are present. stream contains B frames and that decodingTimeStamps are present.
SLConfigDescriptor SLConfigDescriptor
In this example the SLConfigDescriptor is: In this example the SLConfigDescriptor is:
class SLConfigDescriptor extends BaseDescriptor : bit(8) class SLConfigDescriptor extends BaseDescriptor : bit(8)
tag=SLConfigDescrTag { tag=SLConfigDescrTag {
bit(8) predefined; bit(8) predefined;
if (predefined==0) { if (predefined==0) {
bit(1) useAccessUnitStartFlag; = 1 bit(1) useAccessUnitStartFlag; = 1
bit(1) useAccessUnitEndFlag; = 0 bit(1) useAccessUnitEndFlag; = 0
skipping to change at line 2004 skipping to change at line 2124
The useRandomAccessPointFlag is set so that the The useRandomAccessPointFlag is set so that the
randomAccessPointFlag can indicate that the corresponding SL packet randomAccessPointFlag can indicate that the corresponding SL packet
contains a GOV and the first Video Packet of an Intra coded frame. contains a GOV and the first Video Packet of an Intra coded frame.
SL Packet Header structure SL Packet Header structure
With this configuration we have the following SL packet header With this configuration we have the following SL packet header
structure: structure:
aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) { aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
bit(1) accessUnitStartFlag; // 1 bit bit(1) accessUnitStartFlag; // 1 bit
if (accessUnitStartFlag) { if (accessUnitStartFlag) {
bit(1) randomAccessPointFlag; // 1 bit bit(1) randomAccessPointFlag; // 1 bit
bit(1) decodingTimeStampFlag; // 1 bit bit(1) decodingTimeStampFlag; // 1 bit
bit(1) compositionTimeStampFlag; // 1 bit bit(1) compositionTimeStampFlag; // 1 bit
if (decodingTimeStampFlag) { if (decodingTimeStampFlag) {
bit(SL.timeStampLength) decodingTimeStamp; bit(SL.timeStampLength) decodingTimeStamp;
} }
if (compositionTimeStampFlag) { if (compositionTimeStampFlag) {
bit(SL.timeStampLength) compositionTimeStamp; bit(SL.timeStampLength) compositionTimeStamp;
} }
} }
Parameters Parameters
decodingTimeStamps are encoded on 32 bits, which is much more than decodingTimeStamps are encoded on 32 bits, which is much more than
needed for delta. Therefore the sender will use DTSDeltaLength to needed for delta. Therefore the sender will use DTSDeltaLength to
signal that only 7 bits are used for the coding of relative DTS in signal that only 7 bits are used for the coding of relative DTS in
the RTP packet. the RTP packet.
The RSLHSectionSize cannot exceed 2 bits, which is encoded on 2 bits The RSLHSectionSize cannot exceed 4 (bits), which is encoded on 3
and signaled by RSLHSectionSizeLength. The resulting concatenated bits and signaled by RSLHSectionSizeLength. The resulting
fmtp line is: concatenated fmtp line is:
a=fmtp:<format> DTSDeltaLength=7;RSLHSectionSizeLength=3 a=fmtp:<format> DTSDeltaLength=7;RSLHSectionSizeLength=3
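The choice of 3 bits for RSLHSectionSizeLength follows from the largest value the field has to carry (4 in the revised text). The sketch below shows that arithmetic and assembles an fmtp line in the form shown above. The payload type 96 and the helper names are hypothetical, chosen only for illustration.

```python
def bits_needed(max_value: int) -> int:
    """Smallest number of bits that can encode values 0..max_value."""
    return max(1, max_value.bit_length())


def fmtp_line(fmt: str, params: dict) -> str:
    """Join name=value pairs with ';' after 'a=fmtp:<format> '."""
    joined = ";".join(f"{name}={value}" for name, value in params.items())
    return f"a=fmtp:{fmt} {joined}"


if __name__ == "__main__":
    params = {
        "DTSDeltaLength": 7,
        "RSLHSectionSizeLength": bits_needed(4),  # 4 is binary 100, so 3 bits
    }
    print(fmtp_line("96", params))  # 96 is a hypothetical dynamic payload type
```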
RTP packet structure RTP packet structure
Two cases can occur; for packets that transport first fragments of Two cases can occur; for packets that transport first fragments of
Access Units we have: Access Units we have:
+=========================================+=============+ +=========================================+=============+
| Field | size | | Field | size |
+=========================================+=============+ +=========================================+=============+
| RTP header | - | | RTP header | - |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| DTSFlag = 1 | 1 bit | | DTSFlag = (1) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| DTSDelta | 7 bits | | DTSDelta | 7 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| bits to byte alignment | 0 bits | | bits to byte alignment | 0 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| RSLHSectionSize = 4 | 3 bits | | RSLHSectionSize = (100) | 3 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitStartFlag = 1 | 1 bit | | accessUnitStartFlag = (1) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| randomAccessPointFlag | 1 bit | | randomAccessPointFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| decodingTimeStampFlag | 1 bit | | decodingTimeStampFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| compositionTimeStampFlag | 1 bit | | compositionTimeStampFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| bits to byte alignment | 1 bit | | bits to byte alignment =(0) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| SL packet payload | N bytes | | SL packet payload | N bytes |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
For packets that transport non-first fragments of Access Units we For packets that transport non-first fragments of Access Units we
have: have:
+=========================================+=============+ +=========================================+=============+
| Field | size | | Field | size |
+=========================================+=============+ +=========================================+=============+
| RTP header | - | | RTP header | - |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| DTSFlag = 0 | 1 bit | | DTSFlag = 0 | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| bits to byte alignment | 7 bits | | bits to byte alignment = (0000000) | 7 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| RSLHSectionSize = 1 | 3 bits | | RSLHSectionSize = (001) | 3 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitStartFlag = 0 | 1 bit | | accessUnitStartFlag = (0) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| bits to byte alignment | 4 bits | | bits to byte alignment = (0000) | 4 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| SL packet payload | N bytes | | SL packet payload | N bytes |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
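The two layouts above each occupy two bytes ahead of the SL packet payload. The following sketch packs them, assuming fields are written most significant bit first in the order listed in the tables and that padding bits are zero. The function names are hypothetical, and DTSDelta is taken as an already encoded 7-bit field value.

```python
def pack_first_fragment(dts_delta: int, random_access: bool,
                        dts_flag: bool, cts_flag: bool) -> bytes:
    """Header bytes for a packet carrying the first fragment of an Access Unit."""
    if not 0 <= dts_delta < 128:
        raise ValueError("DTSDelta must fit in 7 bits")
    # Byte 0: DTSFlag = 1 (1 bit), DTSDelta (7 bits), no alignment bits needed.
    byte0 = 0x80 | dts_delta
    # Byte 1: RSLHSectionSize = 100 (3 bits), accessUnitStartFlag = 1,
    # randomAccessPointFlag, decodingTimeStampFlag, compositionTimeStampFlag,
    # then 1 alignment bit set to 0.
    byte1 = ((0b100 << 5) | (1 << 4) | (int(random_access) << 3)
             | (int(dts_flag) << 2) | (int(cts_flag) << 1))
    return bytes([byte0, byte1])


def pack_other_fragment() -> bytes:
    """Header bytes for a packet carrying a non-first fragment."""
    # Byte 0: DTSFlag = 0 followed by 7 alignment bits.
    # Byte 1: RSLHSectionSize = 001, accessUnitStartFlag = 0, 4 alignment bits.
    return bytes([0x00, 0b001 << 5])


if __name__ == "__main__":
    print(pack_first_fragment(5, True, True, True).hex())
    print(pack_other_fragment().hex())
```

In both cases the header is byte aligned, so the SL packet payload starts at the third byte after the RTP header.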
Overhead estimation Overhead estimation
In this example we have an RTP overhead of 40 + 2 bytes for 1400 In this example we have an RTP overhead of 40 + 2 bytes for 1400
bytes of payload, i.e. 3 % overhead. bytes of payload, i.e. 3 % overhead.
Appendix.3 Low delay MPEG-4 Audio (no SL) Appendix.3 Low delay MPEG-4 Audio (no SL)
skipping to change at line 2116 skipping to change at line 2237
We also assume here an audio Object Type for which all Access Units We also assume here an audio Object Type for which all Access Units
are Random Access Points, which is signaled using the are Random Access Points, which is signaled using the
hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor. hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor.
We assume furthermore a mode where the Access Unit size is constant We assume furthermore a mode where the Access Unit size is constant
and equal to 5 bytes (which is signaled with AU_Length). and equal to 5 bytes (which is signaled with AU_Length).
In this example the SLConfigDescriptor is: In this example the SLConfigDescriptor is:
class SLConfigDescriptor extends BaseDescriptor : bit(8) class SLConfigDescriptor extends BaseDescriptor : bit(8)
Gentric et al. Expires March 2002 40
RTP Payload Format for MPEG-4 Streams September 2001
tag=SLConfigDescrTag { tag=SLConfigDescrTag {
bit(8) predefined; bit(8) predefined;
if (predefined==0) { if (predefined==0) {
bit(1) useAccessUnitStartFlag; = 0 bit(1) useAccessUnitStartFlag; = 0
bit(1) useAccessUnitEndFlag; = 0 bit(1) useAccessUnitEndFlag; = 0
bit(1) useRandomAccessPointFlag; = 0 bit(1) useRandomAccessPointFlag; = 0
bit(1) hasRandomAccessUnitsOnlyFlag; = 1 bit(1) hasRandomAccessUnitsOnlyFlag; = 1
bit(1) usePaddingFlag; = 0 bit(1) usePaddingFlag; = 0
bit(1) useTimeStampsFlag; = 0 bit(1) useTimeStampsFlag; = 0
bit(1) useIdleFlag; = 0 bit(1) useIdleFlag; = 0
bit(1) durationFlag; = 1 // signals constant AU duration bit(1) durationFlag; = 1 // signals constant AU duration
bit(32) timeStampResolution; = 0 bit(32) timeStampResolution; = 0
bit(32) OCRResolution; = 0 bit(32) OCRResolution; = 0
bit(8) timeStampLength; = 0 bit(8) timeStampLength; = 0
bit(8) OCRLength; = 0 bit(8) OCRLength; = 0
skipping to change at line 2173 skipping to change at line 2294
RTP packet structure RTP packet structure
Note that the RTP header M bit should always be set to 1. Note that the RTP header M bit should always be set to 1.
+=========================================+=============+ +=========================================+=============+
| Field | size | | Field | size |
+=========================================+=============+ +=========================================+=============+
| RTP header | - | | RTP header | - |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| SL packet payload | 5 bytes | | SL packet payload | 5 bytes |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
Overhead estimation Overhead estimation
The overhead is extremely large, i.e. 800 %, since 40 bytes The overhead is extremely large, i.e. 800 %, since 40 bytes
of headers are required to transport 5 bytes of data. Note however of headers are required to transport 5 bytes of data. Note however
that RTP header compression would work well since time stamp that RTP header compression would work well since time stamp
increments are constant. increments are constant.
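The overhead figures quoted in these appendices can be checked with a minimal sketch such as the one below. It assumes that overhead is measured as header bytes divided by payload bytes, and that the 40 bytes are the combined IP, UDP and RTP headers; the function name is illustrative and not part of the draft.

```python
def overhead_percent(header_bytes, payload_bytes):
    """Header bytes expressed as a percentage of payload bytes (assumed definition)."""
    return 100.0 * header_bytes / payload_bytes

# Low-delay audio: 40 bytes of headers carry a single 5-byte Access Unit.
low_delay = overhead_percent(40, 5)

# Fragmented example earlier in this appendix: 40 + 2 header bytes for 1400 bytes.
fragmented = overhead_percent(40 + 2, 1400)

print(f"low delay: {low_delay:.0f} %, fragmented: {fragmented:.0f} %")
```

The comparison shows why header compression, or grouping several Access Units per RTP packet as in the next example, matters for very small Access Units.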
Appendix.4 Media delivery MPEG-4 Audio (no SL) Appendix.4 Media delivery MPEG-4 Audio (no SL)
This example is for a media delivery service where delay is not an This example is for a media delivery service where delay is not an
issue but efficiency is. In this case several SL Packets are issue but efficiency is. In this case several SL Packets are
transported in each RTP packet. transported in each RTP packet.
skipping to change at line 2214 skipping to change at line 2336
is empty. is empty.
The size of SL Packets (which are all complete Access Units in this The size of SL Packets (which are all complete Access Units in this
case) is constant and is indicated with: case) is constant and is indicated with:
a=fmtp:<format> ConstantSize=5 a=fmtp:<format> ConstantSize=5
This also indicates to the receiver that the Multiple-SL mode will This also indicates to the receiver that the Multiple-SL mode will
be used; the 2-byte field that would give the size of the be used; the 2-byte field that would give the size of the
MSLHSection is omitted since in this case this field always contains MSLHSection is omitted since in this case this field always contains
zero (the MSLHSection is always empty). zero (the MSLHSection is always empty due to the absence of any
other MIME parameter).
RTP packet structure RTP packet structure
Note that the RTP header M bit is always set to 1, which indicates Note that the RTP header M bit is always set to 1, which indicates
to the receiver that only complete Access Units are transported. to the receiver that only complete Access Units are transported.
+=========================================+=============+ +=========================================+=============+
| Field | size | | Field | size |
+=========================================+=============+ +=========================================+=============+
| RTP header | - | | RTP header | - |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| SL packet payload | 5 bytes | | SL packet payload | 5 bytes |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| SL packet payload | 5 bytes | | SL packet payload | 5 bytes |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| etc, until MTU is reached | | etc, until MTU is reached |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| SL packet payload | 5 bytes | | SL packet payload | 5 bytes |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
Overhead estimation Overhead estimation
The overhead is 3% i.e. minimal. The overhead is 3% i.e. minimal.
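As a rough illustration of the packing described above, the following sketch fills one RTP packet with constant-size Access Units and splits it again at the receiver. The 1500-byte MTU and 40 bytes of headers are assumptions taken from other examples in this appendix, and the helper names are hypothetical.

```python
MTU_BYTES = 1500     # assumed path MTU, as used in the other examples
HEADER_BYTES = 40    # assumed IP + UDP + RTP headers
AU_BYTES = 5         # ConstantSize=5

def pack_constant_size(mtu, header, au_size):
    """Return (number of AUs per packet, payload bytes) for constant-size AUs."""
    count = (mtu - header) // au_size
    return count, count * au_size

def split_access_units(payload, au_size):
    """Receiver side: cut the payload into AUs using only the constant size."""
    if len(payload) % au_size:
        raise ValueError("payload is not a whole number of Access Units")
    return [payload[i:i + au_size] for i in range(0, len(payload), au_size)]

count, payload = pack_constant_size(MTU_BYTES, HEADER_BYTES, AU_BYTES)
overhead = 100.0 * HEADER_BYTES / payload
print(count, payload, f"{overhead:.1f} %")
```

Because the size is constant and signaled out of band, no per-AU header is needed, which is why the MSLHSection can stay empty.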
Appendix.5 AAC with interleaving (no SL) Appendix.5 AAC with interleaving (no SL)
Let us consider AAC at 128 kb/s where each Access Unit is on Let us consider AAC at 128 kb/s where each Access Unit is on
average 320 bytes. Interleaving is applied with a continuous average 320 bytes. Interleaving is applied with a continuous
interleaving scheme (see table below) where 4 Access Units are used interleaving scheme (see table below) where 4 Access Units are used
to construct each RTP packet in order to match an MTU of 1500 bytes. to construct each RTP packet in order to match an MTU of 1500 bytes.
IndexDelta is constant and equal to 2 (since +1 is automatically IndexDelta is constant and equal to 2 (since +1 is automatically
added); it is encoded on 3 bits. added); it is encoded on 2 bits.
draft-ietf-avt-mpeg4-multisl-01.txt:
Index (being encoded on 3 bits) rolls over very fast and is not very
useful for reordering. However this is a case, as explained in section
3.8, where time stamps should be used for de-interleaving; receivers
know that each SL packet is a complete Access Unit because all RTP
packets have the M bit set to 1 and therefore, since Access Unit
duration is constant, Access Unit timestamps can be computed from
RTP timestamps and IndexDelta values; this can be used for
de-interleaving even in case of losses.
draft-ietf-avt-mpeg4-multisl-02.txt:
As explained in section 3.8 this is a time stamp based interleaving
scheme (IndexLength=0); indeed receivers know that each SL packet is
a complete Access Unit because all RTP packets have the M bit set to
1 and therefore, since Access Unit duration is constant, Access Unit
timestamps can be computed from RTP timestamps and IndexDelta
values; this can be used for de-interleaving even in case of losses.
Note that it would also be possible to use IndexLength=2 so as to
maintain byte alignment in the MSLH portions; in this case
however the value of these two bits MUST be zero as stated in 3.8.1.
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| RTP packet | RTP Timestamp | AUs | Index,IndexDelta | | RTP packet | RTP Timestamp | AUs | IndexDelta |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 1 | CTS(AU1) | 1 | 1 | | 1 | CTS(AU1) | 1 | - |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 2 | CTS(AU2) | 2, 5 | 2,2 | | 2 | CTS(AU2) | 2, 5 | -,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 3 | CTS(AU3) | 3, 6, 9 | 3,2,2 | | 3 | CTS(AU3) | 3, 6, 9 | -,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 4 | CTS(AU4) | 4, 7,10,13 | 4,2,2,2 | | 4 | CTS(AU4) | 4, 7,10,13 | -,2,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 5 | CTS(AU8) | 8,11,14,17 | 0,2,2,2 | | 5 | CTS(AU8) | 8,11,14,17 | -,2,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 6 | CTS(AU12) | 12,15,18,21 | 4,2,2,2 | | 6 | CTS(AU12) | 12,15,18,21 | -,2,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 7 | CTS(AU16) | 16,19,22,25 | 0,2,2,2 | | 7 | CTS(AU16) | 16,19,22,25 | -,2,2,2 |
+----------------------------------------------------------------+ +----------------------------------------------------------------+
| 8 | CTS(AU20) | 20,23,26,29 | 4,2,2,2 | | 8 | CTS(AU20) | 20,23,26,29 | -,2,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 9 | CTS(AU24) | 24,27,30,33 | 0,2,2,2 | | 9 | CTS(AU24) | 24,27,30,33 | -,2,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 10 | CTS(AU28) | 28,31,34,37 | 4,2,2,2 | | 10 | CTS(AU28) | 28,31,34,37 | -,2,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| etc | | etc |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
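The table above can be reproduced with a short sketch. The closed-form rule used here (packet p carries the AUs 4(p-3) + 3j for j = 0..3, dropping values below 1 during start-up) is inferred from the table rows and is not stated in the draft; the de-interleaver uses AU numbers as a stand-in for the timestamps that a receiver would derive from the RTP timestamp and IndexDelta.

```python
AUS_PER_PACKET = 4   # Access Units per RTP packet in steady state
INDEX_DELTA = 2      # transmitted value; 1 is added automatically

def packet_aus(packet_number):
    """AU numbers carried by RTP packet `packet_number` (1-based)."""
    step = INDEX_DELTA + 1
    base = AUS_PER_PACKET * (packet_number - 3)   # inferred from the table
    candidates = [base + step * j for j in range(AUS_PER_PACKET)]
    return [au for au in candidates if au >= 1]   # start-up packets are shorter

def deinterleave(packets):
    """Rebuild AU order from (first AU, list of IndexDelta values) pairs."""
    order = []
    for first_au, deltas in packets:
        au = first_au
        order.append(au)
        for delta in deltas:
            au += delta + 1          # apply the implicit +1
            order.append(au)
    return sorted(order)

sent = [packet_aus(p) for p in range(1, 11)]
received = [(aus[0], [INDEX_DELTA] * (len(aus) - 1)) for aus in sent]
recovered = deinterleave(received)
for number, aus in enumerate(sent, start=1):
    print(number, aus)
```

With only the first ten packets, the recovered sequence is contiguous up to AU 31; later AUs are completed by subsequent packets.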
SLConfigDescriptor SLConfigDescriptor
Similar to previous example. Similar to previous example.
SL Packet Header SL Packet Header
Similar to previous example (empty). Similar to previous example (empty).
Parameters Parameters
The resulting concatenated fmtp line is: The resulting concatenated fmtp line is:
a=fmtp:<format> SizeLength=13;IndexLength=3;IndexDeltaLength=3 a=fmtp:<format> SizeLength=9; IndexDeltaLength=2;
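A receiver has to turn such an fmtp line into field widths before it can parse the MSLHSection. The sketch below is one possible way to do so; the function name is hypothetical, it assumes every parameter value is an integer, and treating an absent IndexLength as 0 is an assumption that matches the time stamp based scheme described above.

```python
def parse_fmtp_params(line):
    """Parse 'a=fmtp:<format> Name=Value; Name=Value' into a dict of ints."""
    _, _, params = line.partition(" ")      # drop the 'a=fmtp:<format>' prefix
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:                         # tolerate a trailing ';'
            continue
        name, _, value = item.partition("=")
        result[name.strip()] = int(value)
    return result

new_params = parse_fmtp_params("a=fmtp:<format> SizeLength=9; IndexDeltaLength=2;")
old_params = parse_fmtp_params(
    "a=fmtp:<format> SizeLength=13;IndexLength=3;IndexDeltaLength=3")

# Assumption: a parameter that is not signaled has length 0.
index_length = new_params.get("IndexLength", 0)
print(new_params, index_length)
```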
RTP packet structure RTP packet structure
+=========================================+=============+ +=========================================+=============+
| Field | size | | Field | size |
+=========================================+=============+ +=========================================+=============+
| RTP header | - | | RTP header | - |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
MSLHSection MSLHSection
+=========================================+=============+ +=========================================+=============+
| MSLHSection size in bits = 135 | 2 bytes | | MSLHSection size in bits = 42 bits | 2 bytes |
+-----------------------------------------+-------------+
| PayloadSize | 13 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| Index | 3 bits | | PayloadSize | 9 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| PayloadSize | 13 bits | | PayloadSize | 9 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| IndexDelta | 3 bits | | IndexDelta | 2 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| PayloadSize | 13 bits | | PayloadSize | 9 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| IndexDelta | 3 bits | | IndexDelta | 2 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| PayloadSize | 13 bits | | PayloadSize | 9 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| IndexDelta | 3 bits | | IndexDelta | 2 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| bits to byte alignment | 0 bits | | bits to byte alignment = (000000) | 6 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
SLPPSection SLPPSection
+=========================================+=============+ +=========================================+=============+
| AAC Access Unit | x bytes | | AAC Access Unit | x bytes |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| AAC Access Unit | x bytes | | AAC Access Unit | x bytes |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| AAC Access Unit | x bytes | | AAC Access Unit | x bytes |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| AAC Access Unit | x bytes | | AAC Access Unit | x bytes |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
Overhead estimation Overhead estimation
The MSLHSection is 8 bytes; in this example we therefore have an RTP The MSLHSection is 8 bytes; in this example we therefore have an RTP
overhead of 40 + 8 bytes for 1400 bytes (approx) of payload i.e. overhead of 40 + 8 bytes for 1400 bytes (approx) of payload i.e.
around 4 % overhead. around 4 % overhead.
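The 8-byte figure can be derived from the signaled field lengths. The sketch below assumes the layout shown in the draft -02 table: the first SL packet contributes PayloadSize plus Index, each following one contributes PayloadSize plus IndexDelta, and the section is padded to a byte boundary after the 2-byte size field. The function names are illustrative only.

```python
def mslh_section_bits(size_length, index_length, index_delta_length, sl_packets):
    """Bits of per-packet headers in the MSLHSection, before byte alignment."""
    first = size_length + index_length
    others = (sl_packets - 1) * (size_length + index_delta_length)
    return first + others

def pad_to_byte(bits):
    """Number of zero bits needed to reach the next byte boundary."""
    return (-bits) % 8

# SizeLength=9, IndexLength=0 (not signaled), IndexDeltaLength=2, 4 Access Units.
bits = mslh_section_bits(9, 0, 2, 4)
pad = pad_to_byte(bits)
total_bytes = 2 + (bits + pad) // 8      # 2-byte size field + aligned headers
print(bits, pad, total_bytes)
```

The result agrees with the 42-bit size, the 6 alignment bits and the 8-byte total given for draft -02.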
Appendix.6 A more complex case: AAC with interleaving and SL Appendix.6 AAC with Index-based interleaving and SL
Let us consider AAC around 130 kb/s where each Access Unit is split Let us consider AAC around 130 kb/s where each Access Unit is split
in 4 SL packets corresponding to Error Sensitivity Categories (ESC) in 4 SL packets corresponding to Error Sensitivity Categories (ESC)
of maximum 90 bytes for which interleaving is very useful in terms of maximum 90 bytes for which interleaving is very useful in terms
of error resilience. We thus use an interleaving scheme where 15 SL of error resilience. We thus use an interleaving scheme where 15 SL
Packets (extracted from 15 consecutive Access Units) are used to Packets (extracted from 15 consecutive Access Units) are used to
construct each RTP packet in order to match an MTU of 1500 bytes. construct each RTP packet in order to match an MTU of 1500 bytes.
draft-ietf-avt-mpeg4-multisl-01.txt:
Note that since ESC fragments are not byte aligned we also use the
paddingFlag and paddingBits features of the Synch Layer.
The interleaving sequence is 4 RTP packets and 350 ms long, which is
too long for conferencing but perfectly OK for Internet radio.
Since the sequence contains 60 SL packets, the sequence number can
be encoded on 6 bits. However 2 bits are actually enough if the
sender always resets the SL packet sequence number to zero at the
start of each sequence, since only the first MSLH in each of the 4
RTP packets in the sequence carries an absolute sequence number
value (0,1,2,3).
2 bits are also enough for IndexDelta, which is constant and equal
to 3 (since +1 is automatically added).
draft-ietf-avt-mpeg4-multisl-02.txt:
Note that since ESC fragments are not byte aligned we also use the
paddingFlag and paddingBits features of the Synch Layer. The
interleaving sequence is 4 RTP packets and 350 ms long, which is too
long for conferencing but perfectly OK for Internet radio.
Since the sequence contains 60 SL packets, IndexLength is set to 16
bits so as to provide a safe margin in case of long loss bursts.
This will also indicate to the receiver that this is an Index-based
interleaving scheme (indeed CTS cannot be computed for SL packets
that are not AU starts).
2 bits are enough for IndexDelta, which is constant and equal to 3
(since +1 is automatically added).
Note that the 4th RTP packet in each sequence has its M bit set to 1 Note that the 4th RTP packet in each sequence has its M bit set to 1
since it contains 15 SL packets transporting the end of 15 since it contains 15 SL packets transporting the end of 15
consecutive Access Units. consecutive Access Units.
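The grouping described above can be sketched as follows. The numbering is an assumption made for illustration: SL packets are indexed from 0 within one interleaving sequence, and ESC k of Access Unit n gets index 4n + k, so that each RTP packet collects one ESC from 15 consecutive Access Units.

```python
ESC_PER_AU = 4            # SL packets (Error Sensitivity Categories) per AU
AUS_PER_SEQUENCE = 15     # consecutive Access Units per interleaving sequence
AU_DURATION_MS = 23.22    # accessUnitDuration from the SLConfigDescriptor

def sequence_layout():
    """SL packet indices carried by each of the 4 RTP packets of a sequence."""
    packets = []
    for esc in range(ESC_PER_AU):
        # Assumed numbering: ESC `esc` of AU `au` has index au * 4 + esc.
        packets.append([au * ESC_PER_AU + esc for au in range(AUS_PER_SEQUENCE)])
    return packets

layout = sequence_layout()
# IndexDelta is the step between neighbours minus the implicit +1.
index_deltas = {b - a - 1 for pkt in layout for a, b in zip(pkt, pkt[1:])}
duration_ms = AUS_PER_SEQUENCE * AU_DURATION_MS
print(layout[0][:3], layout[3][-3:], index_deltas, round(duration_ms))
```

Under this numbering the fourth RTP packet holds the last SL packet of every Access Unit, which is consistent with its M bit being set to 1.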
With this scheme a sender (for example upon reception of RTCP With this scheme a sender (for example upon reception of RTCP
reports indicating high loss rates) can (for example) choose to reports indicating high loss rates) can (for example) choose to
duplicate for each interleaving sequence the first RTP packet that duplicate for each interleaving sequence the first RTP packet that
contains the most useful data in terms of ESC or apply other error contains the most useful data in terms of ESC or apply other error
protection techniques, with due care to congestion issues. protection techniques, with due care to congestion issues.
In this example we will also show several other SL features (OCR, AU In this example we will also show several other SL features (OCR, AU
boundary flags, padding, as detailed below). boundary flags, padding, as detailed below).
One feature demonstrated by this example is the degradation One feature demonstrated by this example is the degradation
priority. We assume degradation priority can take 4 different priority. We assume degradation priority can take 4 different
values, mapped to Error Sensitivity Categories, and is encoded on 2 values, mapped to Error Sensitivity Categories, and is encoded on 2
bits. This interleaving scheme makes sure that only SL packets of bits. This interleaving scheme makes sure that only SL packets of
identical degradation priorities are grouped in the same RTP packet identical degradation priorities are grouped in the same RTP packet
(3.6.3) and that only the first RSLH of each RTP packet transports (3.6.3) and that only the first RSLH of each RTP packet transports
the degradation priority. the degradation priority.
We also assume that for each last SL packet of each RTP packet the We also assume that for each last SL packet of each RTP packet the
server inserts an OCR. server inserts an OCR.
SLConfigDescriptor SLConfigDescriptor
skipping to change at line 2446 skipping to change at line 2566
bit(32) timeScale; = 1000// milliseconds bit(32) timeScale; = 1000// milliseconds
bit(16) accessUnitDuration; = 23.22 // ms bit(16) accessUnitDuration; = 23.22 // ms
bit(16) compositionUnitDuration; = 23.22 // ms bit(16) compositionUnitDuration; = 23.22 // ms
} }
if (!useTimeStampsFlag) { if (!useTimeStampsFlag) {
bit(timeStampLength) startDecodingTimeStamp; = 0 bit(timeStampLength) startDecodingTimeStamp; = 0
bit(timeStampLength) startCompositionTimeStamp; = 0 bit(timeStampLength) startCompositionTimeStamp; = 0
} }
} }
SL Packet Header structure SL Packet Header structure
With this configuration we have the following SL packet header With this configuration we have the following SL packet header
structure: structure:
aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) { aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
bit(1) accessUnitStartFlag; bit(1) accessUnitStartFlag;
bit(1) accessUnitEndFlag; bit(1) accessUnitEndFlag;
bit(1) OCRflag; bit(1) OCRflag;
bit(1) paddingFlag; bit(1) paddingFlag;
if (paddingFlag) bit(3) paddingBits; if (paddingFlag) bit(3) paddingBits;
bit(SL.packetSeqNumLength) packetSequenceNumber; bit(SL.packetSeqNumLength) packetSequenceNumber;
bit(1) DegPrioflag; bit(1) DegPrioflag;
if (DegPrioflag) { if (DegPrioflag) {
bit(SL.degradationPriorityLength) degradationPriority;} bit(SL.degradationPriorityLength) degradationPriority;}
if (OCRflag) { if (OCRflag) {
bit(SL.OCRLength) objectClockReference;} bit(SL.OCRLength) objectClockReference;}
} }
} }
Parameters Parameters
The RSLHSectionSize cannot exceed 2 bits, which is encoded on 2 bits
and signaled by RSLHSectionSizeLength.
The resulting concatenated fmtp line is: The resulting concatenated fmtp line is:
a=fmtp:<format> SizeLength=6;RSLHSectionSizeLength=2;IndexLength=2;IndexDeltaLength=2;OCRDeltaLength=16 a=fmtp:<format> SizeLength=7; RSLHSectionSizeLength=8; IndexLength=16; IndexDeltaLength=2; OCRDeltaLength=16
RTP packet structure RTP packet structure
+=========================================+=============+ +=========================================+=============+
| Field | size | | Field | size |
+=========================================+=============+ +=========================================+=============+
| RTP header | - | | RTP header | - |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
MSLHSection MSLHSection
+=========================================+=============+ +=========================================+=============+
| MSLHSection size in bits = 135 | 2 bytes | | MSLHSection size in bits = 149 | 2 bytes |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| PayloadSize | 7 bits | | PayloadSize | 7 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| Index = 0 or 1 or 2 or 3 | 2 bits | | Index | 16 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| PayloadSize | 7 bits | | PayloadSize | 7 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| IndexDelta = 3 | 2 bits | | IndexDelta = (11) | 2 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| etc + 12 times 9 bits | | etc + 12 times 9 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| PayloadSize | 7 bits | | PayloadSize | 7 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| IndexDelta = 3 | 2 bits | | IndexDelta = (11) | 2 bits |
+-----------------------------------------+-------------+
| bits to byte alignment | 7 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| bits to byte alignment = (000) | 3 bits |
+-----------------------------------------+-------------+
RSLHSection RSLHSection
+=========================================+=============+ +=========================================+=============+
| RSLHSectionSize | 6 bits | | RSLHSectionSize = (10000111) | 8 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitStartFlag | 1 bit | | accessUnitStartFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitEndFlag | 1 bit | | accessUnitEndFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| OCRFlag = 0 | 1 bit | | OCRFlag = (0) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| paddingFlag = 1 | 1 bit | | paddingFlag = (1) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| paddingBits | 3 bits | | paddingBits | 3 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| DegPrioflag = 1 | 1 bit | | DegPrioflag = (1) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| degradationPriority | 2 bits | | degradationPriority | 2 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitStartFlag | 1 bit | | accessUnitStartFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitEndFlag | 1 bit | | accessUnitEndFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| OCRFlag = 0 | 1 bit | | OCRFlag = (0) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| paddingFlag = 1 | 1 bit | | paddingFlag = (1) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| paddingBits | 3 bits | | paddingBits | 3 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| DegPrioflag = 0 | 1 bit | | DegPrioflag = (0) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| etc + 12 times 8 bits | | etc + 12 times 8 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitStartFlag | 1 bit | | accessUnitStartFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitEndFlag | 1 bit | | accessUnitEndFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| OCRFlag = 1 | 1 bit | | OCRFlag = (1) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| OCRDelta | 16 bits | | OCRDelta | 16 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| paddingFlag = 0 | 1 bit | | paddingFlag = (0) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| DegPrioflag = 0 | 1 bit | | DegPrioflag = (0) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| bits to byte alignment | 5 bits | | bits to byte alignment = (000) | 3 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
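The RSLHSectionSize value (10000111) in the draft -02 column can be cross-checked by adding up the field widths listed in the table. The sketch below is only a tally of those widths for the first, the 13 intermediate and the last SL packet; the dictionary names are illustrative.

```python
# Field widths in bits, copied from the RSLHSection table above.
FIRST = {"accessUnitStartFlag": 1, "accessUnitEndFlag": 1, "OCRFlag": 1,
         "paddingFlag": 1, "paddingBits": 3, "DegPrioflag": 1,
         "degradationPriority": 2}
MIDDLE = {"accessUnitStartFlag": 1, "accessUnitEndFlag": 1, "OCRFlag": 1,
          "paddingFlag": 1, "paddingBits": 3, "DegPrioflag": 1}
LAST = {"accessUnitStartFlag": 1, "accessUnitEndFlag": 1, "OCRFlag": 1,
        "OCRDelta": 16, "paddingFlag": 1, "DegPrioflag": 1}

MIDDLE_COUNT = 13   # the second SL packet plus the "12 times 8 bits" row

total_bits = (sum(FIRST.values())
              + MIDDLE_COUNT * sum(MIDDLE.values())
              + sum(LAST.values()))
signaled = int("10000111", 2)   # RSLHSectionSize as written in the table
print(total_bits, signaled)
```

Only the first header carries the degradation priority and only the last one carries an OCR, which is why the three header sizes differ.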
SLPPSection SLPPSection
+=========================================+=============+ +=========================================+=============+
| SL packet payload |max 90 bytes | | SL packet payload |max 90 bytes |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| etc + 13 SL packets | | etc + 13 SL packets |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| SL packet payload |max 90 bytes |
| SL packet payload |max 90 bytes |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
Note that in the above table the last SL packet in the RTP packet Note that in the above table the last SL packet in the RTP packet
has a payload that is byte-aligned (at the end). When this happens has a payload that is byte-aligned (at the end). When this happens
paddingFlag is set to zero and the paddingBits field is omitted. paddingFlag is set to zero and the paddingBits field is omitted.
Overhead estimation Overhead estimation
The MSLHSection is 19 bytes, the RSLHSection is 16 bytes; in this The MSLHSection is 19 bytes, the RSLHSection is 16 bytes; in this
example we therefore have an RTP overhead of 40 + 35 bytes for 1350 example we therefore have an RTP overhead of 40 + 35 bytes for 1350
bytes (max) of payload i.e. around 6 % overhead. bytes of payload i.e. around 6 % overhead.