draft-ietf-avt-mpeg4-multisl-03.txt   draft-ietf-avt-mpeg4-multisl-04.txt 
Internet Engineering Task Force Basso-AT&T Internet Engineering Task Force Basso-AT&T
Internet Draft Civanlar-AT&T Internet Draft Civanlar-AT&T
Gentric-Philips Gentric-Philips
Herpel-Thomson Herpel-Thomson
Lifshitz-Optibase Lifshitz-Optibase
Lim-mp4cast Lim-mp4cast
Perkins-ISI Perkins-ISI
Van Der Meer-Philips Van Der Meer-Philips
November 2001 February 2002
Expires May 2002 Expires August 2002
Document: draft-ietf-avt-mpeg4-multisl-03.txt Document: draft-ietf-avt-mpeg4-multisl-04.txt
RTP Payload Format for MPEG-4 Streams RTP Payload Format for MPEG-4 Streams
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at line 41 skipping to change at line 41
group within the Internet Engineering Task Force and ISO/IEC MPEG-4 group within the Internet Engineering Task Force and ISO/IEC MPEG-4
ad hoc group on MPEG-4 over Internet. Comments are solicited and ad hoc group on MPEG-4 over Internet. Comments are solicited and
should be addressed to the working group's mailing list at should be addressed to the working group's mailing list at
avt@ietf.org and/or the authors. avt@ietf.org and/or the authors.
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This document contains a MIME type registration form that is <<
Note for the RFC editor:
XXXX should be replaced with this RFC number and YYYY replaced by
the number given to the companion RFC which draft is: draft-ietf-
avt-mpeg4-simple-**.txt.
This document also contains a MIME type registration form that is
intended to be taken as-is and therefore makes reference to this intended to be taken as-is and therefore makes reference to this
document, using the temporary placeholder: <self-reference-to-this>. document, using the temporary placeholder: XXXX.
>>
Gentric et al. Expires August 2002 1
RTP Payload Format for MPEG-4 Streams February 2002
Abstract Abstract
This document describes a payload format for transporting MPEG-4 This document describes a payload format for transporting MPEG-4
encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for
the coding of natural and synthetic audio-visual data. Several the coding of natural and synthetic audio-visual data. Several
services provided by RTP are beneficial for MPEG-4 encoded data services provided by RTP are beneficial for MPEG-4 encoded data
transport over the Internet. Additionally, the use of RTP makes it transport over the Internet. Additionally, the use of RTP makes it
possible to synchronize MPEG-4 data with other real-time data types. possible to synchronize MPEG-4 data with other real-time data types.
Table of Contents
1. Introduction....................................................3
1.1 Overview of MPEG-4 End-System Architecture.....................3
1.2 The simplified MPEG-4 terminal model...........................4
1.3 The complete MPEG-4 terminal model.............................4
1.3.1 The Sync Layer and DMIF......................................6
2. Analysis of the carriage of MPEG-4 over IP......................8
2.1 The Sync Layer point of view...................................8
2.2 The Elementary Stream point of view............................9
2.3 How the two views reconcile...................................10
2.4 Rationale for features........................................11
2.5 Relation with RFC 3016........................................11
3. Payload format.................................................13
3.1 RTP Header Fields Usage.......................................14
3.2 RTP payload structure.........................................16
3.3 Payload Header Section structure..............................17
3.3.1 Payload Header structure....................................18
3.3.2 Fields of a Payload Header..................................19
3.4 RSLHSection structure.........................................21
3.4.1 RSLH structure..............................................22
3.4.2 Removal of fields...........................................22
3.4.3 Mapping of OCR..............................................23
3.4.4 Degradation Priority........................................23
3.5 Payload Section structure.....................................23
3.6 Interleaving..................................................24
3.6.1 Time stamp based interleaving (TSBI)........................25
3.6.2 Index based interleaving (IBI)..............................26
3.6.3 SL streams that should not be interleaved...................26
3.7 Fragmentation Rules...........................................26
4. Types and names................................................28
4.1 MIME type registration........................................28
4.2 Concatenation of parameters...................................33
4.3 Usage of SDP..................................................33
4.3.1 The a=fmtp keyword..........................................33
4.3.2 SDP example.................................................33
5. IANA considerations............................................34
6. Other issues...................................................34
6.1 SL-packetized stream reconstruction...........................34
6.2 Handling of scene description streams.........................38
6.3 Overlap with RFC 3016.........................................39
6.4 Multiplexing..................................................40
7. Security considerations........................................41
8. Acknowledgements...............................................42
9. References.....................................................42
10. Authors' addresses............................................43
APPENDIX: Examples of usage.......................................44
Appendix.1 RFC 3016 compatible MPEG-4 Video (no SL)...............44
Appendix.2 MPEG-4 Video with SL...................................46
Appendix.3 Low delay MPEG-4 Audio (no SL).........................48
Appendix.4 Media delivery MPEG-4 Audio (no SL)....................50
Appendix.5 AAC with interleaving (no SL)..........................51
Appendix.6 AAC with Index-based interleaving and SL...............53
1. Introduction 1. Introduction
MPEG-4 is a recent standard from ISO/IEC for the coding of natural MPEG-4 is a recent standard from ISO/IEC for the coding of natural
and synthetic audio-visual data in the form of audiovisual objects and synthetic audio-visual data in the form of audiovisual objects
that are arranged into an audiovisual scene by means of a scene that are arranged into an audiovisual scene by means of a scene
description [1][2][3][4]. This draft specifies an RTP [5] payload description [1][2][3][4]. This draft specifies an RTP [5] payload
format for transporting MPEG-4 encoded data streams. format for transporting MPEG-4 encoded data streams. It supplements
RFC 3016 in the respect that it can transport all MPEG-4 stream
types while being compatible with RFC 3016 for the transport of
MPEG-4 video.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC 2119 [6]. this document are to be interpreted as described in RFC 2119 [6].
The benefits of using RTP for MPEG-4 data stream transport include: The benefits of using RTP for MPEG-4 data stream transport include:
i. Ability to synchronize MPEG-4 streams with other RTP payloads i. Ability to synchronize MPEG-4 streams with other RTP payloads,
one example is the transport and synchronization of MPEG-4 video
associated with AMR audio in mobile networks.
ii. Monitoring MPEG-4 delivery performance through RTCP ii. Monitoring MPEG-4 delivery performance through RTCP.
iii. Combining MPEG-4 and other real-time data streams received from iii. Combining MPEG-4 and other real-time data streams received from
multiple end-systems into a set of consolidated streams through RTP multiple end-systems into a set of consolidated streams through RTP
mixers mixers.
iv. Converting data types, etc. through the use of RTP translators. iv. Converting data types, etc. through the use of RTP translators.
1.1 Overview of MPEG-4 End-System Architecture 1.1 Overview of MPEG-4 End-System Architecture
Two types of terminals can use this specification. One case is a Two types of terminals can use this specification. One case is a
complete MPEG-4 terminal i.e. a terminal implementing the MPEG-4 complete MPEG-4 terminal i.e. a terminal implementing the MPEG-4
system [1] specification and possibly also MPEG-4 video [2] and system [1] specification and possibly also MPEG-4 video [2] and
audio [3]. Another possibility is a terminal implementing only a audio [3]. Another possibility is a terminal implementing only a
part of this set of MPEG-4 specifications; one example is a terminal part of this set of MPEG-4 specifications; one example is a terminal
using MPEG-4 video [2] but not MPEG-4 systems as in RFC3016. using MPEG-4 video [2] but not MPEG-4 systems as in RFC3016.
This document is structured so as to be understandable from both This document is structured so as to be understandable from both
points of view (with or without MPEG-4 systems). The target is also points of view (with or without MPEG-4 systems). The target is also
that services deployed for one type of terminal can be adapted for that services deployed for one type of terminal can be adapted for
the other type thanks to minor session description change because the other type with only a minor change in the session description
recorded streams are the same. Another key assumption is that the
properties of streams of various type (video, audio, scene
because the media formats are the same. Another key assumption is
that the properties of streams of various types (video, audio, scene
description) can be described with the same Elementary Stream model description) can be described with the same Elementary Stream model
so that this same payload format can transport any MPEG-4 stream. so that this same payload format can transport any MPEG-4 stream.
1.1.1 The simplified MPEG-4 model 1.2 The simplified MPEG-4 terminal model
In the simplified MPEG-4 model MPEG-4 systems [1] is not used. In the simplified MPEG-4 model MPEG-4 systems [1] is not used.
However the concept of Elementary Stream remains i.e. both MPEG-4 However the concept of Elementary Stream remains, by MPEG
video [2] and MPEG-4 audio [3] describe how respectively audio and definition: "A consecutive flow of mono-media data from a single
video bit streams are fragmented into pieces that are called Access source entity to a single destination entity on the compression
Units. Each Access Unit has by definition a number of media layer". Indeed both MPEG-4 video [2] and MPEG-4 audio [3] documents
independent basic properties: describe how respectively audio and video bit streams are fragmented
. composition time stamp into pieces that are called Access Units, again by MPEG definition:
. framing "An individually accessible portion of data within an Elementary
. possibly decoding time stamp Stream. An access unit is the smallest data entity to which timing
information can be attributed". Each Access Unit has by this
definition a number of media independent basic properties:
. Composition time stamp (CTS)
. Framing
. Possibly decoding time stamp (DTS)
Furthermore both the video [2] and audio [3] specifications also Furthermore both the video [2] and audio [3] specifications also
define how Access Units (AU) shall be themselves fragmented since in define how Access Units (AU) shall be themselves fragmented since in
the spirit of Application Level Framing AUs SHOULD be fragmented in the spirit of Application Level Framing AUs should be fragmented in
a way that decoders can process the packets immediately after a such a way that decoders can process the packets arriving
packet loss. In this case the signaling of Access Unit fragment immediately after a packet loss. In this case the signaling of
boundaries is also required. Access Unit fragment boundaries is also required.
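The media independent view described above can be sketched as a small data model. This is only an illustration, not part of the payload format: the names AccessUnit, AUFragment and fragment are hypothetical, and the size-based split is an assumption standing in for the codec-defined fragmentation that the video [2] and audio [3] specifications describe.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccessUnit:
    """Media-independent view of an Access Unit (hypothetical model)."""
    data: bytes                 # framing: the bytes that make up this AU
    cts: int                    # composition time stamp (CTS)
    dts: Optional[int] = None   # decoding time stamp (DTS), possibly absent


@dataclass
class AUFragment:
    """One piece of an AU, carrying the fragment boundary signaling."""
    data: bytes
    cts: int
    is_first: bool
    is_last: bool


def fragment(au: AccessUnit, max_size: int) -> List[AUFragment]:
    """Split an AU into fragments of at most max_size bytes.

    Assumption: a plain size-based split. A real sender would prefer
    codec-defined resynchronization points, which are not modelled here.
    """
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    chunks = [au.data[i:i + max_size]
              for i in range(0, len(au.data), max_size)] or [b""]
    last = len(chunks) - 1
    return [
        AUFragment(data=c, cts=au.cts, is_first=(i == 0), is_last=(i == last))
        for i, c in enumerate(chunks)
    ]
```

Every fragment repeats the CTS of its Access Unit, and the two flags are what lets a receiver find the AU boundaries again after a loss.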
In order to be understandable from this point of view this payload In order to be understandable from this point of view this payload
format is described in terms of Access Units (AU) and Access Unit format is described in terms of Access Units (AU) and Access Unit
fragments, without reference to media specific properties (but for a fragments. This specification does not make reference to media
few exceptions). specific properties (but for a few exceptions). Indeed it is the
purpose of this specification to provide RTP transport for all media
types in MPEG-4 in a generic fashion.
1.1.2 The complete MPEG-4 model In this mode of operation the RTP framework is used for transport of
timing and synchronization and protocols such as H.323, SIP, RTSP,
etc, can be used for control.
1.3 The complete MPEG-4 terminal model
Fig. 1 below shows the layered architecture of a terminal, which Fig. 1 below shows the layered architecture of a terminal, which
implements the complete MPEG-4 systems model. The Compression Layer implements the complete MPEG-4 systems model. The Compression Layer
processes individual audio-visual media streams. The MPEG-4 processes individual audio-visual media streams. The MPEG-4
compression schemes are defined in the ISO/IEC specifications 14496- compression schemes are defined in the ISO/IEC specifications 14496-
2 [2] and 14496-3 [3]. The compression schemes in MPEG-4 achieve 2 [2] and 14496-3 [3]. The compression schemes in MPEG-4 achieve
efficient encoding over a bandwidth ranging from a few kbps to many efficient encoding over a bandwidth ranging from a few kbps to many
Mbps. The audio-visual content compressed by this layer is organized Mbps. The audio-visual content compressed by this layer is organized
into Elementary Streams (ESs). into Elementary Streams (ESs).
The MPEG-4 standard specifies MPEG-4 compliant streams. Within the The MPEG-4 standard specifies MPEG-4 compliant streams. Within the
constraint of this compliance the compression layer is unaware of a constraint of this compliance the compression layer is unaware of a
specific delivery technology, but it can be made to react to the specific delivery technology, but it can be made to react to the
characteristics of a particular delivery layer such as the path-MTU characteristics of a particular delivery layer such as the path-MTU
or loss characteristics. Also, some compressors can be designed to or loss characteristics. Also, some compressors can be designed to
be delivery specific for implementation efficiency. In such cases be delivery specific for implementation efficiency. In such cases
the compressor may work in a non-optimal fashion with delivery the compressor may work in a non-optimal fashion with delivery
technologies that are different than the one it is specifically technologies that are different than the one it is specifically
designed to operate with. designed to operate with.
The hierarchical relations, location and properties of ESs in a The hierarchical relations, location and properties of ESs in a
presentation are described by a dynamic set of Object Descriptors presentation are described by a dynamic set of Object Descriptors
skipping to change at line 166 skipping to change at line 251
stream level. The resource description may itself be hierarchical, stream level. The resource description may itself be hierarchical,
i.e. an ES conveying an OD may describe other ESs conveying other i.e. an ES conveying an OD may describe other ESs conveying other
ODs. ODs.
The session description is accompanied by a dynamic scene The session description is accompanied by a dynamic scene
description, Binary Format for Scene (BIFS), again conveyed through description, Binary Format for Scene (BIFS), again conveyed through
one or more ESs. At this level, content is identified in terms of one or more ESs. At this level, content is identified in terms of
audio-visual objects. The spatio-temporal location of each object is audio-visual objects. The spatio-temporal location of each object is
defined by BIFS. The audio-visual content of those objects that are defined by BIFS. The audio-visual content of those objects that are
synthetic and static is also described by BIFS. Natural and synthetic and static is also described by BIFS. Natural and
animated synthetic objects may refer to an OD that points to one or animated synthetic objects may refer to an OD that points to one or
more ESs that carry the coded representation of the object or its more ESs that carry the coded representation of the object or its
animation data. animation data.
media aware +-----------------------------------------+ media aware +-----------------------------------------+
delivery unaware | COMPRESSION LAYER | delivery unaware | COMPRESSION LAYER |
14496-2 Visual |streams from as low as Kbps to multi-Mbps| 14496-2 Visual |streams from as low as Kbps to multi-Mbps|
14496-3 Audio +-----------------------------------------+ 14496-3 Audio +-----------------------------------------+
Elementary Elementary
Stream Stream
===================================================Interface ===================================================Interface
(ESI) (ESI)
skipping to change at line 222 skipping to change at line 306
OD stream are pointed to by an initial object descriptor (IOD). In OD stream are pointed to by an initial object descriptor (IOD). In
this context the IOD needs to be made available to the receivers this context the IOD needs to be made available to the receivers
through some out-of-band means that are out of scope of this payload through some out-of-band means that are out of scope of this payload
specification. However in the context of transport on IP networks it specification. However in the context of transport on IP networks it
is defined in a separate document [9]. is defined in a separate document [9].
The Compression Layer organizes the ESs in Access Units (AU), the The Compression Layer organizes the ESs in Access Units (AU), the
smallest elements that can be attributed individual timestamps. The smallest elements that can be attributed individual timestamps. The
Access Units concept defines the boundary between media specific Access Units concept defines the boundary between media specific
processing and delivery specific processing. That is to say processing and delivery specific processing. That is to say
transport should not depend on the nature of the media data but only transport should not depend on the nature of the media data but only
on AU properties. on AU properties.
1.1.3 The Sync Layer
1.3.1 The Sync Layer and DMIF
The Sync Layer (SL) that primarily provides the synchronization The Sync Layer (SL) that primarily provides the synchronization
between streams defines a homogeneous encapsulation of ESs carrying between streams defines a homogeneous encapsulation of ESs carrying
media or control data (ODs, BIFS). Integer or fractional AUs are media or control data (ODs, BIFS). Integer or fractional AUs are
then encapsulated in SL packets. then encapsulated in SL packets.
All consecutive data from one stream is called an SL-packetized All consecutive data from one stream is called an SL-packetized
stream. The interface between the compression layer and the SL is stream. The interface between the compression layer and the SL is
called the Elementary Stream Interface (ESI). The ESI is informative called the Elementary Stream Interface (ESI). The ESI is informative
i.e. it is extremely useful in order to define concepts and i.e. it is extremely useful in order to define concepts and
skipping to change at line 273 skipping to change at line 356
specification [1]. The syntax of the Sync Layer is configurable and specification [1]. The syntax of the Sync Layer is configurable and
can be adapted to the needs of the stream to be transported. This can be adapted to the needs of the stream to be transported. This
includes the possibility to select the presence or absence of includes the possibility to select the presence or absence of
individual syntax elements as well as configuration of their length individual syntax elements as well as configuration of their length
in bits. The configuration for each individual stream is conveyed in in bits. The configuration for each individual stream is conveyed in
a SLConfigDescriptor, which is an integral part of the ES Descriptor a SLConfigDescriptor, which is an integral part of the ES Descriptor
for this stream. The MPEG-4 SLConfigDescriptor, being configuration for this stream. The MPEG-4 SLConfigDescriptor, being configuration
information, is not carried by the media stream itself but is rather information, is not carried by the media stream itself but is rather
transported via an ObjectDescriptor Stream encoded using the MPEG-4 transported via an ObjectDescriptor Stream encoded using the MPEG-4
Object Description framework. This can be done in a separate stream Object Description framework. This can be done in a separate stream
using this payload format (see section 5.2 for details). The using this payload format (see section 6.2 for details). The
SLConfigDescriptor MAY also be transported by other means (for SLConfigDescriptor MAY also be transported by other means (for
example as a parameter, see section 4.1). example as a MIME parameter, see section 4.1).
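The idea of a header whose fields are individually present or absent, each with a configured length in bits, can be illustrated as follows. This is a minimal sketch under stated assumptions: the field names and the dictionary-based configuration are hypothetical and do not reproduce the actual SLConfigDescriptor syntax defined in [1]; a length of 0 is used here to mean that a field is absent.

```python
from typing import Dict


class BitReader:
    """Reads unsigned integers of arbitrary bit length, MSB first."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        if self._pos + nbits > len(self._data) * 8:
            raise ValueError("not enough data")
        value = 0
        for _ in range(nbits):
            byte = self._data[self._pos // 8]
            bit = (byte >> (7 - self._pos % 8)) & 1
            value = (value << 1) | bit
            self._pos += 1
        return value


def parse_header(data: bytes, field_lengths: Dict[str, int]) -> Dict[str, int]:
    """Parse the configured fields in order; fields of length 0 are skipped.

    field_lengths is a hypothetical stand-in for per-stream configuration
    received out of band.
    """
    reader = BitReader(data)
    return {name: reader.read(n) for name, n in field_lengths.items() if n > 0}
```

For example, with the hypothetical configuration {"sequence_number": 4, "degradation_priority": 0, "time_stamp": 12}, the two bytes 0xA1 0x23 parse to a sequence number of 10 and a time stamp of 291, and the absent field does not appear in the result.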
An important point is to note that this draft could just as well An important point is to note that this draft could just as well
have been entirely written in terms of SL packets instead of Access have been entirely written in terms of SL packets instead of Access
Units and Access Unit fragments. However this could have created Units and Access Unit fragments. However this could have created
confusion for implementers who only need basic properties and do not confusion for implementers who only need basic properties and do not
want to cope with the additional complexity of the Sync Layer. want to cope with the additional complexity of the Sync Layer.
Instead this specification refers to the Sync Layer only when
needed.
1.1.4 Where the two models meet
In basic cases an Elementary Stream is such that SL packets are
reduced to the media (compressed) data (empty headers) and in that
case implementations do not actually need to be aware of the Sync
Layer at all. In these cases it is logically equivalent to say that
the Sync Layer is not implemented or to say that the SL packet
headers are completely empty (or fully map into the RTP headers).
The Sync Layer can then be seen as a purely conceptual construction
that does not have to be implemented at all.
The above described MPEG-4 system model also deals with session
setup through Object Descriptors. In cases where the complete MPEG-4
system framework is not used a replacement for this key functionality
is required. In fact for simple (audio/video) systems only the
knowledge of the decoder configuration is needed; we will see how
this specification defines options so that decoder configuration can
also be signaled without MPEG-4 system.
In conclusion this payload format is intended to be capable of Instead this specification refers to the Sync Layer only when
transporting data formatted according to the Sync Layer needed.
specification but is also useful without the Sync Layer, or when the
Sync Layer is invisible, which is equivalent to not using it.
2. Analysis of the carriage of MPEG-4 over IP 2. Analysis of the carriage of MPEG-4 over IP
As explained above when transporting MPEG-4 audio and video, As explained above when transporting MPEG-4 audio and video,
applications may or may not require the use of MPEG-4 systems. To applications may or may not require the use of MPEG-4 systems. To
achieve the highest level of interoperability between all MPEG-4 achieve the highest level of interoperability between all MPEG-4
applications, it is desirable that (a) in both cases the same MPEG-4 applications, it is desirable that (a) in both cases the same MPEG-4
transport format can be used and that (b) receivers that have no transport format can be used and that (b) receivers that have no
MPEG-4 system knowledge can easily skip the MPEG-4 system specific MPEG-4 system knowledge can easily skip the MPEG-4 system specific
information, if any. information, if any.
An example of an application not requiring MPEG-4 system is audio/video
streaming from a single source. Examples of applications that would
benefit from MPEG-4 system features are:
. Audio/video streaming mixing RTP and non-RTP sources (e.g. local
storage in the .mp4 interchange format)
. Rich multimedia applications including 2D, 2.5D or 3D interactive
scenes with multiple graphical/audio/video objects and/or a
composition variable in time and/or according to a server-push
and/or server-pull model.
. Applications involving Digital Rights Management for some or all
parts/streams in the content
. Applications involving the use of advanced meta-data and the
associated content management features as provided by the MPEG suite
of relevant standards (MPEG-7 and MPEG-11).
2.1 The Sync Layer point of view 2.1 The Sync Layer point of view
RTP is perfectly suitable to transport MPEG-4 audio and MPEG-4 RTP is perfectly suitable to transport MPEG-4 audio and MPEG-4
video, but when using MPEG-4 systems a problem arises from the fact video, but when using MPEG-4 systems a problem arises from the fact
that both RTP and MPEG-4 systems contain a synchronization layer. that both RTP and MPEG-4 systems contain a synchronization layer.
In particular, the RTP header duplicates some of the information In particular, the RTP header duplicates some of the information
provided in SL packet headers such as the composition timestamps provided in SL packet headers such as the composition timestamps
(CTSs) and Access Unit boundaries. (CTS) and Access Unit boundaries.
To avoid unnecessary overhead and potential interoperability risks To avoid unnecessary overhead and potential interoperability risks
when transporting MPEG-4 systems, it is desirable to remove the when transporting MPEG-4 systems, it is desirable to remove the
redundancy between the SL packet header and the RTP packet header. redundancy between the SL packet header and the RTP packet header.
To be independent of the use of MPEG-4 systems, synchronization can To be independent of the use of MPEG-4 systems, synchronization can
rely on the parameters provided in the RTP header. rely on the parameters provided in the RTP header. Another desired
Another desired property is to have compatibility with RFC3016 for property is to have compatibility with RFC3016 for MPEG-4 video
MPEG-4 video transport. transport.
In case SL headers are used, the redundant fields are removed from
the SL header, producing "reduced SL headers". The remaining
information from the SL header, if any, is contained inside the RTP
packet payload, together with the SL packet payload.
The combination of RTP packet headers and reduced SL packet headers
can be used to logically map the RTP packets to complete SL packets.
Some of the information contained in the reduced SL headers is also
useful for transport over RTP when an MPEG-4 system is not used.
For that reason the information in the "reduced" SL headers is split
into "general useful information" and "MPEG-4 systems only
information".
The "general useful information" hereinafter called Payload Header This is achieved in the following fashion (also depicted in figure
is carried by a number of fields configurable using parameters 5): In case SL headers are used, the redundant fields are removed
defined in section 4.1; all receivers MUST parse these fields. from the SL header. The remaining information from the SL header, if
any, is contained inside the RTP packet payload, together with the
SL packet payload. Some of this information is also useful for
transport over RTP when an MPEG-4 system is not used. For that
reason this information is split into "general useful information"
The "MPEG-4 systems only information", if any, is contained in an
auxiliary header, hereinafter called Remaining SL Packet Header
(RSLH), also configured using parameters (see section 4.1) and
preceded by a length field, so that non-MPEG-4-system devices MAY
skip this information.
This is depicted in figure 2a. and "MPEG-4 systems only information". The "general useful
information" hereinafter called Payload Header is carried by a
number of fields configurable using parameters defined in section
4.1; all receivers MUST parse these fields. The "MPEG-4 systems only
information", if any, is contained in an auxiliary header,
hereinafter called Remaining SL Packet Header (RSLH), also
configured using parameters (see section 4.1) and preceded by a
length field, so that non-MPEG-4-system devices MAY skip this
information.
+------------+ +------------+
extended framing and | AU or AU | extended framing and | AU or AU |
timing information | fragment | timing information | fragment |
+------------+ +------------+
| | | |
| | | |
| | | |
| | | |
V V V V
skipping to change at line 392 skipping to change at line 458
| SL Packet | SL Packet | | SL Packet | SL Packet |
| Header | Payload | | Header | Payload |
+---------------------------+ +---------------------------+
| | | |
| | | |
+-------------+----------+---+ | +-------------+----------+---+ |
| | | | | | | |
V V V V V V V V
+-----------+ +-----------+ +-------------+ +-----------+ +-----------+ +-----------+ +-------------+ +-----------+
|RTP Packet | | Payload | | Remaining SL| | SL Packet | |RTP Packet | | Payload | | Remaining SL| | SL Packet |
| Header | | Header | | Header | | Payload | | Header | | Header | | Header | | Payload |
+-----------+ +-----------+ +-------------+ +-----------+ +-----------+ +-----------+ +-------------+ +-----------+
<----RTP Packet Payload-------------------> <----RTP Packet Payload------------------->
Figure 2a: Mapping of ES into SL, then SL Packet into RTP packet Figure 5: Mapping of ES into SL, then SL Packet into RTP packet
2.2 The Elementary Stream point of view

Another way to see the mapping of Elementary Streams (i.e. Access
Units or AU fragments) into RTP packets is depicted in Figure 6. In
this view the "basic" timing and fragmentation information listed in
section 1.2 is obtained directly at the codec interfaces and mapped
into the RTP header or the RTP Payload Header.

For example this RTP payload format has been designed so that it is
by default configured to be identical to RFC 3016 for the
recommended MPEG-4 video configurations, specifically in this case
the Payload Header is empty. Hence receivers that comply with this
payload specification can decode such RTP payload without knowledge
about the Sync Layer (see the relevant examples in Appendix). In a
similar fashion but with non-empty Payload Headers, MPEG-4 audio
(see Appendix 3 and 4 for examples) can be transported without
explicit use of the Sync Layer.
+------------+ +------------+
basic framing and | AU or AU | basic framing and | AU or AU |
timing information | fragment | timing information | fragment |
+------------+ +------------+
| | | |
| | | |
+-------------+ | +-------------+ |
| | | | | |
V V V V V V
+-----------+ +-----------+ +-----------+ +-----------+ +-----------+ +-----------+
|RTP Packet | | Payload | | | |RTP Packet | | Payload | | |
| Header | | Header | | Payload | | Header | | Header | | Payload |
+-----------+ +-----------+ +-----------+ +-----------+ +-----------+ +-----------+
<----RTP Packet Payload---> <----RTP Packet Payload--->
Figure 2b: Direct mapping of Elementary Streams into RTP packet Figure 6: Direct mapping of Elementary Streams into RTP packet
2.3 How the two views reconcile

A simple concept makes it possible to unify these apparently
antagonistic points of view: a terminal that does not implement the
Sync Layer can skip (ignore) the Remaining SL Header, if present.

There are also cases when an Elementary Stream is such that SL
packets are reduced to the media (compressed) data (empty headers)
and in that case implementations do not actually need to be aware of
the Sync Layer at all. In these cases it is logically equivalent to
say that the Sync Layer is not implemented or to say that the SL
packet headers are completely empty (or fully map into the RTP
headers). The Sync Layer can then be seen as a purely conceptual
construction that does not have to be implemented at all. Examples
are video transported as in RFC3016 (see below) and some audio modes
(see Annex).

The above described MPEG-4 system model also deals with session
setup through Object Descriptors. In cases where the complete MPEG-4
system framework is not used a replacement for this key
functionality is required. In fact for simple (audio/video) systems
only the knowledge of the decoder configuration is needed; we will
see how this specification defines options so that decoder
configuration can also be signaled without MPEG-4 system.

In conclusion this payload format is intended to be capable of
transporting data formatted according to the Sync Layer
specification but is also useful without the Sync Layer, or when the
Sync Layer is invisible, which is equivalent to not using it.
2.4 Rationale for features
This payload format has a number of uncommon features that are best
understood by first considering their rationale:
. Genericity: The payload structure does not depend on the nature of
the stream (audio, video, scene, etc). In this respect the apparent
complexity of this specification should be compared to the
complexity of the only alternative solution, which would have been
the specification and implementation of many different RTP payload
formats.
. Variable geometry: this payload format is highly configurable i.e.
the structure of the RTP payload depends on MIME parameters;
actually all the Payload Header components are optional and most of
them have a configurable size. This is aligned with the Sync Layer
definition and allows optimal efficiency in terms of payload size
per packet.
. Two packing styles (single and multiple): the rationale for
transporting a single AU or AU fragment per RTP packet is
simplicity; it is also the packing style for backward compatibility
with RFC3016. The rationale for transporting multiple AUs per RTP
packet is efficiency, at the cost of sensitivity to losses.
. Two interleaving methods: the rationale for interleaving is to
enable various error concealment strategies in case of packet losses
when packing several AUs or AU fragments per RTP packet. The need
for two interleaving methods arises from the fact that the default
one, based on time stamps, is the most efficient but does not work
for all configurations. Another method, based on indexes, is
therefore required.
. The rationale for transporting multiple interleaved AU fragments
per RTP packet is to benefit from advanced error resiliency
properties of bit streams (such as MPEG-4 audio version 2).
2.5 Relation with RFC 3016
The following set of figures displays the relationship between the
MPEG-4 RTP payload formats; there are 4 MPEG-4-related RTP payload
formats. The FlexMux is a separate issue [11] and need not be
discussed here, apart from the fact that it shares with this work
the MPEG-4 Sync Layer as the interface into the MPEG-4 domain. RFC
3016 describes transport of MPEG-4 video and LATM (for speech and
audio codecs). This specification defines transport of any MPEG-4
type of data, with or without the Sync Layer. RFC YYYY describes a
subset of the configurations that this specification can handle.
Figure 2 displays the situation for video; note that this
specification is compatible with RFC 3016. Figure 3 displays the
situation for audio, note the presence of the LATM multiplex, which
makes RFC 3016 audio transport incompatible with this specification.
Figure 4 displays the situation for other MPEG-4 streams, including
BIFS, ODS, IPMP, etc.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| |
| MPEG-4 Video |
| | I
|+++++++++++++++++++++++| | S
| | | O
| Sync Layer | | /
| | | M
|+++++++++++++++++++++++| | P
| | | | E
| FlexMux | | | G
| | | <- same RTP packet structure -> |
|++++++++++++| +++++++++++++++++++++++++++|++++++++++++|***
| | | | |
| FlexMux | RFC XXXX | RFC YYYY | RFC 3016 | I
| RTP | MPEG-4 generic RTP | | for | E
| payload | payload +++++++++++++ Video | T
| | | | F
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Figure 2: Relationship of MPEG-4 RTP payload formats for the
transport of video
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| |
| MPEG-4 Audio |
| | I
|+++++++++++++++++++++++| | S
| | | O
| Sync Layer | | /
| | | M
|+++++++++++++++++++++++| +++++++++++++| P
| | | | | E
| FlexMux | | | LATM | G
| | | | |
|++++++++++++| +++++++++++++++++++++++++++|++++++++++++|***
| | | | |
| FlexMux | RFC XXXX | RFC YYYY | RFC 3016 | I
| RTP | MPEG-4 generic RTP | | for | E
| payload | payload +++++++++++++ Audio | T
| | | | F
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Figure 3: Relationship of MPEG-4 RTP payload formats for the
transport of audio
++++++++++++++++++++++++++++++++++++++++++++++++++++
| |
| MPEG-4 system |
| | I
|+++++++++++++++++++++++| | S
| | | O
| Sync Layer | | /
| | | M
|+++++++++++++++++++++++| | P
| | | | E
| FlexMux | | | G
| | | |
|++++++++++++| +++++++++++++++++++++++++++|***
| | | |
| FlexMux | RFC XXXX | RFC YYYY | I
| RTP | MPEG-4 generic RTP | | E
| payload | payload ++++++++++++| T
| | | F
++++++++++++++++++++++++++++++++++++++++++++++++++++
Figure 4: Relationship of MPEG-4 RTP payload formats for the
transport of MPEG-4 system streams (including BIFS, ODS, IPMP).
3. Payload Format
One or more Access Units or Access Unit fragments (see section 3.9
for fragmentation rules) are mapped into each RTP packet. Some of
the information attached to these AUs or AU fragments is mapped onto
the RTP header (see section 3.1), and some forms an additional
payload header. The resulting RTP payload is described in section
3.2; it is composed of 3 parts (see figure 5):
. a Payload Header section (optional)
. a RSLH (Remaining SL Header) section (optional)
. a Payload Section.
These are described respectively in sections 3.3, 3.4 and 3.5 of
this memo.
When transporting SL streams, SL Packet Headers are transformed into
Remaining SL Header (RSLH) with some fields extracted to be mapped
in the RTP header and others extracted to be mapped in the
corresponding Payload Header. The AU or AU fragment data (SL packet
payload) i.e. Elementary Stream codec data is unchanged.

When transporting Elementary Streams there is no RSLH section.

This payload format has two packing styles. In the "Single" packing
style a single AU or AU fragment is transported per RTP packet. In
the "Multiple" packing style possibly more than one AU or AU
fragment is transported per RTP packet. The default packing style is
the "Single" packing style.

In the "Multiple" packing style, AU or AU fragments MUST be in
decoding order inside one RTP packet. Decoding order is defined by
the relevant codec specification. Note that decoding order and
presentation order may be different, typically for video streams
containing B frames (see [2]). According to the MPEG-4 system model
the decoding order may be quantified using decoding time stamps
(DTS).
RTP Packets SHOULD be sent in the decoding order. In case of
interleaving the first AU or AU fragment of each RTP packet is used
as reference as in the following examples of RTP packets containing
interleaved SL packets.
This sequence is correct: [0,2,4][1,3,5]
This sequence is correct: [0,3,6][1,2][4,5]
This sequence is correct: [0,3,6][1,4][2,5]
This sequence is prohibited: [0,4,2][1,5,3]
This sequence is prohibited: [1,3,5][0,2,4]
This sequence is prohibited: [0,3,6][2,5][1,4]
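
The ordering rules illustrated by these sequences can be checked
mechanically. The following Python sketch is an illustration only and
not part of this specification: the function name and the
representation of an RTP packet as a list of AU serial numbers are
assumptions, and serial numbers are assumed to increase in decoding
order without wrapping.

```python
def sequence_is_valid(packets):
    """Check a list of RTP packets, each given as a list of AU indexes.

    Assumed rules (one reading of the examples above):
      1. inside one packet, AUs appear in decoding order;
      2. packets are sent in decoding order of their first AU.
    """
    previous_first = None
    for aus in packets:
        if not aus:
            return False  # the Payload Section is never empty
        # Rule 1: strictly increasing indexes inside the packet.
        if any(b <= a for a, b in zip(aus, aus[1:])):
            return False
        # Rule 2: the first AU of each packet is the ordering reference.
        if previous_first is not None and aus[0] <= previous_first:
            return False
        previous_first = aus[0]
    return True
```

Applied to the six sequences listed above, this check accepts the
three marked correct and rejects the three marked prohibited.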
In the "Multiple" packing style the Payload Header and RSLH contain
fields with relative values; these fields MUST have sufficient bits
to encode the difference, i.e. senders MUST make sure that no fields
undergo roll over inside one RTP packet. This may limit the number
of SL packets inside one RTP packet and, when interleaving, may
limit the interleaving period as detailed in section 3.6.

The size and/or number of the payload(s) SHOULD be adjusted such
that the resulting RTP packet is not larger than the path-MTU. To
handle larger packets, this payload format relies on lower layers
for fragmentation, which may not be desirable.
3.1 RTP Header Fields Usage

Payload Type (PT):
   The assignment of an RTP payload type for this new packet
   format is outside the scope of this document, and will not be
   specified here. It is expected that the RTP profile for a
   particular class of applications will assign a payload type for
   this encoding, or if that is not done then a payload type in
   the dynamic range shall be chosen.

Marker (M) bit:
   The M bit is set to 1 when all AU fragments in the RTP packet
   are Access Unit ends.

   Specifically the M bit is set to 0 when the RTP packet contains
   one or more AU fragments that are not Access Unit ends, and the
   M bit is set to 1 for RTP packets that contain either:
   . A single complete Access Unit
   . The last fragment of an Access Unit
   . Several complete Access Units
   . Several last fragments of Access Units
   . A mix of complete Access Units and last fragments of Access
     Units

   Therefore for streams where all SL packets are complete Access
   Units the M bit is 1 for all RTP packets. Note also that in
   terms of Sync Layer this means that the M bit is related to the
   accessUnitEndFlag.
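
The M bit rule amounts to a logical AND over the items carried in
the packet. The Python sketch below is illustrative only; the
function name and the list-of-booleans input are assumptions made
for the example.

```python
def marker_bit(au_end_flags):
    """Return the RTP M bit for one packet.

    au_end_flags holds one boolean per AU or AU fragment carried in
    the packet: True when that item is a complete AU or the last
    fragment of an AU (accessUnitEndFlag in Sync Layer terms).
    """
    if not au_end_flags:
        raise ValueError("a packet carries at least one AU or AU fragment")
    # M is 1 only when every item in the packet ends an Access Unit.
    return 1 if all(au_end_flags) else 0
```

For example, a packet mixing one complete AU with a non-final
fragment of the next AU yields M=0.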
Extension (X) bit:
   Defined by the RTP profile used.

Sequence Number:
   The RTP sequence number should be generated by the sender with
   a constant random offset.

Timestamp:
   Set to a value corresponding to the compositionTimeStamp (CTS)
   of the first AU or AU fragment in the RTP packet. This mapping
   is established as follows:

   If CTS has less than 32 bits length, the RTP timestamp is
   generated to extend it out to 32 bits using the number of
   wraparounds. If CTS has more than 32 bits length, the RTP
   timestamp uses the 32 LSB of it. When using the Sync Layer the
   resolution of the timestamp (timeStampLength) is available from
   the SL configuration data and shall be used by receivers to
   reconstruct CTS with the original bit length. It is RECOMMENDED
   to use timeStampLength=32.
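
The length adaptation described above can be sketched as follows.
This Python fragment is an illustration under stated assumptions:
the function names and the explicit wraparound counter are
hypothetical, and the random initial offset recommended by RFC1889
is deliberately not modeled.

```python
RTP_TS_MASK = 0xFFFFFFFF  # the RTP timestamp is a 32 bit field


def cts_to_rtp_timestamp(cts, time_stamp_length, wraparounds=0):
    """Map a CTS of time_stamp_length bits onto a 32 bit RTP timestamp.

    wraparounds is the number of times the CTS counter has wrapped so
    far (assumed to be tracked by the sender).
    """
    if not 0 <= cts < (1 << time_stamp_length):
        raise ValueError("CTS does not fit in timeStampLength bits")
    if time_stamp_length >= 32:
        return cts & RTP_TS_MASK  # keep the 32 LSB
    # Shorter CTS: extend to 32 bits using the wraparound count.
    return ((wraparounds << time_stamp_length) | cts) & RTP_TS_MASK


def rtp_timestamp_to_cts(rtp_timestamp, time_stamp_length):
    """Receiver side: keep the low bits, up to the original length."""
    nbits = min(time_stamp_length, 32)
    return rtp_timestamp & ((1 << nbits) - 1)
```

With timeStampLength=32, as recommended, both functions reduce to
the identity.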
   When an RTP packet starts with a non-initial AU fragment, the
   timestamp of the initial fragment SHALL be used.

   For SL streams where CTS is never present the RTP packetizer
   SHOULD convey a reading of a local clock at the time the RTP
   packet is created.

   Note that since, according to RFC1889 [5, Section 5.1],
   timestamps are recommended to start at a random value, a
   receiver is not in the general case able to reconstruct the
   original MPEG-4 Time Stamps (CTS, DTS, OCR). This is not an
   issue for synchronization of multiple RTP streams. However,
   applications where streams from multiple sources are to be
   synchronized (for example one stream from local storage,
   another from an RTP streaming server) may have to transport out
   of band the random offset used to map CTS into RTP timestamp,
   which is not in the scope of this specification.

   Note also that since RTP devices may re-stamp the stream, all
   time stamps inside of the RTP payload (CTS and DTS in the
   Payload Header, OCR in RSLH) MUST be expressed as difference to
   the RTP time stamp. Since this subtraction may lead to negative
   values, the offset MUST be encoded as a two's complement signed
   integer in network octet order. Note these offsets (delta)
   typically require much fewer bits to be encoded than the
   original length. Nevertheless senders MUST make sure that these
   fields have enough bits to encode these differences.
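
The signed offset encoding can be illustrated with the Python sketch
below. The function names are assumptions made for this example, and
wraparound of the 32 bit RTP timestamp itself is ignored for
simplicity.

```python
def encode_delta(timestamp, rtp_timestamp, nbits):
    """Encode (timestamp - rtp_timestamp) as an nbits two's complement field."""
    if nbits < 1:
        raise ValueError("a signed field needs at least one bit")
    delta = timestamp - rtp_timestamp
    low, high = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    if not low <= delta <= high:
        # The sender must configure enough bits for the difference.
        raise ValueError("delta does not fit in the configured length")
    return delta & ((1 << nbits) - 1)


def decode_delta(field, rtp_timestamp, nbits):
    """Recover the time stamp from an nbits two's complement delta field."""
    if field & (1 << (nbits - 1)):  # sign bit set: negative offset
        field -= 1 << nbits
    return rtp_timestamp + field
```

For instance, a DTS that precedes the RTP timestamp by 100 ticks is
carried in an 8 bit field as 0x9C.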
   When startCompositionTimeStamp is signaled in the
   SLConfigDescriptor the RTP time stamps MUST start with this
   value.

SSRC, CC and CSRC fields are used as described in RFC 1889 [5].

RTCP SHOULD be used as defined in RFC 1889 [5].
3.2 RTP payload structure

The packet payload structure consists of 3 octet-aligned sections.

The first section is the Payload Header Section and contains Payload
Headers. Each Payload Header contains basic fragmentation and timing
information (relative to the RTP timestamp) for one AU or AU
fragment. The Payload Header structure is described in 3.3. In the
"Single" packing style this section is empty by default.

The second section is the RSLH Section and contains Remaining SL
Headers (RSLH). The RSLH structure is described in 3.4. By default
this section is empty.

The last section (Payload Section) contains the AU or AU fragment
codec bit stream fragments and is described in section 3.5. This
section is never empty.

The Nth Payload Header in the Payload Header Section, the Nth RSLH
in the RSLH Section and the Nth AU or AU fragment payload in the
Payload Section correspond to the Nth AU or AU fragment transported
by the RTP packet.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC |M| PT | sequence number | |V=2|P|X| CC |M| PT | sequence number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp | | timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| synchronization source (SSRC) identifier | | synchronization source (SSRC) identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
skipping to change at line 645 skipping to change at line 879
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | | | | |
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ |
| | | |
| Payload Section (octet aligned) | | Payload Section (octet aligned) |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding | | :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: An RTP packet for MPEG-4 Figure 5: RTP packet for MPEG-4
3.3 Payload Header Section structure

If the Payload Header Section consumes a non-integer number of
octets, up to 7 zero-valued padding bits MUST be inserted at the end
in order to achieve octet-alignment.

In the "Single" packing style the Payload Header Section consists of
a single Payload Header.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Payload Header (x bits ) : padding bits| | Payload Header (x bits ) : padding bits|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: Payload Header Section structure in "Single" mode Figure 6: Payload Header Section structure in "Single" packing style
In the "Multiple" packing style the Payload Header section consists
of a 2-octet field giving the size in bits (in network octet order)
of the following block of bit-wise concatenated Payload Headers.
This size excludes the padding bits, if any.

This size field is absent in the "Single" packing style not because
it is not needed (which would be a minor gain) but for compatibility
with RFC 3016.

This size field is also absent when the value would always be zero
because the Payload Header is always empty, which happens when a
constant payload size is signaled using ConstantSize (see below).
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Payload Header section size in bits | | | Payload Header section size | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| as many bit-wise concatenated Payload Headers | | as many bit-wise concatenated Payload Headers |
| as AU or AU fragments in this RTP packet | | as AU or AU fragments in this RTP packet |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| : padding bits| | : padding bits|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5: Payload Header Section structure in "Multiple" mode Figure 7: Payload Header Section structure in "Multiple" packing
style
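
The layout of the Payload Header Section in the "Multiple" packing
style can be sketched in Python as follows. This is an illustration
only: the function name and the representation of each Payload
Header as a list of (value, length in bits) pairs are assumptions
made for the example.

```python
def build_payload_header_section(headers):
    """Serialize Payload Headers for the "Multiple" packing style.

    headers is a list of Payload Headers, each one a list of
    (value, nbits) fields already in transmission order.
    """
    bits = 0
    total = 0
    for header in headers:
        for value, nbits in header:
            if not 0 <= value < (1 << nbits):
                raise ValueError("field value does not fit in its length")
            bits = (bits << nbits) | value  # bit-wise concatenation
            total += nbits
    if total > 0xFFFF:
        raise ValueError("size does not fit in the 2-octet size field")
    padding = -total % 8           # up to 7 zero-valued padding bits
    bits <<= padding
    body = bits.to_bytes((total + padding) // 8, "big")
    # The size field counts header bits only and excludes the padding.
    return total.to_bytes(2, "big") + body
```

For example, a single 7 bit header is followed by one padding bit,
and the size field carries the value 7.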
3.3.1 Payload Header structure

The Payload Header content depends on parameters (as described in
section 4.1); by default it is empty for the "Single" packing style
and, in the "Multiple" packing style, contains at least the
PayloadSize field, except when ConstantSize is signaled.

When all options are used the Payload Header structure and the
relationship with the related parameters are given in table 1.
+===========================+=================================+ +===========================+=================================+
| Fields of MSLPH | Number of bits (parameters) | | Fields of Payload Header | Number of bits (parameters) |
+===========================+=================================+ +===========================+=================================+
| PayloadSize | SizeLength | | PayloadSize | SizeLength |
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
| Index | IndexLength | | Index | IndexLength |
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
| IndexDelta | IndexDeltaLength | | IndexDelta | IndexDeltaLength |
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
| CTSFlag | 1 If (CTSDeltaLength > 0) | | CTSFlag | 1 If (CTSDeltaLength > 0) |
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
| CTSDelta | CTSDeltaLength If (CTSFlag==1) | | CTSDelta | CTSDeltaLength If (CTSFlag==1) |
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
| DTSFlag | 1 If (DTSDeltaLength > 0) | | DTSFlag | 1 If (DTSDeltaLength > 0) |
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
| DTSDelta | DTSDeltaLength If (DTSFlag==1) | | DTSDelta | DTSDeltaLength If (DTSFlag==1) |
+---------------------------+---------------------------------+ +---------------------------+---------------------------------+
Table 1: Payload Header fields and parameters giving the sizes

In the general case a receiver can only discover the size of a
Payload Header by parsing it since for example the presence of
CTSDelta is signaled by the value of CTSFlag.
3.4.1 Fields of Payload Header 3.3.2 Fields of a Payload Header
PayloadSize: Indicates the size in octets of the associated Payload, PayloadSize:
which can be found in the Payload Section of the RTP packet. The Indicates the size in octets of the associated Payload, which
length in bits of this field is signaled by the SizeLength parameter can be found in the Payload Section of the RTP packet. The
(see section 4.1). length in bits of this field is signaled by the SizeLength
parameter (see section 4.1).
There is an exception to that. In the case that the RTP packet There is an exception to that. In the "Multiple" packing style
contains only one AU or AU fragment in the "Multiple" mode, the when a RTP packet contains only one AU or AU fragment, the
PayloadSize field SHALL contain the size of the entire corresponding PayloadSize field SHALL contain the size of the entire
Access Unit. There are two reasons, firstly the size of the fragment corresponding AU. There are two reasons, firstly the size of
is not needed when there is only one fragment in the RTP packet, the fragment is not needed when there is only one fragment in
secondly this is useful in order to detect if a full Access Unit has the RTP packet, secondly this is useful in order to detect if a
been received after the loss of a packet carrying a M bit set to 1. full Access Unit has been received after the loss of a packet
carrying a M bit set to 1.
Index, IndexDelta: encodes the serial number of the associated AU or Index, IndexDelta:
AU fragment. IndexDelta is useful for interleaving (see section Encodes the serial number of the associated AU or AU fragment.
3.8). When transporting a SL stream, Index and IndexDelta SHALL be IndexDelta is useful for interleaving (see section 3.6). When
used to encode the SL Packet Header packetSequenceNumber field. transporting a SL stream, Index and IndexDelta SHALL be used to
encode the packetSequenceNumber field of the SL Packet Header,
if present.
Index is optional and -if present- appears in the first Payload Index is optional and -if present- appears in the first Payload
Header of a RTP packet. Header of a RTP packet.
The length in bits of the Index field is defined by the IndexLength The length in bits of the Index field is defined by the
parameter (see section 4.1). IndexLength parameter (see section 4.1).
IndexDelta is optional and -if present- appears for subsequent (non- IndexDelta is optional and -if present- appears for subsequent
first) Payload Headers of a RTP packet. (non-first) Payload Headers of a RTP packet.
The length in bits of the IndexDelta field is defined by the The length in bits of the IndexDelta field is defined by the
IndexDeltaLength parameter (see section 4.1). IndexDeltaLength parameter (see section 4.1).
Both Index and IndexDelta MUST be incremented so that 2 consecutive Both Index and IndexDelta MUST be incremented so that 2
AU or AU fragments SHALL be distinguishable. One exception for Index consecutive AU or AU fragments SHALL be distinguishable. One
is described in 3.8.1. exception for Index is described in 3.6.1.
If the parameter IndexDeltaLength is defined, non-first AU or AU If the parameter IndexDeltaLength is defined, non-first AU or
fragments inside a RTP packet have their serial number encoded as a AU fragments inside a RTP packet have their serial number
difference (thus the name IndexDelta). This difference is relative encoded as a difference (thus the name IndexDelta). IndexDelta
to the previous AU or AU fragment in the RTP packet according to MUST have sufficient bits to encode this difference. This
(with i>=0): difference is relative to the previous AU or AU fragment in the
RTP packet according to (with i>=0):
Serial number(0) = Index(0) Serial number(0) = Index(0)
Serial number (i+1) = Serial number (i) + IndexDelta(i+1) + 1
If the parameter IndexDeltaLength is not defined the default value Serial number (i+1) = Serial number (i) + IndexDelta(i+1) + 1
is zero and then the IndexDelta field is not present for non-first
AU or AU fragments. Nevertheless receivers SHALL then apply the
above formula with IndexDelta equal to zero. In other words by
default the serial number is incremented by 1 for each AU or AU
fragment in the RTP packet.
CTSFlag (1 bit): Indicates whether the CTSDelta field is present. A If the parameter IndexDeltaLength is not defined the default
value of 1 indicates that the CTSDelta field is present, a value of value is zero and then the IndexDelta field is not present for
0 that it is not present. non-first AU or AU fragments. Nevertheless receivers SHALL then
apply the above formula with IndexDelta equal to zero. In other
words by default the serial number is incremented by 1 for each
AU or AU fragment in the RTP packet.
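As an illustration only, the serial number recurrence above can be
sketched as follows. The function and argument names are hypothetical
and are not defined by this payload format; when IndexDeltaLength is
zero the caller is assumed to pass zeros for the absent IndexDelta
fields.

```python
def serial_numbers(index, index_deltas):
    """Return the serial numbers of all AUs (or AU fragments) in one RTP packet.

    index        -- value of the Index field of the first Payload Header
    index_deltas -- IndexDelta values of the non-first Payload Headers;
                    pass zeros when the field is absent (IndexDeltaLength = 0)
    """
    serials = [index]
    for delta in index_deltas:
        # Serial number(i+1) = Serial number(i) + IndexDelta(i+1) + 1
        serials.append(serials[-1] + delta + 1)
    return serials
```

With all IndexDelta values equal to zero the serial number simply
increases by 1 per AU; larger values leave the gaps that interleaving
fills from other RTP packets.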
If CTSDeltaLength is not zero, CTSFlag is present in all Payload CTSFlag (1 bit):
Headers regardless of whether the AU fragment is an Access Unit Indicates whether the CTSDelta field is present.
start or not. A value of 1 indicates that the CTSDelta field is present, a
value of 0 that it is not present.
CTSDelta (CTSDeltaLength bits): Specifies the value of the CTS as a If CTSDeltaLength is not zero, CTSFlag is present in all
2-complement offset (delta) from the timestamp in the RTP header of Payload Headers regardless of whether the AU fragment is an
the RTP packet. The length in bits of each CTSDelta field is Access Unit start or not.
specified by the CTSDeltaLength parameter (see section 4.1).
CTSDelta (CTSDeltaLength bits):
Specifies the value of the CTS as a 2-complement offset (delta)
from the timestamp in the RTP header of the RTP packet. The
length in bits of each CTSDelta field is specified by the
CTSDeltaLength parameter (see section 4.1). CTSDelta MUST have
sufficient bits to encode this difference.
The CTSDelta field is present if CTSFlag is 1. The CTSDelta field is present if CTSFlag is 1.
For the first Payload Header of each RTP packet CTSFlag is always 0, For the first Payload Header of each RTP packet CTSFlag is
since the composition time stamp of the first AU or AU fragment in always 0, since the composition time stamp of the first AU or
the RTP packet is mapped to the RTP time stamp. When using the Sync AU fragment in the RTP packet is mapped to the RTP time stamp.
Layer the sender MUST remove the compositionTimeStamp from the RSLH. When using the Sync Layer the sender MUST remove the
compositionTimeStamp from the RSLH.
Senders MUST finish assembling a RTP packet for which CTSDelta would Senders MUST finish assembling a RTP packet for which CTSDelta
roll over since this would prevent the receiver from reconstructing would roll over since this would prevent the receiver from
the correct CTS. This can result in sub optimal RTP packets (smaller reconstructing the correct CTS. This can result in sub optimal
than the MTU) depending on the MTU, the AU or AU fragment sizes and RTP packets (smaller than the MTU) depending on the MTU, the AU
CTSDeltaLength. or AU fragment sizes and CTSDeltaLength.
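A minimal sketch of how a receiver might recover a CTS from a
CTSDelta field is given below. It assumes the raw field has already
been read as an unsigned integer of CTSDeltaLength bits, and that the
result is kept in the 32-bit RTP timestamp space; the function names
are hypothetical.

```python
def decode_twos_complement(raw, nbits):
    """Interpret an unsigned nbits-wide value as a 2-complement integer."""
    if raw >= 1 << (nbits - 1):
        return raw - (1 << nbits)
    return raw


def reconstruct_cts(rtp_timestamp, raw_cts_delta, cts_delta_length):
    """CTS = RTP timestamp + CTSDelta (assumption: modulo 2**32)."""
    delta = decode_twos_complement(raw_cts_delta, cts_delta_length)
    return (rtp_timestamp + delta) % (1 << 32)
```

For example, with CTSDeltaLength = 4 the raw value 0101 yields an
offset of +5 and the raw value 1111 yields an offset of -1, which is
why a sender has to close the RTP packet before the offset leaves the
range representable on CTSDeltaLength bits.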
DTSFlag (1 bit): Indicates whether the DTSDelta field is present. A DTSFlag (1 bit):
value of 1 indicates that DTSDelta is present, a value of 0 that it Indicates whether the DTSDelta field is present. A value of 1
is not present. indicates that DTSDelta is present, a value of 0 that it is not
present.
If DTSDeltaLength is not zero, DTSFlag is present in all Payload If DTSDeltaLength is not zero, DTSFlag is present in all
Headers regardless of whether the AU fragment is an Access Unit Payload Headers regardless of whether the AU fragment is an
start or not. When transporting SL streams the receiver needs this Access Unit start or not. When transporting SL streams the
flag in order to reconstruct the decodingTimeStampFlag of SL Packet receiver needs this flag in order to reconstruct the
Headers. decodingTimeStampFlag of SL Packet Headers.
DTSDelta (DTSDeltaLength bits): encodes (compositionTimeStamp - DTSDelta (DTSDeltaLength bits):
decodingTimeStamp) for the same AU or AU fragment(always positive).
The length in bits of each DTSDelta field is specified by the
DTSDeltaLength parameter (see section 4.1).
Senders MUST make sure that DTSDeltaLength is large enough to encode
the difference between CTS and DTS (otherwise the DTS computed by
the receiver would be incorrect).
Encodes (compositionTimeStamp - decodingTimeStamp) for the same
AU or AU fragment (always positive). The length in bits of each
DTSDelta field is specified by the DTSDeltaLength parameter
(see section 4.1).
The DTSDelta field appears when DTSFlag is 1. The sender MUST always Senders MUST make sure that DTSDeltaLength is large enough to
remove the decodingTimeStamp from the RSLH. encode the difference between CTS and DTS (otherwise the DTS
computed by the receiver would be incorrect).
The DTSDelta field appears when DTSFlag is 1. The sender MUST
always remove the decodingTimeStamp from the RSLH.
If DTSDelta is zero i.e. if decodingTimeStamp equals If DTSDelta is zero i.e. if decodingTimeStamp equals
compositionTimeStamp then DTSFlag MUST be set to 0 and no DTSDelta compositionTimeStamp then DTSFlag MUST be set to 0 and no
field SHALL be present. DTSDelta field SHALL be present.
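The field rules of this section can be sketched as a small parser.
This is an illustration under stated assumptions, not a normative
algorithm: it assumes that fields appear in the order of Table 1,
that bits are read most significant bit first, and that parameters
absent from the signaling default to zero. BitReader and
parse_payload_header are hypothetical names.

```python
class BitReader:
    """Reads big-endian bit fields from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_payload_header(reader, params, first):
    """Parse one Payload Header; 'first' is True for the first header
    of the RTP packet (Index), False for the others (IndexDelta)."""
    hdr = {}
    n = params.get("SizeLength", 0)
    if n:
        hdr["PayloadSize"] = reader.read(n)
    key = "Index" if first else "IndexDelta"
    n = params.get(key + "Length", 0)
    if n:
        hdr[key] = reader.read(n)
    n = params.get("CTSDeltaLength", 0)
    if n and reader.read(1):              # CTSFlag
        hdr["CTSDelta"] = reader.read(n)  # raw 2-complement value
    n = params.get("DTSDeltaLength", 0)
    if n and reader.read(1):              # DTSFlag
        hdr["DTSDelta"] = reader.read(n)  # DTS = CTS - DTSDelta
    return hdr
```

Because CTSDelta and DTSDelta are only present when their flag is 1,
the number of bits consumed is known only after parsing, which is
visible here as the final value of reader.pos.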
3.5 RSLHSection structure 3.4 RSLHSection structure
This section is present only when using the Sync Layer, and then, This section is present only when using the Sync Layer, and then,
when the rules in the previous section have left remaining fields. when the rules in the previous section have left remaining fields.
This section first consists of a field (RSLHSectionSize) giving the This section first consists of a field (RSLHSectionSize) giving the
size in bits of the following block of bit-wise concatenated RSLHs size in bits of the following block of bit-wise concatenated RSLHs
(this size does not include padding bits). (this size does not include padding bits).
If the section consumes a non-integer number of octets, up to 7 zero If the section consumes a non-integer number of octets, up to 7 zero
padding bits MUST be inserted at the end in order to achieve octet- padding bits MUST be inserted at the end in order to achieve octet-
alignment. alignment.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLHSectionSize (RSLHSectionSizeLength bits)| RSLH (variable | | RSLHSectionSize (RSLHSectionSizeLength bits)| RSLH (variable |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| number of bits) | | number of bits) |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | RSLH (variable number of bits) | | | RSLH (variable number of bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| etc | | etc |
| as many bit-wise concatenated RSLHs | | as many bit-wise concatenated RSLHs |
| as SL Packets in this RTP packet | | as SL Packets in this RTP packet |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLH (variable number of bits) | | RSLH (variable number of bits) |
| +-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+
| : padding bits| | : padding bits|
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 7: RSLHSection structure Figure 8: RSLHSection structure
The length in bits of the RSLHSectionSize field is The length in bits of the RSLHSectionSize field is
RSLHSectionSizeLength and is specified with a default value of zero RSLHSectionSizeLength and is specified with a default value of zero
indicating that the whole RSLHSection is absent. Note that for indicating that the whole RSLHSection is absent. Note that for
compatibility with RFC 3016 we need to be able to make the compatibility with RFC 3016 we need to be able to make the
RSLHSection disappear completely, including the RSLHSectionSize RSLHSection disappear completely, including the RSLHSectionSize
field. This is the reason why there is such a variable length with a field. This is the reason why there is such a variable length with a
zero default value indicating the absence of the RSLHSectionSize zero default value indicating the absence of the RSLHSectionSize
field. field.
+=================================+===============================+ +=================================+===============================+
| Fields of RSLHSection | Number of bits | | Fields of RSLHSection | Number of bits |
+=================================+===============================+ +=================================+===============================+
| RSLHSectionSize | RSLHSectionSizeLength | | RSLHSectionSize | RSLHSectionSizeLength |
+---------------------------------+-------------------------------+ +---------------------------------+-------------------------------+
| all bit-wise concatenated RSLHs | RSLHSectionSize | | all bit-wise concatenated RSLHs | RSLHSectionSize |
+---------------------------------+-------------------------------+ +---------------------------------+-------------------------------+
Table 2: Sizes in bits inside RSLHSection Table 2: Sizes in bits inside RSLHSection
Parsing of the bit-wise concatenated RSLHs requires MPEG-4 system Parsing of the bit-wise concatenated RSLHs requires MPEG-4 system
awareness, specifically it requires to understand the MPEG-4 awareness, specifically it requires to understand the MPEG-4
Sync Layer (SL) syntax and the modifications to this syntax Sync Layer (SL) syntax and the modifications to this syntax
described in the next section. described in the next section.
However thanks to the RSLHSectionSize field non-MPEG-4-system However thanks to the RSLHSectionSize field non-MPEG-4-system
receivers CAN skip this part by rounding up RSLHSectionSize/8 to the next receivers can skip this part by rounding up RSLHSectionSize/8 to the next
integer number of octets. This means that receivers not implementing integer number of octets. This means that receivers not implementing
the Sync Layer can process streams containing Sync Layer specific the Sync Layer can process streams containing Sync Layer specific
items by simply ignoring the parts they would not be able to parse. items by simply ignoring the parts they would not be able to parse.
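A receiver that does not implement the Sync Layer could compute the
number of octets to skip roughly as follows. This sketch assumes that
the bits of the RSLHSectionSize field itself count toward the section
that is padded to an octet boundary, which is one reading of the text
above; the function name is hypothetical.

```python
def rslh_section_octets(rslh_section_size, rslh_section_size_length):
    """Return the total length in octets of the RSLHSection.

    rslh_section_size        -- value of the RSLHSectionSize field (bits)
    rslh_section_size_length -- RSLHSectionSizeLength parameter (bits);
                                zero means the whole section is absent
    """
    if rslh_section_size_length == 0:
        return 0
    total_bits = rslh_section_size_length + rslh_section_size
    return (total_bits + 7) // 8  # round up to the next whole octet
```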
3.6 RSLH structure 3.4.1 RSLH structure
RSLH is present only when using the Sync Layer, and then, when the RSLH is present only when using the Sync Layer, and then, when the
rules in the previous section have left remaining fields. rules in the previous section have left remaining fields.
A Remaining SL Packet Header (RSLH) is what remains of an SL header A Remaining SL Packet Header (RSLH) is what remains of an SL header
after modifications for mapping into this payload format. after modifications for mapping into this payload format.
The following modifications of the SL Packet Header MUST be applied. The following modifications of the SL Packet Header MUST be applied.
The other fields of the SL Packet Header MUST remain unchanged but The other fields of the SL Packet Header MUST remain unchanged but
are bit-shifted to fill in the gaps left by the operations specified are bit-shifted to fill in the gaps left by the operations specified
below. below.
3.6.1 Removal of fields 3.4.2 Removal of fields
The following SL Packet Header fields -if present- are removed since The following SL Packet Header fields -if present- are removed since
they are mapped either in the RTP header or in the corresponding they are mapped either in the RTP header or in the corresponding
Payload Header: Payload Header:
. compositionTimeStampFlag . compositionTimeStampFlag
. compositionTimeStamp . compositionTimeStamp
. decodingTimeStampFlag . decodingTimeStampFlag
. decodingTimeStamp . decodingTimeStamp
. packetSequenceNumber . packetSequenceNumber
. AccessUnitEndFlag (in "Single" mode only) . AccessUnitEndFlag (in "Single" packing style only)
The AccessUnitEndFlag, when present for a given stream, MUST be The AccessUnitEndFlag, when present for a given stream, MUST be
removed from every RSLH when using the "Single" mode since it has removed from every RSLH when using the "Single" packing style since
the same meaning as the Marker bit (and for compatibility with RFC it has the same meaning as the Marker bit (and for compatibility
3016). However when using the "Multiple" mode, AccessUnitEndFlag with RFC 3016). However when using the "Multiple" packing style,
MUST NOT be removed since it is useful to signal individual AU ends. AccessUnitEndFlag MUST NOT be removed since it is useful to signal
individual AU ends.
3.6.2 Mapping of OCR 3.4.3 Mapping of OCR
Furthermore if the SL Packet header contains an OCR, then this field Furthermore if the SL Packet header contains an OCR, then this field
is encoded in the RSLH as a 2-complement difference (delta) exactly is encoded in the RSLH as a 2-complement difference (delta) exactly
like a compositionTimeStamp or a decodingTimeStamp in the like a compositionTimeStamp or a decodingTimeStamp in the
PayloadHeader. The length in bit of this difference is indicated by PayloadHeader. The length in bit of this difference is indicated by
the OCRDeltaLength parameter (see section 4.1). the OCRDeltaLength parameter (see section 4.1).
With this payload format OCRs MUST have the same clock frequency as With this payload format OCRs MUST have the same clock frequency as
Time Stamps. Time Stamps.
If compositionTimeStamp is not present for a SL packet that has OCR If compositionTimeStamp is not present for a SL packet that has OCR
then the OCR SHALL be encoded as a difference to the RTP time stamp. then the OCR SHALL be encoded as a difference to the RTP time stamp.
3.6.3 Degradation Priority 3.4.4 Degradation Priority
For streams that use the optional degradationPriority field in the For streams that use the optional degradationPriority field in the
SL Packet Headers, only SL packets with the same degradation SL Packet Headers, only SL packets with the same degradation
priority SHALL be transported by one RTP packet so that components priority SHALL be transported by one RTP packet so that components
may dispatch the RTP packets according to appropriate QoS or may dispatch the RTP packets according to appropriate QoS or
protection schemes. Furthermore only the first RSLH of one RTP protection schemes. Furthermore only the first RSLH of one RTP
packet SHALL contain the degradationPriority field since it would be packet SHALL contain the degradationPriority field since it would be
otherwise redundant. otherwise redundant.
3.7 Payload Section structure 3.5 Payload Section structure
The Payload Section contains the concatenated AU or AU fragment The Payload Section contains the concatenated AU or AU fragment
Payloads. By definition AU or AU fragment Payloads are octet Payloads. By definition AU or AU fragment Payloads are octet
aligned. aligned.
For efficiency SL packets do not carry their own payload size. This For efficiency SL packets do not carry their own payload size. This
is not an issue for RTP packets that contain a single SL Packet. is not an issue for RTP packets that contain a single SL Packet.
However in the "Multiple" mode the size of each AU or AU fragment However in the "Multiple" packing style the size of each AU or AU
payload MUST be available to the receiver. fragment payload MUST be available to the receiver.
If the AU or AU fragment payload size is constant for a stream, the If the AU or AU fragment payload size is constant for a stream, the
size information SHOULD NOT be transported in the RTP packet. size information SHOULD NOT be transported in the RTP packet.
However in that case it MUST be signaled using the ConstantSize However in that case it MUST be signaled using the ConstantSize
parameter (see section 4.1). parameter (see section 4.1).
If the AU or AU fragment payload size is variable then the size of If the AU or AU fragment payload size is variable then the size of
each AU or AU fragment payload MUST be indicated in the each AU or AU fragment payload MUST be indicated in the
corresponding Payload Header. In order to do so the Payload Header corresponding Payload Header. In order to do so the Payload Header
MUST contain a PayloadSize field. The number of bits on which this MUST contain a PayloadSize field. The number of bits on which this
PayloadSize field is encoded MUST be indicated using the SizeLength PayloadSize field is encoded MUST be indicated using the SizeLength
parameter (see section 4.1). parameter (see section 4.1).
The absence of either ConstantSize or SizeLength indicates the The absence of either ConstantSize or SizeLength indicates the
"Single" mode i.e. that a single AU or AU fragment is transported in "Single" packing style i.e. that a single AU or AU fragment is
each RTP packet for that stream. transported in each RTP packet for that stream.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| AU or AU fragment (variable number of octets) | | AU or AU fragment (variable number of octets) |
| | | |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | AU or AU fragment | | | AU or AU fragment |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | | |
| (variable number of octets) | | (variable number of octets) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| etc | | etc |
| as many octet-wise concatenated AU or AU fragment | | as many octet-wise concatenated AU or AU fragment |
| as required to finish RTP packet | | as required to finish RTP packet |
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 8: Payload Section structure Figure 9: Payload Section structure
3.8 Interleaving 3.6 Interleaving
SL Packets MAY be interleaved. Senders MAY perform interleaving. SL Packets MAY be interleaved. Senders MAY perform interleaving.
Receivers MUST support interleaving. Receivers MUST support interleaving. Additional specifications MAY
restrict this support by explicit signaling (see for example
RFCYYYY).
Note for Sync Layer implementers: the AUSequenceNumber field of the Note for Sync Layer implementers: the AUSequenceNumber field of the
SL Header MUST NOT be used for interleaving since firstly it may SL Header MUST NOT be used for interleaving since firstly it may
collide with the Scene Description Carousel usage described in collide with the Scene Description Carousel usage described in
section 5.2 and secondly it is not visible to receivers that do not section 6.2 and secondly it is not visible to receivers that do not
implement the Sync Layer and would skip the RSLH section implement the Sync Layer and would skip the RSLH section
transporting AUSequenceNumber. transporting AUSequenceNumber.
When interleaving of AU or AU fragments is used it SHALL be When interleaving of AU or AU fragments is used it SHALL be
implemented using the IndexDelta fields of the Payload Header. implemented using the IndexDelta fields of the Payload Header.
Senders MUST NOT make RTP packets for which IndexDelta rolls over. Senders MUST NOT make RTP packets for which IndexDelta rolls over.
Therefore depending on the interleaving scheme (if any), the MTU and Therefore depending on the interleaving scheme (if any), the MTU and
the AU or AU fragment sizes, senders wishing to make optimally sized the AU or AU fragment sizes, senders wishing to make optimally sized
RTP packets (i.e. close to the MTU) will need to set RTP packets (i.e. close to the MTU) will need to set
IndexDeltaLength to a properly large value. IndexDeltaLength to a properly large value.
Senders SHALL use non zero values of IndexDeltaLength only for Senders SHOULD use non zero values of IndexDeltaLength only for
streams that exhibit interleaving, so that this can be interpreted streams that exhibit interleaving, so that this can be interpreted
by receivers as an indication that interleaving is (maybe) present. by receivers as an indication that interleaving may be present.
There are, based on this, two ways for a receiver to implement de- There are, based on this, two ways for a receiver to implement de-
interleaving, using either Index or timestamps. This is signaled interleaving:
using mime parameters as in the following table, where TSBI and IBI
stand respectively for Time-Stamp-Based-Interleaving (see section
3.8.1) and Index-Based-Interleaving (see section 3.8.2). Note that
the need for two methods arises from two facts: firstly the time
stamp based method is more economical and in basic cases (no . Time-Stamp-Based-Interleaving (TSBI see section 3.6.1) uses
multiple AU fragments, CTS always defined) simpler to implement. IndexDelta and timestamps.
. Index-Based-Interleaving (see section 3.6.2) uses IndexDelta and
Index.
This is signaled using mime parameters as in the following table.
Note that the need for two methods arises from two facts: firstly
the time stamp based method is more economical and in basic cases
(no multiple AU fragments, CTS always defined) simpler to implement.
Secondly, unfortunately this method does not always work as Secondly, unfortunately this method does not always work as
explained below. explained below.
================================================================== ==================================================================
| | IndexDeltaLength = 0 | IndexDeltaLength != 0 | | | IndexDeltaLength = 0 | IndexDeltaLength != 0 |
------------------------------------------------------------------ ------------------------------------------------------------------
| IndexLength=0 | no interleaving | TSBI | | IndexLength=0 | no interleaving | TSBI |
------------------------------------------------------------------ ------------------------------------------------------------------
| IndexLength!=0 | no interleaving, | Index=0 | Index!=0 | | IndexLength!=0 | no interleaving, | Index=0 | Index!=0 |
| | SL.packetSeqNum |------------------------- | | SL.packetSeqNum |-------------------------
| | transport | TSBI | IBI | | | transport | TSBI | IBI |
================================================================== ==================================================================
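Read as a decision procedure, the table above could be sketched as
below. The function name and the any_nonzero_index argument are
hypothetical; the latter is assumed to be True once the receiver has
seen at least one non-zero Index value in the stream.

```python
def interleaving_mode(index_length, index_delta_length, any_nonzero_index=False):
    """Return "none", "TSBI" or "IBI" following the signaling table."""
    if index_delta_length == 0:
        # With IndexLength != 0 the Index field then only transports
        # SL.packetSequenceNumber; there is no interleaving either way.
        return "none"
    if index_length == 0 or not any_nonzero_index:
        return "TSBI"
    return "IBI"
```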
3.8.1 Time stamp based interleaving (TSBI) 3.6.1 Time stamp based interleaving (TSBI)
The conjunction of RTP time stamp, IndexDelta and CTS may allow a The conjunction of RTP time stamp, IndexDelta and CTS may allow a
receiver to un-ambiguously re-order AU or AU fragments based on receiver to un-ambiguously re-order AU or AU fragments based on
their time stamps (CTS). their time stamps (CTS).
This is possible and efficient for streams where only complete This is possible and efficient for streams where only complete
Access Units are transported and receivers can always compute the Access Units are transported and receivers can always compute the
time stamp of each Access Unit. time stamp of each Access Unit.
In case of Access Units of constant duration (e.g. audio streams) In case of Access Units of constant duration (e.g. audio streams)
the explicit presence of CTS in the Payload Header is not even the explicit presence of CTS in the Payload Header is not even
required; Indeed then we have (i being the index of one AU in one required; Indeed then we have (i being the index of one AU in one
RTP packet): RTP packet):
CTS(0) = RTP-TS CTS(0) = RTP-TS
for (i >= 1): CTS(i) = CTS(i-1) + (IndexDelta(i)+1)*AU-duration for (i >= 1): CTS(i) = CTS(i-1) + (IndexDelta(i)+1)*AU_duration
AU-duration, when constant, can be either signaled in SLConfig or be AU_duration, when constant, can be either signaled in SLConfig or be
deduced from the decoder configuration (see the config MIME deduced from the decoder configuration (see the "Config" MIME
parameter). parameter).
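For constant-duration Access Units the recurrence above can be
sketched as follows. The names are hypothetical, au_duration is
assumed to be expressed in RTP timestamp units, and timestamp
roll-over is deliberately ignored in this sketch.

```python
def tsbi_cts(rtp_timestamp, index_deltas, au_duration):
    """Return the CTS of every AU in one RTP packet.

    rtp_timestamp -- RTP timestamp of the packet, i.e. CTS(0)
    index_deltas  -- IndexDelta values of the non-first Payload Headers
    au_duration   -- constant AU duration in timestamp units
    """
    cts = [rtp_timestamp]
    for delta in index_deltas:
        # CTS(i) = CTS(i-1) + (IndexDelta(i) + 1) * AU_duration
        cts.append(cts[-1] + (delta + 1) * au_duration)
    return cts
```

For instance, with an AU duration of 1024 and IndexDelta values of 2,
consecutive AUs in the packet are three AU durations apart, and the
AUs in between are expected in other RTP packets.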
Senders MUST use either IndexLength=0 or set all Index values in all Senders MUST use either IndexLength=0 or set all Index values in all
packets to zero so that receivers CAN detect this as an indication packets to zero so that receivers can detect this as an indication
that de-interleaving SHOULD be performed using time stamps. that de-interleaving SHOULD be performed using time stamps.
When using the Sync Layer and when interleaving senders MUST use for When using the Sync Layer and when interleaving senders MUST use for
SL.timeStampLength values large enough to prevent the CTS from SL.timeStampLength values large enough to prevent the CTS from
rolling over more often than a packet loss burst length. Pre- rolling over more often than a packet loss burst length. Pre-
existing SL streams that do not comply with this requirement cannot existing SL streams that do not comply with this requirement cannot
be interleaved using this payload format (or by using 3.8.2)
3.8.2 Index based interleaving (IBI)
The timestamp-based interleaving algorithm described in 3.8.1. does be interleaved using this payload format (or by using IBI as in
not work when a CTS cannot always be computed for all AU or AU 3.6.2)
fragments (for example after a packet loss); this happens:
. If the AU duration is not constant (SL durationFlag = 0) and CTS 3.6.2 Index based interleaving (IBI)
is not signaled (SL useTimeStampsFlag= 0).
The timestamp-based interleaving algorithm described in the previous
section does not work when a CTS cannot always be computed for all
AU or AU fragments (for example after a packet loss); this happens:
. If the AU duration is not constant (SL durationFlag = 0) and
CTS is not signaled (SL useTimeStampsFlag= 0).
. When interleaving AU fragments. . When interleaving AU fragments.
When interleaving, senders of such streams MUST use the index-based When interleaving, senders of such streams MUST use the index-based
technique described in this section. technique described in this section.
The conjunction of RTP sequence number, Index and IndexDelta can The conjunction of RTP sequence number, Index and IndexDelta can
produce a quasi-unique identifier for each AU or AU fragment so that produce a quasi-unique identifier for each AU or AU fragment so that
a receiver can unambiguously reconstruct the original order even in a receiver can unambiguously reconstruct the original order even in
case of out-of-order packets, packet loss or duplication (see the case of out-of-order packets, packet loss or duplication (see the
pseudo code in 3.4.1 and 5.1). pseudo code in 3.3.2 and 6.1). Specifically the RTP sequence number
is used to re-order packets and inside one RTP packet we have:
Serial number(0) = Index(0)
Serial number(i+1) = Serial number(i) + IndexDelta(i+1) + 1 (i>=0)
This requires, however, that IndexLength is not too small. For that This requires, however, that IndexLength is not too small. For that
reason senders when interleaving in this fashion MUST use for reason senders when interleaving in this fashion MUST use for
IndexLength values large enough to prevent Index from rolling over IndexLength values large enough to prevent Index from rolling over
more often than a typical loss burst loss. Pre-existing SL streams more often than a typical loss burst length. Pre-existing SL streams
that do not comply with this requirement (specifically if that do not comply with this requirement (specifically if
SL.packetSeqNumLength is too small) cannot be interleaved using this SL.packetSeqNumLength is too small) cannot be interleaved using this
payload format (or by using 3.8.1). payload format (or should use TSBI).
Receivers CAN interpret non-zero values in the Index field as an Receivers SHOULD interpret non-zero values in the Index field as an
indication that de-interleaving CAN be performed using Index and indication that de-interleaving can be performed using Index and
IndexDelta and CANNOT be performed using timestamps. IndexDelta but cannot be performed using timestamps.
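The serial number recurrence given above can be illustrated with a minimal sketch. It is not normative. It assumes that neither the RTP sequence number nor the Index wraps around within the packets considered, and that all packets belong to one interleaving group; the function names and the tuple layout are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def serial_numbers(index: int, index_deltas: Sequence[int]) -> List[int]:
    """Serial numbers of the AUs (or AU fragments) carried in one RTP packet.

    index        -- Index field of the first Payload Header
    index_deltas -- IndexDelta fields of the following Payload Headers
    """
    serials = [index]                      # Serial number(0) = Index(0)
    for delta in index_deltas:
        # Serial number(i+1) = Serial number(i) + IndexDelta(i+1) + 1
        serials.append(serials[-1] + delta + 1)
    return serials


def deinterleave(
    packets: Iterable[Tuple[int, int, Sequence[int], Sequence[str]]]
) -> List[str]:
    """Restore the original order from (rtp_seq, index, deltas, payloads).

    Assumption: no wrap-around handling; duplicates are detected by
    RTP sequence number only.
    """
    seen = set()
    units = []
    for rtp_seq, index, deltas, payloads in packets:
        if rtp_seq in seen:                # drop duplicated packets
            continue
        seen.add(rtp_seq)
        units.extend(zip(serial_numbers(index, deltas), payloads))
    units.sort(key=lambda unit: unit[0])   # order by serial number
    return [payload for _, payload in units]


# Six AUs interleaved over two packets, received out of order with a duplicate.
received = [
    (11, 1, [1, 1], ["au1", "au3", "au5"]),
    (10, 0, [1, 1], ["au0", "au2", "au4"]),
    (11, 1, [1, 1], ["au1", "au3", "au5"]),
]
restored = deinterleave(received)
```

A real receiver would additionally have to handle Index roll-over, which is why IndexLength must not be too small.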
3.8.3 SL streams that cannot be interleaved 3.6.3 SL streams that should not be interleaved
SL streams for which both SL.timeStampLength and SL streams for which both SL.timeStampLength and
SL.packetSeqNumLength are too small cannot be interleaved with this SL.packetSeqNumLength are too small SHOULD NOT be interleaved with
payload format. Typically small values would cause a receiver to this payload format, the reason being that small values would cause
drop a large part of the stream in case of packet loss. The actual a receiver to drop a large part of the stream in case of packet
minimal value depends on network loss properties and on the expected loss. The actual minimal length depends on network loss properties
quality of service. and on the expected quality of service.
3.9 Fragmentation Rules
This section specifies rules for senders in order to prevent media 3.7 Fragmentation Rules
decoding difficulties at the receiver end.
MPEG-4 Access Units are the default fragments for MPEG-4 bitstreams MPEG-4 Access Units are the default fragments for MPEG-4 bitstreams
and SHOULD be mapped directly into RTP packets of this format with and SHOULD be mapped directly into RTP packets of this format with
two exceptions: two exceptions:
- Access Units larger than the MTU - Access Units larger than the MTU
- When using interleaving for better packet loss resilience. - When using interleaving for better packet loss resilience.
This section gives rules to apply when performing Access Unit This section gives rules to apply when performing Access Unit
fragmentation. Let us first explain the context before describing fragmentation. Let us first explain the context before describing
the rules. the rules.
Some MPEG-4 codecs define optional syntax for Access Units sub- For error resilience purposes some MPEG-4 codecs define optional
entities (fragments) that are independently decodable for error syntax of Access Units fragments that are independently decodable.
resilience purposes. Examples are Video Packets for video and Error Examples are Video Packets for video and Error Sensitivity
Sensitivity Categories (ESC) for audio. This always corresponds to Categories (ESC) for audio. This always corresponds to specific
specific bitstream syntax, which is signaled in the bitstream syntax, which is signaled in the DecoderSpecificInfo
DecoderSpecificInfo inside the DecoderConfig in SLConfig, and/or inside the DecoderConfig in SLConfig, and/or using the corresponding
using the corresponding parameters as described in section 4.1. parameters as described in section 4.1.
Thanks to that decoders are aware whether encoders are operating in Thanks to that, decoders are aware whether encoders are operating in
such a mode or not (however since this codec configuration is an such a mode or not (however since this codec configuration is an
opaque data block this is not explicitly signaled by this payload opaque data block this is not explicitly signaled by this payload
format). format).
If not operating in such a mode it is obvious that the decoder has If not operating in such a mode it is obvious that the decoder has
to skip packets after a loss until an Access Unit start is received. to skip packets after a loss until an Access Unit start is received.
Similarly decoder implementations that do not implement robust Similarly decoder implementations that do not implement robust
decoding of Access Units fragments have to discard all packets after decoding of Access Units fragments have to discard all packets after
a packet loss until an Access Unit start is received. In the same a packet loss until an Access Unit start is received. In the same
way decoder implementations that do not implement re-synchronization way decoder implementations that do not implement re-synchronization
at any Access Units start have to discard all packets after a packet at any Access Units start have to discard all packets after a packet
loss until a Random Access Point Access Unit is received. These are loss until a Random Access Point Access Unit is received. These are
all obvious things that a good implementation would do. all obvious things that a good implementation would do.
However serious problems would arise for decoder implementations However serious problems would arise for decoder implementations
that try to restart decoding after a packet loss if independently that try to restart decoding after a packet loss if independently
decodable fragments are signaled (in the decoder configuration) but decodable fragments are signaled (in the decoder configuration) but
the fragments actually received are not independently decodable the fragments actually received are not independently decodable
because the RTP sender has made RTP packets on different boundaries because the RTP sender has made RTP packets on different boundaries
than the fragments provided by the encoder (so this issue applies to than the fragments provided by the encoder (so this issue applies to
the interface between the encoder and the RTP sender and to the RTP the interface between the encoder and the RTP sender and to the RTP
sender component itself), because the decoder has in general no way sender component itself). Indeed the decoder has in general no way
to detect such a faulty fragment. to detect such a faulty fragment (except for MPEG-4 video).
For this reason the following rules must be applied: For this reason the following rules must be applied:
In the spirit of ALF this payload format should transport either In the spirit of ALF this payload format should transport either
complete Access Units or fragments of Access Units that are complete Access Units or fragments of Access Units that are
independently decodable. Specifically when a given codec has an independently decodable. Specifically when a given codec has an
independently decodable Access Unit fragments optional syntax this independently decodable Access Unit fragments optional syntax this
option SHOULD be used. option SHOULD be used.
Independently decodable Access Units fragments MUST NOT be split Independently decodable Access Units fragments SHOULD NOT be split
across several RTP packets. across several RTP packets.
For example an MPEG-4 audio stream encoded using the ESC syntax MUST An MPEG-4 audio stream encoded using the ESC syntax MUST NOT split
NOT split one ESC across 2 RTP packets. one ESC across 2 RTP packets.
This rule is relaxed when using MPEG-4 Video Packets for two When using MPEG-4 Video Packets since all Video Packets start with a
reasons: firstly Video Packets can be much larger than typical MTU specific resynchronization marker that can be unambiguously detected
and secondly all Video Packets start with a specific this rule is not needed. However it is strongly RECOMMENDED to
resynchronization marker that can be unambiguously detected.
Therefore for video streams using the Video Packet syntax Video
Packets MAY be split across several SL packets although it is
strongly RECOMMENDED to always adapt the Video Packet size to fit
the MTU. However a Video Packet start MUST always be aligned with an always adapt the Video Packet size to fit the MTU. In any case a
AU fragment start, except when a GOV is present, in which case the video AU or AU fragment start MUST always be aligned with either:
GOV and the first Video Packet of the following VOP MUST be included . a VOP start.
in the same SL packet. . a Video Packet start.
. or a GOV followed by the first (or only) Video Packet of the
following VOP.
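As an illustration of the rule that independently decodable fragments are not split across RTP packets, the following sketch groups whole fragments into packets. It is a simplified, non-normative example: max_payload is a hypothetical byte budget per packet, and RTP and Payload Header overhead are ignored.

```python
from typing import List, Sequence


def pack_fragments(fragment_sizes: Sequence[int], max_payload: int) -> List[List[int]]:
    """Group fragment indices into packets without splitting any fragment.

    fragment_sizes -- size in octets of each independently decodable fragment
    max_payload    -- assumed octet budget per RTP packet (hypothetical)
    """
    packets: List[List[int]] = []
    current: List[int] = []
    used = 0
    for i, size in enumerate(fragment_sizes):
        if size > max_payload:
            # The sketch refuses to split; the encoder would have to
            # produce smaller fragments to fit the budget.
            raise ValueError(f"fragment {i} ({size} octets) exceeds the budget")
        if current and used + size > max_payload:
            packets.append(current)        # close the packet before it overflows
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        packets.append(current)
    return packets


layout = pack_fragments([400, 500, 700, 300], max_payload=1000)
```

Each inner list holds the indices of the fragments carried together in one packet, so every packet starts on a fragment boundary.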
4. Types and Names 4. Types and Names
This section describes the MIME types and names associated with this This section describes the MIME types and names associated with this
payload format. Section 4.1 registers the MIME types, as per RFC payload format. Section 4.1 registers the MIME types, as per RFC
2048. 2048.
This format may require additional information about the mapping to This format may require additional information about the mapping to
be made available to the receiver. This is done using parameters be made available to the receiver. This is done using parameters
described in the next section. The absence of any of these fields is described in the next section. The absence of any of these fields is
equivalent to a field set to the default value, which is always equivalent to a field set to the default value, which is always zero
for numerical parameters. The absence of any such parameters
resolves into a default "basic" configuration compatible with
RFC3016 for MPEG-4 video.
zero. The absence of any such parameters resolves into a default
"basic" configuration compatible with RFC3016 for MPEG-4 video.
In the MPEG-4 framework the SL stream configuration information is In the MPEG-4 framework the SL stream configuration information is
carried using the Object Descriptor. For compatibility with carried using the Object Descriptor. For compatibility with
receivers that do not implement the full MPEG-4 system specification receivers that do not implement the full MPEG-4 system specification
this information MAY also be signaled using parameters described this information MAY also be signaled using parameters described
here. When such information is present both in an Object Descriptor here. When such information is present both in an Object Descriptor
and as a parameter of this payload format it MUST be exactly the and as a parameter of this payload format it MUST be exactly the
same. same.
For transport of MPEG-4 audio and video without the use of MPEG-4 For transport of MPEG-4 audio and video without the use of MPEG-4
systems, as well as to support non-MPEG-4 system receivers, it is systems, as well as to support non-MPEG-4 system receivers, it is
also possible to transport information on the profile and level of also possible to transport information on the profile and level of
the stream and on the decoder configuration. This is also described the stream and on the decoder configuration. This is also described
in the next section. in the next section.
Finally this MIME type also defines a mode parameter and a profile Finally this MIME type also defines a mode parameter and a profile
parameter that are intended for future derivations of this payload parameter that are intended for derivations of this payload format.
format. One such derivation is described in the companion RFC YYYY.
4.1 MIME type registration 4.1 MIME type registration
MIME media type name: "video" or "audio" or "application" MIME media type name: "video" or "audio" or "application"
"video" SHOULD be used for MPEG-4 Visual streams (i.e. video as "video" MUST be used for MPEG-4 Visual streams (i.e. video as
defined in ISO/IEC 14496-2 [2] and/or graphics as defined in ISO/IEC defined in ISO/IEC 14496-2 (Streamtype = 4) and/or graphics as
14496-1 [1]) or MPEG-4 Systems streams that convey information defined in ISO/IEC 14496-1 (Streamtype = 3)) or MPEG-4 Systems
needed for an audio/visual presentation. streams that convey information needed for an audio/visual
presentation.
"audio" SHOULD be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or "audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3
MPEG-4 Systems streams that convey information needed for an audio (Streamtype = 5)) or MPEG-4 Systems streams that convey information
only presentation. needed for an audio only presentation.
"application" SHOULD be used for MPEG-4 Systems streams
(ISO/IEC14496-1) that serve other purposes than audio/visual
presentation, e.g. in some cases when MPEG-J streams are
transmitted. "application" MUST be used for MPEG-4 Systems streams (ISO/IEC14496-
1 (all other StreamType values)) that serve other purposes than
audio/visual presentation, e.g. in some cases when MPEG-J streams
are transmitted.
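The StreamType values quoted in the newer text suggest a simple mapping to the media type name, sketched below. This is an illustration only. The function name is hypothetical, and the sketch does not cover MPEG-4 Systems streams that convey information needed for an audio/visual or audio only presentation, which the text assigns to "video" or "audio" according to their purpose.

```python
def media_type_name(stream_type: int) -> str:
    """Media type name for a StreamType value, per the values quoted above."""
    if stream_type in (3, 4):   # values the text associates with graphics and video
        return "video"
    if stream_type == 5:        # value the text associates with MPEG-4 Audio
        return "audio"
    return "application"        # all other StreamType values


names = [media_type_name(value) for value in (3, 4, 5, 6)]
```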
MIME subtype name: mpeg4-generic MIME subtype name: mpeg4-generic
Required parameters: none Required parameters: none
Optional parameters: Optional parameters:
mode: mode:
The mode in which this specification is used. This specification The mode in which this specification is used. This
itself defines only the default mode (Mode=default). When the mode specification itself defines only the default mode
parameter is not present the default mode SHALL be assumed. In the (Mode=default). When the mode parameter is not present the
default mode all parameters are optional and as defined here. Other default mode SHALL be assumed. In the default mode all
modes may be defined as needed in other RFCs. A mode MUST be a parameters are OPTIONAL and as defined here. Other modes may be
subset of this specification. Specifically when defining a mode care defined as needed in other RFCs. A mode MUST be a subset of
MUST be taken that an implementation of this specification can this specification. Specifically when defining a mode care MUST
be taken that an implementation of this specification can
decode the payload format corresponding to this new mode. For
this reason a mode MUST NOT specify new default values for MIME
parameters and MIME parameters MUST be present (unless they
decode the payload format corresponding to this new mode. For this have the default value) even if it is redundant in case the
reason a mode MUST NOT specify new default values for MIME mode assigns fixed values. A mode may define additionally that
parameters and MIME parameters MUST be present (unless they have the some MIME parameters are required instead of optional, that
default value) even if it is redundant in case the mode assigns some MIME parameters have fixed values (or ranges), and that
fixed values. A mode may define additionally that some MIME there are rules restricting the usage (for example RFCYYYY
parameters are required instead of optional, that some MIME forbids the carriage of multiple AU fragments in the same RTP
parameters have fixed values (or ranges), and that there are rules packet and -logically- uses only TSBI interleaving).
restricting the usage (for example forbidding the carriage of
multiple AU fragments in the same RTP packet).
profile: profile:
The meaning of this parameter may be defined by a mode. This is The meaning of this parameter may be defined by a mode. This is
meant to be used in order to define sub-configurations of a given meant to be used in order to define sub-configurations of a
mode, for example the maximum delay (and therefore the size of given mode, for example the maximum delay (and therefore the
buffers) induced by the usage of interleaving. Implementations of size of buffers) induced by the usage of interleaving.
this specification can ignore this parameter. Implementations of this specification can ignore this
parameter.
DTSDeltaLength: DTSDeltaLength:
The number of bits on which the DTSDelta field is encoded in each The number of bits on which the DTSDelta field is encoded in
Payload Header. The default value is zero and indicates the absence each Payload Header. The default value is zero and indicates
of DTSFlag and DTSDelta in the Payload Header (the stream does not the absence of DTSFlag and DTSDelta in the Payload Header (the
transport decodingTimeStamps). A value larger than zero indicates stream does not transport decodingTimeStamps). A value larger
that there is a DTSFlag in each Payload Header. Since than zero indicates that there is a DTSFlag in each Payload
decodingTimeStamp, if present, must be encoded as a difference to Header. Since decodingTimeStamp, if present, must be encoded as
the RTP time stamp, the DTSDeltaLength parameter MUST be present in a difference to the RTP time stamp, the DTSDeltaLength
order to transport decodingTimeStamps with this payload format. parameter MUST be present in order to transport
decodingTimeStamps with this payload format.
CTSDeltaLength: CTSDeltaLength:
The number of bits on which the CTSDelta field is encoded. The The number of bits on which the CTSDelta field is encoded. The
default value is zero and indicates the absence of the CTSFlag and default value is zero and indicates the absence of the CTSFlag
CTSDelta fields in Payload Header. Non-zero values MUST NOT be
signaled in the "Single" mode. Since compositionTimeStamps, if
present, must be encoded as a difference to the RTP time stamp, the
CTSDeltaLength parameter MUST be present in order to transport
compositionTimeStamps using this payload format (in the "Multiple" and CTSDelta fields in Payload Header. Non-zero values MUST NOT
mode). However CTSDeltaLength SHOULD be set to zero (or not be signaled in the "Single" packing style. Since
signaled) for streams that have a constant Access Unit duration compositionTimeStamps, if present, must be encoded as a
(which can be explicitly signaled using the DurationFlag and difference to the RTP time stamp, the CTSDeltaLength parameter
MUST be present in order to transport compositionTimeStamps
using this payload format (in the "Multiple" packing style).
However CTSDeltaLength SHOULD be set to zero (or not signaled)
for streams that have a constant Access Unit duration (which
can be explicitly signaled using the DurationFlag and
AccessUnitDuration field of SLConfigDescriptor). AccessUnitDuration field of SLConfigDescriptor).
OCRDeltaLength: OCRDeltaLength:
The number of bits on which the OCRDelta field is encoded in RSLH. The number of bits on which the OCRDelta field is encoded in
The default value is zero and indicates the absence of OCR for this RSLH. The default value is zero and indicates the absence of
stream. Since objectClockReference -if present- must be encoded as a OCR for this stream. Since objectClockReference -if present-
difference to the RTP time stamp, the OCRDeltaLength parameter MUST must be encoded as a difference to the RTP time stamp, the
be present in order to transport objectClockReferences with this OCRDeltaLength parameter MUST be present in order to transport
payload format. objectClockReferences with this payload format.
SizeLength: SizeLength:
The number of bits on which the PayloadSize field of a Payload The number of bits on which the PayloadSize field of a Payload
Header is encoded. The default value is zero and indicates the Header is encoded. The default value is zero and indicates the
"Single" mode (unless ConstantSize is present). Simultaneous "Single" packing style (unless ConstantSize is present).
presence of this parameter and ConstantSize is illegal. Either the Simultaneous presence of this parameter and ConstantSize is
illegal. Either the SizeLength or ConstantSize parameter MUST
be present in order to signal the "Multiple" packing style of
this payload format.
SizeLength or ConstantSize parameter MUST be present in order to
signal the "Multiple" mode of this payload format.
ConstantSize: ConstantSize:
The constant size in octets of each AU or AU fragment Payload for The constant size in octets of each AU or AU fragment Payload
this stream. The default value is zero and indicates variable AU or for this stream. The default value is zero and indicates
AU fragment Payload size (or the "Single" mode if SizeLength is variable AU or AU fragment Payload size (or the "Single"
absent). Simultaneous presence of this parameter and SizeLength is packing style if SizeLength is absent). Simultaneous presence
illegal. Either the SizeLength or ConstantSize parameter MUST be of this parameter and SizeLength is illegal. Either the
present in order to signal the "Multiple" mode of this payload SizeLength or ConstantSize parameter MUST be present in order
format. When ConstantSize is present the PayloadSize field of the to signal the "Multiple" packing style of this payload format.
When ConstantSize is present the PayloadSize field of the
Payload Header in the RTP packets MUST NOT be present. Payload Header in the RTP packets MUST NOT be present.
IndexLength: IndexLength:
The number of bits on which the Index is encoded in the first The number of bits on which the Index is encoded in the first
Payload Header of a RTP packet. The default value is zero and Payload Header of a RTP packet. The default value is zero and
indicates the absence of Index and IndexDelta for all Payload indicates the absence of Index and IndexDelta for all Payload
Headers. Since SL.packetSequenceNumber -if present- must be mapped Headers. Since SL.packetSequenceNumber -if present- must be
in PayloadHeader, the IndexLength parameter MUST be present in order mapped in the Payload Header, the IndexLength parameter MUST be
to transport SL.packetSequenceNumber with this payload format. present in order to transport SL.packetSequenceNumber with this
payload format.
IndexDeltaLength: IndexDeltaLength:
The number of bits on which the IndexDelta are encoded in any non- The number of bits on which the IndexDelta are encoded in any
first Payload Header. The default value is zero and indicates that non-first Payload Header. The default value is zero and
the serial number MUST be incremented by one for each AU or AU indicates that the serial number MUST be incremented by one for
fragment in the RTP packet (see section 3.5). IndexDeltaLength each AU or AU fragment in the RTP packet (see section 3.5). A
parameter MUST be present when using interleaving with this payload
format.
non-zero IndexDeltaLength parameter MUST be present when using
interleaving with this payload format.
RSLHSectionSizeLength: RSLHSectionSizeLength:
The number of bits that is used to encode the RSLHSectionSize field. The number of bits that is used to encode the RSLHSectionSize
The default value is zero and indicates the absence of the whole field. The default value is zero and indicates the absence of
RSLHSection for all RTP packets of this stream. the whole RSLHSection for all RTP packets of this stream.
SLConfigDescriptor: SLConfigDescriptor:
A base-64 encoding of the SLConfigDescriptor. This SHALL be the A base-64 encoding of the SLConfigDescriptor. This SHALL be the
original SLConfigDescriptor and it SHALL be the same as the one original SLConfigDescriptor and it SHALL be the same as the one
transported by the OD framework, if any. transported by the OD framework, if any.
profile-level-id: profile-level-id:
A decimal representation of the MPEG-4 Profile Level indication A decimal representation of the MPEG-4 Profile Level indication
value. For audio this parameter indicates which MPEG-4 Audio tool value. For audio this parameter indicates which MPEG-4 Audio
subsets are applied to encode the audio stream and is defined in tool subsets are applied to encode the audio stream and is
ISO/IEC 14496-1 [1]. For video this parameter indicates which MPEG-4 defined in ISO/IEC 14496-1 [1]. For video this parameter
Visual tool subsets are applied to encode the video stream and is indicates which MPEG-4 Visual tool subsets are applied to
defined in Table G-1 of ISO/IEC 14496-2 [2]. This parameter MAY be encode the video stream and is defined in Table G-1 of ISO/IEC
used in the capability exchange or session setup procedure to 14496-2 [2]. This parameter MAY be used in the capability
indicate MPEG-4 Profile and Level combination of which the relevant exchange or session setup procedure to indicate MPEG-4 Profile
MPEG-4 media codec is capable. If this parameter is not specified and Level combination of which the relevant MPEG-4 media codec
its default value is 1 (Simple Profile/Level 1) for video (for is capable. If this parameter is not specified its default
compatibility with RFC 3016) and otherwise 0xFE (defined in ISO/IEC value is 1 (Simple Profile/Level 1) for video (for
14496-1 [1] as being the generic default value). compatibility with RFC 3016) and otherwise 254 (0xFE being
defined in ISO/IEC 14496-1 [1] as being the generic default
value).
config: config:
A hexadecimal representation of an octet string that expresses the A hexadecimal representation of an octet string that expresses
media payload configuration. Configuration data is mapped onto the the media payload configuration. Configuration data is mapped
octet string in an MSB-first basis. The first bit of the onto the octet string in an MSB-first basis. The first bit of
configuration data SHALL be located at the MSB of the first octet. the configuration data SHALL be located at the MSB of the first
In the last octet, zero-valued padding bits, if necessary, shall octet. In the last octet, zero-valued padding bits, if
follow the configuration data. For audio streams, config is the necessary, shall follow the configuration data. For audio
audio object type specific decoder configuration data streams, config is the audio object type specific decoder
AudioSpecificConfig() as defined in ISO/IEC 14496-3 [3]. For video configuration data AudioSpecificConfig() as defined in ISO/IEC
this expresses the MPEG-4 Visual configuration information, as 14496-3 [3]. For video this expresses the MPEG-4 Visual
defined in subclause 6.2.1 Start codes of ISO/IEC14496-2 [2] and the configuration information, as defined in subclause 6.2.1 Start
configuration information indicated by this parameter SHALL be the codes of ISO/IEC14496-2 [2] and the configuration information
same as the configuration information in the corresponding MPEG-4 indicated by this parameter SHALL be the same as the
Visual stream, except for first-half-vbv-occupancy and latter-half- configuration information in the corresponding MPEG-4 Visual
stream, except for first-half-vbv-occupancy and latter-half-
vbv-occupancy, if it exists, which may vary in the repeated vbv-occupancy, if it exists, which may vary in the repeated
configuration information inside an MPEG-4 Visual stream (See 6.2.1 configuration information inside an MPEG-4 Visual stream (See
Start codes of ISO/IEC14496-2). 6.2.1 Start codes of ISO/IEC14496-2).
StreamType: StreamType:
The integer value that indicates the type of MPEG-4 stream that is The integer value that indicates the type of MPEG-4 stream that
carried; its coding corresponds to the values of the streamType as is carried; its coding corresponds to the values of the
defined for the DecoderConfigDescriptor in ISO/IEC 14496-1. streamType as defined for the DecoderConfigDescriptor in
ISO/IEC 14496-1.
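The defaults described for the optional parameters can be summarized in a short sketch. It is not normative. It assumes the parameters arrive as a single "name=value; name=value" string (a hypothetical input format), that parameter names are matched exactly as spelled here, and it covers only a subset of the parameters; the function name and the "packing" key are illustrative.

```python
NUMERIC_PARAMS = (
    "DTSDeltaLength", "CTSDeltaLength", "OCRDeltaLength", "SizeLength",
    "ConstantSize", "IndexLength", "IndexDeltaLength", "RSLHSectionSizeLength",
)


def parse_params(fmtp: str, media_type: str) -> dict:
    """Apply the defaults described above to a parameter string.

    fmtp       -- assumed "name=value; name=value" string (hypothetical format)
    media_type -- "video", "audio" or "application"
    """
    raw = {}
    for item in fmtp.split(";"):
        item = item.strip()
        if item:
            name, _, value = item.partition("=")
            raw[name.strip()] = value.strip()

    # Simultaneous presence of SizeLength and ConstantSize is illegal.
    if "SizeLength" in raw and "ConstantSize" in raw:
        raise ValueError("SizeLength and ConstantSize are mutually exclusive")

    # Absent numerical parameters are equivalent to zero.
    params = {name: int(raw.get(name, 0)) for name in NUMERIC_PARAMS}
    params["mode"] = raw.get("mode", "default")

    # profile-level-id defaults to 1 for video and 254 (0xFE) otherwise.
    default_profile = 1 if media_type == "video" else 254
    params["profile-level-id"] = int(raw.get("profile-level-id", default_profile))

    # config is a hexadecimal octet string, MSB first.
    params["config"] = bytes.fromhex(raw.get("config", ""))

    # Either SizeLength or ConstantSize signals the "Multiple" packing style.
    multiple = params["SizeLength"] > 0 or params["ConstantSize"] > 0
    params["packing"] = "Multiple" if multiple else "Single"
    return params


audio = parse_params(
    "SizeLength=13; IndexLength=3; IndexDeltaLength=3; config=1210", "audio"
)
video = parse_params("", "video")
```

With no parameters at all, the result corresponds to the "basic" configuration: every length is zero and the "Single" packing style applies.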
Encoding considerations: Encoding considerations:
System bitstreams MUST be generated according to MPEG-4 System System bitstreams MUST be generated according to MPEG-4 System
specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated specifications (ISO/IEC 14496-1). Video bitstreams MUST be
according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio generated according to MPEG-4 Visual specifications (ISO/IEC
bitstreams MUST be generated according to MPEG-4 Audio 14496-2). Audio bitstreams MUST be generated according to MPEG-
specifications (ISO/IEC 14496-3). If the Sync Layer is used SL 4 Audio specifications (ISO/IEC 14496-3). If the Sync Layer is
streams MUST be generated according to MPEG-4 Sync Layer used SL streams MUST be generated according to MPEG-4 Sync
specifications (ISO/IEC 14496-1 section 10), then in order to read Layer specifications (ISO/IEC 14496-1 section 10), then in
the RSLH parts of this format the SLConfigDescriptor is required. order to read the RSLH parts of this format the
These bitstreams are binary data and MUST be encoded for non-binary SLConfigDescriptor is required. These bitstreams are binary
transport (for Email, the Base64 encoding is sufficient). This type data and MUST be encoded for non-binary transport (for Email,
is also defined for transfer via RTP. The RTP packets MUST be the Base64 encoding is sufficient). This type is also defined
packetized according to the RTP payload format defined in RFC <self- for transfer via RTP. The RTP packets MUST be packetized
reference-to-this>. according to the RTP payload format defined in RFC XXXX.
Security considerations: Security considerations:
As in RFC <self-reference-to-this>. As in RFC XXXX.
Interoperability considerations: Interoperability considerations:
MPEG-4 provides a large and rich set of tools for the coding of MPEG-4 provides a large and rich set of tools for the coding of
visual objects. For effective implementation of the standard, visual objects. For effective implementation of the standard,
subsets of the MPEG-4 tool sets have been provided for use in subsets of the MPEG-4 tool sets have been provided for use in
specific applications. These subsets, called 'Profiles', limit the specific applications. These subsets, called "Profiles", limit
size of the tool set a decoder is required to implement. In order to the size of the tool set a decoder is required to implement. In
restrict computational complexity, one or more 'Levels' are set for order to restrict computational complexity, one or more
each Profile. A Profile@Level combination allows: "Levels" are set for each Profile. A Profile@Level combination
. a codec builder to implement only the subset of the standard he allows:
needs, while maintaining interoperability with other MPEG-4 devices . A codec builder to implement only the subset of the standard
included in the same combination, and he needs, while maintaining interoperability with other MPEG-4
devices included in the same combination, and
. Checking whether MPEG-4 devices comply with the standard
. checking whether MPEG-4 devices comply with the standard
('conformance testing'). ('conformance testing').
A stream SHALL be compliant with the MPEG-4 Profile@Level specified
by the parameter "profile-level-id". Interoperability between a A stream SHALL be compliant with the MPEG-4 Profile@Level
sender and a receiver may be achieved by specifying the parameter specified by the parameter "profile-level-id". Interoperability
"profile-level-id" in MIME content, or by arranging in the between a sender and a receiver may be achieved by specifying
capability exchange/announcement procedure to set this parameter the parameter "profile-level-id" in MIME content, or by
mutually to the same value. arranging in the capability exchange/announcement procedure to
set this parameter mutually to the same value.
Published specification: Published specification:
The specifications for MPEG-4 streams are presented in ISO/IEC The specifications for MPEG-4 streams are presented in ISO/IEC
14496-1, 14496-2, and 14496-3. The RTP payload format is described 14496-1, 14496-2, and 14496-3. The RTP payload format is
in RFC <self-reference-to-this>. described in RFC XXXX.
Applications that use this media type: Applications that use this media type:
Multimedia streaming and conferencing tools, Internet messaging and Multimedia streaming and conferencing tools.
Email applications.
Additional information: none Additional information: none
Magic number(s): none Magic number(s): none
Gentric et al. Expires March 2002 32
RTP Payload Format for MPEG-4 Streams February 2002
File extension(s): File extension(s):
None. A file format with the extension .mp4 has been defined for None. A file format with the extension .mp4 has been defined
MPEG-4 content but is not directly correlated with this MIME type for MPEG-4 content but is not directly correlated with this
whose sole purpose is RTP transport. MIME type whose sole purpose is RTP transport.
Macintosh File Type Code(s): none Macintosh File Type Code(s): none
Person & email address to contact for further information: Person & email address to contact for further information:
Authors of RFC <self-reference-to-this>. Authors of RFC XXXX.
Intended usage: COMMON Intended usage: COMMON
Author/Change controller: Author/Change controller:
Authors of RFC <self-reference-to-this>. Authors of RFC XXXX, IETF Audio/Video Transport working group.
4.2 Concatenation of parameters 4.2 Concatenation of parameters
Multiple parameters SHOULD be expressed as a MIME media type string, Multiple parameters SHOULD be expressed as a MIME media type string,
in the form of a semicolon-separated list of parameter=value pairs in the form of a semicolon-separated list of parameter=value pairs
(see examples below). (see examples below).
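The concatenation rule above can be sketched in a few lines of Python. This is only an illustration: the function name format_mime_parameters is hypothetical and not defined by this memo, and the sketch assumes that values contain no semicolons.

```python
def format_mime_parameters(params):
    """Join parameter/value pairs into a semicolon-separated list.

    `params` is a dict such as {"StreamType": 4, "DTSDeltaLength": 4}.
    Hypothetical helper for illustration; values are assumed not to
    contain ';' characters.
    """
    return ";".join(f"{name}={value}" for name, value in params.items())


print(format_mime_parameters({"StreamType": 4, "DTSDeltaLength": 4}))
```

The resulting string has the same shape as the parameter lists used in the SDP example of section 4.3.2.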
4.3 Usage of SDP 4.3 Usage of SDP
4.3.1 The a=fmtp keyword 4.3.1 The a=fmtp keyword
It is assumed that one typical way to transport the above-described It is assumed that one typical way to transport the above-described
parameters associated with this payload format is via an SDP [10] parameters associated with this payload format is via an SDP [10]
message for example transported to the client in reply to a RTSP message for example transported to the client in reply to a RTSP
[13] DESCRIBE message or via SAP [14]. In that case the (a=fmtp) [13] DESCRIBE message or via SAP [14]. In that case the (a=fmtp)
keyword MUST be used as described in RFC 2327 [10, section 6]. The keyword MUST be used as described in RFC 2327 [10, section 6]. The
syntax being then: syntax being then:
a=fmtp:<format> <parameter name>=<value> a=fmtp:<format> <parameter name>=<value>
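As an illustration of the syntax above, a receiver might split an a=fmtp line into its format (payload type) and its parameters as follows. This is a minimal sketch: the function name parse_fmtp is hypothetical, values are kept as strings, and tolerance for spaces after semicolons and for a trailing semicolon is an assumption of this sketch rather than a rule of this memo.

```python
def parse_fmtp(line):
    """Split 'a=fmtp:<format> <name>=<value>;...' into (format, dict).

    Hypothetical helper for illustration only. Empty entries (for
    example after a trailing ';') are skipped.
    """
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    fmt, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("parameter without a value: " + item)
        params[name.strip()] = value.strip()
    return fmt, params


print(parse_fmtp("a=fmtp:97 StreamType=4;DTSDeltaLength=4"))
```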
4.3.2 SDP example 4.3.2 SDP example
The following is an example of SDP syntax for the description of a The following is an example of SDP syntax for the description of a
session containing one MPEG-4 video, one MPEG-4 audio stream and session containing one MPEG-4 video, one MPEG-4 audio stream and
three MPEG-4 system streams, the first one being BIFS, the second three MPEG-4 system streams, the first one being BIFS, the second
one OD and the third one IPMP. All are transported using this format one OD stream and the third one IPMP. All are transported using this
and the AVP profile [12]. Note the usage of some MIME parameters: format and the AVP profile [12]. Note the usage of some MIME
all streams display their streamtype; the video stream uses DTS with parameters: all streams display their StreamType; the video stream
DTSDelta encoded on 4 bits; the audio stream uses the "Multiple" uses DTS with DTSDelta encoded on 4 bits; the audio stream uses the
mode with 12 bits to describe the size of each AU or AU fragment "Multiple" packing style with 12 bits to describe the size of each
payload. See the Appendix for more examples. AU or AU fragment payload. See the Appendix for more examples.
o= .... o= ....
i= .... i= ....
c=IN IP4 123.234.71.112 c=IN IP4 123.234.71.112
m=video 1034 RTP/AVP 97 m=video 1034 RTP/AVP 97
a=fmtp:97 StreamType=4;DTSDeltaLength=4
a=rtpmap:97 mpeg4-generic a=rtpmap:97 mpeg4-generic
a=fmtp:97 StreamType=4;DTSDeltaLength=4
m=audio 1810 RTP/AVP 98 m=audio 1810 RTP/AVP 98
a=fmtp:98 StreamType=5; SizeLength=12; profile-level-id=1;
config=7866E7E6EF
a=rtpmap:98 mpeg4-generic
a=rtpmap:98 mpeg4-generic
a=fmtp:98 StreamType=5;SizeLength=12;
m=application 1234 RTP/AVP 99 m=application 1234 RTP/AVP 99
a=rtpmap:99 mpeg4-generic a=rtpmap:99 mpeg4-generic
a=fmtp:99 StreamType=3 a=fmtp:99 StreamType=3
m=application 1236 RTP/AVP 100
a=rtpmap:100 mpeg4-generic
a=fmtp:100 StreamType=1
m=application 1238 RTP/AVP 101
a=rtpmap:101 mpeg4-generic
a=fmtp:101 StreamType=7
m=application 1236 RTP/AVP 99 5. IANA Considerations
a=rtpmap:99 mpeg4-generic
a=fmtp:99 StreamType=1
m=application 1238 RTP/AVP 99 One new MIME subtype is to be registered, see Section 4.1.
a=rtpmap:99 mpeg4-generic
a=fmtp:99 StreamType=7
5. Other issues 6. Other issues
5.1 SL packetized stream reconstruction 6.1 SL-packetized stream reconstruction
The purpose of this section is to document how a receiver can The purpose of this section is to document how a receiver can
reconstruct a valid SL packetized stream. Since this format directly reconstruct a valid SL-packetized stream. This reconstruction is
transports SL packets this reconstruction is performed by reversing performed by reversing the payload structure rules (section 3). We
the payload structure rules (section 3). We explicitly describe here explicitly describe here the most complex transformations.
the most complex transformations.
In the following let (i) be the index of SL packets inside one RTP In the following let (i) be the index of SL packets inside one RTP
packet (starting at zero for each RTP packet), let SLPacketHeader.x packet (starting at zero for each RTP packet), let SLPacketHeader.x
denote field x of the reconstructed SL packet header, let denote field x of the reconstructed SL packet header, let
PayloadHeader.x denote field x of the received PayloadHeader, etc. PayloadHeader.x denote field x of the received PayloadHeader, etc.
SLPacketHeader.packetSequenceNumber is restored from SLPacketHeader.packetSequenceNumber is restored from
PayloadHeader.Index and PayloadHeader.IndexDelta using: PayloadHeader.Index and PayloadHeader.IndexDelta using:
If ( IndexLength == 0) { // or is absent If ( IndexLength == 0) { // or is absent
if ( SLConfig.packetSeqNumLength == 0 ) { if ( SLConfig.packetSeqNumLength == 0 ) {
// this stream does not have SL packet sequence number // this stream does not have SL packet sequence number
} }
skipping to change at line 1588 skipping to change at line 1856
} }
} }
else { // IndexLength is not zero else { // IndexLength is not zero
if ( SLConfig.packetSeqNumLength == 0 ) { if ( SLConfig.packetSeqNumLength == 0 ) {
// the original SL stream does not have SL packet // the original SL stream does not have SL packet
// sequence numbers, typically the sender inserted them // sequence numbers, typically the sender inserted them
// in order to implement interleaving at the RTP level; // in order to implement interleaving at the RTP level;
// they must be ignored for SL stream reconstruction // they must be ignored for SL stream reconstruction
} }
else { else {
if (i == 0){ // first SL packet in RTP packet if (i == 0){ // first SL packet in RTP packet
SLPacketHeader.packetSequenceNumber(0) = SLPacketHeader.packetSequenceNumber(0) =
PayloadHeader.Index(0); PayloadHeader.Index(0);
} }
else { // remaining SL packets else { // remaining SL packets
SLPacketHeader.packetSequenceNumber(i+1)= SLPacketHeader.packetSequenceNumber(i+1)=
SLPacketHeader.packetSequenceNumber(i) SLPacketHeader.packetSequenceNumber(i)
+ PayloadHeader.IndexDelta(i+1) + PayloadHeader.IndexDelta(i+1)
+1; +1;
} }
skipping to change at line 1615 skipping to change at line 1887
from 32 bits to SLConfig.timeStampLength, which may be smaller or from 32 bits to SLConfig.timeStampLength, which may be smaller or
larger than 32 bits: larger than 32 bits:
If (timeStampLength < 32 ) { // short SL time stamps If (timeStampLength < 32 ) { // short SL time stamps
corrected(x) = LSB(x); // only the timeStampLength LSBits of x corrected(x) = LSB(x); // only the timeStampLength LSBits of x
} }
else If (timeStampLength > 32 ) { // long SL time stamps else If (timeStampLength > 32 ) { // long SL time stamps
corrected(x) = x + m; // start with m=0 corrected(x) = x + m; // start with m=0
if ( x(i) < x(i-1) ) { // 32 bits RTPTS roll over has occurred if ( x(i) < x(i-1) ) { // 32 bits RTPTS roll over has occurred
m += 2^32; m += 2^32;
} }
} }
else If (timeStampLength == 32 ) { // recommended value else If (timeStampLength == 32 ) { // recommended value
corrected(x) = x; // direct mapping corrected(x) = x; // direct mapping
} }
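The time stamp correction above can be sketched in Python as follows. This is one possible reading, not a normative algorithm: the class name TimestampCorrector is hypothetical, RTP time stamps are assumed to be presented in increasing (decoding) order so that a smaller value signals a 32-bit roll over, the roll over test is applied before the offset m is added, and the result is reduced to timeStampLength bits.

```python
class TimestampCorrector:
    """Map 32-bit RTP time stamps to SLConfig.timeStampLength bits.

    Hypothetical helper for illustration; assumes time stamps arrive
    in increasing order, apart from 32-bit roll over.
    """

    def __init__(self, time_stamp_length):
        self.length = time_stamp_length
        self.m = 0          # accumulated roll over offset
        self.prev = None    # previous 32-bit RTP time stamp

    def corrected(self, x):
        if self.length < 32:
            # short SL time stamps: keep the timeStampLength LSBits
            return x & ((1 << self.length) - 1)
        if self.length == 32:
            # recommended value: direct mapping
            return x
        # long SL time stamps: extend across 32-bit roll over
        if self.prev is not None and x < self.prev:
            self.m += 1 << 32
        self.prev = x
        return (x + self.m) & ((1 << self.length) - 1)


c = TimestampCorrector(40)
print(hex(c.corrected(0xFFFFFFF0)), hex(c.corrected(0x00000010)))
```

A receiver that must tolerate packet reordering would need a more careful roll over test than the simple comparison used here.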
if ( CTSDeltaLength == 0) { // or CTSDeltaLength is absent if ( CTSDeltaLength == 0) { // or CTSDeltaLength is absent
// CTS is not transported for this RTP stream // CTS is not transported for this RTP stream
if (i == 0){ // first SL packet in RTP packet if (i == 0){ // first SL packet in RTP packet
skipping to change at line 1644 skipping to change at line 1912
} }
else { else {
// ignore // ignore
} }
} }
else { else {
// empty // empty
} }
} }
else { // non-first SL packets in RTP packet else { // non-first SL packets in RTP packet
if ( SLConfig.useTimeStamps == 1 ) { if ( SLConfig.useTimeStamps == 1 ) {
if ( SLPacketHeader.accessUnitStartFlag == 1 ) { if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
SLPacketHeader.compositionTimeStampFlag(i) = 0; SLPacketHeader.compositionTimeStampFlag(i) = 0;
} }
else { else {
// ignore // ignore
} }
} }
else { else {
// empty // empty
skipping to change at line 1671 skipping to change at line 1943
SLPacketHeader.compositionTimeStampFlag(i) = SLPacketHeader.compositionTimeStampFlag(i) =
PayloadHeader.CTSFlag(i); PayloadHeader.CTSFlag(i);
SLPacketHeader.compositionTimeStamp(i) = SLPacketHeader.compositionTimeStamp(i) =
corrected(RTP TimeStamp) + corrected(RTP TimeStamp) +
PayloadHeader.CTSDelta(i); PayloadHeader.CTSDelta(i);
} }
else { else {
// ignore CTSFlag (which must be zero) // ignore CTSFlag (which must be zero)
} }
else { else {
// this is strange and sub-optimal at best // this is strange and sub-optimal at best
// a receiver should ignore this // a receiver should ignore this
} }
} }
if ( DTSDeltaLength == 0) { // or DTSDeltaLength is absent if ( DTSDeltaLength == 0) { // or DTSDeltaLength is absent
// DTS is not transported for this stream // DTS is not transported for this stream
if ( SLConfig.useTimeStamps == 1 ) { if ( SLConfig.useTimeStamps == 1 ) {
if ( SLPacketHeader.accessUnitStartFlag == 1 ) { if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
SLPacketHeader.decodingTimeStampFlag(i) = 0; SLPacketHeader.decodingTimeStampFlag(i) = 0;
skipping to change at line 1701 skipping to change at line 1969
// empty // empty
} }
} }
else { else {
// DTS is transported for this stream // DTS is transported for this stream
if ( SLConfig.useTimeStamps == 1 ) { if ( SLConfig.useTimeStamps == 1 ) {
if ( SLPacketHeader.accessUnitStartFlag == 1 ) { if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
SLPacketHeader.decodingTimeStampFlag(i) = SLPacketHeader.decodingTimeStampFlag(i) =
PayloadHeader.DTSFlag(i); PayloadHeader.DTSFlag(i);
SLPacketHeader.decodingTimeStamp(i)= SLPacketHeader.decodingTimeStamp(i)=
SLPacketHeader.compositionTimeStamp(i) SLPacketHeader.compositionTimeStamp(i)
- PayloadHeader.DTSDelta(i); // DTS <= CTS always - PayloadHeader.DTSDelta(i); // DTS <= CTS always
} }
else { else {
// ignore DTSFlag (which must be zero) // ignore DTSFlag (which must be zero)
} }
} }
else { else {
// this is strange and sub-optimal at best // this is strange and sub-optimal at best
// a receiver should ignore this // a receiver should ignore this
skipping to change at line 1728 skipping to change at line 2000
} }
else { else {
// illegal, normally the sender MUST detect // illegal, normally the sender MUST detect
// OCRs, replace them with OCRDelta and set // OCRs, replace them with OCRDelta and set
// a relevant OCRDeltaLength value // a relevant OCRDeltaLength value
} }
} }
else { else {
if ( SLConfig.OCRLength == 0 ) { if ( SLConfig.OCRLength == 0 ) {
// this is strange and sub-optimal at best // this is strange and sub-optimal at best
// a receiver should ignore this // a receiver should ignore this
} }
else { else {
SLPacketHeader.OCRflag(i) = RSLH.OCRFlag(i); SLPacketHeader.OCRflag(i) = RSLH.OCRFlag(i);
if ( SLPacketHeader.OCRflag(i) == 1) { if ( SLPacketHeader.OCRflag(i) == 1) {
SLPacketHeader.objectClockReference(i) = SLPacketHeader.objectClockReference(i) =
corrected(RTP TimeStamp) + RSLH.OCRDelta(i); corrected(RTP TimeStamp) + RSLH.OCRDelta(i);
} }
} }
} }
In the "Single" mode the AccessUnitEndFlag, if needed, is restored In the "Single" packing style the AccessUnitEndFlag, if needed, is
from the M bit, as follows: restored from the M bit, as follows:
if ( SLConfig.useAccessUnitEndFlag == 0 ) { if ( SLConfig.useAccessUnitEndFlag == 0 ) {
// this SL stream does not signal access unit ends // this SL stream does not signal access unit ends
} }
else { else {
SLPacketHeader.AccessUnitEndFlag = M bit; SLPacketHeader.AccessUnitEndFlag = M bit;
} }
In the "Multiple" mode the AccessUnitEndFlag is untouched in RSLH. In the "Multiple" packing style the AccessUnitEndFlag is untouched
in RSLH.
The other SL packet header fields SHALL remain as found in RSLH. The other SL packet header fields SHALL remain as found in RSLH.
It is obvious that in the general case the reconstruction of the It is obvious that in the general case the reconstruction of the
original SL packetized stream requires SL-awareness. However this original SL packetized stream requires SL-awareness. However this
payload format allows in all cases a receiver that does not know payload format allows in all cases a receiver that does not know
about the SL syntax to reconstruct the semantics of Elementary about the SL syntax to reconstruct the semantics of Elementary
Streams for the following very useful features: Streams for the following very useful features:
- Packet order (decoding order) - Packet order (decoding order)
- Access Unit boundaries (using the M bit) - Access Unit boundaries (using the M bit)
- Access Unit fragments (fragment boundaries using PayloadSize) - Access Unit fragments (fragment boundaries using PayloadSize)
- Composition Time Stamps, according to: - Composition Time Stamps, according to:
compositionTimeStamp(i) = RTP TimeStamp + CTSDelta(i); compositionTimeStamp(i) = RTP TimeStamp + CTSDelta(i);
skipping to change at line 1777 skipping to change at line 2049
decodingTimeStamp(i) = compositionTimeStamp(i) - DTSDelta(i); decodingTimeStamp(i) = compositionTimeStamp(i) - DTSDelta(i);
- Packet serial number, according to: - Packet serial number, according to:
if (i == 0){ // first SL packet in RTP packet if (i == 0){ // first SL packet in RTP packet
packet serial number(0) = Index(0); packet serial number(0) = Index(0);
} }
else { // remaining SL packets else { // remaining SL packets
packet serial number (i+1) = packet serial number (i) packet serial number (i+1) = packet serial number (i)
+ IndexDelta(i+1) + 1; + IndexDelta(i+1) + 1;
} }
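The rules listed above for a receiver without SL awareness can be combined in a short Python sketch. The function name and the dictionary keys are hypothetical, and treating an absent CTSDelta or DTSDelta as zero is an assumption of this sketch.

```python
def reconstruct_without_sl(rtp_timestamp, headers):
    """Derive serial number, CTS and DTS for each unit of one RTP packet.

    `headers` holds one dict per unit, in packet order: the first
    carries "index", the others "index_delta"; "cts_delta" and
    "dts_delta" are optional and default to 0 (an assumption).
    """
    result = []
    serial = None
    for i, h in enumerate(headers):
        if i == 0:
            serial = h["index"]                   # first unit in RTP packet
        else:
            serial = serial + h["index_delta"] + 1
        cts = rtp_timestamp + h.get("cts_delta", 0)
        dts = cts - h.get("dts_delta", 0)         # DTS <= CTS always
        result.append({"serial": serial, "cts": cts, "dts": dts})
    return result


units = reconstruct_without_sl(90000, [
    {"index": 5},
    {"index_delta": 0, "cts_delta": 3000},
    {"index_delta": 2, "cts_delta": 6000, "dts_delta": 3000},
])
print(units)
```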
5.2 Handling of scene description streams 6.2 Handling of scene description streams
MPEG-4 introduces new stream types as described in section 1 namely MPEG-4 introduces new stream types as described in section 1 namely
Object Descriptors and BIFS. In the following both OD and BIFS are Object Descriptors and BIFS. In the following both OD and BIFS are
discussed on the same basis i.e. as "scene description". discussed on the same basis i.e. as "scene description".
Considering scene description as a "stream-able" type of content is Considering scene description as a "stream-able" type of content is
a rather new concept and for that reason some specific comments are a rather new concept and for that reason some specific comments are
needed. needed.
Typically scene descriptions are encoded in such a way that Typically scene descriptions are encoded in such a way that
information loss would in the general case cripple the presentation information loss would in the general case cripple the presentation
beyond any hope of repair by the receiver. Still this is well suited beyond any hope of repair by the receiver. This is acceptable for a
for a number of multimedia applications where the scene is first made number of multimedia applications where the scene is first made
available via reliable channels to the client and then played. This available via reliable channels to the client and then played. This
payload format is not intended for this type of applications for payload format is not primarily intended for this type of
which download of MPEG-4 interchange (.mp4) files is typical. applications for which download of MPEG-4 interchange (.mp4) files
However this payload format can also be used. It is then RECOMMENDED would be typical. However this payload format can also be used. It
that the RTP packets should be transported using TCP (for example is then RECOMMENDED however that the RTP packets should be
inside RTSP as described in [13, section 10.12]) or any other transported using TCP (for example inside RTSP as described in [13,
reliable protocol. section 10.12]) or any other reliable protocol.
On the other hand MPEG-4 has introduced the possibility to On the other hand MPEG-4 has introduced the possibility to
dynamically change the scene description by sending animation dynamically change the scene description by sending animation
information (changes in parameters) and structural change information (changes in parameters) and structural change
information (updates). Since this information has to be sent in a information (updates). Since this information has to be sent in a
timely fashion MPEG-4 has defined a number of techniques in order to timely fashion MPEG-4 has defined a number of techniques in order to
encode the scene description in a manner that makes it behave encode the scene description in a manner that makes it behave
similarly to other temporal encoding schemes such as audio and similarly to other temporal encoding schemes such as audio and
video. This payload format is intended for this usage. video. This payload format is intended for this usage.
Note that in many cases the application will consist of first the Note that in many cases the application will consist of first the
reliable transmission of a static initial scene followed by the reliable transmission of a static initial scene followed by the
streaming of animations and updates. For this reason the usage of streaming of animations and updates. For this reason the usage of
this payload format is attractive since it offers a unique solution. this payload format is attractive since it offers a unique solution.
Senders must be aware that suitable schemes should be used when Senders must be aware that suitable schemes should be used when
scene description streams transport sensitive configuration scene description streams transport sensitive configuration
information. For example in case the RTP packet transporting an OD- information. For example in case the RTP packet transporting an OD-
update command would be lost, the corresponding media stream would update command would be lost, the corresponding media stream would
not be accessible by the receiver. not be accessible by the receiver.
skipping to change at line 1838 skipping to change at line 2110
update commands, there is a need to send both update commands and update commands, there is a need to send both update commands and
full BIFS/OD refresh. For that reason MPEG-4 defines Random Access full BIFS/OD refresh. For that reason MPEG-4 defines Random Access
Points (RAP) for scene description streams (OD and BIFS) where by Points (RAP) for scene description streams (OD and BIFS) where by
definition a decoder can restart decoding i.e. receives a "full definition a decoder can restart decoding i.e. receives a "full
update" of the scene. This mechanism is called Scene and Object update" of the scene. This mechanism is called Scene and Object
Description Carousel. The AU Sequence Number field of SL Packet Description Carousel. The AU Sequence Number field of SL Packet
Header is used to support this behavior at the Sync Layer. When two Header is used to support this behavior at the Sync Layer. When two
access units are sent consecutively with the same AU Sequence access units are sent consecutively with the same AU Sequence
Number, the second one is assumed to be a semantic repetition of the Number, the second one is assumed to be a semantic repetition of the
first. If a receiver starts to listen in the middle of a session or first. If a receiver starts to listen in the middle of a session or
has detected losses, it can skip all received Access Units until has detected losses, it can ignore all received AUs until such a
RAP. The periodicity of transmission of these RAPs should be
such a RAP. The periodicity of transmission of these RAPs should be
chosen/adjusted depending on the application and the network it is chosen/adjusted depending on the application and the network it is
deployed on; i.e. exactly like Intra-coded frames for video, it is deployed on; i.e. exactly like Intra-coded frames for video, it is
the responsibility of the sender to make sure the periodicity of the responsibility of the sender to make sure the periodicity of
RAPs is suitable. RAPs is suitable.
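The receiver behavior described above can be sketched in Python. This is an illustrative sketch under stated assumptions, not part of the payload format: the function name and the keys "rap", "au_seq" and "lost_before" are hypothetical, and loss detection (for example from RTP sequence number gaps) is assumed to have been done elsewhere.

```python
def filter_scene_access_units(access_units):
    """Keep only the scene description AUs a receiver can safely decode.

    Each AU is a dict with "rap" (bool), "au_seq" (AU Sequence Number)
    and optionally "lost_before" (True if a loss was detected just
    before this AU). All key names are hypothetical.
    """
    synced = False      # False until a RAP has been received
    last_seq = None     # AU Sequence Number of the last accepted AU
    accepted = []
    for au in access_units:
        if au.get("lost_before"):
            synced = False              # wait for the next RAP
        if not synced:
            if not au["rap"]:
                continue                # ignore AUs until a RAP
            synced = True
        elif au["au_seq"] == last_seq:
            continue                    # semantic repetition, already applied
        last_seq = au["au_seq"]
        accepted.append(au)
    return accepted


stream = [
    {"rap": False, "au_seq": 1},
    {"rap": True, "au_seq": 2},
    {"rap": True, "au_seq": 2},
    {"rap": False, "au_seq": 3},
    {"rap": False, "au_seq": 4, "lost_before": True},
    {"rap": True, "au_seq": 5},
]
print([au["au_seq"] for au in filter_scene_access_units(stream)])
```

A receiver that joins in mid-session simply starts with no RAP seen, which is the initial state of the sketch.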
5.3 Multiplexing 6.3 Overlap with RFC 3016
This payload format has been designed to have a (large) overlap with
RFC 3016 [7]. The conditions for this overlap are:
Conditions for RFC 3016:
C1. MPEG-4 video elementary streams only
C2. There MUST be a single VOP or Video Packet per RTP packet (which
is only recommended in RFC 3016)
C3. The decoder configuration MUST be signaled out-of-band either
using the Config mime parameter or using the OD framework
Conditions for this payload format:
C4. No MIME parameters defined (or all set to zero), i.e. "Single"
packing style with empty Payload Header and empty RSLH.
C5. Receivers MUST be ready to accept (and ignore) video
configuration headers (e.g. VOSH, VO and VOL) and visual-object-
sequence-end-code transported in-band.
Under conditions C2 and C4 the MPEG-4 video RTP packet structures
are identical. Since C4 and C5 MUST be supported by implementations
of this specification the conditions for RTP streams backward
compatibility of this specification with RFC3016 are established
when RFC3016 is used with condition C1, C2 and C3. Technically the
most stringent condition is C2 but it is also a condition that makes
a lot of sense for many reasons, whatever the application.
Furthermore the MIME parameters have been aligned, specifically the
parameters "config" and "profile-level-id" have the same name and
signification in RFC3016 and in this memo.
The remaining difference is therefore the MIME subtype name. It
would be desirable then that specifications built upon this memo and
enforcing the above minor usage restrictions of RFC3016 in order to
provide a backward compatible solution would then specify that
receivers can interpret the MIME subtype name "MP4V-ES" as being
equivalent to MIME type "video" with subtype name "mpeg4-generic"
and vice versa.
In short this payload format is backward compatible with RFC3016 for
video used in the recommended fashion.
6.4 Multiplexing
An advanced MPEG-4 session may involve a large number of objects An advanced MPEG-4 session may involve a large number of objects
that may be as many as a few hundred, transporting each ES as an that may be as many as a few hundred, transporting each ES as an
individual RTP stream may not always be practical. Allocating and individual RTP stream may not always be practical. Allocating and
controlling hundreds of destination addresses for each MPEG-4 controlling hundreds of destination addresses for each MPEG-4
session may pose insurmountable session administration problems. session may pose insurmountable session administration problems.
The input/output processing overhead at the end-points will be The input/output processing overhead at the end-points will be
extremely high also. Additionally, low delay transmission of low extremely high also. Additionally, low delay transmission of low
bitrate data streams, e.g. facial animation parameters, results in bitrate data streams, e.g. facial animation parameters, results in
extremely high header overheads. extremely high header overheads.
skipping to change at line 1880 skipping to change at line 2193
be a candidate for this approach. be a candidate for this approach.
For MPEG-4 applications, the multiplexing technique needs to address For MPEG-4 applications, the multiplexing technique needs to address
the following requirements: the following requirements:
i. The ESs multiplexed in one stream can change frequently during a i. The ESs multiplexed in one stream can change frequently during a
session. Consequently, the coding type, individual packet size and session. Consequently, the coding type, individual packet size and
temporal relationships between the multiplexed data units must be temporal relationships between the multiplexed data units must be
handled dynamically. handled dynamically.
ii. The multiplexing scheme should have a mechanism to determine the ii. The multiplexing scheme should have a mechanism to determine the
ES identifier (ES_ID) for each of the multiplexed packets. ES_ID is ES identifier (ES_ID) for each of the multiplexed packets. ES_ID is
not a part of the SL header. not a part of the SL header.
iii. In general, an SL packet does not contain information about its iii. In general, an SL packet does not contain information about its
size. The multiplexing scheme should be able to delineate the size. The multiplexing scheme should be able to delineate the
multiplexed packets whose lengths may vary from a few octets to multiplexed packets whose lengths may vary from a few octets to
close to the path-MTU. close to the path-MTU.
5.5 Overlap with RFC 3016 7. Security Considerations
This payload format has been designed to have a (large) overlap with
RFC 3016 [7]. The conditions for this overlap are:
Conditions for RFC 3016:
i. MPEG-4 video elementary streams only
ii. There MUST be a single VOP or Video Packet per RTP packet (only
recommended in RFC 3016)
iii. The decoder configuration MUST be signaled out-of-band either
using the Config mime parameter or using the OD framework
Conditions for this payload format:
i. No structural parameters defined (or all set to zero), i.e.
"Single" mode with empty Payload Header and empty RSLH.
ii. Receivers MUST be ready to accept (and ignore) video
configuration headers (e.g. VOSH, VO and VOL) and visual-object-
sequence-end-code transported in-band.
6. Security Considerations
RTP packets using the payload format defined in this specification RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP are subject to the security considerations discussed in the RTP
specification [5]. This implies that confidentiality of the media specification [5]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be with this payload format is applied end-to-end, encryption may be
performed on the compressed data so there is no conflict between the performed on the compressed data so there is no conflict between the
two operations. The packet processing complexity of this payload two operations. The packet processing complexity of this payload
type (i.e. excluding media data processing) does not exhibit any type (i.e. excluding media data processing) does not exhibit any
significant non-uniformity in the receiver side to cause a denial- significant non-uniformity in the receiver side to cause a denial-
of-service threat. of-service threat.
However, it is possible to inject non-compliant MPEG streams (Audio, However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers which Video, and Systems) to overload the receiver/decoder's buffers,
might compromise the functionality of the receiver or even crash it. which might compromise the functionality of the receiver or even
This is especially true for end-to-end systems like MPEG where the crash it. This is especially true for end-to-end systems like MPEG
buffer models are precisely defined. where the buffer models are precisely defined.
MPEG-4 Systems supports stream types including commands that are MPEG-4 Systems supports stream types including commands that are
executed on the terminal like OD commands, BIFS commands, etc. and executed on the terminal like OD commands, BIFS commands, etc. and
programmatic content like MPEG-J (Java(TM) Byte Code) and programmatic content like MPEG-J (Java(TM) Byte Code) and
ECMAScript. It is possible to use one or more of the above in a ECMAScript. It is possible to use one or more of the above in a
manner non-compliant to MPEG to crash or temporarily make the manner non-compliant to MPEG to crash or temporarily make the
receiver unavailable. receiver unavailable.
Authentication mechanisms can be used to validate the sender and Authentication mechanisms can be used to validate the sender and
the data to prevent security problems due to non-compliant malignant the data to prevent security problems due to non-compliant malignant
skipping to change at line 1952 skipping to change at line 2247
defines a set of Java APIs and a secure execution model. MPEG-J defines a set of Java APIs and a secure execution model. MPEG-J
content can call this set of APIs and Java(TM) methods from a set of content can call this set of APIs and Java(TM) methods from a set of
Java packages supported in the receiver within the defined security Java packages supported in the receiver within the defined security
model. According to this security model, downloaded byte code is model. According to this security model, downloaded byte code is
forbidden to load libraries, define native methods, start programs, forbidden to load libraries, define native methods, start programs,
read or write files, or read system properties. read or write files, or read system properties.
Receivers can implement intelligent filters to validate the buffer Receivers can implement intelligent filters to validate the buffer
requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J, requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
Gentric et al. Expires March 2002 35 Gentric et al. Expires March 2002 41
RTP Payload Format for MPEG-4 Streams September 2001 RTP Payload Format for MPEG-4 Streams February 2002
ECMAScript) commands in the streams. However, this can increase the ECMAScript) commands in the streams. However, this can increase the
complexity significantly. complexity significantly.
7. Acknowledgements 8. Acknowledgements
This document evolved across several years thanks to contributions This document evolved across several years through many revisions
from a large number of people since it is based on work within the thanks to contributions from a large number of people since it is
IETF AVT working group and various ISO MPEG working groups, based on work within the IETF AVT working group and various ISO MPEG
especially the 4-on-IP ad-hoc group. The authors wish to thank working groups, especially the 4-on-IP ad-hoc group. The authors
Olivier Avaro, Stephen Casner, Guido Franceschini, Art Howarth, wish to thank Olivier Avaro, Stephen Casner, Guido Franceschini,
Dave Mackie, Dave Singer, and Stephan Wenger for their valuable Art Howarth, Dave Mackie, Dave Singer, and Stephan Wenger for their
comments and support. Attentive readers and early implementers also valuable comments and support. Attentive readers and early
found flaws and bugs, thank you all. implementers also found flaws and bugs, thank you all.
8. References 9. References
[1] ISO/IEC 14496-1:2001 MPEG-4 Systems [1] ISO/IEC 14496-1:2001 MPEG-4 Systems
[2] ISO/IEC 14496-2:2001 MPEG-4 Visual [2] ISO/IEC 14496-2:2001 MPEG-4 Visual
[3] ISO/IEC 14496-3:2001 MPEG-4 Audio [3] ISO/IEC 14496-3:2001 MPEG-4 Audio
[4] ISO/IEC 14496-6:2001 Delivery Multimedia Integration Framework. [4] ISO/IEC 14496-6:2001 Delivery Multimedia Integration Framework.
[5] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: A [5] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: A
skipping to change at line 2009 skipping to change at line 2304
2327, Internet Engineering Task Force, April 1998. 2327, Internet Engineering Task Force, April 1998.
[11] C. Roux et al., RTP Payload Format for MPEG-4 FlexMultiplexed [11] C. Roux et al., RTP Payload Format for MPEG-4 FlexMultiplexed
Streams, work in progress, draft-curet-avt-rtp-mpeg4-flexmux-00.txt, Streams, work in progress, draft-curet-avt-rtp-mpeg4-flexmux-00.txt,
February 2001. February 2001.
[12] H. Schulzrinne, RTP Profile for Audio and Video Conferences [12] H. Schulzrinne, RTP Profile for Audio and Video Conferences
with Minimal Control, RFC 1890, Internet Engineering Task Force, with Minimal Control, RFC 1890, Internet Engineering Task Force,
January 1996. January 1996.
[13] H. Schulzrinne, A. Rao, R. Lanphier, Real Time Streaming [13] H. Schulzrinne, A. Rao, R. Lanphier, Real Time Streaming
Protocol, RFC 2326, Internet Engineering Task Force, April 1998. Protocol, RFC 2326, Internet Engineering Task Force, April 1998.
[14] M. Handley, C. Perkins, E. Whelan, Session Announcement [14] M. Handley, C. Perkins, E. Whelan, Session Announcement
Protocol, RFC 2974, Internet Engineering Task Force, October 2000. Protocol, RFC 2974, Internet Engineering Task Force, October 2000.
9. Authors' Addresses 10. Authors' Addresses
Andrea Basso Andrea Basso
AT&T Labs Research AT&T Labs Research
200 Laurel Avenue 200 Laurel Avenue
Middletown, NJ 07748 Middletown, NJ 07748
USA USA
e-mail: basso@research.att.com e-mail: basso@research.att.com
M. Reha Civanlar M. Reha Civanlar
AT&T Labs - Research AT&T Labs - Research
skipping to change at line 2055 skipping to change at line 2350
Germany Germany
e-mail: herpelc@thmulti.com e-mail: herpelc@thmulti.com
Zvi Lifshitz Zvi Lifshitz
Optibase Ltd. Optibase Ltd.
7 Shenkar St. 7 Shenkar St.
Herzliya 46120 Herzliya 46120
Israel Israel
e-mail: zvil@optibase.com e-mail: zvil@optibase.com
Young-kwon Lim Young-Kwon Lim
mp4cast (MPEG-4 Internet Broadcasting Solution Consortium) net&tv Co., Ltd.
1001-1 Daechi-Dong Gangnam-Gu 5th Floor Himart Building
Seoul, 305-333, 1007-46 Sadang-Dong Dongjak-Gu,
Seoul, 156-090,
Korea Korea
e-mail : young@techway.co.kr e-mail : young@netntv.co.kr
Colin Perkins Colin Perkins
USC Information Sciences Institute USC Information Sciences Institute
3811 N. Fairfax Drive suite 200 3811 N. Fairfax Drive suite 200
Arlington, VA 22203 Arlington, VA 22203
USA USA
e-mail : csp@isi.edu e-mail : csp@isi.edu
Jan van der Meer Jan van der Meer
Philips Digital Networks Philips Digital Networks
Building WDB-1 Building WDB-1
Prof Holstlaan 4 Prof Holstlaan 4
5656 AA Eindhoven 5656 AA Eindhoven
Netherlands Netherlands
e-mail : jan.vandermeer@philips.com e-mail : jan.vandermeer@philips.com
APPENDIX: Examples of usage APPENDIX: Examples of usage
This section describes a number of examples of how this payload This section describes a number of examples of how this payload
format can be used either with or without the Sync Layer. In all format can be used either with or without the Sync Layer. In all
examples however the Sync Layer syntax is given which shows how it examples the Sync Layer syntax is given (which shows how it may
becomes invisible in cases 1,3,4 and 5. become invisible in cases 1,3,4 and 5).
A C++-like syntax called SDL (Syntactic Description Language) A C++-like syntax called SDL (Syntactic Description Language)
defined in [1, section 14] is used to economically describe MPEG-4 defined in [1, section 14] is used to economically describe MPEG-4
system data structures. system data structures.
These examples assume that the (a=fmtp) SDP syntax is used to convey These examples assume that the (a=fmtp) SDP syntax is used to convey
the MIME parameters of the payload format. the MIME parameters of the payload format.
Appendix.1 RFC 3016 compatible MPEG-4 Video (no SL) Appendix.1 RFC 3016 compatible MPEG-4 Video (no SL)
This is an example of a video stream where the SL is configured to This is an example of a video stream compatible with RFC 3016.
produce RTP packets compatible with RFC 3016.
SLConfigDescriptor SLConfigDescriptor
In this example the SLConfigDescriptor is: In this example the SLConfigDescriptor is:
class SLConfigDescriptor extends BaseDescriptor : bit(8) class SLConfigDescriptor extends BaseDescriptor : bit(8)
tag=SLConfigDescrTag { tag=SLConfigDescrTag {
bit(8) predefined; bit(8) predefined;
if (predefined==0) { if (predefined==0) {
bit(1) useAccessUnitStartFlag; = 0 bit(1) useAccessUnitStartFlag; = 0
bit(1) useAccessUnitEndFlag; = 1 bit(1) useAccessUnitEndFlag; = 1
bit(1) useRandomAccessPointFlag; = 0 bit(1) useRandomAccessPointFlag; = 0
bit(1) hasRandomAccessUnitsOnlyFlag; = 0 bit(1) hasRandomAccessUnitsOnlyFlag; = 0
bit(1) usePaddingFlag; = 0 bit(1) usePaddingFlag; = 0
bit(1) useTimeStampsFlag; = 0 bit(1) useTimeStampsFlag; = 0
bit(1) useIdleFlag; = 0 bit(1) useIdleFlag; = 0
bit(1) durationFlag; = 0 bit(1) durationFlag; = 0
bit(32) timeStampResolution; = 0 bit(32) timeStampResolution; = 0
bit(32) OCRResolution; = 0 bit(32) OCRResolution; = 0
bit(8) timeStampLength; = 0 bit(8) timeStampLength; = 0
bit(8) OCRLength; = 0 bit(8) OCRLength; = 0
bit(8) AU_Length; = 0 bit(8) AU_Length; = 0
bit(8) instantBitrateLength; = 0 bit(8) instantBitrateLength; = 0
bit(4) degradationPriorityLength; = 0 bit(4) degradationPriorityLength; = 0
bit(5) AU_seqNumLength; = 0 bit(5) AU_seqNumLength; = 0
bit(5) packetSeqNumLength; = 0 bit(5) packetSeqNumLength; = 0
bit(2) reserved=0b11; bit(2) reserved=0b11;
} }
if (durationFlag) { if (durationFlag) {
bit(32) timeScale; // NOT USED bit(32) timeScale; // NOT USED
bit(16) accessUnitDuration; // NOT USED bit(16) accessUnitDuration; // NOT USED
skipping to change at line 2171 skipping to change at line 2465
RTP packet structure RTP packet structure
Note that accessUnitEndFlag is mapped to the RTP header M bit. Note that accessUnitEndFlag is mapped to the RTP header M bit.
+=========================================+=============+ +=========================================+=============+
| Field | size | | Field | size |
+=========================================+=============+ +=========================================+=============+
| RTP header | - | | RTP header | - |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| Access Unit or AU fragment | 1400 octets | | Access Unit or AU fragment | 1400 octets |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
Overhead Overhead
In this example we have an RTP overhead of 40 octets for 1400 octets In this example we have an RTP overhead of 40 octets for 1400 octets
of payload i.e. 3 % overhead. of payload i.e. 3 % overhead.
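The overhead figure can be checked with a short calculation. The
following Python sketch is illustrative only: the function name
overhead_percent is hypothetical, and the 40 octets of headers and
1400 octets of payload are the values quoted in this example.

```python
def overhead_percent(header_octets: int, payload_octets: int) -> float:
    """Return the header overhead as a percentage of the payload size."""
    if payload_octets <= 0:
        raise ValueError("payload_octets must be positive")
    return 100.0 * header_octets / payload_octets


# 40 octets of headers for 1400 octets of payload, as in this example.
video_overhead = overhead_percent(40, 1400)
print(f"{video_overhead:.1f} %")
```

The same function applied to a 5 octet payload gives the 800 %
figure quoted for the low delay audio example.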
Appendix.2 MPEG-4 Video with SL Appendix.2 MPEG-4 Video with SL
Let us consider the case of a 30 frames per second MPEG-4 video Let us consider the case of a 30 frames per second MPEG-4 video
stream whose bit rate is high enough that Access Units have to be stream whose bit rate is high enough that Access Units have to be
split into several SL packets (typically above 300 kb/s). split into several SL packets (typically above 300 kb/s).
Let us assume also that the video codec generates in that case Video Let us assume also that the video codec generates in that case Video
Packets suitable to fit in one SL packet, i.e. that the video codec is Packets suitable to fit in one SL packet, i.e. that the video codec is
MTU aware and the MTU is 1500 octets. We assume furthermore that MTU aware and the MTU is 1500 octets. We assume furthermore that
this stream contains B frames and that decodingTimeStamps are this stream contains B frames and that decodingTimeStamps are
skipping to change at line 2228 skipping to change at line 2521
bit(5) packetSeqNumLength; = 0 bit(5) packetSeqNumLength; = 0
bit(2) reserved=0b11; bit(2) reserved=0b11;
} }
if (durationFlag) { if (durationFlag) {
bit(32) timeScale; // NOT USED bit(32) timeScale; // NOT USED
bit(16) accessUnitDuration; // NOT USED bit(16) accessUnitDuration; // NOT USED
bit(16) compositionUnitDuration; // NOT USED bit(16) compositionUnitDuration; // NOT USED
} }
if (!useTimeStampsFlag) { if (!useTimeStampsFlag) {
bit(timeStampLength) startDecodingTimeStamp; // NOT USED bit(timeStampLength) startDecodingTimeStamp; // NOT USED
bit(timeStampLength) startCompositionTimeStamp; // NOT USED bit(timeStampLength) startCompositionTimeStamp; // NOT USED
} }
} }
The useRandomAccessPointFlag is set so that the The useRandomAccessPointFlag is set so that the
randomAccessPointFlag can indicate that the corresponding SL packet randomAccessPointFlag can indicate that the corresponding SL packet
contains a GOV and the first Video Packet of an Intra coded frame. contains a GOV and the first Video Packet of an Intra coded frame.
SL Packet Header structure SL Packet Header structure
With this configuration we have the following SL packet header With this configuration we have the following SL packet header
structure: structure:
aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) { aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
bit(1) accessUnitStartFlag; // 1 bit bit(1) accessUnitStartFlag; // 1 bit
if (accessUnitStartFlag) { if (accessUnitStartFlag) {
bit(1) randomAccessPointFlag; // 1 bit bit(1) randomAccessPointFlag; // 1 bit
bit(1) decodingTimeStampFlag; // 1 bit bit(1) decodingTimeStampFlag; // 1 bit
bit(1) compositionTimeStampFlag; // 1 bit bit(1) compositionTimeStampFlag; // 1 bit
if (decodingTimeStampFlag) { if (decodingTimeStampFlag) {
skipping to change at line 2285 skipping to change at line 2573
Access Units we have: Access Units we have:
+=========================================+=============+ +=========================================+=============+
| Field | size | | Field | size |
+=========================================+=============+ +=========================================+=============+
| RTP header | - | | RTP header | - |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| DTSFlag = (1) | 1 bit | | DTSFlag = (1) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| DTSDelta | 7 bits | | DTSDelta | 7 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| bits to octet alignment | 0 bits | | bits to octet alignment | 0 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| RSLHSectionSize = (100) | 3 bits | | RSLHSectionSize = (100) | 3 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitStartFlag = (1) | 1 bit | | accessUnitStartFlag = (1) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| randomAccessPointFlag | 1 bit | | randomAccessPointFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| decodingTimeStampFlag | 1 bit | | decodingTimeStampFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| compositionTimeStampFlag | 1 bit | | compositionTimeStampFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| bits to octet alignment =(0) | 1 bit | | bits to octet alignment =(0) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| SL packet payload | N octets | | SL packet payload | N octets |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
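As an illustration of the table above, the two header octets that
precede the SL packet payload can be assembled as follows. This is
a minimal sketch derived from the table only: the function name is
hypothetical, and most-significant-bit-first packing is an
assumption.

```python
def pack_first_fragment_header(dts_delta: int,
                               random_access_point: bool,
                               decoding_ts_flag: bool,
                               composition_ts_flag: bool) -> bytes:
    """Pack the fields of the table above into two octets (MSB first, assumed)."""
    if not 0 <= dts_delta < 128:
        raise ValueError("DTSDelta must fit in 7 bits")
    # First octet: DTSFlag = 1 followed by the 7-bit DTSDelta.
    first = (1 << 7) | dts_delta
    # Second octet: RSLHSectionSize = 0b100 (3 bits), accessUnitStartFlag = 1,
    # the three remaining flags, then one alignment bit set to 0.
    second = ((0b100 << 5)
              | (1 << 4)
              | (int(random_access_point) << 3)
              | (int(decoding_ts_flag) << 2)
              | (int(composition_ts_flag) << 1))
    return bytes([first, second])


header = pack_first_fragment_header(5, True, True, False)
print(header.hex())
```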
For packets that transport non-first fragments of Access Units we For packets that transport non-first fragments of Access Units we
have: have:
skipping to change at line 2341 skipping to change at line 2629
octets of payload i.e. 3 % overhead. octets of payload i.e. 3 % overhead.
Appendix.3 Low delay MPEG-4 Audio (no SL) Appendix.3 Low delay MPEG-4 Audio (no SL)
This example is for a low delay audio service. For this reason a This example is for a low delay audio service. For this reason a
single Access Unit is transported in each RTP packet (in terms of single Access Unit is transported in each RTP packet (in terms of
Sync Layer each SL packet contains a complete Access Unit). Sync Layer each SL packet contains a complete Access Unit).
SLConfigDescriptor SLConfigDescriptor
Since CTS=DTS and Access Unit duration is constant signaling of Since CTS=DTS and Access Unit duration is constant, signaling of
MPEG-4 time stamps is not needed (the durationFlag of SLConfig is MPEG-4 time stamps is not needed (the durationFlag of SLConfig is
set) set).
We also assume here an audio Object Type for which all Access Units We also assume here an audio Object Type for which all Access Units
are Random Access Points, which is signaled using the are Random Access Points, which is signaled using the
hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor. hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor.
We assume furthermore a mode where the Access Unit size is constant We assume furthermore a mode where the Access Unit size is constant
and equal to 5 octets (which is signaled with AU_Length). and equal to 5 octets (which is signaled with AU_Length).
In this example the SLConfigDescriptor is: In this example the SLConfigDescriptor is:
class SLConfigDescriptor extends BaseDescriptor : bit(8) class SLConfigDescriptor extends BaseDescriptor : bit(8)
tag=SLConfigDescrTag { tag=SLConfigDescrTag {
bit(8) predefined; bit(8) predefined;
if (predefined==0) { if (predefined==0) {
bit(1) useAccessUnitStartFlag; = 0 bit(1) useAccessUnitStartFlag; = 0
bit(1) useAccessUnitEndFlag; = 0 bit(1) useAccessUnitEndFlag; = 0
bit(1) useRandomAccessPointFlag; = 0 bit(1) useRandomAccessPointFlag; = 0
bit(1) hasRandomAccessUnitsOnlyFlag; = 1 bit(1) hasRandomAccessUnitsOnlyFlag; = 1
skipping to change at line 2397 skipping to change at line 2685
bit(timeStampLength) startCompositionTimeStamp; = 0 bit(timeStampLength) startCompositionTimeStamp; = 0
} }
} }
SL packet header SL packet header
With this configuration the SL packet header is empty. The Sync With this configuration the SL packet header is empty. The Sync
Layer is reduced to a purely logical construction that neither Layer is reduced to a purely logical construction that neither
sender nor receiver need to implement. sender nor receiver need to implement.
Parameters Parameters
No parameters are required. No parameters are required.
RTP packet structure RTP packet structure
Note that the RTP header M bit should be always set to 1. Note that the RTP header M bit must be set to 1.
+=========================================+=============+ +=========================================+=============+
| Field | size | | Field | size |
+=========================================+=============+ +=========================================+=============+
| RTP header | - | | RTP header | - |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| Access Unit | 5 octets | | Access Unit | 5 octets |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
Overhead estimation Overhead estimation
The overhead is extremely large i.e. more than 800 %, since 40 The overhead is extremely large i.e. more than 800 %, since 40
octets of headers are required to transport 5 octets of data. Note octets of headers are required to transport 5 octets of data. Note
skipping to change at line 2446 skipping to change at line 2735
Layer is reduced to a purely logical construction that neither Layer is reduced to a purely logical construction that neither
sender nor receiver need to implement. sender nor receiver need to implement.
Parameters Parameters
The absence of RSLHSectionSizeLength indicates that the RSLHSection The absence of RSLHSectionSizeLength indicates that the RSLHSection
is empty. is empty.
The size of SL Packets (which are all complete Access Units in this The size of SL Packets (which are all complete Access Units in this
case) is constant and is indicated with: case) is constant and is indicated with:
a=fmtp:<format> ConstantSize=5 a=fmtp:<format> ConstantSize=5
This also indicates to the receiver that the Multiple mode will be This also indicates to the receiver that the "Multiple" packing
used; the 2 octets field that would give the size of the style will be used; the 2 octets field that would give the size of
PayloadHeaderSection is omitted since in this case this field always the Payload Header Section is omitted since in this case this field
contains zero (the PayloadHeaderSection is always empty due to the always contains zero (the Payload Header Section is always empty due
absence of any other MIME parameter). to the absence of any other MIME parameter).
RTP packet structure RTP packet structure
Note that the RTP header M bit is always set to 1, which indicates Note that the RTP header M bit is always set to 1, which indicates
to the receiver that only complete Access Units are transported. to the receiver that only complete Access Units are transported.
+=========================================+=============+ +=========================================+=============+
| Field | size | | Field | size |
+=========================================+=============+ +=========================================+=============+
| RTP header | - | | RTP header | - |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| Access Unit data | 5 octets | | Access Unit data | 5 octets |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| Access Unit data | 5 octets | | Access Unit data | 5 octets |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| etc, until MTU is reached | | etc, until MTU is reached |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| Access Unit data | 5 octets | | Access Unit data | 5 octets |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
Overhead estimation Overhead estimation
The overhead is 3% i.e. minimal. The overhead is 3% i.e. minimal.
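On the receiving side, a payload built this way can be split back
into Access Units using only the ConstantSize value. The following
Python sketch is illustrative; the function name is hypothetical and
the payload is assumed to be the RTP payload with the RTP header
already removed.

```python
from typing import List


def split_constant_size(payload: bytes, constant_size: int) -> List[bytes]:
    """Split an RTP payload into Access Units of constant_size octets."""
    if constant_size <= 0:
        raise ValueError("constant_size must be positive")
    if len(payload) % constant_size != 0:
        raise ValueError("payload length is not a multiple of ConstantSize")
    return [payload[i:i + constant_size]
            for i in range(0, len(payload), constant_size)]


# Three Access Units of 5 octets each (ConstantSize=5).
access_units = split_constant_size(bytes(range(15)), 5)
print(len(access_units))
```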
Appendix.5 AAC with interleaving (no SL) Appendix.5 AAC with interleaving (no SL)
Let us consider AAC at 128 kb/s where each Access Unit is on Let us consider AAC at 128 kb/s where each Access Unit is on
average 320 octets. Interleaving is applied with a continuous average 320 octets. Interleaving is applied using a continuous
interleaving scheme (see table below) where 4 Access Units are used interleaving scheme (see table below) where 4 Access Units are used
to construct each RTP packet in order to match an MTU of 1500 octets. to construct each RTP packet in order to match an MTU of 1500 octets.
IndexDelta is constant and equal to 2 (since +1 is automatically IndexDelta is constant and equal to 2 (since +1 is automatically
added); it is encoded on 2 bits. added); it is encoded on 2 bits.
As explained in section 3.8 this is a time stamp based interleaving As explained in section 3.8 this is a time stamp based interleaving
(TSBI) scheme (IndexLength=0); indeed receivers know that each (TSBI) scheme (IndexLength=0); indeed receivers know that each
payload is a complete Access Unit because all RTP packets have the M payload is a complete Access Unit because all RTP packets have the M
bit set to 1 and therefore, since Access Unit duration is constant, bit set to 1 and therefore, since Access Unit duration is constant,
Access Unit timestamps can be computed from RTP timestamps and Access Unit timestamps can be computed from RTP timestamps and
IndexDelta values; this can be used for de-interleaving even in case IndexDelta values; this can be used for de-interleaving even in case
of losses. of losses.
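The timestamp computation described above can be sketched as
follows. This is a hedged illustration, not a normative procedure:
the function name is hypothetical, and the Access Unit duration of
1024 RTP clock ticks is an assumption (one AAC frame of 1024 samples
with the RTP clock running at the sampling rate).

```python
from typing import List


def au_timestamps(rtp_timestamp: int,
                  index_deltas: List[int],
                  au_duration: int) -> List[int]:
    """Compute the timestamp of every Access Unit in one RTP packet.

    index_deltas holds the IndexDelta of each Access Unit after the
    first one; +1 is added to each value, as stated in the text.
    """
    timestamps = [rtp_timestamp]
    for delta in index_deltas:
        timestamps.append(timestamps[-1] + (delta + 1) * au_duration)
    return timestamps


# Four Access Units per packet, IndexDelta = 2, assumed duration 1024.
print(au_timestamps(0, [2, 2, 2], 1024))
```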
Note that it is also possible to use IndexLength=2 so as to Note that it would also be possible to use IndexLength=2 so as to
maintain octet alignment in the Payload Header portions; in this maintain octet alignment in the Payload Header portions; in this
case however the value of these two bits MUST be zero as stated in case however the value of these two bits MUST be zero as stated in
3.8.1. 3.8.1. This solution is used in the companion RFC YYYY.
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| RTP packet | RTP Timestamp | AUs | IndexDelta | | RTP packet | RTP Timestamp | AUs | IndexDelta |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 1 | CTS(AU1) | 1 | - | | 1 | CTS(AU1) | 1 | - |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 2 | CTS(AU2) | 2, 5 | -,2 | | 2 | CTS(AU2) | 2, 5 | -,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 3 | CTS(AU3) | 3, 6, 9 | -,2,2 | | 3 | CTS(AU3) | 3, 6, 9 | -,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 4 | CTS(AU4) | 4, 7,10,13 | -,2,2,2 | | 4 | CTS(AU4) | 4, 7,10,13 | -,2,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 5 | CTS(AU8) | 8,11,14,17 | -,2,2,2 | | 5 | CTS(AU8) | 8,11,14,17 | -,2,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 6 | CTS(AU12) | 12,15,18,21 | -,2,2,2 | | 6 | CTS(AU12) | 12,15,18,21 | -,2,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 7 | CTS(AU16) | 16,19,22,25 | -,2,2,2 | | 7 | CTS(AU16) | 16,19,22,25 | -,2,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 8 | CTS(AU20) | 20,23,26,29 | -,2,2,2 | | 8 | CTS(AU20) | 20,23,26,29 | -,2,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 9 | CTS(AU24) | 24,27,30,33 | -,2,2,2 | | 9 | CTS(AU24) | 24,27,30,33 | -,2,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| 10 | CTS(AU28) | 28,31,34,37 | -,2,2,2 | | 10 | CTS(AU28) | 28,31,34,37 | -,2,2,2 |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
| etc | | etc |
+-----------------------------------------------------------------+ +-----------------------------------------------------------------+
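The pattern of the table above can be reproduced programmatically.
The following Python sketch is an illustration inferred from the
table rows, not a normative algorithm; the function name and its
parameters are hypothetical. Access Units are numbered from 1 and
the start-up packets simply omit Access Units that do not exist yet.

```python
from typing import List


def interleaved_aus(packet_number: int,
                    aus_per_packet: int = 4,
                    index_delta: int = 2) -> List[int]:
    """Return the Access Unit numbers carried by one RTP packet.

    packet_number counts from 1, as in the table above.
    """
    if packet_number < 1:
        raise ValueError("packet_number counts from 1")
    step = index_delta + 1  # +1 is automatically added to IndexDelta
    first = (1 - step * (aus_per_packet - 1)
             + aus_per_packet * (packet_number - 1))
    candidates = [first + step * j for j in range(aus_per_packet)]
    # During start-up some slots fall before AU 1 and are left out.
    return [au for au in candidates if au >= 1]


for n in range(1, 11):
    print(n, interleaved_aus(n))
```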
skipping to change at line 2556 skipping to change at line 2844
RTP packet structure RTP packet structure
+=========================================+=============+ +=========================================+=============+
| Field | size | | Field | size |
+=========================================+=============+ +=========================================+=============+
| RTP header | - | | RTP header | - |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
Payload Header Section Payload Header Section
+=========================================+=============+ +=========================================+=============+
| PayloadHeaderSection size = 42 bits | 2 octets | | PayloadHeaderSection size = (42) | 2 octets |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| PayloadSize | 9 bits | | PayloadSize | 9 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| PayloadSize | 9 bits | | PayloadSize | 9 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| IndexDelta | 2 bits | | IndexDelta | 2 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| PayloadSize | 9 bits | | PayloadSize | 9 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| IndexDelta | 2 bits | | IndexDelta | 2 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| PayloadSize | 9 bits | | PayloadSize | 9 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| IndexDelta | 2 bits | | IndexDelta | 2 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| bits to octet alignment = (000000) | 6 bits | | bits to octet alignment = (000000) | 6 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
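The size value of 42 bits and the 6 alignment bits follow from the
field widths shown above. A small Python sketch of this arithmetic
is given below; the function name is hypothetical and the layout
(no IndexDelta for the first Access Unit) is taken from the table.

```python
from typing import Tuple


def payload_header_section_bits(num_aus: int,
                                size_bits: int = 9,
                                index_delta_bits: int = 2) -> Tuple[int, int]:
    """Return (used bits, alignment bits) of the Payload Header Section.

    The first Access Unit carries only a PayloadSize field; each
    following one carries a PayloadSize and an IndexDelta field.
    """
    if num_aus < 1:
        raise ValueError("at least one Access Unit is required")
    used = size_bits + (num_aus - 1) * (size_bits + index_delta_bits)
    padding = (-used) % 8  # bits needed to reach the next octet boundary
    return used, padding


used_bits, alignment_bits = payload_header_section_bits(4)
print(used_bits, alignment_bits)
```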
Payload Section Payload Section
+=========================================+=============+ +=========================================+=============+
| AAC Access Unit | x octets | | AAC Access Unit | x octets |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| AAC Access Unit | x octets | | AAC Access Unit | x octets |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| AAC Access Unit | x octets | | AAC Access Unit | x octets |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| AAC Access Unit | x octets | | AAC Access Unit | x octets |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
Overhead estimation Overhead estimation
skipping to change at line 2611 skipping to change at line 2899
Packets (extracted from 15 consecutive Access Units) are used to Packets (extracted from 15 consecutive Access Units) are used to
construct each RTP packet in order to match an MTU of 1500 octets. construct each RTP packet in order to match an MTU of 1500 octets.
Note that since ESC fragments are not octet aligned we also use the Note that since ESC fragments are not octet aligned we also use the
paddingFlag and paddingBits features of the Sync Layer. The paddingFlag and paddingBits features of the Sync Layer. The
interleaving sequence is 4 RTP packets and 350 ms long, which is too interleaving sequence is 4 RTP packets and 350 ms long, which is too
long for conferencing but perfectly OK for Internet radio. long for conferencing but perfectly OK for Internet radio.
Since the sequence contains 60 SL packets, IndexLength is set to 16 Since the sequence contains 60 SL packets, IndexLength is set to 16
bits so as to provide a safe margin in case of long loss bursts. bits so as to provide a safe margin in case of long loss bursts.
This will also indicate to the receiver that this is an This will also indicate to the receiver that this is an
Index-Based-Interleaving scheme (indeed CTS cannot be computed for Index-Based-Interleaving scheme (and indeed CTS cannot be computed
SL packets that are not AU starts). for SL packets that are not AU starts so TSBI would not work).
2 bits are enough for IndexDelta, which is constant and equal to 3 2 bits are enough for IndexDelta, which is constant and equal to 3
(since +1 is automatically added). (since +1 is automatically added).
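The interleaving pattern described above can be sketched as follows. This is only an illustration, assuming that each of the 15 Access Units is split into 4 SL packets (one per ESC) and that SL packets are indexed in stream order; the function names are hypothetical and not part of the draft.

```python
# Assumed layout (not normative): 15 consecutive AUs, each split into
# 4 SL packets, indexed 0..59 in stream order.
AUS_PER_SEQUENCE = 15
SL_PACKETS_PER_AU = 4
AU_DURATION_MS = 23.22


def interleave(aus=AUS_PER_SEQUENCE, per_au=SL_PACKETS_PER_AU):
    """Return, for each RTP packet, the SL packet indexes it carries."""
    return [[au * per_au + k for au in range(aus)] for k in range(per_au)]


def index_delta_fields(indexes):
    """Encoded IndexDelta values: index difference minus the implicit +1."""
    return [b - a - 1 for a, b in zip(indexes, indexes[1:])]


rtp_packets = interleave()
for n, indexes in enumerate(rtp_packets):
    deltas = sorted(set(index_delta_fields(indexes)))
    print(f"RTP packet {n}: first Index={indexes[0]}, IndexDelta={deltas}")

sequence_ms = AUS_PER_SEQUENCE * AU_DURATION_MS
print(f"interleaving sequence duration: {sequence_ms:.1f} ms")
```

Under these assumptions every encoded IndexDelta equals 3, which fits in 2 bits, and the sequence lasts about 350 ms.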
Note that the 4th RTP packet in each sequence has its M bit set to 1 Note that the 4th RTP packet in each sequence has its M bit set to 1
since it contains 15 SL packets transporting the end of 15 since it contains 15 SL packets transporting the end of 15
consecutive Access Units. consecutive Access Units.
With this scheme a sender can, for example, react to RTCP reports With this scheme a sender can, for example, react to RTCP reports
indicating high loss rates by duplicating, for each interleaving indicating high loss rates by duplicating, for each interleaving
sequence, the first RTP packet, which contains the most useful data sequence, the first RTP packet, which contains the most useful data
in terms of ESC, or by applying other error protection techniques, in terms of ESC, or by applying other error protection techniques,
with due care to congestion issues. with due care to congestion issues.
In this example we will also show several other SL features (OCR, AU In this example we will also show several other SL features (OCR, AU
boundary flags, padding, as detailed below). boundary flags, padding, as detailed below).
One feature demonstrated by this example is the degradation One feature demonstrated by this example is the degradation
priority. We assume degradation priority can take 4 different priority. We assume degradation priority can take 4 different
values, mapped to Error Sensitivity Categories, and is encoded on 2 values, mapped to Error Sensitivity Categories, and is encoded on 2
bits. This interleaving scheme makes sure that only SL packets of bits. This interleaving scheme makes sure that only SL packets of
identical degradation priorities are grouped in the same RTP packet identical degradation priorities are grouped in the same RTP packet
(3.6.3) and that only the first RSLH of each RTP packet transports (3.6.3) and that only the first RSLH of each RTP packet transports
the degradation priority. the degradation priority. We also assume that for each last SL
packet of each RTP packet the server inserts an OCR.
We also assume that for each last SL packet of each RTP packet the
server inserts an OCR.
SLConfigDescriptor SLConfigDescriptor
In this example the SLConfigDescriptor is: In this example the SLConfigDescriptor is:
class SLConfigDescriptor extends BaseDescriptor : bit(8) class SLConfigDescriptor extends BaseDescriptor : bit(8)
tag=SLConfigDescrTag { tag=SLConfigDescrTag {
bit(8) predefined; bit(8) predefined;
if (predefined==0) { if (predefined==0) {
bit(1) useAccessUnitStartFlag; = 1 bit(1) useAccessUnitStartFlag; = 1
bit(1) useAccessUnitEndFlag; = 1 bit(1) useAccessUnitEndFlag; = 1
bit(1) useRandomAccessPointFlag; = 0 bit(1) useRandomAccessPointFlag; = 0
bit(1) hasRandomAccessUnitsOnlyFlag; = 1 bit(1) hasRandomAccessUnitsOnlyFlag; = 1
bit(1) usePaddingFlag; = 1 // we need to signal padding bits bit(1) usePaddingFlag; = 1 // we need to signal padding bits
bit(1) useTimeStampsFlag; = 0 bit(1) useTimeStampsFlag; = 0
skipping to change at line 2667 skipping to change at line 2950
bit(1) usePaddingFlag; = 1 // we need to signal padding bits bit(1) usePaddingFlag; = 1 // we need to signal padding bits
bit(1) useTimeStampsFlag; = 0 bit(1) useTimeStampsFlag; = 0
bit(1) useIdleFlag; = 0 bit(1) useIdleFlag; = 0
bit(1) durationFlag; = 1 bit(1) durationFlag; = 1
bit(32) timeStampResolution; = 0 bit(32) timeStampResolution; = 0
bit(32) OCRResolution; = 30 bit(32) OCRResolution; = 30
bit(8) timeStampLength; = 0 bit(8) timeStampLength; = 0
bit(8) OCRLength; = 32 bit(8) OCRLength; = 32
bit(8) AU_Length; = 0 bit(8) AU_Length; = 0
bit(8) instantBitrateLength; = 0 bit(8) instantBitrateLength; = 0
bit(4) degradationPriorityLength; = 2 bit(4) degradationPriorityLength; = 2
bit(5) AU_seqNumLength; = 0 bit(5) AU_seqNumLength; = 0
bit(5) packetSeqNumLength; = 6 bit(5) packetSeqNumLength; = 6
bit(2) reserved=0b11; bit(2) reserved=0b11;
} }
if (durationFlag) { if (durationFlag) {
bit(32) timeScale; = 1000// milliseconds bit(32) timeScale; = 1000// milliseconds
bit(16) accessUnitDuration; = 23.22 // ms bit(16) accessUnitDuration; = 23.22 // ms
bit(16) compositionUnitDuration; = 23.22 // ms bit(16) compositionUnitDuration; = 23.22 // ms
} }
skipping to change at line 2688 skipping to change at line 2967
bit(16) accessUnitDuration; = 23.22 // ms bit(16) accessUnitDuration; = 23.22 // ms
bit(16) compositionUnitDuration; = 23.22 // ms bit(16) compositionUnitDuration; = 23.22 // ms
} }
if (!useTimeStampsFlag) { if (!useTimeStampsFlag) {
bit(timeStampLength) startDecodingTimeStamp; = 0 bit(timeStampLength) startDecodingTimeStamp; = 0
bit(timeStampLength) startCompositionTimeStamp; = 0 bit(timeStampLength) startCompositionTimeStamp; = 0
} }
} }
SL Packet Header structure SL Packet Header structure
With this configuration we have the following SL packet header With this configuration we have the following SL packet header
structure: structure:
aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) { aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
bit(1) accessUnitStartFlag; bit(1) accessUnitStartFlag;
bit(1) accessUnitEndFlag; bit(1) accessUnitEndFlag;
bit(1) OCRflag; bit(1) OCRflag;
bit(1) paddingFlag; bit(1) paddingFlag;
if (paddingFlag) bit(3) paddingBits; if (paddingFlag) bit(3) paddingBits;
bit(SL.packetSeqNumLength) packetSequenceNumber; bit(SL.packetSeqNumLength) packetSequenceNumber;
bit(1) DegPrioflag; bit(1) DegPrioflag;
if (DegPrioflag) { if (DegPrioflag) {
bit(SL.degradationPriorityLength) degradationPriority;} bit(SL.degradationPriorityLength) degradationPriority;}
if (OCRflag) { if (OCRflag) {
bit(SL.OCRLength) objectClockReference;} bit(SL.OCRLength) objectClockReference;}
} }
} }
skipping to change at line 2708 skipping to change at line 2989
bit(SL.packetSeqNumLength) packetSequenceNumber; bit(SL.packetSeqNumLength) packetSequenceNumber;
bit(1) DegPrioflag; bit(1) DegPrioflag;
if (DegPrioflag) { if (DegPrioflag) {
bit(SL.degradationPriorityLength) degradationPriority;} bit(SL.degradationPriorityLength) degradationPriority;}
if (OCRflag) { if (OCRflag) {
bit(SL.OCRLength) objectClockReference;} bit(SL.OCRLength) objectClockReference;}
} }
} }
Parameters Parameters
The resulting concatenated fmtp line is: The resulting concatenated fmtp line is:
a=fmtp:<format> SizeLength=7; RSLHSectionSizeLength=8; a=fmtp:<format> SizeLength=7; RSLHSectionSizeLength=8;
IndexLength=16; IndexDeltaLength=2; OCRDeltaLength=16 IndexLength=16; IndexDeltaLength=2; OCRDeltaLength=16
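A receiver has to recover these field lengths before it can parse the Payload Header Section. The following is a minimal sketch of such parsing, not a complete SDP parser; the payload type 96 stands in for <format> and, like the function name, is a hypothetical choice made for illustration.

```python
def parse_fmtp(line):
    """Split 'a=fmtp:<format> k=v; k=v' into (format, {k: int(v)})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute")
    fmt, _, params = line[len(prefix):].partition(" ")
    values = {}
    for item in params.split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition("=")
            values[key.strip()] = int(value)
    return fmt, values


# 96 is an assumed dynamic payload type, used only for this example.
FMTP = ("a=fmtp:96 SizeLength=7; RSLHSectionSizeLength=8; "
        "IndexLength=16; IndexDeltaLength=2; OCRDeltaLength=16")

fmt, params = parse_fmtp(FMTP)
print(fmt, params)
```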
RTP packet structure RTP packet structure
+=========================================+=============+ +=========================================+=============+
| Field | size | | Field | size |
+=========================================+=============+ +=========================================+=============+
| RTP header | - | | RTP header | - |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
Payload Header Section Payload Header Section
+=========================================+=============+ +=========================================+=============+
| Payload Header Section size = 149 bits | 2 octets | | Payload Header Section size = 149 bits | 2 octets |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| PayloadSize | 7 bits | | PayloadSize | 7 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| Index | 16 bits | | Index | 16 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| PayloadSize | 7 bits | | PayloadSize | 7 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| IndexDelta = (11) | 2 bits | | IndexDelta = (11) | 2 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| etc + 12 times 9 bits | | etc + 12 times 9 bits |
skipping to change at line 2757 skipping to change at line 3031
| RSLHSectionSize = (10000111) | 8 bits | | RSLHSectionSize = (10000111) | 8 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitStartFlag | 1 bit | | accessUnitStartFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitEndFlag | 1 bit | | accessUnitEndFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| OCRFlag = (0) | 1 bit | | OCRFlag = (0) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| paddingFlag = (1) | 1 bit | | paddingFlag = (1) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| paddingBits | 3 bits | | paddingBits | 3 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| DegPrioflag = (1) | 1 bit | | DegPrioflag = (1) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| degradationPriority | 2 bits | | degradationPriority | 2 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitStartFlag | 1 bit | | accessUnitStartFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitEndFlag | 1 bit | | accessUnitEndFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
skipping to change at line 2781 skipping to change at line 3059
| paddingBits | 3 bits | | paddingBits | 3 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| DegPrioflag = (0) | 1 bit | | DegPrioflag = (0) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| etc + 12 times 8 bits | | etc + 12 times 8 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitStartFlag | 1 bit | | accessUnitStartFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| accessUnitEndFlag | 1 bit | | accessUnitEndFlag | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| OCRFlag = (1) | 1 bit | | OCRFlag = (1) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| OCRDelta | 16 bits | | OCRDelta | 16 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| paddingFlag = (0) | 1 bit | | paddingFlag = (0) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| DegPrioflag = (0) | 1 bit | | DegPrioflag = (0) | 1 bit |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
| bits to octet alignment = (000) | 3 bits | | bits to octet alignment = (000) | 3 bits |
+-----------------------------------------+-------------+ +-----------------------------------------+-------------+
skipping to change at line 2814 skipping to change at line 3088
Note that in the above table the last SL packet in the RTP packet Note that in the above table the last SL packet in the RTP packet
has a payload that is octet-aligned (at the end). When this happens has a payload that is octet-aligned (at the end). When this happens
paddingFlag is set to zero and the paddingBits field is omitted. paddingFlag is set to zero and the paddingBits field is omitted.
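The rule in this note can be sketched as below. It assumes that paddingBits carries the number of stuffing bits needed to reach the next octet boundary; the helper name is hypothetical.

```python
def padding_fields(payload_bits):
    """Return (paddingFlag, paddingBits) for a payload of payload_bits bits.

    paddingBits is None when the field is omitted, that is when the
    payload already ends on an octet boundary.
    """
    stuffing = (-payload_bits) % 8  # bits missing to the next octet boundary
    if stuffing == 0:
        return 0, None
    return 1, stuffing


print(padding_fields(720))  # octet-aligned fragment
print(padding_fields(717))  # fragment needing 3 stuffing bits
```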
Overhead estimation Overhead estimation
The PayloadHeaderSection is 19 octets, the RSLHSection is 16 octets; The PayloadHeaderSection is 19 octets, the RSLHSection is 16 octets;
in this example we therefore have an RTP overhead of 40 + 35 octets in this example we therefore have an RTP overhead of 40 + 35 octets
for 1350 octets of payload, i.e. around 6 % overhead. for 1350 octets of payload, i.e. around 6 % overhead.
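The arithmetic behind these figures can be checked with a short sketch. The bit counts are read off the RTP packet structure table above; the 40 octets are assumed here to be the IP, UDP and RTP headers (20 + 8 + 12), and the octet sizes of the two sections are taken as quoted in the text rather than recomputed.

```python
SL_PACKETS = 15

# Payload Header Section: the first entry carries PayloadSize + Index,
# the remaining entries carry PayloadSize + IndexDelta.
payload_header_bits = (7 + 16) + (SL_PACKETS - 1) * (7 + 2)

# RSLH Section, following the table: first header, second header,
# "12 times 8 bits", then the last header carrying the OCRDelta.
first_rslh = 1 + 1 + 1 + 1 + 3 + 1 + 2
middle_rslh = 1 + 1 + 1 + 1 + 3 + 1
last_rslh = 1 + 1 + 1 + 16 + 1 + 1
rslh_bits = first_rslh + middle_rslh + 12 * 8 + last_rslh

# Octet figures as quoted in the text (assumed, not derived here).
LOWER_LAYER_OCTETS = 40
SECTION_OCTETS = 19 + 16
PAYLOAD_OCTETS = 1350

overhead_pct = 100 * (LOWER_LAYER_OCTETS + SECTION_OCTETS) / PAYLOAD_OCTETS

print(f"Payload Header Section: {payload_header_bits} bits")
print(f"RSLH Section: {rslh_bits} bits")
print(f"overhead: {overhead_pct:.1f} %")
```

The two bit counts agree with the values shown in the table (149 bits and RSLHSectionSize = 10000111).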