draft-ietf-avt-mpeg4-simple-08.txt   rfc3640.txt 
Network Working Group                                    J. van der Meer
Request for Comments: 3640                           Philips Electronics
Category: Standards Track                                      D. Mackie
                                                          Apple Computer
                                                          V. Swaminathan
                                                   Sun Microsystems Inc.
                                                               D. Singer
                                                          Apple Computer
                                                              P. Gentric
                                                     Philips Electronics
                                                           November 2003

Previous version: draft-ietf-avt-mpeg4-simple-08.txt (Internet Draft,
August 2003; expires February 2004)

RTP Payload Format for Transport of MPEG-4 Elementary Streams

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements.  Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol.  Distribution of this memo is unlimited.

This specification is a product of the Audio/Video Transport working
group within the Internet Engineering Task Force.  Comments are
solicited and should be addressed to the working group's mailing
list at avt@ietf.org and/or the authors.

Copyright Notice

Copyright (C) The Internet Society (2003).  All Rights Reserved.

Abstract

The Moving Picture Experts Group (MPEG) Committee (ISO/IEC JTC1/SC29
WG11) is a working group in ISO that produced the MPEG-4 standard.
MPEG defines tools to compress content such as audio-visual
information into elementary streams.  This specification defines a
simple but generic RTP payload format for transport of any
non-multiplexed MPEG-4 elementary stream.

Table of Contents

1. Introduction
2. Carriage of MPEG-4 Elementary Streams Over RTP
   2.1.  Signaling by MIME Format Parameters
   2.2.  MPEG Access Units
   2.3.  Concatenation of Access Units
   2.4.  Fragmentation of Access Units
   2.5.  Interleaving
   2.6.  Time Stamp Information
   2.7.  State Indication of MPEG-4 System Streams
   2.8.  Random Access Indication
   2.9.  Carriage of Auxiliary Information
   2.10. MIME Format Parameters and Configuring Conditional Field
   2.11. Global Structure of Payload Format
   2.12. Modes to Transport MPEG-4 Streams
   2.13. Alignment with RFC 3016
3. Payload Format
   3.1.  Usage of RTP Header Fields and RTCP
   3.2.  RTP Payload Structure
         3.2.1. The AU Header Section
                3.2.1.1. The AU-header
         3.2.2. The Auxiliary Section
         3.2.3. The Access Unit Data Section
                3.2.3.1. Fragmentation
                3.2.3.2. Interleaving
                3.2.3.3. Constraints for Interleaving
                3.2.3.4. Crucial and Non-Crucial AUs with
                         MPEG-4 System Data
   3.3.  Usage of this Specification
         3.3.1. General
         3.3.2. The Generic Mode
         3.3.3. Constant Bit Rate CELP
         3.3.4. Variable Bit Rate CELP
         3.3.5. Low Bit Rate AAC
         3.3.6. High Bit Rate AAC
         3.3.7. Additional Modes
4. IANA Considerations
   4.1.  MIME Type Registration
   4.2.  Registration of Mode Definitions with IANA
   4.3.  Concatenation of Parameters
   4.4.  Usage of SDP
         4.4.1. The a=fmtp Keyword
5. Security Considerations
6. Acknowledgements
APPENDIX: Usage of this Payload Format
Appendix A. Interleave Analysis
A. Examples of Delay Analysis with Interleave
   A.1.  Introduction
   A.2.  De-interleaving and Error Concealment
   A.3.  Simple Group Interleave
         A.3.1. Introduction
         A.3.2. Determining the De-interleave Buffer Size
         A.3.3. Determining the Maximum Displacement
   A.4.  More Subtle Group Interleave
         A.4.1. Introduction
         A.4.2. Determining the De-interleave Buffer Size
         A.4.3. Determining the Maximum Displacement
   A.5.  Continuous Interleave
         A.5.1. Introduction
         A.5.2. Determining the De-interleave Buffer Size
         A.5.3. Determining the Maximum Displacement
References
   Normative References
   Informative References
Authors' Addresses
Full Copyright Statement

1. Introduction

The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
standards [1].  The MPEG-4 standard specifies compression of
audio-visual data into, for example, an audio or video elementary
stream.  In the MPEG-4 standard, these streams take the form of
audio-visual objects that may be arranged into an audio-visual scene
by means of a scene description.  Each MPEG-4 elementary stream
consists of a sequence of Access Units; examples of an Access Unit
(AU) are an audio frame and a video picture.

This specification defines a general and configurable payload
structure to transport MPEG-4 elementary streams, in particular
MPEG-4 audio (including speech) streams, MPEG-4 video streams and
also MPEG-4 systems streams, such as BIFS (BInary Format for Scenes),
OCI (Object Content Information), OD (Object Descriptor) and IPMP
(Intellectual Property Management and Protection) streams.  The RTP
payload defined in this document is simple to implement and
reasonably efficient.  It allows for optional interleaving of Access
Units (such as audio frames) to increase error resiliency to packet
loss.

Some types of MPEG-4 elementary streams include "crucial" information
whose loss cannot be tolerated.  However, RTP does not provide
reliable transmission, so receipt of that crucial information is not
assured.  Section 3.2.3.4 specifies how stream state is conveyed so
that the receiver can detect the loss of crucial information and
cease decoding until the next random access point has been received.
Applications transmitting streams that include crucial information,
such as OD commands, BIFS commands, or programmatic content such as
MPEG-J (Java) and ECMAScript, should include random access points at
a suitable periodicity, depending upon the probability of loss, in
order to reduce stream corruption to an acceptable level.  An example
is the carousel mechanism as defined by MPEG in ISO/IEC 14496-1 [1].

Such applications may also employ additional protocols or services to
reduce the probability of loss.  At the RTP layer, these measures
include payload formats and profiles for retransmission or forward
error correction (such as in RFC 2733 [10]), which must be employed
with due consideration to congestion control.  Another solution that
may be appropriate for some applications is to carry RTP over TCP
(such as in RFC 2326 [8], section 10.12).  At the network layer,
resource allocation or preferential service may be available to
reduce the probability of loss.  For a general description of methods
to repair streaming media, see RFC 2354 [9].

Though the RTP payload format defined in this document is capable of
transporting any MPEG-4 stream, other, more specific, formats may
exist, such as RFC 3016 [12] for transport of MPEG-4 video (ISO/IEC
14496 [1] part 2).

Configuration of the payload is provided to accommodate the transport
of any MPEG-4 stream at any possible bit rate.  However, for a
specific MPEG-4 elementary stream, typically only very few
configurations are needed.  So as to allow for the design of
simplified, but dedicated receivers, this specification requires that
specific modes be defined for transport of MPEG-4 streams.  This
document defines modes for MPEG-4 CELP and AAC streams, as well as a
generic mode that can be used to transport any MPEG-4 stream.  In the
future, new RFCs are expected to specify additional modes for the
transport of MPEG-4 streams.

The RTP payload format defined in this document specifies carriage of
system-related information that is often equivalent to the
information that may be contained in the MPEG-4 Sync Layer (SL) as
defined in MPEG-4 Systems [1].  This document does not prescribe how
to transcode or map information from the SL to fields defined in the
RTP payload format.  Such processing, if any, is left to the
discretion of the application.  However, to anticipate the need for
the transport of any additional system-related information in the
future, an auxiliary field can be configured that may carry any such
data.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119 [4].

2. Carriage of MPEG-4 Elementary Streams over RTP

2.1. Signaling by MIME Format Parameters

With this payload format, a single MPEG-4 elementary stream can be
transported.  Information on the type of MPEG-4 stream carried in the
payload is conveyed by MIME format parameters, for example in an
SDP [5] message or by other means (see section 4).  These MIME format
parameters specify the configuration of the payload.  To allow for
simplified and dedicated receivers, a MIME format parameter is
available to signal a specific mode of using this payload.  A mode
definition MAY include the type of MPEG-4 elementary stream, as well
as the applied configuration, so as to avoid the need for receivers
to parse all MIME format parameters.  The applied mode MUST be
signaled.

2.2. MPEG Access Units

For carriage of compressed audio-visual data, MPEG defines Access
Units.  An MPEG Access Unit (AU) is the smallest data entity to which
timing information is attributed.  In the case of audio, an Access
Unit may represent an audio frame and, in the case of video, a
picture.  MPEG Access Units are octet-aligned by definition.  If, for
example, an audio frame is not octet-aligned, up to 7 zero-padding
bits MUST be inserted at the end of the frame to achieve the
octet-aligned Access Units, as required by the MPEG-4 specification.
MPEG-4 decoders MUST be able to decode AUs in which such padding is
applied.
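
As an informal illustration of the padding rule above, the number of
zero-padding bits needed for a frame of a given bit length can be
computed as follows.  This is only a sketch; the function name
zero_padding_bits is hypothetical and not defined by this document.

```python
def zero_padding_bits(frame_bits):
    """Return how many zero bits must follow a frame of `frame_bits` bits
    so that the resulting Access Unit ends on an octet boundary."""
    if frame_bits < 0:
        raise ValueError("frame length cannot be negative")
    # (-n) % 8 is 0 for an already aligned frame and at most 7 otherwise.
    return (-frame_bits) % 8


def padded_octets(frame_bits):
    """Size in octets of the octet-aligned Access Unit."""
    return (frame_bits + zero_padding_bits(frame_bits)) // 8


if __name__ == "__main__":
    for bits in (8, 9, 15, 1601):
        print(bits, zero_padding_bits(bits), padded_octets(bits))
```

The result never exceeds 7, which matches the "up to 7 zero-padding
bits" bound stated above.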

Consistent with the MPEG-4 specification, this document requires that
each MPEG-4 part 2 video Access Unit include all the coded data of a
picture, any video stream headers that may precede the coded picture
data, and any video stream stuffing that may follow it, up to but not
including the startcode indicating the start of a new video stream or
the next Access Unit.

2.3. Concatenation of Access Units

Frequently it is possible to carry multiple Access Units in one RTP
packet.  This is particularly useful for audio; for example, when AAC
is used for encoding a stereo signal at 64 kbits/sec, AAC frames
contain, on average, approximately 200 octets.  On a LAN with a 1500
octet MTU, this would allow an average of 7 complete AAC frames to be
carried per RTP packet.
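
The figure of 7 frames per packet can be checked with a short
calculation.  The sketch below is illustrative only: the 40 octets of
lower-layer headers assume IPv4 without options (20), UDP (8) and a
fixed RTP header without CSRC entries (12), and the per-AU and fixed
payload overheads are hypothetical parameters, since the actual header
sizes depend on the configuration described in section 3.

```python
def complete_aus_per_packet(mtu, avg_au_size,
                            lower_layer_headers=40,
                            fixed_overhead=0,
                            per_au_overhead=0):
    """Estimate how many complete AUs of an average size fit in one packet.

    lower_layer_headers: assumed IPv4 + UDP + RTP header octets (20+8+12).
    fixed_overhead:      assumed payload header octets paid once per packet.
    per_au_overhead:     assumed payload header octets paid once per AU.
    """
    budget = mtu - lower_layer_headers - fixed_overhead
    if budget <= 0:
        return 0
    # Only whole AUs count, hence integer division.
    return budget // (avg_au_size + per_au_overhead)


if __name__ == "__main__":
    # 200-octet AAC frames on a 1500-octet MTU, ignoring payload headers.
    print(complete_aus_per_packet(1500, 200))
    # Same, with an assumed 2 octets per packet and 2 octets per AU.
    print(complete_aus_per_packet(1500, 200, fixed_overhead=2,
                                  per_au_overhead=2))
```

Under these assumptions both calls yield 7, so a modest amount of
payload header overhead does not change the estimate in the text.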

Access Units may have a fixed size in octets, but a variable size is
also possible.  To facilitate parsing in the case of multiple
concatenated AUs in one RTP packet, the size of each AU is made known
to the receiver.  When concatenating AUs of constant size, this size
is communicated "out of band" through a MIME format parameter.  When
concatenating AUs of variable size, the RTP payload carries "in band"
an AU size field for each contained AU.  In combination with the RTP
payload length, the size information allows the RTP payload to be
split by the receiver back into the individual AUs.
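
The receiver-side split described above can be sketched as follows.
This is a minimal illustration, not the normative parsing procedure:
it assumes the AU sizes have already been obtained (either from the
out-of-band constant size or from the in-band size fields, whose
encoding is defined in section 3 and not modeled here), and the
function names are hypothetical.

```python
def split_constant_size(au_data, au_size):
    """Split concatenated AUs that all have the same, out-of-band size."""
    if au_size <= 0:
        raise ValueError("AU size must be positive")
    if len(au_data) % au_size != 0:
        # The number of AUs in a packet must be integral.
        raise ValueError("payload is not a whole number of AUs")
    return [au_data[i:i + au_size] for i in range(0, len(au_data), au_size)]


def split_variable_size(au_data, au_sizes):
    """Split concatenated AUs using one size per AU (already parsed)."""
    if sum(au_sizes) != len(au_data):
        raise ValueError("AU sizes do not match the payload length")
    aus, pos = [], 0
    for size in au_sizes:
        aus.append(au_data[pos:pos + size])
        pos += size
    return aus


if __name__ == "__main__":
    print(split_constant_size(b"aabbcc", 2))
    print(split_variable_size(b"abbccc", [1, 2, 3]))
```

Checking the sizes against the payload length lets a receiver reject
a packet that does not contain an integral number of AUs.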

To simplify the implementation of RTP receivers, it is required that
when multiple AUs are carried in an RTP packet, each AU MUST be
complete, i.e., the number of AUs in an RTP packet MUST be integral.
In addition, an AU MUST NOT be repeated in other RTP packets; hence
repetition of an AU is only possible when using a duplicate RTP
packet.

2.4. Fragmentation of Access Units

MPEG allows for very large Access Units.  Since most IP networks have
significantly smaller MTU sizes, this payload format allows for the
fragmentation of an Access Unit over multiple RTP packets.  Hence,
when an IP packet is lost after IP-level fragmentation, only an AU
fragment may get lost instead of the entire AU.  To simplify the
implementation of RTP receivers, an RTP packet SHALL either carry one
or more complete Access Units or a single fragment of one AU, i.e.,
packets MUST NOT contain fragments of multiple Access Units.
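
The rule that a packet carries either complete AUs or a single
fragment can be illustrated with a small sender-side sketch.  It is a
simplification under stated assumptions: payload header overhead is
ignored, max_payload is a hypothetical budget in octets, and the
"complete"/"fragment" tags are labels used only in this example.

```python
def packetize(aus, max_payload):
    """Group AUs into packets: each packet holds one or more complete AUs,
    or exactly one fragment of a single AU that is too large to fit."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    packets = []
    batch, batch_len = [], 0

    def flush():
        nonlocal batch, batch_len
        if batch:
            packets.append(("complete", batch))
            batch, batch_len = [], 0

    for au in aus:
        if len(au) > max_payload:
            # Large AU: close the current packet, then send one fragment
            # per packet; fragments never share a packet with other data.
            flush()
            for off in range(0, len(au), max_payload):
                packets.append(("fragment", [au[off:off + max_payload]]))
        else:
            if batch_len + len(au) > max_payload:
                flush()
            batch.append(au)
            batch_len += len(au)
    flush()
    return packets


if __name__ == "__main__":
    aus = [b"a" * 200] * 3 + [b"b" * 1000] + [b"c" * 100]
    for kind, parts in packetize(aus, 500):
        print(kind, [len(p) for p in parts])
```

Each AU appears in exactly one packet (or one run of fragment
packets), which also respects the rule that an AU is not repeated in
other RTP packets.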

2.5. Interleaving

When an RTP packet carries a contiguous sequence of Access Units, the
loss of such a packet can result in a "decoding gap" for the user.
One method of alleviating this problem is to allow for the Access
Units to be interleaved in the RTP packets.  For a modest cost in
latency and implementation complexity, significant error resiliency
to packet loss can be achieved.

To support optional interleaving of Access Units, this payload format
allows for index information to be sent for each Access Unit.  After
informing receivers about buffer resources to allocate for
de-interleaving, the RTP sender is free to choose the interleaving
pattern without propagating this information a priori to the
receiver(s).  Indeed, the sender could dynamically adjust the
interleaving pattern based on the Access Unit size, error rates, etc.
The RTP receiver does not need to know the interleaving pattern used;
it only needs to extract the index information of the Access Unit and
insert the Access Unit into the appropriate sequence in the decoding
or rendering queue.  An example of interleaving is given below.

For example, if we assume that an RTP packet contains 3 AUs, and that
the AUs are numbered 0, 1, 2, 3, 4, and so forth, and if an
interleaving group length of 9 is chosen, then RTP packet(i) contains
the following AU(n):

RTP packet(0): AU(0), AU(3), AU(6)
RTP packet(1): AU(1), AU(4), AU(7)
RTP packet(2): AU(2), AU(5), AU(8)
RTP packet(3): AU(9), AU(12), AU(15)
RTP packet(4): AU(10), AU(13), AU(16)
Etc.
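
The pattern in the example above can be reproduced with a short
sketch.  It models only this simple group interleave, assuming the
group length is a multiple of the number of AUs per packet; the
function names are hypothetical, and a real sender is free to choose
other patterns.

```python
def interleave(num_aus, aus_per_packet=3, group_len=9):
    """Return, per RTP packet, the list of AU numbers it carries."""
    if group_len % aus_per_packet != 0:
        raise ValueError("group length must be a multiple of AUs per packet")
    stride = group_len // aus_per_packet  # packets per interleaving group
    packets = []
    for start in range(0, num_aus, group_len):
        for p in range(stride):
            pkt = [start + p + k * stride for k in range(aus_per_packet)]
            pkt = [n for n in pkt if n < num_aus]  # trim a partial last group
            if pkt:
                packets.append(pkt)
    return packets


def deinterleave(received_packets):
    """Receiver side: order AUs by index, without knowing the pattern."""
    return sorted(n for pkt in received_packets for n in pkt)


if __name__ == "__main__":
    packets = interleave(18)
    for i, pkt in enumerate(packets):
        print("RTP packet(%d):" % i, ", ".join("AU(%d)" % n for n in pkt))
    # Simulate the loss of packet(1): only isolated AUs are missing.
    survivors = packets[:1] + packets[2:]
    print(deinterleave(survivors))
```

With packet(1) lost, AUs 1, 4 and 7 are missing but no two missing AUs
are adjacent, which is the error resiliency benefit described above.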

2.6. Time Stamp Information

The RTP time stamp MUST carry the sampling instant of the first AU
(fragment) in the RTP packet.  When multiple AUs are carried within
an RTP packet, the time stamps of subsequent AUs can be calculated if
the frame period of each AU is known.  For audio and video, this is
possible if the frame rate is constant.  However, in some cases it is
not possible to make such a calculation (for example, for variable
frame rate video, or for MPEG-4 BIFS streams carrying composition
information).  To support such cases, this payload format can be
configured to carry a time stamp in the RTP payload for each
contained Access Unit.  A time stamp MAY be conveyed in the RTP
payload only for non-first AUs in the RTP packet, and SHALL NOT be
conveyed for the first AU (fragment), as the time stamp for the first
AU in the RTP packet is carried by the RTP time stamp.
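
For the constant frame rate case, the calculation of subsequent time
stamps can be sketched as below.  The sketch assumes that the AUs in
the packet are consecutive (no interleaving), that the frame period is
expressed in RTP clock ticks, and that the RTP time stamp is a 32-bit
value that wraps around; the value of 1024 ticks per frame is only an
example.

```python
RTP_TS_MODULO = 1 << 32  # the RTP time stamp is a 32-bit field


def au_timestamps(rtp_timestamp, num_aus, frame_period_ticks):
    """Time stamps of consecutive AUs in one packet, given the RTP time
    stamp of the first AU and a constant frame period in clock ticks."""
    return [(rtp_timestamp + n * frame_period_ticks) % RTP_TS_MODULO
            for n in range(num_aus)]


if __name__ == "__main__":
    # Three consecutive AUs, 1024 ticks apart (assumed frame period).
    print(au_timestamps(48000, 3, 1024))
    # Near the wrap-around point of the 32-bit time stamp.
    print(au_timestamps(RTP_TS_MODULO - 512, 3, 1024))
```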
MPEG-4 defines two types of time stamp: the composition time stamp MPEG-4 defines two types of time stamps: the composition time stamp
(CTS) and the decoding time stamp (DTS). The CTS represents the (CTS) and the decoding time stamp (DTS). The CTS represents the
sampling instant of an AU, and hence the CTS is equivalent to the sampling instant of an AU, and hence the CTS is equivalent to the RTP
RTP time stamp. The DTS may be used in MPEG-4 video streams that time stamp. The DTS may be used in MPEG-4 video streams that use
use bi-directional coding, i.e. when pictures are predicted in both bi-directional coding, i.e., when pictures are predicted in both
forward and backward direction by using either a reference picture forward and backward direction by using either a reference picture in
in the past, or a reference picture in the future. The DTS cannot the past, or a reference picture in the future. The DTS cannot be
be carried in the RTP header. In some cases the DTS can be derived carried in the RTP header. In some cases, the DTS can be derived
from the RTP time stamp using frame rate information; this requires from the RTP time stamp using frame rate information; this requires
deep parsing in the video stream, which may be considered deep parsing in the video stream, which may be considered
objectionable. But if the video frame rate is variable, the required objectionable. If the video frame rate is variable, the required
information may not even be present in the video stream. For both information may not even be present in the video stream. For both
reasons, the capability has been defined to optionally carry the reasons, the capability has been defined to optionally carry the DTS
DTS in the RTP payload for each contained Access Unit. in the RTP payload for each contained Access Unit.
To keep the coding of time stamps efficient, each time stamp To keep the coding of time stamps efficient, each time stamp
contained in the RTP payload is coded differentially, the CTS from contained in the RTP payload is coded as a difference. For the CTS,
the RTP time stamp, and the DTS from the CTS. the offset from the RTP time stamps is provided, and for the DTS, the
offset from the CTS.
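The time stamp rules of this section can be illustrated with a minimal sketch. It is not part of the specification. The function names are hypothetical, the deltas are assumed to be already decoded into signed integers, and adding the signed DTS offset to the CTS is this sketch's reading of "offset from the CTS".

```python
RTP_TS_MODULUS = 1 << 32  # RTP time stamps are 32-bit values and wrap around


def implied_timestamps(rtp_timestamp, au_count, frame_period):
    """Time stamps of the AUs in one packet when the frame period is constant.

    The first AU takes the RTP time stamp; each subsequent AU is one
    frame period later (all values in RTP clock ticks).
    """
    return [(rtp_timestamp + n * frame_period) % RTP_TS_MODULUS
            for n in range(au_count)]


def explicit_timestamps(rtp_timestamp, cts_delta=0, dts_delta=0):
    """CTS and DTS of one AU from the differentially coded values.

    Assumption of this sketch: both deltas are signed offsets, the CTS
    relative to the RTP time stamp and the DTS relative to the CTS.
    """
    cts = (rtp_timestamp + cts_delta) % RTP_TS_MODULUS
    dts = (cts + dts_delta) % RTP_TS_MODULUS
    return cts, dts
```

For example, three audio frames of 1024 samples in a packet with RTP time stamp 1000 would get the time stamps 1000, 2024 and 3048 under these assumptions.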
2.7 State indication of MPEG-4 system streams 2.7. State Indication of MPEG-4 System Streams
ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to
convey state information when transporting MPEG-4 system streams, convey state information when transporting MPEG-4 system streams,
this payload format allows for the optional carriage in the RTP this payload format allows for the optional carriage in the RTP
payload of the stream state for each contained Access Unit. Stream payload of the stream state for each contained Access Unit. Stream
states are used to signal "crucial" AUs that carry information whose states are used to signal "crucial" AUs that carry information whose
loss cannot be tolerated and are also useful when repeating AUs loss cannot be tolerated and are also useful when repeating AUs
according to the carousel mechanism defined in ISO/IEC 14496-1. according to the carousel mechanism defined in ISO/IEC 14496-1.
2.8 Random access indication 2.8. Random Access Indication
Random access to the content of MPEG-4 elementary streams may be Random access to the content of MPEG-4 elementary streams may be
possible at some but not all Access Units. To signal Access Units possible at some but not all Access Units. To signal Access Units
where random access is possible, a random access point flag can where random access is possible, a random access point flag can
optionally be carried in the RTP payload for each contained Access optionally be carried in the RTP payload for each contained Access
Unit. Carriage of random access points is particularly useful for Unit. Carriage of random access points is particularly useful for
MPEG-4 system streams in combination with the stream state. MPEG-4 system streams in combination with the stream state.
2.9 Carriage of auxiliary information. 2.9. Carriage of Auxiliary Information
This payload format defines a specific field to carry auxiliary This payload format defines a specific field to carry auxiliary data.
data. The auxiliary data field is preceded by a field that specifies The auxiliary data field is preceded by a field that specifies the
the length of the auxiliary data, so as to facilitate skipping of length of the auxiliary data, so as to facilitate the skipping of
the data without parsing it. The coding of the auxiliary data is not data without parsing it. The coding of the auxiliary data is not
defined in this document; instead the format, meaning and signaling defined in this document; instead, the format, meaning and signaling
of auxiliary information is expected to be specified in one or more of auxiliary information is expected to be specified in one or more
future RFCs. Auxiliary information MUST NOT be transmitted until its future RFCs. Auxiliary information MUST NOT be transmitted until its
format, meaning and signaling have been specified and its use has format, meaning and signaling have been specified and its use has
been signaled. Receivers that have knowledge of the auxiliary data been signaled. Receivers that have knowledge of the auxiliary data
MAY decode the auxiliary data, but receivers without knowledge of MAY decode the auxiliary data, but receivers without knowledge of
such data MUST skip the auxiliary data field. such data MUST skip the auxiliary data field.
2.10 MIME format parameters and configuring conditional fields 2.10. MIME Format Parameters and Configuring Conditional Fields
To support the features described in the previous sections several To support the features described in the previous sections, several
fields are defined for carriage in the RTP payload. However, their fields are defined for carriage in the RTP payload. However, their
use strongly depends on the type of MPEG-4 elementary stream that use strongly depends on the type of MPEG-4 elementary stream that is
is carried. Sometimes a specific field is needed with a certain carried. Sometimes a specific field is needed with a certain length,
length, while in other cases such field is not needed at all. To be while in other cases such a field is not needed. To be efficient in
efficient in either case, the fields to support these features are either case, the fields to support these features are configurable by
configurable by means of MIME format parameters. In general, a MIME means of MIME format parameters. In general, a MIME format parameter
format parameter defines the presence and length of the associated defines the presence and length of the associated field. A length of
field. A length of zero indicates absence of the field. As a zero indicates absence of the field. As a consequence, parsing of
consequence, parsing of the payload requires knowledge of MIME the payload requires knowledge of MIME format parameters. The MIME
format parameters. The MIME format parameters are conveyed to the format parameters are conveyed to the receiver via SDP [5] messages,
receiver via SDP [5] messages, as specified in section 4.4.1, or as specified in section 4.4.1, or through other means.
through other means.
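Since a length of zero means the field is absent, a receiver can treat every length parameter that was not signaled as zero. The following sketch shows one possible way to do so. It assumes the parameters arrive as a semicolon-separated list of name=value pairs; the parameter names follow section 4.1 of the specification, which is not reproduced here, and lower-casing them is a choice of this sketch.

```python
# Length-type parameters; names follow section 4.1 (assumed here, lower-cased).
LENGTH_PARAMS = (
    "sizelength",
    "indexlength",
    "indexdeltalength",
    "ctsdeltalength",
    "dtsdeltalength",
    "streamstateindication",
    "auxiliarydatasizelength",
)


def parse_length_params(fmtp_params):
    """Return the field lengths in bits; parameters not signaled default to 0."""
    values = {}
    for item in fmtp_params.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep:  # ignore empty items and items without '='
            values[name.strip().lower()] = value.strip()
    return {name: int(values.get(name, "0")) for name in LENGTH_PARAMS}
```

A parameter string such as "sizeLength=13; indexLength=3; indexDeltaLength=3" would then yield 13, 3 and 3 for those fields and 0 (field absent) for all others.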
2.11 Global structure of payload format 2.11. Global Structure of Payload Format
The RTP payload following the RTP header, contains three The RTP payload following the RTP header, contains three octet-
octet-aligned data sections, of which the first two MAY be empty. aligned data sections, of which the first two MAY be empty, see
See figure 1. Figure 1.
+---------+-----------+-----------+---------------+ +---------+-----------+-----------+---------------+
| RTP | AU Header | Auxiliary | Access Unit | | RTP | AU Header | Auxiliary | Access Unit |
| Header | Section | Section | Data Section | | Header | Section | Section | Data Section |
+---------+-----------+-----------+---------------+ +---------+-----------+-----------+---------------+
<----------RTP Packet Payload-----------> <----------RTP Packet Payload----------->
Figure 1: Data sections within an RTP packet Figure 1: Data sections within an RTP packet
The first data section is the AU (Access Unit) Header Section, that The first data section is the AU (Access Unit) Header Section, that
contains one or more AU-headers; however, each AU-header MAY be contains one or more AU-headers; however, each AU-header MAY be
empty, in which case the entire AU Header Section is empty. The empty, in which case the entire AU Header Section is empty. The
second section is the Auxiliary Section, containing auxiliary data; second section is the Auxiliary Section, containing auxiliary data;
this section MAY also be configured empty. The third section is the this section MAY also be configured empty. The third section is the
Access Unit Data Section, containing either a single fragment of Access Unit Data Section, containing either a single fragment of one
one Access Unit or one or more complete Access Units. The Access Access Unit or one or more complete Access Units. The Access Unit
Unit Data Section MUST NOT be empty. Data Section MUST NOT be empty.
2.12 Modes to transport MPEG-4 streams 2.12. Modes to Transport MPEG-4 Streams
While it is possible to build fully configurable receivers capable While it is possible to build fully configurable receivers capable of
of receiving any MPEG-4 stream, this specification also allows for receiving any MPEG-4 stream, this specification also allows for the
the design of simplified, but dedicated receivers, that are capable design of simplified, but dedicated receivers, that are for example,
for example of receiving only one type of MPEG-4 stream. This capable of receiving only one type of MPEG-4 stream. This is
is achieved by requiring that specific modes be defined for using achieved by requiring that specific modes be defined in order to use
this specification. Each mode may define constraints for transport this specification. Each mode may define constraints for transport
of one or more type of MPEG-4 streams, for instance on the payload of one or more types of MPEG-4 streams, for instance on the payload
configuration. configuration.
The applied mode MUST be signaled. Signaling the mode is The applied mode MUST be signaled. Signaling the mode is
particularly important for receivers that are only capable of particularly important for receivers that are only capable of
decoding one or more specific modes. Such receivers need to decoding one or more specific modes. Such receivers need to
determine whether the applied mode is supported, so as to avoid determine whether the applied mode is supported, so as to avoid
problems with processing of payloads that are beyond the problems with processing of payloads that are beyond the capabilities
capabilities of the receiver. of the receiver.
In this document several modes are defined for transport of MPEG-4 In this document several modes are defined for the transportation of
CELP and AAC streams, as well as a generic mode that can be used MPEG-4 CELP and AAC streams, as well as a generic mode that can be
for any MPEG-4 stream. In the future, new RFCs may specify other used for any MPEG-4 stream. In the future, new RFCs may specify
modes of using this specification. However, each mode MUST be in other modes of using this specification. However, each mode MUST be
full compliance with this specification (see section 3.3.7). in full compliance with this specification (see section 3.3.7).
2.13 Alignment with RFC 3016 2.13. Alignment with RFC 3016
This payload can be configured to be nearly identical to the This payload can be configured as nearly identical to the payload
payload format defined in RFC 3016 [12] for the MPEG-4 video format defined in RFC 3016 [12] for the MPEG-4 video configurations
configurations recommended in RFC 3016. Hence, receivers that recommended in RFC 3016. Hence, receivers that comply with RFC 3016
comply with RFC 3016 can decode such RTP payload, providing that can decode such RTP payload, provided that additional packets
additional packets containing video decoder configuration (VO, containing video decoder configuration (VO, VOL, VOSH) are inserted
VOL, VOSH) are inserted in the stream, as required by RFC 3016. in the stream, as required by RFC 3016 [12]. Conversely, receivers
Conversely, receivers that comply with the specification in this that comply with the specification in this document SHOULD be able to
document SHOULD be able to decode payloads, names and parameters decode payloads, names and parameters defined for MPEG-4 video in RFC
defined for MPEG-4 video in RFC 3016. In this respect it is 3016 [12]. In this respect, it is strongly RECOMMENDED that the
strongly RECOMMENDED to implement the ability to ignore "in band" implementation provide the ability to ignore "in band" video decoder
video decoder configuration packets in the RFC 3016 payload. configuration packets that may be found in streams conforming to the
RFC 3016 video payload.
Note the "out of band" availability of the video decoder Note the "out of band" availability of the video decoder
configuration is optional in RFC 3016. To achieve maximum configuration is optional in RFC 3016 [12]. To achieve maximum
interoperability with the RTP payload format defined in this interoperability with the RTP payload format defined in this
document, applications that use RFC 3016 to transport MPEG-4 video document, applications that use RFC 3016 to transport MPEG-4 video
(part 2) are recommended to make the video decoder configuration (part 2) are recommended to make the video decoder configuration
available as a MIME parameter. available as a MIME parameter.
3. Payload Format 3. Payload Format
3.1 Usage of RTP Header Fields and RTCP 3.1. Usage of RTP Header Fields and RTCP
Payload Type (PT): The assignment of an RTP payload type for this Payload Type (PT): The assignment of an RTP payload type for this
packet format is outside the scope of this document; it is packet format is outside the scope of this document; it is
specified by the RTP profile under which this payload format is specified by the RTP profile under which this payload format is
used, or signaled dynamically out-of-band (e.g. using SDP). used, or signaled dynamically out-of-band (e.g., using SDP).
Marker (M) bit: The M bit is set to 1 to indicate that the RTP Marker (M) bit: The M bit is set to 1 to indicate that the RTP packet
packet payload contains either the final fragment of a fragmented payload contains either the final fragment of a fragmented Access
Access Unit or one or more complete Access Units. Unit or one or more complete Access Units.
Extension (X) bit: Defined by the RTP profile used. Extension (X) bit: Defined by the RTP profile used.
Sequence Number: The RTP sequence number SHOULD be generated by the Sequence Number: The RTP sequence number SHOULD be generated by the
sender in the usual manner with a constant random offset. sender in the usual manner with a constant random offset.
Timestamp: Indicates the sampling instant of the first AU Timestamp: Indicates the sampling instant of the first AU contained
contained in the RTP payload. This sampling instant is equivalent in the RTP payload. This sampling instant is equivalent to the
to the CTS in the MPEG-4 time domain. When using SDP the clock rate CTS in the MPEG-4 time domain. When using SDP, the clock rate of
of the RTP time stamp MUST be expressed using the "rtpmap" the RTP time stamp MUST be expressed using the "rtpmap" attribute.
attribute. If an MPEG-4 audio stream is transported, the rate SHOULD If an MPEG-4 audio stream is transported, the rate SHOULD be set
be set to the same value as the sampling rate of the audio stream. to the same value as the sampling rate of the audio stream. If an
If an MPEG-4 video stream is transported, it is RECOMMENDED to set MPEG-4 video stream is transported, it is RECOMMENDED that the
the rate to 90 kHz. rate be set to 90 kHz.
In all cases, the sender SHALL make sure that RTP time stamps In all cases, the sender SHALL make sure that RTP time stamps are
are identical only if the RTP time stamp refers to fragments of the identical only if the RTP time stamp refers to fragments of the same
same Access Unit. Access Unit.
According to RFC 1889 [2] (section 5.1), RTP time stamps are According to RFC 3550 [2] (section 5.1), it is RECOMMENDED that RTP
RECOMMENDED to start at a random value for security reasons. This time stamps start at a random value for security reasons. This is
is not an issue for synchronization of multiple RTP streams. When, not an issue for synchronization of multiple RTP streams. However,
however, streams from multiple sources are to be synchronized (for when streams from multiple sources are to be synchronized (for
example one stream from local storage, another from an RTP streaming example one stream from local storage, another from an RTP streaming
server), synchronization may become impossible if the receiver only server), synchronization may become impossible if the receiver only
knows the original time stamp relationships. Synchronization in such knows the original time stamp relationships. In such cases the time
cases, may require to provide the correct relationship between time stamp relationship required for obtaining synchronization may be
stamps for obtaining synchronization by out of band means. The provided by out of band means. The format of such information, as
format of such information as well as methods to convey such well as methods to convey such information, are beyond the scope of
information are beyond the scope of this specification. this specification.
SSRC: set as described in RFC 1889 [2]. SSRC: set as described in RFC 3550 [2].
CC and CSRC fields are used as described in RFC 1889 [2]. CC and CSRC fields are used as described in RFC 3550 [2].
RTCP SHOULD be used as defined in RFC 1889 [2]. Note that time RTCP SHOULD be used as defined in RFC 3550 [2]. Note that time
stamps in RTCP Sender Reports may be used to synchronize multiple stamps in RTCP Sender Reports may be used to synchronize multiple
MPEG-4 elementary streams and also to synchronize MPEG-4 streams MPEG-4 elementary streams and also to synchronize MPEG-4 streams with
with non-MPEG-4 streams, in case the delivery of these streams uses non-MPEG-4 streams, in case the delivery of these streams uses RTP.
RTP.
3.2 RTP Payload Structure 3.2. RTP Payload Structure
3.2.1 The AU Header Section 3.2.1. The AU Header Section
When present, the AU Header Section consists of the When present, the AU Header Section consists of the AU-headers-length
AU-headers-length field, followed by a number of AU-headers. See field, followed by a number of AU-headers, see Figure 2.
figure 2.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
|AU-headers-length|AU-header|AU-header| |AU-header|padding| |AU-headers-length|AU-header|AU-header| |AU-header|padding|
| | (1) | (2) | | (n) | bits | | | (1) | (2) | | (n) | bits |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
Figure 2: The AU Header Section Figure 2: The AU Header Section
The AU-headers are configured using MIME format parameters and MAY The AU-headers are configured using MIME format parameters and MAY be
be empty. If the AU-header is configured empty, the empty. If the AU-header is configured empty, the AU-headers-length
AU-headers-length field SHALL NOT be present and consequently the field SHALL NOT be present and consequently the AU Header Section is
AU Header Section is empty. If the AU-header is not configured empty. If the AU-header is not configured empty, then the AU-
empty, then the AU-headers-length is a two octet field that headers-length is a two octet field that specifies the length in bits
specifies the length in bits of the immediately following of the immediately following AU-headers, excluding the padding bits.
AU-headers, excluding the padding bits.
Each AU-header is associated with a single Access Unit (fragment) Each AU-header is associated with a single Access Unit (fragment)
contained in the Access Unit Data Section in the same RTP packet. contained in the Access Unit Data Section in the same RTP packet.
For each contained Access Unit (fragment) there is exactly one
AU-header. Within the AU Header Section, the AU-headers are
bit-wise concatenated in the order in which the Access Units are
contained in the Access Unit Data Section. Hence, the n-th
AU-header refers to the n-th AU (fragment). If the concatenated
AU-headers consume a non-integer number of octets, up to 7
zero-padding bits MUST be inserted at the end in order to achieve
octet-alignment of the AU Header Section.
3.2.1.1 The AU-header For each contained Access Unit (fragment), there is exactly one AU-
header. Within the AU Header Section, the AU-headers are bit-wise
concatenated in the order in which the Access Units are contained in
the Access Unit Data Section. Hence, the n-th AU-header refers to
the n-th AU (fragment). If the concatenated AU-headers consume a
non-integer number of octets, up to 7 zero-padding bits MUST be
inserted at the end in order to achieve octet-alignment of the AU
Header Section.
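The length and padding rules above determine how many octets the AU Header Section occupies. A minimal sketch, assuming the AU-header is not configured empty and that the two octet AU-headers-length field is in network byte order (the function name is hypothetical):

```python
def au_header_section_octets(payload):
    """Octets used by the AU Header Section at the start of the RTP payload.

    Counts the two octet AU-headers-length field, the AU-headers and the
    zero-padding bits that restore octet-alignment.
    """
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    headers_bits = int.from_bytes(payload[:2], "big")  # length in bits
    headers_octets = (headers_bits + 7) // 8           # round up: 0..7 padding bits
    total = 2 + headers_octets
    if total > len(payload):
        raise ValueError("AU-headers-length exceeds payload size")
    return total
```

With an AU-headers-length of 13 bits, for instance, the AU-headers take two octets including 3 padding bits, so the section is 4 octets long.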
Each AU-header may contain the fields given in figure 3. The length 3.2.1.1. The AU-header
Each AU-header may contain the fields given in Figure 3. The length
in bits of the fields, with the exception of the CTS-flag, the in bits of the fields, with the exception of the CTS-flag, the
DTS-flag and the RAP-flag fields is defined by MIME format DTS-flag and the RAP-flag fields, is defined by MIME format
parameters; see section 4.1. If a MIME format parameter has the parameters; see section 4.1. If a MIME format parameter has the
default value of zero, then the associated field is not present. default value of zero, then the associated field is not present. The
The number of bits for fields that are present and that represent number of bits for fields that are present and that represent the
the value of a parameter MUST be chosen large enough to correctly value of a parameter MUST be chosen large enough to correctly encode
encode the largest value of that parameter during the session. the largest value of that parameter during the session.
If present, the fields MUST occur in the mutual order given in If present, the fields MUST occur in the mutual order given in Figure
figure 3. In the general case a receiver can only discover the size 3. In the general case, a receiver can only discover the size of an
of an AU-header by parsing it since the presence of the CTS-delta AU-header by parsing it since the presence of the CTS-delta and DTS-
and DTS-delta fields is signaled by the value of the CTS-flag and delta fields is signaled by the value of the CTS-flag and DTS-flag,
DTS-flag, respectively. respectively.
+---------------------------------------+ +---------------------------------------+
| AU-size | | AU-size |
+---------------------------------------+ +---------------------------------------+
| AU-Index / AU-Index-delta | | AU-Index / AU-Index-delta |
+---------------------------------------+ +---------------------------------------+
| CTS-flag | | CTS-flag |
+---------------------------------------+ +---------------------------------------+
| CTS-delta | | CTS-delta |
+---------------------------------------+ +---------------------------------------+
| DTS-flag | | DTS-flag |
+---------------------------------------+ +---------------------------------------+
| DTS-delta | | DTS-delta |
+---------------------------------------+ +---------------------------------------+
| RAP-flag | | RAP-flag |
+---------------------------------------+ +---------------------------------------+
| Stream-state | | Stream-state |
+---------------------------------------+ +---------------------------------------+
Figure 3: The fields in the AU-header. If used, the AU-Index field Figure 3: The fields in the AU-header. If used, the AU-Index field
only occurs in the first AU-header within an AU Header only occurs in the first AU-header within an AU Header
Section; in any other AU-header the AU-Index-delta field Section; in any other AU-header, the AU-Index-delta field
occurs instead. occurs instead.
AU-size: Indicates the size in octets of the associated Access Unit AU-size: Indicates the size in octets of the associated Access Unit
in the Access Unit Data Section in the same RTP packet. When in the Access Unit Data Section in the same RTP packet. When the
the AU-size is associated with an AU fragment, the AU size AU-size is associated with an AU fragment, the AU size indicates
indicates the size of the entire AU and not the size of the the size of the entire AU and not the size of the fragment. In
fragment. In this case, the size of the fragment is known this case, the size of the fragment is known from the size of the
from the size of the AU data section. This can be exploited AU data section. This can be exploited to determine whether a
to determine whether a packet contains an entire AU or a packet contains an entire AU or a fragment, which is particularly
fragment, which is particularly useful after losing a packet useful after losing a packet carrying the last fragment of an AU.
carrying the last fragment of an AU.
AU-Index: Indicates the serial number of the associated Access Unit AU-Index: Indicates the serial number of the associated Access Unit
(fragment). For each (in decoding order) consecutive AU or AU (fragment). For each (in decoding order) consecutive AU or AU
fragment, the serial number is incremented with 1. When fragment, the serial number is incremented by 1. When present,
present, the AU-Index field occurs in the first AU-header in the AU-Index field occurs in the first AU-header in the AU Header
the AU Header Section, but MUST NOT occur in any subsequent Section, but MUST NOT occur in any subsequent (non-first) AU-
(non-first) AU-header in that Section. To encode the serial header in that Section. To encode the serial number in any such
number in any such non-first AU-header, the AU-Index-delta non-first AU-header, the AU-Index-delta field is used.
field is used.
AU-Index-delta: The AU-Index-delta field is an unsigned integer that
specifies the serial number of the associated AU as the difference
with respect to the serial number of the previous Access Unit.
Hence, for the n-th (n>1) AU, the serial number is found from:
AU-Index-delta: The AU-Index-delta field is an unsigned integer
that specifies the serial number of the associated AU as the
difference with respect to the serial number of the previous
Access Unit. Hence, for the n-th (n>1) AU the serial number
is found from:
AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1 AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
If the AU-Index field is present in the first AU-header in
the AU Header Section, then the AU-Index-delta field MUST be
present in any subsequent (non-first) AU-header. When the
AU-Index-delta is coded with the value 0, it indicates that
the Access Units are consecutive in decoding order. An
AU-Index-delta value larger than 0 signals that interleaving
is applied.
CTS-flag: Indicates whether the CTS-delta field is present. If the AU-Index field is present in the first AU-header in the AU
A value of 1 indicates that the field is present, a value Header Section, then the AU-Index-delta field MUST be present in
of 0 that it is not present. any subsequent (non-first) AU-header. When the AU-Index-delta is
The CTS-flag field MUST be present in each AU-header if the coded with the value 0, it indicates that the Access Units are
length of the CTS-delta field is signaled to be larger than consecutive in decoding order. An AU-Index-delta value larger
zero. In that case, the CTS-flag field MUST have the value 0 than 0 signals that interleaving is applied.
in the first AU-header and MAY have the value 1 in all
non-first AU-headers. The CTS-flag field SHOULD be 0 for CTS-flag: Indicates whether the CTS-delta field is present. A value
any non-first fragment of an Access Unit. of 1 indicates that the field is present, a value of 0 indicates
that it is not present.
The CTS-flag field MUST be present in each AU-header if the length
of the CTS-delta field is signaled to be larger than zero. In
that case, the CTS-flag field MUST have the value 0 in the first
AU-header and MAY have the value 1 in all non-first AU-headers.
The CTS-flag field SHOULD be 0 for any non-first fragment of an
Access Unit.
CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's
complement offset (delta) from the time stamp in the RTP complement offset (delta) from the time stamp in the RTP header of
header of this RTP packet. The CTS MUST use the same clock this RTP packet. The CTS MUST use the same clock rate as the time
rate as the time stamp in the RTP header. stamp in the RTP header.
DTS-flag: Indicates whether the DTS-delta field is present. A value DTS-flag: Indicates whether the DTS-delta field is present. A value
of 1 indicates that DTS-delta is present, a value of 0 that of 1 indicates that DTS-delta is present, a value of 0 indicates
it is not present. that it is not present.
The DTS-flag field MUST be present in each AU-header if the
length of the DTS-delta field is signaled to be larger than
zero. The DTS-flag field MUST have the same value for all
fragments of an Access Unit.
DTS-delta: Specifies the value of the DTS as a 2's complement The DTS-flag field MUST be present in each AU-header if the length
offset (delta) from the CTS. The DTS MUST use the of the DTS-delta field is signaled to be larger than zero. The
same clock rate as the time stamp in the RTP header. The DTS-flag field MUST have the same value for all fragments of an
DTS-delta field MUST have the same value for all fragments of Access Unit.
an Access Unit.
RAP-flag: Indicates when set to 1 that the associated Access Unit DTS-delta: Specifies the value of the DTS as a 2's complement offset
provides a random access point to the content of the stream. (delta) from the CTS. The DTS MUST use the same clock rate as the
If an Access Unit is fragmented, the RAP flag, if present, time stamp in the RTP header. The DTS-delta field MUST have the
MUST be set to 0 for each non-first fragment of the AU. same value for all fragments of an Access Unit.
RAP-flag: When set to 1, indicates that the associated Access Unit
provides a random access point to the content of the stream. If
an Access Unit is fragmented, the RAP flag, if present, MUST be
set to 0 for each non-first fragment of the AU.
Stream-state: Specifies the state of the stream for an AU of an Stream-state: Specifies the state of the stream for an AU of an
MPEG-4 system stream; each state is identified by a value of MPEG-4 system stream; each state is identified by a value of a
a modulo counter. In ISO/IEC 14496-1, MPEG-4 system streams modulo counter. In ISO/IEC 14496-1, MPEG-4 system streams use the
use the AU_SequenceNumber to signal stream states. When the AU_SequenceNumber to signal stream states. When the stream state
stream state changes, the value of stream-state MUST be changes, the value of the stream-state MUST be incremented by one.
incremented by one.
Note: no relation is required between stream-states of Note: no relation is required between stream-states of different
different streams. streams.
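The field definitions above can be combined into a small parser for the AU Header Section. The following is a sketch under stated assumptions rather than a reference implementation: the class and function names are hypothetical, the configuration is assumed to have been obtained from the MIME format parameters, fields are read most significant bit first, and the AU-Index field is assumed to be present whenever the AU-Index-delta field is.

```python
from dataclasses import dataclass


@dataclass
class AuHeaderConfig:
    """Field lengths in bits (0 = field absent); attribute names are illustrative."""
    size_length: int = 0
    index_length: int = 0
    index_delta_length: int = 0
    cts_delta_length: int = 0
    dts_delta_length: int = 0
    rap_flag: bool = False
    stream_state_length: int = 0


class BitReader:
    """Reads unsigned bit fields, most significant bit first."""

    def __init__(self, data):
        self.value = int.from_bytes(data, "big")
        self.remaining = len(data) * 8

    def read(self, bits):
        if bits > self.remaining:
            raise ValueError("read past end of AU Header Section")
        self.remaining -= bits
        return (self.value >> self.remaining) & ((1 << bits) - 1)


def signed(value, bits):
    """Interpret an unsigned field of the given width as 2's complement."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def parse_au_headers(section, cfg):
    """Parse AU-headers-length and all AU-headers; returns one dict per AU."""
    reader = BitReader(section)
    headers_bits = reader.read(16)            # AU-headers-length, in bits
    if headers_bits > reader.remaining:
        raise ValueError("AU-headers-length exceeds section size")
    end = reader.remaining - headers_bits     # only padding bits follow this point
    headers = []
    while reader.remaining > end:
        start = reader.remaining
        first = not headers
        header = {}
        if cfg.size_length:
            header["size"] = reader.read(cfg.size_length)
        if first and cfg.index_length:
            header["index"] = reader.read(cfg.index_length)
        elif not first and cfg.index_delta_length:
            delta = reader.read(cfg.index_delta_length)
            # AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
            header["index"] = headers[-1].get("index", 0) + delta + 1
        if cfg.cts_delta_length and reader.read(1):      # CTS-flag
            header["cts_delta"] = signed(
                reader.read(cfg.cts_delta_length), cfg.cts_delta_length)
        if cfg.dts_delta_length and reader.read(1):      # DTS-flag
            header["dts_delta"] = signed(
                reader.read(cfg.dts_delta_length), cfg.dts_delta_length)
        if cfg.rap_flag:
            header["rap"] = bool(reader.read(1))
        if cfg.stream_state_length:
            header["stream_state"] = reader.read(cfg.stream_state_length)
        if reader.remaining == start:
            raise ValueError("AU-header is configured empty")
        headers.append(header)
    if reader.remaining != end:
        raise ValueError("AU-headers overran AU-headers-length")
    return headers
```

Because the CTS-delta and DTS-delta fields are only present when their flag is set, the parser cannot compute the header size up front and instead reads until the number of bits given by AU-headers-length has been consumed.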
3.2.2 The Auxiliary Section 3.2.2. The Auxiliary Section
The Auxiliary Section consists of the auxiliary-data-size field The Auxiliary Section consists of the auxiliary-data-size field
followed by the auxiliary-data field. Receivers MAY (but are not followed by the auxiliary-data field. Receivers MAY (but are not
required to) parse the auxiliary-data field; to facilitate skipping required to) parse the auxiliary-data field; to facilitate skipping
of the auxiliary-data field by receivers, the auxiliary-data-size of the auxiliary-data field by receivers, the auxiliary-data-size
field indicates the length in bits of the auxiliary-data. If the field indicates the length in bits of the auxiliary-data. If the
concatenation of the auxiliary-data-size and the auxiliary-data concatenation of the auxiliary-data-size and the auxiliary-data
fields consume a non-integer number of octets, up to 7 zero padding fields consume a non-integer number of octets, up to 7 zero padding
bits MUST be inserted immediately after the auxiliary data in order bits MUST be inserted immediately after the auxiliary data in order
to achieve octet-alignment. See figure 4. to achieve octet-alignment. See Figure 4.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
| auxiliary-data-size | auxiliary-data |padding bits | | auxiliary-data-size | auxiliary-data |padding bits |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
Figure 4: The fields in the Auxiliary Section Figure 4: The fields in the Auxiliary Section
The length in bits of the auxiliary-data-size field is configurable The length in bits of the auxiliary-data-size field is configurable
by a MIME format parameter; see section 4.1. The default length of by a MIME format parameter; see section 4.1. The default length of
zero indicates that the entire Auxiliary Section is absent. zero indicates that the entire Auxiliary Section is absent.
auxiliary-data-size: specifies the length in bits of the immediately auxiliary-data-size: specifies the length in bits of the immediately
following auxiliary-data field; following auxiliary-data field;
auxiliary-data: the auxiliary-data field contains data of a format auxiliary-data: the auxiliary-data field contains data of a format
not defined by this specification. not defined by this specification.
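
The padding rule for the Auxiliary Section can be illustrated with a small calculation. The following is a minimal sketch, not part of the specification; the function name and its arguments are hypothetical, and the size-field length is assumed to have been obtained from the MIME format parameter described in section 4.1.

```python
from typing import Tuple


def auxiliary_section_layout(size_field_bits: int, aux_data_bits: int) -> Tuple[int, int]:
    """Return (total_octets, padding_bits) for the Auxiliary Section.

    size_field_bits: configured length of the auxiliary-data-size field
                     (hypothetical argument; 0 means the section is absent).
    aux_data_bits:   value carried in auxiliary-data-size, i.e. the length
                     in bits of the auxiliary-data field.
    """
    if size_field_bits == 0:
        # Default length of zero: the entire Auxiliary Section is absent.
        return (0, 0)
    used_bits = size_field_bits + aux_data_bits
    # Up to 7 zero bits are appended to reach octet alignment.
    padding_bits = (-used_bits) % 8
    return ((used_bits + padding_bits) // 8, padding_bits)


if __name__ == "__main__":
    # 8-bit size field followed by 13 bits of data: 21 bits, 3 padding bits.
    print(auxiliary_section_layout(8, 13))
```

A receiver that does not parse the auxiliary-data can use the first element of the result to skip directly to the Access Unit Data Section.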
3.2.3 The Access Unit Data Section 3.2.3. The Access Unit Data Section
The Access Unit Data Section contains an integer number of complete The Access Unit Data Section contains an integer number of complete
Access Units or a single fragment of one AU. The Access Unit Data Access Units or a single fragment of one AU. The Access Unit Data
Section is never empty. If data of more than one Access Unit is Section is never empty. If data of more than one Access Unit is
present, then the AUs are concatenated into a contiguous string present, then the AUs are concatenated into a contiguous string of
of octets. See figure 5. The AUs inside the Access Unit Data octets. See Figure 5. The AUs inside the Access Unit Data Section
Section MUST be in decoding order, though not necessarily contiguous MUST be in decoding order, though not necessarily contiguous in the
in the case of interleaving. case of interleaving.
The size and number of Access Units SHOULD be adjusted such that The size and number of Access Units SHOULD be adjusted such that the
the resulting RTP packet is not larger than the path MTU. To handle resulting RTP packet is not larger than the path MTU. To handle
larger packets, this payload format relies on lower layers for larger packets, this payload format relies on lower layers for
fragmentation, which may result in reduced performance. fragmentation, which may result in reduced performance.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|AU(1) | |AU(1) |
+ | + |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |AU(2) | | |AU(2) |
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | AU(n) | | | AU(n) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|AU(n) continued| |AU(n) continued|
|-+-+-+-+-+-+-+-+ |-+-+-+-+-+-+-+-+
Figure 5: Access Unit Data Section; each AU is octet-aligned. Figure 5: Access Unit Data Section; each AU is octet-aligned.
When multiple Access Units are carried, the size of each AU MUST be When multiple Access Units are carried, the size of each AU MUST be
made available to the receiver. If the AU size is variable then the made available to the receiver. If the AU size is variable, then the
size of each AU MUST be indicated in the AU-size field of the size of each AU MUST be indicated in the AU-size field of the
corresponding AU-header. However, if the AU size is constant for a corresponding AU-header. However, if the AU size is constant for a
stream, this mechanism SHOULD NOT be used, but instead the fixed stream, this mechanism SHOULD NOT be used; instead, the fixed size
size SHOULD be signaled by the MIME format parameter SHOULD be signaled by the MIME format parameter "constantSize"; see
"constantSize", see section 4.1. section 4.1.
The absence of both AU-size in the AU-header and the constantSize The absence of both AU-size in the AU-header and the constantSize
MIME format parameter indicates carriage of a single AU (fragment), MIME format parameter indicates the carriage of a single AU
i.e. that a single Access Unit (fragment) is transported in each (fragment), i.e., that a single Access Unit (fragment) is transported
RTP packet for that stream. in each RTP packet for that stream.
3.2.3.1 Fragmentation 3.2.3.1. Fragmentation
A packet SHALL carry either one or more complete Access Units, or A packet SHALL carry either one or more complete Access Units, or a
a single fragment of an Access Unit. Fragments of the same Access single fragment of an Access Unit. Fragments of the same Access Unit
Unit have the same time stamp but different RTP sequence numbers. have the same time stamp but different RTP sequence numbers. The
The marker bit in the RTP header is 1 on the last fragment of an marker bit in the RTP header is 1 on the last fragment of an Access
Access Unit, and 0 on all other fragments. Unit, and 0 on all other fragments.
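
The fragmentation rules above can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary layout, and the max_payload argument are assumptions made for the example, and the AU Header Section that would accompany each fragment is left out.

```python
def fragment_au(au: bytes, max_payload: int, timestamp: int, first_seq: int) -> list:
    """Split one Access Unit into fragments, one fragment per RTP packet.

    All names here are hypothetical; only the timestamp, sequence number,
    and marker bit behavior follow the text above.
    """
    if not au or max_payload <= 0:
        raise ValueError("the Access Unit Data Section is never empty")
    chunks = [au[o:o + max_payload] for o in range(0, len(au), max_payload)]
    packets = []
    for n, chunk in enumerate(chunks):
        packets.append({
            # RTP sequence numbers are 16 bits wide and wrap around.
            "seq": (first_seq + n) & 0xFFFF,
            # Every fragment of the same AU carries the same time stamp.
            "timestamp": timestamp,
            # Marker bit is 1 only on the last fragment of the AU.
            "marker": n == len(chunks) - 1,
            "data": chunk,
        })
    return packets


if __name__ == "__main__":
    for p in fragment_au(b"x" * 2500, 1000, 9000, 65535):
        print(p["seq"], p["timestamp"], p["marker"], len(p["data"]))
```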
3.2.3.2 Interleaving 3.2.3.2. Interleaving
Unless prohibited by the signaled mode, a sender MAY interleave Unless prohibited by the signaled mode, a sender MAY interleave
Access Units. Receivers that are capable of receiving modes that Access Units. Receivers that are capable of receiving modes that
support interleaving, MUST be able to decode interleaved Access support interleaving MUST be able to decode interleaved Access Units.
Units.
When a sender interleaves Access Units, it needs to provide When a sender interleaves Access Units, it needs to provide
sufficient information to enable a receiver to unambiguously sufficient information to enable a receiver to unambiguously
reconstruct the original order, even in case of out-of-order reconstruct the original order, even in the case of out-of-order
packets, packet loss or duplication. The information that senders packets, packet loss or duplication. The information that senders
need to provide depends on whether or not the Access Units have a need to provide depends on whether or not the Access Units have a
constant time duration. Access Units have a constant time duration, constant time duration. Access Units have a constant time duration,
if: if:
TS(i+1) - TS(i) = constant, for any i, where TS(i+1) - TS(i) = constant
i indicates the index of the AU in original order for any i, where:
i indicates the index of the AU in the original order, and
TS(i) denotes the time stamp of AU(i) TS(i) denotes the time stamp of AU(i)
The MIME parameter "constantDuration" SHOULD be used to signal that The MIME parameter "constantDuration" SHOULD be used to signal that
Access Units have a constant time duration, see section 4.1. Access Units have a constant time duration; see section 4.1.
If the "constantDuration" parameter is present, the receiver can If the "constantDuration" parameter is present, the receiver can
reconstruct the original Access Unit timing based solely on the RTP reconstruct the original Access Unit timing based solely on the RTP
timestamp and AU-Index-delta. Accordingly, when transmitting Access timestamp and AU-Index-delta. Accordingly, when transmitting Access
Units of constant duration, the AU-Index, if present, MUST be set Units of constant duration, the AU-Index, if present, MUST be set to
to the value 0. Receivers of constant duration Access Units MUST the value 0. Receivers of constant duration Access Units MUST use
use the RTP timestamp to determine the index of the first AU in the the RTP timestamp to determine the index of the first AU in the RTP
RTP packet. The AU-Index-delta header and the signaled packet. The AU-Index-delta header and the signaled
"constantDuration" are used to reconstruct AU timing. "constantDuration" are used to reconstruct AU timing.
If the "constantDuration" parameter is not present, then Access If the "constantDuration" parameter is not present, then senders MAY
Units are assumed to have a variable duration, unless the AU-Index signal AUs of constant duration by coding the AU-Index with zero in
is present and coded with the value 0 in each RTP packet. When each RTP packet. In the absence of the constantDuration parameter
transmitting Access Units of variable duration, then the receivers MUST conclude that the AUs have constant duration if the
"constantDuration" parameter MUST NOT be present, and the AU-index is zero in two consecutive RTP packets.
transmitter MUST use the AU-Index to encode the index information
required for re-ordering, and the receiver MUST use that value to When transmitting Access Units of variable duration, then the
determine the index of each AU in the RTP packet. The number of "constantDuration" parameter MUST NOT be present, and the transmitter
bits of the AU-Index field MUST be chosen so that valid index MUST use the AU-Index to encode the index information required for
information is provided at the applied interleaving scheme, without re-ordering, and the receiver MUST use that value to determine the
causing problems due to roll-over of the AU-Index field. In index of each AU in the RTP packet. The number of bits of the AU-
addition, the CTS-delta MUST be coded in the AU header for each Index field MUST be chosen so that valid index information is
non-first AU in the RTP packet, so that receivers can place the AUs provided at the applied interleaving scheme, without causing problems
correctly in time. due to roll-over of the AU-Index field. In addition, the CTS-delta
MUST be coded in the AU header for each non-first AU in the RTP
packet, so that receivers can place the AUs correctly in time.
When interleaving is applied, a de-interleave buffer is needed in When interleaving is applied, a de-interleave buffer is needed in
receivers to put the Access Units in their correct logical receivers to put the Access Units in their correct logical
consecutive decoding order. This requires the computation of the consecutive decoding order. This requires the computation of the
time stamp for each Access Unit. In case of a constant time duration time stamp for each Access Unit. In case of a constant time duration
per Access Unit, the time stamp of the i-th access unit in an RTP per Access Unit, the time stamp of the i-th access unit in an RTP
packet with RTP time stamp T is calculated as follows: packet with RTP time stamp T is calculated as follows:
Timestamp[0] = T Timestamp[0] = T
Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k] Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k]
+ 1))) * access-unit-duration + 1))) * access-unit-duration
When AU-Index-delta is always 0, this reduces to T + i * (access- When AU-Index-delta is always 0, this reduces to T + i * (access-
unit-duration). This is the non-interleaved case, where the frames unit-duration). This is the non-interleaved case, where the frames
are consecutive in decoding order. Note that the AU-Index field are consecutive in decoding order. Note that the AU-Index field
(present for the first Access Unit) is indeed not needed in this (present for the first Access Unit) is indeed not needed in this
calculation. calculation.
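
The time stamp formula for Access Units of constant duration can be written out as a short routine. This is a minimal sketch under stated assumptions: the function and argument names are hypothetical, the AU-Index-delta values are assumed to have been parsed from the AU-headers already, and the result is reduced modulo 2^32 because the RTP time stamp is a 32-bit field.

```python
def au_timestamps(rtp_timestamp: int, au_index_deltas: list, au_duration: int) -> list:
    """Time stamps of all AUs in one RTP packet (constant AU duration).

    rtp_timestamp:   T, the time stamp in the RTP header.
    au_index_deltas: AU-Index-delta[k] for k = 1..n-1, i.e. one value for
                     each non-first AU in the packet.
    au_duration:     access-unit-duration in RTP clock ticks.
    """
    timestamps = [rtp_timestamp & 0xFFFFFFFF]  # Timestamp[0] = T
    steps = 0
    for delta in au_index_deltas:
        # Running sum of (AU-Index-delta[k] + 1) for k = 1..i.
        steps += delta + 1
        timestamps.append((rtp_timestamp + steps * au_duration) & 0xFFFFFFFF)
    return timestamps


if __name__ == "__main__":
    # Non-interleaved: every delta is 0, so AUs are one duration apart.
    print(au_timestamps(1000, [0, 0, 0], 20))
    # Interleaved: a packet assumed to carry AU(0), AU(3), AU(6).
    print(au_timestamps(0, [2, 2], 1024))
```

In the second call each AU-Index-delta is 2 because consecutive AUs in the packet are three positions apart in the original order.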
3.2.3.3 Constraints for interleaving 3.2.3.3. Constraints for Interleaving
The size of the packets should be suitably chosen to be appropriate The size of the packets should be suitably chosen to be appropriate
to both the path MTU and the capacity of the receiver's to both the path MTU and the capacity of the receiver's de-interleave
de-interleave buffer. The maximum packet size for a session SHOULD buffer. The maximum packet size for a session SHOULD be chosen to
be chosen not to exceed the path MTU. not exceed the path MTU.
To allow receivers to allocate sufficient resources for To allow receivers to allocate sufficient resources for de-
de-interleaving, senders MUST provide the information to receivers interleaving, senders MUST provide the information to receivers as
as specified in this section. specified in this section.
AUs enter the decoder in decoding order. The de-interleave buffer AUs enter the decoder in decoding order. The de-interleave buffer is
is used to re-order a stream of interleaved AUs back into decoding used to re-order a stream of interleaved AUs back into decoding
order. When interleaving is applied, the decoding of "early" AUs order. When interleaving is applied, the decoding of "early" AUs has
has to be postponed until all AUs that precede in decoding order to be postponed until all AUs that precede it in decoding order are
are present. Therefore these "early" AUs are stored in the present. Therefore, these "early" AUs are stored in the de-
de-interleave buffer. As an example in figure 6 the interleaving interleave buffer. As an example in Figure 6, the interleaving
pattern from section 2.5 is considered. pattern from section 2.5 is considered.
+--+--+--+--+--+--+--+--+--+--+--+- +--+--+--+--+--+--+--+--+--+--+--+-
Interleaved AUs | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|.. Interleaved AUs | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
+--+--+--+--+--+--+--+--+--+--+--+- +--+--+--+--+--+--+--+--+--+--+--+-
Storage of "early" AUs 3 3 3 3 3 3 Storage of "early" AUs 3 3 3 3 3 3
6 6 6 6 6 6 6 6 6 6 6 6
4 4 4 4 4 4
7 7 7 7 7 7
12 12 12 12
Figure 6: Storage of "early" AUs in the de-interleave buffer per Figure 6: Storage of "early" AUs in the de-interleave buffer per
interleaved AU. interleaved AU.
AU(3) is to be delivered to the decoder after AU(0), AU(1)and AU(3) is to be delivered to the decoder after AU(0), AU(1) and AU(2);
AU(2); of these AUs, AU(2) is most late and hence AU(3) needs to be of these AUs, AU(2) arrives from the network last and hence AU(3)
stored until AU(2) is present in the pattern. Similarly, AU(6) is needs to be stored until AU(2) is present in the pattern. Similarly,
to be stored until AU(5) is present, while AU(4) and AU(7) are to AU(6) is to be stored until AU(5) is present, while AU(4) and AU(7)
be stored until AU(2) and AU(5) are present, respectively. Note are to be stored until AU(2) and AU(5) are present, respectively.
that the fullness of the de-interleave buffer varies in time. In Note that the fullness of the de-interleave buffer varies in time.
figure 6, the de-interleave buffer contains at most 4, but often In Figure 6, the de-interleave buffer contains at most 4, but often
less AUs. less AUs.
So as to give a rough indication of the resources needed in the So as to give a rough indication of the resources needed in the
receiver for de-interleaving, the maximum displacement in time of receiver for de-interleaving, the maximum displacement in time of an
an AU is defined. For any AU in the pattern it can be verified AU is defined. For any AU(j) in the pattern, each AU(i) with i<j
which AUs are not yet present. The maximum displacement in time of that is not yet present can be determined. The maximum displacement
an AU is the maximum difference between the time stamp of an AU in in time of an AU is the maximum difference between the time stamp of
the pattern and the time stamp of the earliest AU that is not yet an AU in the pattern and the time stamp of the earliest AU that is
present. In other words, when considering a sequence of interleaved not yet present. In other words, when considering a sequence of
AUs, then: interleaved AUs, then:
Maximum displacement = max{TS(i) - TS(j)}, for any i and any j>i, Maximum displacement = max{TS(i) - TS(j)}
where i and j indicate the index of the AU in the for any i and any j>i, where:
interleaving pattern and TS denotes the time stamp i and j indicate the index of the AU in the interleaving
of the AU pattern, and
TS denotes the time stamp of the AU.
As an example in figure 7 the interleaving pattern from section 2.5 As an example in Figure 7, the interleaving pattern from section 2.5
is considered. For each AU in the pattern the earliest not yet is considered. For each AU in the pattern, the index is given of the
present AU is indicated. A "-" indicates that all previous AUs earliest of any earlier AUs not yet present. Hence for each AU(n) in
the interleaving pattern the smallest index k (with k<n) of not yet
delivered AUs is indicated. A "-" indicates that all previous AUs
are present. If the AU period is constant, the maximum displacement are present. If the AU period is constant, the maximum displacement
equals 5 AU periods, as found for AU(6) and AU(7). equals 5 AU periods, as found for AU(6) and AU(7).
+--+--+--+--+--+--+--+--+--+--+--+- +--+--+--+--+--+--+--+--+--+--+--+-
Interleaved AUs | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|.. Interleaved AUs | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
+--+--+--+--+--+--+--+--+--+--+--+- +--+--+--+--+--+--+--+--+--+--+--+-
Earliest not yet present AU - 1 1 - 2 2 - - - - 10 Earliest not yet present AU - 1 1 - 2 2 - - - - 10
Figure 7: The earliest not yet present AU for each AU in the Figure 7: For each AU in the interleaving pattern, the earliest of
interleaving pattern. any earlier AUs not yet present
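
The row of Figure 7 and the resulting maximum displacement can be reproduced with a short calculation. The sketch below is illustrative only; the function names are hypothetical, and it assumes a constant AU duration so that the time stamp of AU(n) is n times that duration.

```python
def earliest_missing(arrival_order: list) -> list:
    """For each arriving AU(n), the smallest k < n not yet received.

    None corresponds to the "-" entries of Figure 7.
    """
    seen = set()
    result = []
    for n in arrival_order:
        seen.add(n)
        missing = [k for k in range(n) if k not in seen]
        result.append(missing[0] if missing else None)
    return result


def max_displacement(arrival_order: list, au_duration: int) -> int:
    """Largest TS(n) - TS(k) over all arriving AUs, in RTP clock ticks."""
    worst = 0
    for n, k in zip(arrival_order, earliest_missing(arrival_order)):
        if k is not None:
            worst = max(worst, (n - k) * au_duration)
    return worst


if __name__ == "__main__":
    pattern = [0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12]
    print(earliest_missing(pattern))
    print(max_displacement(pattern, 1))  # in units of one AU period
```

With an AU duration of 1 the result is expressed in AU periods and equals 5, as found for AU(6) and AU(7).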
When interleaving, senders MUST signal the maximum displacement When interleaving, senders MUST signal the maximum displacement in
in time during the session via the MIME format parameter time during the session via the MIME format parameter
"maxDisplacement"; see section 4.1. "maxDisplacement"; see section 4.1.
An estimate of the size of the de-interleave buffer is found by An estimate of the size of the de-interleave buffer is found by
multiplying the maximum displacement by the maximum bit rate: multiplying the maximum displacement by the maximum bit rate:
size(de-interleave buffer) = {(maxDisplacement) * Rate(max)} / (RTP size(de-interleave buffer) = {(maxDisplacement) * Rate(max)} / (RTP
clock frequency), clock frequency),
where Rate(max) is the maximum bit-rate of the transported stream. where:
Rate(max) is the maximum bit-rate of the transported stream.
Note that receivers can derive Rate(max) from the MIME format Note that receivers can derive Rate(max) from the MIME format
parameters streamType, profile-level-id, and config. parameters streamType, profile-level-id, and config.
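
The estimate above is a single multiplication and division, sketched here with explicit units. The function name is hypothetical, and the units are assumptions made for the example: maxDisplacement in RTP clock ticks, Rate(max) in bits per second, and the result rounded up to whole octets.

```python
def estimate_deinterleave_buffer_octets(max_displacement_ticks: int,
                                        max_bit_rate: int,
                                        rtp_clock_hz: int) -> int:
    """Estimated de-interleave buffer size in octets.

    Implements (maxDisplacement * Rate(max)) / (RTP clock frequency),
    which yields bits under the unit assumptions stated above, then
    converts to octets, rounding up.
    """
    numerator = max_displacement_ticks * max_bit_rate
    denominator = rtp_clock_hz * 8  # 8 bits per octet
    return -(-numerator // denominator)  # integer ceiling division


if __name__ == "__main__":
    # 5000 ticks at a 1 kHz clock is 5 seconds; at 64 kbit/s that is
    # 320000 bits, or 40000 octets.
    print(estimate_deinterleave_buffer_octets(5000, 64000, 1000))
```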
However, this calculation estimates the size of the de-interleave However, this calculation estimates the size of the de-interleave
buffer and the really required size may differ from the calculated buffer and the required size may differ from the calculated value.
value. If this calculation under-estimates the size of the If this calculation under-estimates the size of the
de-interleave buffer, then senders, when interleaving, MUST signal de-interleave buffer, then senders, when interleaving, MUST signal a
a size of the de-interleave buffer via the MIME format parameter size of the de-interleave buffer via the MIME format parameter
"de-interleaveBufferSize"; see section 4.1. If the calculation "de-interleaveBufferSize"; see section 4.1. If the calculation
over-estimates the size of the de-interleave buffer, then senders, over-estimates the size of the de-interleave buffer, then senders,
when interleaving, MAY signal a size of the de-interleave buffer when interleaving, MAY signal a size of the de-interleave buffer via
via the MIME format parameter "de-interleaveBufferSize". the MIME format parameter "de-interleaveBufferSize".
The signaled size of the de-interleave buffer MUST be large enough The signaled size of the de-interleave buffer MUST be large enough to
to contain all "early" AUs at any point in time during the session, contain all "early" AUs at any point in time during the session.
that is: That is:
minimum de-interleave buffer size = max [sum {if TS(i) > TS(j) then minimum de-interleave buffer size = max [sum {if TS(i) > TS(j) then
AU-size(i) else 0}] for any j AU-size(i) else 0}]
and any i<j, where
for any j and any i<j, where:
i and j indicate the index of an AU in the interleaving i and j indicate the index of an AU in the interleaving
pattern, pattern,
TS(i) denotes the time stamp of AU(i), and TS(i) denotes the time stamp of AU(i), and
AU-size(i) denotes the size of AU(i) in number of octets. AU-size(i) denotes the size of AU(i) in number of octets.
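
The minimum buffer size can be computed directly from the definition. The following is a minimal sketch with a hypothetical function name; it assumes the sender knows the time stamp and size of every AU in transmission order.

```python
def min_deinterleave_buffer_octets(aus: list) -> int:
    """Minimum de-interleave buffer size in octets.

    aus: list of (timestamp, size_in_octets) tuples in the order in which
         the AUs appear in the interleaving pattern.
    """
    worst = 0
    for j, (ts_j, _size_j) in enumerate(aus):
        # Sum the sizes of all earlier-sent AUs with a later time stamp:
        # these are the "early" AUs still held when AU(j) arrives.
        held = sum(size_i for ts_i, size_i in aus[:j] if ts_i > ts_j)
        worst = max(worst, held)
    return worst


if __name__ == "__main__":
    # Pattern of Figure 6, with every AU assumed to be 100 octets and the
    # AU index used as its time stamp.
    pattern = [0, 3, 6, 1, 4, 7, 2, 5, 8]
    print(min_deinterleave_buffer_octets([(n, 100) for n in pattern]))
```

For the pattern of Figure 6 with equally sized AUs, the result corresponds to four AUs, which matches the maximum fullness shown in that figure.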
If the "de-interleaveBufferSize" parameter is present, then the If the "de-interleaveBufferSize" parameter is present, then the
applied buffer for de-interleaving in a receiver MUST have a size applied buffer for de-interleaving in a receiver MUST have a size
that is at least equal to the signaled size of the de-interleave that is at least equal to the signaled size of the de-interleave
buffer, else a size that is at least equal to the calculated size buffer, else a size that is at least equal to the calculated size of
of the de-interleave buffer. the de-interleave buffer.
No matter what interleaving scheme is used, the scheme must be No matter what interleaving scheme is used, the scheme must be
analyzed to calculate the applicable maxDisplacement value, as well analyzed to calculate the applicable maxDisplacement value, as well
as the required size of the de-interleave buffer. Senders SHOULD as the required size of the de-interleave buffer. Senders SHOULD
signal values that are not larger than the strictly required signal values that are not larger than the strictly required values;
values; if larger values are signaled, the receiver will buffer if larger values are signaled, the receiver will buffer excessively.
excessively.
Note that for low bit-rate material, the applied interleaving Note that for low bit-rate material, the applied interleaving may
may make packets shorter than the MTU size. make packets shorter than the MTU size.
3.2.3.4 Crucial and non-crucial AUs with MPEG-4 System data 3.2.3.4. Crucial and Non-Crucial AUs with MPEG-4 System Data
Some Access Units with MPEG-4 system data, called "crucial" AUs, Some Access Units with MPEG-4 system data, called "crucial" AUs,
carry information whose loss cannot be tolerated, either in the carry information whose loss cannot be tolerated, either in the
presentation or in the decoder. At each crucial AU in an MPEG-4 presentation or in the decoder. At each crucial AU in an MPEG-4
system stream, the stream state changes. The stream-state MAY system stream, the stream state changes. The stream-state MAY remain
remain constant at non-crucial AUs. In ISO/IEC 14496-1, MPEG-4 constant at non-crucial AUs. In ISO/IEC 14496-1, MPEG-4 system
system streams use the AU_SequenceNumber to signal stream states. streams use the AU_SequenceNumber to signal stream states.
Example: Given three AUs, AU1 = "Insertion of node X", AU2 = "Set Example: Given three AUs, AU1 = "Insertion of node X", AU2 = "Set
position of node X", AU3 = "Set position of node X". AU1 is crucial, position of node X", AU3 = "Set position of node X". AU1 is crucial,
since if it is lost, AU2 cannot be executed. However, AU2 is not since if it is lost, AU2 cannot be executed. However, AU2 is not
crucial, since AU3 can be executed even if AU2 is lost. crucial, since AU3 can be executed even if AU2 is lost.
When a crucial AU is (possibly) lost, the stream is corrupted. For When a crucial AU is (possibly) lost, the stream is corrupted. For
example, when an AU is lost and the stream state has changed at the example, when an AU is lost and the stream state has changed at the
next received AU, then it is possible that the lost AU was crucial. next received AU, then it is possible that the lost AU was crucial.
Once corrupted, the stream remains corrupted until the next random Once corrupted, the stream remains corrupted until the next random
access point. Note that loss of non-crucial AUs does not corrupt the access point. Note that loss of non-crucial AUs does not corrupt the
stream. When a decoder starts receiving a stream, the decoder MUST stream. When a decoder starts receiving a stream, the decoder MUST
consider the stream corrupted until an AU is received that provides consider the stream corrupted until an AU is received that provides a
a random access point. random access point.
An AU that provides a random access point, as signaled by the An AU that provides a random access point, as signaled by the RAP-
RAP-flag, may be crucial or not. Non-crucial RAP AUs provide a flag, may or may not be crucial. Non-crucial RAP AUs provide a
"repeated" random access point for use by decoders that recently "repeated" random access point for use by decoders that recently
joined the stream or that need to re-start decoding after a stream joined the stream or that need to re-start decoding after a stream
corruption. Non-crucial RAP AUs MUST include all updates since the corruption. Non-crucial RAP AUs MUST include all updates since the
last crucial RAP AU. last crucial RAP AU.
Upon receiving AUs, decoders are to react as follows: Upon receiving AUs, decoders are to react as follows:
a) if the RAP-flag is set to 1 and the stream-state changes, then
the AU is a crucial RAP AU, and the AU MUST be decoded. a) if the RAP-flag is set to 1 and the stream-state changes, then the
AU is a crucial RAP AU, and the AU MUST be decoded.
b) if the RAP-flag is set to 1 and the stream state does not change, b) if the RAP-flag is set to 1 and the stream state does not change,
then the AU is a non-crucial RAP AU, and the receiver SHOULD then the AU is a non-crucial RAP AU, and the receiver SHOULD
decode it if the stream is corrupted. Otherwise, the decoder MUST decode it if the stream is corrupted. Otherwise, the decoder MUST
ignore the AU. ignore the AU.
c) if the RAP-flag is set to 0, then the AU MUST be decoded, unless c) if the RAP-flag is set to 0, then the AU MUST be decoded, unless
the stream is corrupted, in which case the AU MUST be ignored. the stream is corrupted, in which case the AU MUST be ignored.
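
Rules a) through c) amount to a small decision table, sketched below. The function name, argument names, and string return values are hypothetical; tracking whether the stream is corrupted and whether the stream-state changed is assumed to happen elsewhere in the receiver.

```python
def react_to_au(rap_flag: bool, stream_state_changed: bool, stream_corrupted: bool) -> str:
    """Return "decode" or "ignore" for a received AU, following rules a) to c)."""
    if rap_flag:
        if stream_state_changed:
            # a) crucial RAP AU: MUST be decoded.
            return "decode"
        # b) non-crucial RAP AU: decoded only to recover from corruption
        #    (a SHOULD in the text), otherwise ignored.
        return "decode" if stream_corrupted else "ignore"
    # c) not a random access point: decoded unless the stream is corrupted.
    return "ignore" if stream_corrupted else "decode"


if __name__ == "__main__":
    for rap in (True, False):
        for changed in (True, False):
            for corrupted in (True, False):
                print(rap, changed, corrupted, react_to_au(rap, changed, corrupted))
```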
3.3 Usage of this specification 3.3. Usage of this Specification
3.3.1 General 3.3.1. General
Usage of this specification requires definition of a mode. A mode Usage of this specification requires definition of a mode. A mode
defines how to use this specification, as deemed appropriate. defines how to use this specification, as deemed appropriate.
Senders MUST signal the applied mode via the MIME format parameter Senders MUST signal the applied mode via the MIME format parameter
"mode", as specified in section 4.1. This specification defines a "mode", as specified in section 4.1. This specification defines a
generic mode that can be used for any MPEG-4 stream, as well as generic mode that can be used for any MPEG-4 stream, as well as
specific modes for transport of MPEG-4 CELP and MPEG-4 AAC streams, specific modes for the transportation of MPEG-4 CELP and MPEG-4 AAC
defined in ISO/IEC 14496-3. streams, defined in ISO/IEC 14496-3 [1].
When use of this payload format is signaled using SDP [5], an When use of this payload format is signaled using SDP [5], an
"rtpmap" attribute is part of that signaling. The same requirements "rtpmap" attribute is part of that signaling. The same requirements
apply for the rtpmap attribute in any mode compliant to this apply for the rtpmap attribute in any mode compliant to this
specification. The general form of an rtpmap attribute is: specification. The general form of an rtpmap attribute is:
a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding
parameters>] parameters>]
For audio streams, <encoding parameters> specifies the number of For audio streams, <encoding parameters> specifies the number of
audio channels: 2 for stereo material (see RFC 2327 [5]) and 1 for audio channels: 2 for stereo material (see RFC 2327 [5]) and 1 for
mono. Provided no additional parameters are needed, this parameter mono. Provided no additional parameters are needed, this parameter
may be omitted for mono material, hence its default value is 1. may be omitted for mono material, hence its default value is 1.
3.3.2 The generic mode 3.3.2. The Generic Mode
The generic mode can be used for any MPEG-4 stream. In this mode The generic mode can be used for any MPEG-4 stream. In this mode, no
no mode-specific constraints are applied; hence, in the generic mode-specific constraints are applied; hence, in the generic mode,
mode the full flexibility of this specification can be exploited. the full flexibility of this specification can be exploited. The
The generic mode is signaled by mode=generic. generic mode is signaled by mode=generic.
An example is given below for transport of a BIFS-Anim stream. In An example is given below for the transportation of a BIFS-Anim
this example carriage of multiple BIFS-Anim Access Units is allowed stream. In this example carriage of multiple BIFS-Anim Access Units
in one RTP packet. The AU-header contains the AU-size field, the is allowed in one RTP packet. The AU-header contains the AU-size
CTS-flag and, if the CTS flag is set to 1, the CTS-delta field. The field, the CTS-flag and, if the CTS flag is set to 1, the CTS-delta
number of bits of the AU-size and the CTS-delta fields is 10 and field. The number of bits of the AU-size and the CTS-delta fields
16, respectively. The AU-header also contains the RAP-flag and the are 10 and 16, respectively. The AU-header also contains the RAP-
Stream-state of 4 bits. This results in an AU-header with a flag and the Stream-state of 4 bits. This results in an AU-header
total size of two or four octets per BIFS-Anim AU. The RTP time with a total size of two or four octets per BIFS-Anim AU. The RTP
stamp uses a 1 kHz clock. Note that the media type name is video, time stamp uses a 1 kHz clock. Note that the media type name is
because the BIFS-Anim stream is part of an audio-visual video, because the BIFS-Anim stream is part of an audio-visual
presentation. For conventions on media type names see section 4.1. presentation. For conventions on media type names, see section 4.1.
In detail: In detail:
m=video 49230 RTP/AVP 96 m=video 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/1000 a=rtpmap:96 mpeg4-generic/1000
a=fmtp:96 streamtype=3; profile-level-id=1807; mode=generic; a=fmtp:96 streamtype=3; profile-level-id=1807; mode=generic;
objectType=2; config=0842237F24001FB400094002C0; sizeLength=10; objectType=2; config=0842237F24001FB400094002C0; sizeLength=10;
CTSDeltaLength=16; randomAccessIndication=1; CTSDeltaLength=16; randomAccessIndication=1;
streamStateIndication=4 streamStateIndication=4
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page, it comprises
a single line in the SDP file. a single line in the SDP file.
The hexadecimal value of the "config" parameter is the The hexadecimal value of the "config" parameter is the
BIFSConfiguration() as defined in ISO/IEC 14496-1. The BIFSConfiguration() as defined in ISO/IEC 14496-1. The
BIFSConfiguration() specifies that the BIFS stream is a BIFS-Anim BIFSConfiguration() specifies that the BIFS stream is a BIFS-Anim
stream. For the description of MIME parameters see section 4.1. stream. For the description of MIME parameters, see section 4.1.
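The "two or four octets" figure in the BIFS-Anim example can be checked by adding up the field widths. The sketch below is illustrative only: the function name and argument names are hypothetical, and it assumes that the fields not configured in the example (AU-Index, DTS-delta) are absent from the AU-header.

```python
def au_header_bits(size_length, cts_delta_length, rap_indication,
                   stream_state_bits, cts_flag):
    """Return the AU-header size in bits for the fields used in the example."""
    bits = size_length                  # AU-size field
    if cts_delta_length > 0:
        bits += 1                       # CTS-flag is present
        if cts_flag:
            bits += cts_delta_length    # CTS-delta only when the flag is 1
    if rap_indication:
        bits += 1                       # RAP-flag
    bits += stream_state_bits           # Stream-state
    return bits


# Values from the example: sizeLength=10, CTSDeltaLength=16,
# randomAccessIndication=1, streamStateIndication=4.
short_header = au_header_bits(10, 16, True, 4, cts_flag=False)
long_header = au_header_bits(10, 16, True, 4, cts_flag=True)
print(short_header // 8, long_header // 8)
```

With the CTS-flag cleared the header is 16 bits; with it set, the 16-bit CTS-delta brings it to 32 bits.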
3.3.3 Constant bit-rate CELP 3.3.3. Constant Bit-rate CELP
This mode is signaled by mode=CELP-cbr. In this mode one or more This mode is signaled by mode=CELP-cbr. In this mode, one or more
complete CELP frames of fixed size can be transported in one RTP complete CELP frames of fixed size can be transported in one RTP
packet; interleaving MUST NOT be used with this mode. The RTP packet; interleaving MUST NOT be used with this mode. The RTP
payload consists of one or more concatenated CELP frames, each of payload consists of one or more concatenated CELP frames, each of
the same size. CELP frames MUST NOT be fragmented when using this equal size. CELP frames MUST NOT be fragmented when using this mode.
mode. Both the AU Header Section and the Auxiliary Section MUST be Both the AU Header Section and the Auxiliary Section MUST be empty.
empty.
The MIME format parameter constantSize MUST be provided to specify The MIME format parameter constantSize MUST be provided to specify
the length of each CELP frame. the length of each CELP frame.
For example: For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/16000/1 a=rtpmap:96 mpeg4-generic/16000/1
a=fmtp:96 streamtype=5; profile-level-id=14; mode=CELP-cbr; config= a=fmtp:96 streamtype=5; profile-level-id=14; mode=CELP-cbr; config=
440E00; constantSize=27; constantDuration=240 440E00; constantSize=27; constantDuration=240
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page, it comprises
a single line in the SDP file. a single line in the SDP file.
The hexadecimal value of the "config" parameter is the The hexadecimal value of the "config" parameter is the
AudioSpecificConfig() as defined in ISO/IEC 14496-3. AudioSpecificConfig() as defined in ISO/IEC 14496-3.
AudioSpecificConfig() specifies a mono CELP stream with a sampling AudioSpecificConfig() specifies a mono CELP stream with a sampling
rate of 16 kHz at a fixed bitrate of 14.4 kb/s and 6 sub-frames per rate of 16 kHz at a fixed bitrate of 14.4 kb/s and 6 sub-frames per
CELP frame. For the description of MIME parameters see section 4.1. CELP frame. For the description of MIME parameters, see section 4.1.
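Because both the AU Header Section and the Auxiliary Section are empty in this mode, a receiver can recover the frames by slicing the payload into constantSize pieces. The following is a minimal sketch under that reading; the function names are hypothetical. The second helper checks that the example values (27 octets per 240 ticks of a 16 kHz clock) agree with the stated 14.4 kb/s.

```python
def split_cbr_payload(payload, constant_size):
    """Split a CELP-cbr RTP payload into frames of constant_size octets."""
    if constant_size <= 0:
        raise ValueError("constantSize must be positive")
    if len(payload) == 0 or len(payload) % constant_size != 0:
        raise ValueError("payload is not a whole number of CELP frames")
    return [payload[i:i + constant_size]
            for i in range(0, len(payload), constant_size)]


def cbr_bitrate(constant_size, constant_duration, clock_rate):
    """Bit rate in bits per second implied by the signaled parameters."""
    return constant_size * 8 * clock_rate / constant_duration


frames = split_cbr_payload(bytes(54), 27)      # two frames of 27 octets
print(len(frames), cbr_bitrate(27, 240, 16000))
```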
3.3.4 Variable bit-rate CELP 3.3.4. Variable Bit-rate CELP
This mode is signaled by mode=CELP-vbr. With this mode one or more This mode is signaled by mode=CELP-vbr. With this mode, one or more
complete CELP frames of variable size can be transported in one RTP complete CELP frames of variable size can be transported in one RTP
packet with OPTIONAL interleaving. As CELP frames are very small, packet with OPTIONAL interleaving. In this mode, the largest
while the largest possible AU-size in this mode is greater than the possible value for AU-size is greater than the maximum CELP frame
maximum CELP frame size, there is no support for fragmentation of size. Because CELP frames are very small, there is no support for
CELP frames. Hence CELP frames MUST NOT be fragmented when using fragmentation of CELP frames. Hence, CELP frames MUST NOT be
this mode. fragmented when using this mode.
In this mode the RTP payload consists of the AU Header Section, In this mode, the RTP payload consists of the AU Header Section,
followed by one or more concatenated CELP frames. The Auxiliary followed by one or more concatenated CELP frames. The Auxiliary
Section MUST be empty. For each CELP frame contained in the payload Section MUST be empty. For each CELP frame contained in the payload,
there MUST be a one octet AU-header in the AU Header Section to there MUST be a one octet AU-header in the AU Header Section to
provide: provide:
(a) the size of each CELP frame in the payload and a) the size of each CELP frame in the payload and
(b) index information for computing the sequence (and hence timing) b) index information for computing the sequence (and hence timing) of
of each CELP frame. each CELP frame.
Transport of CELP frames requires that the AU-size field is coded Transport of CELP frames requires that the AU-size field be coded
with 6 bits. In this mode therefore 6 bits are allocated to the with 6 bits. Therefore, in this mode 6 bits are allocated to the
AU-size field, and 2 bits to the AU-Index(-delta) field. Each AU-size field, and 2 bits to the AU-Index(-delta) field. Each AU-
AU-Index field MUST be coded with the value 0. In the AU Header Index field MUST be coded with the value 0. In the AU Header
Section, the concatenated AU-headers are preceded by the 16-bit Section, the concatenated AU-headers are preceded by the 16-bit AU-
AU-headers-length field, as specified in section 3.2.1. headers-length field, as specified in section 3.2.1.
In addition to the required MIME format parameters, the following In addition to the required MIME format parameters, the following
parameters MUST be present: sizeLength, indexLength, and parameters MUST be present: sizeLength, indexLength, and
indexDeltaLength. CELP frames have fixed time duration per Access indexDeltaLength. CELP frames always have a fixed duration per
Unit; when interleaving in this mode, the applicable duration MUST Access Unit; when interleaving in this mode, this specific duration
be signaled by the MIME format parameter constantDuration. In MUST be signaled by the MIME format parameter constantDuration. In
addition, the parameter maxDisplacement MUST be present when addition, the parameter maxDisplacement MUST be present when
interleaving. interleaving.
For example: For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/16000/1 a=rtpmap:96 mpeg4-generic/16000/1
a=fmtp:96 streamtype=5; profile-level-id=14; mode=CELP-vbr; config= a=fmtp:96 streamtype=5; profile-level-id=14; mode=CELP-vbr; config=
440F20; sizeLength=6; indexLength=2; indexDeltaLength=2; 440F20; sizeLength=6; indexLength=2; indexDeltaLength=2;
constantDuration=160; maxDisplacement=5 constantDuration=160; maxDisplacement=5
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page; it comprises
a single line in the SDP file. a single line in the SDP file.
The hexadecimal value of the "config" parameter is the The hexadecimal value of the "config" parameter is the
AudioSpecificConfig() as defined in ISO/IEC 14496-3. AudioSpecificConfig() as defined in ISO/IEC 14496-3.
AudioSpecificConfig() specifies a mono CELP stream with a sampling AudioSpecificConfig() specifies a mono CELP stream with a sampling
rate of 16 kHz at a bitrate that varies between 13.9 and 16.2 kb/s rate of 16 kHz, at a bitrate that varies between 13.9 and 16.2 kb/s
and with 4 sub-frames per CELP frame. For the description of MIME and with 4 sub-frames per CELP frame. For the description of MIME
parameters see section 4.1. parameters, see section 4.1.
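The payload layout described for this mode can be sketched as a small parser. This is an illustration, not a reference implementation: the function name is hypothetical, and it assumes (from section 3.2.1) that the AU-headers-length field counts bits and that AU-size occupies the upper 6 bits of each one-octet AU-header, with AU-Index(-delta) in the lower 2 bits.

```python
def parse_celp_vbr_payload(payload):
    """Return a list of (index_or_delta, frame) tuples from a CELP-vbr payload."""
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    headers_bits = int.from_bytes(payload[:2], "big")
    if headers_bits == 0 or headers_bits % 8 != 0:
        raise ValueError("expected a whole number of one-octet AU-headers")
    count = headers_bits // 8
    headers = payload[2:2 + count]
    if len(headers) != count:
        raise ValueError("AU Header Section is truncated")

    offset = 2 + count
    frames = []
    for octet in headers:
        size = octet >> 2        # upper 6 bits: AU-size in octets
        index = octet & 0x03     # lower 2 bits: AU-Index or AU-Index-delta
        frame = payload[offset:offset + size]
        if len(frame) != size:
            raise ValueError("CELP frame is truncated")
        frames.append((index, frame))
        offset += size
    if offset != len(payload):
        raise ValueError("unexpected octets after the last CELP frame")
    return frames


# Two frames of 3 and 2 octets; the second carries AU-Index-delta 1.
packet = bytes([0x00, 0x10, (3 << 2) | 0, (2 << 2) | 1]) + b"abc" + b"de"
print(parse_celp_vbr_payload(packet))
```

The parser returns the raw index values only; deriving the sequence and timing of each frame from them follows the rules of section 3.2, which are not reproduced here.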
3.3.5 Low bit-rate AAC 3.3.5. Low Bit-rate AAC
This mode is signaled by mode=AAC-lbr. This mode supports transport This mode is signaled by mode=AAC-lbr. This mode supports the
of one or more complete AAC frames of variable size. In this mode transportation of one or more complete AAC frames of variable size.
the AAC frames are allowed to be interleaved and hence receivers In this mode, the AAC frames are allowed to be interleaved and hence
MUST support de-interleaving. The maximum size of an AAC frame in receivers MUST support de-interleaving. The maximum size of an AAC
this mode is 63 octets. AAC frames MUST NOT be fragmented when frame in this mode is 63 octets. AAC frames MUST NOT be fragmented
using this mode. Hence, when using this mode, encoders MUST ensure when using this mode. Hence, when using this mode, encoders MUST
that the size of each AAC frame is at most 63 octets. ensure that the size of each AAC frame is at most 63 octets.
The payload configuration in this mode is the same as in the The payload configuration in this mode is the same as in the variable
variable bit-rate CELP mode as defined in 3.3.4. The RTP payload bit-rate CELP mode as defined in 3.3.4. The RTP payload consists of
consists of the AU Header Section, followed by concatenated AAC the AU Header Section, followed by concatenated AAC frames. The
frames. The Auxiliary Section MUST be empty. For each AAC frame Auxiliary Section MUST be empty. For each AAC frame contained in the
contained in the payload the one octet AU-header MUST provide: payload, the one octet AU-header MUST provide:
(a) the size of each AAC frame in the payload and a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing) b) index information for computing the sequence (and hence timing) of
of each AAC frame. each AAC frame.
In the AU-header Section, the concatenated AU-headers MUST be In the AU-header Section, the concatenated AU-headers MUST be
preceded by the 16-bit AU-headers-length field, as specified in preceded by the 16-bit AU-headers-length field, as specified in
section 3.2.1. section 3.2.1.
In addition to the required MIME format parameters, the following In addition to the required MIME format parameters, the following
parameters MUST be present: sizeLength, indexLength, and parameters MUST be present: sizeLength, indexLength, and
indexDeltaLength. AAC frames have fixed time duration per Access indexDeltaLength. AAC frames always have a fixed duration per Access
Unit; when interleaving in this mode, the applicable duration MUST Unit; when interleaving in this mode, this specific duration MUST be
be signaled by the MIME format parameter constantDuration. In signaled by the MIME format parameter constantDuration. In addition,
addition, the parameter maxDisplacement MUST be present when the parameter maxDisplacement MUST be present when interleaving.
interleaving.
For example: For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/22050/1 a=rtpmap:96 mpeg4-generic/22050/1
a=fmtp:96 streamtype=5; profile-level-id=14; mode=AAC-lbr; config= a=fmtp:96 streamtype=5; profile-level-id=14; mode=AAC-lbr; config=
1388; sizeLength=6; indexLength=2; indexDeltaLength=2; 1388; sizeLength=6; indexLength=2; indexDeltaLength=2;
constantDuration=1024; maxDisplacement=5 constantDuration=1024; maxDisplacement=5
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page; it comprises
a single line in the SDP file. a single line in the SDP file.
The hexadecimal value of the "config" parameter is the The hexadecimal value of the "config" parameter is the
AudioSpecificConfig() as defined in ISO/IEC 14496-3. AudioSpecificConfig(), as defined in ISO/IEC 14496-3.
AudioSpecificConfig() specifies a mono AAC stream with a sampling AudioSpecificConfig() specifies a mono AAC stream with a sampling
rate of 22.05 kHz. For the description of MIME parameters see rate of 22.05 kHz. For the description of MIME parameters, see
section 4.1. section 4.1.
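The "config" values in these examples can be cross-checked against the prose. The sketch below decodes only the leading fields of AudioSpecificConfig() and rests on an assumption taken from ISO/IEC 14496-3 rather than from this document: that the structure begins with a 5-bit audioObjectType, a 4-bit samplingFrequencyIndex and a 4-bit channelConfiguration. The function name is hypothetical, and escape values are rejected rather than handled.

```python
# Sampling frequency table indexed by samplingFrequencyIndex
# (assumed from ISO/IEC 14496-3, not defined in this document).
SAMPLING_FREQUENCIES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                        22050, 16000, 12000, 11025, 8000, 7350]


def decode_audio_config_prefix(config_hex):
    """Return (audioObjectType, sampling rate in Hz, channelConfiguration)."""
    data = bytes.fromhex(config_hex)
    if len(data) < 2:
        raise ValueError("config is too short")
    word = int.from_bytes(data[:2], "big")   # first 16 bits, MSB first
    object_type = word >> 11                 # bits 0..4
    freq_index = (word >> 7) & 0x0F          # bits 5..8
    channel_config = (word >> 3) & 0x0F      # bits 9..12
    if object_type == 31 or freq_index >= len(SAMPLING_FREQUENCIES):
        raise ValueError("escape or reserved value; not handled in this sketch")
    return object_type, SAMPLING_FREQUENCIES[freq_index], channel_config


print(decode_audio_config_prefix("1388"))    # AAC-lbr example above
```

Under these assumptions, "1388" decodes to a 22050 Hz stream with channel configuration 1, which is consistent with the mono 22.05 kHz description.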
3.3.6 High bit-rate AAC 3.3.6. High Bit-rate AAC
This mode is signaled by mode=AAC-hbr. This mode supports transport This mode is signaled by mode=AAC-hbr. This mode supports the
of variable size AAC frames. In one RTP packet either one or more transportation of variable size AAC frames. In one RTP packet,
complete AAC frames are carried, or a single fragment of an AAC either one or more complete AAC frames are carried, or a single
frame. In this mode the AAC frames are allowed to be interleaved fragment of an AAC frame is carried. In this mode, the AAC frames
and hence receivers MUST support de-interleaving. The maximum size are allowed to be interleaved and hence receivers MUST support de-
of an AAC frame in this mode is 8191 octets. interleaving. The maximum size of an AAC frame in this mode is 8191
octets.
In this mode the RTP payload consists of the AU Header Section, In this mode, the RTP payload consists of the AU Header Section,
followed by either one AAC frame, several concatenated AAC frames followed by either one AAC frame, several concatenated AAC frames or
or one fragmented AAC frame. The Auxiliary Section MUST be empty. one fragmented AAC frame. The Auxiliary Section MUST be empty. For
For each AAC frame contained in the payload there MUST be an each AAC frame contained in the payload, there MUST be an AU-header
AU-header in the AU Header Section to provide: in the AU Header Section to provide:
(a) the size of each AAC frame in the payload and a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing) b) index information for computing the sequence (and hence timing) of
of each AAC frame. each AAC frame.
To code the maximum size of an AAC frame requires 13 bits. Therefore To code the maximum size of an AAC frame requires 13 bits.
in this configuration 13 bits are allocated to the AU-size, and Therefore, in this configuration 13 bits are allocated to the AU-
3 bits to the AU-Index(-delta) field. Thus each AU-header has a size size, and 3 bits to the AU-Index(-delta) field. Thus, each AU-header
of 2 octets. Each AU-Index field MUST be coded with the value 0. In has a size of 2 octets. Each AU-Index field MUST be coded with the
the AU Header Section, the concatenated AU-headers MUST be preceded value 0. In the AU Header Section, the concatenated AU-headers MUST
by the 16-bit AU-headers-length field, as specified in be preceded by the 16-bit AU-headers-length field, as specified in
section 3.2.1. section 3.2.1.
In addition to the required MIME format parameters, the following In addition to the required MIME format parameters, the following
parameters MUST be present: sizeLength, indexLength, and parameters MUST be present: sizeLength, indexLength, and
indexDeltaLength. AAC frames have fixed time duration per Access indexDeltaLength. AAC frames always have a fixed duration per Access
Unit; when interleaving in this mode, the applicable duration MUST Unit; when interleaving in this mode, this specific duration MUST be
be signaled by the MIME format parameter constantDuration. In signaled by the MIME format parameter constantDuration. In addition,
addition, the parameter maxDisplacement MUST be present when the parameter maxDisplacement MUST be present when interleaving.
interleaving.
For example: For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/48000/6 a=rtpmap:96 mpeg4-generic/48000/6
a=fmtp:96 streamtype=5; profile-level-id=16; mode=AAC-hbr; a=fmtp:96 streamtype=5; profile-level-id=16; mode=AAC-hbr;
config=11B0; sizeLength=13; indexLength=3; config=11B0; sizeLength=13; indexLength=3;
indexDeltaLength=3; constantDuration=1024 indexDeltaLength=3; constantDuration=1024
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page; it comprises
a single line in the SDP file. a single line in the SDP file.
The hexadecimal value of the "config" parameter is the The hexadecimal value of the "config" parameter is the
AudioSpecificConfig() as defined in ISO/IEC 14496-3. AudioSpecificConfig(), as defined in ISO/IEC 14496-3.
AudioSpecificConfig() specifies a 5.1 channel AAC stream with a AudioSpecificConfig() specifies a 5.1 channel AAC stream with a
sampling rate of 48 kHz. For the description of MIME parameters see sampling rate of 48 kHz. For the description of MIME parameters, see
section 4.1. section 4.1.
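On the sending side, the AU Header Section for this mode could be assembled as sketched below. The function name and its arguments are hypothetical. The sketch assumes that AU-headers-length counts bits (section 3.2.1), that the 13-bit AU-size precedes the 3-bit AU-Index(-delta) in each header, and that only complete frames are being packed; fragments are not covered.

```python
def build_aac_hbr_header_section(frame_sizes, index_deltas=None):
    """Build the AU Header Section for complete AAC frames in AAC-hbr mode.

    frame_sizes:  sizes in octets of the AAC frames carried in the packet.
    index_deltas: AU-Index-delta values for the frames after the first one
                  (defaults to 0, i.e. no interleaving).
    """
    if not frame_sizes:
        raise ValueError("at least one AAC frame is required")
    if index_deltas is None:
        index_deltas = [0] * (len(frame_sizes) - 1)
    if len(index_deltas) != len(frame_sizes) - 1:
        raise ValueError("need one AU-Index-delta per frame after the first")

    # The first header carries AU-Index, which this mode requires to be 0.
    indexes = [0] + list(index_deltas)
    headers = bytearray()
    for size, index in zip(frame_sizes, indexes):
        if not 0 < size <= 8191:
            raise ValueError("AAC frame size does not fit in 13 bits")
        if not 0 <= index <= 7:
            raise ValueError("index value does not fit in 3 bits")
        headers += ((size << 3) | index).to_bytes(2, "big")

    length_in_bits = len(headers) * 8
    return length_in_bits.to_bytes(2, "big") + bytes(headers)


section = build_aac_hbr_header_section([1000, 200])
print(section.hex())
```

The RTP payload would then be this section followed by the concatenated AAC frames in the same order.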
3.3.7 Additional modes 3.3.7. Additional Modes
This specification only defines the modes specified in sections This specification only defines the modes specified in sections 3.3.2
3.3.2 up to 3.3.6. Additional modes are expected to be defined in through 3.3.6. Additional modes are expected to be defined in future
future RFCs. Each additional mode MUST be in full compliance with RFCs. Each additional mode MUST be in full compliance with this
this specification. specification.
Any new mode MUST be defined such that an implementation including Any new mode MUST be defined such that an implementation including
all the features of this specification can decode the payload format all the features of this specification can decode the payload format
corresponding to this new mode. For this reason a mode MUST NOT corresponding to this new mode. For this reason, a mode MUST NOT
specify new default values for MIME parameters. In particular, MIME specify new default values for MIME parameters. In particular, MIME
parameters that configure the RTP payload MUST be present (unless parameters that configure the RTP payload MUST be present (unless
they have the default value), even if their presence is redundant in they have the default value), even if their presence is redundant in
case the mode assigns a fixed value to a parameter. A mode may case the mode assigns a fixed value to a parameter. A mode may
define additionally that some MIME parameters are required instead additionally define that some MIME parameters are required instead of
of optional, that some MIME parameters have fixed values (or optional, that some MIME parameters have fixed values (or ranges),
ranges), and that there are rules restricting the usage. and that there are rules restricting its usage.
4. IANA considerations 4. IANA Considerations
This section describes the MIME types and names associated with This section describes the MIME types and names associated with this
this payload format. Section 4.1 registers the MIME types, as per payload format. Section 4.1 registers the MIME types, as per RFC
RFC 2048 [3]. 2048 [3].
This format may require additional information about the mapping to This format may require additional information about the mapping to
be made available to the receiver. This is done using parameters be made available to the receiver. This is done using parameters
also described in the next section. described in the next section.
4.1 MIME type registration 4.1. MIME Type Registration
MIME media type name: "video" or "audio" or "application" MIME media type name: "video" or "audio" or "application"
"video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2) "video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2) or
or MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information
needed for an audio/visual presentation. needed for an audio/visual presentation.
"audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3) "audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or
or MPEG-4 Systems streams that convey information needed for an MPEG-4 Systems streams that convey information needed for an audio
audio only presentation. only presentation.
"application" MUST be used for MPEG-4 Systems streams (ISO/IEC "application" MUST be used for MPEG-4 Systems streams (ISO/IEC
14496-1) that serve purposes other than audio/visual presentation, 14496-1) that serve purposes other than audio/visual presentation,
e.g. in some cases when MPEG-J (Java) streams are transmitted. e.g., in some cases when MPEG-J (Java) streams are transmitted.
Depending on the required payload configuration, MIME format Depending on the required payload configuration, MIME format
parameters need to be available to the receiver. This is done using parameters may need to be available to the receiver. This is done
the parameters described in the next section. There are required using the parameters described in the next section. There are
and optional parameters. required and optional parameters.
Optional parameters are of two types: general parameters and Optional parameters are of two types: general parameters and
configuration parameters. The configuration parameters are used to configuration parameters. The configuration parameters are used to
configure the fields in the AU Header section and in the auxiliary configure the fields in the AU Header section and in the auxiliary
section. The absence of any configuration parameter is equivalent to section. The absence of any configuration parameter is equivalent to
the associated field set to its default value, which is always zero. the associated field set to its default value, which is always zero.
The absence of all configuration parameters resolves into a default The absence of all configuration parameters results in a default
"basic" configuration with an empty AU-header section and an empty "basic" configuration with an empty AU-header section and an empty
auxiliary section in each RTP packet. auxiliary section in each RTP packet.
MIME subtype name: mpeg4-generic MIME subtype name: mpeg4-generic
Required parameters: Required parameters:
MIME format parameters are not case dependent; however for clarity MIME format parameters are not case dependent; for clarity however,
both upper and lower case are used in the names of the parameters both upper and lower case are used in the names of the parameters
described in this specification. described in this specification.
streamType: streamType:
The integer value that indicates the type of MPEG-4 stream that The integer value that indicates the type of MPEG-4 stream that is
is carried; its coding corresponds to the values of the carried; its coding corresponds to the values of the streamType,
streamType as defined in Table 9 (streamType Values) in ISO/IEC as defined in Table 9 (streamType Values) in ISO/IEC 14496-1.
14496-1.
profile-level-id: profile-level-id:
A decimal representation of the MPEG-4 Profile Level indication. A decimal representation of the MPEG-4 Profile Level indication.
This parameter MUST be used in the capability exchange or This parameter MUST be used in the capability exchange or session
session set-up procedure to indicate the MPEG-4 Profile and Level set-up procedure to indicate the MPEG-4 Profile and Level
combination of which the relevant MPEG-4 media codec is capable combination of which the relevant MPEG-4 media codec is capable.
of.
For MPEG-4 Audio streams, this parameter is the decimal value For MPEG-4 Audio streams, this parameter is the decimal value from
from Table 5 (audioProfileLevelIndication Values) in ISO/IEC Table 5 (audioProfileLevelIndication Values) in ISO/IEC 14496-
14496-1, indicating which MPEG-4 Audio tool subsets are 1, indicating which MPEG-4 Audio tool subsets are required to
required to decode the audio stream. decode the audio stream.
For MPEG-4 Visual streams, this parameter is the decimal value For MPEG-4 Visual streams, this parameter is the decimal value
from Table G-1 (FLC table for profile and level indication) of from Table G-1 (FLC table for profile and level indication) of
ISO/IEC 14496-2, indicating which MPEG-4 Visual tool subsets ISO/IEC 14496-2 [1], indicating which MPEG-4 Visual tool
are required to decode the visual stream. subsets are required to decode the visual stream.
For BIFS streams, this parameter is the decimal value that is For BIFS streams, this parameter is the decimal value obtained
obtained from (SPLI + 256*GPLI), where: from (SPLI + 256*GPLI), where:
SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with
the applied sceneProfileLevelIndication; the applied sceneProfileLevelIndication;
GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with
the applied graphicsProfileLevelIndication. the applied graphicsProfileLevelIndication.
For MPEG-J streams, this parameter is the decimal value from For MPEG-J streams, this parameter is the decimal value from table
table 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1, 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1, indicating
indicating the profile and level of the MPEG-J stream. the profile and level of the MPEG-J stream.
For OD streams, this parameter is the decimal value from table 3 For OD streams, this parameter is the decimal value from table 3
(ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the (ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the
profile and level of the OD stream. profile and level of the OD stream.
For IPMP streams, this parameter has either the decimal value 0, For IPMP streams, this parameter has either the decimal value 0,
indicating an unspecified profile and level, or a value larger indicating an unspecified profile and level, or a value larger
than zero, indicating an MPEG-4 IPMP profile and level as than zero, indicating an MPEG-4 IPMP profile and level as
defined in a future MPEG-4 specification. defined in a future MPEG-4 specification.
For Clock Reference streams and Object Content Info streams, this For Clock Reference streams and Object Content Info streams, this
parameter has the decimal value zero, indicating that profile parameter has the decimal value zero, indicating that profile
and level information is conveyed through the OD framework. and level information is conveyed through the OD framework.
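For BIFS streams, the (SPLI + 256*GPLI) rule can be illustrated with the value used in the generic mode example, profile-level-id=1807. The helper names below are hypothetical, and the range check assumes that each indication is an 8-bit value, as the factor 256 suggests.

```python
def bifs_profile_level_id(spli, gpli):
    """Combine scene and graphics profile-level indications into one value."""
    for name, value in (("SPLI", spli), ("GPLI", gpli)):
        if not 0 <= value <= 255:
            raise ValueError(name + " must fit in 8 bits")
    return spli + 256 * gpli


def split_bifs_profile_level_id(profile_level_id):
    """Recover (SPLI, GPLI) from a BIFS profile-level-id."""
    gpli, spli = divmod(profile_level_id, 256)
    return spli, gpli


print(split_bifs_profile_level_id(1807))
```

The value 1807 therefore corresponds to SPLI 15 and GPLI 7; the meaning of those values is given by Tables 4 and 7 of ISO/IEC 14496-1.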
config: config:
A hexadecimal representation of an octet string that expresses A hexadecimal representation of an octet string that expresses the
the media payload configuration. Configuration data is mapped media payload configuration. Configuration data is mapped onto
onto the hexadecimal octet string in an MSB-first basis. The the hexadecimal octet string in an MSB-first basis. The first bit
first bit of the configuration data SHALL be located at the MSB of the configuration data SHALL be located at the MSB of the first
of the first octet. In the last octet, if necessary to achieve octet. In the last octet, if necessary to achieve octet-
octet-alignment, up to 7 zero-valued padding bits shall follow alignment, up to 7 zero-valued padding bits shall follow the
the configuration data. configuration data.
For MPEG-4 Audio streams, config is the audio object type For MPEG-4 Audio streams, config is the audio object type specific
specific decoder configuration data AudioSpecificConfig() as decoder configuration data AudioSpecificConfig(), as defined in
defined in ISO/IEC 14496-3. For Structured Audio, the ISO/IEC 14496-3. For Structured Audio, the
AudioSpecificConfig() may be conveyed by other means, not AudioSpecificConfig() may be conveyed by other means, not
defined by this specification. If the AudioSpecificConfig() defined by this specification. If the AudioSpecificConfig() is
is conveyed by other means for Structured Audio, then the conveyed by other means for Structured Audio, then the config
config MUST be a quoted empty hexadecimal octet string, as MUST be a quoted empty hexadecimal octet string, as follows:
follows: config="". config="".
Note that a future mode of using this RTP payload format for Note that a future mode of using this RTP payload format for
Structured Audio may define such other means. Structured Audio may define such other means.
For MPEG-4 Visual streams, config is the MPEG-4 Visual For MPEG-4 Visual streams, config is the MPEG-4 Visual
configuration information as defined in subclause 6.2.1 Start configuration information as defined in subclause 6.2.1, Start
codes of ISO/IEC 14496-2. The configuration information codes of ISO/IEC 14496-2. The configuration information
indicated by this parameter SHALL be the same as the indicated by this parameter SHALL be the same as the
configuration information in the corresponding MPEG-4 Visual configuration information in the corresponding MPEG-4 Visual
stream, except for first-half-vbv-occupancy and stream, except for first-half-vbv-occupancy and latter-half-
latter-half-vbv-occupancy, if it exists, which may vary in vbv-occupancy, if it exists, which may vary in the repeated
the repeated configuration information inside an MPEG-4 configuration information inside an MPEG-4 Visual stream (See
Visual stream (See 6.2.1 Start codes of ISO/IEC 14496-2). 6.2.1 Start codes of ISO/IEC 14496-2).
For BIFS streams, this is the BIFSConfig() information as defined For BIFS streams, this is the BIFSConfig() information as defined
in ISO/IEC 14496-1. For version 1, BIFSConfig is defined in in ISO/IEC 14496-1. Version 1 of BIFSConfig is defined in
section 9.3.5.2, and for version 2 in section 9.3.5.3. The section 9.3.5.2, and version 2 is defined in section 9.3.5.3.
MIME format parameter objectType signals the version of The MIME format parameter objectType signals the version of
BIFSConfig. BIFSConfig.
For IPMP streams, this is either a quoted empty hexadecimal octet For IPMP streams, this is either a quoted empty hexadecimal octet
string, indicating the absence of any decoder configuration string, indicating the absence of any decoder configuration
information (config=""), or the IPMPConfiguration() as information (config=""), or the IPMPConfiguration() as will be
defined in a future MPEG-4 IPMP specification. defined in a future MPEG-4 IPMP specification.
For Object Content Info (OCI) streams, this is the For Object Content Info (OCI) streams, this is the
OCIDecoderConfiguration() information of the OCI stream, as OCIDecoderConfiguration() information of the OCI stream, as
defined in section 8.4.2.4 in ISO/IEC 14496-1. defined in section 8.4.2.4 in ISO/IEC 14496-1.
For OD streams, Clock Reference streams and MPEG-J streams, this For OD streams, Clock Reference streams and MPEG-J streams, this
is a quoted empty hexadecimal octet string (config=""), as is a quoted empty hexadecimal octet string (config=""), as no
no information on the decoder configuration is required. information on the decoder configuration is required.
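The MSB-first mapping and zero padding described for "config" can be sketched as follows. The function name is hypothetical, and the input is modeled as a list of (value, width-in-bits) pairs purely for illustration; the field widths used in the second call are an assumption about the start of AudioSpecificConfig() taken from ISO/IEC 14496-3.

```python
def pack_config(fields):
    """Pack (value, width) pairs MSB first and return the hex string.

    Zero-valued padding bits (at most 7) are appended to reach octet
    alignment, as described for the config parameter.
    """
    bits = ""
    for value, width in fields:
        if width <= 0 or not 0 <= value < (1 << width):
            raise ValueError("value does not fit in the given width")
        bits += format(value, "0{}b".format(width))
    bits += "0" * (-len(bits) % 8)           # pad the last octet with zeros
    octets = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return octets.hex().upper()


print(pack_config([(1, 1), (5, 3)]))         # 4 data bits + 4 padding bits
print(pack_config([(2, 5), (7, 4), (1, 4)])) # 13 data bits + 3 padding bits
```

An empty field list yields an empty string, which corresponds to the quoted empty form config="".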
mode: mode:
The mode in which this specification is used. The following modes The mode in which this specification is used. The following modes
can be signaled: can be signaled:
mode=generic, mode=generic,
mode=CELP-cbr, mode=CELP-cbr,
mode=CELP-vbr, mode=CELP-vbr,
mode=AAC-lbr and mode=AAC-lbr and
mode=AAC-hbr. mode=AAC-hbr.
Other modes are expected to be defined in future RFCs. See also Other modes are expected to be defined in future RFCs. See also
section 3.3.7 and 4.2 of RFC xxxx. section 3.3.7 and 4.2 of RFC 3640.
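A receiver might check for the required parameters when reading an a=fmtp line, as in the sketch below. The function name is hypothetical. Parameter names are folded to lower case because they are not case dependent; values are left untouched, and unknown modes are not rejected because further modes may be defined later. The line is assumed to have been unwrapped into a single SDP line already.

```python
REQUIRED_PARAMETERS = ("streamtype", "profile-level-id", "config", "mode")


def parse_fmtp(line):
    """Parse an unwrapped a=fmtp line into a dict keyed by lower-case name."""
    prefix, _, rest = line.strip().partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        name, separator, value = item.partition("=")
        if not separator:
            raise ValueError("parameter without a value: " + item)
        # Strip optional quotes so that config="" becomes an empty string.
        params[name.strip().lower()] = value.strip().strip('"')
    missing = [name for name in REQUIRED_PARAMETERS if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return params


example = ("a=fmtp:96 streamtype=5; profile-level-id=14; mode=CELP-cbr; "
           "config=440E00; constantSize=27; constantDuration=240")
print(parse_fmtp(example)["mode"])
```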
Optional general parameters: Optional general parameters:
objectType: objectType:
The decimal value from Table 8 in ISO/IEC 14496-1, indicating The decimal value from Table 8 in ISO/IEC 14496-1, indicating the
the value of the objectTypeIndication of the transported stream. value of the objectTypeIndication of the transported stream. For
For BIFS streams this parameter MUST be present to signal the BIFS streams, this parameter MUST be present to signal the version
version of BIFSConfiguration(). Note that objectTypeIndication of BIFSConfiguration(). Note that objectTypeIndication may signal
may signal a non-MPEG-4 stream and that the RTP payload format a non-MPEG-4 stream and that the RTP payload format defined in
defined in this document may not be suitable to carry a stream this document may not be suitable for carrying a stream that is
that is not defined by MPEG-4. The objectType parameter SHOULD not defined by MPEG-4. The objectType parameter SHOULD NOT be set
NOT be set to a value that signals a stream that cannot be to a value that signals a stream that cannot be carried by this
carried by this payload format. payload format.
constantSize: constantSize:
The constant size in octets of each Access Unit for this stream. The constant size in octets of each Access Unit for this stream.
The constantSize and the sizeLength parameters MUST NOT be The constantSize and the sizeLength parameters MUST NOT be
simultaneously present. simultaneously present.
constantDuration: constantDuration:
The constant duration of each Access Unit for this stream, The constant duration of each Access Unit for this stream,
measured with the same units as the RTP time stamp. measured with the same units as the RTP time stamp.
maxDisplacement: maxDisplacement:
The decimal representation of the maximum displacement in time The decimal representation of the maximum displacement in time of
of an interleaved AU, as defined in section 3.2.3.3, expressed an interleaved AU, as defined in section 3.2.3.3, expressed in
in units of the RTP time stamp clock. units of the RTP time stamp clock.
This parameter MUST be present when interleaving is applied. This parameter MUST be present when interleaving is applied.
de-interleaveBufferSize: de-interleaveBufferSize:
The decimal representation in number of octets of the size of The decimal representation in number of octets of the size of the
the de-interleave buffer, described in section 3.2.3.3. de-interleave buffer, described in section 3.2.3.3. When
When interleaving, this parameter MUST be present if the interleaving, this parameter MUST be present if the calculation of
calculation of the de-interleave buffer size given in 3.2.3.3 the de-interleave buffer size given in 3.2.3.3 and based on
and based on maxDisplacement and rate(max) under-estimates the maxDisplacement and rate(max) under-estimates the size of the
size of the de-interleave buffer. If this calculation does not de-interleave buffer. If this calculation does not under-estimate
under-estimate the size of the de-interleave buffer, then the the size of the de-interleave buffer, then the
de-interleaveBufferSize parameter SHOULD NOT be present. de-interleaveBufferSize parameter SHOULD NOT be present.
Optional configuration parameters: Optional configuration parameters:
sizeLength: sizeLength:
The number of bits on which the AU-size field is encoded in the The number of bits on which the AU-size field is encoded in the
AU-header. The sizeLength and the constantSize parameters MUST AU-header. The sizeLength and the constantSize parameters MUST
NOT be simultaneously present. NOT be simultaneously present.
indexLength: indexLength:
The number of bits on which the AU-Index is encoded in the first The number of bits on which the AU-Index is encoded in the first
AU-header. The default value of zero indicates the absence of AU-header. The default value of zero indicates the absence of the
the AU-Index field in each first AU-header. AU-Index field in each first AU-header.
indexDeltaLength: indexDeltaLength:
The number of bits on which the AU-Index-delta field is encoded The number of bits on which the AU-Index-delta field is encoded in
in any non-first AU-header. The default value of zero indicates any non-first AU-header. The default value of zero indicates the
the absence of the AU-Index-delta field in each non-first absence of the AU-Index-delta field in each non-first AU-header.
AU-header.
CTSDeltaLength: CTSDeltaLength:
The number of bits on which the CTS-delta field is encoded in The number of bits on which the CTS-delta field is encoded in the
the AU-header. AU-header.
DTSDeltaLength: DTSDeltaLength:
The number of bits on which the DTS-delta field is encoded in The number of bits on which the DTS-delta field is encoded in the
the AU-header. AU-header.
randomAccessIndication: randomAccessIndication:
A decimal value of zero or one, indicating whether the RAP-flag A decimal value of zero or one, indicating whether the RAP-flag is
is present in the AU-header. The decimal value of one indicates present in the AU-header. The decimal value of one indicates
presence of the RAP-flag, the default value zero its absence. presence of the RAP-flag, the default value zero indicates its
absence.
streamStateIndication: streamStateIndication:
The number of bits on which the Stream-state field is encoded in The number of bits on which the Stream-state field is encoded in
the AU-header. This parameter MAY be present when transporting the AU-header. This parameter MAY be present when transporting
MPEG-4 system streams, and SHALL NOT be present for MPEG-4 audio MPEG-4 system streams, and SHALL NOT be present for MPEG-4 audio
and MPEG-4 video streams. and MPEG-4 video streams.
auxiliaryDataSizeLength: auxiliaryDataSizeLength:
The number of bits that is used to encode the auxiliary-data-size The number of bits that is used to encode the auxiliary-data-size
field. field.
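As an illustration of how the configuration parameters above determine the AU-header layout, the following Python sketch reads the AU-size field and the AU-Index or AU-Index-delta field from a bit string. It is a simplified sketch, not a complete parser: it assumes that the AU-size field precedes the index field and that CTSDeltaLength, DTSDeltaLength, randomAccessIndication and streamStateIndication are all zero, so those fields are absent. The names BitReader and read_au_header are hypothetical, and the field widths used in the usage note are illustrative only.

```python
class BitReader:
    """Reads unsigned integers of arbitrary bit width, most significant bit first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current read position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_au_header(reader, first, size_length=0, index_length=0,
                   index_delta_length=0):
    """Read one simplified AU-header (size and index fields only).

    A length of zero means the corresponding field is absent,
    matching the default values described above.
    """
    header = {}
    if size_length:
        header["AU-size"] = reader.read(size_length)
    if first:
        # The first AU-header carries AU-Index.
        if index_length:
            header["AU-Index"] = reader.read(index_length)
    elif index_delta_length:
        # Any non-first AU-header carries AU-Index-delta instead.
        header["AU-Index-delta"] = reader.read(index_delta_length)
    return header
```

With sizeLength=13, indexLength=3 and indexDeltaLength=3, each AU-header in this sketch occupies exactly 16 bits, so the four octets 0x03 0x20 0x06 0x42 decode as an AU of 100 octets with AU-Index 0, followed by an AU of 200 octets with AU-Index-delta 2.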
Applications MAY use more parameters, in addition to those defined Applications MAY use more parameters, in addition to those defined
above. Each additional parameter MUST be registered with IANA, to above. Each additional parameter MUST be registered with IANA to
ensure that there is no clash of names. Each additional parameter ensure that there is not a clash of names. Each additional parameter
MUST be accompanied by a specification in the form of an RFC, MPEG MUST be accompanied by a specification in the form of an RFC, MPEG
standard, or other permanent and readily available reference (the standard, or other permanent and readily available reference (the
"Specification Required" policy defined in RFC 2434 [6]). Receivers "Specification Required" policy defined in RFC 2434 [6]). Receivers
MUST tolerate the presence of such additional parameters, but these MUST tolerate the presence of such additional parameters, but these
parameters SHALL NOT impact the decoding of receivers that comply to parameters SHALL NOT impact the decoding of receivers that comply
this specification. with this specification.
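The presence rules stated for the parameters above can be checked mechanically. The following Python sketch is one possible way to do so, assuming the parameters have already been collected into a dictionary of name/value strings. The function name check_parameters and the "interleaving" argument are hypothetical. Only three of the rules are covered, and unknown parameter names are deliberately ignored, since receivers must tolerate additional parameters.

```python
def check_parameters(params, interleaving=False):
    """Return a list of problems found in a dict of parameter strings.

    Parameters that are not known to this sketch are ignored rather
    than rejected.
    """
    problems = []
    # constantSize and sizeLength are mutually exclusive.
    if "constantSize" in params and "sizeLength" in params:
        problems.append(
            "constantSize and sizeLength MUST NOT both be present")
    # maxDisplacement is mandatory when interleaving is applied.
    if interleaving and "maxDisplacement" not in params:
        problems.append(
            "maxDisplacement MUST be present when interleaving is applied")
    # randomAccessIndication is a decimal zero or one, defaulting to zero.
    if params.get("randomAccessIndication", "0") not in ("0", "1"):
        problems.append("randomAccessIndication must be 0 or 1")
    return problems
```

An empty list means that none of the checked rules is violated; it does not imply that the parameter set as a whole is valid.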
Encoding considerations: Encoding considerations:
This MIME subtype is defined for RTP transport only. System This MIME subtype is defined for RTP transport only. System
bitstreams MUST be generated according to MPEG-4 Systems bitstreams MUST be generated according to MPEG-4 Systems
specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
bitstreams MUST be generated according to MPEG-4 Audio bitstreams MUST be generated according to MPEG-4 Audio specifications
specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized (ISO/IEC 14496-3). The RTP packets MUST be packetized according to
according to the RTP payload format defined in RFC xxxx. the RTP payload format defined in RFC 3640.
Security considerations: Security considerations:
As defined in section 5 of RFC xxxx. As defined in section 5 of RFC 3640.
Interoperability considerations: Interoperability considerations:
MPEG-4 provides a large and rich set of tools for the coding of MPEG-4 provides a large and rich set of tools for the coding of
visual objects. For effective implementation of the standard, visual objects. For effective implementation of the standard,
subsets of the MPEG-4 tool sets have been provided for use in subsets of the MPEG-4 tool sets have been provided for use in
specific applications. These subsets, called 'Profiles', limit the specific applications. These subsets, called 'Profiles', limit the
size of the tool set a decoder is required to implement. In order to size of the tool set a decoder is required to implement. In order to
restrict computational complexity, one or more 'Levels' are set for restrict computational complexity, one or more 'Levels' are set for
each Profile. A Profile@Level combination allows: each Profile. A Profile@Level combination allows:
. a codec builder to implement only the subset of the standard he
needs, while maintaining interworking with other MPEG-4 devices . a codec builder to implement only the subset of the standard
that implement the same combination, and he needs, while maintaining interworking with other MPEG-4
devices that implement the same combination, and
. checking whether MPEG-4 devices comply with the standard . checking whether MPEG-4 devices comply with the standard
('conformance testing'). ('conformance testing').
A stream SHALL be compliant with the MPEG-4 Profile@Level specified A stream SHALL be compliant with the MPEG-4 Profile@Level specified
by the parameter "profile-level-id". Interoperability between a by the parameter "profile-level-id". Interoperability between a
sender and a receiver is achieved by specifying the parameter sender and a receiver is achieved by specifying the parameter
"profile-level-id" in MIME content. In the capability exchange / "profile-level-id" in MIME content. In the capability
announcement procedure this parameter may mutually be set to the exchange/announcement procedure, this parameter may mutually be set
same value. to the same value.
Published specification: Published specification:
The specifications for MPEG-4 streams are presented in ISO/IEC The specifications for MPEG-4 streams are presented in ISO/IEC
14496-1, 14496-2, and 14496-3. The RTP payload format is described 14496-1, 14496-2, and 14496-3. The RTP payload format is described
in RFC xxxx. in RFC 3640.
Applications which use this media type: Applications which use this media type:
Multimedia streaming and conferencing tools. Multimedia streaming and conferencing tools.
Additional information: none Additional information: none
Magic number(s): none Magic number(s): none
File extension(s): File extension(s):
None. A file format with the extension .mp4 has been defined for None. A file format with the extension .mp4 has been defined for
MPEG-4 content but is not directly correlated with this MIME type MPEG-4 content but is not directly correlated with this MIME type for
for which the sole purpose is RTP transport. which the sole purpose is RTP transport.
Macintosh File Type Code(s): none Macintosh File Type Code(s): none
Person & email address to contact for further information: Person & email address to contact for further information:
Authors of RFC xxxx, IETF Audio/Video Transport working group. Authors of RFC 3640, IETF Audio/Video Transport working group.
Intended usage: COMMON Intended usage: COMMON
Author/Change controller: Author/Change controller:
Authors of RFC xxxx, IETF Audio/Video Transport working group. Authors of RFC 3640, IETF Audio/Video Transport working group.
4.2 Registration of mode definitions with IANA 4.2. Registration of Mode Definitions with IANA
This specification can be used in a number of modes. The mode of This specification can be used in a number of modes. The mode of
operation is signaled using the "mode" MIME parameter, with the operation is signaled using the "mode" MIME parameter, with the
initial set of values specified in section 4.1. New modes may be initial set of values specified in section 4.1. New modes may be
defined at any time, as described in section 3.3.7. These modes defined at any time, as described in section 3.3.7. These modes MUST
MUST be registered with IANA, to ensure that there is no clash be registered with IANA, to ensure that there is not a clash of
of names. names.
A new mode registration MUST be accompanied by a specification in A new mode registration MUST be accompanied by a specification in the
the form of an RFC, MPEG standard, or other permanent and readily form of an RFC, MPEG standard, or other permanent and readily
available reference (the "Specification Required" policy defined available reference (the "Specification Required" policy defined in
in RFC 2434 [6]). RFC 2434 [6]).
4.3 Concatenation of parameters 4.3. Concatenation of Parameters
Multiple parameters SHOULD be expressed as a MIME media type string, Multiple parameters SHOULD be expressed as a MIME media type string,
in the form of a semicolon-separated list of parameter=value pairs in the form of a semicolon-separated list of parameter=value pairs
(for parameter usage examples see sections 3.3.2 up to 3.3.6). (for parameter usage examples see sections 3.3.2 up to 3.3.6).
4.4 Usage of SDP 4.4. Usage of SDP
4.4.1 The a=fmtp keyword 4.4.1. The a=fmtp Keyword
It is assumed that one typical way to transport the above-described It is assumed that one typical way to transport the above-described
parameters associated with this payload format is via a SDP message parameters associated with this payload format is via an SDP message
[5] for example transported to the client in reply to a RTSP [5] for example transported to the client in reply to an RTSP
DESCRIBE [8] or via SAP [11]. In that case the (a=fmtp) keyword DESCRIBE [8] or via SAP [11]. In that case, the (a=fmtp) keyword
MUST be used as described in RFC 2327 [5], section 6, the syntax MUST be used as described in RFC 2327 [5], section 6, the syntax then
being then: being:
a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>] a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]
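A receiver might split such a line into its format and its parameter=value pairs as sketched below in Python. This is a minimal illustration under stated assumptions, not a full SDP parser: the function names parse_fmtp and format_fmtp are hypothetical, values are assumed not to contain semicolons, and the payload type and parameter values in the usage note are made up for the example.

```python
def parse_fmtp(line):
    """Split an a=fmtp line into (format, {parameter name: value})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    # The format (payload type) is separated from the parameters by a space.
    fmt, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("parameter without '=': %r" % item)
        params[name.strip()] = value.strip()
    return fmt, params


def format_fmtp(fmt, params):
    """Build an a=fmtp line from a format and a dict of parameters."""
    pairs = "; ".join("%s=%s" % (name, value)
                      for name, value in params.items())
    return "a=fmtp:%s %s" % (fmt, pairs)
```

For example, parsing "a=fmtp:96 mode=generic; constantSize=160" yields the format "96" and a dictionary with the two entries mode and constantSize; format_fmtp reverses the operation.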
5. Security Considerations 5. Security Considerations
RTP packets using the payload format defined in this specification RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP are subject to the security considerations discussed in the RTP
specification [2]. This implies that confidentiality of the media specification [2]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be with this payload format is applied end-to-end, encryption may be
performed on the compressed data so there is no conflict between the performed on the compressed data so there is no conflict between the
two operations. The packet processing complexity of this payload two operations. The packet processing complexity of this payload
type (i.e. excluding media data processing) does not exhibit any type (i.e., excluding media data processing) does not exhibit any
significant non-uniformity in the receiver side to cause a denial- significant non-uniformity in the receiver side to cause a denial-
of-service threat. of-service threat.
However, it is possible to inject non-compliant MPEG streams (Audio, However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers, Video, and Systems) so that the receiver/decoder's buffers are
which might compromise the functionality of the receiver or even overloaded, which might compromise the functionality of the receiver
crash it. This is especially true for end-to-end systems like MPEG or even crash it. This is especially true for end-to-end systems
where the buffer models are precisely defined. like MPEG, where the buffer models are precisely defined.
MPEG-4 Systems supports stream types including commands that are MPEG-4 Systems support stream types including commands that are
executed on the terminal like OD commands, BIFS commands, etc. and executed on the terminal, like OD commands, BIFS commands, etc. and
programmatic content like MPEG-J (Java(TM) Byte Code) and MPEG-4 programmatic content like MPEG-J (Java(TM) Byte Code) and MPEG-4
scripts. It is possible to use one or more of the above in a scripts. It is possible to use one or more of the above in a manner
manner non-compliant to MPEG to crash the receiver or make it non-compliant to MPEG to crash the receiver or make it temporarily
temporarily unavailable. Senders that transport MPEG-4 content unavailable. Senders that transport MPEG-4 content SHOULD ensure
SHOULD ensure that such content is MPEG compliant, as defined in the that such content is MPEG compliant, as defined in the compliance
compliance part of ISO/IEC 14496 [1]. Receivers that support MPEG-4 part of ISO/IEC 14496 [1]. Receivers that support MPEG-4 content
content should prevent malfunctioning of the receiver in case of should prevent malfunctioning of the receiver in case of non-MPEG-
non-MPEG-compliant content. compliant content.
Authentication mechanisms can be used to validate the sender and Authentication mechanisms can be used to validate the sender and the
the data to prevent security problems due to non-compliant malignant data to prevent security problems due to non-compliant malignant
MPEG-4 streams. MPEG-4 streams.
In ISO/IEC 14496-1 a security model is defined for MPEG-4 Systems In ISO/IEC 14496-1, a security model is defined for MPEG-4 Systems
streams carrying MPEG-J access units which comprise Java(TM) classes streams carrying MPEG-J access units that comprise Java(TM) classes
and objects. MPEG-J defines a set of Java APIs and a secure and objects. MPEG-J defines a set of Java APIs and a secure
execution model. MPEG-J content can call this set of APIs and execution model. MPEG-J content can call this set of APIs and
Java(TM) methods from a set of Java packages supported in the Java(TM) methods from a set of Java packages supported in the
receiver within the defined security model. According to this receiver within the defined security model. According to this
security model, downloaded byte code is forbidden to load libraries, security model, downloaded byte code is forbidden to load libraries,
define native methods, start programs, read or write files, or read define native methods, start programs, read or write files, or read
system properties. system properties. Receivers can implement intelligent filters to
Receivers can implement intelligent filters to validate the buffer validate the buffer requirements or parametric (OD, BIFS, etc.) or
requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J, programmatic (MPEG-J, MPEG-4 scripts) commands in the streams.
MPEG-4 scripts) commands in the streams. However, this can increase However, this can increase the complexity significantly.
the complexity significantly.
Implementors of MPEG-4 streaming over RTP who also implement MPEG-4 Implementors of MPEG-4 streaming over RTP who also implement MPEG-4
scripts (subset of ECMAScript) MUST ensure that the action of such scripts (subset of ECMAScript) MUST ensure that the action of such
scripts is limited solely to the domain of the single presentation scripts is limited solely to the domain of the single presentation in
in which they reside (thus disallowing session to session which they reside (thus disallowing session to session communication,
communication, access to local resources and storage, etc.). Though access to local resources and storage, etc.). Though loading static
loading static network-located resources (such as media) into the network-located resources (such as media) into the presentation
presentation should be permitted, network access by scripts MUST be should be permitted, network access by scripts MUST be restricted to
restricted to such (media) download. such a (media) download.
6. Acknowledgements 6. Acknowledgements
This document evolved through several revisions thanks to This document evolved into RFC 3640 after several revisions. Thanks
contributions by people from the ISMA forum, from the IETF AVT to contributions from people in the ISMA forum, the IETF AVT Working
Working Group and from the 4-on-IP ad-hoc group within MPEG. The Group and the 4-on-IP ad-hoc group within MPEG. The authors wish to
authors wish to thank all involved people, and in particular Andrea thank all people involved, particularly Andrea Basso, Stephen Casner,
Basso, Stephen Casner, M. Reha Civanlar, Carsten Herpel, John M. Reha Civanlar, Carsten Herpel, John Lazaro, Zvi Lifshitz, Young-
Lazaro, Zvi Lifshitz, Young-kwon Lim, Alex MacAulay, Bill May, kwon Lim, Alex MacAulay, Bill May, Colin Perkins, Dorairaj V and
Colin Perkins, Dorairaj V and Stephan Wenger for their valuable Stephan Wenger for their valuable comments and support.
comments and support.
7. References
7.1 Normative references
[1] ISO/IEC International Standard 14496 (MPEG-4); "Information
technology - Coding of audio-visual objects", January 2000.
[2] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
Transport Protocol for Real-Time Applications", RFC 1889, Internet
Engineering Task Force, January 1996.
[3] N. Freed, J. Klensin, J. Postel, "Multipurpose Internet Mail
Extensions (MIME) Part Four: Registration Procedures", RFC 2048,
Internet Engineering Task Force, November 1996.
[4] S. Bradner, "Key words for use in RFCs to Indicate Requirement
Levels", RFC 2119, March 1997.
[5] M. Handley, V. Jacobson, "SDP: Session Description Protocol",
RFC 2327, Internet Engineering Task Force, April 1998.
[6] T. Narten, H. Alvestrand, "Guidelines for Writing an IANA
Considerations Section in RFCs", RFC 2434, October 1998.
7.2 Informative references
[7] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, "RTP payload
format for MPEG1/MPEG2 Video", RFC 2250, January 1998.
[8] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming
Protocol (RTSP)", RFC 2326, Internet Engineering Task Force, April 1998.
[9] C. Perkins, O. Hodson, "Options for Repair of Streaming Media",
RFC 2354, Internet Engineering Task Force, June 1998.
[10] H. Schulzrinne, J. Rosenberg, "An RTP Payload Format for
Generic Forward Error Correction", RFC 2733, Internet Engineering
Task Force, December 1999.
[11] M. Handley, C. Perkins, E. Whelan, "SAP: Session Announcement
Protocol", RFC 2974, Internet Engineering Task Force, October 2000.
[12] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP
payload format for MPEG-4 Audio/Visual streams", RFC 3016, Internet
Engineering Task Force, November 2000.
8. Author Addresses
Jan van der Meer
Philips Electronics, MP4Net
Prof Holstlaan 4
Building WDB-1
5600 JZ Eindhoven
Netherlands
Email: jan.vandermeer@philips.com
David Mackie
Apple Computer, Inc.
One Infinite Loop, MS:302-2LF
Cupertino CA 95014
Email: dmackie@apple.com
Viswanathan Swaminathan
Sun Microsystems Inc.
901 San Antonio Road, M/S UMPK15-214
Palo Alto, CA 94303
Email: viswanathan.swaminathan@sun.com
David Singer
Apple Computer, Inc.
One Infinite Loop, MS:302-3MT
Cupertino CA 95014
Email: singer@apple.com
Philippe Gentric
Philips Electronics, MP4Net
51 rue Carnot
92156 Suresnes
France
Email: philippe.gentric@philips.com
Full Copyright Statement
Copyright (C) The Internet Society (August 2003). All Rights
Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain
it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice and this
paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such
as by removing the copyright notice or references to the Internet
Society or other Internet organizations, except as needed for the
purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards process
MUST be followed, or as required to translate it into languages
other than English.
The limited permissions granted above are perpetual and will
not be revoked by the Internet Society or its successors or
assigns.
This document and the information contained herein is provided on APPENDIX: Usage of this Payload Format
an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
APPENDIX: Usage of this payload format Appendix A. Interleave Analysis
Appendix A. Interleave analysis A. Examples of Delay Analysis with Interleave
A.1 Introduction A.1. Introduction
In this appendix interleaving issues are discussed. Some general Interleaving issues are discussed in this appendix. Some general
notes are provided on de-interleaving and error concealment, while notes are provided on de-interleaving and error concealment, while a
a number of interleaving patterns are examined, in particular number of interleaving patterns are examined, in particular for
for determining the maximum displacement in time and the size of determining the size of the de-interleave buffer and the maximum
the de-interleave buffer. In these examples, the maximum displacement of access units in time. In these examples, the maximum
displacement is cited in terms of an access unit count, for ease of displacement is cited in terms of an access unit count, for ease of
reading. In actual streams, it is signaled in units of the RTP reading. In actual streams, it is signaled in units of the RTP time
time stamp clock. stamp clock.
A.2 De-interleaving and error concealment A.2. De-interleaving and Error Concealment
This appendix does not describe any details on de-interleaving and This appendix does not describe any details on de-interleaving and
error concealment, as the control of the AU decoding and error error concealment, as the control of the AU decoding and error
concealment process has little to do with interleaving. If the concealment process has little to do with interleaving. If the next
next AU to be decoded is present and there is sufficient storage AU to be decoded is present and there is sufficient storage available
available for the decoded AU, then decode it now. If not, wait. for the decoded AU, then decode it immediately. If not, wait. When
When the decoding deadline is reached (i.e., the time when decoding the decoding deadline is reached (i.e., the time when decoding must
must begin in order to be completed by the time the AU is to be begin in order to be completed by the time the AU is to be
presented), or if the decoder is some hardware that presents a presented), or if the decoder is some hardware that presents a
constant delay between initiation of decoding of an AU and constant delay between initiation of decoding of an AU and
presentation of that AU, then decoding must begin at that deadline presentation of that AU, then decoding must begin at that deadline
time. time.
If the next AU to be decoded is not present when the decoding If the next AU to be decoded is not present when the decoding
deadline is reached, then that AU is lost so the receiver must take deadline is reached, then that AU is lost so the receiver must take
whatever error concealment measures is deemed appropriate. The whatever error concealment measures are deemed appropriate. The
play-out delay may need to be adjusted at that point (especially if play-out delay may need to be adjusted at that point (especially if
other AUs have also missed their deadline recently). Or, if it was other AUs have also missed their deadline recently). Or, if it was a
a momentary delay, and maintaining the latency is important, then momentary delay, and maintaining the latency is important, then the
the receiver should minimize the glitch and continue processing receiver should minimize the glitch and continue processing with the
with the next AU. next AU.
A.3 Simple Group interleave A.3. Simple Group Interleave
A.3.1 Introduction A.3.1. Introduction
An example of regular interleave is when packets are formed into An example of regular interleave is when packets are formed into
groups. If the 'stride' of the interleave (the distance between groups. If the 'stride' of the interleave (the distance between
interleaved AUs) is N, packet 0 could contain AU(0), AU(N), AU(2N), interleaved AUs) is N, packet 0 could contain AU(0), AU(N), AU(2N),
and so on; packet 1 could contain AU(1), AU(1+N), AU(1+2N), and so and so on; packet 1 could contain AU(1), AU(1+N), AU(1+2N), and so
on. If there are M access units in a packet, then there are M*N on. If there are M access units in a packet, then there are M*N
access units in the group. access units in the group.
An example with N=M=3 follows; note that this is the same example An example with N=M=3 follows; note that this is the same example as
as given in section 2.5 and that a fixed time duration per Access given in section 2.5 and that a fixed time duration per Access Unit
Unit is assumed: is assumed:
Packet Time stamp Carried AUs AU-Index, AU-Index-delta Packet Time stamp Carried AUs AU-Index, AU-Index-delta
P(0) T[0] 0, 3, 6 0, 2, 2 P(0) T[0] 0, 3, 6 0, 2, 2
P(1) T[1] 1, 4, 7 0, 2, 2 P(1) T[1] 1, 4, 7 0, 2, 2
P(2) T[2] 2, 5, 8 0, 2, 2 P(2) T[2] 2, 5, 8 0, 2, 2
P(3) T[9] 9,12,15 0, 2, 2 P(3) T[9] 9,12,15 0, 2, 2
In this example the AU-Index is present in the first AU-header and In this example, the AU-Index is present in the first AU-header and
coded with the value 0, as required for fixed duration AUs. The coded with the value 0, as required for fixed duration AUs. The
position of the first AU of each packet within the group is defined position of the first AU of each packet within the group is defined
by the RTP time stamp, while the AU-Index-delta field indicates the by the RTP time stamp, while the AU-Index-delta field indicates the
position of subsequent AUs relative to the first AU in the packet. position of subsequent AUs relative to the first AU in the packet.
All AU-Index-delta fields are coded with the value N-1, equal to 2 All AU-Index-delta fields are coded with the value N-1, equal to 2 in
in this example. Hence the RTP time stamp and the AU-Index-delta are this example. Hence the RTP time stamp and the AU-Index-delta are
used to reconstruct the original order. See also section 3.2.3.2. used to reconstruct the original order. See also section 3.2.3.2.
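The table above can be reproduced with a few lines of Python. The sketch below generates the packets of one group for a given stride N and M access units per packet, and then recovers the AU numbers of a packet from the position given by its RTP time stamp and the AU-Index-delta values. The function names are hypothetical, and AU numbers stand in for time stamps, which is only valid under the fixed AU duration assumed in this example.

```python
def group_interleave(n, m, group=0):
    """Return one (time stamp index, carried AUs, index fields) per packet.

    n is the stride of the interleave and m the number of AUs in a
    packet, so each group holds n * m access units.
    """
    base = group * n * m
    packets = []
    for p in range(n):
        aus = [base + p + k * n for k in range(m)]
        # AU-Index is 0 in the first AU-header; every following
        # AU-header carries AU-Index-delta = n - 1.
        fields = [0] + [n - 1] * (m - 1)
        packets.append((aus[0], aus, fields))
    return packets


def reconstruct(first_au, fields):
    """Recover the AU numbers carried in one packet.

    first_au is the position derived from the RTP time stamp; each
    AU-Index-delta advances the position by delta + 1.
    """
    aus = [first_au]
    for delta in fields[1:]:
        aus.append(aus[-1] + delta + 1)
    return aus
```

Calling group_interleave(3, 3) gives the rows for P(0), P(1) and P(2), and group_interleave(3, 3, group=1) starts with the row for P(3).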
A.3.2 Determining the de-interleave buffer size A.3.2. Determining the De-interleave Buffer Size
For the regular pattern as in this example, figure 6 in section For the regular pattern as in this example, Figure 6 in section
3.2.3.3 shows that the de-interleave buffer stores at most 4 AUs. A 3.2.3.3 shows that the de-interleave buffer stores at most 4 AUs. A
de-interleaveBufferSize value may be signaled that is at least de-interleaveBufferSize value that is at least equal to the total
equal to the total number of octets of any 4 "early" AUs that are number of octets of any 4 "early" AUs that are stored at the same
stored at the same time. time may be signaled.
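The figure of 4 AUs can be cross-checked with a small simulation. The Python sketch below uses a deliberately simplified model, which is an assumption of this example and not the timing model of section 3.2.3.3: an AU is released as soon as all preceding AUs have arrived, and only AUs that arrive "early" are counted as stored. The function name max_early_aus is hypothetical.

```python
def max_early_aus(arrival_order):
    """Return the peak number of "early" AUs held at the same time.

    arrival_order lists AU numbers in the order in which they are
    received. An AU is "early" if it arrives before some AU that
    precedes it in the original order.
    """
    next_needed = min(arrival_order)
    buffered = set()
    peak = 0
    for au in arrival_order:
        if au == next_needed:
            # This AU is in order; release it and any buffered AUs
            # that directly follow it.
            next_needed += 1
            while next_needed in buffered:
                buffered.remove(next_needed)
                next_needed += 1
        else:
            buffered.add(au)
            peak = max(peak, len(buffered))
    return peak


# Arrival order for the N=M=3 example: packets P(0), P(1), P(2).
order = [0, 3, 6, 1, 4, 7, 2, 5, 8]
print(max_early_aus(order))
```

For the N=M=3 pattern this model reports a peak of 4, reached just before AU 2 arrives, when AUs 3, 4, 6 and 7 are all waiting.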
A.3.3 Determining the maximum displacement A.3.3. Determining the Maximum Displacement
For the regular pattern as in this example, figure 7 in section 3.2.3.3 For the regular pattern as in this example, Figure 7 in section 3.2.3.3
shows that the maximum displacement in time equals 5 AU periods. shows that the maximum displacement in time equals 5 AU periods.
Hence the minimum maxDisplacement value that must be signaled is 5 Hence, the minimum maxDisplacement value that must be signaled is 5
AU periods. In case each AU has the same size, this maxDisplacement AU periods. In case each AU has the same size, this maxDisplacement
value over-estimates the de-interleave buffer size with one AU. value over-estimates the de-interleave buffer size with one AU.
However, note that in case of variable AU sizes the total size of However, note that in case of variable AU sizes, the total size of
any 4 "early" AUs that must be stored at the same time may exceed any 4 "early" AUs that must be stored at the same time may exceed
maxDisplacement times the maximum bitrate, in which case the maxDisplacement times the maximum bitrate, in which case the de-
de-interleaveBufferSize must be signaled. interleaveBufferSize must be signaled.
A.4 More subtle group interleave A.4. More Subtle Group Interleave
A.4.1 Introduction A.4.1. Introduction
Another example of forming packets with group interleave is given Another example of forming packets with group interleave is given
below. In this example the packets are formed such that the loss of below. In this example, the packets are formed such that the loss of
two subsequent RTP packets does not cause the loss of two subsequent two subsequent RTP packets does not cause the loss of two subsequent
AUs. Note that in this example the RTP time stamps of packet 3 and AUs. Note that in this example, the RTP time stamps of packet 3 and
packet 4 are earlier than the RTP time stamps of packets 1 and 2, packet 4 are earlier than the RTP time stamps of packets 1 and 2,
respectively; a fixed time duration per Access Unit is assumed. respectively; a fixed time duration per Access Unit is assumed.
   Packet   Time stamp   Carried AUs   AU-Index, AU-Index-delta
   0        T[0]         0, 5          0, 4
   1        T[2]         2, 7          0, 4
   2        T[4]         4, 9          0, 4
   3        T[1]         1, 6          0, 4
   4        T[3]         3, 8          0, 4
   5        T[10]        10, 15        0, 4
   and so on ..
In this example the AU-Index is present in the first AU-header and In this example, the AU-Index is present in the first AU-header and
coded with the value 0, as required for AUs with a fixed duration. coded with the value 0, as required for AUs with a fixed duration.
To reconstruct the original order, the RTP time stamp and the To reconstruct the original order, the RTP time stamp and the AU-
AU-Index-delta (coded with the value 4) are used. See also Index-delta (coded with the value 4) are used. See also section
section 3.2.3.2. 3.2.3.2.
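The reconstruction described above can be sketched in Python. This is a minimal illustration, not part of the specification: the helper name carried_aus is hypothetical, and it assumes that with a fixed AU duration the time stamp T[n] identifies AU n, and that each coded AU-Index-delta equals the gap to the previous AU minus one.

```python
def carried_aus(first_au, index_deltas):
    """Return the serial numbers of the AUs carried in one packet.

    first_au:     serial number of the first AU, taken here from the RTP
                  time stamp (assumption: T[n] maps to AU n because every
                  AU has the same duration).
    index_deltas: coded AU-Index-delta value of each following AU-header.
    """
    serials = [first_au]
    for delta in index_deltas:
        # The coded value is the gap minus one: next = previous + delta + 1.
        serials.append(serials[-1] + delta + 1)
    return serials


# Packets 0..5 of the table above: (n of time stamp T[n], AU-Index-deltas).
packets = [(0, [4]), (2, [4]), (4, [4]), (1, [4]), (3, [4]), (10, [4])]

# AUs in the order in which they are received.
received = [au for first, deltas in packets
            for au in carried_aus(first, deltas)]
# Sorting by serial number restores the original order.
in_order = sorted(received)

print(received)
print(in_order)
```

Sorting stands in for the de-interleaving step here; a real receiver would release AUs incrementally as the missing ones arrive.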
A.4.2 Determining the de-interleave buffer size A.4.2. Determining the De-interleave Buffer Size
From figure 8 it can be determined that at most 5 "early" AUs From Figure 8, it can be determined that at most 5 "early" AUs are
are to be stored. If the AUs are of constant size, then this value to be stored. If the AUs are of constant size, then this value
equals 5 times the AU size. The minimum size of the de-interleave equals 5 times the AU size. The minimum size of the de-interleave
buffer equals the maximum total number of octets of the "early" AUs buffer equals the maximum total number of octets of the "early" AUs
that are to be stored at the same time. This gives the minimum that are to be stored at the same time. This gives the minimum value
value of the de-interleaveBufferSize that may be signaled. of the de-interleaveBufferSize that may be signaled.
                       +--+--+--+--+--+--+--+--+--+--+
   Interleaved AUs     | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                       +--+--+--+--+--+--+--+--+--+--+
                         -  -  5  -  5  -  2  7  4  9
                                     7     4  9  5
   "Early" AUs                             5     6
                                           7     7
                                           9     9
Figure 8: Storage of "early" AUs in the de-interleave buffer per Figure 8: Storage of "early" AUs in the de-interleave buffer per
interleaved AU. interleaved AU.
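The columns of Figure 8 can be reproduced with a short Python sketch. It assumes one reading of "early": an AU is early with respect to an arriving AU when it was received before that AU but comes later in the original order. The function names and the size of 100 octets per AU are hypothetical and serve only as illustration.

```python
def early_aus(arrival_order):
    """For each arriving AU, list the AUs received before it that come
    later in the original order (the "early" AUs held at that moment)."""
    result = []
    for position, au in enumerate(arrival_order):
        held = sorted(a for a in arrival_order[:position] if a > au)
        result.append(held)
    return result


def min_buffer_octets(arrival_order, size_of):
    """Largest total size, in octets, of the early AUs held at one time."""
    return max(sum(size_of[a] for a in held)
               for held in early_aus(arrival_order))


pattern = [0, 5, 2, 7, 4, 9, 1, 6, 3, 8]
columns = early_aus(pattern)
max_early = max(len(held) for held in columns)

# Hypothetical constant AU size of 100 octets.
sizes = {au: 100 for au in pattern}
buffer_octets = min_buffer_octets(pattern, sizes)

for au, held in zip(pattern, columns):
    print(au, held)
print("max early AUs:", max_early)
print("minimum buffer in octets:", buffer_octets)
```

With variable AU sizes, only the sizes dictionary changes; the result is then the largest sum over any set of early AUs stored at the same time.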
A.4.3 Determining the maximum displacement A.4.3. Determining the Maximum Displacement
From figure 9 it can be seen that the maximum displacement in time From Figure 9, it can be seen that the maximum displacement in time
equals 8 AU periods. Hence the minimum maxDisplacement value to be equals 8 AU periods. Hence the minimum maxDisplacement value to be
signaled is 8 AU periods. signaled is 8 AU periods.
                              +--+--+--+--+--+--+--+--+--+--+
   Interleaved AUs            | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                              +--+--+--+--+--+--+--+--+--+--+
   Earliest not yet present AU  -  1  1  1  1  1  -  3  -  -
Figure 9: The earliest not yet present AU for each AU in the Figure 9: For each AU in the interleaving pattern, the earliest of
interleaving pattern. any earlier AUs not yet present
In case each AU has the same size, this maxDisplacement value In case each AU has the same size, this maxDisplacement value
over-estimates the de-interleave buffer size by three AUs. over-estimates the de-interleave buffer size by three AUs.
However, in case of variable AU sizes the total size of any 5 However, in case of variable AU sizes, the total size of any 5
"early" AUs stored at the same time may exceed maxDisplacement "early" AUs stored at the same time may exceed maxDisplacement times
times the maximum bitrate, in which case de-interleaveBufferSize the maximum bitrate, in which case de-interleaveBufferSize must be
must be signaled. signaled.
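The values in Figure 9 and the resulting maximum displacement can be checked with the following Python sketch. It assumes that AUs are numbered consecutively from 0 and that the displacement of an AU is its distance, in AU periods, to the earliest preceding AU that has not yet arrived. The function names are hypothetical.

```python
def earliest_missing(arrival_order):
    """For each arriving AU, return the earliest AU that precedes it in
    the original order but has not arrived yet (None if there is none)."""
    seen = set()
    result = []
    for au in arrival_order:
        missing = [a for a in range(au) if a not in seen]
        result.append(missing[0] if missing else None)
        seen.add(au)
    return result


def max_displacement(arrival_order):
    """Maximum displacement in AU periods over the whole pattern."""
    gaps = [au - first
            for au, first in zip(arrival_order,
                                 earliest_missing(arrival_order))
            if first is not None]
    return max(gaps, default=0)


pattern = [0, 5, 2, 7, 4, 9, 1, 6, 3, 8]
row = earliest_missing(pattern)
displacement = max_displacement(pattern)

# With constant-size AUs, at most 5 early AUs are stored (see A.4.2),
# so the displacement exceeds the buffer need by this many AUs.
over_estimate = displacement - 5

print(row)
print(displacement, over_estimate)
```

The difference of three AUs holds only for AUs of constant size; with variable sizes the comparison has to be made in octets.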
A.5 Continuous interleave A.5. Continuous Interleave
A.5.1 Introduction A.5.1. Introduction
In continuous interleave, once the scheme is 'primed', the number In continuous interleave, once the scheme is 'primed', the number of
of AUs in a packet exceeds the 'stride' (the distance between AUs in a packet exceeds the 'stride' (the distance between them).
them). This shortens the buffering needed, smooths the data-flow, This shortens the buffering needed, smoothes the data-flow, and gives
and gives slightly larger packets -- and thus lower overhead -- for slightly larger packets -- and thus lower overhead -- for the same
the same interleave. For example, here is a continuous interleave interleave. For example, here is a continuous interleave also over a
also over a stride of 3 AUs, but with 4 AUs per packet, for a run stride of 3 AUs, but with 4 AUs per packet, for a run of 20 AUs.
of 20 AUs. This shows both how the scheme 'starts up' and how it This shows both how the scheme 'starts up' and how it finishes. Once
finishes. Once again, the example assumes fixed time duration per again, the example assumes fixed time duration per Access Unit.
Access Unit.
   Packet   Time-stamp   Carried AUs     AU-Index, AU-Index-delta
   0        T[0]          0              0
   1        T[1]          1  4           0  2
   2        T[2]          2  5  8        0  2  2
   3        T[3]          3  6  9 12     0  2  2  2
   4        T[7]          7 10 13 16     0  2  2  2
   5        T[11]        11 14 17 20     0  2  2  2
   6        T[15]        15 18           0  2
   7        T[19]        19              0
In this example the AU-Index is present in the first AU-header and In this example, the AU-Index is present in the first AU-header and
coded with the value 0, as required for AUs with a fixed duration. coded with the value 0, as required for AUs with a fixed duration.
To reconstruct the original order, the RTP time stamp and the To reconstruct the original order, the RTP time stamp and the
AU-Index-delta (coded with the value 2) are used. See also section 3.2.3.2. AU-Index-delta (coded with the value 2) are used. See also section 3.2.3.2.
Note that this example has RTP time-stamps in increasing order. Note that this example has RTP time-stamps in increasing order.
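One way to generate the packets of the table above is sketched below in Python. The construction is an assumption chosen because it reproduces the table, including the start-up and finishing packets; it is not taken from the specification and has only been checked for a stride of 3 with 4 AUs per packet. The function name is hypothetical.

```python
def continuous_interleave(last_au, stride=3, per_packet=4):
    """Return the AUs carried by each packet, for AUs 0..last_au.

    Every packet nominally carries per_packet AUs spaced stride apart.
    Slots that fall before AU 0 or after last_au are dropped, which
    yields the shorter packets at the start and at the end.
    """
    packets = []
    # First nominal packet: its last slot is AU 0.
    start = -stride * (per_packet - 1)
    while start <= last_au:
        aus = [start + stride * j for j in range(per_packet)
               if 0 <= start + stride * j <= last_au]
        if aus:
            packets.append(aus)
        start += per_packet
    return packets


packets = continuous_interleave(20)
for number, aus in enumerate(packets):
    # Time stamp index is the first AU; each further AU-header carries
    # AU-Index-delta = stride - 1 = 2.
    deltas = [0] + [2] * (len(aus) - 1)
    print(number, "T[%d]" % aus[0], aus, deltas)

sent_order = [au for aus in packets for au in aus]
```

Each AU from 0 to 20 appears in exactly one packet, and the first AUs of successive packets increase, which matches the increasing RTP time stamps noted above.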
A.5.2 Determining the de-interleave buffer size A.5.2. Determining the De-interleave Buffer Size
For this example the de-interleave buffer size can be derived from For this example the de-interleave buffer size can be derived from
figure 10. The maximum number of "early" AUs is three. If the AUs Figure 10. The maximum number of "early" AUs is 3. If the AUs are
are of constant size, then this value equals 3 times the AU size. of constant size, then the de-interleave buffer size equals 3 times
Compared to the example in A.2, for constant size AUs the the AU size. Compared to the example in A.2, for constant size AUs
de-interleave buffer size is reduced from 4 to 3 times the AU size, the de-interleave buffer size is reduced from 4 to 3 times the AU
while maintaining the same 'stride'. size, while maintaining the same 'stride'.
                       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs     | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
                         -  -  -  4  -  -  4  8  -  -  8 12  -  -
                                           5           9
   "Early" AUs                             8          12
Figure 10: Storage of "early" AUs in the de-interleave buffer per Figure 10: Storage of "early" AUs in the de-interleave buffer per
interleaved AU. interleaved AU.
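The figure of three AUs can also be cross-checked by simulating a receiver. The Python sketch below assumes a simple receiver model that releases AUs strictly in their original order and holds every other received AU until its predecessors have arrived. This model and the function name are illustrative assumptions, not the definition used in this document.

```python
def peak_buffered(arrival_order):
    """Simulate in-order release and return the largest number of AUs
    waiting in the de-interleave buffer at any one time."""
    waiting = set()
    next_expected = 0
    peak = 0
    for au in arrival_order:
        waiting.add(au)
        # Release every AU that is now contiguous with the output.
        while next_expected in waiting:
            waiting.remove(next_expected)
            next_expected += 1
        peak = max(peak, len(waiting))
    return peak


continuous = [0, 1, 4, 2, 5, 8, 3, 6, 9, 12, 7, 10, 13, 16]
peak = peak_buffered(continuous)
print(peak)
```

For this pattern the simulated peak agrees with the maximum number of "early" AUs in Figure 10. For other patterns the two counts need not coincide, so the figure remains the reference.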
A.5.3 Determining the maximum displacement A.5.3. Determining the Maximum Displacement
For this example the maximum displacement has a value of 5 AU For this example, the maximum displacement has a value of 5 AU
periods. See figure 11. Compared to the example in A.2, the maximum periods. See Figure 11. Compared to the example in A.2, the maximum
displacement does not decrease, though in fact less de-interleave displacement does not decrease, though in fact less de-interleave
buffering is required. buffering is required.
                       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs     | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Earliest not yet
   present AU            -  -  2  -  3  3  -  -  7  7  -  - 11 11
Figure 11: The earliest not yet present AU for each AU in the Figure 11: For each AU in the interleaving pattern, the earliest of
interleaving pattern. any earlier AUs not yet present
References
Normative References
[1] ISO/IEC International Standard 14496 (MPEG-4); "Information
technology - Coding of audio-visual objects", January 2000
[2] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
"RTP: A Transport Protocol for Real-Time Applications", RFC
3550, July 2003.
[3] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet
Mail Extensions (MIME) Part Four: Registration Procedures", BCP
13, RFC 2048, November 1996.
[4] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[5] Handley, M. and V. Jacobson, "SDP: Session Description
Protocol", RFC 2327, April 1998.
[6] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
Considerations Section in RFCs", BCP 26, RFC 2434, October 1998.
Informative References
[7] Hoffman, D., Fernando, G., Goyal, V. and M. Civanlar, "RTP
Payload Format for MPEG1/MPEG2 Video", RFC 2250, January 1998.
[8] Schulzrinne, H., Rao, A. and R. Lanphier, "Real Time Streaming
Protocol (RTSP)", RFC 2326, April 1998.
[9] Perkins, C. and O. Hodson, "Options for Repair of Streaming
Media", RFC 2354, June 1998.
[10] Schulzrinne, H. and J. Rosenberg, "An RTP Payload Format for
Generic Forward Error Correction", RFC 2733, December 1999.
[11] Handley, M., Perkins, C. and E. Whelan, "Session Announcement
Protocol", RFC 2974, October 2000.
[12] Kikuchi, Y., Nomura, T., Fukunaga, S., Matsui, Y. and H. Kimata,
"RTP Payload Format for MPEG-4 Audio/Visual Streams", RFC 3016,
November 2000.
Authors' Addresses
Jan van der Meer
Philips Electronics
Prof Holstlaan 4
Building WAH-1
5600 JZ Eindhoven
Netherlands
EMail: jan.vandermeer@philips.com
David Mackie
Apple Computer, Inc.
One Infinite Loop, MS:302-3KS
Cupertino CA 95014
EMail: dmackie@apple.com
Viswanathan Swaminathan
Sun Microsystems Inc.
2600 Casey Avenue
Mountain View, CA 94043
EMail: viswanathan.swaminathan@sun.com
David Singer
Apple Computer, Inc.
One Infinite Loop, MS:302-3MT
Cupertino CA 95014
EMail: singer@apple.com
Philippe Gentric
Philips Electronics
51 rue Carnot
92156 Suresnes
France
EMail: philippe.gentric@philips.com
Full Copyright Statement
Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assignees.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.