draft-ietf-avt-mpeg4-simple-04.txt   draft-ietf-avt-mpeg4-simple-05.txt 
Internet Engineering Task Force J. van der Meer Internet Engineering Task Force J. van der Meer
Internet Draft Philips Electronics Internet Draft Philips Electronics
D. Mackie D. Mackie
Cisco Systems Inc. Apple Computer
V. Swaminathan V. Swaminathan
Sun Microsystems Inc. Sun Microsystems Inc.
D. Singer D. Singer
Apple Computer Apple Computer
P. Gentric P. Gentric
Philips Electronics Philips Electronics
July 2002 December 2002
Expires January 2003 Expires June 2003
Document: draft-ietf-avt-mpeg4-simple-04.txt Document: draft-ietf-avt-mpeg4-simple-05.txt
Transport of MPEG-4 Elementary Streams Transport of MPEG-4 Elementary Streams
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of Drafts. Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in as reference material or to cite them other than as "work in
progress." progress."
skipping to change at page 2, line 7 skipping to change at page 2, line 7
The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in
ISO that produced the MPEG-4 standard. MPEG defines tools to ISO that produced the MPEG-4 standard. MPEG defines tools to
compress content such as audio-visual information into elementary compress content such as audio-visual information into elementary
streams. This specification defines a simple, but generic RTP streams. This specification defines a simple, but generic RTP
payload format for transport of any non-multiplexed MPEG-4 payload format for transport of any non-multiplexed MPEG-4
elementary stream. elementary stream.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Carriage of MPEG-4 elementary streams over RTP . . . . . . . 4 2. Carriage of MPEG-4 elementary streams over RTP . . . . . . . 6
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 4 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 6
2.2. MPEG Access Units . . . . . . . . . . . . . . . . . . . . 4 2.2. MPEG Access Units . . . . . . . . . . . . . . . . . . . . 6
2.3. Concatenation of Access Units . . . . . . . . . . . . . . 4 2.3. Concatenation of Access Units . . . . . . . . . . . . . . 6
2.4. Fragmentation of Access Units . . . . . . . . . . . . . . 5 2.4. Fragmentation of Access Units . . . . . . . . . . . . . . 7
2.5. Interleaving . . . . . . . . . . . . . . . . . . . . . . . 5 2.5. Interleaving . . . . . . . . . . . . . . . . . . . . . . . 7
2.6. Time stamp information . . . . . . . . . . . . . . . . . . 6 2.6. Time stamp information . . . . . . . . . . . . . . . . . . 8
2.7. State indication of MPEG-4 system streams . . . . . . . . 6 2.7. State indication of MPEG-4 system streams . . . . . . . . 8
2.8. Random Access Indication . . . . . . . . . . . . . . . . . 6 2.8. Random Access Indication . . . . . . . . . . . . . . . . . 8
2.9. Carriage of auxiliary information . . . . . . . . . . . . 7 2.9. Carriage of auxiliary information . . . . . . . . . . . . 9
2.10. MIME format parameters and configuring conditional field . 7 2.10. MIME format parameters and configuring conditional field . 9
2.11. Global structure of payload format . . . . . . . . . . . . 7 2.11. Global structure of payload format . . . . . . . . . . . . 9
2.12. Modes to transport MPEG-4 streams . . . . . . . . . . . . 8 2.12. Modes to transport MPEG-4 streams . . . . . . . . . . . . 10
2.13. Alignment with RFC 3016 . . . . . . . . . . . . . . . . . 8 2.13. Alignment with RFC 3016 . . . . . . . . . . . . . . . . . 10
3. Payload format . . . . . . . . . . . . . . . . . . . . . . . 9 3. Payload format . . . . . . . . . . . . . . . . . . . . . . . 11
3.1. Usage of RTP header fields and RTCP . . . . . . . . . . . 9 3.1. Usage of RTP header fields and RTCP . . . . . . . . . . . 11
3.2. RTP payload structure . . . . . . . . . . . . . . . . . . 10 3.2. RTP payload structure . . . . . . . . . . . . . . . . . . 12
3.2.1. The AU Header Section . . . . . . . . . . . . . . . . . 10 3.2.1. The AU Header Section . . . . . . . . . . . . . . . . . 12
3.2.1.1. The AU-header . . . . . . . . . . . . . . . . . . . . 10 3.2.1.1. The AU-header . . . . . . . . . . . . . . . . . . . . 12
3.2.2. The Auxiliary Section . . . . . . . . . . . . . . . . . 12 3.2.2. The Auxiliary Section . . . . . . . . . . . . . . . . . 14
3.2.3. The Access Unit Data Section . . . . . . . . . . . . . . 13 3.2.3. The Access Unit Data Section . . . . . . . . . . . . . . 15
3.2.3.1. Fragmentation . . . . . . . . . . . . . . . . . . . . 14 3.2.3.1. Fragmentation . . . . . . . . . . . . . . . . . . . . 16
3.2.3.2. Interleaving . . . . . . . . . . . . . . . . . . . . . 14 3.2.3.2. Interleaving . . . . . . . . . . . . . . . . . . . . . 16
3.2.3.3. Constraints for interleaving . . . . . . . . . . . . . 15 3.2.3.3. Constraints for interleaving . . . . . . . . . . . . . 17
3.2.3.4. Crucial and non-crucial AUs with MPEG-4 System data . 16 3.3. Usage of this specification . . . . . . . . . . . . . . . 21
3.3. Usage of this specification . . . . . . . . . . . . . . . 17 3.3.1. General . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3.1. General . . . . . . . . . . . . . . . . . . . . . . . . 17 3.3.2. The generic mode . . . . . . . . . . . . . . . . . . . . 21
3.3.2. The generic mode . . . . . . . . . . . . . . . . . . . . 17 3.3.3. Constant bit rate CELP . . . . . . . . . . . . . . . . . 22
3.3.3. Constant bit rate CELP . . . . . . . . . . . . . . . . . 18 3.3.4. Variable bit rate CELP . . . . . . . . . . . . . . . . . 22
3.3.4. Variable bit rate CELP . . . . . . . . . . . . . . . . . 18 3.3.5. Low bit rate AAC . . . . . . . . . . . . . . . . . . . . 23
3.3.5. Low bit rate AAC . . . . . . . . . . . . . . . . . . . . 19 3.3.6. High bit rate AAC . . . . . . . . . . . . . . . . . . . 24
3.3.6. High bit rate AAC . . . . . . . . . . . . . . . . . . . 20 3.3.7. Additional modes . . . . . . . . . . . . . . . . . . . . 25
3.3.7. Additional modes . . . . . . . . . . . . . . . . . . . . 21 4. IANA considerations . . . . . . . . . . . . . . . . . . . . 26
4. IANA considerations . . . . . . . . . . . . . . . . . . . . 22 4.1. MIME type registration . . . . . . . . . . . . . . . . . . 26
4.1. MIME type registration . . . . . . . . . . . . . . . . . . 22 4.2. Registration of mode definitions with IANA . . . . . . . . 31
4.2. Registration of mode definitions with IANA . . . . . . . . 27 4.3. Concatenation of parameters . . . . . . . . . . . . . . . 31
4.3. Concatenation of parameters . . . . . . . . . . . . . . . 27 4.4. Usage of SDP . . . . . . . . . . . . . . . . . . . . . . . 32
4.4. Usage of SDP . . . . . . . . . . . . . . . . . . . . . . . 28 4.4.1. The a=fmtp keyword . . . . . . . . . . . . . . . . . . . 32
4.4.1. The a=fmtp keyword . . . . . . . . . . . . . . . . . . . 28 5. Security considerations . . . . . . . . . . . . . . . . . . 32
5. Security considerations . . . . . . . . . . . . . . . . . . 28 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 33
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 29 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 33
7. References . . . . . . . . . . . . . . . . . . . . . . . . . 29 8. Author addresses . . . . . . . . . . . . . . . . . . . . . . 34
8. Author addresses . . . . . . . . . . . . . . . . . . . . . . 30
APPENDIX: Usage of this payload format . . . . . . . . . . . 31 APPENDIX: Usage of this payload format . . . . . . . . . . . 36
A. Examples of delay analysis with interleave . . . . . . . 31 A. Examples of delay analysis with interleave . . . . . . . 36
A.1 Group interleave . . . . . . . . . . . . . . . . . . . . 31 A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . 36
A.2 Continuous interleave . . . . . . . . . . . . . . . . . 32 A.2 De-interleaving and error concealment . . . . . . . . . 36
A.3 Simple Group interleave . . . . . . . . . . . . . . . . 36
A.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . 36
A.3.2 Determining the de-interleave buffer size . . . . . . 37
A.3.3 Determining the maximum displacement . . . . . . . . . 37
A.4 More subtle group interleave . . . . . . . . . . . . . . 37
A.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . 37
A.4.2 Determining the de-interleave buffer size . . . . . . 38
A.4.3 Determining the maximum displacement . . . . . . . . . 38
A.5 Continuous interleave . . . . . . . . . . . . . . . . . 38
A.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . 38
A.5.2 Determining the de-interleave buffer size . . . . . . 39
A.5.3 Determining the maximum displacement . . . . . . . . . 39
1. Introduction 1. Introduction
The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29 The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4 that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
standards [1]. The MPEG-4 standard specifies compression of standards [1]. The MPEG-4 standard specifies compression of
audio-visual data into for example an audio or video elementary audio-visual data into for example an audio or video elementary
stream. In the MPEG-4 standard, these streams take the form of stream. In the MPEG-4 standard, these streams take the form of
audiovisual objects that may be arranged into an audio-visual scene audio-visual objects that may be arranged into an audio-visual scene
by means of a scene description. Each MPEG-4 elementary stream by means of a scene description. Each MPEG-4 elementary stream
consists of a sequence of Access Units; examples of an Access Unit consists of a sequence of Access Units; examples of an Access Unit
(AU) are an audio frame and a video picture. (AU) are an audio frame and a video picture.
This specification defines a general and configurable payload This specification defines a general and configurable payload
structure to transport MPEG-4 elementary streams, in particular structure to transport MPEG-4 elementary streams, in particular
MPEG-4 audio (including speech) streams, MPEG-4 video streams and MPEG-4 audio (including speech) streams, MPEG-4 video streams and
also MPEG-4 systems streams, such as BIFS (BInary Format for also MPEG-4 systems streams, such as BIFS (BInary Format for
Scenes), OCI (Object Content Information), OD (Object Descriptor) Scenes), OCI (Object Content Information), OD (Object Descriptor)
and IPMP (Intellectual Property Management and Protection) streams. and IPMP (Intellectual Property Management and Protection) streams.
The RTP payload defined in this document is simple to implement and The RTP payload defined in this document is simple to implement and
reasonably efficient. It allows for optional interleaving of Access reasonably efficient. It allows for optional interleaving of Access
Units (such as audio frames) to increase error resiliency in packet Units (such as audio frames) to increase error resiliency in packet
loss. loss.
Some types of MPEG-4 elementary streams include "crucial"
information whose loss cannot be tolerated, but RTP does not provide
reliable transmission so receipt of that crucial information is not
assured. Section 3.2.3.4 specifies how stream state is conveyed so
that the receiver can detect the loss of crucial information and
cease decoding until the next random access point is received.
Applications transmitting streams that include crucial information,
such as OD commands, BIFS commands, or programmatic content such as
MPEG-J (Java) and ECMAScript, should include random access points
sufficiently often, depending upon the probability of loss, to
reduce stream corruption to an acceptable level. An example is the
carousel mechanism as defined by MPEG in ISO/IEC 14496-1.
Such applications may also employ additional protocols or services
to reduce the probability of loss. At the RTP layer, these measures
include payload formats and profiles for retransmission or forward
error correction (such as in RFC 2733), which must be employed with
due consideration to congestion control. Another solution that may
be appropriate for some applications is to carry RTP over TCP (such
as in RFC 2326, section 10.12). At the network layer, resource
allocation or preferential service may be available to reduce the
probability of loss. For a general description of methods to repair
streaming media see RFC 2354.
Though the RTP payload format defined in this document is capable Though the RTP payload format defined in this document is capable
of transporting any MPEG-4 stream, other, more specific, formats of transporting any MPEG-4 stream, other, more specific, formats
may exist, such as RFC 3016 for transport of MPEG-4 video (part 2). may exist, such as RFC 3016 for transport of MPEG-4 video (part 2).
Configuration of the payload is provided to accommodate transport Configuration of the payload is provided to accommodate transport
of any MPEG-4 stream at any possible bit rate. However, for a of any MPEG-4 stream at any possible bit rate. However, for a
specific MPEG-4 elementary stream typically only very few specific MPEG-4 elementary stream typically only very few
configurations are needed. So as to allow for the design of configurations are needed. So as to allow for the design of
simplified, but dedicated receivers, this specification requires simplified, but dedicated receivers, this specification requires
that specific modes are defined for transport of MPEG-4 streams. that specific modes are defined for transport of MPEG-4 streams.
This document defines modes for MPEG-4 CELP and AAC streams, as This document defines modes for MPEG-4 CELP and AAC streams, as
well as a generic mode that can be used to transport any MPEG-4 well as a generic mode that can be used to transport any MPEG-4
stream. In the future new RFCs are expected to specify additional stream. In the future new RFCs are expected to specify additional
modes for transport of MPEG-4 streams. modes for transport of MPEG-4 streams.
The RTP payload format defined in this document specifies carriage The RTP payload format defined in this document specifies carriage
of system-related information that is often equivalent to the of system-related information that is often equivalent to the
information that may be contained in the MPEG-4 SL. This information that may be contained in the MPEG-4 Sync Layer (SL) as
document does not prescribe how to transcode or map information defined in MPEG-4 Systems [1]. This document does not prescribe how
from the SL to fields defined in the RTP payload format. Such to transcode or map information from the SL to fields defined in
processing, if any, is left to the discretion of the application. the RTP payload format. Such processing, if any, is left to the
However, to anticipate the need for transport of any additional discretion of the application. However, to anticipate the need for
system-related information in future, an auxiliary field can be transport of any additional system-related information in future,
configured that may carry any such data. an auxiliary field can be configured that may carry any such data.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC 2119 [3]. this document are to be interpreted as described in RFC 2119 [3].
2. Carriage of MPEG-4 elementary streams over RTP 2. Carriage of MPEG-4 elementary streams over RTP
2.1 Introduction 2.1 Introduction
With this payload format a single MPEG-4 elementary stream can be With this payload format a single MPEG-4 elementary stream can be
skipping to change at page 4, line 27 skipping to change at page 6, line 27
elementary stream as well as the applied configuration, so as to elementary stream as well as the applied configuration, so as to
avoid the need in receivers to parse all MIME format parameters. avoid the need in receivers to parse all MIME format parameters.
The applied mode MUST be signaled. The applied mode MUST be signaled.
2.2 MPEG Access Units 2.2 MPEG Access Units
For carriage of compressed audio-visual data MPEG defines Access For carriage of compressed audio-visual data MPEG defines Access
Units. An MPEG Access Unit (AU) is the smallest data entity to Units. An MPEG Access Unit (AU) is the smallest data entity to
which timing information is attributed. In case of audio an Access which timing information is attributed. In case of audio an Access
Unit may represent an audio frame and in case of video a picture. Unit may represent an audio frame and in case of video a picture.
MPEG Access Units are by definition octet aligned. If for example MPEG Access Units are by definition octet-aligned. If for example
an audio frame is not octet aligned, up to 7 zero-padding bits MUST an audio frame is not octet-aligned, up to 7 zero-padding bits MUST
be inserted at the end of the frame to achieve the octet-aligned be inserted at the end of the frame to achieve the octet-aligned
Access Units, as required by the MPEG-4 specification. MPEG-4 Access Units, as required by the MPEG-4 specification. MPEG-4
decoders MUST be able to decode AUs in which such padding is decoders MUST be able to decode AUs in which such padding is
applied. applied.
Consistent with the MPEG-4 specification, this document requires Consistent with the MPEG-4 specification, this document requires
that each MPEG-4 part 2 video Access Unit includes all the coded that each MPEG-4 part 2 video Access Unit includes all the coded
data of a picture, any video stream headers that may precede the data of a picture, any video stream headers that may precede the
coded picture data, and any video stream stuffing that may follow coded picture data, and any video stream stuffing that may follow
it, up to, but not including the startcode indicating the start of it, up to, but not including the startcode indicating the start of
skipping to change at page 5, line 6 skipping to change at page 7, line 4
1500 octet MTU this would allow on average 7 complete AAC frames to 1500 octet MTU this would allow on average 7 complete AAC frames to
be carried per AAC packet. be carried per AAC packet.
Access Units may have a fixed size in octets, but a variable size Access Units may have a fixed size in octets, but a variable size
is also possible. To facilitate parsing in case of multiple is also possible. To facilitate parsing in case of multiple
concatenated AUs in one RTP packet, the size of each AU is made concatenated AUs in one RTP packet, the size of each AU is made
known to the receiver. When concatenating in case of a constant AU known to the receiver. When concatenating in case of a constant AU
size, this size is communicated "out of band" through a MIME format size, this size is communicated "out of band" through a MIME format
parameter. When concatenating in case of variable size AUs, the RTP parameter. When concatenating in case of variable size AUs, the RTP
payload carries "in band" an AU size field for each contained AU. payload carries "in band" an AU size field for each contained AU.
In combination with the RTP payload length the size information In combination with the RTP payload length the size information
allows the RTP payload to be split by the receiver back into the allows the RTP payload to be split by the receiver back into the
individual AUs. individual AUs.
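
The splitting step described above can be sketched as follows. This is a minimal illustration, not the payload parser defined by this specification: it assumes the AU sizes have already been obtained (either from in-band size fields or from an out-of-band constant size), and the function names are hypothetical.

```python
from typing import List


def constant_sizes(au_data_len: int, constant_size: int) -> List[int]:
    """Derive per-AU sizes when a constant AU size was signaled out of band."""
    if constant_size <= 0 or au_data_len % constant_size != 0:
        # The number of AUs in a packet must be integral.
        raise ValueError("AU data length is not a multiple of the AU size")
    return [constant_size] * (au_data_len // constant_size)


def split_access_units(au_data: bytes, au_sizes: List[int]) -> List[bytes]:
    """Split the AU data of one RTP payload back into individual AUs."""
    if sum(au_sizes) != len(au_data):
        raise ValueError("AU sizes do not add up to the AU data length")
    aus = []
    offset = 0
    for size in au_sizes:
        aus.append(au_data[offset:offset + size])
        offset += size
    return aus
```

For variable-size AUs the caller passes the sizes read in band; for constant-size AUs it passes the result of constant_sizes(). In both cases the payload length acts as a consistency check.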
To simplify the implementation of RTP receivers, it is required To simplify the implementation of RTP receivers, it is required
that when multiple AUs are carried in an RTP packet, each AU MUST that when multiple AUs are carried in an RTP packet, each AU MUST
be complete, i.e. the number of AUs in an RTP packet MUST be be complete, i.e. the number of AUs in an RTP packet MUST be
integral. integral. In addition, an AU MUST NOT be repeated in other RTP
packets; hence repetition of an AU is only possible by using a
duplicate RTP packet.
2.4 Fragmentation of Access Units 2.4 Fragmentation of Access Units
MPEG allows for very large Access Units. Since most IP networks MPEG allows for very large Access Units. Since most IP networks
have significantly smaller MTU sizes, this payload format allows have significantly smaller MTU sizes, this payload format allows
for the fragmentation of an Access Unit over multiple RTP packets for the fragmentation of an Access Unit over multiple RTP packets
so as to avoid IP layer fragmentation. To simplify the so as to avoid IP layer fragmentation. To simplify the
implementation of RTP receivers, an RTP packet SHALL either carry implementation of RTP receivers, an RTP packet SHALL either carry
one or more complete Access Units or a single fragment of one one or more complete Access Units or a single fragment of one
Access Unit (i.e. packets MUST NOT contain fragments of multiple Access Unit (i.e. packets MUST NOT contain fragments of multiple
skipping to change at page 5, line 37 skipping to change at page 7, line 38
When an RTP packet carries a contiguous sequence of Access Units, When an RTP packet carries a contiguous sequence of Access Units,
the loss of such a packet can result in a "decoding gap" for the the loss of such a packet can result in a "decoding gap" for the
user. One method to alleviate this problem is to allow for the user. One method to alleviate this problem is to allow for the
Access Units to be interleaved in the RTP packets. For a modest Access Units to be interleaved in the RTP packets. For a modest
cost in latency and implementation complexity, significant error cost in latency and implementation complexity, significant error
resiliency to packet loss can be achieved. resiliency to packet loss can be achieved.
To support optional interleaving of Access Units, this payload To support optional interleaving of Access Units, this payload
format allows for index information to be sent for each Access Unit. format allows for index information to be sent for each Access Unit.
The RTP sender is free to choose the interleaving pattern without After informing receivers about buffer resources to allocate for
propagating this information to the receiver(s). Indeed the sender de-interleaving, the RTP sender is free to choose the interleaving
could dynamically adjust the interleaving pattern based on the pattern without propagating this information a priori to the
Access Unit size, error rates, etc. The RTP receiver does not need receiver(s). Indeed the sender could dynamically adjust the
to know the interleaving pattern used, it only needs to extract the interleaving pattern based on the Access Unit size, error rates,
index information of the Access Unit and insert the Access Unit etc. The RTP receiver does not need to know the interleaving
into the appropriate sequence in the rendering queue. An example of pattern used, it only needs to extract the index information of the
Access Unit and insert the Access Unit into the appropriate
sequence in the decoding or rendering queue. An example of
interleaving is given below. interleaving is given below.
Assume that an RTP packet contains 3 AUs, and that the AUs are Assume that an RTP packet contains 3 AUs, and that the AUs are
numbered 1, 2, 3, 4, etc. If an interleaving group length of 9 is numbered 0, 1, 2, 3, 4, etc. If an interleaving group length of 9 is
chosen, then RTP packet(i) contains the following AU(n): chosen, then RTP packet(i) contains the following AU(n):
RTP packet(0): AU(0), AU(3), AU(6)
RTP packet(1): AU(1), AU(4), AU(7) RTP packet(1): AU(1), AU(4), AU(7)
RTP packet(2): AU(2), AU(5), AU(8) RTP packet(2): AU(2), AU(5), AU(8)
RTP packet(3): AU(3), AU(6), AU(9) RTP packet(3): AU(9), AU(12), AU(15)
RTP packet(4): AU(10), AU(13), AU(16) RTP packet(4): AU(10), AU(13), AU(16)
RTP packet(5): AU(11), AU(14), AU(17)
Etc. Etc.
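
The revised (-05) pattern above can be reproduced with a short sketch. It assumes a simple group interleave in which the group length is a multiple of the number of AUs per packet; the function names are illustrative only. The sender is free to use other patterns.

```python
from typing import List


def interleaved_packets(num_packets: int,
                        aus_per_packet: int = 3,
                        group_length: int = 9) -> List[List[int]]:
    """Return the AU indices carried by each RTP packet, in sending order."""
    if group_length % aus_per_packet != 0:
        raise ValueError("group length must be a multiple of AUs per packet")
    packets_per_group = group_length // aus_per_packet
    packets = []
    for i in range(num_packets):
        group, position = divmod(i, packets_per_group)
        first = group * group_length + position
        # Consecutive AUs in one packet are packets_per_group indices apart.
        packets.append([first + k * packets_per_group
                        for k in range(aus_per_packet)])
    return packets


def deinterleave(packets: List[List[int]]) -> List[int]:
    """Receiver side: restore AU order using only the per-AU index."""
    return sorted(au for packet in packets for au in packet)
```

Note that the receiver function never needs the interleaving parameters; it relies only on the index of each AU, as stated above.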
2.6 Time stamp information 2.6 Time stamp information
The RTP time stamp MUST carry the sampling instance of the first AU The RTP time stamp MUST carry the sampling instant of the first AU
(fragment) in the RTP packet. When multiple AUs are carried within (fragment) in the RTP packet. When multiple AUs are carried within
an RTP packet, the time stamps of subsequent AUs can be calculated an RTP packet, the time stamps of subsequent AUs can be calculated
if the frame period of each AU is known. For audio and video this if the frame period of each AU is known. For audio and video this
is possible if the frame rate is constant. However, in some cases is possible if the frame rate is constant. However, in some cases
it is not possible to make such calculation, for example for it is not possible to make such calculation, for example for
variable frame rate video and for MPEG-4 BIFS streams carrying variable frame rate video and for MPEG-4 BIFS streams carrying
composition information. To support such cases, this payload format composition information. To support such cases, this payload format
can be configured to carry a time stamp in the RTP payload for each can be configured to carry a time stamp in the RTP payload for each
contained Access Unit. A time stamp MAY be conveyed in the RTP contained Access Unit. A time stamp MAY be conveyed in the RTP
payload only for non-first AUs in the RTP packet, and SHALL NOT be payload only for non-first AUs in the RTP packet, and SHALL NOT be
conveyed for the first AU (fragment), as the time stamp for the conveyed for the first AU (fragment), as the time stamp for the
first AU in the RTP packet is carried by the RTP time stamp. first AU in the RTP packet is carried by the RTP time stamp.
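
For the constant frame rate case, the calculation of time stamps for subsequent AUs might look like the sketch below. It assumes the AUs in the packet are consecutive (no interleaving), that the AU duration is known in RTP clock ticks, and that RTP time stamps are 32 bits wide and wrap around; the function name is hypothetical.

```python
from typing import List

RTP_TS_MODULUS = 2 ** 32  # RTP time stamps are 32-bit values


def au_timestamps(rtp_timestamp: int, num_aus: int,
                  au_duration: int) -> List[int]:
    """Time stamps of all AUs in a packet, given a constant AU duration."""
    return [(rtp_timestamp + n * au_duration) % RTP_TS_MODULUS
            for n in range(num_aus)]
```

As an example under the assumption of audio frames of 1024 samples and an RTP clock equal to the sampling rate, au_timestamps(1000, 3, 1024) yields 1000, 2024 and 3048.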
MPEG-4 defines two types of time stamps, the composition time stamp MPEG-4 defines two types of time stamps, the composition time stamp
(CTS) and the decoding time stamp (DTS). The CTS represents the (CTS) and the decoding time stamp (DTS). The CTS represents the
sampling instance of an AU, and hence the CTS is equivalent to the sampling instant of an AU, and hence the CTS is equivalent to the
RTP time stamp. The DTS may be used only in MPEG-4 video streams RTP time stamp. The DTS may be used in MPEG-4 video streams that
that use bi-directional coding, i.e. when pictures are predicted in use bi-directional coding, i.e. when pictures are predicted in both
both forward and backward direction by using either a reference forward and backward direction by using either a reference picture
picture in the past, or a reference picture in the future. The DTS in the past, or a reference picture in the future. The DTS cannot
cannot be carried in the RTP header. In some cases the DTS can be be carried in the RTP header. In some cases the DTS can be derived
derived from the RTP time stamp using frame rate information; this from the RTP time stamp using frame rate information; this requires
requires deep parsing in the video stream, which may be considered deep parsing in the video stream, which may be considered
objectionable. But if the video frame rate is variable, the required objectionable. But if the video frame rate is variable, the required
information may not even be present in the video stream. For both information may not even be present in the video stream. For both
reasons, the capability has been defined to optionally carry the reasons, the capability has been defined to optionally carry the
DTS in the RTP payload for each contained Access Unit. DTS in the RTP payload for each contained Access Unit.
Since RTP time stamps may be re-stamped by RTP devices, each time To keep the coding of time stamps efficient, each time stamp
stamp contained in the RTP payload is coded differentially, the CTS contained in the RTP payload is coded differentially, the CTS from
from the RTP time stamp, and the DTS from the CTS, so as to avoid the RTP time stamp, and the DTS from the CTS.
extensive parsing by re-stamping devices.
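
The differential coding can be illustrated with the sketch below. It is an assumption-laden example: both deltas are treated as two's complement offsets that are added to their reference value, and the field widths used are arbitrary. The actual field widths and sign conventions are those defined for the AU-header, which is not reproduced here.

```python
RTP_TS_MODULUS = 2 ** 32  # RTP time stamps are 32-bit values


def to_signed(value: int, bits: int) -> int:
    """Interpret an unsigned field of the given width as two's complement."""
    if bits <= 0 or not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given field width")
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def cts_from_delta(rtp_timestamp: int, cts_delta: int) -> int:
    """Recover the CTS of an AU from the RTP time stamp (assumed additive)."""
    return (rtp_timestamp + cts_delta) % RTP_TS_MODULUS


def dts_from_delta(cts: int, dts_delta: int) -> int:
    """Recover the DTS of an AU from its CTS (assumed additive, signed)."""
    return (cts + dts_delta) % RTP_TS_MODULUS
```

Because only small offsets are carried per AU, a short field suffices where a full 32-bit time stamp would otherwise be needed.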
2.7 State indication of MPEG-4 system streams 2.7 State indication of MPEG-4 system streams
ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to
convey state information when transporting MPEG-4 system streams, convey state information when transporting MPEG-4 system streams,
this payload format allows for the optional carriage in the RTP this payload format allows for the optional carriage in the RTP
payload of the stream state for each contained Access Unit. Stream payload of the stream state for each contained Access Unit. Stream
states are used to signal "crucial" AUs that carry information whose states are used to signal "crucial" AUs that carry information whose
loss cannot be tolerated and are also useful when repeating AUs loss cannot be tolerated and are also useful when repeating AUs
according to the carousel mechanism defined in ISO/IEC 14496-1. according to the carousel mechanism defined in ISO/IEC 14496-1.
skipping to change at page 7, line 13 skipping to change at page 9, line 14
optionally be carried in the RTP payload for each contained Access optionally be carried in the RTP payload for each contained Access
Unit. Carriage of random access points is particularly useful for Unit. Carriage of random access points is particularly useful for
MPEG-4 system streams in combination with the stream state. MPEG-4 system streams in combination with the stream state.
2.9 Carriage of auxiliary information. 2.9 Carriage of auxiliary information.
This payload format defines a specific field to carry auxiliary This payload format defines a specific field to carry auxiliary
data. The auxiliary data field is preceded by a field that specifies data. The auxiliary data field is preceded by a field that specifies
the length of the auxiliary data, so as to facilitate skipping of the length of the auxiliary data, so as to facilitate skipping of
the data without parsing it. The coding of the auxiliary data is not the data without parsing it. The coding of the auxiliary data is not
defined in this document, but is left to the discretion of defined in this document; instead the format, meaning and signaling
applications. Receivers that have knowledge of the auxiliary data of auxiliary information is expected to be specified in one or more
future RFCs. Auxiliary information MUST NOT be transmitted until its
format, meaning and signaling have been specified and its use has
been signaled. Receivers that have knowledge of the auxiliary data
MAY decode the auxiliary data, but receivers without knowledge of MAY decode the auxiliary data, but receivers without knowledge of
such data MUST skip the auxiliary data field. such data MUST skip the auxiliary data field.
2.10 MIME format parameters and configuring conditional fields 2.10 MIME format parameters and configuring conditional fields
To support the features described in the previous sections several To support the features described in the previous sections several
fields are defined for carriage in the RTP payload. However, their fields are defined for carriage in the RTP payload. However, their
use strongly depends on the type of MPEG-4 elementary stream that use strongly depends on the type of MPEG-4 elementary stream that
is carried. Sometimes a specific field is needed with a certain is carried. Sometimes a specific field is needed with a certain
length, while in other cases such field is not needed at all. To be length, while in other cases such field is not needed at all. To be
efficient in either case, the fields to support these features are efficient in either case, the fields to support these features are
configurable by means of MIME format parameters. In general, a MIME configurable by means of MIME format parameters. In general, a MIME
format parameter defines the presence and length of the associated format parameter defines the presence and length of the associated
field. A length of zero indicates absence of the field. As a field. A length of zero indicates absence of the field. As a
consequence, parsing of the payload requires knowledge of MIME consequence, parsing of the payload requires knowledge of MIME
format parameters. The MIME format parameters are conveyed to the format parameters. The MIME format parameters are conveyed to the
receiver via SDP [6] messages or through other means. receiver via SDP [6] messages, as specified in section 4.4.1, or
through other means.
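As an illustration of how configured field lengths drive parsing, the following Python sketch reads consecutive bit fields whose lengths come from an out-of-band configuration, treating a length of zero as "field absent". The field names ("size", "index", "unused") and the helper names are hypothetical and are not defined by this specification; the sketch also assumes fields are unsigned and packed most significant bit first.

```python
class BitReader:
    """Reads unsigned bit fields, most significant bit first (assumption)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current read position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_fields(data: bytes, field_lengths: dict) -> dict:
    """Parse fields in configuration order; a length of 0 means absent."""
    reader = BitReader(data)
    fields = {}
    for name, nbits in field_lengths.items():
        if nbits > 0:
            fields[name] = reader.read(nbits)
    return fields


# Hypothetical configuration: 13-bit "size", 3-bit "index", absent "unused".
example = parse_fields(bytes([0x01, 0x2B]),
                       {"size": 13, "index": 3, "unused": 0})
```

Without the configured lengths the same two octets could not be split into fields, which is why a receiver needs the MIME format parameters before it can parse the payload.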
2.11 Global structure of payload format 2.11 Global structure of payload format
The RTP payload following the RTP header, contains three octet The RTP payload following the RTP header, contains three
aligned data sections, of which the first two MAY be empty. See octet-aligned data sections, of which the first two MAY be empty.
figure 1. See figure 1.
+---------+-----------+-----------+---------------+ +---------+-----------+-----------+---------------+
| RTP | AU Header | Auxiliary | Access Unit | | RTP | AU Header | Auxiliary | Access Unit |
| Header | Section | Section | Data Section | | Header | Section | Section | Data Section |
+---------+-----------+-----------+---------------+ +---------+-----------+-----------+---------------+
<----------RTP Packet Payload-----------> <----------RTP Packet Payload----------->
Figure 1: Data sections within an RTP packet Figure 1: Data sections within an RTP packet
skipping to change at page 8, line 25 skipping to change at page 10, line 30
The applied mode MUST be signaled. Signaling the mode is The applied mode MUST be signaled. Signaling the mode is
particularly important for receivers that are only capable of particularly important for receivers that are only capable of
decoding one or more specific modes. Such receivers need to decoding one or more specific modes. Such receivers need to
determine whether the applied mode is supported, so as to avoid determine whether the applied mode is supported, so as to avoid
problems with processing of payloads that are beyond the problems with processing of payloads that are beyond the
capabilities of the receiver. capabilities of the receiver.
In this document several modes are defined for transport of MPEG-4 In this document several modes are defined for transport of MPEG-4
CELP and AAC streams, as well as a generic mode that can be used CELP and AAC streams, as well as a generic mode that can be used
for any MPEG-4 stream. In future, new RFCs are expected to specify for any MPEG-4 stream. In the future, new RFCs may specify other
additional modes of using this specification. New modes can be modes of using this specification. However, each mode MUST be in
defined as deemed appropriate, typically by specifications that are full compliance with this specification (see section 3.3.7).
hierarchically higher than this payload format. However, each mode
MUST be in full compliance with this specification.
2.13 Alignment with RFC 3016 2.13 Alignment with RFC 3016
This payload can be configured to be nearly identical to the This payload can be configured to be nearly identical to the
payload format defined in RFC 3016 [5] for the MPEG-4 video payload format defined in RFC 3016 [5] for the MPEG-4 video
configurations recommended in RFC 3016. Hence, receivers that configurations recommended in RFC 3016. Hence, receivers that
comply with RFC 3016 can decode such RTP payload, providing that comply with RFC 3016 can decode such RTP payload, providing that
additional packets containing video decoder configuration (VO, additional packets containing video decoder configuration (VO,
VOL, VOSH) are inserted in the stream, as required by RFC 3016. VOL, VOSH) are inserted in the stream, as required by RFC 3016.
Conversely, receivers that comply with the specification in this Conversely, receivers that comply with the specification in this
document SHOULD be able to decode payloads, names and parameters document should be able to decode payloads, names and parameters
defined for MPEG-4 video in RFC 3016. In this respect it is defined for MPEG-4 video in RFC 3016. In this respect it is
strongly recommended to implement the ability to ignore "in band" strongly RECOMMENDED to implement the ability to ignore "in band"
video decoder configuration packets in the RFC 3016 payload. video decoder configuration packets in the RFC 3016 payload.
Note the "out of band" availability of the video decoder Note the "out of band" availability of the video decoder
configuration is optional in RFC 3016. To achieve maximum configuration is optional in RFC 3016. To achieve maximum
interoperability with the RTP payload format defined in this interoperability with the RTP payload format defined in this
document, applications that use RFC 3016 to transport MPEG-4 video document, applications that use RFC 3016 to transport MPEG-4 video
(part 2) are RECOMMENDED to make the video decoder configuration (part 2) are recommended to make the video decoder configuration
available as a MIME parameter. available as a MIME parameter.
3. Payload Format 3. Payload Format
3.1 Usage of RTP Header Fields and RTCP 3.1 Usage of RTP Header Fields and RTCP
Payload Type (PT): The assignment of an RTP payload type for this Payload Type (PT): The assignment of an RTP payload type for this
RTP packet format is outside the scope of this document, and will packet format is outside the scope of this document; it is
not be specified here. It is expected that the RTP profile for a specified by the RTP profile under which this payload format is
particular class of applications will assign a payload type for used.
this encoding, or if that is not done, then a payload type in the
dynamic range shall be chosen.
Marker (M) bit: The M bit is set to 1 to indicate that the RTP Marker (M) bit: The M bit is set to 1 to indicate that the RTP
packet payload includes the end of each Access Unit of which data packet payload contains either the final fragment of a fragmented
is contained in this RTP packet. As the payload either carries one Access Unit or one or more complete Access Units.
or more complete Access Units or a single fragment of an Access
Unit, the M bit is usually set to 1, except when the packet carries
a single fragment of an Access Unit that is not the last one.
Extension (X) bit: Defined by the RTP profile used. Extension (X) bit: Defined by the RTP profile used.
Sequence Number: The RTP sequence number SHOULD be generated by the Sequence Number: The RTP sequence number SHOULD be generated by the
sender in the usual manner with a constant random offset. sender in the usual manner with a constant random offset.
Timestamp: Indicates the sampling instance of the first AU Timestamp: Indicates the sampling instant of the first AU
contained in the RTP payload. This sampling instance is equivalent contained in the RTP payload. This sampling instant is equivalent
to the CTS in the MPEG-4 time domain. When using SDP the clock rate to the CTS in the MPEG-4 time domain. When using SDP the clock rate
of the RTP time stamp MUST be expressed using the "rtpmap" of the RTP time stamp MUST be expressed using the "rtpmap"
attribute. If an MPEG-4 audio stream is transported, the rate SHOULD attribute. If an MPEG-4 audio stream is transported, the rate SHOULD
be set to the same value as the sampling rate of the audio stream. be set to the same value as the sampling rate of the audio stream.
If an MPEG-4 video stream is transported, it is RECOMMENDED to set If an MPEG-4 video stream is transported, it is RECOMMENDED to set
the rate to 90 kHz. the rate to 90 kHz.
In all cases, the sender SHALL make sure that RTP time stamps In all cases, the sender SHALL make sure that RTP time stamps
are identical only if the RTP time stamp refers to fragments of the are identical only if the RTP time stamp refers to fragments of the
same Access Unit. same Access Unit.
skipping to change at page 9, line 57 skipping to change at page 11, line 52
knows the original time stamp relationships. Synchronization in such knows the original time stamp relationships. Synchronization in such
cases, may require to provide the correct relationship between time cases, may require to provide the correct relationship between time
stamps for obtaining synchronization by out of band means. The stamps for obtaining synchronization by out of band means. The
format of such information as well as methods to convey such format of such information as well as methods to convey such
information are beyond the scope of this specification. information are beyond the scope of this specification.
SSRC: set as described in RFC 1889 [2]. SSRC: set as described in RFC 1889 [2].
CC and CSRC fields are used as described in RFC 1889 [2]. CC and CSRC fields are used as described in RFC 1889 [2].
RTCP SHOULD be used as defined in RFC 1889 [2]. RTCP SHOULD be used as defined in RFC 1889 [2]. Note that time
stamps in RTCP Sender Reports may be used to synchronize multiple
MPEG-4 elementary streams and also to synchronize MPEG-4 streams
with non-MPEG-4 streams, in case the delivery of these streams uses
RTP.
3.2 RTP Payload Structure 3.2 RTP Payload Structure
3.2.1 The AU Header Section 3.2.1 The AU Header Section
When present, the AU Header Section consists of the AU-header-length When present, the AU Header Section consists of the
field, followed by a number of AU-headers. See figure 2. AU-headers-length field, followed by a number of AU-headers. See
figure 2.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
|AU-headers-length|AU-header|AU-header| |AU-header|padding| |AU-headers-length|AU-header|AU-header| |AU-header|padding|
| | (1) | (2) | | (n) | bits | | | (1) | (2) | | (n) | bits |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
Figure 2: The AU Header Section Figure 2: The AU Header Section
The AU-headers are configured using MIME format parameters and MAY The AU-headers are configured using MIME format parameters and MAY
be empty. If the AU-header is configured empty, the be empty. If the AU-header is configured empty, the
skipping to change at page 11, line 32 skipping to change at page 13, line 32
Figure 3: The fields in the AU-header. If used, the AU-Index field Figure 3: The fields in the AU-header. If used, the AU-Index field
only occurs in the first AU-header within an AU Header only occurs in the first AU-header within an AU Header
Section; in any other AU-header the AU-Index-delta field Section; in any other AU-header the AU-Index-delta field
occurs instead. occurs instead.
AU-size: Indicates the size in octets of the associated Access Unit AU-size: Indicates the size in octets of the associated Access Unit
in the Access Unit Data Section in the same RTP packet. When in the Access Unit Data Section in the same RTP packet. When
the AU-size is associated with an AU fragment, the AU size the AU-size is associated with an AU fragment, the AU size
indicates the size of the entire AU and not the size of the indicates the size of the entire AU and not the size of the
fragment. This can be exploited to determine whether a packet fragment. In this case, the size of the fragment is known
contains an entire AU or a fragment, which is particularly from the size of the AU data section. This can be exploited
useful after losing a packet carrying the last fragment of an to determine whether a packet contains an entire AU or a
AU. fragment, which is particularly useful after losing a packet
carrying the last fragment of an AU.
AU-Index: Indicates the serial number of the associated Access Unit AU-Index: Indicates the serial number of the associated Access Unit
(fragment). For each (in decoding order) consecutive AU or AU (fragment). For each (in decoding order) consecutive AU or AU
fragment, the serial number is incremented with 1. When fragment, the serial number is incremented with 1. When
present, the AU-Index field occurs in the first AU-header in present, the AU-Index field occurs in the first AU-header in
the AU Header Section, but MUST NOT occur in any subsequent the AU Header Section, but MUST NOT occur in any subsequent
(non-first) AU-header in that Section. To encode the serial (non-first) AU-header in that Section. To encode the serial
number in any such non-first AU-header, the AU-Index-delta number in any such non-first AU-header, the AU-Index-delta
field is used. If each AU-Index field is coded with the value field is used. If each AU-Index field is coded with the value
0, the serial number of the AU (fragment) is not specified, 0, the serial number of the AU (fragment) is not specified,
and in that case receivers MAY ignore the AU-Index field. and in that case receivers may ignore the AU-Index field.
AU-Index-delta: The AU-Index-delta field is an unsigned integer AU-Index-delta: The AU-Index-delta field is an unsigned integer
that specifies the serial number of the associated AU as the that specifies the serial number of the associated AU as the
difference with respect to the serial number of the previous difference with respect to the serial number of the previous
Access Unit. Hence, for the n-th (n>1) AU the serial number Access Unit. Hence, for the n-th (n>1) AU the serial number
is found from: is found from:
AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1 AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
If the AU-Index field is present in the first AU-header in If the AU-Index field is present in the first AU-header in
the AU Header Section, then the AU-Index-delta field MUST be the AU Header Section, then the AU-Index-delta field MUST be
present in any subsequent (non-first) AU-header. When the present in any subsequent (non-first) AU-header. When the
skipping to change at page 12, line 30 skipping to change at page 14, line 31
CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's
complement offset (delta) from the time stamp in the RTP complement offset (delta) from the time stamp in the RTP
header of this RTP packet. The CTS MUST use the same clock header of this RTP packet. The CTS MUST use the same clock
rate as the time stamp in the RTP header. rate as the time stamp in the RTP header.
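A minimal Python sketch of recovering the CTS from a CTS-delta field is given below. The function names are illustrative, the field width is whatever the session configuration signals, and the modulo-2^32 wrap of the result is an assumption based on the 32-bit RTP time stamp.

```python
def decode_twos_complement(raw: int, nbits: int) -> int:
    """Interpret an nbits-wide unsigned field value as a signed offset."""
    if nbits <= 0:
        raise ValueError("field width must be positive")
    if raw >= 1 << (nbits - 1):
        return raw - (1 << nbits)
    return raw


def cts_from_delta(rtp_timestamp: int, raw_delta: int, nbits: int) -> int:
    """Add the signed delta to the RTP time stamp (32-bit wrap assumed)."""
    delta = decode_twos_complement(raw_delta, nbits)
    return (rtp_timestamp + delta) % (1 << 32)
```

For example, a 10-bit field holding 0x3FF decodes to -1, so with an RTP time stamp of 90000 the CTS is 89999 in the same clock units.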
DTS-flag: Indicates whether the DTS-delta field is present. A value DTS-flag: Indicates whether the DTS-delta field is present. A value
of 1 indicates that DTS-delta is present, a value of 0 that of 1 indicates that DTS-delta is present, a value of 0 that
it is not present. it is not present.
The DTS-flag field MUST be present in each AU-header if the The DTS-flag field MUST be present in each AU-header if the
length of the DTS-delta field is signaled to be larger than length of the DTS-delta field is signaled to be larger than
zero. The DTS-flag field SHOULD be 0 for any non-first zero. The DTS-flag field MUST have the same value for all
fragment of an Access Unit. fragments of an Access Unit.
DTS-delta: Specifies the value of the DTS as a 2's complement DTS-delta: Specifies the value of the DTS as a 2's complement
offset (delta) from the CTS. The DTS MUST use the offset (delta) from the CTS. The DTS MUST use the
same clock rate as the time stamp in the RTP header. same clock rate as the time stamp in the RTP header. The
DTS-delta field MUST have the same value for all fragments of
an Access Unit.
RAP-flag: Indicates when set to 1 that the associated Access Unit RAP-flag: Indicates when set to 1 that the associated Access Unit
provides a random access point to the content of the stream. provides a random access point to the content of the stream.
If an Access Unit is fragmented, the RAP flag, if present, If an Access Unit is fragmented, the RAP flag, if present,
MUST be set to 0 for each non-first fragment of the AU. MUST be set to 0 for each non-first fragment of the AU.
Stream-state: Specifies the state of the stream for an AU of an Stream-state: Specifies the state of the stream for an AU of an
MPEG-4 system stream; each state is identified by a value of MPEG-4 system stream; each state is identified by a value of
a modulo counter. In ISO/IEC 14496-1, MPEG-4 system streams a modulo counter. In ISO/IEC 14496-1, MPEG-4 system streams
use the AU_SequenceNumber to signal stream states. When the use the AU_SequenceNumber to signal stream states. When the
skipping to change at page 13, line 34 skipping to change at page 15, line 40
auxiliary-data: the auxiliary-data field contains data of a format auxiliary-data: the auxiliary-data field contains data of a format
not defined by this specification. not defined by this specification.
3.2.3 The Access Unit Data Section 3.2.3 The Access Unit Data Section
The Access Unit Data Section contains an integer number of complete The Access Unit Data Section contains an integer number of complete
Access Units or a single fragment of one AU. The Access Unit Data Access Units or a single fragment of one AU. The Access Unit Data
Section is never empty. If data of more than one Access Unit is Section is never empty. If data of more than one Access Unit is
present, then the AUs are concatenated into a contiguous string present, then the AUs are concatenated into a contiguous string
of octets. See figure 5. The AUs inside the Access Unit Data of octets. See figure 5. The AUs inside the Access Unit Data
Section MUST be in decoding order. Section MUST be in decoding order, though not necessarily contiguous
in the case of interleaving.
The size and number of Access Units SHOULD be adjusted such that The size and number of Access Units SHOULD be adjusted such that
the resulting RTP packet is not larger than the path MTU. To handle the resulting RTP packet is not larger than the path MTU. To handle
larger packets, this payload format relies on lower layers for larger packets, this payload format relies on lower layers for
fragmentation, which may not be desirable. fragmentation, which may result in reduced performance.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|AU(1) | |AU(1) |
+ | + |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |AU(2) | | |AU(2) |
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | AU(n) | | | AU(n) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|AU(n) continued| |AU(n) continued|
|-+-+-+-+-+-+-+-+ |-+-+-+-+-+-+-+-+
Figure 5: Access Unit Data Section; each AU is octet aligned. Figure 5: Access Unit Data Section; each AU is octet-aligned.
When multiple Access Units are carried, the size of each AU MUST be When multiple Access Units are carried, the size of each AU MUST be
made available to the receiver. If the AU size is variable then the made available to the receiver. If the AU size is variable then the
size of each AU MUST be indicated in the AU-size field of the size of each AU MUST be indicated in the AU-size field of the
corresponding AU-header. However, if the AU size is constant for a corresponding AU-header. However, if the AU size is constant for a
stream, this mechanism SHOULD NOT be used, but instead the fixed stream, this mechanism SHOULD NOT be used, but instead the fixed
size SHOULD be signaled by the MIME format parameter size SHOULD be signaled by the MIME format parameter
"ConstantSize", see section 4.1. "ConstantSize", see section 4.1.
The absence of both AU-size in the AU-header and the ConstantSize The absence of both AU-size in the AU-header and the ConstantSize
MIME format parameter indicates carriage of a single AU (fragment), MIME format parameter indicates carriage of a single AU (fragment),
i.e. that a single Access Unit (fragment) is transported in each i.e. that a single Access Unit (fragment) is transported in each
RTP packet for that stream. RTP packet for that stream.
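The three cases above (per-AU sizes, a constant size, or neither) can be sketched in Python as follows. The function and argument names are hypothetical, and the sketch only handles packets that carry complete Access Units; a fragment, whose AU-size gives the size of the entire AU, would need separate handling.

```python
from typing import List, Optional


def split_access_units(section: bytes,
                       au_sizes: Optional[List[int]] = None,
                       constant_size: Optional[int] = None) -> List[bytes]:
    """Split an Access Unit Data Section into complete AUs."""
    if au_sizes is not None:
        # Variable size: one AU-size value per AU-header.
        sizes = list(au_sizes)
    elif constant_size is not None:
        # Constant size signaled out of band.
        if constant_size <= 0 or len(section) % constant_size != 0:
            raise ValueError("section is not a multiple of the constant size")
        sizes = [constant_size] * (len(section) // constant_size)
    else:
        # Neither is present: the section holds a single AU (fragment).
        return [section]
    if sum(sizes) != len(section):
        raise ValueError("AU sizes do not match the section length")
    units, offset = [], 0
    for size in sizes:
        units.append(section[offset:offset + size])
        offset += size
    return units
```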
3.2.3.1 Fragmentation 3.2.3.1 Fragmentation
A packet SHALL carry either one or more Access Units, or a single A packet SHALL carry either one or more complete Access Units, or
fragment of an Access Unit. Fragments of the same Access Unit have a single fragment of an Access Unit. Fragments of the same Access
the same time stamp but different RTP sequence numbers. The marker Unit have the same time stamp but different RTP sequence numbers.
bit in the RTP header is 1 on the last fragment of an Access Unit, The marker bit in the RTP header is 1 on the last fragment of an
and 0 on all other fragments. Access Unit, and 0 on all other fragments.
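As a rough illustration of these rules, the Python sketch below splits one Access Unit into fragments that share a time stamp, use consecutive sequence numbers, and set the marker bit only on the last fragment. The dictionary layout and function name are hypothetical; a real sender would also emit the AU Header Section, which is omitted here.

```python
def fragment_access_unit(au: bytes, max_fragment: int,
                         first_seq: int, timestamp: int) -> list:
    """Return one packet description per fragment of a single AU."""
    if not au:
        raise ValueError("the Access Unit Data Section is never empty")
    if max_fragment <= 0:
        raise ValueError("max_fragment must be positive")
    packets = []
    for n, offset in enumerate(range(0, len(au), max_fragment)):
        is_last = offset + max_fragment >= len(au)
        packets.append({
            "seq": (first_seq + n) % 65536,   # 16-bit RTP sequence number
            "timestamp": timestamp,           # same for every fragment
            "marker": 1 if is_last else 0,    # 1 only on the last fragment
            "payload": au[offset:offset + max_fragment],
        })
    return packets


packets = fragment_access_unit(bytes(10), 4, first_seq=65535, timestamp=1000)
```

An AU that fits in one packet yields a single entry with the marker set to 1, consistent with the case of a packet carrying a complete Access Unit.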
3.2.3.2 Interleaving 3.2.3.2 Interleaving
Access Units MAY be interleaved. Senders MAY perform interleaving. Access Units MAY be interleaved. Senders MAY perform interleaving.
Receivers MUST support interleaving. When interleaving of Access Receivers MUST support interleaving, except if the receiver only
Units is used it SHALL be implemented using the AU-Index and supports modes in which no interleaving is allowed. When
AU-Index-delta fields in the AU-header. interleaving of Access Units is used it SHALL be implemented using
the AU-Index and AU-Index-delta fields in the AU-header.
Based on the RTP sequence number, the RTP time stamp, the AU-Index Based on the RTP sequence number, the RTP time stamp, the AU-Index
and the AU-Index-delta, a receiver can unambiguously reconstruct and the AU-Index-delta, a receiver can unambiguously reconstruct
the original order even in case of out-of-order packets, packet the original order even in case of out-of-order packets, packet
loss or duplication. Note that for this purpose the AU-Index is loss or duplication. Note that for this purpose the AU-Index is
redundant when the RTP time stamp and the AU-Index-delta values are redundant when the RTP time stamp and the AU-Index-delta values are
sufficient for placing the AUs correctly in time. In such cases sufficient for placing the AUs correctly in time. In such cases
receivers MAY ignore the AU-Index value and senders MAY code the receivers MAY ignore the AU-Index value and senders MAY code the
AU-Index field with the value 0, but only if they code each AU-Index AU-Index field with the value 0, but only if they code each AU-Index
field with that value. field with that value. If the AU-Index is not redundant, senders
SHOULD use a length of the AU-Index field so that this field is not
coded with the value 0 in two subsequent RTP packets.
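The relation AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1 can be applied to one packet as in the Python sketch below. The function name is illustrative, and roll-over of the AU-Index field at its configured width is deliberately ignored.

```python
def au_serial_numbers(au_index: int, deltas: list) -> list:
    """Serial number of each AU in a packet.

    au_index is the AU-Index of the first AU-header; deltas holds the
    AU-Index-delta of each subsequent AU-header, in order.
    """
    serials = [au_index]
    for delta in deltas:
        serials.append(serials[-1] + delta + 1)
    return serials
```

For instance, a packet whose first AU-header carries AU-Index 0 and whose next two AU-headers carry AU-Index-delta 2 holds the AUs with serial numbers 0, 3 and 6. A delta of 0 in every AU-header corresponds to consecutive, non-interleaved AUs.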
When interleaving is applied, a de-interleave buffer is needed in When interleaving is applied, a de-interleave buffer is needed in
receivers to put the Access Units in their correct logical receivers to put the Access Units in their correct logical
consecutive decoding order. This requires the computation of the consecutive decoding order. This requires the computation of the
time stamp for each Access Unit. In case of a fixed time duration time stamp for each Access Unit. In case of a fixed time duration
per Access Unit, the time stamp of the i-th access unit in an RTP per Access Unit, the time stamp of the i-th access unit in an RTP
packet with RTP time stamp T is calculated as follows: packet with RTP time stamp T is calculated as follows:
Timestamp[0] = T Timestamp[0] = T
Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k] Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k]
skipping to change at page 15, line 17 skipping to change at page 17, line 40
semantics of the AU-Index field in 3.2.1.1. semantics of the AU-Index field in 3.2.1.1.
If the Access Units are not fixed duration, the AU-Index is not If the Access Units are not fixed duration, the AU-Index is not
redundant, and MUST provide the index information required for redundant, and MUST provide the index information required for
re-ordering. The number of bits of the AU-Index field MUST be chosen re-ordering. The number of bits of the AU-Index field MUST be chosen
so that valid index information is provided at the applied so that valid index information is provided at the applied
interleaving scheme, without causing problems due to roll-over of interleaving scheme, without causing problems due to roll-over of
the AU-Index field. Note that the CTS-delta may be required to the AU-Index field. Note that the CTS-delta may be required to
compute the correct time stamp for each AU. compute the correct time stamp for each AU.
When an RTP packet arrives (after any reordering has been done),
receivers may 'flush' all Access Units from the interleave buffer
if the time stamp of each Access Unit in the interleave buffer is
strictly less than the time stamp of the arriving packet. Access
Units should also be flushed in time to be played; this can be
important if there is loss before end-of-stream, before a silence
interval, or before a large drop-out.
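The flush condition can be sketched in Python as follows. The buffer representation (a dictionary keyed by AU time stamp) and the function name are assumptions for illustration; time stamp wrap-around is ignored, and releasing the AUs in time stamp order is a simplification that holds only when decoding order follows time stamp order.

```python
def flush_if_all_older(buffer: dict, arriving_timestamp: int) -> list:
    """Flush every buffered AU if all are strictly older than the packet.

    buffer maps AU time stamp -> AU data and is emptied on a flush.
    Returns the flushed AUs, or an empty list if nothing was flushed.
    """
    if buffer and all(ts < arriving_timestamp for ts in buffer):
        flushed = [buffer[ts] for ts in sorted(buffer)]
        buffer.clear()
        return flushed
    return []
```

Flushing for timely playout, as mentioned above, needs an additional clock-driven check that is not shown here.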
3.2.3.3 Constraints for interleaving 3.2.3.3 Constraints for interleaving
The size of the packets should be suitably chosen to be appropriate The size of the packets should be suitably chosen to be appropriate
to both the path MTU and the duration and capacity of the receiver's to both the path MTU and the capacity of the receiver's
de-interleave buffer. The maximum packet size for a session SHOULD de-interleave buffer. The maximum packet size for a session SHOULD
be chosen not to exceed the path MTU. be chosen not to exceed the path MTU.
In order to control receiver latency and mitigate the effects of To allow receivers to allocate sufficient resources for
loss, there are profile-based limits on the size of the packet. de-interleaving, senders MUST provide the information to receivers
This is expressed as a duration: it is calculated from the duration as specified in this section.
of the Access Units contained within a packet. Note that this
duration is NOT the difference between the time stamps of the first
and last Access Unit in a packet.
No matter what interleaving scheme is used, the scheme must be AUs enter the decoder in decoding order. The de-interleave buffer
analyzed to calculate the minimum number of frames a receiver has is used to re-order a stream of interleaved AUs back into decoding
to buffer in order to de-interleave. order. When interleaving is applied, the decoding of "early" AUs
has to be postponed until all AUs that precede in decoding order
have been received. Therefore these "early" AUs are stored in the
de-interleave buffer. As an example in figure 6 the interleaving
pattern from section 2.5 is considered.
Three profiles are defined to constrain the latency when +--+--+--+--+--+--+--+--+--+--+--+-
interleaving. The applied profile is signaled by the MIME format Interleaved AUs | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
parameter "Profile", indicating the decimal number of the profile. +--+--+--+--+--+--+--+--+--+--+--+-
The maximum de-interleave buffer required at the receiver can be Storage of "early" AUs 3 3 3 3 3 3
determined if the maximum packet duration is known. The maximum 6 6 6 6 6 6
packet duration in milliseconds for the three profiles, SHALL NOT 4 4 4
exceed: 7 7 7
12 12
Profile 0 -- 200 milliseconds Figure 6: Storage of "early" AUs in the de-interleave buffer per
Profile 1 -- 500 milliseconds interleaved AU.
Profile 2 -- 1500 milliseconds
When interleaving is applied, the applied profile MUST be signaled
by the MIME format parameter "Profile"; see section 4.1.
Note that for low bit-rate material, this duration limit may make AU(3) is to be delivered to the decoder after AU(0), AU(1) and AU(2);
packets shorter than the MTU size. of these AUs, AU(2) arrives last and hence AU(3) needs to be stored
until AU(2) is received. Similarly, AU(6) is to be stored until
AU(5) is received, while AU(4) and AU(7) are to be stored until
AU(2) and AU(5) are received, respectively. Note that the fullness
of the de-interleave buffer varies in time. In figure 6, the
de-interleave buffer contains at most 4 AUs, but often fewer.
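The storage of "early" AUs can be simulated with a short Python sketch. It assumes that AUs are identified by consecutive integers starting at 0 and that decoding order equals that numbering; the function name is illustrative. The sketch counts the AUs that remain stored after each arrival has been processed, so individual columns may differ from the counting convention of the figure, while the peak is the same.

```python
def early_au_occupancy(arrival_order: list) -> list:
    """Number of AUs held back after each arrival."""
    next_expected = 0   # next AU the decoder is waiting for (assumption)
    stored = set()
    occupancy = []
    for au in arrival_order:
        stored.add(au)
        # Release every AU that is now next in decoding order.
        while next_expected in stored:
            stored.remove(next_expected)
            next_expected += 1
        occupancy.append(len(stored))
    return occupancy


pattern = [0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12]
occupancy = early_au_occupancy(pattern)
peak = max(occupancy)
```

For this pattern the peak is 4 AUs, reached when AU(7) arrives while AU(3), AU(6) and AU(4) are still waiting.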
3.2.3.4. Crucial and non-crucial AUs with MPEG-4 System data So as to give a rough indication of the resources needed in the
receiver for de-interleaving, the maximum displacement in time of
an AU is defined. The maximum displacement in time of an AU is the
maximum difference between the time stamp of any received AU and
the time stamp of the earliest AU that is not yet received. In other
words, when considering a sequence of interleaved AUs, then:
Maximum displacement = max{TS(i) - TS(j)}, for any i and any j>i,
where i and j indicate the index of the AU in the
interleaving pattern and
TS denotes the time stamp of the AU
As an example in figure 7 the interleaving pattern from section 2.5
is considered. For each AU in the pattern the earliest not yet
received AU is indicated. A "-" indicates that all previous AUs
are received. If the AU period is constant, the maximum displacement
equals 5 AU periods, as found for AU(6) and AU(7).
+--+--+--+--+--+--+--+--+--+--+--+-
Interleaved AUs | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
+--+--+--+--+--+--+--+--+--+--+--+-
Earliest not yet received AU - 1 1 - 2 2 - - - - 10
Figure 7: The earliest not yet received AU for each AU in the
interleaving pattern.
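The definition of the maximum displacement translates directly into a Python sketch. The function name is illustrative; time stamps are given in arrival order and wrap-around is ignored.

```python
def max_displacement(timestamps: list) -> int:
    """max{TS(i) - TS(j)} over all i and all j > i, in arrival order."""
    best = 0
    for i in range(len(timestamps)):
        for j in range(i + 1, len(timestamps)):
            best = max(best, timestamps[i] - timestamps[j])
    return best


pattern = [0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12]
in_au_periods = max_displacement(pattern)

# Hypothetical constant AU period of 1024 clock ticks.
in_ticks = max_displacement([1024 * au for au in pattern])
```

With one AU period as the unit the result is 5, found for AU(6) against AU(1) and for AU(7) against AU(2). The quadratic loop is adequate for analyzing one interleaving pattern offline.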
When interleaving, senders MUST signal the maximum displacement
in time during the session via the MIME format parameter
"maxDisplacement"; see section 4.1.
An estimate of the size of the de-interleave buffer is found by
multiplying the maximum displacement by the maximum bit rate:
size(de-interleave buffer) = {(maxDisplacement) * Rate(max)} / (RTP
clock frequency),
where Rate(max) is the maximum bit-rate of the transported stream.
Note that receivers can derive Rate(max) from the MIME format
parameters StreamType, Profile-level-id, and config.
However, this calculation estimates the size of the de-interleave
buffer and its size may be larger than calculated. If this
calculation under-estimates the size of the de-interleave buffer,
then senders, when interleaving, MUST signal a size of the
de-interleave buffer that is large enough to contain all "early"
AUs at any point in time during the session via the MIME format
parameter "de-interleaveBufferSize"; see section 4.1.
If the "de-interleaveBufferSize" parameter is present, then the
applied buffer for de-interleaving in a receiver MUST have a size
that is at least equal to the signaled size of the de-interleave
buffer, else a size that is at least equal to the calculated size
of the de-interleave buffer.
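The estimate and the selection rule can be sketched in Python as below. The function names are hypothetical. The estimate comes out in bits when Rate(max) is given in bits per second; the unit of the signaled parameter is defined in section 4.1, so the caller is assumed to pass both values in the same unit.

```python
from typing import Optional


def estimated_buffer_size(max_displacement_ticks: int,
                          max_bit_rate: float,
                          rtp_clock_hz: int) -> float:
    """maxDisplacement * Rate(max) / (RTP clock frequency)."""
    return max_displacement_ticks * max_bit_rate / rtp_clock_hz


def required_buffer_size(estimate: float,
                         signaled: Optional[float] = None) -> float:
    """Minimum receiver buffer: the signaled size if present, else the estimate."""
    return signaled if signaled is not None else estimate


# Hypothetical stream: 4800 ticks at a 48 kHz clock, 64 kbit/s maximum rate.
estimate = estimated_buffer_size(4800, 64000, 48000)
```

Here the displacement is 0.1 second, so the estimate is 6400 bits; a signaled value, when present, takes precedence over this figure.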
No matter what interleaving scheme is used, the scheme must be
analyzed to calculate the applicable maxDisplacement value, as well
as the required size of the de-interleave buffer. Senders SHOULD
signal values that are not larger than the strictly required
values; if larger values are signaled, the receiver will buffer
excessively.
Note that for low bit-rate material, the applied interleaving
may make packets shorter than the MTU size.
3.2.3.5. Crucial and non-crucial AUs with MPEG-4 System data
Some Access Units with MPEG-4 system data, called "crucial" AUs, Some Access Units with MPEG-4 system data, called "crucial" AUs,
carry information whose loss cannot be tolerated, either in the carry information whose loss cannot be tolerated, either in the
presentation or in the decoder. At each crucial AU in an MPEG-4 presentation or in the decoder. At each crucial AU in an MPEG-4
system stream, the stream state changes. The stream-state MAY system stream, the stream state changes. The stream-state MAY
remain constant at non-crucial AUs. In ISO/IEC 14496-1, MPEG-4 remain constant at non-crucial AUs. In ISO/IEC 14496-1, MPEG-4
system streams use the AU_SequenceNumber to signal stream states. system streams use the AU_SequenceNumber to signal stream states.
Example: Given three AUs, AU1 = "Insertion of node X", AU2 = "Set Example: Given three AUs, AU1 = "Insertion of node X", AU2 = "Set
position of node X", AU3 = "Set position of node X". AU1 is crucial, position of node X", AU3 = "Set position of node X". AU1 is crucial,
skipping to change at page 17, line 12 skipping to change at page 21, line 12
c) if the RAP-flag is set to 0, then the AU MUST be decoded, unless c) if the RAP-flag is set to 0, then the AU MUST be decoded, unless
the stream is corrupted, in which case the AU MUST be ignored. the stream is corrupted, in which case the AU MUST be ignored.
3.3 Usage of this specification 3.3 Usage of this specification
3.3.1 General 3.3.1 General
Usage of this specification requires definition of a mode. A mode Usage of this specification requires definition of a mode. A mode
defines how to use this specification, as deemed appropriate. defines how to use this specification, as deemed appropriate.
Senders MUST signal the applied mode via the MIME format parameter Senders MUST signal the applied mode via the MIME format parameter
"Mode". This specification defines a generic mode that can be used "Mode", as specified in section 4.1. This specification defines a
for any MPEG-4 stream, as well as specific modes for transport of generic mode that can be used for any MPEG-4 stream, as well as
MPEG-4 CELP and MPEG-4 AAC streams, defined in ISO/IEC 14496-3. specific modes for transport of MPEG-4 CELP and MPEG-4 AAC streams,
defined in ISO/IEC 14496-3.
In any mode compliant to this specification the same requirements When use of this payload format is signaled using SDP [6], an
apply for the rtpmap attributes. The general form of an rtpmap "rtpmap" attribute is part of that signaling. The same requirements
attribute is: apply for the rtpmap attribute in any mode compliant to this
specification. The general form of an rtpmap attribute is:
a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding
parameters>] parameters>]
For audio streams, <encoding parameters> specifies the number of For audio streams, <encoding parameters> specifies the number of
audio channels: 2 for stereo material (see RFC 2327) and 1 for audio channels: 2 for stereo material (see RFC 2327) and 1 for
mono. Provided no additional parameters are needed, this parameter mono. Provided no additional parameters are needed, this parameter
may be omitted for mono material, hence its default value is 1. may be omitted for mono material, hence its default value is 1.
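The rtpmap form above, including the default of one audio channel,
can be illustrated with a small parser. This is a sketch under stated
assumptions: the function name, the regular expression, and the
"is_audio" switch are hypothetical helpers, not part of SDP or of this
specification.

```python
import re

# a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding parameters>]
RTPMAP_RE = re.compile(
    r"^a=rtpmap:(?P<pt>\d+) (?P<name>[^/\s]+)/(?P<rate>\d+)(?:/(?P<params>\S+))?$"
)


def parse_rtpmap(line, is_audio=False):
    """Split an rtpmap attribute line into its fields."""
    m = RTPMAP_RE.match(line.strip())
    if m is None:
        raise ValueError("not an rtpmap attribute: %r" % line)
    info = {
        "payload_type": int(m.group("pt")),
        "encoding_name": m.group("name"),
        "clock_rate": int(m.group("rate")),
    }
    if is_audio:
        params = m.group("params")
        # For audio the parameter is the channel count; it defaults to 1.
        info["channels"] = int(params) if params is not None else 1
    return info


stereo = parse_rtpmap("a=rtpmap:96 mpeg4-generic/44100/2", is_audio=True)
mono = parse_rtpmap("a=rtpmap:96 mpeg4-generic/44100", is_audio=True)
```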
3.3.2 The generic mode 3.3.2 The generic mode
The generic mode can be used for any MPEG-4 stream. In this mode The generic mode can be used for any MPEG-4 stream. In this mode
skipping to change at page 17, line 42 skipping to change at page 21, line 44
An example is given below for transport of a BIFS stream. In this An example is given below for transport of a BIFS stream. In this
example carriage of multiple BIFS Access Units is allowed in one example carriage of multiple BIFS Access Units is allowed in one
RTP packet. The AU-header contains the AU-size field, the CTS-flag RTP packet. The AU-header contains the AU-size field, the CTS-flag
and, if the CTS flag is set to 1, the CTS-delta field. The number and, if the CTS flag is set to 1, the CTS-delta field. The number
of bits of the AU-size and the CTS-delta fields is 10 and 16, of bits of the AU-size and the CTS-delta fields is 10 and 16,
respectively. The AU-header also contains the RAP-flag and the respectively. The AU-header also contains the RAP-flag and the
Stream-state of 4 bits. This results in an AU-header with a Stream-state of 4 bits. This results in an AU-header with a
total size of two or four octets per BIFS AU. The RTP time stamp total size of two or four octets per BIFS AU. The RTP time stamp
uses a 1 kHz clock. Note that the media type name is video, uses a 1 kHz clock. Note that the media type name is video,
because the BIFS stream is part of an audiovisual presentation. For because the BIFS stream is part of an audio-visual presentation. For
conventions on media type names see section 4.1. conventions on media type names see section 4.1.
In detail: In detail:
m=video 49230 RTP/AVP 96 m=video 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/1000 a=rtpmap:96 mpeg4-generic/1000
a=fmtp:96 streamtype=3; profile-level-id=257; mode=generic; a=fmtp:96 streamtype=3; profile-level-id=257; mode=generic;
ObjectType=2; config=BIFSConfiguration(); SizeLength=10; ObjectType=2; config=BIFSConfiguration(); SizeLength=10;
CTSDeltaLength=16; RandomAccessIndication=1; CTSDeltaLength=16; RandomAccessIndication=1;
StreamStateIndication=4 StreamStateIndication=4
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page, it comprises
a single line in the SDP file. a single line in the SDP file.
BIFSConfiguration() is the hexadecimal string as defined in ISO/IEC BIFSConfiguration() is the hexadecimal string as defined in ISO/IEC
14496-1; for the description of MIME parameters see section 4.1. 14496-1; for the description of MIME parameters see section 4.1.
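The "two or four octets" figure for the AU-header in the BIFS example
follows from adding up the field widths. The Python sketch below does
that arithmetic; the function and argument names are hypothetical, and
it assumes that only the fields named in the example are present, with
one bit each for the CTS-flag and the RAP-flag.

```python
def bifs_au_header_bits(size_length, cts_delta_length, cts_flag,
                        stream_state_bits):
    """Sum the AU-header field widths used in the BIFS example."""
    bits = size_length            # AU-size
    bits += 1                     # CTS-flag
    if cts_flag:
        bits += cts_delta_length  # CTS-delta, present only if CTS-flag is 1
    bits += 1                     # RAP-flag
    bits += stream_state_bits     # Stream-state
    return bits


# SizeLength=10, CTSDeltaLength=16, StreamStateIndication=4
without_cts = bifs_au_header_bits(10, 16, cts_flag=False, stream_state_bits=4)
with_cts = bifs_au_header_bits(10, 16, cts_flag=True, stream_state_bits=4)
```

Both totals are whole numbers of octets, which is why no padding is
needed per AU-header in this configuration.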
3.3.3 Constant bit-rate CELP 3.3.3 Constant bit-rate CELP
This mode is signaled by mode=CELP-cbr. In this mode one or more This mode is signaled by mode=CELP-cbr. In this mode one or more
fixed size CELP frames can be transported in one RTP packet; there complete CELP frames of fixed size can be transported in one RTP
is no support for interleaving. The RTP payload consist of one or packet; there is no support for interleaving. The RTP payload
more concatenated CELP frames, each of the same size. Both the AU consists of one or more concatenated CELP frames, each of the same
Header Section and the Auxiliary Section MUST be empty. size. CELP frames MUST NOT be fragmented when using this mode. Both
the AU Header Section and the Auxiliary Section MUST be empty.
The MIME format parameter ConstantSize MUST be provided to specify The MIME format parameter ConstantSize MUST be provided to specify
the length of each CELP frame. the length of each CELP frame.
For example: For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/44100/2 a=rtpmap:96 mpeg4-generic/44100/2
a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-cbr; config= a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-cbr; config=
AudioSpecificConfig(); ConstantSize=xxx; AudioSpecificConfig(); ConstantSize=xxx;
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page, it comprises
a single line in the SDP file. a single line in the SDP file.
AudioSpecificConfig() is the haxadecimal string as defined in AudioSpecificConfig() is the hexadecimal string as defined in
ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio
stream type is CELP. For the description of MIME parameters see stream type is CELP. For the description of MIME parameters see
section 4.1. section 4.1.
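Because both the AU Header Section and the Auxiliary Section are empty
in this mode, a receiver can recover the CELP frames from the payload
using ConstantSize alone. The following Python sketch shows one way to
do so; the function name is hypothetical, and rejecting a payload whose
length is not a multiple of ConstantSize is a design choice of the
sketch, not a requirement stated above.

```python
def split_celp_cbr_payload(payload, constant_size):
    """Split an RTP payload into fixed-size CELP frames."""
    if constant_size <= 0:
        raise ValueError("ConstantSize must be positive")
    if len(payload) == 0 or len(payload) % constant_size != 0:
        raise ValueError("payload is not a whole number of frames")
    return [payload[i:i + constant_size]
            for i in range(0, len(payload), constant_size)]


# Hypothetical payload carrying two frames of three octets each.
frames = split_celp_cbr_payload(bytes(range(6)), 3)
```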
3.3.4 Variable bit-rate CELP 3.3.4 Variable bit-rate CELP
This mode is signaled by mode=CELP-vbr. With this mode one or This mode is signaled by mode=CELP-vbr. With this mode one or more
more variable size CELP frames can be transported in one RTP packet complete CELP frames of variable size can be transported in one RTP
with optional interleaving. As the largest possible frame size in packet with optional interleaving. As CELP frames are very small,
this mode is greater than the maximum CELP frame size, there is no while the largest possible AU-size in this mode is greater than the
support for fragmentation of CELP frames. maximum CELP frame size, there is no support for fragmentation of
CELP frames. Hence CELP frames MUST NOT be fragmented when using
this mode.
In this mode the RTP payload consists of the AU Header Section, In this mode the RTP payload consists of the AU Header Section,
followed by one or more concatenated CELP frames. The Auxiliary followed by one or more concatenated CELP frames. The Auxiliary
Section MUST be empty. For each CELP frame contained in the payload Section MUST be empty. For each CELP frame contained in the payload
there MUST be a one octet AU-header in the AU Header Section to there MUST be a one octet AU-header in the AU Header Section to
provide: provide:
(a) the size of each CELP frame in the payload and (a) the size of each CELP frame in the payload and
(b) index information for computing the sequence (and hence timing) (b) index information for computing the sequence (and hence timing)
of each CELP frame. of each CELP frame.
Transport of CELP frames requires that the AU-size field is coded Transport of CELP frames requires that the AU-size field is coded
with 6 bits. In this mode therefore 6 bits are allocated to the with 6 bits. In this mode therefore 6 bits are allocated to the
AU-size field, and 2 bits to the AU-Index(-delta) field. Each AU-size field, and 2 bits to the AU-Index(-delta) field. Each
AU-Index field MUST be coded with the value 0. In the AU Header AU-Index field MUST be coded with the value 0. In the AU Header
Section, the concatenated AU-headers are preceded by the 16-bit Section, the concatenated AU-headers are preceded by the 16-bit
AU-headers-length field, as specified in section 3.2.1. AU-headers-length field, as specified in section 3.2.1.
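The layout of the AU Header Section in this mode can be sketched as
follows. This is an illustration under assumptions rather than a
reference implementation: the function name is hypothetical, and the
sketch assumes that the AU-headers-length field (section 3.2.1, not
reproduced here) counts the length of the AU-headers in bits and that
the first AU-header carries AU-Index while later ones carry
AU-Index-delta.

```python
def build_au_header_section(frame_sizes, size_length, index_length,
                            index_deltas=None):
    """Pack AU-headers-length followed by one AU-header per frame."""
    if index_deltas is None:
        index_deltas = [0] * len(frame_sizes)   # no interleaving
    bits = 0
    nbits = 0
    for n, (size, delta) in enumerate(zip(frame_sizes, index_deltas)):
        index_value = 0 if n == 0 else delta    # AU-Index MUST be 0
        if size >= (1 << size_length):
            raise ValueError("frame too large for AU-size field")
        if index_value >= (1 << index_length):
            raise ValueError("AU-Index-delta too large for its field")
        bits = (bits << size_length) | size
        bits = (bits << index_length) | index_value
        nbits += size_length + index_length
    pad = (-nbits) % 8                           # pad to an octet boundary
    body = (bits << pad).to_bytes((nbits + pad) // 8, "big")
    return nbits.to_bytes(2, "big") + body


# CELP-vbr: 6-bit AU-size, 2-bit AU-Index(-delta), frames of 20 and 63 octets.
section = build_au_header_section([20, 63], 6, 2)
interleaved = build_au_header_section([20, 63], 6, 2, index_deltas=[0, 2])
```

The CELP frames themselves would be appended directly after the
returned octets.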
In addition to the required MIME format parameters, the following In addition to the required MIME format parameters, the following
parameters MUST be present: SizeLength, IndexLength, and parameters MUST be present: SizeLength, IndexLength, and
IndexDeltaLength. IndexDeltaLength.
When interleaving is applied (AU-Index-delta coded with a value When interleaving is applied (AU-Index-delta coded with a value
larger than 0), the parameter Profile MUST also be present. larger than 0), the parameter maxDisplacement MUST also be present.
For example: For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/44100/2 a=rtpmap:96 mpeg4-generic/44100/2
a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config= a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config=
AudioSpecificConfig(); SizeLength=6; IndexLength=2; AudioSpecificConfig(); SizeLength=6; IndexLength=2;
IndexDeltaLength=2; Profile=1 IndexDeltaLength=2
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page, it comprises
a single line in the SDP file. a single line in the SDP file.
AudioSpecificConfig() is the hexadecimal string as defined in AudioSpecificConfig() is the hexadecimal string as defined in
ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio
stream type is CELP. For the description of MIME parameters see stream type is CELP. For the description of MIME parameters see
section 4.1. section 4.1.
3.3.5 Low bit-rate AAC 3.3.5 Low bit-rate AAC
This mode is signaled by mode=AAC-lbr. This mode supports transport This mode is signaled by mode=AAC-lbr. This mode supports transport
of one or more variable size AAC frames with optional support for of one or more complete AAC frames of variable size. In this mode
interleaving and fragmenting. The maximum size of an AAC frame the AAC frames are allowed to be interleaved and hence receivers
(fragment) in this mode is 63 octets. MUST support de-interleaving. The maximum size of an AAC frame in
this mode is 63 octets. AAC frames MUST NOT be fragmented when
using this mode.
The payload configuration in this mode is the same as in the The payload configuration in this mode is the same as in the
variable bit-rate CELP mode as defined in 3.3.4. The RTP payload variable bit-rate CELP mode as defined in 3.3.4. The RTP payload
consists of the AU Header Section, followed by concatenated AAC consists of the AU Header Section, followed by concatenated AAC
frames. The Auxiliary Section MUST be empty. For each AAC frame frames. The Auxiliary Section MUST be empty. For each AAC frame
contained in the payload the one octet AU-header MUST provide: contained in the payload the one octet AU-header MUST provide:
(a) the size of each AAC frame in the payload and (a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing) (b) index information for computing the sequence (and hence timing)
of each AAC frame. of each AAC frame.
In the AU-header, the AU-size MUST be coded with 6 bits and the In the AU-header, the AU-size MUST be coded with 6 bits and the
AU-Index(-delta) with 2 bits; the AU-Index field MUST have the AU-Index(-delta) with 2 bits; the AU-Index field MUST have the
value 0 in each AU-header. value 0 in each AU-header.
In the AU-header Section, the concatenated AU-headers MUST be In the AU-header Section, the concatenated AU-headers MUST be
preceded by the 16-bit AU-headers-length field, as specified in preceded by the 16-bit AU-headers-length field, as specified in
section 3.2.1. section 3.2.1.
In addition to the required MIME format parameters, the following In addition to the required MIME format parameters, the following
parameters MUST be present: SizeLength, IndexLength, and parameters MUST be present: SizeLength, IndexLength, and
IndexDeltaLength. IndexDeltaLength.
When interleaving is applied (AU-Index-delta coded with a value When interleaving is applied (AU-Index-delta coded with a value
larger than 0), also the parameter Profile MUST be present. larger than 0), also the parameter maxDisplacement MUST be present.
For example: For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/44100/2 a=rtpmap:96 mpeg4-generic/44100/2
a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config= a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config=
AudioSpecificConfig(); SizeLength=6; IndexLength=2; AudioSpecificConfig(); SizeLength=6; IndexLength=2;
IndexDeltaLength=2; Profile=1 IndexDeltaLength=2
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page, it comprises
a single line in the SDP file. a single line in the SDP file.
AudioSpecificConfig() is the hexadecimal string as defined in ISO/IEC AudioSpecificConfig() is the hexadecimal string as defined in ISO/IEC
14496-3. AudioSpecificConfig() specifies that the audio 14496-3. AudioSpecificConfig() specifies that the audio
stream type is AAC. For the description of MIME parameters see stream type is AAC. For the description of MIME parameters see
section 4.1. section 4.1.
3.3.6 High bit-rate AAC 3.3.6 High bit-rate AAC
This mode is signaled by mode=AAC-hbr. This mode supports transport This mode is signaled by mode=AAC-hbr. This mode supports transport
of one or more large variable size AAC frames in one RTP packet with of variable size AAC frames. In one RTP packet either one or more
optional support for interleaving and fragmenting. The maximum size complete AAC frames are carried, or a single fragment of an AAC
of an AAC frame (fragment) in this mode is 8191 octets. frame. In this mode the AAC frames are allowed to be interleaved
and hence receivers MUST support de-interleaving. The maximum size
of an AAC frame in this mode is 8191 octets.
In this mode the RTP payload consists of the AU Header Section, In this mode the RTP payload consists of the AU Header Section,
followed by one or more concatenated AAC frames. The Auxiliary followed by either one AAC frame, several concatenated AAC frames
Section MUST be empty. For each AAC frame contained in the payload or one fragmented AAC frame. The Auxiliary Section MUST be empty.
there MUST be an AU-header in the AU Header Section to provide: For each AAC frame contained in the payload there MUST be an
AU-header in the AU Header Section to provide:
(a) the size of each AAC frame in the payload and (a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing) (b) index information for computing the sequence (and hence timing)
of each AAC frame. of each AAC frame.
To code the maximum size of an AAC frame requires 13 bits. Therefore To code the maximum size of an AAC frame requires 13 bits. Therefore
in this configuration 13 bits are allocated to the AU-size, and in this configuration 13 bits are allocated to the AU-size, and
3 bits to the AU-Index(-delta) field. Thus each AU-header has a size 3 bits to the AU-Index(-delta) field. Thus each AU-header has a size
of 2 octets. Each AU-Index field MUST be coded with the value 0. In of 2 octets. Each AU-Index field MUST be coded with the value 0. In
the AU Header Section, the concatenated AU-headers MUST be preceded the AU Header Section, the concatenated AU-headers MUST be preceded
by the 16-bit AU-headers-length field, as specified in section 3.2.1. by the 16-bit AU-headers-length field, as specified in section 3.2.1.
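On the receiving side, the two-octet AU-headers of this mode can be
read back as shown in the Python sketch below. The function name is
hypothetical, and the sketch assumes that AU-headers-length (section
3.2.1, not reproduced here) gives the length of the AU-headers in
bits.

```python
def parse_aac_hbr_au_headers(payload):
    """Return ([(au_size, au_index_or_delta), ...], remaining_payload)."""
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    headers_bits = int.from_bytes(payload[:2], "big")
    if headers_bits % 16 != 0:
        raise ValueError("AAC-hbr AU-headers are 16 bits each")
    count = headers_bits // 16
    end = 2 + 2 * count
    if len(payload) < end:
        raise ValueError("payload too short for its AU-headers")
    headers = []
    for n in range(count):
        word = int.from_bytes(payload[2 + 2 * n:4 + 2 * n], "big")
        # Upper 13 bits: AU-size; lower 3 bits: AU-Index(-delta).
        headers.append((word >> 3, word & 0x7))
    return headers, payload[end:]


# Hypothetical packet: two AAC frames of 100 and 200 octets.
packet = (b"\x00\x20"
          + (100 << 3).to_bytes(2, "big")
          + ((200 << 3) | 1).to_bytes(2, "big")
          + b"A" * 300)
headers, frames_data = parse_aac_hbr_au_headers(packet)
```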
In addition to the required MIME format parameters, the following In addition to the required MIME format parameters, the following
parameters MUST be present: SizeLength, IndexLength, and parameters MUST be present: SizeLength, IndexLength, and
IndexDeltaLength. IndexDeltaLength.
When interleaving is applied (AU-Index-delta coded with a value When interleaving is applied (AU-Index-delta coded with a value
larger than 0), also the parameter Profile MUST be present. larger than 0), also the parameter maxDisplacement MUST be present.
For example: For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/44100/2 a=rtpmap:96 mpeg4-generic/44100/2
a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr; a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr;
config=AudioSpecificConfig(); SizeLength=13; IndexLength=3; config=AudioSpecificConfig(); SizeLength=13; IndexLength=3;
IndexDeltaLength=3; Profile=1 IndexDeltaLength=3
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page, it comprises
a single line in the SDP file. a single line in the SDP file.
AudioSpecificConfig() is the hexadecimal string as defined in AudioSpecificConfig() is the hexadecimal string as defined in
ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio
stream type is AAC. For the description of MIME parameters see stream type is AAC. For the description of MIME parameters see
section 4.1. section 4.1.
3.3.7 Additional modes 3.3.7 Additional modes
This specification only defines the modes specified in sections This specification only defines the modes specified in sections
3.3.2 up to 3.3.6. Additional modes are expected to be defined in 3.3.2 up to 3.3.6. Additional modes are expected to be defined in
future RFCs. Each additional mode MUST be in full compliance with future RFCs. Each additional mode MUST be in full compliance with
this specification. this specification.
When defining a new mode care MUST be taken that an implementation Any new mode MUST be defined such that an implementation including
of all features of this specification can decode the payload format all the features of this specification can decode the payload format
corresponding to this new mode. For this reason a mode MUST NOT corresponding to this new mode. For this reason a mode MUST NOT
specify new default values for MIME parameters. In particular, MIME specify new default values for MIME parameters. In particular, MIME
parameters that configure the RTP payload MUST be present (unless parameters that configure the RTP payload MUST be present (unless
they have the default value), even if their presence is redundant in they have the default value), even if their presence is redundant in
case the mode assigns a fixed value to a parameter. A mode may case the mode assigns a fixed value to a parameter. A mode may
define additionally that some MIME parameters are required instead define additionally that some MIME parameters are required instead
of optional, that some MIME parameters have fixed values (or of optional, that some MIME parameters have fixed values (or
ranges), and that there are rules restricting the usage. ranges), and that there are rules restricting the usage.
4. IANA considerations 4. IANA considerations
skipping to change at page 22, line 29 skipping to change at page 26, line 29
"video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2) "video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2)
or MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information or MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information
needed for an audio/visual presentation. needed for an audio/visual presentation.
"audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3) "audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3)
or MPEG-4 Systems streams that convey information needed for an or MPEG-4 Systems streams that convey information needed for an
audio only presentation. audio only presentation.
"application" MUST be used for MPEG-4 Systems streams (ISO/IEC "application" MUST be used for MPEG-4 Systems streams (ISO/IEC
14496-1) that serve purposes other than audio/visual presentation, 14496-1) that serve purposes other than audio/visual presentation,
e.g. in some cases when MPEG-J streams are transmitted. e.g. in some cases when MPEG-J (Java) streams are transmitted.
Depending on the required payload configuration, MIME format Depending on the required payload configuration, MIME format
parameters need to be available to the receiver. This is done using parameters need to be available to the receiver. This is done using
the parameters described in the next section. There are required the parameters described in the next section. There are required
and optional parameters. and optional parameters.
Optional parameters are of two types: general parameters and Optional parameters are of two types: general parameters and
configuration parameters. The configuration parameters are used to configuration parameters. The configuration parameters are used to
configure the fields in the AU Header section and in the auxiliary configure the fields in the AU Header section and in the auxiliary
section. The absence of any configuration parameter is equivalent to section. The absence of any configuration parameter is equivalent to
the associated field set to its default value, which is always zero. the associated field set to its default value, which is always zero.
The absence of all configuration parameters resolves into a default The absence of all configuration parameters resolves into a default
"basic" configuration with an empty AU-header section and an empty "basic" configuration with an empty AU-header section and an empty
auxiliary section in each RTP packet. auxiliary section in each RTP packet.
MIME subtype name: mpeg4-generic MIME subtype name: mpeg4-generic
Required parameters: Required parameters:
MIME format parameters are not case dependent; however for clarity MIME format parameters are not case dependent; however for clarity
both upper and lower case are used in the names of the parameters both upper and lower case are used in the names of the parameters
described in this specification. described in this specification.
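A receiver might apply these two rules (case-independent parameter
names, and a default of zero for absent configuration parameters) as
in the Python sketch below. The helper names are hypothetical, and the
config value "1210" is a placeholder chosen for illustration only.

```python
def parse_fmtp(line):
    """Map lower-cased parameter names to their string values."""
    _, _, params = line.partition(" ")      # drop "a=fmtp:<payload type>"
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue                        # tolerate a trailing ";"
        name, _, value = item.partition("=")
        result[name.strip().lower()] = value.strip()
    return result


def config_param(params, name):
    """Integer value of a configuration parameter; 0 when absent."""
    return int(params.get(name.lower(), 0))


fmtp = parse_fmtp(
    "a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr; "
    "config=1210; SizeLength=13; IndexLength=3; IndexDeltaLength=3"
)
size_length = config_param(fmtp, "SizeLength")
cts_delta_length = config_param(fmtp, "CTSDeltaLength")
```

Note that the zero default applies only to configuration parameters;
required parameters such as mode have no default.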
StreamType: StreamType:
The integer value that indicates the type of MPEG-4 stream that The integer value that indicates the type of MPEG-4 stream that
is carried; its coding corresponds to the values of the is carried; its coding corresponds to the values of the
streamType as defined in Table 9 (objectTypeIndication Values) streamType as defined in Table 9 (streamType Values) in ISO/IEC
in ISO/IEC 14496-1. Note that the StreamType allows signaling of 14496-1.
an MPEG-7 stream; this RTP payload format is not designed to
carry an MPEG-7 stream, and may not be suitable for transport of
MPEG-7 streams.
Profile-level-id: Profile-level-id:
A decimal representation of the MPEG-4 Profile Level indication. A decimal representation of the MPEG-4 Profile Level indication.
This parameter MUST be used in the capability exchange or This parameter MUST be used in the capability exchange or
session set-up procedure to indicate the MPEG-4 Profile and Level session set-up procedure to indicate the MPEG-4 Profile and Level
combination that the relevant MPEG-4 media codec is capable combination that the relevant MPEG-4 media codec is capable
of. of.
For MPEG-4 Audio streams, this parameter is the decimal value For MPEG-4 Audio streams, this parameter is the decimal value
from Table 5 (audioProfileLevelIndication Values) in ISO/IEC from Table 5 (audioProfileLevelIndication Values) in ISO/IEC
14496-1, indicating which MPEG-4 Audio tool subsets are 14496-1, indicating which MPEG-4 Audio tool subsets are
required to decode the audio stream. required to decode the audio stream.
For MPEG-4 Visual streams, this parameter is the decimal value For MPEG-4 Visual streams, this parameter is the decimal value
from Table G-1 (FLC table for profile and level indication of from Table G-1 (FLC table for profile and level indication) of
ISO/IEC 14496-2), indicating which MPEG-4 Visual tool subsets ISO/IEC 14496-2, indicating which MPEG-4 Visual tool subsets
are required to decode the visual stream. are required to decode the visual stream.
For BIFS streams, this parameter is the decimal value that is For BIFS streams, this parameter is the decimal value that is
obtained from (SPLI + 256*GPLI), where: obtained from (SPLI + 256*GPLI), where:
SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with
the applied sceneProfileLevelIndication; the applied sceneProfileLevelIndication;
GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with
the applied graphicsProfileLevelIndication. the applied graphicsProfileLevelIndication.
For MPEG-J streams, this parameter is the decimal value from For MPEG-J streams, this parameter is the decimal value from
table 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1, table 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1,
indicating the profile and level of the MPEG-J stream. indicating the profile and level of the MPEG-J stream.
skipping to change at page 24, line 11 skipping to change at page 27, line 45
For Clock Reference streams and Object Content Info streams, this For Clock Reference streams and Object Content Info streams, this
parameter has the decimal value zero, indicating that profile parameter has the decimal value zero, indicating that profile
and level information is conveyed through the OD framework. and level information is conveyed through the OD framework.
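For BIFS streams, the (SPLI + 256*GPLI) rule given above can be worked
through in Python as follows. The function names are hypothetical, and
the sketch assumes that both indications fit in eight bits, which is
what makes the value reversible.

```python
def bifs_profile_level_id(spli, gpli):
    """Combine scene and graphics profile-level indications."""
    if not (0 <= spli <= 255 and 0 <= gpli <= 255):
        raise ValueError("SPLI and GPLI are assumed to be 8-bit values")
    return spli + 256 * gpli


def split_bifs_profile_level_id(value):
    """Recover (SPLI, GPLI) from a BIFS profile-level-id."""
    gpli, spli = divmod(value, 256)
    return spli, gpli


combined = bifs_profile_level_id(1, 1)
parts = split_bifs_profile_level_id(257)
```

The value 257 is the profile-level-id used in the BIFS example of
section 3.3.2.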
Config: Config:
A hexadecimal representation of an octet string that expresses A hexadecimal representation of an octet string that expresses
the media payload configuration. Configuration data is mapped the media payload configuration. Configuration data is mapped
onto the hexadecimal octet string on an MSB-first basis. The onto the hexadecimal octet string on an MSB-first basis. The
first bit of the configuration data SHALL be located at the MSB first bit of the configuration data SHALL be located at the MSB
of the first octet. In the last octet, if necessary to achieve of the first octet. In the last octet, if necessary to achieve
octet alignment, up to 7 zero-valued padding bits shall follow octet-alignment, up to 7 zero-valued padding bits shall follow
the configuration data. the configuration data.
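The MSB-first mapping with zero padding can be illustrated with a
short Python sketch. The function name and the 13-bit input are
hypothetical, and the use of upper-case hexadecimal digits is an
arbitrary choice of the sketch.

```python
def config_to_hex(bit_string):
    """Map configuration bits (a string of '0'/'1', first bit first)
    onto a hexadecimal octet string, MSB first, zero-padded."""
    if any(c not in "01" for c in bit_string):
        raise ValueError("bit_string must contain only '0' and '1'")
    pad = (-len(bit_string)) % 8            # up to 7 zero padding bits
    padded = bit_string + "0" * pad
    return "".join("%02X" % int(padded[i:i + 8], 2)
                   for i in range(0, len(padded), 8))


# 13 illustrative configuration bits, padded with 3 zero bits.
hex_config = config_to_hex("0001001000010")
```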
For MPEG-4 Audio streams, config is the audio object type For MPEG-4 Audio streams, config is the audio object type
specific decoder configuration data AudioSpecificConfig() as specific decoder configuration data AudioSpecificConfig() as
defined in ISO/IEC 14496-3. For Stuctured Audio, the defined in ISO/IEC 14496-3. For Structured Audio, the
AudioSpecificConfig() may be conveyed by other means, not AudioSpecificConfig() may be conveyed by other means, not
defined by this specification. If the AudioSpecificConfig() defined by this specification. If the AudioSpecificConfig()
is conveyed by other means for Stuctured Audio, then the is conveyed by other means for Structured Audio, then the
config MUST be a quoted empty hexadecimal octet string, as config MUST be a quoted empty hexadecimal octet string, as
follows: config="". follows: config="".
Note that a future mode of using this RTP payload format for Note that a future mode of using this RTP payload format for
Structured Audio may define such other means. Structured Audio may define such other means.
For MPEG-4 Visual streams, config is the MPEG-4 Visual For MPEG-4 Visual streams, config is the MPEG-4 Visual
configuration information as defined in subclause 6.2.1 Start configuration information as defined in subclause 6.2.1 Start
codes of ISO/IEC 14496-2. The configuration information codes of ISO/IEC 14496-2. The configuration information
indicated by this parameter SHALL be the same as the indicated by this parameter SHALL be the same as the
configuration information in the corresponding MPEG-4 Visual configuration information in the corresponding MPEG-4 Visual
stream, except for first-half-vbv-occupancy and stream, except for first-half-vbv-occupancy and
latter-half-vbv-occupancy, if it exists, which may vary in latter-half-vbv-occupancy, if it exists, which may vary in
the repeated configuration information inside an MPEG-4 the repeated configuration information inside an MPEG-4
Visual stream (See 6.2.1 Start codes of ISO/IEC 14496-2). Visual stream (See 6.2.1 Start codes of ISO/IEC 14496-2).
For BIFS streams, this is the BIFSConfig() information as defined For BIFS streams, this is the BIFSConfig() information as defined
skipping to change at page 25, line 11 skipping to change at page 28, line 47
mode=AAC-hbr. mode=AAC-hbr.
Other modes are expected to be defined in future RFCs. See also Other modes are expected to be defined in future RFCs. See also
section 3.3.7 and 4.2 of RFCxxxx. section 3.3.7 and 4.2 of RFCxxxx.
Optional general parameters: Optional general parameters:
ObjectType: ObjectType:
The decimal value from Table 8 in ISO/IEC 14496-1, indicating The decimal value from Table 8 in ISO/IEC 14496-1, indicating
the value of the objectTypeIndication of the transported stream. the value of the objectTypeIndication of the transported stream.
For BIFS streams this parameter MUST be present to signal the For BIFS streams this parameter MUST be present to signal the
version of BIFSConfiguration(). Note that the ObjectType MAY version of BIFSConfiguration(). Note that ObjectTypeIndication
signal a non-MPEG-4 stream, and that the RTP payload format may signal a non-MPEG-4 stream and that the RTP payload format
defined in this document may not be suitable to carry a stream defined in this document may not be suitable to carry a stream
that is not defined by MPEG-4. that is not defined by MPEG-4. ObjectType SHOULD NOT be set to
a value that signals a stream that cannot be carried by this
payload format.
ConstantSize: ConstantSize:
The constant size in octets of each Access Unit for this stream. The constant size in octets of each Access Unit for this stream.
Simultaneous presence of ConstantSize and the SizeLength The ConstantSize and the SizeLength parameters MUST NOT be
parameters is not permitted. simultaneously present.
Profile: maxDisplacement:
The decimal representation of the applied profile to constrain The decimal representation of the maximum displacement in time
the latency when interleaving; see section 3.2.3.3. Absence of of an interleaved AU, as defined in section 3.2.3.3, expressed
this parameter signals that the profile is not specified. This in units of the RTP time stamp clock.
parameter MUST be present when interleaving is applied. This parameter MUST be present when interleaving is applied.
de-interleaveBufferSize:
The decimal representation in number of octets of the size of
the de-interleave buffer, described in section 3.2.3.3.
When interleaving, this parameter MUST be present if the
calculation of the de-interleave buffer size given in 3.2.3.3
and based on maxDisplacement and rate(max) under-estimates the
size of the de-interleave buffer. If this calculation does not
under-estimate the size of the de-interleave buffer, then the
de-interleaveBufferSize parameter SHOULD NOT be present.
Optional configuration parameters: Optional configuration parameters:
SizeLength: SizeLength:
The number of bits on which the AU-size field is encoded in the The number of bits on which the AU-size field is encoded in the
AU-header. Simultaneous presence of SizeLength and the AU-header. The SizeLength and the ConstantSize parameters MUST
ConstantSize parameter is not permitted. NOT be simultaneously present.
IndexLength: IndexLength:
The number of bits on which the AU-Index is encoded in the first The number of bits on which the AU-Index is encoded in the first
AU-header. The default value of zero indicates the absence of AU-header. The default value of zero indicates the absence of
the AU-Index and AU-Index-delta fields in each AU-header. the AU-Index and AU-Index-delta fields in each AU-header.
IndexDeltaLength: IndexDeltaLength:
The number of bits on which the AU-Index-delta field is encoded The number of bits on which the AU-Index-delta field is encoded
in any non-first AU-header. in any non-first AU-header.
skipping to change at page 26, line 8 skipping to change at page 29, line 53
the AU-header. the AU-header.
RandomAccessIndication: RandomAccessIndication:
A decimal value of zero or one, indicating whether the RAP-flag A decimal value of zero or one, indicating whether the RAP-flag
is present in the AU-header. The decimal value of one indicates is present in the AU-header. The decimal value of one indicates
presence of the RAP-flag, the default value zero its absence. presence of the RAP-flag, the default value zero its absence.
StreamStateIndication: StreamStateIndication:
The number of bits on which the Stream-state field is encoded in The number of bits on which the Stream-state field is encoded in
the AU-header. This parameter MAY be present when transporting the AU-header. This parameter MAY be present when transporting
MPEG-4 system streams, and SHALL NOT be present MPEG-4 audio and MPEG-4 system streams, and SHALL NOT be present for MPEG-4 audio
MPEG-4 video streams. and MPEG-4 video streams.
AuxiliaryDataSizeLength: AuxiliaryDataSizeLength:
The number of bits that is used to encode the auxiliary-data-size The number of bits that is used to encode the auxiliary-data-size
field. field.
Applications MAY use more parameters, in addition to those defined Applications MAY use more parameters, in addition to those defined
above. Each additional parameters MUST be registered with IANA, to above. Each additional parameter MUST be registered with IANA, to
ensure that there is no clash of names. Each additional parameter ensure that there is no clash of names. Each additional parameter
MUST be accompanied by a specification in the form of an RFC, MPEG MUST be accompanied by a specification in the form of an RFC, MPEG
standard, or other permanent and readily available reference (the standard, or other permanent and readily available reference (the
"Specification Required" policy defined in RFC 2434). Receivers MUST "Specification Required" policy defined in RFC 2434). Receivers MUST
tolerate the presence of such additional parameters, but these tolerate the presence of such additional parameters, but these
parameters SHALL NOT impact the decoding of receivers that comply with parameters SHALL NOT impact the decoding of receivers that comply with
this specification. this specification.
Encoding considerations: Encoding considerations:
System bitstreams MUST be generated according to MPEG-4 Systems This MIME subtype is defined for RTP transport only. System
bitstreams MUST be generated according to MPEG-4 Systems
specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
bitstreams MUST be generated according to MPEG-4 Audio bitstreams MUST be generated according to MPEG-4 Audio
specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized
according to the RTP payload format defined in RFC xxxx. according to the RTP payload format defined in RFC xxxx.
Security considerations: Security considerations:
As defined in section 5 of RFC xxxx. As defined in section 5 of RFC xxxx.
Interoperability considerations: Interoperability considerations:
skipping to change at page 27, line 11 skipping to change at page 31, line 11
"profile-level-id" in MIME content. In the capability exchange / "profile-level-id" in MIME content. In the capability exchange /
announcement procedure this parameter may mutually be set to the announcement procedure this parameter may mutually be set to the
same value. same value.
Published specification: Published specification:
The specifications for MPEG-4 streams are presented in ISO/IEC The specifications for MPEG-4 streams are presented in ISO/IEC
14496-1, 14496-2, and 14496-3. The RTP payload format is described 14496-1, 14496-2, and 14496-3. The RTP payload format is described
in RFC xxxx. in RFC xxxx.
Applications which use this media type: Applications which use this media type:
Multimedia streaming and conferencing tools, Internet messaging and Multimedia streaming and conferencing tools.
Email applications.
Additional information: none Additional information: none
Magic number(s): none Magic number(s): none
File extension(s): File extension(s):
None. A file format with the extension .mp4 has been defined for None. A file format with the extension .mp4 has been defined for
MPEG-4 content but is not directly correlated with this MIME type MPEG-4 content but is not directly correlated with this MIME type
for which the sole purpose is RTP transport. for which the sole purpose is RTP transport.
skipping to change at page 27, line 36 skipping to change at page 31, line 35
Authors of RFC xxxx, IETF Audio/Video Transport working group. Authors of RFC xxxx, IETF Audio/Video Transport working group.
Intended usage: COMMON Intended usage: COMMON
Author/Change controller: Author/Change controller:
Authors of RFC xxxx, IETF Audio/Video Transport working group. Authors of RFC xxxx, IETF Audio/Video Transport working group.
4.2 Registration of mode definitions with IANA 4.2 Registration of mode definitions with IANA
This specification can be used in a number of modes. The mode of This specification can be used in a number of modes. The mode of
operation is signalled using the "Mode" MIME parameter, with the operation is signaled using the "Mode" MIME parameter, with the
initial set of values specified in Section 4.1. New modes may be initial set of values specified in section 4.1. New modes may be
defined at any time, as described in Section 3.3.7. These modes defined at any time, as described in section 3.3.7. These modes
MUST be registered with IANA, to ensure that there is no clash MUST be registered with IANA, to ensure that there is no clash
of names. of names.
A new mode registration MUST be accompanied by a specification in A new mode registration MUST be accompanied by a specification in
the form of an RFC, MPEG standard, or other permanent and readily the form of an RFC, MPEG standard, or other permanent and readily
available reference (the "Specification Required" policy defined available reference (the "Specification Required" policy defined
in RFC 2434). in RFC 2434).
4.3 Concatenation of parameters 4.3 Concatenation of parameters
skipping to change at page 28, line 12 skipping to change at page 32, line 12
in the form of a semicolon-separated list of parameter=value pairs in the form of a semicolon-separated list of parameter=value pairs
(for parameter usage examples see sections 3.3.2 up to 3.3.6). (for parameter usage examples see sections 3.3.2 up to 3.3.6).
4.4 Usage of SDP 4.4 Usage of SDP
4.4.1 The a=fmtp keyword 4.4.1 The a=fmtp keyword
It is assumed that one typical way to transport the above-described It is assumed that one typical way to transport the above-described
parameters associated with this payload format is via an SDP message parameters associated with this payload format is via an SDP message
[6] for example transported to the client in reply to an RTSP [6] for example transported to the client in reply to an RTSP
DESCRIBE or via SAP. In that case the (a=fmtp) keyword MUST be used DESCRIBE [8] or via SAP [7]. In that case the (a=fmtp) keyword MUST
as described in RFC 2327 [6], section 6, the syntax being then: be used as described in RFC 2327 [6], section 6, the syntax being
then:
a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>] a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]
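As an illustration only, the following Python sketch splits an a=fmtp
line of the above form into its parameter=value pairs and checks the
rule from section 4.1 that SizeLength and ConstantSize are not both
present. The function names, the payload type 96 and the parameter
values in the usage are assumptions made for this sketch. Folding
parameter names to lower case and stripping the quotes around a value
are likewise choices of the sketch, not requirements stated here.

```python
def parse_fmtp(line):
    """Split an SDP a=fmtp line into (format, {name: value})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    # <format> is separated from the parameter list by a space.
    fmt, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        # Lower-casing names and stripping quotes are sketch conventions.
        params[name.strip().lower()] = value.strip().strip('"')
    return fmt, params


def check_size_parameters(params):
    """Reject a parameter set carrying both SizeLength and ConstantSize."""
    if "sizelength" in params and "constantsize" in params:
        raise ValueError("SizeLength and ConstantSize MUST NOT both be present")


# Hypothetical example line; the values are made up for illustration.
fmt, params = parse_fmtp(
    'a=fmtp:96 mode=AAC-hbr; SizeLength=13; IndexLength=3; config=""')
check_size_parameters(params)
```

A receiver built along these lines would keep any parameter it does
not recognize in the dictionary and ignore it, in line with the
requirement that receivers tolerate additional parameters.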
5. Security Considerations 5. Security Considerations
RTP packets using the payload format defined in this specification RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP are subject to the security considerations discussed in the RTP
specification [2]. This implies that confidentiality of the media specification [2]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be with this payload format is applied end-to-end, encryption may be
skipping to change at page 28, line 40 skipping to change at page 32, line 41
However, it is possible to inject non-compliant MPEG streams (Audio, However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers, Video, and Systems) to overload the receiver/decoder's buffers,
which might compromise the functionality of the receiver or even which might compromise the functionality of the receiver or even
crash it. This is especially true for end-to-end systems like MPEG crash it. This is especially true for end-to-end systems like MPEG
where the buffer models are precisely defined. where the buffer models are precisely defined.
MPEG-4 Systems supports stream types including commands that are MPEG-4 Systems supports stream types including commands that are
executed on the terminal like OD commands, BIFS commands, etc. and executed on the terminal like OD commands, BIFS commands, etc. and
programmatic content like MPEG-J (Java(TM) Byte Code) and programmatic content like MPEG-J (Java(TM) Byte Code) and
ECMAScript. It is possible to use one or more of the above in a ECMAScript. It is possible to use one or more of the above in a
manner non-compliant to MPEG to crash or temporarily make the manner non-compliant to MPEG to crash the receiver or make it
receiver unavailable. temporarily unavailable. Senders that transport MPEG-4 content
SHOULD ensure that such content is MPEG compliant, as defined in the
Senders SHOULD ensure that packet loss does not cause severe compliance part of ISO/IEC 14496 [1]. Receivers that support MPEG-4
problems in application execution when the packet carries OD content should prevent malfunctioning of the receiver in case of
commands, BIFS commands, or programmatic content such as MPEG-J and non MPEG compliant content.
ECMAScript. For example, the reliability can be improved by
re-transmission, or by using the carousel mechanism as defined by
MPEG in ISO/IEC 14496-1, while observing the general congestion
control principles. When such measures are deemed unsufficiently
adequate, instead of this payload format applications SHOULD use
more reliable means to transport the information, for example by
applying an FEC scheme for RTP (such as in RFC 2733), or by using
RTP over TCP (such as in RFC 2326, section 10.12), while giving due
consideration to congestion control. For a general description of
methods to repair streaming media see RFC 2354.
Authentication mechanisms can be used to validate the sender and Authentication mechanisms can be used to validate the sender and
the data to prevent security problems due to non-compliant malignant the data to prevent security problems due to non-compliant malignant
MPEG-4 streams. MPEG-4 streams.
In ISO/IEC 14496-1 a security model is defined for MPEG-4 Systems In ISO/IEC 14496-1 a security model is defined for MPEG-4 Systems
streams carrying MPEG-J access units which comprise Java(TM) classes streams carrying MPEG-J access units which comprise Java(TM) classes
and objects. MPEG-J defines a set of Java APIs and a secure and objects. MPEG-J defines a set of Java APIs and a secure
execution model. MPEG-J content can call this set of APIs and execution model. MPEG-J content can call this set of APIs and
Java(TM) methods from a set of Java packages supported in the Java(TM) methods from a set of Java packages supported in the
skipping to change at page 29, line 28 skipping to change at page 33, line 24
Receivers can implement intelligent filters to validate the buffer Receivers can implement intelligent filters to validate the buffer
requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J, requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
ECMAScript) commands in the streams. However, this can increase the ECMAScript) commands in the streams. However, this can increase the
complexity significantly. complexity significantly.
6. Acknowledgements 6. Acknowledgements
This document evolved through several revisions thanks to This document evolved through several revisions thanks to
contributions by people from the ISMA forum, from the IETF AVT contributions by people from the ISMA forum, from the IETF AVT
Working Group and from the 4-on-IP ad-hoc group within MPEG. The Working Group and from the 4-on-IP ad-hoc group within MPEG. The
authors wish to thank all involved people, and in particular John authors wish to thank all involved people, and in particular Andrea
Lazarro, Alex MacAulay, Bill May, Colin Perkins, Dorairaj V and Basso, Stephen Casner, M. Reha Civanlar, Carsten Herpel, John
Stephan Wenger for their valuable comments and support. Lazzaro, Zvi Lifshitz, Young-kwon Lim, Alex MacAulay, Bill May,
Colin Perkins, Dorairaj V and Stephan Wenger for their valuable
comments and support.
7. References 7. References
[1] ISO/IEC International Standard 14496 (MPEG-4); "Information [1] ISO/IEC International Standard 14496 (MPEG-4); "Information
technology - Coding of audio-visual objects", January 2000 technology - Coding of audio-visual objects", January 2000
[2] Schulzrinne, Casner, Frederick, Jacobson, "RTP: A Transport [2] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
Protocol for Real Time Applications", RFC 1889, Internet Transport Protocol for Real Time Applications", RFC 1889, Internet
Engineering Task Force, January 1996. Engineering Task Force, January 1996.
[3] S. Bradner, "Key words for use in RFCs to Indicate Requirement [3] S. Bradner, "Key words for use in RFCs to Indicate Requirement
Levels", RFC 2119, March 1997. Levels", RFC 2119, March 1997.
[4] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, "RTP payload [4] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, "RTP payload
format for MPEG1/MPEG2 Video", RFC 2250, January 1998. format for MPEG1/MPEG2 Video", RFC 2250, January 1998.
[5] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP [5] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP
payload format for MPEG-4 Audio/Visual streams", RFC 3016. payload format for MPEG-4 Audio/Visual streams", RFC 3016.
[6] Handley, Jacobson, "SDP: Session Description Protocol", [6] M. Handley, V. Jacobson, "SDP: Session Description Protocol",
RFC 2327, Internet Engineering Task Force, April 1998. RFC 2327, Internet Engineering Task Force, April 1998.
8. Author Adresses [7] M. Handley, C. Perkins, E. Whelan, "SAP: Session Announcement
Protocol", RFC 2974, Internet Engineering Task Force, October 2000.
[8] H. Schulzrinne, A. Rao, R. Lanphier, "RTSP: Real Time Streaming
Protocol", RFC 2326, Internet Engineering Task Force, April 1998.
8. Author Addresses
Jan van der Meer Jan van der Meer
Philips Digital Networks Philips Digital Networks
Cederlaan 4 Cederlaan 4
5600 JB Eindhoven 5600 JB Eindhoven
Netherlands Netherlands
Email : jan.vandermeer@philips.com Email : jan.vandermeer@philips.com
David Mackie David Mackie
Cisco Systems Inc. Apple Computer, Inc.
170 West Tasman Dr. One Infinite Loop, MS:302-2LF
San Jose, CA 95134 Cupertino CA 95014
Email: dmackie@cisco.com Email: dmackie@apple.com
Viswanathan Swaminathan Viswanathan Swaminathan
Sun Microsystems Inc. Sun Microsystems Inc.
901 San Antonio Road, M/S UMPK15-214 901 San Antonio Road, M/S UMPK15-214
Palo Alto, CA 94303 Palo Alto, CA 94303
Email: viswanathan.swaminathan@sun.com Email: viswanathan.swaminathan@sun.com
David Singer David Singer
Apple Computer, Inc. Apple Computer, Inc.
One Infinite Loop, MS:302-3MT One Infinite Loop, MS:302-3MT
skipping to change at page 30, line 38 skipping to change at page 35, line 4
One Infinite Loop, MS:302-3MT One Infinite Loop, MS:302-3MT
Cupertino CA 95014 Cupertino CA 95014
Email: singer@apple.com Email: singer@apple.com
Philippe Gentric Philippe Gentric
Philips Digital Networks, MP4Net Philips Digital Networks, MP4Net
51 rue Carnot 51 rue Carnot
92156 Suresnes 92156 Suresnes
France France
e-mail: philippe.gentric@philips.com e-mail: philippe.gentric@philips.com
Full Copyright Statement Full Copyright Statement
"Copyright (C) The Internet Society (date). All Rights Reserved. Copyright (C) The Internet Society (December 2002). All Rights
Reserved.
This document and translations of it may be copied and furnished to This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain others, and derivative works that comment on or otherwise explain
it or assist in its implementation may be prepared, copied, it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without restriction published and distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice and this of any kind, provided that the above copyright notice and this
paragraph are included on all such copies and derivative works. paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such However, this document itself may not be modified in any way, such
as by removing the copyright notice or references to the Internet as by removing the copyright notice or references to the Internet
Society or other Internet organizations, except as needed for the Society or other Internet organizations, except as needed for the
purpose of developing Internet standards in which case the purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards process procedures for copyrights defined in the Internet Standards process
MUST be followed, or as required to translate it into. MUST be followed, or as required to translate it into languages
other than English.
The limited permissions granted above are perpetual and will
not be revoked by the Internet Society or its successors or
assigns.
This document and the information contained herein is provided on
an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
APPENDIX: Usage of this payload format APPENDIX: Usage of this payload format
Appendix A. Examples of delay analysis with interleave Appendix A. Interleave analysis
A.1 Group interleave A.1 Introduction
An example of regular interleave is when packets are formed into In this appendix interleaving issues are discussed. Some general
groups. If the number of packets in a group is N, for example notes are provided on de-interleaving and error concealment, while
packet 0 could contain frame 0, frame N, frame 2N, and so on; a number of interleaving patterns are examined, in particular
packet 1 could contain frame 1, frame 1+N, 1+2N, and so on. The for determining the maximum displacement in time and the size of
AU-Index field is used to document the sequence of the packet the de-interleave buffer. In these examples, the maximum
within the group (or the first frame in the packet, which is the displacement is cited in terms of an access unit count, for ease of
same thing in this scheme), and all the AU-Index-delta fields reading. In actual streams, it is signaled in units of the RTP
contain N-1. time stamp clock.
Because each subsequent frame in the packet has a higher time stamp A.2 De-interleaving and error concealment
than the preceding frame, receivers can tell when a new interleave
group is starting, by noting that the computed time stamp of the
first frame in a packet is later than any previously computed time
stamp. In that case the time stamps of all frames contained in the
packet are higher than any previously computed time stamp, and
hence interleaving with any previously received frame is not
possible. In conclusion, a new group has been started.
If the group size is 3, then packets can be formed as follows: This appendix does not describe any details on de-interleaving and
error concealment, as the control of the AU decoding and error
concealment process has little to do with interleaving. If the
next AU to be decoded is present and there is sufficient storage
available for the decoded AU, then decode it now. If not, wait.
When the decoding deadline is reached (i.e., the time when decoding
must begin in order to be completed by the time the AU is to be
presented), or if the decoder is some hardware that presents a
constant delay between initiation of decoding of an AU and
presentation of that AU, then decoding must begin at that deadline
time.
Packet Time stamp Frame Numbers AU-Index, AU-Index-delta If the next AU to be decoded is not present when the decoding
0 T[0] 0, 3, 6 0, 2, 2 deadline is reached, then that AU is lost so the receiver must take
1 T[1] 1, 4, 7 0, 2, 2 whatever error concealment measures are deemed appropriate. The
2 T[2] 2, 5, 8 0, 2, 2 playout delay may need to be adjusted at that point (especially if
3 T[9] 9,12,15 0, 2, 2 other AUs have also missed their deadline recently). Or, if it
was a momentary delay, and maintaining the latency is important,
then the receiver should minimize the glitch and continue processing
with the next AU.
In this case, the receiver would have to buffer 4 frames at least A.3 Simple Group interleave
from packets 0 and 1, and can flush all frames when packet 2
arrives. (Frame 0 can be flushed as packet 0 arrives, since it is
the earliest frame we hold, and likewise frame 1 from packet 1; we
are therefore holding 3,4,6,7 until packet 2 arrives).
If there is loss, then the receiver may wait longer than is strictly A.3.1 Introduction
necessary before it emits frames. For example, say packet 1 is lost
from the above example. Packet 0 allows frame 0 to be emitted, and An example of regular interleave is when packets are formed into
then packet 2 arrives, allowing us to notice the loss of frame 1, groups. If the 'stride' of the interleave (the distance between
and emit frame 2 and 3. Then it is not until the arrival of packet 3 interleaved AUs) is N, packet 0 could contain AU(0), AU(N), AU(2N),
(which has a time-stamp beyond the times of all the frames seen so and so on; packet 1 could contain AU(1), AU(1+N), AU(1+2N), and so
far), that we can finish dealing with the loss, even though the on. If there are M access units in a packet, then there are M*N
first group has, in fact, ended. (This is in contrast to schemes access units in the group.
which signal the group size explicitly; if the receiver knows that
this is packet 3 of 3, then even if 2 of 3 is missing, it can An example with N=M=3 follows; note that this is the same example
de-interleave this group without waiting for the next one to start). as given in section 2.5:
Packet Time stamp Carried AUs AU-Index, AU-Index-delta
P(0) T[0] 0, 3, 6 0, 2, 2
P(1) T[1] 1, 4, 7 0, 2, 2
P(2) T[2] 2, 5, 8 0, 2, 2
P(3) T[9] 9,12,15 0, 2, 2
In the above example the AU-Index is coded with the value 0, as In the above example the AU-Index is coded with the value 0, as
required for the modes defined in this document. To reconstruct the required for the modes defined in this document. The position of
original order, the RTP time stamp and the AU-Index-delta are used. the first AU of each packet within the group is defined by the RTP
See also section 3.2.3.2. time stamp, while the AU-Index-delta field indicates the position
of subsequent AUs relative to the first AU in the packet. All
AU-Index-delta fields are coded with the value N-1, equal to 2 in
this example. Hence the RTP time stamp and the AU-Index-delta are
used to reconstruct the original order. See also section 3.2.3.2.
A.3.2 Determining the de-interleave buffer size
For the regular pattern as in this example, figure 6 in section
3.2.3.3 shows that the de-interleave buffer size is equal to 4 AU
sizes.
A.3.3 Determining the maximum displacement
For the regular pattern as in this example, figure 7 in section 3.2.3.3
shows that the value of the maxDisplacement equals 5 AU periods.
A.4 More subtle group interleave
A.4.1 Introduction
Another example of forming packets with group interleave is given Another example of forming packets with group interleave is given
below. In this example the packets are formed such that the loss of below. In this example the packets are formed such that the loss of
two subsequent RPT packets does not cause the loss of two subsequent two subsequent RTP packets does not cause the loss of two subsequent
audio frames. Note that in this example the RTP time stamps of AUs. Note that in this example the RTP time stamps of packet 3 and
packets 3 and 4 are earlier than the RTP time stamps of packets 1 packet 4 are earlier than the RTP time stamps of packets 1 and 2,
and 2, respectively. respectively.
Packet Time stamp Frame Numbers AU-Index, AU-Index-delta Packet Time stamp Carried AUs AU-Index, AU-Index-delta
0 T[0] 0, 5, 10, 15 0, 5, 5, 5 0 T[0] 0, 5 0, 5
1 T[2] 2, 7, 12, 17 0, 5, 5, 5 1 T[2] 2, 7 0, 5
2 T[4] 4, 9, 14, 19 0, 5, 5, 5 2 T[4] 4, 9 0, 5
3 T[1] 1, 6, 11, 16 0, 5, 5, 5 3 T[1] 1, 6 0, 5
4 T[3] 3, 8, 13, 18 0, 5, 5, 5 4 T[3] 3, 8 0, 5
5 T[20] 20, 25, 30, 35 0, 5, 5, 5 5 T[10] 10, 15 0, 5
and so on .. and so on ..
A.2 Continuous interleave In this example the AU-Index is coded with the value 0, as required
for the modes defined in this document. To reconstruct the original
order, the RTP time stamp and the AU-Index-delta (coded with the
value 5) are used. See also section 3.2.3.2.
In continuous interleave, once the scheme is 'primed', the number of A.4.2 Determining the de-interleave buffer size
frames in a packet exceeds the 'stride' (the distance between them).
This shortens the buffering needed, smooths the data-flow, and gives From figure 8 it can be determined that at most 5 "early" AUs
slightly larger packets -- and thus lower overhead -- for the same are to be stored. If the AUs are of constant size, then this value
interleave. For example, here is a continuous interleave also over equals 5 times the AU size.
a stride of 3 frames, but with 4 frames per packet, for a run of 20
frames. This shows both how the scheme 'starts up' and how it
                       +--+--+--+--+--+--+--+--+--+--+
Interleaved AUs        | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                       +--+--+--+--+--+--+--+--+--+--+
                         -  -  5  -  5  -  2  7  4  9
                                     7     4  9  5
Received "early" AUs                       5     6
                                           7     7
                                           9     9
Figure 8: Storage of "early" AUs in the de-interleave buffer per
interleaved AU.
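The count of "early" AUs can be reproduced with a short Python
sketch. It assumes, as the figure suggests, that an AU is "early"
with respect to the current AU when it was received before the
current AU but has a higher AU number. The function name is
hypothetical, and AU numbers stand in for RTP time stamps.

```python
def early_aus(order):
    """For each AU in arrival order, list the AUs received earlier
    that have a higher AU number and therefore must be stored."""
    stored = []
    for i, au in enumerate(order):
        stored.append(sorted(a for a in order[:i] if a > au))
    return stored


# Arrival order of the pattern in A.4: packets 0..4 carrying 2 AUs each.
subtle = [0, 5, 2, 7, 4, 9, 1, 6, 3, 8]
per_au = early_aus(subtle)
buffer_in_aus = max(len(s) for s in per_au)
```

For constant-size AUs, buffer_in_aus multiplied by the AU size gives
the de-interleave buffer size in octets. The same function applied
to the arrival order 0, 3, 6, 1, 4, 7, 2, 5, 8 of the simple group
interleave yields 4.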
A.4.3 Determining the maximum displacement
From figure 9 it can be seen that maxDisplacement has
a value of 8 AU periods.
                              +--+--+--+--+--+--+--+--+--+--+
Interleaved AUs               | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                              +--+--+--+--+--+--+--+--+--+--+
Earliest not yet received AU    -  1  1  1  1  1  -  3  -  -
Figure 9: The earliest not yet received AU for each AU in the
interleaving pattern.
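A minimal Python sketch of this calculation follows. It assumes that
the displacement of an AU is the difference between its own AU
number and the number of the earliest AU that has not yet been
received at that point, counted in AU periods rather than in units
of the RTP time stamp clock. The function name is hypothetical.

```python
def max_displacement(order):
    """Largest distance, in AU periods, between a received AU and
    the earliest lower-numbered AU that still has to arrive."""
    worst = 0
    for i, au in enumerate(order):
        missing = [a for a in order[i + 1:] if a < au]
        if missing:
            worst = max(worst, au - min(missing))
    return worst


subtle = [0, 5, 2, 7, 4, 9, 1, 6, 3, 8]
simple = [0, 3, 6, 1, 4, 7, 2, 5, 8]
displacement_subtle = max_displacement(subtle)
displacement_simple = max_displacement(simple)
```

For the pattern of this section the maximum is reached at AU(9),
which arrives while AU(1) is still outstanding. To obtain the value
to signal, the result would be multiplied by the AU duration in RTP
time stamp units.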
A.5 Continuous interleave
A.5.1 Introduction
In continuous interleave, once the scheme is 'primed', the number
of AUs in a packet exceeds the 'stride' (the distance between
them). This shortens the buffering needed, smooths the data-flow,
and gives slightly larger packets -- and thus lower overhead -- for
the same interleave. For example, here is a continuous interleave
also over a stride of 3 AUs, but with 4 AUs per packet, for a run
of 20 AUs. This shows both how the scheme 'starts up' and how it
finishes. finishes.
Packet Time-stamp Frame Numbers AU-Index, AU-Index-delta Packet Time-stamp Carried AUs AU-Index, AU-Index-delta
0 T[0] 0 0 0 T[0] 0 0
1 T[1] 1 4 0 2 1 T[1] 1 4 0 2
2 T[2] 2 5 8 0 2 2 2 T[2] 2 5 8 0 2 2
3 T[3] 3 6 9 12 0 2 2 2 3 T[3] 3 6 9 12 0 2 2 2
4 T[7] 7 10 13 16 0 2 2 2 4 T[7] 7 10 13 16 0 2 2 2
5 T[11] 11 14 17 20 0 2 2 2 5 T[11] 11 14 17 20 0 2 2 2
6 T[15] 15 18 0 2 6 T[15] 15 18 0 2
7 T[19] 19 0 7 T[19] 19 0
In this case, the receiver has to buffer only 3 frames, not 4. Say Also in this example the AU-Index is coded with the value 0, as
we are waiting for packet 4. We can flush frames 0, 1, 2, 3, 4, 5,
6; we are holding therefore 8, 9, 12. Packet 4 arrives, allowing
us to emit 7,8,9,10, and we are holding 12,13,16. Each arriving
packet contains 4 frames, and allows 4 frames to be flushed.
In the above example the AU-Index is coded with the value 0, as
required for the modes defined in this document. To reconstruct the required for the modes defined in this document. To reconstruct the
original order, the RTP time stamp and the AU-Index-delta are used. original order, the RTP time stamp and the AU-Index-delta (coded
See also 3.2.3.2. with the value 2) are used. See also 3.2.3.2. Note that this
example has RTP time-stamps in increasing order.
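The packet table above can be regenerated with the following sketch. The construction rule (each packet starts 4 AUs after the previous one and carries every third AU) is inferred from the table and is not stated in the draft. The AU number of the first AU stands in for the RTP time stamp, which assumes AUs of constant duration. All function names are illustrative.

```python
def continuous_interleave(last_au, stride=3, per_packet=4):
    """Build the packets of a continuous interleave for AUs 0..last_au.
    AUs outside that range are dropped, which yields the shorter
    packets at start-up and at the end."""
    packets = []
    k = 0
    while True:
        first = per_packet * k - stride * (per_packet - 1)
        if first > last_au:
            break
        aus = [first + stride * j for j in range(per_packet)]
        aus = [a for a in aus if 0 <= a <= last_au]
        if aus:
            packets.append(aus)
        k += 1
    return packets


def header_fields(aus):
    """AU-Index (always 0) followed by one AU-Index-delta per further AU."""
    return [0] + [b - a - 1 for a, b in zip(aus, aus[1:])]


def recover_aus(first_au, fields):
    """Receiver side: rebuild AU numbers from the first AU (derived from
    the time stamp) and the AU-Index-delta values."""
    aus = [first_au]
    for delta in fields[1:]:
        aus.append(aus[-1] + delta + 1)
    return aus


packets = continuous_interleave(20)
table = [(aus[0], header_fields(aus)) for aus in packets]
for number, (first, fields) in enumerate(table):
    print(number, f"T[{first}]", packets[number], fields)

# Sorting the recovered AU numbers restores the original order.
recovered = sorted(a for first, fields in table
                   for a in recover_aus(first, fields))
```

With a stride of 3, every AU-Index-delta is 2, matching the table.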
If there is loss, again the receiver has to wait to emit the erasure A.5.2 Determining the de-interleave buffer size
frames. In this case, say packet 3 is lost. We were holding frames
4, 5, and 8. On the arrival of packet 4, (time-stamp of frame 7), For this example the de-interleave buffer size can be derived from
we now know frame 3 was lost, we can emit frames 4,5, and we know 6 figure 10. The maximum number of "early" AUs is three. If the AUs
must be lost, and emit 7, which is in the packet that arrived. Then are of constant size, then this value equals 3 times the AU size.
on the arrival of packet 5 (time-stamp 11) we can emit 8, indicate Compared to the example in A.2, for constant size AUs the
loss of 9, and emit 10 and 11. Finally, the arrival of packet 6 de-interleave buffer size is reduced from 4 to 3 times the AU size,
(time-stamp 15) indicates that 12 must be lost; we have now while maintaining the same 'stride'.
detected all the lost frames.
                      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
Interleaved AUs       | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
                        -  -  -  4  -  -  4  8  -  -  8 12  -  -
                                          5           9
Received "early" AUs                      8          12
Figure 10: Storage of "early" AUs in the de-interleave buffer per
interleaved AU.
A.5.3 Determining the maximum displacement
For this example the maxDisplacement has a value of 5 AU periods.
See figure 11.
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
Interleaved AUs   | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
Earliest not yet
received AU         -  -  2  -  3  3  -  -  7  7  -  - 11 11
Figure 11: The earliest not yet received AU for each AU in the
interleaving pattern.
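Both values for the continuous interleave can be checked in one pass over the transmission order taken from the packet table in A.5.1. This is a sketch under the same reading of figures 10 and 11 as above: an AU is "early" when a lower-numbered AU is still to arrive, and displacement is measured in AU periods. The function name interleave_measures is hypothetical.

```python
def interleave_measures(order):
    """Return (maximum number of early AUs, maximum displacement in AU
    periods) for a list of AU numbers in transmission order."""
    max_early = 0
    max_disp = 0
    for pos, au in enumerate(order):
        # AUs received before this one that belong after it.
        early = [a for a in order[:pos] if a > au]
        # AUs that belong before this one but are still to arrive.
        missing = [a for a in order[pos + 1:] if a < au]
        max_early = max(max_early, len(early))
        if missing:
            max_disp = max(max_disp, au - min(missing))
    return max_early, max_disp


# Transmission order of the continuous interleave, packets 0 to 7.
order = [0, 1, 4, 2, 5, 8, 3, 6, 9, 12, 7, 10, 13, 16,
         11, 14, 17, 20, 15, 18, 19]
buffer_aus, displacement = interleave_measures(order)
print("early AUs to buffer:", buffer_aus)
print("maximum displacement:", displacement, "AU periods")
```

The result is 3 early AUs and a displacement of 5 AU periods, in line with figures 10 and 11.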