draft-ietf-avt-mpeg4-simple-06.txt   draft-ietf-avt-mpeg4-simple-07.txt 
skipping to change at page 1, line 13 skipping to change at page 1, line 13
Internet Draft Philips Electronics Internet Draft Philips Electronics
D. Mackie D. Mackie
Apple Computer Apple Computer
V. Swaminathan V. Swaminathan
Sun Microsystems Inc. Sun Microsystems Inc.
D. Singer D. Singer
Apple Computer Apple Computer
P. Gentric P. Gentric
Philips Electronics Philips Electronics
December 2002 February 2003
Expires June 2003 Expires August 2003
Document: draft-ietf-avt-mpeg4-simple-06.txt Document: draft-ietf-avt-mpeg4-simple-07.txt
Transport of MPEG-4 Elementary Streams RTP Payload Format for Transport of MPEG-4 Elementary Streams
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of section 10 of RFC 2026. all provisions of section 10 of RFC 2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of Drafts. Internet-Drafts are draft documents valid for a maximum of
skipping to change at page 2, line 27 skipping to change at page 2, line 27
2.9. Carriage of auxiliary information . . . . . . . . . . . . 9 2.9. Carriage of auxiliary information . . . . . . . . . . . . 9
2.10. MIME format parameters and configuring conditional field . 9 2.10. MIME format parameters and configuring conditional field . 9
2.11. Global structure of payload format . . . . . . . . . . . . 9 2.11. Global structure of payload format . . . . . . . . . . . . 9
2.12. Modes to transport MPEG-4 streams . . . . . . . . . . . . 10 2.12. Modes to transport MPEG-4 streams . . . . . . . . . . . . 10
2.13. Alignment with RFC 3016 . . . . . . . . . . . . . . . . . 10 2.13. Alignment with RFC 3016 . . . . . . . . . . . . . . . . . 10
3. Payload format . . . . . . . . . . . . . . . . . . . . . . . 11 3. Payload format . . . . . . . . . . . . . . . . . . . . . . . 11
3.1. Usage of RTP header fields and RTCP . . . . . . . . . . . 11 3.1. Usage of RTP header fields and RTCP . . . . . . . . . . . 11
3.2. RTP payload structure . . . . . . . . . . . . . . . . . . 12 3.2. RTP payload structure . . . . . . . . . . . . . . . . . . 12
3.2.1. The AU Header Section . . . . . . . . . . . . . . . . . 12 3.2.1. The AU Header Section . . . . . . . . . . . . . . . . . 12
3.2.1.1. The AU-header . . . . . . . . . . . . . . . . . . . . 12 3.2.1.1. The AU-header . . . . . . . . . . . . . . . . . . . . 12
3.2.2. The Auxiliary Section . . . . . . . . . . . . . . . . . 14 3.2.2. The Auxiliary Section . . . . . . . . . . . . . . . . . 15
3.2.3. The Access Unit Data Section . . . . . . . . . . . . . . 15 3.2.3. The Access Unit Data Section . . . . . . . . . . . . . . 15
3.2.3.1. Fragmentation . . . . . . . . . . . . . . . . . . . . 16 3.2.3.1. Fragmentation . . . . . . . . . . . . . . . . . . . . 16
3.2.3.2. Interleaving . . . . . . . . . . . . . . . . . . . . . 16 3.2.3.2. Interleaving . . . . . . . . . . . . . . . . . . . . . 16
3.2.3.3. Constraints for interleaving . . . . . . . . . . . . . 17 3.2.3.3. Constraints for interleaving . . . . . . . . . . . . . 18
3.2.3.4. Crucial and non-crucial AUs with MPEG-4 System data . 20 3.2.3.4. Crucial and non-crucial AUs with MPEG-4 System data . 20
3.3. Usage of this specification . . . . . . . . . . . . . . . 22 3.3. Usage of this specification . . . . . . . . . . . . . . . 22
3.3.1. General . . . . . . . . . . . . . . . . . . . . . . . . 22 3.3.1. General . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3.2. The generic mode . . . . . . . . . . . . . . . . . . . . 22 3.3.2. The generic mode . . . . . . . . . . . . . . . . . . . . 22
3.3.3. Constant bit rate CELP . . . . . . . . . . . . . . . . . 23 3.3.3. Constant bit rate CELP . . . . . . . . . . . . . . . . . 23
3.3.4. Variable bit rate CELP . . . . . . . . . . . . . . . . . 23 3.3.4. Variable bit rate CELP . . . . . . . . . . . . . . . . . 23
3.3.5. Low bit rate AAC . . . . . . . . . . . . . . . . . . . . 24 3.3.5. Low bit rate AAC . . . . . . . . . . . . . . . . . . . . 24
3.3.6. High bit rate AAC . . . . . . . . . . . . . . . . . . . 25 3.3.6. High bit rate AAC . . . . . . . . . . . . . . . . . . . 25
3.3.7. Additional modes . . . . . . . . . . . . . . . . . . . . 26 3.3.7. Additional modes . . . . . . . . . . . . . . . . . . . . 26
4. IANA considerations . . . . . . . . . . . . . . . . . . . . 27 4. IANA considerations . . . . . . . . . . . . . . . . . . . . 27
skipping to change at page 3, line 16 skipping to change at page 3, line 16
A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . 37 A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . 37
A.2 De-interleaving and error concealment . . . . . . . . . 37 A.2 De-interleaving and error concealment . . . . . . . . . 37
A.3 Simple Group interleave . . . . . . . . . . . . . . . . 37 A.3 Simple Group interleave . . . . . . . . . . . . . . . . 37
A.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . 37 A.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . 37
A.3.2 Determining the de-interleave buffer size . . . . . . 38 A.3.2 Determining the de-interleave buffer size . . . . . . 38
A.3.3 Determining the maximum displacement . . . . . . . . . 38 A.3.3 Determining the maximum displacement . . . . . . . . . 38
A.4 More subtle group interleave . . . . . . . . . . . . . . 38 A.4 More subtle group interleave . . . . . . . . . . . . . . 38
A.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . 38 A.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . 38
A.4.2 Determining the de-interleave buffer size . . . . . . 39 A.4.2 Determining the de-interleave buffer size . . . . . . 39
A.4.3 Determining the maximum displacement . . . . . . . . . 39 A.4.3 Determining the maximum displacement . . . . . . . . . 39
A.5 Continuous interleave . . . . . . . . . . . . . . . . . 39 A.5 Continuous interleave . . . . . . . . . . . . . . . . . 40
A.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . 39 A.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . 40
A.5.2 Determining the de-interleave buffer size . . . . . . 40 A.5.2 Determining the de-interleave buffer size . . . . . . 40
A.5.3 Determining the maximum displacement . . . . . . . . . 41 A.5.3 Determining the maximum displacement . . . . . . . . . 41
1. Introduction 1. Introduction
The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29 The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4 that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
standards [1]. The MPEG-4 standard specifies compression of standards [1]. The MPEG-4 standard specifies compression of
audio-visual data into for example an audio or video elementary audio-visual data into for example an audio or video elementary
stream. In the MPEG-4 standard, these streams take the form of stream. In the MPEG-4 standard, these streams take the form of
skipping to change at page 6, line 18 skipping to change at page 6, line 18
With this payload format a single MPEG-4 elementary stream can be With this payload format a single MPEG-4 elementary stream can be
transported. Information on the type of MPEG-4 stream carried in transported. Information on the type of MPEG-4 stream carried in
the payload is conveyed by MIME format parameters, for example in the payload is conveyed by MIME format parameters, for example in
an SDP [5] message or by other means (see section 4). These MIME an SDP [5] message or by other means (see section 4). These MIME
format parameters specify the configuration of the payload. To format parameters specify the configuration of the payload. To
allow for simplified and dedicated receivers, a MIME format allow for simplified and dedicated receivers, a MIME format
parameter is available to signal a specific mode of using this parameter is available to signal a specific mode of using this
payload. A mode definition MAY include the type of MPEG-4 payload. A mode definition MAY include the type of MPEG-4
elementary stream as well as the applied configuration, so as to elementary stream as well as the applied configuration, so as to
avoid the need in receivers to parse all MIME format parameters. avoid the need for receivers to parse all MIME format parameters.
The applied mode MUST be signaled. The applied mode MUST be signaled.
2.2 MPEG Access Units 2.2 MPEG Access Units
For carriage of compressed audio-visual data MPEG defines Access For carriage of compressed audio-visual data MPEG defines Access
Units. An MPEG Access Unit (AU) is the smallest data entity to Units. An MPEG Access Unit (AU) is the smallest data entity to
which timing information is attributed. In case of audio an Access which timing information is attributed. In case of audio an Access
Unit may represent an audio frame and in case of video a picture. Unit may represent an audio frame and in case of video a picture.
MPEG Access Units are by definition octet-aligned. If for example MPEG Access Units are by definition octet-aligned. If for example
an audio frame is not octet-aligned, up to 7 zero-padding bits MUST an audio frame is not octet-aligned, up to 7 zero-padding bits MUST
skipping to change at page 6, line 48 skipping to change at page 6, line 48
it, up to, but not including the startcode indicating the start of it, up to, but not including the startcode indicating the start of
a new video stream or the next Access Unit. a new video stream or the next Access Unit.
2.3 Concatenation of Access Units 2.3 Concatenation of Access Units
Frequently it is possible to carry multiple Access Units in one RTP Frequently it is possible to carry multiple Access Units in one RTP
packet. This is particularly useful for audio; for example, when packet. This is particularly useful for audio; for example, when
AAC is used for encoding of a stereo signal at 64 kbits/sec, AAC AAC is used for encoding of a stereo signal at 64 kbits/sec, AAC
frames contain on average approximately 200 octets. On a LAN with a frames contain on average approximately 200 octets. On a LAN with a
1500 octet MTU this would allow on average 7 complete AAC frames to 1500 octet MTU this would allow on average 7 complete AAC frames to
be carried per AAC packet. be carried per RTP packet.
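The arithmetic behind the "7 complete AAC frames" figure can be sketched as follows. This is only an illustration: the 40-octet header overhead (minimum IPv4, UDP and RTP headers) and the optional per-AU header cost are assumptions added here, not values stated in the draft.

```python
def aus_per_packet(mtu, avg_au_size, header_overhead=40, per_au_header=0):
    """Estimate how many complete AUs of avg_au_size octets fit in one packet.

    header_overhead: assumed 20 (IPv4) + 8 (UDP) + 12 (RTP) octets.
    per_au_header:   assumed extra octets spent per AU on AU-headers.
    """
    budget = mtu - header_overhead
    if budget <= 0:
        return 0
    return budget // (avg_au_size + per_au_header)


# 1500-octet MTU, AAC frames of about 200 octets on average
print(aus_per_packet(1500, 200))  # 7
```

With these assumptions the result stays at 7 even if each AU costs two extra header octets.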
Access Units may have a fixed size in octets, but a variable size Access Units may have a fixed size in octets, but a variable size
is also possible. To facilitate parsing in case of multiple is also possible. To facilitate parsing in case of multiple
concatenated AUs in one RTP packet, the size of each AU is made concatenated AUs in one RTP packet, the size of each AU is made
known to the receiver. When concatenating in case of a constant AU known to the receiver. When concatenating in case of a constant AU
size, this size is communicated "out of band" through a MIME format size, this size is communicated "out of band" through a MIME format
parameter. When concatenating in case of variable size AUs, the RTP parameter. When concatenating in case of variable size AUs, the RTP
payload carries "in band" an AU size field for each contained AU. payload carries "in band" an AU size field for each contained AU.
In combination with the RTP payload length the size information In combination with the RTP payload length the size information
skipping to change at page 7, line 49 skipping to change at page 7, line 49
de-interleaving, the RTP sender is free to choose the interleaving de-interleaving, the RTP sender is free to choose the interleaving
pattern without propagating this information a priori to the pattern without propagating this information a priori to the
receiver(s). Indeed the sender could dynamically adjust the receiver(s). Indeed the sender could dynamically adjust the
interleaving pattern based on the Access Unit size, error rates, interleaving pattern based on the Access Unit size, error rates,
etc. The RTP receiver does not need to know the interleaving etc. The RTP receiver does not need to know the interleaving
pattern used, it only needs to extract the index information of the pattern used, it only needs to extract the index information of the
Access Unit and insert the Access Unit into the appropriate Access Unit and insert the Access Unit into the appropriate
sequence in the decoding or rendering queue. An example of sequence in the decoding or rendering queue. An example of
interleaving is given below. interleaving is given below.
Assume that an RTP packet contains 3 AUs, and that the AUs are For example, if we assume that an RTP packet contains 3 AUs, and
numbered 0, 1, 2, 3, 4, etc. If an interleaving group length of 9 is that the AUs are numbered 0, 1, 2, 3, 4, and so forth, and if an
chosen, then RTP packet(i) contains the following AU(n): interleaving group length of 9 is chosen, then RTP packet(i)
contains the following AU(n):
RTP packet(0): AU(0), AU(3), AU(6) RTP packet(0): AU(0), AU(3), AU(6)
RTP packet(1): AU(1), AU(4), AU(7) RTP packet(1): AU(1), AU(4), AU(7)
RTP packet(2): AU(2), AU(5), AU(8) RTP packet(2): AU(2), AU(5), AU(8)
RTP packet(3): AU(9), AU(12), AU(15) RTP packet(3): AU(9), AU(12), AU(15)
RTP packet(4): AU(10), AU(13), AU(16) RTP packet(4): AU(10), AU(13), AU(16) Etc.
Etc.
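The interleaving pattern listed above can be reproduced with a short sketch. The function names are hypothetical, and the generalization to other group lengths (stride equal to the number of packets per group) is an assumption; the draft leaves the choice of pattern to the sender.

```python
def interleaved_packet(i, aus_per_packet=3, group_length=9):
    """Return the AU indices carried in RTP packet(i) for a simple
    group interleave (one possible pattern, not mandated by the draft)."""
    if group_length % aus_per_packet != 0:
        raise ValueError("group_length must be a multiple of aus_per_packet")
    packets_per_group = group_length // aus_per_packet  # also the AU stride
    group, offset = divmod(i, packets_per_group)
    first = group * group_length + offset
    return [first + k * packets_per_group for k in range(aus_per_packet)]


def deinterleave(packets):
    """Receiver side: restore original order from the AU indices alone."""
    return sorted(au for packet in packets for au in packet)


for i in range(5):
    print(i, interleaved_packet(i))
```

The receiver side needs no knowledge of the pattern: sorting by index is enough, which is the point made in the text above.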
2.6 Time stamp information 2.6 Time stamp information
The RTP time stamp MUST carry the sampling instant of the first AU The RTP time stamp MUST carry the sampling instant of the first AU
(fragment) in the RTP packet. When multiple AUs are carried within (fragment) in the RTP packet. When multiple AUs are carried within
an RTP packet, the time stamps of subsequent AUs can be calculated an RTP packet, the time stamps of subsequent AUs can be calculated
if the frame period of each AU is known. For audio and video this if the frame period of each AU is known. For audio and video this
is possible if the frame rate is constant. However, in some cases is possible if the frame rate is constant. However, in some cases
it is not possible to make such calculation, for example for it is not possible to make such calculation. For example, for
variable frame rate video and for MPEG-4 BIFS streams carrying variable frame rate video, or for MPEG-4 BIFS streams carrying
composition information. To support such cases, this payload format composition information. To support such cases, this payload format
can be configured to carry a time stamp in the RTP payload for each can be configured to carry a time stamp in the RTP payload for each
contained Access Unit. A time stamp MAY be conveyed in the RTP contained Access Unit. A time stamp MAY be conveyed in the RTP
payload only for non-first AUs in the RTP packet, and SHALL NOT be payload only for non-first AUs in the RTP packet, and SHALL NOT be
conveyed for the first AU (fragment), as the time stamp for the conveyed for the first AU (fragment), as the time stamp for the
first AU in the RTP packet is carried by the RTP time stamp. first AU in the RTP packet is carried by the RTP time stamp.
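For the constant frame rate case without interleaving, the time stamps of the non-first AUs follow directly from the RTP time stamp. A minimal sketch, assuming a frame period expressed in RTP clock ticks and the 32-bit wrap of the RTP time stamp field; the function name and the example values (1024 ticks per frame) are illustrative only.

```python
def au_timestamps(rtp_timestamp, au_count, frame_period_ticks):
    """Time stamps of consecutive, non-interleaved AUs in one RTP packet.

    The first AU takes the RTP time stamp; each following AU is one
    frame period later. Values wrap at 2**32 like the RTP field.
    """
    return [(rtp_timestamp + i * frame_period_ticks) % 2**32
            for i in range(au_count)]


# Three AUs, assumed frame period of 1024 clock ticks
print(au_timestamps(90000, 3, 1024))  # [90000, 91024, 92048]
```

When the frame period is not constant this calculation is not available, which is why the payload can carry an explicit time stamp for each non-first AU.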
MPEG-4 defines two type of time stamps, the composition time stamp MPEG-4 defines two types of time stamp: the composition time stamp
(CTS) and the decoding time stamp (DTS). The CTS represents the (CTS) and the decoding time stamp (DTS). The CTS represents the
sampling instant of an AU, and hence the CTS is equivalent to the sampling instant of an AU, and hence the CTS is equivalent to the
RTP time stamp. The DTS may be used in MPEG-4 video streams that RTP time stamp. The DTS may be used in MPEG-4 video streams that
use bi-directional coding, i.e. when pictures are predicted in both use bi-directional coding, i.e. when pictures are predicted in both
forward and backward direction by using either a reference picture forward and backward direction by using either a reference picture
in the past, or a reference picture in the future. The DTS cannot in the past, or a reference picture in the future. The DTS cannot
be carried in the RTP header. In some cases the DTS can be derived be carried in the RTP header. In some cases the DTS can be derived
from the RTP time stamp using frame rate information; this requires from the RTP time stamp using frame rate information; this requires
deep parsing in the video stream, which may be considered deep parsing in the video stream, which may be considered
objectionable. But if the video frame rate is variable, the required objectionable. But if the video frame rate is variable, the required
skipping to change at page 10, line 43 skipping to change at page 10, line 40
2.13 Alignment with RFC 3016 2.13 Alignment with RFC 3016
This payload can be configured to be nearly identical to the This payload can be configured to be nearly identical to the
payload format defined in RFC 3016 [12] for the MPEG-4 video payload format defined in RFC 3016 [12] for the MPEG-4 video
configurations recommended in RFC 3016. Hence, receivers that configurations recommended in RFC 3016. Hence, receivers that
comply with RFC 3016 can decode such RTP payload, providing that comply with RFC 3016 can decode such RTP payload, providing that
additional packets containing video decoder configuration (VO, additional packets containing video decoder configuration (VO,
VOL, VOSH) are inserted in the stream, as required by RFC 3016. VOL, VOSH) are inserted in the stream, as required by RFC 3016.
Conversely, receivers that comply with the specification in this Conversely, receivers that comply with the specification in this
document should be able to decode payloads, names and parameters document SHOULD be able to decode payloads, names and parameters
defined for MPEG-4 video in RFC 3016. In this respect it is defined for MPEG-4 video in RFC 3016. In this respect it is
strongly RECOMMENDED to implement the ability to ignore "in band" strongly RECOMMENDED to implement the ability to ignore "in band"
video decoder configuration packets in the RFC 3016 payload. video decoder configuration packets in the RFC 3016 payload.
Note the "out of band" availability of the video decoder Note the "out of band" availability of the video decoder
configuration is optional in RFC 3016. To achieve maximum configuration is optional in RFC 3016. To achieve maximum
interoperability with the RTP payload format defined in this interoperability with the RTP payload format defined in this
document, applications that use RFC 3016 to transport MPEG-4 video document, applications that use RFC 3016 to transport MPEG-4 video
(part 2) are recommended to make the video decoder configuration (part 2) are recommended to make the video decoder configuration
available as a MIME parameter. available as a MIME parameter.
3. Payload Format 3. Payload Format
3.1 Usage of RTP Header Fields and RTCP 3.1 Usage of RTP Header Fields and RTCP
Payload Type (PT): The assignment of an RTP payload type for this Payload Type (PT): The assignment of an RTP payload type for this
packet format is outside the scope of this document; it is packet format is outside the scope of this document; it is
specified by the RTP profile under which this payload format is specified by the RTP profile under which this payload format is
used. used, or signaled dynamically out-of-band (e.g. using SDP).
Marker (M) bit: The M bit is set to 1 to indicate that the RTP Marker (M) bit: The M bit is set to 1 to indicate that the RTP
packet payload contains either the final fragment of a fragmented packet payload contains either the final fragment of a fragmented
Access Unit or one or more complete Access Units. Access Unit or one or more complete Access Units.
Extension (X) bit: Defined by the RTP profile used. Extension (X) bit: Defined by the RTP profile used.
Sequence Number: The RTP sequence number SHOULD be generated by the Sequence Number: The RTP sequence number SHOULD be generated by the
sender in the usual manner with a constant random offset. sender in the usual manner with a constant random offset.
skipping to change at page 12, line 42 skipping to change at page 12, line 42
bit-wise concatenated in the order in which the Access Units are bit-wise concatenated in the order in which the Access Units are
contained in the Access Unit Data Section. Hence, the n-th contained in the Access Unit Data Section. Hence, the n-th
AU-header refers to the n-th AU (fragment). If the concatenated AU-header refers to the n-th AU (fragment). If the concatenated
AU-headers consume a non-integer number of octets, up to 7 AU-headers consume a non-integer number of octets, up to 7
zero-padding bits MUST be inserted at the end in order to achieve zero-padding bits MUST be inserted at the end in order to achieve
octet-alignment of the AU Header Section. octet-alignment of the AU Header Section.
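The padding rule can be illustrated with a small calculation. This sketch counts only the bit-wise concatenated AU-headers described above; the function name and the example header lengths are assumptions for illustration.

```python
def au_headers_padding(header_bit_lengths):
    """Return (padding_bits, total_octets) for concatenated AU-headers.

    header_bit_lengths: length in bits of each AU-header, in packet order.
    Zero-padding (0..7 bits) is appended so the total is a whole number
    of octets.
    """
    total_bits = sum(header_bit_lengths)
    padding_bits = (-total_bits) % 8
    return padding_bits, (total_bits + padding_bits) // 8


# Three AU-headers of 13 bits each: 39 bits, so 1 padding bit, 5 octets
print(au_headers_padding([13, 13, 13]))  # (1, 5)
```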
3.2.1.1 The AU-header 3.2.1.1 The AU-header
Each AU-header may contain the fields given in figure 3. The length Each AU-header may contain the fields given in figure 3. The length
in bits of the above fields with the exception of the CTS-flag, the in bits of the fields, with the exception of the CTS-flag, the
DTS-flag and the RAP-flag fields is defined by MIME format DTS-flag and the RAP-flag fields is defined by MIME format
parameters; see section 4.1. If a MIME format parameter has the parameters; see section 4.1. If a MIME format parameter has the
default value of zero, then the associated field is not present. default value of zero, then the associated field is not present.
The number of bits for fields that are present and that represent The number of bits for fields that are present and that represent
the value of a parameter MUST be chosen large enough to correctly the value of a parameter MUST be chosen large enough to correctly
encode the largest value of that parameter during the session. encode the largest value of that parameter during the session.
If present, the fields MUST occur in the mutual order given in If present, the fields MUST occur in the mutual order given in
figure 3. In the general case a receiver can only discover the size figure 3. In the general case a receiver can only discover the size
of an AU-header by parsing it since the presence of the CTS-delta of an AU-header by parsing it since the presence of the CTS-delta
skipping to change at page 16, line 27 skipping to change at page 16, line 27
|-+-+-+-+-+-+-+-+ |-+-+-+-+-+-+-+-+
Figure 5: Access Unit Data Section; each AU is octet-aligned. Figure 5: Access Unit Data Section; each AU is octet-aligned.
When multiple Access Units are carried, the size of each AU MUST be When multiple Access Units are carried, the size of each AU MUST be
made available to the receiver. If the AU size is variable then the made available to the receiver. If the AU size is variable then the
size of each AU MUST be indicated in the AU-size field of the size of each AU MUST be indicated in the AU-size field of the
corresponding AU-header. However, if the AU size is constant for a corresponding AU-header. However, if the AU size is constant for a
stream, this mechanism SHOULD NOT be used, but instead the fixed stream, this mechanism SHOULD NOT be used, but instead the fixed
size SHOULD be signaled by the MIME format parameter size SHOULD be signaled by the MIME format parameter
"ConstantSize", see section 4.1. "constantSize", see section 4.1.
The absence of both AU-size in the AU-header and the ConstantSize The absence of both AU-size in the AU-header and the constantSize
MIME format parameter indicates carriage of a single AU (fragment), MIME format parameter indicates carriage of a single AU (fragment),
i.e. that a single Access Unit (fragment) is transported in each i.e. that a single Access Unit (fragment) is transported in each
RTP packet for that stream. RTP packet for that stream.
3.2.3.1 Fragmentation 3.2.3.1 Fragmentation
A packet SHALL carry either one or more complete Access Units, or A packet SHALL carry either one or more complete Access Units, or
a single fragment of an Access Unit. Fragments of the same Access a single fragment of an Access Unit. Fragments of the same Access
Unit have the same time stamp but different RTP sequence numbers. Unit have the same time stamp but different RTP sequence numbers.
The marker bit in the RTP header is 1 on the last fragment of an The marker bit in the RTP header is 1 on the last fragment of an
Access Unit, and 0 on all other fragments. Access Unit, and 0 on all other fragments.
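The fragmentation rules above can be sketched as follows. The dictionary layout and the function name are hypothetical and stand in for a real RTP packetizer; only the rules on time stamp, sequence number and marker bit come from the text.

```python
def fragment_au(au, max_payload, timestamp, first_seq):
    """Split one Access Unit into fragments, one fragment per RTP packet.

    All fragments share the same time stamp, sequence numbers increase
    by one (16-bit wrap), and the marker bit is 1 only on the last one.
    """
    if not au or max_payload <= 0:
        raise ValueError("need a non-empty AU and a positive payload size")
    chunks = [au[o:o + max_payload] for o in range(0, len(au), max_payload)]
    last = len(chunks) - 1
    return [
        {
            "seq": (first_seq + n) % 65536,
            "timestamp": timestamp,
            "marker": 1 if n == last else 0,
            "payload": chunk,
        }
        for n, chunk in enumerate(chunks)
    ]


packets = fragment_au(bytes(3000), 1400, timestamp=48000, first_seq=65535)
print([(p["seq"], p["marker"], len(p["payload"])) for p in packets])
```

An AU that fits in a single packet yields one packet with the marker bit set, consistent with the M bit definition in section 3.1.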
3.2.3.2 Interleaving 3.2.3.2 Interleaving
Access Units MAY be interleaved. Senders MAY perform interleaving. Unless prohibited by the signaled mode, a sender MAY interleave
Receivers MUST support interleaving, except if the receiver only Access Units. Receivers that are capable of receiving modes that
supports modes in which no interleaving is allowed. When Access support interleaving, MUST be able to decode interleaved Access
Units are interleaved, it SHALL be implemented using the AU-Index Units.
and the AU-Index-delta fields in the AU-header.
When a sender interleaves Access Units, then the transmitter needs When a sender interleaves Access Units, it needs to provide
to provide sufficient information to enable a receiver to sufficient information to enable a receiver to unambiguously
unambiguously reconstruct the original order, even in case of reconstruct the original order, even in case of out-of-order
out-of-order packets, packet loss or duplication. The information packets, packet loss or duplication. The information that senders
that senders need to provide depends on whether or not the Access need to provide depends on whether or not the Access Units have a
Units have a constant time duration. Access Units have a constant constant time duration. Access Units have a constant time duration,
time duration, if: if:
TS(i+1) TS(i) = constant, for any i, where TS(i+1) - TS(i) = constant, for any i, where
i indicates the index of the AU in original order i indicates the index of the AU in original order
TS(i) denotes the time stamp of AU(i) TS(i) denotes the time stamp of AU(i)
If Access Units have a constant time duration then a receiver can The MIME parameter "constantDuration" SHOULD be used to signal that
unambiguously reconstruct the original order based on the RTP Access Units have a constant time duration, see section 4.1.
time stamp, the AU-Index and the AU-Index-delta. Note that for this
purpose the AU-Index is redundant, as the RTP time stamp and the
AU-Index-delta values are sufficient for placing the AUs correctly
in time. The RTP time stamp usually provides better robustness to
large bursts of packet losses, and is therefore to be preferred.
In order to unambiguously determine the index of each AU in the
most convenient way when the AUs have a constant time duration, the
value of the time duration SHOULD be signaled by the MIME format
parameter "constantDuration", see section 4.1.
If the "constantDuration" parameter is present, then the transmitter If the "constantDuration" parameter is present, the receiver can
MUST encode the AU-Index, if present, with the value 0 and the reconstruct the original Access Unit timing based solely on the RTP
receiver MUST use the RTP time stamp to determine the index of the timestamp and AU-Index-delta. Accordingly, when transmitting Access
first AU in the RTP packet. Units of constant duration, the AU-Index, if present, MUST be set
to the value 0. Receivers of constant duration Access Units MUST
use the RTP timestamp to determine the index of the first AU in the
RTP packet. The AU-Index-delta header and the signaled
"constantDuration" are used to reconstruct AU timing.
If the "constantDuration" parameter is not present, then Access If the "constantDuration" parameter is not present, then Access
Units are assumed to have a variable duration. In this case, the Units are assumed to have a variable duration, unless the AU-Index
AU-Index is not redundant, and MUST provide the index information is present and coded with the value 0 in each RTP packet. When
transmitting Access Units of variable duration, then the
"constantDuration" parameter MUST NOT be present, and the
transmitter MUST use the AU-Index to encode the index information
required for re-ordering, and the receiver MUST use that value to required for re-ordering, and the receiver MUST use that value to
determine the index of the first AU in the RTP packet. The number determine the index of each AU in the RTP packet. The number of
of bits of the AU-Index field MUST be chosen so that valid index bits of the AU-Index field MUST be chosen so that valid index
information is provided at the applied interleaving scheme, without information is provided at the applied interleaving scheme, without
causing problems due to roll-over of the AU-Index field. For causing problems due to roll-over of the AU-Index field. In
variable duration AUs, index information is needed to reconstruct addition, the CTS-delta MUST be coded in the AU header for each
the original order and to identify missing AUs, but to place the non-first AU in the RTP packet, so that receivers can place the AUs
AUs correctly in time, for each AU the time stamp is needed. correctly in time.
Therefore, if the "constantDuration" parameter is not present, then
the CTS-delta MUST be coded in the AU header for each non-first AU
in the RTP packet.
When interleaving is applied, a de-interleave buffer is needed in When interleaving is applied, a de-interleave buffer is needed in
receivers to put the Access Units in their correct logical receivers to put the Access Units in their correct logical
consecutive decoding order. This requires the computation of the consecutive decoding order. This requires the computation of the
time stamp for each Access Unit. In case of a constant time duration time stamp for each Access Unit. In case of a constant time duration
per Access Unit, the time stamp of the i-th access unit in an RTP per Access Unit, the time stamp of the i-th access unit in an RTP
packet with RTP time stamp T is calculated as follows: packet with RTP time stamp T is calculated as follows:
Timestamp[0] = T Timestamp[0] = T
Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k] Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k]
skipping to change at page 19, line 44 skipping to change at page 19, line 44
An estimate of the size of the de-interleave buffer is found by An estimate of the size of the de-interleave buffer is found by
multiplying the maximum displacement by the maximum bit rate: multiplying the maximum displacement by the maximum bit rate:
size(de-interleave buffer) = {(maxDisplacement) * Rate(max)} / (RTP size(de-interleave buffer) = {(maxDisplacement) * Rate(max)} / (RTP
clock frequency), clock frequency),
where Rate(max) is the maximum bit-rate of the transported stream. where Rate(max) is the maximum bit-rate of the transported stream.
Note that receivers can derive Rate(max) from the MIME format Note that receivers can derive Rate(max) from the MIME format
parameters StreamType, Profile-level-id, and config. parameters streamType, profile-level-id, and config.
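The buffer size estimate above can be worked through numerically. This is a sketch under stated assumptions: maxDisplacement is taken in RTP clock ticks, Rate(max) in bits per second (so the formula yields bits), and the conversion to octets is an added convenience that the formula itself does not spell out. The example values are illustrative.

```python
import math


def deinterleave_buffer_estimate(max_displacement, rate_max, clock_hz):
    """Estimate the de-interleave buffer size.

    max_displacement: maximum displacement, in RTP clock ticks
    rate_max:         maximum bit rate of the stream, bits per second
    clock_hz:         RTP clock frequency, ticks per second
    Returns (bits, octets); octets is rounded up (assumed convention).
    """
    bits = math.ceil(max_displacement * rate_max / clock_hz)
    return bits, math.ceil(bits / 8)


# 9216 ticks at 48 kHz is 0.192 s; at 64 kbit/s that is 12288 bits
print(deinterleave_buffer_estimate(9216, 64000, 48000))  # (12288, 1536)
```

As the draft notes, this is only an estimate, and the size actually required by a given interleaving scheme may differ.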
However, this calculation estimates the size of the de-interleave However, this calculation estimates the size of the de-interleave
buffer and the really required size may differ from the calculated buffer and the really required size may differ from the calculated
value. If this calculation under-estimates the size of the value. If this calculation under-estimates the size of the
de-interleave buffer, then senders, when interleaving, MUST signal de-interleave buffer, then senders, when interleaving, MUST signal
a size of the de-interleave buffer via the MIME format parameter a size of the de-interleave buffer via the MIME format parameter
"de-interleaveBufferSize"; see section 4.1. If the calculation "de-interleaveBufferSize"; see section 4.1. If the calculation
over-estimates the size of the de-interleave buffer, then senders, over-estimates the size of the de-interleave buffer, then senders,
when interleaving, MAY signal a size of the de-interleave buffer when interleaving, MAY signal a size of the de-interleave buffer
via the MIME format parameter "de-interleaveBufferSize". via the MIME format parameter "de-interleaveBufferSize".
skipping to change at page 20, line 31 skipping to change at page 20, line 31
If the "de-interleaveBufferSize" parameter is present, then the If the "de-interleaveBufferSize" parameter is present, then the
applied buffer for de-interleaving in a receiver MUST have a size applied buffer for de-interleaving in a receiver MUST have a size
that is at least equal to the signaled size of the de-interleave that is at least equal to the signaled size of the de-interleave
buffer, else a size that is at least equal to the calculated size buffer, else a size that is at least equal to the calculated size
of the de-interleave buffer. of the de-interleave buffer.
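The receiver rule above can be expressed as a small helper. This is a minimal sketch; the function name is hypothetical, and both arguments are assumed to use the same unit.

```python
def minimum_deinterleave_buffer(calculated_size, signaled_size=None):
    """Return the smallest de-interleave buffer a receiver may apply.

    If "de-interleaveBufferSize" was signaled, that value is the minimum;
    otherwise the calculated estimate is the minimum.
    """
    if signaled_size is not None:
        return signaled_size
    return calculated_size
```

A receiver may of course allocate more than the returned value; the rule only sets a lower bound.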
No matter what interleaving scheme is used, the scheme must be No matter what interleaving scheme is used, the scheme must be
analyzed to calculate the applicable maxDisplacement value, as well analyzed to calculate the applicable maxDisplacement value, as well
as the required size of the de-interleave buffer. Senders SHOULD as the required size of the de-interleave buffer. Senders SHOULD
signal values that are not larger than the strictly required signal values that are not larger than the strictly required
values; if larger values are signalled, the receiver will buffer values; if larger values are signaled, the receiver will buffer
excessively. excessively.
Note that for low bit-rate material, the applied interleaving Note that for low bit-rate material, the applied interleaving
may make packets shorter than the MTU size. may make packets shorter than the MTU size.
3.2.3.4. Crucial and non-crucial AUs with MPEG-4 System data 3.2.3.4 Crucial and non-crucial AUs with MPEG-4 System data
Some Access Units with MPEG-4 system data, called "crucial" AUs, Some Access Units with MPEG-4 system data, called "crucial" AUs,
carry information whose loss cannot be tolerated, either in the carry information whose loss cannot be tolerated, either in the
presentation or in the decoder. At each crucial AU in an MPEG-4 presentation or in the decoder. At each crucial AU in an MPEG-4
system stream, the stream state changes. The stream-state MAY system stream, the stream state changes. The stream-state MAY
remain constant at non-crucial AUs. In ISO/IEC 14496-1, MPEG-4 remain constant at non-crucial AUs. In ISO/IEC 14496-1, MPEG-4
system streams use the AU_SequenceNumber to signal stream states. system streams use the AU_SequenceNumber to signal stream states.
Example: Given three AUs, AU1 = "Insertion of node X", AU2 = "Set Example: Given three AUs, AU1 = "Insertion of node X", AU2 = "Set
position of node X", AU3 = "Set position of node X". AU1 is crucial, position of node X", AU3 = "Set position of node X". AU1 is crucial,
skipping to change at page 22, line 12 skipping to change at page 22, line 12
c) if the RAP-flag is set to 0, then the AU MUST be decoded, unless c) if the RAP-flag is set to 0, then the AU MUST be decoded, unless
the stream is corrupted, in which case the AU MUST be ignored. the stream is corrupted, in which case the AU MUST be ignored.
3.3 Usage of this specification 3.3 Usage of this specification
3.3.1 General 3.3.1 General
Usage of this specification requires definition of a mode. A mode Usage of this specification requires definition of a mode. A mode
defines how to use this specification, as deemed appropriate. defines how to use this specification, as deemed appropriate.
Senders MUST signal the applied mode via the MIME format parameter Senders MUST signal the applied mode via the MIME format parameter
"Mode", as specified in section 4.1. This specification defines a "mode", as specified in section 4.1. This specification defines a
generic mode that can be used for any MPEG-4 stream, as well as generic mode that can be used for any MPEG-4 stream, as well as
specific modes for transport of MPEG-4 CELP and MPEG-4 AAC streams, specific modes for transport of MPEG-4 CELP and MPEG-4 AAC streams,
defined in ISO/IEC 14496-3. defined in ISO/IEC 14496-3.
When use of this payload format is signaled using SDP [5], an When use of this payload format is signaled using SDP [5], an
"rtpmap" attribute is part of that signaling. The same requirements "rtpmap" attribute is part of that signaling. The same requirements
apply for the rtpmap attribute in any mode compliant to this apply for the rtpmap attribute in any mode compliant to this
specification. The general form of an rtpmap attribute is: specification. The general form of an rtpmap attribute is:
a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding
parameters>] parameters>]
skipping to change at page 22, line 35 skipping to change at page 22, line 35
mono. Provided no additional parameters are needed, this parameter mono. Provided no additional parameters are needed, this parameter
may be omitted for mono material, hence its default value is 1. may be omitted for mono material, hence its default value is 1.
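As an illustration of the rtpmap form and the default described above, the following Python sketch parses an rtpmap attribute. The function name and the returned dictionary keys are hypothetical, and the sketch assumes that the encoding parameter, when present, is a decimal channel count.

```python
import re

# a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding parameters>]
_RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)(?:/(\d+))?$")


def parse_rtpmap(line):
    """Parse an a=rtpmap line; an omitted encoding parameter defaults to 1."""
    match = _RTPMAP.match(line.strip())
    if match is None:
        raise ValueError("not a well-formed a=rtpmap line: %r" % line)
    payload_type, name, clock_rate, channels = match.groups()
    return {
        "payload_type": int(payload_type),
        "encoding_name": name,
        "clock_rate": int(clock_rate),
        # Default value 1 when the parameter is omitted (mono material).
        "channels": int(channels) if channels is not None else 1,
    }


print(parse_rtpmap("a=rtpmap:96 mpeg4-generic/16000"))
```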
3.3.2 The generic mode 3.3.2 The generic mode
The generic mode can be used for any MPEG-4 stream. In this mode The generic mode can be used for any MPEG-4 stream. In this mode
no mode-specific constraints are applied; hence, in the generic no mode-specific constraints are applied; hence, in the generic
mode the full flexibility of this specification can be exploited. mode the full flexibility of this specification can be exploited.
The generic mode is signaled by mode=generic. The generic mode is signaled by mode=generic.
An example is given below for transport of a BIFS stream. In this An example is given below for transport of a BIFS-Anim stream. In
example carriage of multiple BIFS Access Units is allowed in one this example carriage of multiple BIFS-Anim Access Units is allowed
RTP packet. The AU-header contains the AU-size field, the CTS-flag in one RTP packet. The AU-header contains the AU-size field, the
and, if the CTS flag is set to 1, the CTS-delta field. The number CTS-flag and, if the CTS flag is set to 1, the CTS-delta field. The
of bits of the AU-size and the CTS-delta fields is 10 and 16, number of bits of the AU-size and the CTS-delta fields is 10 and
respectively. The AU-header also contains the RAP-flag and the 16, respectively. The AU-header also contains the RAP-flag and the
Stream-state of 4 bits. This results in an AU-header with a Stream-state of 4 bits. This results in an AU-header with a
total size of two or four octets per BIFS AU. The RTP time stamp total size of two or four octets per BIFS-Anim AU. The RTP time
uses a 1 kHz clock. Note that the media type name is video, stamp uses a 1 kHz clock. Note that the media type name is video,
because the BIFS stream is part of an audio-visual presentation. For because the BIFS-Anim stream is part of an audio-visual
conventions on media type names see section 4.1. presentation. For conventions on media type names see section 4.1.
In detail: In detail:
m=video 49230 RTP/AVP 96 m=video 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/1000 a=rtpmap:96 mpeg4-generic/1000
a=fmtp:96 streamtype=3; profile-level-id=257; mode=generic; a=fmtp:96 streamtype=3; profile-level-id=1807; mode=generic;
ObjectType=2; config=BIFSConfiguration(); SizeLength=10; objectType=2; config=0842237F24001FB400094002C0; sizeLength=10;
CTSDeltaLength=16; RandomAccessIndication=1; CTSDeltaLength=16; randomAccessIndication=1;
StreamStateIndication=4 streamStateIndication=4
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page, it comprises
a single line in the SDP file. a single line in the SDP file.
BIFSConfiguration() is the hexadecimal string as defined in ISO/IEC
14496-1; for the description of MIME parameters see section 4.1. The hexadecimal value of the "config" parameter is the
BIFSConfiguration() as defined in ISO/IEC 14496-1. The
BIFSConfiguration() specifies that the BIFS stream is a BIFS-Anim
stream. For the description of MIME parameters see section 4.1.
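The AU-header size stated in this example can be checked with a short Python sketch. The helper names are hypothetical. The sketch lower-cases parameter names, counts only the fields used in this example (AU-size, CTS-flag, CTS-delta, RAP-flag, Stream-state), and ignores index and DTS fields, which the example does not configure.

```python
FMTP_EXAMPLE = (
    "a=fmtp:96 streamtype=3; profile-level-id=1807; mode=generic; "
    "objectType=2; config=0842237F24001FB400094002C0; sizeLength=10; "
    "CTSDeltaLength=16; randomAccessIndication=1; streamStateIndication=4"
)


def parse_fmtp(line):
    """Split an a=fmtp line into (payload type, {lower-cased name: value})."""
    head, _, rest = line.strip().partition(" ")
    if not head.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    payload_type = int(head[len("a=fmtp:"):])
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        params[name.strip().lower()] = value.strip()
    return payload_type, params


def au_header_bits(params, cts_flag):
    """Count the AU-header bits for the fields used in the BIFS example."""
    bits = int(params.get("sizelength", 0))           # AU-size
    cts_delta_bits = int(params.get("ctsdeltalength", 0))
    if cts_delta_bits > 0:
        bits += 1                                     # CTS-flag
        if cts_flag:
            bits += cts_delta_bits                    # CTS-delta
    if int(params.get("randomaccessindication", 0)):
        bits += 1                                     # RAP-flag
    bits += int(params.get("streamstateindication", 0))  # Stream-state
    return bits


_, params = parse_fmtp(FMTP_EXAMPLE)
for flag in (False, True):
    print("CTS-flag", int(flag), "->", au_header_bits(params, flag) // 8, "octets")
```

With the CTS-flag cleared the header is 10 + 1 + 1 + 4 = 16 bits, and with the flag set it is 32 bits, which corresponds to the two or four octets mentioned above.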
3.3.3 Constant bit-rate CELP 3.3.3 Constant bit-rate CELP
This mode is signaled by mode=CELP-cbr. In this mode one or more This mode is signaled by mode=CELP-cbr. In this mode one or more
complete CELP frames of fixed size can be transported in one RTP complete CELP frames of fixed size can be transported in one RTP
packet; there is no support for interleaving. The RTP payload packet; interleaving MUST NOT be used with this mode. The RTP
consists of one or more concatenated CELP frames, each of the same payload consists of one or more concatenated CELP frames, each of
size. CELP frames MUST not be fragmented when using this mode. Both the same size. CELP frames MUST NOT be fragmented when using this
the AU Header Section and the Auxiliary Section MUST be empty. mode. Both the AU Header Section and the Auxiliary Section MUST be
empty.
The MIME format parameter ConstantSize MUST be provided to specify The MIME format parameter constantSize MUST be provided to specify
the length of each CELP frame. the length of each CELP frame.
For example: For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/44100/2 a=rtpmap:96 mpeg4-generic/16000/1
a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-cbr; config= a=fmtp:96 streamtype=5; profile-level-id=14; mode=CELP-cbr; config=
AudioSpecificConfig(); ConstantSize=xxx; 440E00; constantSize=27; constantDuration=240
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page, it comprises
a single line in the SDP file. a single line in the SDP file.
AudioSpecificConfig() is the hexadecimal string as defined in The hexadecimal value of the "config" parameter is the
ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio AudioSpecificConfig() as defined in ISO/IEC 14496-3.
stream type is CELP. For the description of MIME parameters see AudioSpecificConfig() specifies a mono CELP stream with a sampling
section 4.1. rate of 16 kHz at a fixed bitrate of 14.4 kb/s and 6 sub-frames per
CELP frame. For the description of MIME parameters see section 4.1.
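Since both the AU Header Section and the Auxiliary Section are empty in this mode, a receiver can recover the frames by cutting the payload into pieces of constantSize octets. A minimal sketch, with a hypothetical function name:

```python
def split_celp_cbr_payload(payload, constant_size):
    """Split a CELP-cbr RTP payload into frames of constant_size octets."""
    if constant_size <= 0:
        raise ValueError("constantSize must be positive")
    if len(payload) == 0 or len(payload) % constant_size != 0:
        raise ValueError("payload is not a whole number of CELP frames")
    return [
        payload[start:start + constant_size]
        for start in range(0, len(payload), constant_size)
    ]


# Three dummy 27-octet frames, matching constantSize=27 in the example above.
frames = split_celp_cbr_payload(bytes(81), 27)
print(len(frames), "frames of", len(frames[0]), "octets")
```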
3.3.4 Variable bit-rate CELP 3.3.4 Variable bit-rate CELP
This mode is signaled by mode=CELP-vbr. With this mode one or more This mode is signaled by mode=CELP-vbr. With this mode one or more
complete CELP frames of variable size can be transported in one RTP complete CELP frames of variable size can be transported in one RTP
packet with optional interleaving. As CELP frames are very small, packet with OPTIONAL interleaving. As CELP frames are very small,
while the largest possible AU-size in this mode is greater than the while the largest possible AU-size in this mode is greater than the
maximum CELP frame size, there is no support for fragmentation of maximum CELP frame size, there is no support for fragmentation of
CELP frames. Hence CELP frames MUST not be fragmented when using CELP frames. Hence CELP frames MUST NOT be fragmented when using
this mode. this mode.
In this mode the RTP payload consists of the AU Header Section, In this mode the RTP payload consists of the AU Header Section,
followed by one or more concatenated CELP frames. The Auxiliary followed by one or more concatenated CELP frames. The Auxiliary
Section MUST be empty. For each CELP frame contained in the payload Section MUST be empty. For each CELP frame contained in the payload
there MUST be a one octet AU-header in the AU Header Section to there MUST be a one octet AU-header in the AU Header Section to
provide: provide:
(a) the size of each CELP frame in the payload and (a) the size of each CELP frame in the payload and
(b) index information for computing the sequence (and hence timing) (b) index information for computing the sequence (and hence timing)
of each CELP frame. of each CELP frame.
Transport of CELP frames requires that the AU-size field is coded Transport of CELP frames requires that the AU-size field is coded
with 6 bits. In this mode therefore 6 bits are allocated to the with 6 bits. In this mode therefore 6 bits are allocated to the
AU-size field, and 2 bits to the AU-Index(-delta) field. Each AU-size field, and 2 bits to the AU-Index(-delta) field. Each
AU-Index field MUST be coded with the value 0. In the AU Header AU-Index field MUST be coded with the value 0. In the AU Header
Section, the concatenated AU-headers are preceded by the 16-bit Section, the concatenated AU-headers are preceded by the 16-bit
AU-headers-length field, as specified in section 3.2.1. AU-headers-length field, as specified in section 3.2.1.
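The payload layout described above can be illustrated by a Python sketch that parses a CELP-vbr payload. The function name is hypothetical. Two details are assumptions not stated in this section: that the AU-headers-length field gives the length of the AU-headers in bits, and that the 6-bit AU-size occupies the most significant bits of each AU-header octet, followed by the 2-bit AU-Index(-delta).

```python
def parse_celp_vbr_payload(payload):
    """Return a list of (index_or_delta, frame) tuples from a CELP-vbr payload."""
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    # Assumption: AU-headers-length counts bits, as defined in section 3.2.1.
    headers_bits = int.from_bytes(payload[:2], "big")
    if headers_bits == 0 or headers_bits % 8 != 0:
        raise ValueError("expected a whole number of one-octet AU-headers")
    count = headers_bits // 8
    headers = payload[2:2 + count]
    if len(headers) != count:
        raise ValueError("truncated AU Header Section")
    offset = 2 + count
    frames = []
    for octet in headers:
        size = octet >> 2          # 6-bit AU-size (assumed to come first)
        index = octet & 0x03       # 2-bit AU-Index or AU-Index-delta
        frame = payload[offset:offset + size]
        if len(frame) != size:
            raise ValueError("truncated CELP frame")
        frames.append((index, frame))
        offset += size
    if offset != len(payload):
        raise ValueError("unexpected trailing data; Auxiliary Section must be empty")
    return frames
```

The first tuple carries the AU-Index, which this mode requires to be 0; later tuples carry AU-Index-delta values.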
In addition to the required MIME format parameters, the following In addition to the required MIME format parameters, the following
parameters MUST be present: SizeLength, IndexLength, and parameters MUST be present: sizeLength, indexLength, and
IndexDeltaLength. CELP frames have fixed time duration per Access indexDeltaLength. CELP frames have fixed time duration per Access
Unit; when interleaving in this mode, the applicable duration MUST Unit; when interleaving in this mode, the applicable duration MUST
be signaled by the MIME format parameter constantDuration. In be signaled by the MIME format parameter constantDuration. In
addition, the parameter maxDisplacement MUST be present when addition, the parameter maxDisplacement MUST be present when
interleaving. interleaving.
For example: For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/8000/1 a=rtpmap:96 mpeg4-generic/16000/1
a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config= a=fmtp:96 streamtype=5; profile-level-id=14; mode=CELP-vbr; config=
AudioSpecificConfig(); SizeLength=6; IndexLength=2; 440F20; sizeLength=6; indexLength=2; indexDeltaLength=2;
IndexDeltaLength=2; constantDuration=xxx; maxDisplacement=yyy constantDuration=160; maxDisplacement=5
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page, it comprises
a single line in the SDP file. a single line in the SDP file.
AudioSpecificConfig() is the hexadecimal string as defined in The hexadecimal value of the "config" parameter is the
ISO/IEC 14496-3, AudioSpecificConfig() specifies that the audio AudioSpecificConfig() as defined in ISO/IEC 14496-3.
stream type is CELP. For the description of MIME parameters see AudioSpecificConfig() specifies a mono CELP stream with a sampling
section 4.1. rate of 16 kHz at a bitrate that varies between 13.9 and 16.2 kb/s
and with 4 sub-frames per CELP frame. For the description of MIME
parameters see section 4.1.
3.3.5 Low bit-rate AAC 3.3.5 Low bit-rate AAC
This mode is signaled by mode=AAC-lbr. This mode supports transport This mode is signaled by mode=AAC-lbr. This mode supports transport
of one or more complete AAC frames of variable size. In this mode of one or more complete AAC frames of variable size. In this mode
the AAC frames are allowed to be interleaved and hence receivers the AAC frames are allowed to be interleaved and hence receivers
MUST support de-interleaving. The maximum size of an AAC frame in MUST support de-interleaving. The maximum size of an AAC frame in
this mode is 63 octets. AAC frames MUST not be fragmented when this mode is 63 octets. AAC frames MUST NOT be fragmented when
using this mode. using this mode. Hence, when using this mode, encoders MUST ensure
that the size of each AAC frame is at most 63 octets.
The payload configuration in this mode is the same as in the The payload configuration in this mode is the same as in the
variable bit-rate CELP mode as defined in 3.3.4. The RTP payload variable bit-rate CELP mode as defined in 3.3.4. The RTP payload
consists of the AU Header Section, followed by concatenated AAC consists of the AU Header Section, followed by concatenated AAC
frames. The Auxiliary Section MUST be empty. For each AAC frame frames. The Auxiliary Section MUST be empty. For each AAC frame
contained in the payload the one octet AU-header MUST provide: contained in the payload the one octet AU-header MUST provide:
(a) the size of each AAC frame in the payload and (a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing) (b) index information for computing the sequence (and hence timing)
of each AAC frame. of each AAC frame.
In the AU-header, the AU-size MUST be coded with 6 bits and the
AU-Index(-delta) with 2 bits; the AU-Index field MUST have the
value 0 in each AU-header.
In the AU-header Section, the concatenated AU-headers MUST be In the AU-header Section, the concatenated AU-headers MUST be
preceded by the 16-bit AU-headers-length field, as specified in preceded by the 16-bit AU-headers-length field, as specified in
section 3.2.1. section 3.2.1.
In addition to the required MIME format parameters, the following In addition to the required MIME format parameters, the following
parameters MUST be present: SizeLength, IndexLength, and parameters MUST be present: sizeLength, indexLength, and
IndexDeltaLength. AAC frames have fixed time duration per Access indexDeltaLength. AAC frames have fixed time duration per Access
Unit; when interleaving in this mode, the applicable duration MUST Unit; when interleaving in this mode, the applicable duration MUST
be signaled by the MIME format parameter constantDuration. In be signaled by the MIME format parameter constantDuration. In
addition, the parameter maxDisplacement MUST be present when addition, the parameter maxDisplacement MUST be present when
interleaving. interleaving.
For example: For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/44100/2 a=rtpmap:96 mpeg4-generic/22050/1
a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config= a=fmtp:96 streamtype=5; profile-level-id=14; mode=AAC-lbr; config=
AudioSpecificConfig(); SizeLength=6; IndexLength=2; 1388; sizeLength=6; indexLength=2; indexDeltaLength=2;
IndexDeltaLength=2; constantDuration=xxx; maxDisplacement=yyy constantDuration=1024; maxDisplacement=5
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page, it comprises
a single line in the SDP file. a single line in the SDP file.
AudioSpecificConfig() is the hexadecimal string as defined in ISO/IEC The hexadecimal value of the "config" parameter is the
14496-3. AudioSpecificConfig() specifies that the audio AudioSpecificConfig() as defined in ISO/IEC 14496-3.
stream type is AAC. For the description of MIME parameters see AudioSpecificConfig() specifies a mono AAC stream with a sampling
rate of 22.05 kHz. For the description of MIME parameters see
section 4.1. section 4.1.
3.3.6 High bit-rate AAC 3.3.6 High bit-rate AAC
This mode is signaled by mode=AAC-hbr. This mode supports transport This mode is signaled by mode=AAC-hbr. This mode supports transport
of variable size AAC frames. In one RTP packet either one or more of variable size AAC frames. In one RTP packet either one or more
complete AAC frames are carried, or a single fragment of an AAC complete AAC frames are carried, or a single fragment of an AAC
frame. In this mode the AAC frames are allowed to be interleaved frame. In this mode the AAC frames are allowed to be interleaved
and hence receivers MUST support de-interleaving. The maximum size and hence receivers MUST support de-interleaving. The maximum size
of an AAC frame in this mode is 8191 octets. of an AAC frame in this mode is 8191 octets.
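The two AAC modes differ in the largest frame they can carry (63 octets for AAC-lbr, 8191 octets for AAC-hbr). A sender might pick a mode from the largest frame it expects to produce, as in the following sketch. The helper is hypothetical and not part of this specification; a sender remains free to use AAC-hbr for small frames.

```python
AAC_LBR_MAX_FRAME = 63     # octets, maximum AAC frame size in AAC-lbr
AAC_HBR_MAX_FRAME = 8191   # octets, maximum AAC frame size in AAC-hbr


def choose_aac_mode(frame_sizes):
    """Suggest the mode value for a sequence of AAC frame sizes in octets."""
    sizes = list(frame_sizes)
    if not sizes:
        raise ValueError("at least one frame size is required")
    largest = max(sizes)
    if largest <= AAC_LBR_MAX_FRAME:
        return "AAC-lbr"
    if largest <= AAC_HBR_MAX_FRAME:
        return "AAC-hbr"
    raise ValueError("frame of %d octets exceeds both AAC modes" % largest)
```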
skipping to change at page 26, line 6 skipping to change at page 26, line 8
To code the maximum size of an AAC frame requires 13 bits. Therefore To code the maximum size of an AAC frame requires 13 bits. Therefore
in this configuration 13 bits are allocated to the AU-size, and in this configuration 13 bits are allocated to the AU-size, and
3 bits to the AU-Index(-delta) field. Thus each AU-header has a size 3 bits to the AU-Index(-delta) field. Thus each AU-header has a size
of 2 octets. Each AU-Index field MUST be coded with the value 0. In of 2 octets. Each AU-Index field MUST be coded with the value 0. In
the AU Header Section, the concatenated AU-headers MUST be preceded the AU Header Section, the concatenated AU-headers MUST be preceded
by the 16-bit AU-headers-length field, as specified in by the 16-bit AU-headers-length field, as specified in
section 3.2.1. section 3.2.1.
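The two-octet AU-header of this mode can be illustrated by a sketch that builds a payload from complete AAC frames. The function name is hypothetical, fragmentation is not covered, and the sketch assumes that the AU-headers-length field counts bits and that the 13-bit AU-size precedes the 3-bit AU-Index(-delta) in each header.

```python
def build_aac_hbr_payload(frames, index_deltas=None):
    """Build an AAC-hbr RTP payload carrying one or more complete AAC frames."""
    if not frames:
        raise ValueError("at least one AAC frame is required")
    if index_deltas is None:
        index_deltas = [0] * len(frames)
    if len(index_deltas) != len(frames):
        raise ValueError("one index value per frame is required")
    if index_deltas[0] != 0:
        raise ValueError("the AU-Index field must be coded with the value 0")
    headers = bytearray()
    for frame, delta in zip(frames, index_deltas):
        if len(frame) > 8191:
            raise ValueError("AAC frame exceeds 8191 octets")
        if not 0 <= delta <= 7:
            raise ValueError("AU-Index-delta does not fit in 3 bits")
        # 13-bit AU-size followed by 3-bit AU-Index(-delta): 2 octets in total.
        headers += ((len(frame) << 3) | delta).to_bytes(2, "big")
    # Assumption: AU-headers-length is expressed in bits (section 3.2.1).
    headers_bits = len(headers) * 8
    return headers_bits.to_bytes(2, "big") + bytes(headers) + b"".join(frames)
```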
In addition to the required MIME format parameters, the following In addition to the required MIME format parameters, the following
parameters MUST be present: SizeLength, IndexLength, and parameters MUST be present: sizeLength, indexLength, and
IndexDeltaLength. AAC frames have fixed time duration per Access indexDeltaLength. AAC frames have fixed time duration per Access
Unit; when interleaving in this mode, the applicable duration MUST Unit; when interleaving in this mode, the applicable duration MUST
be signaled by the MIME format parameter constantDuration. In be signaled by the MIME format parameter constantDuration. In
addition, the parameter maxDisplacement MUST be present when addition, the parameter maxDisplacement MUST be present when
interleaving. interleaving.
For example: For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/44100/2 a=rtpmap:96 mpeg4-generic/48000/6
a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr; a=fmtp:96 streamtype=5; profile-level-id=16; mode=AAC-hbr;
config=AudioSpecificConfig(); SizeLength=13; IndexLength=3; config=11B0; sizeLength=13; indexLength=3;
IndexDeltaLength=3; constantDuration=xxx; maxDisplacement=yyy indexDeltaLength=3; constantDuration=1024
Note: The a=fmtp line has been wrapped to fit the page, it comprises Note: The a=fmtp line has been wrapped to fit the page, it comprises
a single line in the SDP file. a single line in the SDP file.
AudioSpecificConfig() is the hexadecimal string as defined in The hexadecimal value of the "config" parameter is the
ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio AudioSpecificConfig() as defined in ISO/IEC 14496-3.
stream type is AAC. For the description of MIME parameters see AudioSpecificConfig() specifies a 5.1 channel AAC stream with a
sampling rate of 48 kHz. For the description of MIME parameters see
section 4.1. section 4.1.
3.3.7 Additional modes 3.3.7 Additional modes
This specification only defines the modes specified in sections This specification only defines the modes specified in sections
3.3.2 up to 3.3.6. Additional modes are expected to be defined in 3.3.2 up to 3.3.6. Additional modes are expected to be defined in
future RFCs. Each additional mode MUST be in full compliance with future RFCs. Each additional mode MUST be in full compliance with
this specification. this specification.
Any new mode MUST be defined such that an implementation including Any new mode MUST be defined such that an implementation including
skipping to change at page 27, line 53 skipping to change at page 27, line 53
auxiliary section in each RTP packet. auxiliary section in each RTP packet.
MIME subtype name: mpeg4-generic MIME subtype name: mpeg4-generic
Required parameters: Required parameters:
MIME format parameters are not case dependent; however for clarity MIME format parameters are not case dependent; however for clarity
both upper and lower case are used in the names of the parameters both upper and lower case are used in the names of the parameters
described in this specification. described in this specification.
StreamType: streamType:
The integer value that indicates the type of MPEG-4 stream that The integer value that indicates the type of MPEG-4 stream that
is carried; its coding corresponds to the values of the is carried; its coding corresponds to the values of the
streamType as defined in Table 9 (streamType Values) in ISO/IEC streamType as defined in Table 9 (streamType Values) in ISO/IEC
14496-1. 14496-1.
Profile-level-id: profile-level-id:
A decimal representation of the MPEG-4 Profile Level indication. A decimal representation of the MPEG-4 Profile Level indication.
This parameter MUST be used in the capability exchange or This parameter MUST be used in the capability exchange or
session set-up procedure to indicate the MPEG-4 Profile and Level session set-up procedure to indicate the MPEG-4 Profile and Level
combination of which the relevant MPEG-4 media codec is capable combination of which the relevant MPEG-4 media codec is capable
of. of.
For MPEG-4 Audio streams, this parameter is the decimal value For MPEG-4 Audio streams, this parameter is the decimal value
from Table 5 (audioProfileLevelIndication Values) in ISO/IEC from Table 5 (audioProfileLevelIndication Values) in ISO/IEC
14496-1, indicating which MPEG-4 Audio tool subsets are 14496-1, indicating which MPEG-4 Audio tool subsets are
required to decode the audio stream. required to decode the audio stream.
For MPEG-4 Visual streams, this parameter is the decimal value For MPEG-4 Visual streams, this parameter is the decimal value
skipping to change at page 28, line 39 skipping to change at page 28, line 39
(ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the (ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the
profile and level of the OD stream. profile and level of the OD stream.
For IPMP streams, this parameter has either the decimal value 0, For IPMP streams, this parameter has either the decimal value 0,
indicating an unspecified profile and level, or a value larger indicating an unspecified profile and level, or a value larger
than zero, indicating an MPEG-4 IPMP profile and level as than zero, indicating an MPEG-4 IPMP profile and level as
defined in a future MPEG-4 specification. defined in a future MPEG-4 specification.
For Clock Reference streams and Object Content Info streams, this For Clock Reference streams and Object Content Info streams, this
parameter has the decimal value zero, indicating that profile parameter has the decimal value zero, indicating that profile
and level information is conveyed through the OD framework. and level information is conveyed through the OD framework.
Config: config:
A hexadecimal representation of an octet string that expresses A hexadecimal representation of an octet string that expresses
the media payload configuration. Configuration data is mapped the media payload configuration. Configuration data is mapped
onto the hexadecimal octet string in an MSB-first basis. The onto the hexadecimal octet string in an MSB-first basis. The
first bit of the configuration data SHALL be located at the MSB first bit of the configuration data SHALL be located at the MSB
of the first octet. In the last octet, if necessary to achieve of the first octet. In the last octet, if necessary to achieve
octet-alignment, up to 7 zero-valued padding bits shall follow octet-alignment, up to 7 zero-valued padding bits shall follow
the configuration data. the configuration data.
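The MSB-first mapping and zero padding described above can be sketched as follows. The function name is hypothetical, the configuration data is represented as a string of "0" and "1" characters purely for illustration, and upper-case hexadecimal digits are chosen to match the examples in this document.

```python
def bits_to_config_hex(bits):
    """Map a bit string onto the hexadecimal "config" value, MSB first.

    The first bit lands in the MSB of the first octet; the last octet is
    padded with up to 7 zero-valued bits to reach octet alignment.
    """
    if not bits or any(bit not in "01" for bit in bits):
        raise ValueError("expected a non-empty string of 0 and 1 characters")
    padded = bits + "0" * ((-len(bits)) % 8)
    return "".join(
        "%02X" % int(padded[start:start + 8], 2)
        for start in range(0, len(padded), 8)
    )


# 13 bits of configuration data, padded with 3 zero bits to two octets.
print(bits_to_config_hex("0001001110001"))
```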
For MPEG-4 Audio streams, config is the audio object type For MPEG-4 Audio streams, config is the audio object type
specific decoder configuration data AudioSpecificConfig() as specific decoder configuration data AudioSpecificConfig() as
defined in ISO/IEC 14496-3. For Structured Audio, the defined in ISO/IEC 14496-3. For Structured Audio, the
skipping to change at page 29, line 17 skipping to change at page 29, line 17
codes of ISO/IEC 14496-2. The configuration information codes of ISO/IEC 14496-2. The configuration information
indicated by this parameter SHALL be the same as the indicated by this parameter SHALL be the same as the
configuration information in the corresponding MPEG-4 Visual configuration information in the corresponding MPEG-4 Visual
stream, except for first-half-vbv-occupancy and stream, except for first-half-vbv-occupancy and
latter-half-vbv-occupancy, if it exists, which may vary in latter-half-vbv-occupancy, if it exists, which may vary in
the repeated configuration information inside an MPEG-4 the repeated configuration information inside an MPEG-4
Visual stream (See 6.2.1 Start codes of ISO/IEC 14496-2). Visual stream (See 6.2.1 Start codes of ISO/IEC 14496-2).
For BIFS streams, this is the BIFSConfig() information as defined For BIFS streams, this is the BIFSConfig() information as defined
in ISO/IEC 14496-1. For version 1, BIFSConfig is defined in in ISO/IEC 14496-1. For version 1, BIFSConfig is defined in
section 9.3.5.2, and for version 2 in section 9.3.5.3. The section 9.3.5.2, and for version 2 in section 9.3.5.3. The
MIME format parameter ObjectType signals the version of MIME format parameter objectType signals the version of
BIFSConfig. BIFSConfig.
For IPMP streams, this is either a quoted empty hexadecimal octet For IPMP streams, this is either a quoted empty hexadecimal octet
string, indicating the absence of any decoder configuration string, indicating the absence of any decoder configuration
information (config=""), or the IPMPConfiguration() as information (config=""), or the IPMPConfiguration() as
defined in a future MPEG-4 IPMP specification. defined in a future MPEG-4 IPMP specification.
For Object Content Info (OCI) streams, this is the For Object Content Info (OCI) streams, this is the
OCIDecoderConfiguration() information of the OCI stream, as OCIDecoderConfiguration() information of the OCI stream, as
defined in section 8.4.2.4 in ISO/IEC 14496-1. defined in section 8.4.2.4 in ISO/IEC 14496-1.
For OD streams, Clock Reference streams and MPEG-J streams, this For OD streams, Clock Reference streams and MPEG-J streams, this
is a quoted empty hexadecimal octet string (config=""), as is a quoted empty hexadecimal octet string (config=""), as
no information on the decoder configuration is required. no information on the decoder configuration is required.
Mode: mode:
The mode in which this specification is used. The following modes The mode in which this specification is used. The following modes
can be signaled: can be signaled:
mode=generic, mode=generic,
mode=CELP-cbr, mode=CELP-cbr,
mode=CELP-vbr, mode=CELP-vbr,
mode=AAC-lbr and mode=AAC-lbr and
mode=AAC-hbr. mode=AAC-hbr.
Other modes are expected to be defined in future RFCs. See also Other modes are expected to be defined in future RFCs. See also
section 3.3.7 and 4.2 of RFC xxxx. section 3.3.7 and 4.2 of RFC xxxx.
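A receiver could check the required parameters listed above before accepting a stream, as in this sketch. The function name is hypothetical. Parameter names are compared without regard to case; comparing the mode value without regard to case as well is an assumption of this sketch.

```python
REQUIRED_PARAMETERS = ("streamtype", "profile-level-id", "config", "mode")
KNOWN_MODES = ("generic", "celp-cbr", "celp-vbr", "aac-lbr", "aac-hbr")


def check_required_parameters(params):
    """Return a list of problems found in a {name: value} parameter mapping."""
    lowered = {name.lower(): value for name, value in params.items()}
    problems = [
        "missing required parameter: " + name
        for name in REQUIRED_PARAMETERS
        if name not in lowered
    ]
    mode = lowered.get("mode")
    if mode is not None and mode.lower() not in KNOWN_MODES:
        # Not necessarily an error: other modes may be defined in future RFCs.
        problems.append("unrecognized mode: " + mode)
    return problems


print(check_required_parameters({"streamType": "5", "mode": "AAC-hbr"}))
```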
Optional general parameters: Optional general parameters:
ObjectType: objectType:
The decimal value from Table 8 in ISO/IEC 14496-1, indicating The decimal value from Table 8 in ISO/IEC 14496-1, indicating
the value of the objectTypeIndication of the transported stream. the value of the objectTypeIndication of the transported stream.
For BIFS streams this parameter MUST be present to signal the For BIFS streams this parameter MUST be present to signal the
version of BIFSConfiguration(). Note that ObjectTypeIndication version of BIFSConfiguration(). Note that objectTypeIndication
may signal a non-MPEG-4 stream and that the RTP payload format may signal a non-MPEG-4 stream and that the RTP payload format
defined in this document may not be suitable to carry a stream defined in this document may not be suitable to carry a stream
that is not defined by MPEG-4. ObjectType SHOULD NOT be set to that is not defined by MPEG-4. The objectType parameter SHOULD
a value that signals a stream that cannot be carried by this NOT be set to a value that signals a stream that cannot be
payload format. carried by this payload format.
ConstantSize: constantSize:
The constant size in octets of each Access Unit for this stream. The constant size in octets of each Access Unit for this stream.
The ConstantSize and the SizeLength parameters MUST NOT be The constantSize and the sizeLength parameters MUST NOT be
simultaneously present. simultaneously present.
ConstantDuration: constantDuration:
The constant duration of each Access Unit for this stream, The constant duration of each Access Unit for this stream,
measured with the same units as the RTP time stamp. measured with the same units as the RTP time stamp.
maxDisplacement: maxDisplacement:
The decimal representation of the maximum displacement in time The decimal representation of the maximum displacement in time
of an interleaved AU, as defined in section 3.2.3.3, expressed of an interleaved AU, as defined in section 3.2.3.3, expressed
in units of the RTP time stamp clock. in units of the RTP time stamp clock.
This parameter MUST be present when interleaving is applied. This parameter MUST be present when interleaving is applied.
de-interleaveBufferSize: de-interleaveBufferSize:
skipping to change at page 30, line 27 skipping to change at page 30, line 27
the de-interleave buffer, described in section 3.2.3.3. the de-interleave buffer, described in section 3.2.3.3.
When interleaving, this parameter MUST be present if the When interleaving, this parameter MUST be present if the
calculation of the de-interleave buffer size given in 3.2.3.3 calculation of the de-interleave buffer size given in 3.2.3.3
and based on maxDisplacement and rate(max) under-estimates the and based on maxDisplacement and rate(max) under-estimates the
size of the de-interleave buffer. If this calculation does not size of the de-interleave buffer. If this calculation does not
under-estimate the size of the de-interleave buffer, then the under-estimate the size of the de-interleave buffer, then the
de-interleaveBufferSize parameter SHOULD NOT be present. de-interleaveBufferSize parameter SHOULD NOT be present.
Optional configuration parameters: Optional configuration parameters:
SizeLength: sizeLength:
The number of bits on which the AU-size field is encoded in the The number of bits on which the AU-size field is encoded in the
AU-header. The SizeLength and the ConstantSize parameters MUST AU-header. The sizeLength and the constantSize parameters MUST
NOT be simultaneously present. NOT be simultaneously present.
IndexLength: indexLength:
The number of bits on which the AU-Index is encoded in the first The number of bits on which the AU-Index is encoded in the first
AU-header. The default value of zero indicates the absence of AU-header. The default value of zero indicates the absence of
the AU-Index field in each first AU-header. the AU-Index field in each first AU-header.
IndexDeltaLength: indexDeltaLength:
The number of bits on which the AU-Index-delta field is encoded The number of bits on which the AU-Index-delta field is encoded
in any non-first AU-header. The default value of zero indicates in any non-first AU-header. The default value of zero indicates
the absence of the AU-Index-delta field in each non-first the absence of the AU-Index-delta field in each non-first
AU-header. AU-header.
CTSDeltaLength: CTSDeltaLength:
The number of bits on which the CTS-delta field is encoded in The number of bits on which the CTS-delta field is encoded in
the AU-header. the AU-header.
DTSDeltaLength: DTSDeltaLength:
The number of bits on which the DTS-delta field is encoded in The number of bits on which the DTS-delta field is encoded in
the AU-header. the AU-header.
RandomAccessIndication: randomAccessIndication:
A decimal value of zero or one, indicating whether the RAP-flag A decimal value of zero or one, indicating whether the RAP-flag
is present in the AU-header. The decimal value of one indicates is present in the AU-header. The decimal value of one indicates
presence of the RAP-flag, the default value zero its absence. presence of the RAP-flag, the default value zero its absence.
StreamStateIndication: streamStateIndication:
The number of bits on which the Stream-state field is encoded in The number of bits on which the Stream-state field is encoded in
the AU-header. This parameter MAY be present when transporting the AU-header. This parameter MAY be present when transporting
MPEG-4 system streams, and SHALL NOT be present for MPEG-4 audio MPEG-4 system streams, and SHALL NOT be present for MPEG-4 audio
and MPEG-4 video streams. and MPEG-4 video streams.
AuxiliaryDataSizeLength: auxiliaryDataSizeLength:
The number of bits that is used to encode the auxiliary-data-size The number of bits that is used to encode the auxiliary-data-size
field. field.
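The constraints listed above can be checked mechanically. The following is a minimal sketch, not part of either draft, assuming the parameters arrive as a single string of semicolon-separated name=value pairs and that names may be compared case-insensitively. The function names parse_params and check_params and the sample values are hypothetical.

```python
# Initial mode values listed in this section; later RFCs may add more.
INITIAL_MODES = {"generic", "CELP-cbr", "CELP-vbr", "AAC-lbr", "AAC-hbr"}


def parse_params(text):
    """Split 'name=value; name=value' into a dict keyed by lower-cased name.

    Assumption: semicolon-separated pairs; items without '=' are ignored.
    """
    params = {}
    for item in text.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep:
            params[name.strip().lower()] = value.strip()
    return params


def check_params(params):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    mode = params.get("mode")
    if mode is not None and mode not in INITIAL_MODES:
        # Not necessarily an error: other modes may be defined in future RFCs.
        problems.append("mode %r is not in the initial set" % mode)
    if "constantsize" in params and "sizelength" in params:
        # constantSize and sizeLength MUST NOT be simultaneously present.
        problems.append("constantSize and sizeLength are both present")
    if params.get("randomaccessindication", "0") not in ("0", "1"):
        # randomAccessIndication is a decimal zero or one, defaulting to zero.
        problems.append("randomAccessIndication must be 0 or 1")
    return problems
```

For example, check_params(parse_params("mode=AAC-hbr; sizeLength=13; indexLength=3")) returns an empty list, while adding constantSize=100 to the same string reports the conflict with sizeLength. The numeric values here are illustrative only.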
Applications MAY use more parameters, in addition to those defined Applications MAY use more parameters, in addition to those defined
above. Each additional parameter MUST be registered with IANA, to above. Each additional parameter MUST be registered with IANA, to
ensure that there is no clash of names. Each additional parameter ensure that there is no clash of names. Each additional parameter
MUST be accompanied by a specification in the form of an RFC, MPEG MUST be accompanied by a specification in the form of an RFC, MPEG
standard, or other permanent and readily available reference (the standard, or other permanent and readily available reference (the
"Specification Required" policy defined in RFC 2434 [6]). Receivers "Specification Required" policy defined in RFC 2434 [6]). Receivers
MUST tolerate the presence of such additional parameters, but these MUST tolerate the presence of such additional parameters, but these
skipping to change at page 32, line 35 skipping to change at page 32, line 35
Authors of RFC xxxx, IETF Audio/Video Transport working group. Authors of RFC xxxx, IETF Audio/Video Transport working group.
Intended usage: COMMON Intended usage: COMMON
Author/Change controller: Author/Change controller:
Authors of RFC xxxx, IETF Audio/Video Transport working group. Authors of RFC xxxx, IETF Audio/Video Transport working group.
4.2 Registration of mode definitions with IANA 4.2 Registration of mode definitions with IANA
This specification can be used in a number of modes. The mode of This specification can be used in a number of modes. The mode of
operation is signaled using the "Mode" MIME parameter, with the operation is signaled using the "mode" MIME parameter, with the
initial set of values specified in section 4.1. New modes may be initial set of values specified in section 4.1. New modes may be
defined at any time, as described in section 3.3.7. These modes defined at any time, as described in section 3.3.7. These modes
MUST be registered with IANA, to ensure that there is no clash MUST be registered with IANA, to ensure that there is no clash
of names. of names.
A new mode registration MUST be accompanied by a specification in A new mode registration MUST be accompanied by a specification in
the form of an RFC, MPEG standard, or other permanent and readily the form of an RFC, MPEG standard, or other permanent and readily
available reference (the "Specification Required" policy defined available reference (the "Specification Required" policy defined
in RFC 2434 [6]). in RFC 2434 [6]).
skipping to change at page 35, line 8 skipping to change at page 35, line 8
Considerations Section in RFCs", RFC 2434, October 1998. Considerations Section in RFCs", RFC 2434, October 1998.
7.2 Informative references 7.2 Informative references
[7] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, "RTP payload [7] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, "RTP payload
format for MPEG1/MPEG2 Video", RFC 2250, January 1998. format for MPEG1/MPEG2 Video", RFC 2250, January 1998.
[8] H. Schulzrinne, A. Rao, R. Lanphier, "RTSP: Real-Time Session [8] H. Schulzrinne, A. Rao, R. Lanphier, "RTSP: Real-Time Session
Protocol", RFC 2326, Internet Engineering Task Force, April 1998. Protocol", RFC 2326, Internet Engineering Task Force, April 1998.
[9] C. Perkins, O. Hudson, "Options for Repair of Streaming Media" [9] C. Perkins, O. Hodson, "Options for Repair of Streaming Media"
RFC 2354, Internet Engineering Task Force, June 1998. RFC 2354, Internet Engineering Task Force, June 1998.
[10] H. Schulzrinne, J. Rosenberg, "An RTP Payload Format for [10] H. Schulzrinne, J. Rosenberg, "An RTP Payload Format for
Generic Forward Error Correction", RFC 2733, Internet Engineering Generic Forward Error Correction", RFC 2733, Internet Engineering
Task Force, December 1999. Task Force, December 1999.
[11] M. Handley, C. Perkins, E. Whelan, "SAP: Session Announcement [11] M. Handley, C. Perkins, E. Whelan, "SAP: Session Announcement
Protocol", RFC 2974, Internet Engineering Task Force, October 2000. Protocol", RFC 2974, Internet Engineering Task Force, October 2000.
[12] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP [12] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP
payload format for MPEG-4 Audio/Visual streams", RFC 3016, Internet payload format for MPEG-4 Audio/Visual streams", RFC 3016, Internet
Engineering Task Force, November 2000. Engineering Task Force, November 2000.
8. Author Addresses 8. Author Addresses
Jan van der Meer Jan van der Meer
Philips Digital Networks Philips Electronics, MP4Net
Cederlaan 4 Prof Holstlaan 4
5600 JB Eindhoven Building WDB-1
5600 JZ Eindhoven
Netherlands Netherlands
Email : jan.vandermeer@philips.com Email : jan.vandermeer@philips.com
David Mackie David Mackie
Apple Computer, Inc. Apple Computer, Inc.
One Infinite Loop, MS:302-2LF One Infinite Loop, MS:302-2LF
Cupertino CA 95014 Cupertino CA 95014
Email: dmackie@apple.com Email: dmackie@apple.com
Viswanathan Swaminathan Viswanathan Swaminathan
skipping to change at page 35, line 50 skipping to change at page 35, line 51
Palo Alto, CA 94303 Palo Alto, CA 94303
Email: viswanathan.swaminathan@sun.com Email: viswanathan.swaminathan@sun.com
David Singer David Singer
Apple Computer, Inc. Apple Computer, Inc.
One Infinite Loop, MS:302-3MT One Infinite Loop, MS:302-3MT
Cupertino CA 95014 Cupertino CA 95014
Email: singer@apple.com Email: singer@apple.com
Philippe Gentric Philippe Gentric
Philips Digital Networks, MP4Net Philips Electronics, MP4Net
51 rue Carnot 51 rue Carnot
92156 Suresnes 92156 Suresnes
France France
e-mail: philippe.gentric@philips.com e-mail: philippe.gentric@philips.com
Full Copyright Statement Full Copyright Statement
Copyright (C) The Internet Society (December 2002). All Rights Copyright (C) The Internet Society (February 2003). All Rights
Reserved. Reserved.
This document and translations of it may be copied and furnished to This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain others, and derivative works that comment on or otherwise explain
it or assist in its implementation may be prepared, copied, it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without restriction published and distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice and this of any kind, provided that the above copyright notice and this
paragraph are included on all such copies and derivative works. paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such However, this document itself may not be modified in any way, such
as by removing the copyright notice or references to the Internet as by removing the copyright notice or references to the Internet
skipping to change at page 37, line 17 skipping to change at page 37, line 17
Appendix A. Interleave analysis Appendix A. Interleave analysis
A.1 Introduction A.1 Introduction
In this appendix interleaving issues are discussed. Some general In this appendix interleaving issues are discussed. Some general
notes are provided on de-interleaving and error concealment, while notes are provided on de-interleaving and error concealment, while
a number of interleaving patterns are examined, in particular a number of interleaving patterns are examined, in particular
for determining the maximum displacement in time and the size of for determining the maximum displacement in time and the size of
the de-interleave buffer. In these examples, the maximum the de-interleave buffer. In these examples, the maximum
displacement is cited in terms of an access unit count, for ease of displacement is cited in terms of an access unit count, for ease of
reading. In actual streams, it is signalled in units of the RTP reading. In actual streams, it is signaled in units of the RTP
time stamp clock. time stamp clock.
A.2 De-interleaving and error concealment A.2 De-interleaving and error concealment
This appendix does not describe any details on de-interleaving and This appendix does not describe any details on de-interleaving and
error concealment, as the control of the AU decoding and error error concealment, as the control of the AU decoding and error
concealment process has little to do with interleaving. If the concealment process has little to do with interleaving. If the
next AU to be decoded is present and there is sufficient storage next AU to be decoded is present and there is sufficient storage
available for the decoded AU, then decode it now. If not, wait. available for the decoded AU, then decode it now. If not, wait.
When the decoding deadline is reached (i.e., the time when decoding When the decoding deadline is reached (i.e., the time when decoding
skipping to change at page 39, line 41 skipping to change at page 39, line 41
+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+
- - 5 - 5 - 2 7 4 9 - - 5 - 5 - 2 7 4 9
7 4 9 5 7 4 9 5
"Early" AUs 5 6 "Early" AUs 5 6
7 7 7 7
9 9 9 9
Figure 8: Storage of "early" AUs in the de-interleave buffer per Figure 8: Storage of "early" AUs in the de-interleave buffer per
interleaved AU. interleaved AU.
A.4.2 Determining the maximum displacement A.4.3 Determining the maximum displacement
From figure 9 it can be seen that the maximum displacement in time From figure 9 it can be seen that the maximum displacement in time
equals 8 AU periods. Hence the minimum maxDisplacement value to be equals 8 AU periods. Hence the minimum maxDisplacement value to be
signaled is 8 AU periods. signaled is 8 AU periods.
+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+
Interleaved AUs | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8| Interleaved AUs | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+
Earliest not yet present AU - 1 1 1 1 1 - 3 - - Earliest not yet present AU - 1 1 1 1 1 - 3 - -
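The rows shown above can be reproduced with a short calculation. The following is a hedged sketch, not taken from either draft: it assumes AUs are numbered from 0 in decoding order, considers a single occurrence of the pattern, and measures displacement in AU periods as the difference between an arriving AU and the earliest AU that is not yet present. The function name max_displacement is hypothetical.

```python
def max_displacement(interleaved):
    """Return the largest displacement, in AU periods, for one pattern.

    For each arriving AU, find the earliest lower-numbered AU that has not
    arrived yet; the displacement is the distance between the two.
    """
    received = set()
    worst = 0
    for au in interleaved:
        received.add(au)
        # Lower-numbered AUs still missing when this AU arrives.
        missing = [i for i in range(au) if i not in received]
        if missing:
            worst = max(worst, au - missing[0])
    return worst


# Pattern from the "Interleaved AUs" row above.
pattern = [0, 5, 2, 7, 4, 9, 1, 6, 3, 8]
print(max_displacement(pattern))  # 8, reached when AU 9 arrives before AU 1
```

In an actual stream the value would be converted from AU periods to units of the RTP time stamp clock before being signaled as maxDisplacement.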
 End of changes. 
