draft-ietf-avt-mpeg4-simple-01.txt   draft-ietf-avt-mpeg4-simple-02.txt 
Internet Engineering Task Force J. van der Meer Internet Engineering Task Force J. van der Meer
Internet Draft Philips Electronics Internet Draft Philips Electronics
D. Mackie D. Mackie
Cisco Systems Inc. Cisco Systems Inc.
V. Swaminathan V. Swaminathan
Sun Microsystems Inc. Sun Microsystems Inc.
D. Singer D. Singer
Apple Computer Apple Computer
P. Gentric
Philips Electronics
March 2002 April 2002
Expires September 2002 Expires October 2002
Document: draft-ietf-avt-mpeg4-simple-01.txt Document: draft-ietf-avt-mpeg4-simple-02.txt
Use of "RFC XXXX" for MPEG-4 Elementary Streams with no SL layer Transport of MPEG-4 Elementary Streams
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of Drafts. Internet-Drafts are draft documents valid for a maximum of
skipping to change at line 41 skipping to change at page 1, line 44
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This specification is a product of the Audio/Video Transport working This specification is a product of the Audio/Video Transport working
group within the Internet Engineering Task Force. Comments are group within the Internet Engineering Task Force. Comments are
solicited and should be addressed to the working group's mailing solicited and should be addressed to the working group's mailing
list at avt@ietf.org and/or the authors. list at avt@ietf.org and/or the authors.
<< << Note for the RFC editor: xxxx should be replaced with the RFC
Note for the RFC editor: number that will be assigned. >>
XXXX should be replaced with the RFC number that will be assigned to
the companion RFC which draft is: draft-ietf-avt-mpeg4-multisl-**.txt.
>>
Abstract Abstract
The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in ISO The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in
that recently produced the MPEG-4 standard. MPEG defines tools to ISO that produced the MPEG-4 standard. MPEG defines tools to
compress content such as audio-visual information into elementary compress content such as audio-visual information into elementary
streams. In RFC XXXX a generic RTP payload format is defined for streams. This specification defines a simple, but generic RTP
transport of any non-multiplexed MPEG-4 elementary stream. To achieve payload format for transport of any non-multiplexed MPEG-4
the generic MPEG-4 functionality, RFC XXXX addresses detailed issues elementary stream.
related to the MPEG-4 SL layer. However, many initial applications will
not use the SL layer. To facilitate usage of RFC XXXX by such Table of Contents
applications, this document describes how to use RFC XXXX when no SL
layer is used. 1. Introduction
2. Carriage of MPEG-4 elementary streams over RTP
2.1. Introduction
2.2. MPEG Access Units
2.3. Concatenation of Access Units
2.4. Fragmentation of Access Units
2.5. Interleaving
2.6. Time stamp information
2.7. Carriage of auxiliary information
2.8. MIME format parameters and configuring conditional fields
2.9. Global structure of payload format
2.10. Modes to transport MPEG-4 streams
2.11. Alignment with RFC 3016
3. Payload format
3.1. RTP header field usage
3.2. RTP payload structure
3.2.1. The AU Header Section
3.2.1.1. The AU-header
3.2.2. The Auxiliary Section
3.2.3. The Access Unit Data Section
3.2.3.1. Fragmentation
3.2.3.2. Interleaving
3.2.3.3. Constraints for interleaving
3.3. Usage of this specification
3.3.1. General
3.3.2. The generic mode
3.3.3. Constant bit rate CELP
3.3.4. Variable bit rate CELP
3.3.5. Low bit rate AAC
3.3.6. High bit rate AAC
3.3.7. Additional modes
4. IANA considerations
4.1. MIME type registration
4.2. Concatenation of parameters
4.3. Usage of SDP
4.3.1. The a=fmtp keyword
5. Security considerations
6. Acknowledgements
7. References
8. Author addresses
APPENDIX: Usage of this payload format
A.1. Examples of delay analysis with interleave
A.1.1 Group interleave
A.1.2 Continuous interleave
1. Introduction 1. Introduction
The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29 The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4 that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
standards [1]. The MPEG-4 standard specifies compression of standards [1]. The MPEG-4 standard specifies compression of
audio-visual data into, for example, an audio or video elementary audio-visual data into, for example, an audio or video elementary
stream. In the MPEG-4 standard, these streams take the form of stream. In the MPEG-4 standard, these streams take the form of
audiovisual objects that may be arranged into an audio-visual scene audiovisual objects that may be arranged into an audio-visual scene
by means of a scene description. Each MPEG-4 elementary stream by means of a scene description. Each MPEG-4 elementary stream
consists of a sequence of Access Units; in case of audio an Access consists of a sequence of Access Units; examples of an Access Unit
Unit (AU) is an audio frame and in case of video a picture. (AU) are an audio frame and a video picture.
The MPEG-4 system specification is a rather abstract specification in
the sense that no transport format for MPEG-4 elementary streams is
defined. Instead, a conceptual SL layer has been specified to store
transport specific information such as time stamps and random access
point information. When transporting an MPEG-4 elementary stream,
transport information from the SL layer is typically mapped to the
actual transport layer. Note however that the SL layer is conceptual
and may not exist in practice.
In RFC XXXX, a general payload format is defined for transport of a single
MPEG-4 elementary stream over RTP. The RTP payload format specified
in RFC XXXX allows for carriage of any information that may be contained in
the MPEG-4 SL layer, either by mapping to the RTP header fields or by
carriage in specific fields defined in the RTP payload. Consequently,
the format defined in RFC XXXX is very generic and complete; for example,
transcoding issues from and to the SL layer are described in detail.
However, in many initial MPEG-4 applications the SL layer does not
exist in practice. Such applications do not require any knowledge of
the SL layer. While the use of RFC XXXX is highly desirable for all MPEG-4
applications, to understand RFC XXXX may be difficult without knowledge of
the MPEG-4 SL layer. Therefore in this document the use of RFC XXXX is
described without requiring knowledge of the SL layer to understand
its functionality.
Sophisticated features on interleaving of fragmented Access Units are The MPEG-4 system specification is a rather abstract specification
defined in RFC XXXX. Because initial applications only need interleaving in the sense that no transport format for MPEG-4 elementary streams
of complete (non-fragmented) Access Units, these more sophisticated is defined. Instead, a conceptual synchronization layer (SL) has
features are not supported in this document. Hence, only a functional been specified to store transport specific information such as time
set of RFC XXXX is supported. stamps and random access point information. When transporting an
MPEG-4 elementary stream, transport information from the SL is
typically mapped to the actual transport layer. Note that the SL is
conceptual and may not exist in practice.
In RFC XXXX, a general and configurable payload structure is defined for This specification defines a general and configurable payload
transport of MPEG-4 streams. This allows for the design of receivers structure to transport MPEG-4 elementary streams such as audio,
that can be configured to receive any MPEG-4 stream. Configuration of speech, video and BIFS streams. The RTP payload defined in this
the payload is provided to accommodate transport of any MPEG-4 stream, document is simple to implement and reasonably efficient. It allows
but for a specific MPEG-4 elementary stream typically only very few for optional interleaving of Access Units (such as audio frames) to
configurations are needed. So as to allow for the design of simplified, increase error resiliency in packet loss.
but dedicated receivers, this specifications requires that specific
modes are defined for transport of MPEG-4 streams. In this document
only modes are defined for transport of MPEG-4 CELP and AAC streams,
but in future new RFCs are expected to specify additional modes for
transport of other MPEG-4 streams.
In summary, this document: Configuration of the payload is provided to accommodate transport
- is intended for applications that do not apply the SL layer; of any MPEG-4 stream at any possible bit rate. However, for a
- describes how to use RFC XXXX without requiring knowledge of the specific MPEG-4 elementary stream typically only very few
SL layer; configurations are needed. So as to allow for the design of
- defines a functional but true subset of RFC XXXX; simplified, but dedicated receivers, this specification requires
- defines modes how to use this specification for transport of MPEG-4 that specific modes are defined for transport of MPEG-4 streams.
CELP and AAC streams. This document defines modes for MPEG-4 CELP and AAC streams, as
well as a generic mode that can be used to transport any MPEG-4
stream. In the future new RFCs are expected to specify additional
modes for transport of MPEG-4 streams.
The use of RFC XXXX defined in this document is simple to implement The RTP payload format defined in this document specifies carriage
and reasonably efficient. It allows for optional interleaving of of system-related information that is often equivalent to the
Access Units (such as audio frames) to increase error resiliency in information that may be contained in the MPEG-4 SL. This
packet loss. document does not prescribe how to transcode or map information
from the SL to fields defined in the RTP payload format. Such
processing, if any, is left to the discretion of the application.
However, to anticipate the need for transport of any additional
system-related information in future, an auxiliary field can be
configured that may carry any such data.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC 2119 [3]. this document are to be interpreted as described in RFC 2119 [3].
2. Carriage of MPEG-4 elementary streams over RTP 2. Carriage of MPEG-4 elementary streams over RTP
2.1 Introduction 2.1 Introduction
With this payload format a single MPEG-4 elementary stream can be With this payload format a single MPEG-4 elementary stream can be
transported. Information on the type of MPEG-4 stream carried in the transported. Information on the type of MPEG-4 stream carried in
payload is conveyed by format parameters in an SDP [7] message or the payload is conveyed by MIME format parameters, for example in
by other means. These format parameters specify the configuration an SDP [6] message or by other means. These MIME format parameters
of the payload. To simplify receivers, also a format parameter is specify the configuration of the payload. To allow for simplified
available to signal a specific mode of using this payload. A mode and dedicated receivers, a MIME format parameter is available
definition MAY include the type of MPEG-4 elementary stream as well to signal a specific mode of using this payload. A mode definition
as the applied configuration, so as to avoid the need in receivers MAY include the type of MPEG-4 elementary stream as well as the
for parsing all format parameters. applied configuration, so as to avoid the need in receivers
to parse all MIME format parameters. The applied mode MUST be
signalled.
2.2 MPEG Access Units 2.2 MPEG Access Units
For carriage of compressed audio-visual data MPEG defines Access For carriage of compressed audio-visual data MPEG defines Access
Units. An MPEG Access Unit (AU) is the smallest data entity to which Units. An MPEG Access Unit (AU) is the smallest data entity to
timing information can be attributed. In case of audio an Access which timing information is attributed. In case of audio an Access
Unit represents an audio frame and in case of video a picture. MPEG Unit may represent an audio frame and in case of video a picture.
Access Units are by definition byte aligned. If for example an audio MPEG Access Units are by definition byte aligned. If for example an
frame is not byte aligned, up to 7 zero-padding bits MUST be inserted audio frame is not byte aligned, up to 7 zero-padding bits MUST be
at the end of the frame to achieve a byte-aligned Access Unit. inserted at the end of the frame to achieve a byte-aligned Access
Decoders MUST be able to decode AUs in which such padding is applied. Unit. MPEG-4 decoders MUST be able to decode AUs in which such
padding is applied.
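The byte-alignment rule above can be illustrated with a minimal sketch. The function names are hypothetical and are not taken from the draft; the sketch only computes how many zero-padding bits are appended to a frame of a given length in bits.

```python
def padding_bits(frame_bits):
    """Zero bits to append so that the frame ends on a byte boundary (0..7)."""
    if frame_bits < 0:
        raise ValueError("frame length must be non-negative")
    return (-frame_bits) % 8


def padded_length_octets(frame_bits):
    """Size in octets of the byte-aligned Access Unit."""
    return (frame_bits + padding_bits(frame_bits)) // 8
```

A frame of 9 bits, for instance, needs 7 padding bits and yields a 2-octet Access Unit, while a frame that is already byte aligned needs none.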
Consistent with the MPEG-4 specification, this document requires that Consistent with the MPEG-4 specification, this document requires
each MPEG-4 video Access Unit includes all the coded data of a that each MPEG-4 part 2 video Access Unit includes all the coded
picture, any video stream headers that may precede the coded picture data of a picture, any video stream headers that may precede the
data, and any video stream stuffing that may follow it, up to, but not coded picture data, and any video stream stuffing that may follow
including the startcode indicating the start of a new video stream or it, up to, but not including the startcode indicating the start of
the next Access Unit. a new video stream or the next Access Unit.
2.3 Concatenation of Access Units 2.3 Concatenation of Access Units
Frequently it is possible to carry multiple Access Units in one RTP Frequently it is possible to carry multiple Access Units in one RTP
packet. This is particularly useful for audio; for example, when AAC packet. This is particularly useful for audio; for example, when
is used for encoding of a stereo signal at 64 kbits/sec, AAC frames AAC is used for encoding of a stereo signal at 64 kbits/sec, AAC
contain on average approximately 200 bytes. On a LAN with a 1500 octet frames contain on average approximately 200 octets. On a LAN with a
MTU this would allow on average 7 complete AAC frames to be carried 1500 octet MTU this would allow on average 7 complete AAC frames to
per RTP packet. be carried per RTP packet.
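The arithmetic behind this estimate can be sketched as follows. The header sizes are the minimum IPv4, UDP and fixed RTP header sizes; subtracting them, and ignoring any AU-header overhead, are assumptions of this sketch rather than statements of the draft.

```python
IPV4_HEADER = 20  # octets, IPv4 header without options
UDP_HEADER = 8    # octets
RTP_HEADER = 12   # octets, fixed RTP header without CSRC list or extension


def complete_aus_per_packet(mtu, avg_au_size,
                            header_overhead=IPV4_HEADER + UDP_HEADER + RTP_HEADER):
    """Number of complete average-size AUs that fit below the MTU."""
    return (mtu - header_overhead) // avg_au_size


# 1500 octet MTU, AAC frames of about 200 octets on average
frames = complete_aus_per_packet(1500, 200)
```

With these numbers the result is 7, with or without the header overhead taken into account.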
Access Units may have a fixed size in octets, but a variable size is Access Units may have a fixed size in octets, but a variable size
also possible. To facilitate parsing in case of multiple concatenated is also possible. To facilitate parsing in case of multiple
AUs in one RTP packet, the size of each AU is made known to the concatenated AUs in one RTP packet, the size of each AU is made
receiver. When concatenating in case of a constant AU size, this size known to the receiver. When concatenating in case of a constant AU
is communicated through a format parameter. When concatenating in case size, this size is communicated "out of band" through a MIME format
of variable size AUs, the RTP payload carries an AU size field for parameter. When concatenating in case of variable size AUs, the RTP
each contained AU. In combination with the RTP payload length the payload carries "in band" an AU size field for each contained AU.
size information allows the RTP payload to be split by the receiver In combination with the RTP payload length the size information
back into the individual AUs. allows the RTP payload to be split by the receiver back into the
individual AUs.
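A minimal sketch of how a receiver might split the Access Unit data back into individual AUs in both cases. The function names are hypothetical; for the variable-size case the list of sizes is assumed to have been read already from the AU size fields, whose coding is specified elsewhere in the draft and is not reproduced here.

```python
from typing import List, Sequence


def split_fixed(au_data: bytes, au_size: int) -> List[bytes]:
    """Constant AU size, signalled out of band by a format parameter."""
    if au_size <= 0 or len(au_data) % au_size != 0:
        raise ValueError("data is not a whole number of AUs")
    return [au_data[i:i + au_size] for i in range(0, len(au_data), au_size)]


def split_variable(au_data: bytes, sizes: Sequence[int]) -> List[bytes]:
    """Variable AU size, one in-band size value per contained AU."""
    if sum(sizes) != len(au_data):
        raise ValueError("AU sizes do not match the amount of AU data")
    aus, offset = [], 0
    for size in sizes:
        aus.append(au_data[offset:offset + size])
        offset += size
    return aus
```

Both functions reject data that does not contain an integral number of AUs, which mirrors the requirement that each concatenated AU be complete.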
To simplify the implementation of RFC XXXX defined in this document, it To simplify the implementation of RTP receivers, it is required
is required that when multiple AUs are carried in an RTP packet, that that when multiple AUs are carried in an RTP packet, each AU MUST
each AU MUST be complete, i.e. the number of AUs in an RTP packet be complete, i.e. the number of AUs in an RTP packet MUST be
MUST be integral. integral.
2.4 Fragmentation of Access Units 2.4 Fragmentation of Access Units
MPEG allows for very large Access Units. Since most IP networks have MPEG allows for very large Access Units. Since most IP networks
significantly smaller MTU's, this payload format allows to fragment have significantly smaller MTU sizes, this payload format allows
the AUs over multiple RTP packets so as to avoid IP layer for the fragmentation of an Access Unit over multiple RTP packets
fragmentation. To simplify the implementation of RFC XXXX defined in this so as to avoid IP layer fragmentation. To simplify the
document, an RTP packet SHALL either carry one or more complete implementation of RTP receivers, an RTP packet SHALL either carry
Access Units or a single fragment of one Access Unit. one or more complete Access Units or a single fragment of one
Access Unit.
2.5 Interleaving 2.5 Interleaving
When an RTP packet carries a contiguous sequence of Access Units, When an RTP packet carries a contiguous sequence of Access Units,
the loss of such packet can result in "decoding gaps" for the user. the loss of such a packet can result in a "decoding gap" for the
One method to alleviate this problem is to allow for the Access user. One method to alleviate this problem is to allow for the
Units to be interleaved in the RTP packets. For a modest cost in Access Units to be interleaved in the RTP packets. For a modest
latency and implementation complexity, significant error resiliency cost in latency and implementation complexity, significant error
to packet loss can be achieved. resiliency to packet loss can be achieved.
To support optional interleaving of Access Units, this payload To support optional interleaving of Access Units, this payload
format allows for index information to be sent for each Access Unit. format allows for index information to be sent for each Access Unit.
The RTP sender is free to choose the interleaving pattern without The RTP sender is free to choose the interleaving pattern without
propagating this information to the receiver(s). Indeed the sender propagating this information to the receiver(s). Indeed the sender
could dynamically adjust the interleaving pattern based on the could dynamically adjust the interleaving pattern based on the
Access Unit size, error rates, etc. The RTP receiver does not need Access Unit size, error rates, etc. The RTP receiver does not need
to know the interleaving pattern used, it only need extract the to know the interleaving pattern used, it only needs to extract the
index information of the Access Unit and insert the Access Unit into index information of the Access Unit and insert the Access Unit
the appropriate sequence in the rendering queue. An example of into the appropriate sequence in the rendering queue. An example of
interleaving is given below. interleaving is given below.
Assume that an RTP packet contains 3 AUs, and that the AUs are Assume that an RTP packet contains 3 AUs, and that the AUs are
numbered 1, 2, 3, 4, etc. If an interleaving group length of 9 is numbered 1, 2, 3, 4, etc. If an interleaving group length of 9 is
chosen, then RTP packet(i) contain the following AU(n): chosen, then RTP packet(i) contains the following AU(n):
RTP packet(1): AU(1), AU(4), AU(7) RTP packet(1): AU(1), AU(4), AU(7)
RTP packet(2): AU(2), AU(5), AU(8) RTP packet(2): AU(2), AU(5), AU(8)
RTP packet(3): AU(3), AU(6), AU(9) RTP packet(3): AU(3), AU(6), AU(9)
RTP packet(4): AU(10), AU(13), AU(16) RTP packet(4): AU(10), AU(13), AU(16)
RTP packet(5): AU(11), AU(14), AU(17) RTP packet(5): AU(11), AU(14), AU(17)
Etc. Etc.
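The pattern of this example can be reproduced with a short sketch. The function name and the generalisation to other group lengths are assumptions made for illustration; the draft leaves the choice of interleaving pattern to the sender.

```python
def interleaved_packet(i, aus_per_packet=3, group_length=9):
    """AU numbers carried in RTP packet(i), with packets and AUs numbered from 1."""
    if i < 1:
        raise ValueError("packets are numbered from 1")
    if group_length % aus_per_packet != 0:
        raise ValueError("group length must be a multiple of the AUs per packet")
    packets_per_group = group_length // aus_per_packet
    group, position = divmod(i - 1, packets_per_group)
    first = group * group_length + position + 1
    # Consecutive AUs of one packet are packets_per_group apart.
    return [first + k * packets_per_group for k in range(aus_per_packet)]


pattern = [interleaved_packet(i) for i in range(1, 6)]
```

Here `pattern` holds the five packets listed above. With this pattern, the loss of one RTP packet removes every third AU of a group instead of three consecutive AUs.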
2.6 Time stamp information 2.6 Time stamp information
MPEG-4 defines two types of time stamps, the decoding time stamp DTS
and the composition time stamp CTS. The RTP timestamp is equivalent
to the composition time stamp.
The RTP time stamp MUST carry the sampling instant of the first AU The RTP time stamp MUST carry the sampling instant of the first AU
(fragment) in the RTP packet. When multiple AUs are carried within (fragment) in the RTP packet. When multiple AUs are carried within
an RTP packet, the time stamps of subsequent AUs can be calculated an RTP packet, the time stamps of subsequent AUs can be calculated
if the frame period of each AU is known. For audio and video this if the frame period of each AU is known. For audio and video this
is possible if the frame rate is constant. However, in some cases it is possible if the frame rate is constant. However, in some cases
is not possible to make such calculation, for example for variable it is not possible to make such calculation, for example for
frame rate video and for MPEG-4 BIFS streams carrying composition variable frame rate video and for MPEG-4 BIFS streams carrying
information. To support such cases, this payload format can be composition information. To support such cases, this payload format
configured to carry a CTS in the RTP payload for each contained can be configured to carry a time stamp in the RTP payload for each
Access Unit. A CTS time stamp MAY be conveyed in the RTP payload contained Access Unit. A time stamp MAY be conveyed in the RTP
only for non-first AUs in the RTP packet, and SHALL NOT be conveyed payload only for non-first AUs in the RTP packet, and SHALL NOT be
for the first AU (fragment), as the time stamp for the latter is conveyed for the first AU (fragment), as the time stamp for the
carried by the RTP time stamp. latter is carried by the RTP time stamp.
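For the constant frame rate case, the calculation of the time stamps of subsequent AUs can be sketched as below. The sketch assumes that the AUs in the packet are consecutive (not interleaved) and that the frame period is expressed in RTP clock ticks; the function name and the value 1024 in the usage line are illustrative only.

```python
RTP_TS_MODULUS = 2 ** 32  # the RTP time stamp is a 32-bit field


def au_timestamps(rtp_timestamp, num_aus, frame_period):
    """Time stamps of the AUs in one packet; the first equals the RTP time stamp."""
    return [(rtp_timestamp + k * frame_period) % RTP_TS_MODULUS
            for k in range(num_aus)]


# Three consecutive audio frames, each 1024 clock ticks long
stamps = au_timestamps(1000, 3, 1024)
```

When the frame period is not constant this calculation is not possible, which is the case the in-band time stamp is meant to cover.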
The DTS timestamp may be applied only in MPEG video streams that use MPEG-4 defines two types of time stamps, the composition time stamp
bi-directional coding, i.e. when pictures may be predicted in both (CTS) and the decoding time stamp (DTS). The CTS represents the
forward and backward direction by using either a reference picture in sampling instant of an AU, and hence the CTS is equivalent to the
the past, or a reference picture in the future. The DTS cannot be RTP time stamp. The DTS may be used only in MPEG-4 video streams
carried in the RTP header. In some cases the DTS can be derived from that use bi-directional coding, i.e. when pictures are predicted in
the RTP time stamp using frame rate information; this requires deep both forward and backward direction by using either a reference
parsing in the video stream, which may be considered objectionable. picture in the past, or a reference picture in the future. The DTS
But if the video frame rate is variable, the required information cannot be carried in the RTP header. In some cases the DTS can be
may not even present in the video stream. For both reasons, the derived from the RTP time stamp using frame rate information; this
capability has been defined to optionally carry a DTS in the RTP requires deep parsing in the video stream, which may be considered
payload for each contained Access Unit. objectionable. But if the video frame rate is variable, the required
information may not even be present in the video stream. For both
reasons, the capability has been defined to optionally carry the
DTS in the RTP payload for each contained Access Unit.
Since RTP time stamps may be re-stamped by RTP devices, each CTS Since RTP time stamps may be re-stamped by RTP devices, each time
and DTS contained in the RTP payload is coded differentially from the stamp contained in the RTP payload is coded differentially from the
RTP time stamp, so as to avoid extensive parsing by re-stamping RTP time stamp, so as to avoid extensive parsing by re-stamping
devices. devices.
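The benefit of differential coding can be illustrated with a small sketch. The function names are hypothetical, and the signed delta used here is an assumption of the sketch; the actual field widths and sign conventions are defined in the payload format section and are not reproduced here.

```python
RTP_TS_MODULUS = 2 ** 32  # the RTP time stamp is a 32-bit field


def encode_delta(time_stamp, rtp_timestamp):
    """Sender side: code a CTS or DTS relative to the RTP time stamp."""
    return time_stamp - rtp_timestamp


def decode_delta(delta, rtp_timestamp):
    """Receiver side: rebuild the time stamp from the received RTP time stamp."""
    return (rtp_timestamp + delta) % RTP_TS_MODULUS


delta = encode_delta(5000, 3000)        # in-payload value chosen by the sender
restamped = 3000 + 90000                # a device rewrites only the RTP header
rebuilt = decode_delta(delta, restamped)
```

Because the payload carries only the difference, the re-stamping device does not touch the payload, and the rebuilt time stamp still lies 2000 ticks after the new RTP time stamp.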
2.7 Carriage of auxiliary information 2.7 Carriage of auxiliary information
This payload format defines a specific field to carry auxiliary data This payload format defines a specific field to carry auxiliary
on the contained MPEG-4 stream, representing MPEG-4 system information. data. The auxiliary data field is preceded by a field that specifies
The auxiliary data corresponds to the RSLH field defined in RFC XXXX. the length of the auxiliary data, so as to facilitate skipping of
Receivers MAY use the auxiliary data to decode the contained stream, the data without parsing it. The coding of the auxiliary data is not
but receivers that have no interest in such data MAY skip the defined in this document, but is left to the discretion of
auxiliary data field. To facilitate skipping of the data, and to avoid applications. Receivers that have knowledge of the auxiliary data
the need for parsing it, the auxiliary data field is preceded by a MAY decode the auxiliary data, but receivers without knowledge of
field that specifies the length of the auxiliary data. such data MUST skip the auxiliary data field.
2.8 Format parameters and the conditional presence and length of fields 2.8 MIME format parameters and configuring conditional fields
To support the features described in the previous sections several To support the features described in the previous sections several
fields are defined for carriage in the RTP payload. However, their use fields are defined for carriage in the RTP payload. However, their
strongly depends on the type of MPEG-4 elementary stream that is use strongly depends on the type of MPEG-4 elementary stream that
carried. Sometimes a specific field is needed with a certain length, is carried. Sometimes a specific field is needed with a certain
while in other cases such field is not needed at all. To be efficient length, while in other cases such field is not needed at all. To be
in either case, the fields needed for these features are configurable efficient in either case, the fields to support these features are
by means of format parameters. In general, a format parameter defines configurable by means of MIME format parameters. In general, a MIME
the presence and length of associated fields. A length of zero format parameter defines the presence and length of the associated
indicates absence of the field. As a consequence, parsing of the field. A length of zero indicates absence of the field. As a
payload requires knowledge of format parameters. The format consequence, parsing of the payload requires knowledge of MIME
parameters are conveyed to the receiver via SDP [7] messages or format parameters. The MIME format parameters are conveyed to the
through other means. receiver via SDP [6] messages or through other means.
2.9 Global structure of payload format 2.9 Global structure of payload format
The payload structure in RFC XXXX is described in terms derived from the The RTP payload following the RTP header contains three byte
SL layer. In this document exactly the same structure is described aligned data sections, of which the first two MAY be empty. See
in more general terms, so as to improve the readability for people figure 1.
with no knowledge of the SL layer. So the payload structure described
below corresponds on bit level exactly to the payload structure
defined in RFC XXXX.
The RTP payload following the RTP header contains three byte aligned
data sections, of which the first two MAY be empty. See figure 1.
+---------+-----------+-----------+---------------+ +---------+-----------+-----------+---------------+
| RTP | AU Header | Auxiliary | Access Unit | | RTP | AU Header | Auxiliary | Access Unit |
| Header | Section | Section | Data Section | | Header | Section | Section | Data Section |
+---------+-----------+-----------+---------------+ +---------+-----------+-----------+---------------+
<----------RTP Packet Payload-----------> <----------RTP Packet Payload----------->
Figure 1: Data sections within an RTP packet Figure 1: Data sections within an RTP packet
The first data section is the AU (Access Unit) Header Section, that The first data section is the AU (Access Unit) Header Section, that
contains one or more AU-headers; however, each AU-header MAY be empty, contains one or more AU-headers; however, each AU-header MAY be
in which case the entire AU Header Section is empty. The second empty, in which case the entire AU Header Section is empty. The
section is the Auxiliary Section, containing auxiliary data; also second section is the Auxiliary Section, containing auxiliary data;
this section MAY be configured empty. The third section is the Access this section MAY also be configured empty. The third section is the
Unit Data Section, containing either a single fragment of one Access Access Unit Data Section, containing either a single fragment of
Unit or one or more complete Access Units. The Access Unit Data one Access Unit or one or more complete Access Units. The Access
Section is never empty. Unit Data Section is never empty.
When compared to the terms used in RFC XXXX, the AU Header Section
exactly corresponds to the Payload Header Section, the Auxiliary
Section to the RSLH Section, and the Access Unit Data Section to the
Payload Section.
2.10 Modes to transport MPEG-4 streams 2.10 Modes to transport MPEG-4 streams
While it is possible to build fully configurable receivers capable of While it is possible to build fully configurable receivers capable
receiving any MPEG-4 stream, this specification also allows for the of receiving any MPEG-4 stream, this specification also allows for
design of simplified, but dedicated receivers, that are capable for the design of simplified, but dedicated receivers, that are capable
example to receive only one type of MPEG-4 stream. This is achieved by for example of receiving only one type of MPEG-4 stream. This
requiring that specific modes be defined for using this specification. is achieved by requiring that specific modes be defined for using
Each mode defines how to transport specific MPEG-4 streams, for example this specification. Each mode may define constraints for transport
by defining suitable constraints or payload configurations. Modes can of one or more type of MPEG-4 streams, for instance on the payload
be defined as deemed appropriate. However, each mode MUST be in full configuration.
compliance with this specification.
The applied mode MUST be signalled. Signalling the mode is particularly
important for receivers that are only capable of decoding a particular
mode. Such receivers need to determine whether that particular mode is
applied, so as to avoid problems with processing of payloads that are
beyond the capabilities of the receiver.
In this internet draft only modes are defined for transport of MPEG-4 The applied mode MUST be signalled. Signalling the mode is
CELP and AAC streams. However, in future new RFCs are expected to particularly important for receivers that are only capable of
specify additional modes of using this specification for transport of decoding one or more specific modes. Such receivers need to
other MPEG-4 streams. determine whether the applied mode is supported, so as to avoid
problems with processing of payloads that are beyond the
capabilities of the receiver.
2.11 Alignment with RFC XXXX and RFC 3016 In this document several modes are defined for transport of MPEG-4
CELP and AAC streams, as well as a generic mode that can be used
for any MPEG-4 stream. In future, new RFCs are expected to specify
additional modes of using this specification. New modes can be
defined as deemed appropriate, typically by specifications that are
hierarchically higher than this payload format. However, each mode
MUST be in full compliance with this specification.
This document defines a subset of the RFC XXXX. The main characteristic 2.11 Alignment with RFC 3016
of this subset is that each RTP payload is only allowed to contain either
a single fragment of one Access Unit or one or more complete Access Units.
Obviously, RTP payloads that apply this subset in conformance with this
document conform also to RFC XXXX. Receivers that comply with RFC XXXX
are able to decode MPEG-4 streams carried in compliance with this
document.
Receivers designed to only comply to this document may not be able to This payload can be configured to be nearly identical to the
decode a RTP payload that conforms to RFC XXXX but not to this document. payload format defined in RFC 3016 [5] for the MPEG-4 video
Such receivers may also not be capable of exploiting some of features configurations recommended in RFC 3016. Hence, receivers that
of the SL layer supported in RFC XXXX, such as knowledge of AU-start, comply with RFC 3016 can decode such RTP payload, providing that
random access information and other information carried in the SL header, additional packets containing video decoder configuration (VO,
but not described in this document. VOL, VOSH) are inserted in the stream, as required by RFC 3016.
Conversely, receivers that comply with the specification in this
document SHOULD be able to decode payloads, names and parameters
defined for MPEG-4 video in RFC 3016. In this respect it is
strongly recommended to implement the ability to ignore "in band"
video decoder configuration packets in the RFC 3016 payload.
Furthermore, this payload can be configured to be identical to the For interoperability reasons, applications that transport MPEG-4
payload format defined in RFC 3016 [5] for the MPEG-4 video configurations video part 2 over RTP SHOULD use the payload format and associated
recommended in RFC 3016. Hence, receivers that comply with RFC 3016 names and parameters defined in RFC 3016 if the functionality
can decode such RTP payload. Vice versa, receivers that comply with the provided by RFC 3016 can meet the requirements of that application.
specification in this document SHOULD be able to decode payloads, names On the other hand, if applications wish to use a single RTP payload
and parameters defined for MPEG-4 video in RFC 3016. format for transport of all type of MPEG-4 streams, then the RTP
payload defined in this document provides a suitable solution, also
for transport of MPEG-4 video part 2 streams.
For interoperability reasons, applications that transport MPEG-4 video Note that since the "out of band" availability of the video decoder
over RTP SHOULD use the payload format and associated names and configuration as a MIME parameter is optional in RFC 3016, for
parameters defined in RFC 3016 if the functionality provided by RFC 3016 obvious interoperability reasons with this specification it is
can meet the requirements of that application. recommended to systematically implement this optional feature.
3 Payload Format 3 Payload Format
3.1 RTP Header Fields Usage 3.1 RTP Header Fields Usage
Payload Type (PT): The assignment of an RTP payload type for this Payload Type (PT): The assignment of an RTP payload type for this
RTP packet format is outside the scope of this document, and will RTP packet format is outside the scope of this document, and will
not be specified here. It is expected that the RTP profile for a not be specified here. It is expected that the RTP profile for a
particular class of applications will assign a payload type for this particular class of applications will assign a payload type for
encoding, or if that is not done, then a payload type in the dynamic this encoding, or if that is not done, then a payload type in the
range shall be chosen. dynamic range shall be chosen.
Marker (M) bit: The M bit is set to 1 to indicate that the RTP packet Marker (M) bit: The M bit is set to 1 to indicate that the RTP
payload includes the end of each Access Unit of which data is packet payload includes the end of each Access Unit of which data
contained in this RTP packet. As the payload either carries one or is contained in this RTP packet. As the payload either carries one
more complete Access Units or a single fragment of an Access Unit, or more complete Access Units or a single fragment of an Access
the M is always set to set to 1, except when the packet carries a Unit, the M bit is always set to 1, except when the packet carries
single fragment of an Access Unit that is not the last one. a single fragment of an Access Unit that is not the last one.
Extension (X) bit: Defined by the RTP profile used. Extension (X) bit: Defined by the RTP profile used.
Sequence Number: The RTP sequence number SHOULD be generated by the Sequence Number: The RTP sequence number SHOULD be generated by
sender with a constant random offset. the sender with a constant random offset.
Timestamp: Indicates the sampling instance of the first AU contained Timestamp: Indicates the sampling instance of the first AU
in the RTP payload. This sampling instance is equivalent to the CTS contained in the RTP payload. This sampling instance is equivalent
in the MPEG-4 time domain. The clock rate of the RTP time stamp MUST to the CTS in the MPEG-4 time domain. When using SDP the clock rate
be expressed as part of the RTPMAP. If an audio or video stream with of the RTP time stamp MUST be expressed using the "rtpmap"
a fixed frame rate is transported, the rate SHOULD be set to the same attribute. If an MPEG-4 audio stream is transported, the rate SHOULD
value as the sampling frequency of the audio or video frames (number be set to the same value as the sampling rate of the audio stream.
of samples per second). If an MPEG-4 video stream is transported, it is RECOMMENDED to set
the rate to 90 kHz.
In all cases, the sender SHALL make sure that RTP time stamps In all cases, the sender SHALL make sure that RTP time stamps
are identical only if the RTP time stamp refers to fragments of the are identical only if the RTP time stamp refers to fragments of the
same Access Unit. same Access Unit.
According to RFC 1889 [2] (section 5.1), RTP timestamps are According to RFC 1889 [2] (section 5.1), RTP timestamps are
recommended to start at a random value for security reasons. However, recommended to start at a random value for security reasons. This
then a receiver is, in the general case, not able to reconstruct the is not an issue for synchronization of multiple RTP streams.
original MPEG Time Stamps, which creates problems for applications However, in applications where streams from multiple sources are to
where streams from multiple sources are to be synchronized. To enable be synchronized (for example one stream from local storage, another
synchronisation in such cases, for example between one stream from from a RTP streaming server), synchronization may become impossible.
local storage and another from an RTP streaming server, the applied To also enable synchronization in such cases, it may be necessary to
random offset MUST be provided out of band. Methods to convey the provide the required relationship between time stamps for obtaining
applied random offset value are beyond the scope of this synchronization by out of band means. The format of such information
specification. as well as methods to convey such information are beyond the scope
of this specification.
SSRC: set as described in RFC1889 [2]. SSRC: set as described in RFC1889 [2].
CC and CSRC fields are used as described in RFC 1889 [2]. CC and CSRC fields are used as described in RFC 1889 [2].
RTCP SHOULD be used as defined in RFC 1889 [2]. RTCP SHOULD be used as defined in RFC 1889 [2].
3.2 RTP Payload Structure 3.2 RTP Payload Structure
As already noted in section 2.9 of this document, this document uses
more general names to describe exactly the same payload structure as
defined in RFC XXXX. For mapping between section names in RFC XXXX and
in this document see section 2.9.
3.2.1 The AU Header Section 3.2.1 The AU Header Section
When present, the AU Header Section consists of the AU-header-length When present, the AU Header Section consists of the AU-header-length
field, followed by a number of AU-headers. See figure 2. field, followed by a number of AU-headers. See figure 2.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
|AU-headers-length|AU-header|AU-header| |AU-header|padding| |AU-headers-length|AU-header|AU-header| |AU-header|padding|
| | (1) | (2) | | (n) | bits | | | (1) | (2) | | (n) | bits |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
Figure 2: The AU Header Section Figure 2: The AU Header Section
The AU-headers are configured using format parameters and MAY be empty. The AU-headers are configured using MIME format parameters and MAY
If the AU-header is configured empty, the AU-headers-length field be empty. If the AU-header is configured empty, the
SHALL NOT be present and consequently the AU Header Section is empty. AU-headers-length field SHALL NOT be present and consequently the
If the AU-header is not configured empty, then the AU-headers-length AU Header Section is empty. If the AU-header is not configured
is a two octet field that specifies the length in bits of the empty, then the AU-headers-length is a two octet field that
immediately following AU-headers. specifies the length in bits of the immediately following
AU-headers, excluding the padding bits.
Each AU-header is associated with a single Access Unit (fragment) Each AU-header is associated with a single Access Unit (fragment)
contained in the Access Unit Data Section in the same RTP packet. For contained in the Access Unit Data Section in the same RTP packet.
each contained Access Unit (fragment) there is exactly one AU-header. For each contained Access Unit (fragment) there is exactly one
Within the AU Header Section, the AU-headers are bit-wise concatenated AU-header. Within the AU Header Section, the AU-headers are
in the order in which the Access Units are contained in the Access bit-wise concatenated in the order in which the Access Units are
Unit Data Section. Hence, the n-th AU-header refers to the n-th AU contained in the Access Unit Data Section. Hence, the n-th
(fragment). If the concatenated AU-headers consume a non-integer AU-header refers to the n-th AU (fragment). If the concatenated
number of octets, up to 7 zero-padding bits MUST be inserted at the end AU-headers consume a non-integer number of octets, up to 7
in order to achieve byte-alignment of the AU Header Section. zero-padding bits MUST be inserted at the end in order to achieve
byte-alignment of the AU Header Section.
3.2.1.1 The AU-header 3.2.1.1 The AU-header
The AU-header contains the fields given in figure 3. The length in The AU-header contains the fields given in figure 3. The length in
bits of the above fields with the exception of the CTS-flag and bits of the above fields with the exception of the CTS-flag and
the DTS-flag fields is defined by format parameters; see section 4.1. the DTS-flag fields is defined by MIME format parameters; see
If a format parameter has the default value of zero, then the section 4.1. If a MIME format parameter has the default value of
associated field is not present. zero, then the associated field is not present.
+---------------------------------------+ +---------------------------------------+
| AU-size | | AU-size |
+---------------------------------------+ +---------------------------------------+
| AU-Index / AU-Index-delta | | AU-Index / AU-Index-delta |
+---------------------------------------+ +---------------------------------------+
| CTS-flag | | CTS-flag |
+---------------------------------------+ +---------------------------------------+
| CTS-delta | | CTS-delta |
+---------------------------------------+ +---------------------------------------+
skipping to change at line 491 skipping to change at page 11, line 25
+---------------------------------------+ +---------------------------------------+
| DTS-delta | | DTS-delta |
+---------------------------------------+ +---------------------------------------+
Figure 3: The fields in the AU-header. If used, the AU-Index field Figure 3: The fields in the AU-header. If used, the AU-Index field
only occurs in the first AU-header within an AU Header only occurs in the first AU-header within an AU Header
Section; in any other AU-header the AU-Index-delta field Section; in any other AU-header the AU-Index-delta field
occurs instead. occurs instead.
AU-size: indicates the size in octets of the associated Access Unit AU-size: indicates the size in octets of the associated Access Unit
in the Access Unit Data Section in the same RTP packet. When the in the Access Unit Data Section in the same RTP packet. When
AU-size is associated to an AU fragment, the AU size indicates the AU-size is associated with an AU fragment, the AU size
the size of the entire AU and not the size of the fragment. This indicates the size of the entire AU and not the size of the
can be exploited to determine whether a packet contains an entire fragment. This can be exploited to determine whether a packet
AU or a fragment, which is particularly useful after losing a contains an entire AU or a fragment, which is particularly
packet carrying the last fragment of an AU. useful after losing a packet carrying the last fragment of an
AU.
AU-Index: indicates the serial number of the associated Access Unit AU-Index: indicates the serial number of the associated Access Unit
(fragment). For each (in time) consecutive AU or AU fragment, (fragment). For each (in decoding order) consecutive AU or AU
the serial number is incremented with 1. When present, the fragment, the serial number is incremented with 1. When
AU-Index field occurs in the first AU-header in the AU Header present, the AU-Index field occurs in the first AU-header in
Section, but MUST NOT occur in any subsequent (non-first) the AU Header Section, but MUST NOT occur in any subsequent
AU-header in that Section. To encode the serial number in any (non-first) AU-header in that Section. To encode the serial
such non-first AU-header, the AU-Index-delta field is used. number in any such non-first AU-header, the AU-Index-delta
When each AU-Index field is coded with the value 0, the serial field is used. If each AU-Index field is coded with the value
number of the AU (fragment) is not specified and in that case 0, the serial number of the AU (fragment) is not specified,
receivers MAY ignore the AU-Index field. and in that case receivers MAY ignore the AU-Index field.
AU-Index-delta: The AU-Index-delta field is an unsigned integer AU-Index-delta: The AU-Index-delta field is an unsigned integer
that specifies the serial number of the associated AU as the that specifies the serial number of the associated AU as the
difference with respect to the serial number of the previous difference with respect to the serial number of the previous
Access Unit. Hence, for the n-th (n>1) AU the serial number is Access Unit. Hence, for the n-th (n>1) AU the serial number
found from: is found from:
AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1 AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
If the AU-Index field is present in the first AU-header in If the AU-Index field is present in the first AU-header in
the AU Header Section, then the AU-Index-delta field MUST be the AU Header Section, then the AU-Index-delta field MUST be
present in any subsequent (non-first) AU-header. When the present in any subsequent (non-first) AU-header. When the
AU-Index-delta is coded with the value 0, it indicates that AU-Index-delta is coded with the value 0, it indicates that
the Access Units are consecutive in time. An AU-Index-delta the Access Units are consecutive in decoding order. An
value larger than 0 signals that interleaving is applied. AU-Index-delta value larger than 0 signals that interleaving
is applied.
CTS-flag: Indicates whether the CTS-delta field is present. CTS-flag: Indicates whether the CTS-delta field is present.
A value of 1 indicates that the field is present, a value of 0 A value of 1 indicates that the field is present, a value
that it is not present. of 0 that it is not present.
The CTS-flag field MUST be present in each AU-header if the The CTS-flag field MUST be present in each AU-header if the
length of the CTS-delta field is signalled to be larger than length of the CTS-delta field is signalled to be larger than
zero. In that case, the CTS-flag field MUST have the value 0 zero. In that case, the CTS-flag field MUST have the value 0
in the first AU-header and MAY have the value 1 in all non-first in the first AU-header and MAY have the value 1 in all
AU-headers. The CTS-flag field SHOULD be 0 for any non-first non-first AU-headers. The CTS-flag field SHOULD be 0 for
fragment of an Access Unit. any non-first fragment of an Access Unit.
CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's
complement offset (delta) from the timestamp in the RTP header complement offset (delta) from the time stamp in the RTP
of this RTP packet. The CTS MUST use the same clock rate as the header of this RTP packet. The CTS MUST use the same clock
time stamp in the RTP header. rate as the time stamp in the RTP header.
DTS-flag: Indicates whether the DTS-delta field is present. A value DTS-flag: Indicates whether the DTS-delta field is present. A value
of 1 indicates that DTS-delta is present, a value of 0 that it of 1 indicates that DTS-delta is present, a value of 0 that
is not present. it is not present.
The DTS-flag field MUST be present in each AU-header if the The DTS-flag field MUST be present in each AU-header if the
length of the DTS-delta field is signalled to be larger than length of the DTS-delta field is signalled to be larger than
zero. The DTS-flag field SHOULD be 0 for any non-first zero. The DTS-flag field SHOULD be 0 for any non-first
fragment of an Access Unit. fragment of an Access Unit.
DTS-delta: specifies the value of the DTS as a 2's complement offset DTS-delta: specifies the value of the DTS as a 2's complement
(delta) from the CTS timestamp. The DTS MUST use the same clock offset (delta) from the CTS. The DTS MUST use the
rate as the time stamp in the RTP header. same clock rate as the time stamp in the RTP header.
If present, the fields MUST occur in the mutual order given in If present, the fields MUST occur in the mutual order given in
figure 3. In the general case a receiver can only discover the size figure 3. In the general case a receiver can only discover the size
of an AU-header by parsing it since the presence of the CTS-delta of an AU-header by parsing it since the presence of the CTS-delta
and DTS-delta fields is signalled by the value of the CTS-flag and and DTS-delta fields is signalled by the value of the CTS-flag and
DTS-flag, respectively. DTS-flag, respectively.
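The parsing rule above can be sketched as follows. This is a non-normative illustration: the parameter names (size_len, index_len, index_delta_len, cts_delta_len, dts_delta_len) are hypothetical stand-ins for the field lengths configured by the format parameters of section 4.1, and the most-significant-bit-first bit order is an assumption of this sketch.

```python
class BitReader:
    """Reads bits from a byte string, most significant bit first (assumed)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value

    def read_signed(self, nbits: int) -> int:
        """Read an nbits-wide 2's complement value."""
        value = self.read(nbits)
        if value >= 1 << (nbits - 1):
            value -= 1 << nbits
        return value


def parse_au_headers(section, size_len, index_len, index_delta_len,
                     cts_delta_len, dts_delta_len):
    """Parse a non-empty AU Header Section into a list of dicts.

    A field whose configured length is zero is not present.
    """
    total_bits = int.from_bytes(section[:2], "big")  # AU-headers-length
    reader = BitReader(section[2:])
    headers = []
    while reader.pos < total_bits:
        start = reader.pos
        header = {}
        if size_len:
            header["size"] = reader.read(size_len)
        if not headers:
            # First AU-header carries AU-Index.
            if index_len:
                header["index"] = reader.read(index_len)
        elif index_delta_len:
            # Any non-first AU-header carries AU-Index-delta instead.
            header["index_delta"] = reader.read(index_delta_len)
        # The one-bit flags exist only if the delta length is non-zero.
        if cts_delta_len and reader.read(1):
            header["cts_delta"] = reader.read_signed(cts_delta_len)
        if dts_delta_len and reader.read(1):
            header["dts_delta"] = reader.read_signed(dts_delta_len)
        if reader.pos == start:
            raise ValueError("all fields empty but AU-headers-length > 0")
        headers.append(header)
    return headers
```

Because the flags decide whether the delta fields follow, the loop cannot know the size of an AU-header in advance; it stops once the bit count given by AU-headers-length has been consumed.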
3.2.2 The Auxiliary Section 3.2.2 The Auxiliary Section
The Auxiliary Section consists of the auxiliary-data-size field The Auxiliary Section consists of the auxiliary-data-size field
skipping to change at line 572 skipping to change at page 13, line 4
concatenation of the auxiliary-data-size and the auxiliary-data concatenation of the auxiliary-data-size and the auxiliary-data
fields consume a non-integer number of octets, up to 7 zero padding fields consume a non-integer number of octets, up to 7 zero padding
bits MUST be inserted immediately after the auxiliary data in order bits MUST be inserted immediately after the auxiliary data in order
to achieve byte-alignment. See figure 4. to achieve byte-alignment. See figure 4.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
| auxiliary-data-size | auxiliary-data |padding bits | | auxiliary-data-size | auxiliary-data |padding bits |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
Figure 4: The fields in the Auxiliary Section Figure 4: The fields in the Auxiliary Section
The length in bits of the auxiliary-data-size field is configurable The length in bits of the auxiliary-data-size field is configurable
by a format parameter; see section 4.1. The default length of zero by a MIME format parameter; see section 4.1. The default length of
indicates that the entire Auxiliary Section is absent. zero indicates that the entire Auxiliary Section is absent.
auxiliary-data-size; specifies the length in bits of the immediately auxiliary-data-size: specifies the length in bits of the immediately
following auxiliary-data field; following auxiliary-data field;
auxiliary-data; the auxiliary-data field contains the Remaining SL auxiliary-data: the auxiliary-data field contains data of a format
headers (RSLHs) as defined in RFC XXXX. not defined by this specification.
3.2.3 The Access Unit Data Section 3.2.3 The Access Unit Data Section
The Access Unit Data Section contains an integer number of complete The Access Unit Data Section contains an integer number of complete
Access Units or a single fragment of one AU. The Access Unit Data Access Units or a single fragment of one AU. The Access Unit Data
Section is never empty. If data of more than one Access Units is Section is never empty. If data of more than one Access Unit is
contained, then the AUs are concatenated into a contiguous string of present, then the AUs are concatenated into a contiguous string
octets. See figure 5. The AUs inside the Access Unit Data Section of octets. See figure 5. The AUs inside the Access Unit Data
MUST be in decoding order. Section MUST be in decoding order.
The size and number of Access Units SHOULD be adjusted such that the The size and number of Access Units SHOULD be adjusted such that
resulting RTP packet is not larger than the path-MTU. To handle the resulting RTP packet is not larger than the path MTU. To handle
larger packets, this payload format relies on lower layers for larger packets, this payload format relies on lower layers for
fragmentation, which may not be desirable. fragmentation, which may not be desirable.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|AU(1) | |AU(1) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- |
| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |AU(2) | | |AU(2) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
skipping to change at line 617 skipping to change at page 13, line 48
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | | |
|-+-+-+-+-+-+-+-+ |-+-+-+-+-+-+-+-+
Figure 5: Access Unit Data Section; each AU is byte aligned. Figure 5: Access Unit Data Section; each AU is byte aligned.
When multiple Access Units are carried, the size of each AU MUST be When multiple Access Units are carried, the size of each AU MUST be
made available to the receiver. If the AU size is variable then the made available to the receiver. If the AU size is variable then the
size of each AU MUST be indicated in the AU-size field of the size of each AU MUST be indicated in the AU-size field of the
corresponding AU-header. However, if the AU size is constant for a corresponding AU-header. However, if the AU size is constant for a
stream, this mechanism SHOULD NOT be used, but instead the fixed size stream, this mechanism SHOULD NOT be used, but instead the fixed
SHOULD be signalled by the format parameter "ConstantSize", see size SHOULD be signalled by the MIME format parameter
section 4.1. "ConstantSize", see section 4.1.
The absence of both AU-size in the AU-header and the ConstantSize The absence of both AU-size in the AU-header and the ConstantSize
format parameter indicates carriage of a single AU (fragment), i.e. MIME format parameter indicates carriage of a single AU (fragment),
that a single Access Unit (fragment) is transported in each RTP i.e. that a single Access Unit (fragment) is transported in each
packet for that stream. RTP packet for that stream.
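The three ways of learning the AU boundaries described above can be sketched as follows. This is a non-normative illustration; the function and argument names are hypothetical, with au_sizes standing for the AU-size values taken from the AU-headers and constant_size for the value of the "ConstantSize" format parameter.

```python
def split_access_units(data: bytes, au_sizes=None, constant_size=None):
    """Split an Access Unit Data Section into its AUs (or one fragment)."""
    if not data:
        raise ValueError("the Access Unit Data Section is never empty")
    if au_sizes:
        # For a fragment, AU-size gives the size of the entire AU, so a
        # single size larger than the data marks the payload as a fragment.
        if len(au_sizes) == 1 and au_sizes[0] > len(data):
            return [data]
        if sum(au_sizes) != len(data):
            raise ValueError("AU-size values do not match the payload")
        units, offset = [], 0
        for size in au_sizes:
            units.append(data[offset:offset + size])
            offset += size
        return units
    if constant_size:
        if len(data) % constant_size:
            # Not a whole number of AUs: treated here as a single fragment.
            return [data]
        return [data[i:i + constant_size]
                for i in range(0, len(data), constant_size)]
    # Neither AU-size nor ConstantSize: a single AU (fragment) per packet.
    return [data]
```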
3.2.3.1 Fragmentation 3.2.3.1 Fragmentation
A packet SHALL carry either one or more Access Units, or a single A packet SHALL carry either one or more Access Units, or a single
fragment of an Access Unit. Fragments of the same Access Unit have fragment of an Access Unit. Fragments of the same Access Unit have
the same time stamp but differing RTP sequence numbers. The marker the same time stamp but different RTP sequence numbers. The marker
bit in the RTP header is 1 on the last fragment of an Access Unit, bit in the RTP header is 1 on the last fragment of an Access Unit,
and 0 on all other fragments. and 0 on all other fragments.
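The fragmentation rules above can be sketched as follows. This is a non-normative illustration with hypothetical names; it models only the RTP sequence number, time stamp, marker bit and fragment data, and leaves out the AU Header Section and Auxiliary Section.

```python
def fragment_access_unit(au: bytes, max_fragment_octets: int,
                         timestamp: int, first_seq: int):
    """Split one Access Unit into per-packet fragments."""
    if max_fragment_octets <= 0:
        raise ValueError("max_fragment_octets must be positive")
    if not au:
        raise ValueError("an Access Unit must contain data")
    chunks = [au[i:i + max_fragment_octets]
              for i in range(0, len(au), max_fragment_octets)]
    packets = []
    for n, chunk in enumerate(chunks):
        packets.append({
            # The 16-bit RTP sequence number differs per fragment.
            "seq": (first_seq + n) % 65536,
            # All fragments of the same AU share one time stamp.
            "timestamp": timestamp,
            # Marker is 1 only on the last fragment of the AU.
            "marker": 1 if n == len(chunks) - 1 else 0,
            "payload": chunk,
        })
    return packets
```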
3.2.3.2 Interleaving 3.2.3.2 Interleaving
Access Units MAY be interleaved. Senders MAY perform interleaving. Access Units MAY be interleaved. Senders MAY perform interleaving.
Receivers MUST support interleaving. Receivers MUST support interleaving. When interleaving of Access
Units is used it SHALL be implemented using the AU-Index and
When interleaving of Access Units is used it SHALL be implemented AU-Index-delta fields in the AU-header.
using the AU-Index and AU-Index-delta fields in the AU-header.
Based on the RTP sequence number, the RTP time stamp, the AU-Index and Based on the RTP sequence number, the RTP time stamp, the AU-Index
the AU-Index-delta, a receiver can unambiguously reconstruct the and the AU-Index-delta, a receiver can unambiguously reconstruct
original order even in case of out-of-order packets, packet loss or the original order even in case of out-of-order packets, packet
duplication. Note that for this purpose the AU-Index is redundant when loss or duplication. Note that for this purpose the AU-Index is
the RTP time stamp and the AU-Index-delta values are sufficient for redundant when the RTP time stamp and the AU-Index-delta values are
placing the AUs correctly in time. In such cases receivers MAY ignore sufficient for placing the AUs correctly in time. In such cases
the AU-Index value and senders MAY code the AU-Index field with the receivers MAY ignore the AU-Index value and senders MAY code the
value 0, but only if they code each AU-Index field with that value. AU-Index field with the value 0, but only if they code each AU-Index
field with that value.
When interleaving is applied, a de-interleave buffer is needed in When interleaving is applied, a de-interleave buffer is needed in
receivers to put the Access Units in their correct logical consecutive receivers to put the Access Units in their correct logical
order in time. This requires the computation of the time stamp for consecutive decoding order. This requires the computation of the
each Access Unit. In case of a fixed time duration per Access Unit, time stamp for each Access Unit. In case of a fixed time duration
the time-stamp of each access unit i in an RTP packet with RTP per Access Unit, the time stamp of the i-th access unit in an RTP
time-stamp T is calculated as follows: packet with RTP time stamp T is calculated as follows:
Timestamp[0] = T Timestamp[0] = T
Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k] Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k]
+ 1))) * access-unit-duration + 1))) * access-unit-duration
When AU-Index-delta is always 0, this reduces to T + I * (access-unit- When AU-Index-delta is always 0, this reduces to T + i * (access-
duration). This is the non-interleaved case, the frames are consecutive unit-duration). This is the non-interleaved case, where the frames
in time. Note that the AU-Index field (present for the first Access are consecutive in decoding order. Note that the AU-Index field
Unit) is not needed in this calculation. Hence in cases where the (present for the first Access Unit) is not needed in this
Access-unit-duration has a fixed and known value, the AU-Index does not calculation. Hence in cases where the Access-unit-duration has a
need to provide index information and can be coded with the value 0. fixed and known value, the AU-Index does not need to provide index
See also the semantics of the AU-Index field in 3.2.1.1. information and can be coded with the value 0. See also the
semantics of the AU-Index field in 3.2.1.1.
When an RTP packet arrives (after any re-ordering has been done), When an RTP packet arrives (after any reordering has been done),
receivers may 'flush' all Access Units from the interleave buffer receivers may 'flush' all Access Units from the interleave buffer
which have a time-stamp strictly less than the time-stamp of the which have a time stamp strictly less than the time stamp of the
arriving packet. Similarly the first Access Unit of every arriving arriving packet. Similarly the first Access Unit of every arriving
packet can always be flushed (as no following packet can provide an packet can always be flushed (as no following packet can provide
earlier Access Unit), and any Access Units which are consecutive with an earlier Access Unit), and any Access Units which are consecutive
it which have already been received. Access Units should also be with it which have already been received. Access Units should also
flushed in time to be played; this can be important if there is loss be flushed in time to be played; this can be important if there is
before end-of-stream, before a silence interval, or before a large loss before end-of-stream, before a silence interval, or before a
drop-out. large drop-out.
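
The flush rule above can be sketched as follows. This is only an illustration under stated assumptions: the buffer layout (a dict keyed by Access Unit time stamp) and the name flush_ready are hypothetical, and time stamp wrap-around is ignored.

```python
def flush_ready(buffer, packet_timestamp):
    """Remove and return buffered Access Units older than the arriving packet.

    buffer: dict mapping AU time stamp -> AU payload (hypothetical layout).
    packet_timestamp: time stamp of the arriving RTP packet.
    Only Access Units with a time stamp strictly less than the packet's
    time stamp are flushed; they are returned in increasing time order.
    """
    ready = sorted(ts for ts in buffer if ts < packet_timestamp)
    return [(ts, buffer.pop(ts)) for ts in ready]


buffer = {0: b"au0", 1024: b"au1", 3072: b"au3"}
flushed = flush_ready(buffer, 2048)
```

A real receiver would additionally flush on a play-out deadline, as noted above, so that loss before end-of-stream does not leave Access Units stranded in the buffer.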
3.2.3.3 Constraints for interleaving 3.2.3.3 Constraints for interleaving
The size of the packets should be suitably chosen to be appropriate The size of the packets should be suitably chosen to be appropriate
to both the path MTU and the duration and capacity of the receiver's to both the path MTU and the duration and capacity of the receiver's
de-interleave buffer. The maximum packet size for a session should be de-interleave buffer. The maximum packet size for a session should
chosen not to exceed the path MTU. be chosen not to exceed the path MTU.
In order to control receiver latency and mitigate the effects of loss, In order to control receiver latency and mitigate the effects of
there are profile-based limits on the size of the packet. This is loss, there are profile-based limits on the size of the packet.
expressed as a duration: it is calculated from the duration of the This is expressed as a duration: it is calculated from the duration
Access Units contained within a packet. It is NOT the difference in of the Access Units contained within a packet. Note that this
time-stamp between the first and last Access Unit in a packet. duration is NOT the difference between the time stamps of the first
and last Access Unit in a packet.
No matter what interleaving scheme is used, the scheme must be No matter what interleaving scheme is used, the scheme must be
analyzed to calculate the minimum number of frames a receiver has to analyzed to calculate the minimum number of frames a receiver has
buffer in order to de-interleave. to buffer in order to de-interleave.
The maximum packet duration in milliseconds, and the maximum Three profiles are defined to constrain the latency when
de-interleave buffer required at the receiver, for the two profiles, interleaving. The applied profile is signalled by the MIME format parameter
shall not exceed: "Profile", indicating the decimal number of the profile. The maximum
de-interleave buffer required at the receiver can be determined if
the maximum packet duration is known. The maximum packet duration
in milliseconds for the three profiles shall not exceed:
RTP transport profile 0 -- 200 milliseconds Profile 0 -- 200 milliseconds
RTP transport profile 1 -- 500 milliseconds Profile 1 -- 500 milliseconds
Profile 2 -- 1500 milliseconds
When interleaving is applied, the applied RTP transport profile MUST When interleaving is applied, the applied RTP transport profile
be signalled by the profile parameter; see section 4.1. MUST be signalled by the MIME format parameter "Profile"; see
section 4.1.
Note that for low bit-rate material, the duration limit may make Note that for low bit-rate material, this duration limit may make
packets shorter than the MTU size. packets shorter than the MTU size.
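
A minimal sketch of the duration check described above, assuming every Access Unit in the packet has the same fixed duration. The function names and the example stream (1024-tick frames at a 44100 Hz clock) are illustrative assumptions; the limits are those of the revised list, in which Profile 2 appears.

```python
# Maximum packet duration per profile, in milliseconds.
MAX_PACKET_DURATION_MS = {0: 200, 1: 500, 2: 1500}


def packet_duration_ms(au_count, au_duration_ticks, clock_rate):
    """Packet duration: the sum of the durations of the contained AUs.

    This is not the time stamp difference between the first and last AU,
    which would omit the duration of the last AU and, with interleaving,
    would also count AUs that are not in the packet.
    """
    return 1000.0 * au_count * au_duration_ticks / clock_rate


def within_profile(profile, au_count, au_duration_ticks, clock_rate):
    """True if the packet duration does not exceed the profile limit."""
    limit = MAX_PACKET_DURATION_MS[profile]
    return packet_duration_ms(au_count, au_duration_ticks, clock_rate) <= limit


# Hypothetical stream: eight frames of 1024 ticks at a 44100 Hz clock.
eight_frames_ms = packet_duration_ms(8, 1024, 44100)
```

With these assumed numbers, eight frames fit within Profile 0 while nine frames do not, which shows how the duration limit rather than the MTU can bound the packet size.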
3.3 Usage of this specification 3.3 Usage of this specification
3.3.1 General 3.3.1 General
Usage of this specification requires definition of a mode. A mode Usage of this specification requires definition of a mode. A mode
defines how use this specification for transport of one or more types defines how to use this specification, as deemed appropriate.
of MPEG-4 streams. Each mode may specify constraints and payload Senders MUST signal the applied mode via the MIME format parameter
configurations as deemed appropriate. "Mode". This specification defines a generic mode that can be used
for any MPEG-4 stream, as well as specific modes for transport of
Senders MUST signal the mode that they use by the format parameter MPEG-4 CELP and MPEG-4 AAC streams.
Mode. In this document only modes are defined for transport of MPEG-4
CELP and AAC streams, but more modes are expected to be defined in
future RFCs.
3.3.2 Modes for MPEG-4 CELP and AAC streams
Four modes are defined for transport of MPEG-4 CELP and AAC streams. In any mode compliant to this specification the same requirements
In each of these modes, the same requirements apply for the rtpmap apply for the rtpmap attributes. The general form of an rtpmap
attributes. The general form of an rtpmap attribute is: attribute is:
a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding
parameters>] parameters>]
For audio streams, <encoding parameters> specifies the number of For audio streams, <encoding parameters> specifies the number of
audio channels. This parameter may be omitted if the number of audio channels. This parameter may be omitted if the number of
channels is one, provided no additional parameters are needed. channels is one, provided no additional parameters are needed.
In all four modes, the following attributes are REQUIRED: In any mode, the following attributes are REQUIRED:
a) The encoding name a) The encoding name
b) The RTP clock rate MUST be expressed. It is RECOMMENDED that this b) The RTP clock rate MUST be expressed.
be the sampling rate of the audio, to give sample-accurate timing. c) The number of audio channels MUST be specified, for example as
However, other rates MAY be used (e.g. 90 kHz). 2 for stereo material (see RFC 2327) and MAY be specified as 1
c) The number of audio channels MUST be specified, for example as 2 for mono material; 1 is the default.
for stereo material (see RFC 2327) and MAY be specified as 1 for
mono material; 1 is the default.
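
As an illustration of the rtpmap requirements listed above, the following sketch parses an rtpmap attribute for an audio stream and applies the default of one channel when the encoding parameter is omitted. The function name parse_rtpmap and the returned dict layout are assumptions made for this example.

```python
def parse_rtpmap(line):
    """Parse 'a=rtpmap:<pt> <name>/<clock rate>[/<channels>]' (audio)."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    payload_type, _, encoding = line[len(prefix):].strip().partition(" ")
    parts = encoding.strip().split("/")
    if len(parts) < 2:
        raise ValueError("encoding name and clock rate are required")
    # The channel count may be omitted for mono material; 1 is the default.
    channels = int(parts[2]) if len(parts) > 2 else 1
    return {
        "payload_type": int(payload_type),
        "encoding_name": parts[0],
        "clock_rate": int(parts[1]),
        "channels": channels,
    }


stereo = parse_rtpmap("a=rtpmap:96 mpeg4-generic/44100/2")
mono = parse_rtpmap("a=rtpmap:96 mpeg4-generic/44100")
```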
3.3.3 Constant bit-rate CELP. 3.3.2 The generic mode
The generic mode can be used for any MPEG-4 stream. In this mode
no mode-specific constraints are applied; hence, the generic mode
exploits the full flexibility of this specification. The generic
mode is signalled by mode=generic.
An example is given below for transport of a BIFS stream. In this
example carriage of multiple BIFS Access Units is allowed in one
RTP packet. The AU-header section contains the AU-size field, the
CTS-flag and, if the CTS flag is set to 1, the CTS-delta field.
The number of bits of the AU-size and the CTS-delta fields is 15
and 16 respectively, which results in an AU-header of two or four
octets per BIFS AU. The RTP time stamp uses a 1 kHz clock.
In detail:
m=video 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/1000
a=fmtp:96 streamtype=3; profile-level-id=257; mode=generic;
ObjectType=2; config=BIFSConfiguration(); SizeLength=15;
CTSDeltaLength=16
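
The AU-header sizes quoted for this BIFS example follow from simple arithmetic, sketched below. The function name is hypothetical, and the sketch assumes the AU-header contains only the fields named in the example (AU-size, the one-bit CTS-flag and the optional CTS-delta).

```python
def au_header_bits(size_length, cts_delta_length, cts_flag):
    """Number of bits in one AU-header for the BIFS example above."""
    bits = size_length + 1  # AU-size field plus the one-bit CTS-flag
    if cts_flag:
        bits += cts_delta_length  # CTS-delta is present only if CTS-flag is 1
    return bits


without_cts = au_header_bits(15, 16, cts_flag=0) // 8  # octets
with_cts = au_header_bits(15, 16, cts_flag=1) // 8     # octets
```

Both cases are byte aligned: 16 bits without the CTS-delta field and 32 bits with it.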
3.3.3 Constant bit-rate CELP
This mode is signalled by mode=CELP-cbr. In this mode one or more This mode is signalled by mode=CELP-cbr. In this mode one or more
fixed size CELP frames can be transported in one RTP packet; there is fixed size CELP frames can be transported in one RTP packet; there
no support for interleaving. The RTP payload consists of one or more is no support for interleaving. The RTP payload consists of one or
concatenated CELP frames, each of the same size. Both the AU Header more concatenated CELP frames, each of the same size. Both the AU
Section and the Auxiliary Section are empty. Header Section and the Auxiliary Section are empty.
The format parameter ConstantSize MUST be provided to specify the The MIME format parameter ConstantSize MUST be provided to specify
length of each CELP frame. the length of each CELP frame.
For an example see below. For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg-generic/44100/2 a=rtpmap:96 mpeg4-generic/44100/2
a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-cbr; config= a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-cbr; config=
AudioSpecificConfig(); ConstantSize=xxx; AudioSpecificConfig(); ConstantSize=xxx;
The AudioSpecificConfig() specifies that the audio stream type is CELP. The AudioSpecificConfig() specifies that the audio stream type is
CELP.
3.3.4 Variable bit-rate CELP 3.3.4 Variable bit-rate CELP
This mode is signalled by mode=CELP-vbr. With this mode in one RTP This mode is signalled by mode=CELP-vbr. With this mode one or
packet one or more variable size CELP frames can be transported with more variable size CELP frames can be transported in one RTP packet
optional interleaving. As the largest possible frame size in this mode with optional interleaving. As the largest possible frame size in
is greater than the maximum CELP frames size, there is no support for this mode is greater than the maximum CELP frame size, there is no
fragmentation on the CELP frames. support for fragmentation of CELP frames.
In this mode the RTP payload consists of the AU Header Section, In this mode the RTP payload consists of the AU Header Section,
followed by one or more concatenated CELP frames. The Auxiliary Section followed by one or more concatenated CELP frames. The Auxiliary
is empty. For each CELP frame contained in the payload there is a one Section is empty. For each CELP frame contained in the payload
octet AU-header in the AU Header Section to provide : there is a one octet AU-header in the AU Header Section to
provide:
(a) the size of each CELP frame in the payload and (a) the size of each CELP frame in the payload and
(b) index information for computing the sequence (and hence timing) of (b) index information for computing the sequence (and hence timing)
each CELP frame. of each CELP frame.
Transport of CELP frames requires that the AU-size field is coded with Transport of CELP frames requires that the AU-size field is coded
6 bits. In this mode therefore 6 bits are allocated to the AU-size with 6 bits. In this mode therefore 6 bits are allocated to the
field, and 2 bits to the AU-Index(-delta) field. Each AU-Index field AU-size field, and 2 bits to the AU-Index(-delta) field. Each
MUST be coded with the value 0. In the AU Header Section, the AU-Index field MUST be coded with the value 0. In the AU Header
concatenated AU-headers are preceded by the 16-bit AU-headers-length Section, the concatenated AU-headers are preceded by the 16-bit
field, as specified in 3.2.1. AU-headers-length field, as specified in 3.2.1.
Next to the required format parameters, the following parameters MUST In addition to the required MIME format parameters, the following
be present: parameters MUST be present: SizeLength, IndexLength, and
SizeLength, IndexLength, and IndexDeltaLength. IndexDeltaLength.
When interleaving is applied (AU-Index-delta coded with a value larger When interleaving is applied (AU-Index-delta coded with a value
than 0), also the parameter Profile MUST be present. larger than 0), the parameter Profile MUST also be present.
Example : For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/44100/2 a=rtpmap:96 mpeg4-generic/44100/2
a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config= a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config=
AudioSpecificConfig(); SizeLength=6; IndexLength=2; IndexDeltaLength=2; AudioSpecificConfig(); SizeLength=6; IndexLength=2;
Profile=1 IndexDeltaLength=2; Profile=1
The AudioSpecificConfig() specifies that the audio stream type is CELP. The AudioSpecificConfig() specifies that the audio stream type is
CELP.
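
The payload layout of this mode can be illustrated with a small parser. This is a sketch, not a reference implementation. It assumes that the AU-headers-length field counts bits, that the AU-size occupies the most significant bits of each AU-header, and that the header size is a whole number of octets. The function name and the sample payload are hypothetical.

```python
def parse_au_headers(payload, size_length=6, index_length=2):
    """Split a payload into (index_field, frame) pairs.

    index_field is the AU-Index for the first AU-header and the
    AU-Index-delta for every following AU-header.
    """
    header_bits = size_length + index_length
    header_octets = header_bits // 8
    # 16-bit AU-headers-length, assumed to give the header length in bits.
    headers_length_bits = int.from_bytes(payload[0:2], "big")
    count = headers_length_bits // header_bits
    offset = 2
    headers = []
    for _ in range(count):
        value = int.from_bytes(payload[offset:offset + header_octets], "big")
        au_size = value >> index_length
        index_field = value & ((1 << index_length) - 1)
        headers.append((au_size, index_field))
        offset += header_octets
    frames = []
    for au_size, index_field in headers:
        frames.append((index_field, payload[offset:offset + au_size]))
        offset += au_size
    return frames


# Two frames of 3 and 2 octets, no interleaving (index fields are 0).
sample = bytes([0x00, 0x10, 0x0C, 0x08]) + b"abc" + b"de"
frames = parse_au_headers(sample)
```

The same function handles two-octet AU-headers when called with size_length=13 and index_length=3.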
3.3.5 Low bit-rate AAC 3.3.5 Low bit-rate AAC
This mode is signalled by AAC-lbr. This mode supports transport of one This mode is signalled by mode=AAC-lbr. This mode supports transport
or more variable size AAC frames with optional support for interleaving of one or more variable size AAC frames with optional support for
and fragmenting. The maximum size of an AAC frame (fragment) in this interleaving and fragmenting. The maximum size of an AAC frame
mode is 63 octets. (fragment) in this mode is 63 octets.
The payload configuration in this mode is the same as in the variable The payload configuration in this mode is the same as in the
bit-rate CELP mode as defined in 3.3.4. The RTP payload consists of the variable bit-rate CELP mode as defined in 3.3.4. The RTP payload
AU Header Section, followed by concatenated AAC frames. The Auxiliary consists of the AU Header Section, followed by concatenated AAC
Section is empty. For each AAC frame contained in the payload the one frames. The Auxiliary Section is empty. For each AAC frame contained
octet AU-header provides : in the payload the one octet AU-header provides:
(a) the size of each AAC frame in the payload and (a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing) of (b) index information for computing the sequence (and hence timing)
each AAC frame. of each AAC frame.
In the AU-header, the AU-size is coded with 6 and the AU-Index(-delta) In the AU-header, the AU-size is coded with 6 bits and the
with 2 bits; the AU-Index field MUST have the value 0 in each AU-header. AU-Index(-delta) with 2 bits; the AU-Index field MUST have the
In the AU-header Section, the concatenated AU-headers are preceded by value 0 in each AU-header.
the 16-bit AU-headers-length field, as specified in 3.2.1. In the AU-header Section, the concatenated AU-headers are preceded
by the 16-bit AU-headers-length field, as specified in 3.2.1.
Next to the required format parameters, the following parameters MUST In addition to the required MIME format parameters, the following
be present: parameters MUST be present: SizeLength, IndexLength, and
SizeLength, IndexLength, and IndexDeltaLength. IndexDeltaLength.
When interleaving is applied (AU-Index-delta coded with a value larger When interleaving is applied (AU-Index-delta coded with a value
than 0), also the parameter Profile MUST be present. larger than 0), the parameter Profile MUST also be present.
Example : For example:
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/44100/2 a=rtpmap:96 mpeg4-generic/44100/2
a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config= a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config=
AudioSpecificConfig(); SizeLength=6; IndexLength=2; IndexDeltaLength=2; AudioSpecificConfig(); SizeLength=6; IndexLength=2;
Profile=1 IndexDeltaLength=2; Profile=1
The AudioSpecificConfig() specifies that the audio stream type is AAC. The AudioSpecificConfig() specifies that the audio stream type is
AAC.
3.3.6 High bit-rate AAC 3.3.6 High bit-rate AAC
This mode is signalled by mode=AAC-hbr. This mode supports transport This mode is signalled by mode=AAC-hbr. This mode supports transport
of one or more large variable size AAC frames in one RTP packet with of one or more large variable size AAC frames in one RTP packet with
optional support for interleaving and fragmenting. The maximum size of optional support for interleaving and fragmenting. The maximum size
an AAC frame (fragment) in this mode is 8191 bytes. of an AAC frame (fragment) in this mode is 8191 octets.
In this mode the RTP payload consists of the AU Header Section, In this mode the RTP payload consists of the AU Header Section,
followed by one or more concatenated AAC frames. The Auxiliary Section followed by one or more concatenated AAC frames. The Auxiliary
is empty. For each AAC frame contained in the payload there is an Section is empty. For each AAC frame contained in the payload there
AU-header in the AU Header Section to provide : is an AU-header in the AU Header Section to provide:
(a) the size of each AAC frame in the payload and (a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing) of (b) index information for computing the sequence (and hence timing)
each AAC frame. of each AAC frame.
To code the maximum size of an AAC frame requires 13 bits. Therefore in
this configuration 13 bits are allocated to the AU-size, and 3 bits
to the AU-Index(-delta) field. Thus each AU-header has a size of 2
octets. Each AU-Index field MUST be coded with the value 0. In the
AU Header Section, the concatenated AU-headers are preceded by the
16-bit AU-headers-length field, as specified in 3.2.1.
Next to the required format parameters, the following parameters MUST To code the maximum size of an AAC frame requires 13 bits. Therefore
be present: in this configuration 13 bits are allocated to the AU-size, and
SizeLength, IndexLength, and IndexDeltaLength. 3 bits to the AU-Index(-delta) field. Thus each AU-header has a size
When interleaving is applied (AU-Index-delta coded with a value larger of 2 octets. Each AU-Index field MUST be coded with the value 0. In
than 0), also the parameter Profile MUST be present. the AU Header Section, the concatenated AU-headers are preceded by
the 16-bit AU-headers-length field, as specified in 3.2.1.
In addition to the required MIME format parameters, the following
parameters MUST be present: SizeLength, IndexLength, and
IndexDeltaLength.
When interleaving is applied (AU-Index-delta coded with a value
larger than 0), the parameter Profile MUST also be present.
For example:
Example :
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/44100/2 a=rtpmap:96 mpeg4-generic/44100/2
a=fmtp:96 streamtype=5; profile-level-id=15; mode= AAC-hbr; config= a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr;
AudioSpecificConfig(); SizeLength=13; IndexLength=3; IndexDeltaLength=3; config=AudioSpecificConfig(); SizeLength=13; IndexLength=3;
Profile=1 IndexDeltaLength=3; Profile=1
The AudioSpecificConfig() specifies that the audio stream type is AAC. The AudioSpecificConfig() specifies that the audio stream type is
AAC.
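
The 13-bit and 3-bit allocation of this mode can be checked with a short sketch that packs one AU-header into two octets. The function name is hypothetical, and the sketch assumes the AU-size occupies the most significant bits of the AU-header.

```python
MAX_AU_SIZE = (1 << 13) - 1  # largest value that fits in 13 bits


def pack_aac_hbr_header(au_size, index_field=0):
    """Pack AU-size (13 bits) and AU-Index(-delta) (3 bits) into 2 octets."""
    if not 0 <= au_size <= MAX_AU_SIZE:
        raise ValueError("AU-size does not fit in 13 bits")
    if not 0 <= index_field <= 7:
        raise ValueError("index field does not fit in 3 bits")
    return ((au_size << 3) | index_field).to_bytes(2, "big")


largest = pack_aac_hbr_header(MAX_AU_SIZE)
smallest = pack_aac_hbr_header(1)
```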
3.3.7 Additional modes
This specification only defines the modes specified in sections
3.3.2 up to 3.3.6. Additional modes are expected to be defined in
future RFCs. Each additional mode MUST be in full compliance with
this specification.
When defining a new mode care MUST be taken that an implementation
of all features of this specification can decode the payload format
corresponding to this new mode. For this reason a mode MUST NOT
specify new default values for MIME parameters. In particular, MIME
parameters that configure the RTP payload MUST be present (unless
they have the default value), even if their presence is redundant in
case the mode assigns a fixed value to a parameter. A mode may
define additionally that some MIME parameters are required instead
of optional, that some MIME parameters have fixed values (or
ranges), and that there are rules restricting the usage.
4. IANA considerations 4. IANA considerations
This payload format uses the same the MIME types and names as defined This section describes the MIME types and names associated with
in RFC XXXX. However, some additional format parameters are defined. this payload format. Section 4.1 registers the MIME types, as per
RFC 2048.
Depending on the required payload configuration, format parameters may This format may require additional information about the mapping to
need to be available to the receiver. This is done using the parameters be made available to the receiver. This is done using parameters
described in the next section. The absence of any of these parameters also described in the next section.
is equivalent to the associated field set to its default value, which
is always zero. The absence of any such parameters resolves into a 4.1 MIME type registration
default "basic" configuration.
MIME media type name: "video" or "audio" or "application"
"video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2)
or MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information
needed for an audio/visual presentation.
"audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3)
or MPEG-4 Systems streams that convey information needed for an
audio only presentation.
"application" MUST be used for MPEG-4 Systems streams (ISO/IEC
14496-1) that serve purposes other than audio/visual presentation,
e.g. in some cases when MPEG-J streams are transmitted.
Depending on the required payload configuration, MIME format
parameters need to be available to the receiver. This is done using
the parameters described in the next section. There are required
and optional parameters.
Optional parameters are of two types: general parameters and
configuration parameters. The configuration parameters are used to
configure the fields in the AU Header section and in the auxiliary
section. The absence of any configuration parameter is equivalent to
the associated field set to its default value, which is always zero.
The absence of all configuration parameters resolves into a default
"basic" configuration with an empty AU-header section and an empty
auxiliary section in each RTP packet.
MIME subtype name: mpeg4-generic MIME subtype name: mpeg4-generic
Required parameters: Required parameters:
StreamType: MIME format parameters are not case dependent; however for clarity
both upper and lower case are used in the names of the parameters
described in this specification.
The integer value that indicates the type of MPEG-4 stream that is StreamType:
carried; its coding corresponds to the values of the streamType as The integer value that indicates the type of MPEG-4 stream that
defined for the DecoderConfigDescriptor in ISO/IEC 14496-1. is carried; its coding corresponds to the values of the
streamType as defined for the DecoderConfigDescriptor in
ISO/IEC 14496-1. The value 6, indicating an MPEG-7 stream, MUST
NOT be used, as this payload format is not intended for transport
of MPEG-7 streams.
Profile-level-id: Profile-level-id:
A decimal representation of the MPEG-4 Profile Level indication. A decimal representation of the MPEG-4 Profile Level indication.
This parameter MUST be used in the capability exchange or session This parameter MUST be used in the capability exchange or
set-up procedure to indicate the MPEG-4 Profile and Level session set-up procedure to indicate the MPEG-4 Profile and Level
combination of which the relevant MPEG-4 media codec is capable. combination of which the relevant MPEG-4 media codec is capable.
For audio streams, this parameter is the decimal value from Table 5
(audioProfileLevelIndicationValues) in ISO/IEC 14496-1, indicating For MPEG-4 Audio streams, this parameter is the decimal value
which MPEG-4 Audio tool subsets are applied to encode the audio from Table 5 (audioProfileLevelIndicationValues) in ISO/IEC
stream. 14496-1, indicating which MPEG-4 Audio tool subsets are
For visual streams, this parameter is the decimal value from Table required to decode the audio stream.
G-1 (FLC table for profile and level indication of ISO/IEC 14496-2, For MPEG-4 Visual streams, this parameter is the decimal value
indicating which MPEG-4 Visual tool subsets are applied to encode from Table G-1 (FLC table for profile and level indication of
the visual stream. ISO/IEC 14496-2), indicating which MPEG-4 Visual tool subsets
are required to decode the visual stream.
For BIFS streams, this parameter is the decimal value that is
obtained from (SPLI + 256*GPLI), where:
SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with
the applied sceneProfileLevelIndication;
GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with
the applied graphicsProfileLevelIndication.
For MPEG-J streams, this parameter is the decimal value from
table 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1,
indicating the profile and level of the MPEG-J stream.
For OD streams, this parameter is the decimal value from table 3
(ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the
profile and level of the OD stream.
For IPMP streams, this parameter has either the decimal value 0,
indicating an unspecified profile and level, or a value larger
than zero, indicating an MPEG-4 IPMP profile and level as
defined in a future MPEG-4 specification.
For Clock Reference streams and Object Content Info streams, this
parameter has the decimal value zero, indicating that profile
and level information is conveyed through the OD framework.
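
For BIFS streams, the formula SPLI + 256*GPLI given above can be sketched as follows. The function names are hypothetical, and the split assumes that SPLI is smaller than 256 so that the two values do not overlap.

```python
def bifs_profile_level_id(spli, gpli):
    """Combine scene and graphics indications into profile-level-id."""
    return spli + 256 * gpli


def split_bifs_profile_level_id(value):
    """Recover (SPLI, GPLI) from a BIFS profile-level-id, SPLI < 256."""
    gpli, spli = divmod(value, 256)
    return spli, gpli


combined = bifs_profile_level_id(1, 1)
parts = split_bifs_profile_level_id(257)
```

The value 257 corresponds to SPLI = 1 and GPLI = 1 under this formula.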
Config: Config:
A hexadecimal representation of an octet string that expresses the A hexadecimal representation of an octet string that expresses
media payload configuration. Configuration data is mapped onto the the media payload configuration. Configuration data is mapped
octet string on an MSB-first basis. The first bit of the onto the octet string on an MSB-first basis. The first bit of
configuration data SHALL be located at the MSB of the first octet. the configuration data SHALL be located at the MSB of the first
In the last octet, if necessary to achieve byte alignment, up to octet. In the last octet, if necessary to achieve byte alignment,
7 zero-valued padding bits shall follow the configuration data. up to 7 zero-valued padding bits shall follow the configuration
For audio streams, config is the audio object type specific decoder data.
configuration data AudioSpecificConfig() as defined in ISO/IEC For MPEG-4 Audio streams, config is the audio object type
14496-3. specific decoder configuration data AudioSpecificConfig() as
For visual streams, config is the MPEG-4 Visual configuration defined in ISO/IEC 14496-3.
information, as defined in subclause 6.2.1 Start codes of For MPEG-4 Visual streams, config is the MPEG-4 Visual
ISO/IEC14496-2. The configuration information indicated by this configuration information as defined in subclause 6.2.1 Start
parameter SHALL be the same as the configuration information in the codes of ISO/IEC 14496-2. The configuration information
corresponding MPEG-4 Visual stream, except for first-half-vbv- indicated by this parameter SHALL be the same as the
occupancy and latter-half-vbv-occupancy, if it exists, which may configuration information in the corresponding MPEG-4 Visual
vary in the repeated configuration information inside an MPEG-4 stream, except for first-half-vbv-occupancy and
latter-half-vbv-occupancy, if it exists, which may vary in
the repeated configuration information inside an MPEG-4
Visual stream (see 6.2.1 Start codes of ISO/IEC 14496-2). Visual stream (see 6.2.1 Start codes of ISO/IEC 14496-2).
For BIFS streams, this is the BIFSConfig() information as defined
in ISO/IEC 14496-1. For version 1, BIFSConfig is defined in
section 9.3.2.4, and for version 2 in section 9.3.5. The MIME
format parameter ObjectType signals the version of BIFSConfig.
Optional parameters: For IPMP streams, this is either the decimal value 0, indicating
the absence of any decoder configuration information, or the
decimal value 1, followed by IPMPConfiguration() as defined
in a future MPEG-4 IPMP specification.
For Object Content Info (OCI) streams, this is the
OCIDecoderConfiguration() information of the OCI stream, as
defined in section 8.4.2.4 in ISO/IEC 14496-1.
For OD streams, Clock Reference streams and MPEG-J streams, this
is the decimal value 0, indicating that no information on the
decoder configuration is required.
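
The MSB-first mapping and zero padding described for the Config parameter can be sketched as below. The function name, the bit-string input and the choice of lower-case hexadecimal digits are assumptions made for this illustration, and the 13-bit input is an arbitrary pattern rather than real configuration data.

```python
def config_to_hex(bits):
    """Map a string of '0'/'1' characters, first bit first, to hex.

    The first bit lands in the MSB of the first octet; up to 7
    zero-valued padding bits are appended to reach byte alignment.
    """
    padding = (-len(bits)) % 8
    padded = bits + "0" * padding
    octets = [int(padded[i:i + 8], 2) for i in range(0, len(padded), 8)]
    return "".join(f"{octet:02x}" for octet in octets)


# Arbitrary 13-bit pattern: three padding bits are appended.
example = config_to_hex("0001001000010")
```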
Mode: Mode:
The mode in which this specification is used. The following modes The mode in which this specification is used. The following modes
can be signalled: can be signalled:
mode=generic,
mode=CELP-cbr, mode=CELP-cbr,
mode=CELP-vbr, mode=CELP-vbr,
mode=AAC-lbr and mode=AAC-lbr and
mode=AAC-hbr. mode=AAC-hbr.
Other modes are expected to be defined in future RFCs. When defining Other modes are expected to be defined in future RFCs. See also
a new mode care MUST be taken that an implementation of all features section 3.3.7.
of this specification can decode the payload format corresponding to
this new mode. For this reason a mode MUST NOT specify new default Optional general parameters:
values for MIME parameters; in particular, MIME parameters MUST be
present (unless they have the default value), even if it is redundant ObjectType:
in case the mode assigns fixed values. A mode may define additionally The decimal value from Table 8 in ISO/IEC 14496-1, indicating
that some MIME parameters are required instead of optional, that some the value of the objectTypeIndication of the transported stream.
MIME parameters have fixed values (or ranges), and that there are For BIFS streams this parameter MUST be present to signal the
rules restricting the usage. type of BIFSConfiguration(). The ObjectType SHALL NOT signal a
non-MPEG-4 stream.
ConstantSize: ConstantSize:
The constant size in octets of each Access Unit for this stream. The constant size in octets of each Access Unit for this stream.
Simultaneous presence of ConstantSize and the SizeLength Simultaneous presence of ConstantSize and the SizeLength
parameters is not permitted. parameters is not permitted.
Profile:
The decimal representation of the applied profile to constrain
the latency when interleaving; see section 3.2.3.3. Absence of
this parameter signals that the profile is not specified.
Optional configuration parameters:
SizeLength: SizeLength:
The number of bits on which the AU-size field is encoded in the The number of bits on which the AU-size field is encoded in the
AU-header. Simultaneous presence of SizeLength and the ConstantSize AU-header. Simultaneous presence of SizeLength and the
parameter is not permitted. ConstantSize parameter is not permitted.
IndexLength: IndexLength:
The number of bits on which the AU-Index is encoded in the first The number of bits on which the AU-Index is encoded in the first
AU-header. The default value of zero indicates the absence of the AU-header. The default value of zero indicates the absence of
AU-Index and AU-Index-delta fields in each AU-header. the AU-Index and AU-Index-delta fields in each AU-header.
IndexDeltaLength: IndexDeltaLength:
The number of bits on which the AU-Index-delta field is encoded in The number of bits on which the AU-Index-delta field is encoded
any non-first AU-header. in any non-first AU-header.
CTSDeltaLength: CTSDeltaLength:
The number of bits on which the CTS-delta field is encoded in the The number of bits on which the CTS-delta field is encoded in
AU-header. the AU-header.
DTSDeltaLength: DTSDeltaLength:
The number of bits on which the DTS-delta field is encoded in the The number of bits on which the DTS-delta field is encoded in
AU-header. the AU-header.
AuxiliaryDataSizeLength: AuxiliaryDataSizeLength:
The number of bits that is used to encode the auxiliary-data-size The number of bits that is used to encode the auxiliary-data-size
field. field.
Profile:
The decimal representation of the RTP transport profile.
Applications MAY use more parameters, in addition to those defined Applications MAY use more parameters, in addition to those defined
above. Receivers MUST tolerate the presence of such additional above. Receivers MUST tolerate the presence of such additional
parameters, but these parameters SHALL NOT impact the decoding of parameters, but these parameters SHALL NOT impact the decoding of
receivers that comply with this specification. receivers that comply with this specification.
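
The parameter rules above can be illustrated with a small parser for the fmtp parameter list. This is a sketch under stated assumptions: the function name and the returned dict are hypothetical, names are compared without regard to case, absent configuration parameters default to zero, unknown parameters are kept but otherwise ignored, and simultaneous presence of ConstantSize and SizeLength is rejected.

```python
# Configuration parameters and their default value (always zero).
CONFIG_DEFAULTS = {
    "sizelength": 0,
    "indexlength": 0,
    "indexdeltalength": 0,
    "ctsdeltalength": 0,
    "dtsdeltalength": 0,
    "auxiliarydatasizelength": 0,
}


def parse_fmtp(params):
    """Parse 'name=value; name=value' into a dict with lower-case names."""
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        result[name.strip().lower()] = value.strip()
    if "constantsize" in result and "sizelength" in result:
        raise ValueError("ConstantSize and SizeLength must not both be present")
    for name, default in CONFIG_DEFAULTS.items():
        result[name] = int(result.get(name, default))
    return result


parsed = parse_fmtp(
    "streamtype=5; profile-level-id=15; mode=AAC-hbr; "
    "config=AudioSpecificConfig(); SizeLength=13; IndexLength=3; "
    "IndexDeltaLength=3; Profile=1; X-Unknown=foo"
)
```

The X-Unknown parameter is a made-up name used only to show that additional parameters are tolerated.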
Encoding considerations: Encoding considerations:
System bitstreams MUST be generated according to MPEG-4 System System bitstreams MUST be generated according to MPEG-4 Systems
specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
bitstreams MUST be generated according to MPEG-4 Audio bitstreams MUST be generated according to MPEG-4 Audio
specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized
according to the RTP payload format defined in RFC <self-reference-to- according to the RTP payload format defined in RFC xxxx.
this>.
Security considerations: Security considerations:
As in RFC <self-reference-to-this>. As defined in section 5 of RFC xxxx.
Interoperability considerations: Interoperability considerations:
MPEG-4 provides a large and rich set of tools for the coding of MPEG-4 provides a large and rich set of tools for the coding of
visual objects. For effective implementation of the standard, visual objects. For effective implementation of the standard,
subsets of the MPEG-4 tool sets have been provided for use in subsets of the MPEG-4 tool sets have been provided for use in
specific applications. These subsets, called 'Profiles', limit the specific applications. These subsets, called 'Profiles', limit the
size of the tool set a decoder is required to implement. In order to size of the tool set a decoder is required to implement. In order to
restrict computational complexity, one or more 'Levels' are set for restrict computational complexity, one or more 'Levels' are set for
each Profile. A Profile@Level combination allows: each Profile. A Profile@Level combination allows:
. a codec builder to implement only the subset of the standard he . a codec builder to implement only the subset of the standard he
needs, while maintaining interworking with other MPEG-4 devices needs, while maintaining interworking with other MPEG-4 devices
included in the same combination, and that implement the same combination, and
. checking whether MPEG-4 devices comply with the standard . checking whether MPEG-4 devices comply with the standard
('conformance testing'). ('conformance testing').
A stream SHALL be compliant with the MPEG-4 Profile@Level specified A stream SHALL be compliant with the MPEG-4 Profile@Level specified
by the parameter "profile-level-id". Interoperability between a by the parameter "profile-level-id". Interoperability between a
sender and a receiver may be achieved by specifying the parameter sender and a receiver is achieved by specifying the parameter
"profile-level-id" in MIME content, or by arranging in the "profile-level-id" in MIME content. In the capability exchange /
capability exchange/announcement procedure to set this parameter announcement procedure this parameter may mutually be set to the
mutually to the same value. same value.
Published specification: Published specification:
The specifications for MPEG-4 streams are presented in ISO/IEC The specifications for MPEG-4 streams are presented in ISO/IEC
14496-1, 14496-2, and 14496-3. The RTP payload format is described 14496-1, 14496-2, and 14496-3. The RTP payload format is described
in RFC <self-reference-to-this>. in RFC xxxx.
Applications which use this media type: Applications which use this media type:
Multimedia streaming and conferencing tools, Internet messaging and Multimedia streaming and conferencing tools, Internet messaging and
Email applications. Email applications.
Additional information: none Additional information: none
Magic number(s): none Magic number(s): none
File extension(s): File extension(s):
None. A file format with the extension .mp4 has been defined for None. A file format with the extension .mp4 has been defined for
MPEG-4 content but is not directly correlated with this MIME type MPEG-4 content but is not directly correlated with this MIME type
which sole purpose is RTP transport. for which the sole purpose is RTP transport.
Macintosh File Type Code(s): none Macintosh File Type Code(s): none
Person & email address to contact for further information: Person & email address to contact for further information:
Authors of RFC <self-reference-to-this>. Authors of RFC xxxx, IETF Audio/Video Transport working group.
Intended usage: COMMON Intended usage: COMMON
Author/Change controller: Author/Change controller:
Authors of RFC <self-reference-to-this>. Authors of RFC xxxx, IETF Audio/Video Transport working group.
4.2 Concatenation of parameters 4.2 Concatenation of parameters
Multiple parameters SHOULD be expressed as a MIME media type string, Multiple parameters SHOULD be expressed as a MIME media type string,
in the form of a semicolon-separated list of parameter=value pairs in the form of a semicolon-separated list of parameter=value pairs
(for parameter usage examples see Appendix A). (for parameter usage examples see sections 3.3.2 up to 3.3.6).
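As an illustration of the concatenation rule in 4.2, the following is a minimal sketch, not part of the specification. It joins parameter=value pairs into a semicolon-separated string. The function name and the parameter "example-param" are hypothetical; only "profile-level-id" is taken from this document, and the value 1 is an arbitrary placeholder.

```python
def format_parameters(params):
    """Join (name, value) pairs into a semicolon-separated list.

    `params` is an ordered sequence of (name, value) tuples; the values
    are converted with str() and no escaping is attempted.
    """
    return "; ".join(f"{name}={value}" for name, value in params)


# "example-param" is a hypothetical name used only for illustration.
example = format_parameters([("profile-level-id", 1), ("example-param", "abc")])
print(example)
```

A list of tuples is used rather than a dictionary so that the sender controls the order in which the parameters appear.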
4.3 Usage of SDP 4.3 Usage of SDP
4.3.1 The a=fmtp keyword 4.3.1 The a=fmtp keyword
It is assumed that one typical way to transport the above-described It is assumed that one typical way to transport the above-described
parameters associated with this payload format is via an SDP message parameters associated with this payload format is via an SDP message
[7] for example transported to the client in reply to an RTSP DESCRIBE [6] for example transported to the client in reply to an RTSP
of via SAP. In that case the (a=fmtp) keyword MUST be used as DESCRIBE or via SAP. In that case the (a=fmtp) keyword MUST be used
described in RFC 2327 [7, section 6]. The syntax being then: as described in RFC 2327 [6], section 6, the syntax being then:
a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>] a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]
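A receiver-side sketch of the syntax above follows. It is an illustration under stated assumptions, not a complete SDP parser: the function name is hypothetical, the payload type 96 and the parameter "example-param" are placeholders, and all values are returned as uninterpreted strings.

```python
def parse_fmtp(line):
    """Split an 'a=fmtp:<format> name=value; name=value' line.

    Returns (format, params) where params maps each parameter name to
    its value as a string. Unknown parameters are kept, not rejected,
    in line with the rule that receivers tolerate additional parameters.
    """
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    fmt, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("parameter without '=': " + item)
        params[name.strip()] = value.strip()
    return fmt, params


# 96 and "example-param" are hypothetical values for illustration.
fmt, params = parse_fmtp("a=fmtp:96 profile-level-id=1; example-param=abc")
print(fmt, params)
```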
5. Security Considerations 5. Security Considerations
No additional security considerations apply beyond those discussed in RTP packets using the payload format defined in this specification
RFC 1889 and RFC XXXX. are subject to the security considerations discussed in the RTP
specification [2]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed on the compressed data so there is no conflict between the
two operations. The packet processing complexity of this payload
type (i.e. excluding media data processing) does not exhibit any
significant non-uniformity in the receiver side to cause a denial-
of-service threat.
However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers,
which might compromise the functionality of the receiver or even
crash it. This is especially true for end-to-end systems like MPEG
where the buffer models are precisely defined.
MPEG-4 Systems supports stream types including commands that are
executed on the terminal like OD commands, BIFS commands, etc. and
programmatic content like MPEG-J (Java(TM) Byte Code) and
ECMAScript. It is possible to use one or more of the above in a
manner non-compliant with MPEG to crash the receiver or make it
temporarily unavailable.
Senders SHOULD ensure that packet loss does not cause severe
problems in application execution when the packet carries OD
commands, BIFS commands, or programmatic content such as MPEG-J and
ECMAScript. When such measures cannot be taken, applications SHOULD
use a more reliable means of transport than this payload format to
carry the information.
Authentication mechanisms can be used to validate the sender and
the data to prevent security problems due to non-compliant malignant
MPEG-4 streams.
In ISO/IEC 14496-1 a security model is defined for MPEG-4 Systems
streams carrying MPEG-J access units which comprise Java(TM) classes
and objects. MPEG-J defines a set of Java APIs and a secure
execution model. MPEG-J content can call this set of APIs and
Java(TM) methods from a set of Java packages supported in the
receiver within the defined security model. According to this
security model, downloaded byte code is forbidden to load libraries,
define native methods, start programs, read or write files, or read
system properties.
Receivers can implement intelligent filters to validate the buffer
requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
ECMAScript) commands in the streams. However, this can increase the
complexity significantly.
6. Acknowledgements 6. Acknowledgements
This document evolved through several revisions thanks to contributions This document evolved through several revisions thanks to
from a people from the ISMA forum, from the IETF AVT working group and contributions by people from the ISMA forum, from the IETF AVT
the 4-on-IP ad-hoc group within MPEG. The authors wish to thank all Working Group and from the 4-on-IP ad-hoc group within MPEG. The
involved people, and in particular Colin Perkins, Stephan Wenger and authors wish to thank all involved people, and in particular Colin
Dorairaj V for their valuable comments and support. Perkins, Stephan Wenger and Dorairaj V for their valuable comments
and support.
7. References 7. References
[1] ISO/IEC International Standard 14496 (MPEG-4); "Information [1] ISO/IEC International Standard 14496 (MPEG-4); "Information
technology - Coding of audio-visual objects", January 2000 technology - Coding of audio-visual objects", January 2000
[2] Schulzrinne, Casner, Frederick, Jacobson RTP: A Transport [2] Schulzrinne, Casner, Frederick, Jacobson RTP: A Transport
Protocol for Real Time Applications RFC 1889, Internet Engineering Protocol for Real Time Applications RFC 1889, Internet Engineering
Task Force, January 1996. Task Force, January 1996.
[3] S. Bradner, Key words for use in RFCs to Indicate Requirement [3] S. Bradner, Key words for use in RFCs to Indicate Requirement
Levels, RFC 2119, March 1997. Levels, RFC 2119, March 1997.
[4] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, RTP payload [4] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, RTP payload
format for MPEG1/MPEG2 Video, RFC 2250, January 1998. format for MPEG1/MPEG2 Video, RFC 2250, January 1998.
[5] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP [5] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
payload format for MPEG-4 Audio/Visual streams, RFC 3016. payload format for MPEG-4 Audio/Visual streams, RFC 3016.
[6] Avaro, Basso, Casner, Civanlar, Gentric, Herpel, Lim, Perkins, [6] Handley, Jacobson, SDP: Session Description Protocol, RFC 2327,
van der Meer, RTP payload format for MPEG-4 streams, work in progress,
draft-gentric-avt-mpeg4-multiSL-01.txt, January 2001.
[7] Handley, Jacobson, SDP: Session Description Protocol, RFC 2327,
Internet Engineering Task Force, April 1998. Internet Engineering Task Force, April 1998.
7. Author Addresses 7. Author Addresses
Jan van der Meer Jan van der Meer
Philips Digital Networks Philips Digital Networks
Cederlaan 4 Cederlaan 4
5600 JB Eindhoven 5600 JB Eindhoven
Netherlands Netherlands
Email : jan.vandermeer@philips.com Email : jan.vandermeer@philips.com
skipping to change at line 1125 skipping to change at page 27, line 4
Cisco Systems Inc. Cisco Systems Inc.
170 West Tasman Dr. 170 West Tasman Dr.
San Jose, CA 95034 San Jose, CA 95034
Email: dmackie@cisco.com Email: dmackie@cisco.com
Viswanathan Swaminathan Viswanathan Swaminathan
Sun Microsystems Inc. Sun Microsystems Inc.
901 San Antonio Road, M/S UMPK15-214 901 San Antonio Road, M/S UMPK15-214
Palo Alto, CA 94303 Palo Alto, CA 94303
Email: viswanathan.swaminathan@sun.com Email: viswanathan.swaminathan@sun.com
David Singer David Singer
Apple Computer, Inc. Apple Computer, Inc.
One Infinite Loop, MS:302-3MT One Infinite Loop, MS:302-3MT
Cupertino CA 95014 Cupertino CA 95014
Email: singer@apple.com Email: singer@apple.com
Philippe Gentric
Philips Digital Networks, MP4Net
51 rue Carnot
92156 Suresnes
France
e-mail: philippe.gentric@philips.com
Full Copyright Statement Full Copyright Statement
"Copyright (C) The Internet Society (date). All Rights Reserved. This "Copyright (C) The Internet Society (date). All Rights Reserved.
document and translations of it may be copied and furnished to others, This document and translations of it may be copied and furnished to
and derivative works that comment on or otherwise explain it or assist others, and derivative works that comment on or otherwise explain
in its implementation may be prepared, copied, published and it or assist in its implementation may be prepared, copied,
distributed, in whole or in part, without restriction of any kind, published and distributed, in whole or in part, without restriction
provided that the above copyright notice and this paragraph are of any kind, provided that the above copyright notice and this
included on all such copies and derivative works. However, this paragraph are included on all such copies and derivative works.
document itself may not be modified in any way, such as by removing However, this document itself may not be modified in any way, such
the copyright notice or references to the Internet Society or other as by removing the copyright notice or references to the Internet
Internet organizations, except as needed for the purpose of developing Society or other Internet organizations, except as needed for the
Internet standards in which case the procedures for copyrights defined purpose of developing Internet standards in which case the
in the Internet Standards process MUST be followed, or as required to procedures for copyrights defined in the Internet Standards process
translate it into languages other than English. MUST be followed, or as required to translate it into languages other than English.
APPENDIX: Usage of this payload format APPENDIX: Usage of this payload format
Appendix A. Examples Appendix A. Examples
A.1 Examples of delay analysis with interleave A.1 Examples of delay analysis with interleave
A.1.1 Group interleave A.1.1 Group interleave
An example of regular interleave is when packets are formed into An example of regular interleave is when packets are formed into
groups. If the number of packets in a group is N, packet 0 contains groups. If the number of packets in a group is N, packet 0 contains
frame 0, frame N, frame 2N, and so on; packet 1 contains frame 1, frame 0, frame N, frame 2N, and so on; packet 1 contains frame 1,
frame 1+N, 1+2N, and so on. The AU-Index field is used to document frame 1+N, 1+2N, and so on. The AU-Index field is used to document
the sequence of the packet within the group (or the first frame in the the sequence of the packet within the group (or the first frame in
packet, which is the same thing in this scheme), and all the the packet, which is the same thing in this scheme), and all the
AU-Index-delta fields contain N-1. AU-Index-delta fields contain N-1.
Receivers can tell when a new interleave group is starting, by noting Receivers can tell when a new interleave group is starting, by
that the computed time-stamp of the first frame in a packet is later noting that the computed time stamp of the first frame in a packet
than any previously computed time-stamp. This is because no is later than any previously computed time stamp. This is because no
following packet can contain an earlier RTP timestamp (RTP rules), following packet can contain an earlier RTP timestamp (RTP rules),
and the second and subsequent frames in a packet have larger and the second and subsequent frames in a packet have larger
time-stamps (the frames in a packet are also in time-order). time stamps (the frames in a packet are also in time-order).
If the group size is 3, then packets are formed as follows: If the group size is 3, then packets are formed as follows:
Packet Time-stamp Frame Numbers AU-Index, AU-Index-delta Packet Time stamp Frame Numbers AU-Index, AU-Index-delta
0 T[0] 0, 3, 6 0, 2, 2 0 T[0] 0, 3, 6 0, 2, 2
1 T[1] 1, 4, 7 0, 2, 2 1 T[1] 1, 4, 7 0, 2, 2
2 T[2] 2, 5, 8 0, 2, 2 2 T[2] 2, 5, 8 0, 2, 2
3 T[9] 9,12,15 0, 2, 2 3 T[9] 9,12,15 0, 2, 2
In this case, the receiver would have to buffer 4 frames at least In this case, the receiver would have to buffer 4 frames at least
from packets 0 and 1, and can flush all frames when packet 2 arrives. from packets 0 and 1, and can flush all frames when packet 2
(Frame 0 can be flushed as packet 0 arrives, since it is the earliest arrives. (Frame 0 can be flushed as packet 0 arrives, since it is
frame we hold, and likewise frame 1 from packet 1; we are therefore the earliest frame we hold, and likewise frame 1 from packet 1; we
holding 3,4,6,7 until packet 2 arrives). are therefore holding 3,4,6,7 until packet 2 arrives).
If there is loss, then the receiver may wait longer than is strictly If there is loss, then the receiver may wait longer than is strictly
necessary before it emits frames. For example, say packet 1 is lost necessary before it emits frames. For example, say packet 1 is lost
from the above example. Packet 0 allows frame 0 to be emitted, and from the above example. Packet 0 allows frame 0 to be emitted, and
then packet 2 arrives, allowing us to notice the loss of frame 1, and then packet 2 arrives, allowing us to notice the loss of frame 1,
emit frames 2 and 3. Then it is not until the arrival of packet 3 and emit frames 2 and 3. Then it is not until the arrival of packet 3
(which has a time-stamp beyond the times of all the frames seen so (which has a time-stamp beyond the times of all the frames seen so
far), that we can finish dealing with the loss, even though the first far), that we can finish dealing with the loss, even though the
group has, in fact, ended. (This is in contrast to schemes which first group has, in fact, ended. (This is in contrast to schemes
signal the group size explicitly; if the receiver knows that this is which signal the group size explicitly; if the receiver knows that
packet 3 of 3, then even if 2 of 3 is missing, it can de-interleave this is packet 3 of 3, then even if 2 of 3 is missing, it can
this group without waiting for the next one to start). de-interleave this group without waiting for the next one to start).
In the above example the AU-Index is coded with the value 0, as In the above example the AU-Index is coded with the value 0, as
required for the modes defined in this document. To reconstruct the required for the modes defined in this document. To reconstruct the
original order, the RTP time stamp and the AU-Index-delta are used. original order, the RTP time stamp and the AU-Index-delta are used.
See also 3.2.3.2. See also 3.2.3.2.
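The group interleave and the receiver behaviour described in A.1.1 can be sketched as follows. This is an illustration, not a normative algorithm. It assumes a constant frame duration, so that frame numbers stand in for computed time stamps, and it assumes the receiver emits a frame as soon as no earlier frame can still arrive. All function names are hypothetical.

```python
def group_interleave(num_frames, group_size, frames_per_packet):
    """Return packets as lists of frame numbers, in sending order."""
    packets = []
    span = group_size * frames_per_packet  # frames covered by one group
    for base in range(0, num_frames, span):
        for p in range(group_size):
            frames = [base + p + group_size * j
                      for j in range(frames_per_packet)]
            frames = [f for f in frames if f < num_frames]
            if frames:
                packets.append(frames)
    return packets


def au_index_fields(frames):
    """AU-Index (always 0 here) followed by the AU-Index-delta values."""
    return [0] + [b - a - 1 for a, b in zip(frames, frames[1:])]


def deinterleave(packets):
    """Simulate the receiver for the packets that actually arrive.

    Returns one (emitted, lost, held) tuple per arriving packet.
    """
    held = set()
    next_frame = 0
    log = []
    for frames in packets:
        held.update(frames)
        emitted, lost = [], []
        # No later packet can carry a frame earlier than this packet's
        # first frame, so everything up to it is either held or lost.
        while next_frame <= frames[0]:
            if next_frame in held:
                held.discard(next_frame)
                emitted.append(next_frame)
            else:
                lost.append(next_frame)
            next_frame += 1
        # Frames that directly follow in sequence can be emitted as well.
        while next_frame in held:
            held.discard(next_frame)
            emitted.append(next_frame)
            next_frame += 1
        log.append((emitted, lost, sorted(held)))
    return log


packets = group_interleave(18, group_size=3, frames_per_packet=3)
print(packets[:4])
print(au_index_fields(packets[0]))
print(deinterleave(packets[:3]))                         # no loss
print(deinterleave([packets[0], packets[2], packets[3]]))  # packet 1 lost
```

With packet 1 removed, the simulation reports frame 1 as lost and emits frames 2 and 3 on the arrival of packet 2, and the remaining losses are resolved only when packet 3 arrives, as described in the text above.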
A.1.2 Continuous interleave A.1.2 Continuous interleave
In continuous interleave, once the scheme is 'primed', the number of In continuous interleave, once the scheme is 'primed', the number of
frames in a packet exceeds the 'stride' (the distance between them). frames in a packet exceeds the 'stride' (the distance between them).
This shortens the buffering needed, smooths the data-flow, and gives This shortens the buffering needed, smooths the data-flow, and gives
slightly larger packets -- and thus lower overhead -- for the same slightly larger packets -- and thus lower overhead -- for the same
interleave. For example, here is a continuous interleave also over a interleave. For example, here is a continuous interleave also over
stride of 3 frames, but with 4 frames per packet, for a run of 20 a stride of 3 frames, but with 4 frames per packet, for a run of 20
frames. This shows both how the scheme 'starts up' and how it frames. This shows both how the scheme 'starts up' and how it
finishes. finishes.
Packet Time-stamp Frame Numbers AU-Index, AU-Index-delta Packet Time-stamp Frame Numbers AU-Index, AU-Index-delta
0 T[0] 0 0 0 T[0] 0 0
1 T[1] 1 4 0 2 1 T[1] 1 4 0 2
2 T[2] 2 5 8 0 2 2 2 T[2] 2 5 8 0 2 2
3 T[3] 3 6 9 12 0 2 2 2 3 T[3] 3 6 9 12 0 2 2 2
4 T[7] 7 10 13 16 0 2 2 2 4 T[7] 7 10 13 16 0 2 2 2
5 T[11] 11 14 17 20 0 2 2 2 5 T[11] 11 14 17 20 0 2 2 2
skipping to change at line 1236 skipping to change at page 29, line 28
us to emit 7,8,9,10, and we are holding 12,13,16. Each arriving us to emit 7,8,9,10, and we are holding 12,13,16. Each arriving
packet contains 4 frames, and allows 4 frames to be flushed. packet contains 4 frames, and allows 4 frames to be flushed.
In the above example the AU-Index is coded with the value 0, as In the above example the AU-Index is coded with the value 0, as
required for the modes defined in this document. To reconstruct the required for the modes defined in this document. To reconstruct the
original order, the RTP time stamp and the AU-Index-delta are used. original order, the RTP time stamp and the AU-Index-delta are used.
See also 3.2.3.2. See also 3.2.3.2.
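The packet layout in the table of A.1.2 can be reproduced with the sketch below. The formula is an assumption fitted to the table rows shown (stride 3, 4 frames per packet), not a rule taken from the specification: each packet is treated as a full steady-state packet whose frames with negative numbers are dropped during start-up. The end of the run is not modelled, and the function names are hypothetical.

```python
def continuous_interleave(num_packets, stride=3, frames_per_packet=4):
    """Return the frame numbers carried by the first `num_packets` packets."""
    # Distance between the first and last frame of a full packet.
    offset = stride * (frames_per_packet - 1)
    packets = []
    for p in range(num_packets):
        first = frames_per_packet * p - offset
        frames = [first + stride * j for j in range(frames_per_packet)]
        # During start-up the early slots fall before frame 0; drop them.
        packets.append([f for f in frames if f >= 0])
    return packets


def index_fields(frames, stride=3):
    """AU-Index 0 for the first frame, then AU-Index-delta = stride - 1."""
    return [0] + [stride - 1] * (len(frames) - 1)


for number, frames in enumerate(continuous_interleave(6)):
    print(number, "T[%d]" % frames[0], frames, index_fields(frames))
```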
If there is loss, again the receiver has to wait to emit the erasure If there is loss, again the receiver has to wait to emit the erasure
frames. In this case, say packet 3 is lost. We were holding frames frames. In this case, say packet 3 is lost. We were holding frames
4, 5, and 8. On the arrival of packet 4, (time-stamp of frame 7), we 4, 5, and 8. On the arrival of packet 4, (time-stamp of frame 7),
now know frame 3 was lost, we can emit frames 4,5, and we know 6 must we now know frame 3 was lost, we can emit frames 4,5, and we know 6
be lost, and emit 7, which is in the packet that arrived. Then on must be lost, and emit 7, which is in the packet that arrived. Then
the arrival of packet 5 (time-stamp 11) we can emit 8, indicate loss on the arrival of packet 5 (time-stamp 11) we can emit 8, indicate
of 9, and emit 10 and 11. Finally, the arrival of packet 6 loss of 9, and emit 10 and 11. Finally, the arrival of packet 6
(time-stamp 15) indicates that 12 must be lost; we have now detected (time-stamp 15) indicates that 12 must be lost; we have now
all the lost frames. detected all the lost frames.
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/