draft-ietf-avt-mpeg4-simple-00.txt   draft-ietf-avt-mpeg4-simple-01.txt 
Internet Engineering Task Force J. van der Meer Internet Engineering Task Force J. van der Meer
Internet Draft Philips Electronics Internet Draft Philips Electronics
D. Mackie D. Mackie
Cisco Systems Inc. Cisco Systems Inc.
V. Swaminathan V. Swaminathan
Sun Microsystems Inc. Sun Microsystems Inc.
D. Singer D. Singer
Apple Computer Apple Computer
September 2001 March 2002
Expires March 2002 Expires September 2002
Document: draft-ietf-avt-mpeg4-simple-00.txt Document: draft-ietf-avt-mpeg4-simple-01.txt
Use of "RFC-generic" for MPEG-4 Elementary Streams with no SL layer Use of "RFC XXXX" for MPEG-4 Elementary Streams with no SL layer
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of Drafts. Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in as reference material or to cite them other than as "work in
progress." progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
0 Abstract
The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in ISO
that recently produced the MPEG-4 [1] standard. MPEG defines tools to
compress content such as audio-visual information into elementary
streams. In [6] a generic RTP payload format is defined for transport
of any non-multiplexed MPEG-4 elementary stream. To achieve the generic
MPEG-4 functionality, [6] addresses detailed issues related to the
MPEG-4 SL layer. However, many initial applications will not use the SL
Layer. To facilitate usage of [6] by such applications, this document
describes how to use [6] when no SL layer is used.
This specification is a product of the Audio/Video Transport working This specification is a product of the Audio/Video Transport working
group within the Internet Engineering Task Force. Comments are group within the Internet Engineering Task Force. Comments are
solicited and should be addressed to the working group's mailing solicited and should be addressed to the working group's mailing
list at avt@ietf.org and/or the authors. list at avt@ietf.org and/or the authors.
<<
Note for the RFC editor:
XXXX should be replaced with the RFC number that will be assigned to
the companion RFC which draft is: draft-ietf-avt-mpeg4-multisl-**.txt.
>>
Abstract
The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in ISO
that recently produced the MPEG-4 standard. MPEG defines tools to
compress content such as audio-visual information into elementary
streams. In RFC XXXX a generic RTP payload format is defined for
transport of any non-multiplexed MPEG-4 elementary stream. To achieve
the generic MPEG-4 functionality, RFC XXXX addresses detailed issues
related to the MPEG-4 SL layer. However, many initial applications will
not use the SL Layer. To facilitate usage of RFC XXXX by such
applications, this document describes how to use RFC XXXX when no SL
layer is used.
1. Introduction 1. Introduction
The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29 The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4 that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
standards [1]. The MPEG-4 standard specifies compression of standards [1]. The MPEG-4 standard specifies compression of
audio-visual data into for example an audio or video elementary audio-visual data into for example an audio or video elementary
stream. In the MPEG-4 standard, these streams take the form of stream. In the MPEG-4 standard, these streams take the form of
audiovisual objects that may be arranged into an audio-visual scene audiovisual objects that may be arranged into an audio-visual scene
by means of a scene description. Each MPEG-4 elementary stream by means of a scene description. Each MPEG-4 elementary stream
consists of a sequence of Access Units; in case of audio an Access consists of a sequence of Access Units; in case of audio an Access
skipping to change at line 74 skipping to change at line 81
The MPEG-4 system specification is a rather abstract specification in The MPEG-4 system specification is a rather abstract specification in
the sense that no transport format for MPEG-4 elementary streams is the sense that no transport format for MPEG-4 elementary streams is
defined. Instead, a conceptual SL layer has been specified to store defined. Instead, a conceptual SL layer has been specified to store
transport specific information such as time stamps and random access transport specific information such as time stamps and random access
point information. When transporting an MPEG-4 elementary stream, point information. When transporting an MPEG-4 elementary stream,
transport information from the SL layer is typically mapped to the transport information from the SL layer is typically mapped to the
actual transport layer. Note however that the SL layer is conceptual actual transport layer. Note however that the SL layer is conceptual
and may not exist in practice. and may not exist in practice.
In [6], a general payload format is defined for transport of a single In RFC XXXX, a general payload format is defined for transport of a single
MPEG-4 elementary stream over RTP. The RTP payload format specified MPEG-4 elementary stream over RTP. The RTP payload format specified
in [6] allows for carriage of any information that may be contained in in RFC XXXX allows for carriage of any information that may be contained in
the MPEG-4 SL layer, either by mapping to the RTP header fields or by the MPEG-4 SL layer, either by mapping to the RTP header fields or by
carriage in specific fields defined in the RTP payload. Consequently, carriage in specific fields defined in the RTP payload. Consequently,
the format defined in [6] is very generic and complete; for example, the format defined in RFC XXXX is very generic and complete; for example,
transcoding issues from and to the SL layer are described in detail. transcoding issues from and to the SL layer are described in detail.
However, in many initial MPEG-4 applications the SL layer does not However, in many initial MPEG-4 applications the SL layer does not
exist in practice. Such applications do not require any knowledge of exist in practice. Such applications do not require any knowledge of
the SL layer. While the use of [6] is highly desirable for all MPEG-4 the SL layer. While the use of RFC XXXX is highly desirable for all MPEG-4
applications, to understand [6] may be difficult without knowledge of applications, to understand RFC XXXX may be difficult without knowledge of
the MPEG-4 SL layer. Therefore in this document the use of [6] is the MPEG-4 SL layer. Therefore in this document the use of RFC XXXX is
described without requiring knowledge of the SL layer to understand described without requiring knowledge of the SL layer to understand
its functionality. its functionality.
Sophisticated features on interleaving of fragmented Access Units are Sophisticated features on interleaving of fragmented Access Units are
defined in [6]. Because initial applications do not require these defined in RFC XXXX. Because initial applications only need interleaving
complicated features, these features are not supported in this of complete (non-fragmented) Access Units, these more sophisticated
document. Hence, only a functional subset of [6] is supported. features are not supported in this document. Hence, only a functional
set of RFC XXXX is supported.
In [6], a general and configurable payload structure is defined for In RFC XXXX, a general and configurable payload structure is defined for
transport of MPEG-4 streams. This allows for the design of receivers transport of MPEG-4 streams. This allows for the design of receivers
that can be configured to receive any MPEG-4 stream. Configuration of that can be configured to receive any MPEG-4 stream. Configuration of
the payload is provided to accommodate transport of any MPEG-4 stream, the payload is provided to accommodate transport of any MPEG-4 stream,
but for a specific MPEG-4 elementary stream typically only very few but for a specific MPEG-4 elementary stream typically only very few
configurations are needed. So as to allow for the design of simplified, configurations are needed. So as to allow for the design of simplified,
but dedicated receivers, this specification requires that specific but dedicated receivers, this specification requires that specific
modes are defined for transport of MPEG-4 streams. In this document modes are defined for transport of MPEG-4 streams. In this document
only modes are defined for transport of MPEG-4 CELP and AAC streams, only modes are defined for transport of MPEG-4 CELP and AAC streams,
but in future new RFCs are expected to specify additional modes for but in future new RFCs are expected to specify additional modes for
transport of other MPEG-4 streams. transport of other MPEG-4 streams.
In summary, this document: In summary, this document:
- is intended for applications that do not apply the SL layer; - is intended for applications that do not apply the SL layer;
- describes how to use [6] without requiring knowledge of the SL layer; - describes how to use RFC XXXX without requiring knowledge of the
- defines a functional but true subset of [6]; SL layer;
- defines a functional but true subset of RFC XXXX;
- defines modes how to use this specification for transport of MPEG-4 - defines modes how to use this specification for transport of MPEG-4
CELP and AAC streams. CELP and AAC streams.
The use of [6] defined in this document is simple to implement and The use of RFC XXXX defined in this document is simple to implement
reasonably efficient. It allows for optional interleaving of Access and reasonably efficient. It allows for optional interleaving of
Units (such as audio frames) to increase error resiliency in packet Access Units (such as audio frames) to increase error resiliency in
loss. packet loss.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC 2119 [3]. this document are to be interpreted as described in RFC 2119 [3].
2. Carriage of MPEG-4 elementary streams over RTP 2. Carriage of MPEG-4 elementary streams over RTP
2.1 MPEG-4 stream type identification 2.1 Introduction
Information on the type of MPEG-4 stream that is carried in the With this payload format a single MPEG-4 elementary stream can be
payload is conveyed by format parameters in an SDP message or by other transported. Information on the type of MPEG-4 stream carried in the
means. payload is conveyed by format parameters in an SDP [7] message or
by other means. These format parameters specify the configuration
of the payload. To simplify receivers, also a format parameter is
available to signal a specific mode of using this payload. A mode
definition MAY include the type of MPEG-4 elementary stream as well
as the applied configuration, so as to avoid the need in receivers
for parsing all format parameters.
2.2 MPEG Access Units 2.2 MPEG Access Units
For carriage of compressed audio-visual data MPEG defines Access For carriage of compressed audio-visual data MPEG defines Access
Units. An MPEG Access Unit (AU) is the smallest data entity to which Units. An MPEG Access Unit (AU) is the smallest data entity to which
timing information can be attributed. In case of audio an Access timing information can be attributed. In case of audio an Access
Unit represents an audio frame and in case of video a picture. MPEG Unit represents an audio frame and in case of video a picture. MPEG
Access Units are by definition byte aligned. If for example an audio Access Units are by definition byte aligned. If for example an audio
frame is not byte aligned, up to 7 zero-padding bits MUST be inserted frame is not byte aligned, up to 7 zero-padding bits MUST be inserted
at the end of the frame to achieve a byte-aligned Access Unit. at the end of the frame to achieve a byte-aligned Access Unit.
skipping to change at line 154 skipping to change at line 169
picture, any video stream headers that may precede the coded picture picture, any video stream headers that may precede the coded picture
data, and any video stream stuffing that may follow it, up to, but not data, and any video stream stuffing that may follow it, up to, but not
including the startcode indicating the start of a new video stream or including the startcode indicating the start of a new video stream or
the next Access Unit. the next Access Unit.
2.3 Concatenation of Access Units 2.3 Concatenation of Access Units
Frequently it is possible to carry multiple Access Units in one RTP Frequently it is possible to carry multiple Access Units in one RTP
packet. This is particularly useful for audio; for example, when AAC packet. This is particularly useful for audio; for example, when AAC
is used for encoding of a stereo signal at 64 kbits/sec, AAC frames is used for encoding of a stereo signal at 64 kbits/sec, AAC frames
contain on average approximately 200 bytes. On a LAN with a 1500 byte contain on average approximately 200 bytes. On a LAN with a 1500 octet
MTU this would allow on average 7 complete AAC frames to be carried MTU this would allow on average 7 complete AAC frames to be carried
per AAC packet. per AAC packet.
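The figure of 7 frames per packet can be checked with a small Python sketch. The 40-octet overhead (20 for an IPv4 header, 8 for UDP, 12 for a fixed RTP header without CSRC entries or extensions) is an assumption added here for illustration; it is not stated in the text, and any AU header overhead is ignored.

```python
def frames_per_packet(mtu, frame_size, overhead=40):
    """Return how many complete frames of frame_size octets fit in one packet.

    overhead is an assumed IPv4 + UDP + RTP header cost in octets.
    """
    available = mtu - overhead
    return max(available // frame_size, 0)


# 1500-octet MTU and an average AAC frame of about 200 octets.
print(frames_per_packet(1500, 200))
```

With or without the assumed header overhead, the result for these numbers is 7 complete frames.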
Access Units may have a fixed size in bytes, but a variable size is Access Units may have a fixed size in octets, but a variable size is
also possible. To facilitate parsing in case of multiple concatenated also possible. To facilitate parsing in case of multiple concatenated
AUs in one RTP packet, the size of each AU is made known to the AUs in one RTP packet, the size of each AU is made known to the
receiver. When concatenating in case of a constant AU size, this size receiver. When concatenating in case of a constant AU size, this size
is communicated through a format parameter. When concatenating in case is communicated through a format parameter. When concatenating in case
of variable size AUs, the RTP payload carries an AU size field for of variable size AUs, the RTP payload carries an AU size field for
each contained AU. In combination with the RTP payload length the each contained AU. In combination with the RTP payload length the
size information allows the RTP payload to be split by the receiver size information allows the RTP payload to be split by the receiver
back into the individual AUs. back into the individual AUs.
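As a rough illustration of the splitting step, the following Python sketch takes the Access Unit data of one packet together with size information that has already been obtained, either a constant size from a format parameter or a list of per-AU sizes decoded from the payload. The function name and arguments are hypothetical; the wire format of the AU size field is not covered by this sketch.

```python
def split_access_units(au_data, au_sizes=None, constant_size=None):
    """Split the concatenated AU data of one RTP packet into individual AUs.

    au_sizes:      per-AU sizes in octets (variable-size case), or None.
    constant_size: fixed AU size in octets (constant-size case), or None.
    Both arguments are assumed to be decoded already.
    """
    if au_sizes is None:
        if not constant_size or constant_size <= 0:
            raise ValueError("need au_sizes or a positive constant_size")
        if len(au_data) % constant_size != 0:
            raise ValueError("payload is not a whole number of AUs")
        au_sizes = [constant_size] * (len(au_data) // constant_size)

    # The sizes must account for the payload length exactly.
    if sum(au_sizes) != len(au_data):
        raise ValueError("AU sizes do not match payload length")

    units, offset = [], 0
    for size in au_sizes:
        units.append(au_data[offset:offset + size])
        offset += size
    return units


print(split_access_units(b"aaabbbbcc", au_sizes=[3, 4, 2]))
print(split_access_units(b"abcdef", constant_size=2))
```

The length check reflects the point made above: the size information is only usable in combination with the RTP payload length.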
To simplify the implementation of [6] defined in this document, it To simplify the implementation of RFC XXXX defined in this document, it
is required that when multiple AUs are carried in an RTP packet, is required that when multiple AUs are carried in an RTP packet,
each AU MUST be complete, i.e. the number of AUs in an RTP packet each AU MUST be complete, i.e. the number of AUs in an RTP packet
MUST be integral. MUST be integral.
2.4 Fragmentation of Access Units 2.4 Fragmentation of Access Units
MPEG allows for very large Access Units. Since most IP networks have MPEG allows for very large Access Units. Since most IP networks have
significantly smaller MTUs, this payload format allows fragmenting significantly smaller MTUs, this payload format allows fragmenting
the AUs over multiple RTP packets so as to avoid IP layer the AUs over multiple RTP packets so as to avoid IP layer
fragmentation. To simplify the implementation of [6] defined in this fragmentation. To simplify the implementation of RFC XXXX defined in this
document, an RTP packet SHALL either carry one or more complete document, an RTP packet SHALL either carry one or more complete
Access Units or a single fragment of one Access Unit. Access Units or a single fragment of one Access Unit.
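The rule that a packet carries either one or more complete Access Units or a single fragment of one Access Unit could be applied by a sender roughly as in the Python sketch below. The function name, the "complete"/"fragment" labels and the max_payload budget are assumptions made for illustration; header sections and their overhead are ignored, and the grouping policy is only one of many possible.

```python
def packetize(access_units, max_payload):
    """Group AUs into packet payloads under the subset rule.

    Returns a list of (kind, payload) tuples, where kind is "complete"
    (one or more whole AUs) or "fragment" (one fragment of a single AU).
    max_payload is an assumed positive budget in octets for AU data.
    """
    packets = []
    pending, pending_len = [], 0

    def flush():
        nonlocal pending, pending_len
        if pending:
            packets.append(("complete", b"".join(pending)))
            pending, pending_len = [], 0

    for au in access_units:
        if len(au) > max_payload:
            # Too large for one packet: emit pending AUs, then fragment.
            flush()
            for start in range(0, len(au), max_payload):
                packets.append(("fragment", au[start:start + max_payload]))
        else:
            if pending_len + len(au) > max_payload:
                flush()
            pending.append(au)
            pending_len += len(au)
    flush()
    return packets


for kind, payload in packetize([b"aa", b"bbb", b"cccccccc", b"d"], 5):
    print(kind, payload)
```

Note that a fragment is never combined with other data in the same payload, which is what keeps the receiver side simple.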
2.5 Interleaving 2.5 Interleaving
When an RTP packet carries a contiguous sequence of Access Units, When an RTP packet carries a contiguous sequence of Access Units,
the loss of such a packet can result in "decoding gaps" for the user. the loss of such a packet can result in "decoding gaps" for the user.
One method to alleviate this problem is to allow for the Access One method to alleviate this problem is to allow for the Access
Units to be interleaved in the RTP packets. For a modest cost in Units to be interleaved in the RTP packets. For a modest cost in
latency and implementation complexity, significant error resiliency latency and implementation complexity, significant error resiliency
skipping to change at line 232 skipping to change at line 247
is possible if the frame rate is constant. However, in some cases it is possible if the frame rate is constant. However, in some cases it
is not possible to make such calculation, for example for variable is not possible to make such calculation, for example for variable
frame rate video and for MPEG-4 BIFS streams carrying composition frame rate video and for MPEG-4 BIFS streams carrying composition
information. To support such cases, this payload format can be information. To support such cases, this payload format can be
configured to carry a CTS in the RTP payload for each contained configured to carry a CTS in the RTP payload for each contained
Access Unit. A CTS time stamp MAY be conveyed in the RTP payload Access Unit. A CTS time stamp MAY be conveyed in the RTP payload
only for non-first AUs in the RTP packet, and SHALL NOT be conveyed only for non-first AUs in the RTP packet, and SHALL NOT be conveyed
for the first AU (fragment), as the time stamp for the latter is for the first AU (fragment), as the time stamp for the latter is
carried by the RTP time stamp. carried by the RTP time stamp.
The DTS timestamp is applied only in MPEG video streams that use The DTS timestamp may be applied only in MPEG video streams that use
bi-directional coding, i.e. when pictures may be predicted in both bi-directional coding, i.e. when pictures may be predicted in both
forward and backward direction by using either a reference picture in forward and backward direction by using either a reference picture in
the past, or a reference picture in the future. The DTS cannot be the past, or a reference picture in the future. The DTS cannot be
carried in the RTP header. In some cases the DTS can be derived from carried in the RTP header. In some cases the DTS can be derived from
the RTP time stamp using frame rate information; this requires deep the RTP time stamp using frame rate information; this requires deep
parsing in the video stream, which may be considered objectionable. parsing in the video stream, which may be considered objectionable.
But if the video frame rate is variable, the required information But if the video frame rate is variable, the required information
is not even present in the video stream. For both reasons, the may not even be present in the video stream. For both reasons, the
capability has been defined to optionally carry a DTS in the RTP capability has been defined to optionally carry a DTS in the RTP
payload for each contained Access Unit. payload for each contained Access Unit.
Since RTP time stamps may be re-stamped by RTP devices, each CTS Since RTP time stamps may be re-stamped by RTP devices, each CTS
and DTS contained in the RTP payload is coded differentially from the and DTS contained in the RTP payload is coded differentially from the
RTP time stamp, so as to avoid extensive parsing by re-stamping RTP time stamp, so as to avoid extensive parsing by re-stamping
devices. devices.
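The benefit of differential coding can be illustrated with a short Python sketch. It assumes 32-bit modular arithmetic, matching the width of the RTP time stamp; the actual width and sign convention of the CTS and DTS fields in the payload are not specified in this section, so the helper names and the representation are hypothetical.

```python
MOD = 1 << 32  # RTP time stamps are 32 bits wide


def encode_delta(timestamp, rtp_timestamp):
    """Express a CTS or DTS as an offset from the RTP time stamp."""
    return (timestamp - rtp_timestamp) % MOD


def decode_timestamp(rtp_timestamp, delta):
    """Recover a CTS or DTS from the RTP time stamp and the coded offset."""
    return (rtp_timestamp + delta) % MOD


rtp_ts = 90000          # time stamp of the first AU in the packet
cts_second_au = 93003   # CTS of a later AU in the same packet
delta = encode_delta(cts_second_au, rtp_ts)

# A re-stamping device only rewrites the RTP header time stamp;
# the delta inside the payload is left untouched.
restamp_offset = 1_000_000
restamped_rtp_ts = (rtp_ts + restamp_offset) % MOD

print(delta)
print(decode_timestamp(restamped_rtp_ts, delta))
```

After re-stamping, the recovered CTS has moved by the same offset as the RTP time stamp, so the relative timing of the AUs is preserved without the device parsing the payload.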
2.7 Carriage of auxiliary information. 2.7 Carriage of auxiliary information.
This payload format defines a specific field to carry auxiliary data This payload format defines a specific field to carry auxiliary data
on the contained MPEG-4 stream, representing MPEG-4 system information. on the contained MPEG-4 stream, representing MPEG-4 system information.
The auxiliary data corresponds to the RSLH field defined in [6]. The auxiliary data corresponds to the RSLH field defined in RFC XXXX.
Receivers MAY use the auxiliary data to decode the contained stream, Receivers MAY use the auxiliary data to decode the contained stream,
but receivers that have no interest in such data MAY skip the but receivers that have no interest in such data MAY skip the
auxiliary data field. To facilitate skipping of the data, and to avoid auxiliary data field. To facilitate skipping of the data, and to avoid
the need for parsing it, the auxiliary data field is preceded by a the need for parsing it, the auxiliary data field is preceded by a
field that specifies the length of the auxiliary data. field that specifies the length of the auxiliary data.
2.8 Format parameters and the conditional presence and length of fields 2.8 Format parameters and the conditional presence and length of fields
To support the features described in the previous sections several To support the features described in the previous sections several
fields are defined for carriage in the RTP payload. However, their use fields are defined for carriage in the RTP payload. However, their use
strongly depends on the type of MPEG-4 elementary stream that is strongly depends on the type of MPEG-4 elementary stream that is
carried. Sometimes a specific field is needed with a certain length, carried. Sometimes a specific field is needed with a certain length,
while in other cases such a field is not needed at all. To be efficient while in other cases such a field is not needed at all. To be efficient
in either case, the fields needed for these features are configurable in either case, the fields needed for these features are configurable
by means of format parameters. In general, a format parameter defines by means of format parameters. In general, a format parameter defines
the presence and length of associated fields. A length of zero the presence and length of associated fields. A length of zero
indicates absence of the field. As a consequence, parsing of the indicates absence of the field. As a consequence, parsing of the
payload requires knowledge of format parameters. The format payload requires knowledge of format parameters. The format
parameters are conveyed to the receiver via SDP messages or through parameters are conveyed to the receiver via SDP [7] messages or
other means. through other means.
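The idea that format parameters give the length of each field, with a length of zero meaning the field is absent, can be sketched in Python as follows. The field names, the example lengths and the most-significant-bit-first ordering are assumptions chosen for illustration; they are not taken from this document.

```python
class BitReader:
    """Read unsigned fields of arbitrary bit length, MSB first (assumed)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits):
        if nbits == 0:
            return None  # length zero: the field is absent
        if self.pos + nbits > len(self.data) * 8:
            raise ValueError("not enough data for field")
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_fields(data, field_lengths):
    """Parse fields in order, using lengths supplied by format parameters."""
    reader = BitReader(data)
    return {name: reader.read(n) for name, n in field_lengths.items()}


# Hypothetical configuration: two fields present, one configured absent.
lengths = {"au_size": 13, "au_index": 3, "cts_delta": 0}
fields = parse_fields(bytes([0x06, 0x41]), lengths)
print(fields)
```

The same octets would parse differently under another configuration, which is why the payload cannot be interpreted without the format parameters.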
2.9 Global structure of payload format 2.9 Global structure of payload format
The payload structure in [6] is described in terms derived from the The payload structure in RFC XXXX is described in terms derived from the
SL layer. In this document exactly the same structure is described SL layer. In this document exactly the same structure is described
in more general terms, so as to improve the readability for people in more general terms, so as to improve the readability for people
with no knowledge of the SL layer. So the payload structure described with no knowledge of the SL layer. So the payload structure described
below corresponds on bit level exactly to the payload structure below corresponds on bit level exactly to the payload structure
defined in [6]. defined in RFC XXXX.
The RTP payload following the RTP header, contains three byte aligned The RTP payload following the RTP header, contains three byte aligned
data sections, of which the first two MAY be empty. See figure 1. data sections, of which the first two MAY be empty. See figure 1.
+---------+-----------+-----------+---------------+ +---------+-----------+-----------+---------------+
| RTP | AU Header | Auxiliary | Access Unit | | RTP | AU Header | Auxiliary | Access Unit |
| Header | Section | Section | Data Section | | Header | Section | Section | Data Section |
+---------+-----------+-----------+---------------+ +---------+-----------+-----------+---------------+
<----------RTP Packet Payload-----------> <----------RTP Packet Payload----------->
skipping to change at line 305 skipping to change at line 320
The first data section is the AU (Access Unit) Header Section, that The first data section is the AU (Access Unit) Header Section, that
contains one or more AU-headers; however, each AU-header MAY be empty, contains one or more AU-headers; however, each AU-header MAY be empty,
in which case the entire AU Header Section is empty. The second in which case the entire AU Header Section is empty. The second
section is the Auxiliary Section, containing auxiliary data; also section is the Auxiliary Section, containing auxiliary data; also
this section MAY be configured empty. The third section is the Access this section MAY be configured empty. The third section is the Access
Unit Data Section, containing either a single fragment of one Access Unit Data Section, containing either a single fragment of one Access
Unit or one or more complete Access Units. The Access Unit Data Unit or one or more complete Access Units. The Access Unit Data
Section is never empty. Section is never empty.
When compared to the terms used in [6], the AU Header Section exactly When compared to the terms used in RFC XXXX, the AU Header Section
corresponds to the MSLHSection, the Auxiliary Section to the exactly corresponds to the Payload Header Section, the Auxiliary
RSLHSection, and the Access Unit Data Section to the SLPPSection. Section to the RSLH Section, and the Access Unit Data Section to the
Payload Section.
2.10 Modes to transport MPEG-4 streams 2.10 Modes to transport MPEG-4 streams
While it is possible to build fully configurable receivers capable of While it is possible to build fully configurable receivers capable of
receiving any MPEG-4 stream, this specification also allows for the receiving any MPEG-4 stream, this specification also allows for the
design of simplified, but dedicated receivers, that are capable, for design of simplified, but dedicated receivers, that are capable, for
example, of receiving only one type of MPEG-4 stream. This is achieved by example, of receiving only one type of MPEG-4 stream. This is achieved by
requiring that specific modes be defined for using this specification. requiring that specific modes be defined for using this specification.
Each mode defines how to transport specific MPEG-4 streams, for example Each mode defines how to transport specific MPEG-4 streams, for example
by defining suitable constraints or payload configurations. Modes can by defining suitable constraints or payload configurations. Modes can
skipping to change at line 332 skipping to change at line 348
important for receivers that are only capable of decoding a particular important for receivers that are only capable of decoding a particular
mode. Such receivers need to determine whether that particular mode is mode. Such receivers need to determine whether that particular mode is
applied, so as to avoid problems with processing of payloads that are applied, so as to avoid problems with processing of payloads that are
beyond the capabilities of the receiver. beyond the capabilities of the receiver.
In this internet draft only modes are defined for transport of MPEG-4 In this internet draft only modes are defined for transport of MPEG-4
CELP and AAC streams. However, in future new RFCs are expected to CELP and AAC streams. However, in future new RFCs are expected to
specify additional modes of using this specification for transport of specify additional modes of using this specification for transport of
other MPEG-4 streams. other MPEG-4 streams.
2.11 Alignment with "RFC-generic" and RFC 3016 2.11 Alignment with RFC XXXX and RFC 3016
This document defines a subset of the "RTP payload format for MPEG-4 This document defines a subset of RFC XXXX. The main characteristic
streams" [6]. The main characteristic of this subset is that each RTP of this subset is that each RTP payload is only allowed to contain either
payload is only allowed to contain either a single fragment of one a single fragment of one Access Unit or one or more complete Access Units.
Access Unit or one or more complete Access Units. Obviously, RTP Obviously, RTP payloads that apply this subset in conformance with this
payloads that apply this subset in conformance with this document document conform also to RFC XXXX. Receivers that comply with RFC XXXX
conform also to [6]. Receivers that comply with [6] are able to decode are able to decode MPEG-4 streams carried in compliance with this
MPEG-4 streams carried in compliance with this document. document.
Receivers designed to only comply with this document may not be able to Receivers designed to only comply with this document may not be able to
decode an RTP payload that conforms to [6] but not to this document. decode an RTP payload that conforms to RFC XXXX but not to this document.
Such receivers may also not be capable of exploiting some of the features Such receivers may also not be capable of exploiting some of the features
of the SL layer supported in [6], such as knowledge of AU-start, random of the SL layer supported in RFC XXXX, such as knowledge of AU-start,
access information and other information carried in the SL header, but random access information and other information carried in the SL header,
not described in this document. but not described in this document.
Furthermore, this payload can be configured to be identical to the Furthermore, this payload can be configured to be identical to the
payload format defined in RFC 3016 for the MPEG-4 video configurations payload format defined in RFC 3016 [5] for the MPEG-4 video configurations
recommended in RFC 3016. Hence, receivers that comply with RFC 3016 recommended in RFC 3016. Hence, receivers that comply with RFC 3016
can decode such an RTP payload. Vice versa, receivers that comply with the can decode such an RTP payload. Vice versa, receivers that comply with the
specification in this document SHOULD be able to decode payloads, names specification in this document SHOULD be able to decode payloads, names
and parameters defined for MPEG-4 video in RFC 3016. and parameters defined for MPEG-4 video in RFC 3016.
For interoperability reasons, applications that transport MPEG-4 video For interoperability reasons, applications that transport MPEG-4 video
over RTP SHOULD use the payload format and associated names and over RTP SHOULD use the payload format and associated names and
parameters defined in RFC 3016 if the functionality provided by RFC 3016 parameters defined in RFC 3016 if the functionality provided by RFC 3016
can meet the requirements of that application. can meet the requirements of that application.
skipping to change at line 386 skipping to change at line 402
the M bit is always set to 1, except when the packet carries a the M bit is always set to 1, except when the packet carries a
single fragment of an Access Unit that is not the last one. single fragment of an Access Unit that is not the last one.
Extension (X) bit: Defined by the RTP profile used. Extension (X) bit: Defined by the RTP profile used.
Sequence Number: The RTP sequence number SHOULD be generated by the Sequence Number: The RTP sequence number SHOULD be generated by the
sender with a constant random offset. sender with a constant random offset.
Timestamp: Indicates the sampling instant of the first AU contained Timestamp: Indicates the sampling instant of the first AU contained
in the RTP payload. This sampling instant is equivalent to the CTS in the RTP payload. This sampling instant is equivalent to the CTS
in the MPEG-4 time domain. The clock rate of the RTP time stamp MAY in the MPEG-4 time domain. The clock rate of the RTP time stamp MUST
be expressed as part of the RTPMAP. If an audio or video stream with be expressed as part of the RTPMAP. If an audio or video stream with
a fixed frame rate is transported, the rate SHOULD be set to the same a fixed frame rate is transported, the rate SHOULD be set to the same
value as the sampling frequency of the audio or video frames (number value as the sampling frequency of the audio or video frames (number
of samples per second). of samples per second).
In all cases, the sender SHALL make sure that RTP time stamps In all cases, the sender SHALL make sure that RTP time stamps
are identical only if the RTP time stamp refers to fragments of the are identical only if the RTP time stamp refers to fragments of the
same Access Unit. same Access Unit.
According to RFC 1889 [2] (section 5.1), RTP timestamps are According to RFC 1889 [2] (section 5.1), RTP timestamps are
recommended to start at a random value for security reasons. However, recommended to start at a random value for security reasons. However,
then a receiver is, in the general case, not able to reconstruct the then a receiver is, in the general case, not able to reconstruct the
original MPEG Time Stamps, which creates problems for applications original MPEG Time Stamps, which creates problems for applications
where streams from multiple sources are to be synchronized. Therefore where streams from multiple sources are to be synchronized. To enable
the usage of a random offset SHOULD be avoided. synchronisation in such cases, for example between one stream from
local storage and another from an RTP streaming server, the applied
random offset MUST be provided out of band. Methods to convey the
applied random offset value are beyond the scope of this
specification.
SSRC: set as described in RFC1889 [2]. SSRC: set as described in RFC1889 [2].
CC and CSRC fields are used as described in RFC 1889 [2]. CC and CSRC fields are used as described in RFC 1889 [2].
RTCP SHOULD be used as defined in RFC 1889 [2]. RTCP SHOULD be used as defined in RFC 1889 [2].
3.2 RTP Payload Structure 3.2 RTP Payload Structure
As already noted in section 2.9 of this document, this document uses As already noted in section 2.9 of this document, this document uses
more general names to describe exactly the same payload structure as more general names to describe exactly the same payload structure as
defined in [6]. For mapping between section names in [6] and in this defined in RFC XXXX. For mapping between section names in RFC XXXX and
document see section 2.9. in this document see section 2.9.
3.2.1 The AU Header Section 3.2.1 The AU Header Section
When present, the AU Header Section consists of the AU-header-length When present, the AU Header Section consists of the AU-header-length
field, followed by a number of AU-headers. See figure 2. field, followed by a number of AU-headers. See figure 2.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
|AU-headers-length|AU-header|AU-header| |AU-header|padding| |AU-headers-length|AU-header|AU-header| |AU-header|padding|
| | (1) | (2) | | (n) | bits | | | (1) | (2) | | (n) | bits |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
Figure 2: The AU Header Section Figure 2: The AU Header Section
The AU-headers are configured using format parameters and MAY be empty. The AU-headers are configured using format parameters and MAY be empty.
If the AU-header is configured empty, the AU-headers-length field If the AU-header is configured empty, the AU-headers-length field
SHALL NOT be present and consequently the AU Header Section is empty. SHALL NOT be present and consequently the AU Header Section is empty.
If the AU-header is not configured empty, then the AU-headers-length If the AU-header is not configured empty, then the AU-headers-length
is a two byte field that specifies the length in bits of the is a two octet field that specifies the length in bits of the
immediately following AU-headers. immediately following AU-headers.
Each AU-header is associated with a single Access Unit (fragment) Each AU-header is associated with a single Access Unit (fragment)
contained in the Access Unit Data Section in the same RTP packet. For contained in the Access Unit Data Section in the same RTP packet. For
each contained Access Unit (fragment) there is exactly one AU-header. each contained Access Unit (fragment) there is exactly one AU-header.
Within the AU Header Section, the AU-headers are bit-wise concatenated Within the AU Header Section, the AU-headers are bit-wise concatenated
in the order in which the Access Units are contained in the Access in the order in which the Access Units are contained in the Access
Unit Data Section. Hence, the n-th AU-header refers to the n-th AU Unit Data Section. Hence, the n-th AU-header refers to the n-th AU
(fragment). If the concatenated AU-headers consume a non-integer (fragment). If the concatenated AU-headers consume a non-integer
number of bytes, up to 7 zero-padding bits MUST be inserted at the end number of octets, up to 7 zero-padding bits MUST be inserted at the end
in order to achieve byte-alignment of the AU Header Section. in order to achieve byte-alignment of the AU Header Section.
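The length arithmetic described above can be sketched in Python as follows. This is an illustration only, not text from either draft revision: the function name is hypothetical, and the sketch assumes that the AU-header is not configured empty (so the two octet AU-headers-length field is present) and that the field is carried in network byte order.

```python
import struct


def au_header_section_octets(payload: bytes) -> int:
    """Return the number of octets occupied by the AU Header Section.

    Assumption: the section starts with the two octet AU-headers-length
    field, which gives the length in bits of the AU-headers that follow.
    """
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    (headers_length_bits,) = struct.unpack("!H", payload[:2])
    # Up to 7 zero-padding bits follow the AU-headers, so round up to octets.
    headers_octets = (headers_length_bits + 7) // 8
    total = 2 + headers_octets
    if total > len(payload):
        raise ValueError("AU-headers-length exceeds payload size")
    return total
```

For instance, an AU-headers-length of 13 bits yields two octets of AU-headers (13 bits plus 3 padding bits), so the whole section occupies four octets.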
3.2.1.1 The AU-header 3.2.1.1 The AU-header
The AU-header contains the fields given in figure 3. The length in The AU-header contains the fields given in figure 3. The length in
bits of the above fields with the exception of the CTS-flag and bits of the above fields with the exception of the CTS-flag and
the DTS-flag fields is defined by format parameters; see section 4.1. the DTS-flag fields is defined by format parameters; see section 4.1.
If a format parameter has the default value of zero, then the If a format parameter has the default value of zero, then the
associated field is not present. associated field is not present.
skipping to change at line 470 skipping to change at line 490
| DTS-flag | | DTS-flag |
+---------------------------------------+ +---------------------------------------+
| DTS-delta | | DTS-delta |
+---------------------------------------+ +---------------------------------------+
Figure 3: The fields in the AU-header. If used, the AU-Index field Figure 3: The fields in the AU-header. If used, the AU-Index field
only occurs in the first AU-header within an AU Header only occurs in the first AU-header within an AU Header
Section; in any other AU-header the AU-Index-delta field Section; in any other AU-header the AU-Index-delta field
occurs instead. occurs instead.
AU-size: indicates the size in bytes of the associated Access Unit AU-size: indicates the size in octets of the associated Access Unit
in the Access Unit Data Section in the same RTP packet. When the in the Access Unit Data Section in the same RTP packet. When the
AU-size is associated with an AU fragment, the AU size indicates AU-size is associated with an AU fragment, the AU size indicates
the size of the entire AU and not the size of the fragment. This the size of the entire AU and not the size of the fragment. This
can be exploited to determine whether a packet contains an entire can be exploited to determine whether a packet contains an entire
AU or a fragment, which is particularly useful after losing a AU or a fragment, which is particularly useful after losing a
packet carrying the last fragment of an AU. packet carrying the last fragment of an AU.
AU-Index: indicates the serial number of the associated Access Unit AU-Index: indicates the serial number of the associated Access Unit
(fragment). For each (in time) consecutive AU or AU fragment, (fragment). For each (in time) consecutive AU or AU fragment,
the serial number is incremented by 1. When present, the the serial number is incremented by 1. When present, the
AU-Index field occurs in the first AU-header in the AU Header AU-Index field occurs in the first AU-header in the AU Header
Section, but MUST not occur in any subsequent (non-first) Section, but MUST NOT occur in any subsequent (non-first)
AU-header in that Section. To encode the serial number in any AU-header in that Section. To encode the serial number in any
such non-first AU-header, the AU-Index-delta field is used. such non-first AU-header, the AU-Index-delta field is used.
When each AU-Index field is coded with the value 0, the serial When each AU-Index field is coded with the value 0, the serial
number of the AU (fragment) is not specified and in that case number of the AU (fragment) is not specified and in that case
receivers MAY ignore the AU-Index field. receivers MAY ignore the AU-Index field.
AU-Index-delta: The AU-Index-delta field is an unsigned integer AU-Index-delta: The AU-Index-delta field is an unsigned integer
that specifies the serial number of the associated AU as the that specifies the serial number of the associated AU as the
difference with respect to the serial number of the previous difference with respect to the serial number of the previous
Access Unit. Hence, for the n-th (n>1) AU the serial number is Access Unit. Hence, for the n-th (n>1) AU the serial number is
skipping to change at line 543 skipping to change at line 563
DTS-flag, respectively. DTS-flag, respectively.
3.2.2 The Auxiliary Section 3.2.2 The Auxiliary Section
The Auxiliary Section consists of the auxiliary-data-size field The Auxiliary Section consists of the auxiliary-data-size field
followed by the auxiliary-data field. Receivers MAY (but are not followed by the auxiliary-data field. Receivers MAY (but are not
required to) parse the auxiliary-data field; to facilitate skipping required to) parse the auxiliary-data field; to facilitate skipping
of the auxiliary-data field by receivers, the auxiliary-data-size of the auxiliary-data field by receivers, the auxiliary-data-size
field indicates the length in bits of the auxiliary-data. If the field indicates the length in bits of the auxiliary-data. If the
concatenation of the auxiliary-data-size and the auxiliary-data concatenation of the auxiliary-data-size and the auxiliary-data
fields consume a non-integer number of bytes, up to 7 zero padding fields consume a non-integer number of octets, up to 7 zero padding
bits MUST be inserted immediately after the auxiliary data in order bits MUST be inserted immediately after the auxiliary data in order
to achieve byte-alignment. See figure 4. to achieve byte-alignment. See figure 4.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
| auxiliary-data-size | auxiliary-data |padding bits | | auxiliary-data-size | auxiliary-data |padding bits |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
Figure 4: The fields in the Auxiliary Section Figure 4: The fields in the Auxiliary Section
The length in bits of the auxiliary-data-size field is configurable The length in bits of the auxiliary-data-size field is configurable
by a format parameter; see section 4.1. The default length of zero by a format parameter; see section 4.1. The default length of zero
indicates that the entire Auxiliary Section is absent. indicates that the entire Auxiliary Section is absent.
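A receiver that does not parse the auxiliary-data still has to step over it. The following Python sketch shows one way to compute the number of octets to skip. It is illustrative only: the function and argument names are hypothetical, `size_field_bits` stands for the configured length of the auxiliary-data-size field, and the sketch assumes that the field is read most significant bit first.

```python
def auxiliary_section_octets(section: bytes, size_field_bits: int) -> int:
    """Return the number of octets occupied by the Auxiliary Section.

    size_field_bits is the configured length in bits of the
    auxiliary-data-size field; zero means the section is absent.
    """
    if size_field_bits == 0:
        return 0
    needed = (size_field_bits + 7) // 8
    if len(section) < needed:
        raise ValueError("section too short for auxiliary-data-size")
    # Take the leading size_field_bits bits as an unsigned integer.
    prefix = int.from_bytes(section[:needed], "big")
    aux_bits = prefix >> (needed * 8 - size_field_bits)
    # Size field plus auxiliary-data, rounded up to whole octets (padding).
    return (size_field_bits + aux_bits + 7) // 8
```

With an 8-bit size field carrying the value 12, the section spans 20 bits and therefore three octets after padding.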
auxiliary-data-size; specifies the length in bits of the immediately auxiliary-data-size; specifies the length in bits of the immediately
following auxiliary-data field; following auxiliary-data field;
auxiliary-data; the auxiliary-data field contains the Remaining SL auxiliary-data; the auxiliary-data field contains the Remaining SL
headers (RSLHs) as defined in [6]. headers (RSLHs) as defined in RFC XXXX.
3.2.3 The Access Unit Data Section 3.2.3 The Access Unit Data Section
The Access Unit Data Section contains an integer number of complete The Access Unit Data Section contains an integer number of complete
Access Units or a single fragment of one AU. The Access Unit Data Access Units or a single fragment of one AU. The Access Unit Data
Section is never empty. If data of more than one Access Unit is Section is never empty. If data of more than one Access Unit is
contained, then the AUs are concatenated into a contiguous string of contained, then the AUs are concatenated into a contiguous string of
bytes. See figure 5. The AUs inside the Access Unit Data Section octets. See figure 5. The AUs inside the Access Unit Data Section
MUST be in decoding order. MUST be in decoding order.
The size and number of Access Units SHOULD be adjusted such that the The size and number of Access Units SHOULD be adjusted such that the
resulting RTP packet is not larger than the path-MTU. To handle resulting RTP packet is not larger than the path-MTU. To handle
larger packets, this payload format relies on lower layers for larger packets, this payload format relies on lower layers for
fragmentation, which may not be desirable. fragmentation, which may not be desirable.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|AU(1) | |AU(1) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- |
skipping to change at line 661 skipping to change at line 681
receivers may 'flush' all Access Units from the interleave buffer receivers may 'flush' all Access Units from the interleave buffer
which have a time-stamp strictly less than the time-stamp of the which have a time-stamp strictly less than the time-stamp of the
arriving packet. Similarly the first Access Unit of every arriving arriving packet. Similarly the first Access Unit of every arriving
packet can always be flushed (as no following packet can provide an packet can always be flushed (as no following packet can provide an
earlier Access Unit), and any Access Units which are consecutive with earlier Access Unit), and any Access Units which are consecutive with
it which have already been received. Access Units should also be it which have already been received. Access Units should also be
flushed in time to be played; this can be important if there is loss flushed in time to be played; this can be important if there is loss
before end-of-stream, before a silence interval, or before a large before end-of-stream, before a silence interval, or before a large
drop-out. drop-out.
3.2.3.2.1 Constraints for interleaving 3.2.3.3 Constraints for interleaving
The size of the packets should be suitably chosen to be appropriate The size of the packets should be suitably chosen to be appropriate
to both the path MTU and the duration and capacity of the receiver's to both the path MTU and the duration and capacity of the receiver's
de-interleave buffer. The maximum packet size for a session should be de-interleave buffer. The maximum packet size for a session should be
chosen not to exceed the path MTU. chosen not to exceed the path MTU.
In order to control receiver latency and mitigate the effects of loss, In order to control receiver latency and mitigate the effects of loss,
there are profile-based limits on the size of the packet. This is there are profile-based limits on the size of the packet. This is
expressed as a duration: it is calculated from the duration of the expressed as a duration: it is calculated from the duration of the
Access Units contained within a packet. It is NOT the difference in Access Units contained within a packet. It is NOT the difference in
skipping to change at line 717 skipping to change at line 737
Four modes are defined for transport of MPEG-4 CELP and AAC streams. Four modes are defined for transport of MPEG-4 CELP and AAC streams.
In each of these modes, the same requirements apply for the rtpmap In each of these modes, the same requirements apply for the rtpmap
attributes. The general form of an rtpmap attribute is: attributes. The general form of an rtpmap attribute is:
a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding
parameters>] parameters>]
For audio streams, <encoding parameters> specifies the number of For audio streams, <encoding parameters> specifies the number of
audio channels. This parameter may be omitted if the number of audio channels. This parameter may be omitted if the number of
channels is one, provided no additional parameters are needed. channels is one, provided no additional parameters are needed.
In all four modes, the following attributes are REQUIRED: In all four modes, the following attributes are REQUIRED:
a) The encoding name a) The encoding name
b) The RTP clock rate MUST be expressed. It is recommended that this b) The RTP clock rate MUST be expressed. It is RECOMMENDED that this
be the sampling rate of the audio, to give sample-accurate timing. be the sampling rate of the audio, to give sample-accurate timing.
However, other rates MAY be used (e.g. 90 kHz). However, other rates MAY be used (e.g. 90 kHz).
c) The number of audio channels MUST be specified, for example as 2 c) The number of audio channels MUST be specified, for example as 2
for stereo material (see RFC 2327) and MAY be specified as 1 for for stereo material (see RFC 2327) and MAY be specified as 1 for
mono material; 1 is the default. mono material; 1 is the default.
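The rtpmap requirements listed above can be illustrated with a small Python parser. This is a sketch under stated assumptions rather than part of either draft: the function name and the returned dictionary keys are hypothetical, a single space is assumed between the payload type and the encoding name (as in the SDP examples in this document), and the encoding parameters are interpreted as the audio channel count with a default of 1.

```python
import re

# payload type, encoding name, clock rate, optional encoding parameters
RTPMAP_RE = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)(?:/(\d+))?$")


def parse_audio_rtpmap(line: str) -> dict:
    """Parse an audio rtpmap attribute line into its components."""
    match = RTPMAP_RE.match(line.strip())
    if match is None:
        raise ValueError("not a valid rtpmap attribute: %r" % line)
    payload_type, name, rate, channels = match.groups()
    return {
        "payload_type": int(payload_type),
        "encoding_name": name,
        "clock_rate": int(rate),
        # An omitted channel count means one channel (mono).
        "channels": int(channels) if channels else 1,
    }
```

Applied to "a=rtpmap:96 mpeg4-generic/44100/2", the sketch reports a clock rate of 44100 and two channels.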
3.3.3 Constant bit-rate CELP. 3.3.3 Constant bit-rate CELP.
This mode is signalled by mode=CELP-cbr. In this mode one or more This mode is signalled by mode=CELP-cbr. In this mode one or more
fixed size CELP frames can be transported in one RTP packet; there is fixed size CELP frames can be transported in one RTP packet; there is
skipping to change at line 755 skipping to change at line 775
This mode is signalled by mode=CELP-vbr. With this mode in one RTP This mode is signalled by mode=CELP-vbr. With this mode in one RTP
packet one or more variable size CELP frames can be transported with packet one or more variable size CELP frames can be transported with
optional interleaving. As the largest possible AU size in this mode optional interleaving. As the largest possible AU size in this mode
is greater than the maximum CELP frame size, there is no support for is greater than the maximum CELP frame size, there is no support for
fragmentation of CELP frames. fragmentation of CELP frames.
In this mode the RTP payload consists of the AU Header Section, In this mode the RTP payload consists of the AU Header Section,
followed by one or more concatenated CELP frames. The Auxiliary Section followed by one or more concatenated CELP frames. The Auxiliary Section
is empty. For each CELP frame contained in the payload there is a one is empty. For each CELP frame contained in the payload there is a one
byte AU-header in the AU Header Section to provide : octet AU-header in the AU Header Section to provide :
(a) the size of each CELP frame in the payload and (a) the size of each CELP frame in the payload and
(b) index information for computing the sequence (and hence timing) of (b) index information for computing the sequence (and hence timing) of
each CELP frame. each CELP frame.
Transport of CELP frames requires that the AU-size field is coded with Transport of CELP frames requires that the AU-size field is coded with
6 bits. In this mode therefore 6 bits are allocated to the AU-size 6 bits. In this mode therefore 6 bits are allocated to the AU-size
field, and 2 bits to the AU-Index(-delta) field. Each AU-Index field field, and 2 bits to the AU-Index(-delta) field. Each AU-Index field
MUST be coded with the value 0. In the AU Header Section, the MUST be coded with the value 0. In the AU Header Section, the
concatenated AU-headers are preceded by the 16-bit AU-headers-length concatenated AU-headers are preceded by the 16-bit AU-headers-length
field, as specified in 3.2.1. field, as specified in 3.2.1.
skipping to change at line 780 skipping to change at line 800
than 0), also the parameter Profile MUST be present. than 0), also the parameter Profile MUST be present.
Example : Example :
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/44100/2 a=rtpmap:96 mpeg4-generic/44100/2
a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config= a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config=
AudioSpecificConfig(); SizeLength=6; IndexLength=2; IndexDeltaLength=2; AudioSpecificConfig(); SizeLength=6; IndexLength=2; IndexDeltaLength=2;
Profile=1 Profile=1
The AudioSpecificConfig() specifies that audio stream type is CELP. The AudioSpecificConfig() specifies that the audio stream type is CELP.
3.3.5 Low bit-rate AAC 3.3.5 Low bit-rate AAC
This mode is signalled by mode=AAC-lbr. This mode supports transport of one This mode is signalled by mode=AAC-lbr. This mode supports transport of one
or more variable size AAC frames with optional support for interleaving or more variable size AAC frames with optional support for interleaving
and fragmenting. The maximum size of an AAC frame (fragment) in this and fragmenting. The maximum size of an AAC frame (fragment) in this
mode is 63 bytes. mode is 63 octets.
The payload configuration in this mode is the same as in the variable The payload configuration in this mode is the same as in the variable
bit-rate CELP mode as defined in 3.3.4. The RTP payload consists of the bit-rate CELP mode as defined in 3.3.4. The RTP payload consists of the
AU Header Section, followed by concatenated AAC frames. The Auxiliary AU Header Section, followed by concatenated AAC frames. The Auxiliary
Section is empty. For each AAC frame contained in the payload the one Section is empty. For each AAC frame contained in the payload the one
byte AU-header provides : octet AU-header provides :
(a) the size of each AAC frame in the payload and (a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing) of (b) index information for computing the sequence (and hence timing) of
each AAC frame. each AAC frame.
In the AU-header, the AU-size is coded with 6 bits and the AU-Index(-delta) In the AU-header, the AU-size is coded with 6 bits and the AU-Index(-delta)
with 2 bits; the AU-Index field MUST have the value 0 in each AU-header. with 2 bits; the AU-Index field MUST have the value 0 in each AU-header.
In the AU-header Section, the concatenated AU-headers are preceded by In the AU-header Section, the concatenated AU-headers are preceded by
the 16-bit AU-headers-length field, as specified in 3.2.1. the 16-bit AU-headers-length field, as specified in 3.2.1.
Next to the required format parameters, the following parameters MUST Next to the required format parameters, the following parameters MUST
be present: be present:
skipping to change at line 816 skipping to change at line 836
than 0), also the parameter Profile MUST be present. than 0), also the parameter Profile MUST be present.
Example : Example :
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/44100/2 a=rtpmap:96 mpeg4-generic/44100/2
a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config= a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config=
AudioSpecificConfig(); SizeLength=6; IndexLength=2; IndexDeltaLength=2; AudioSpecificConfig(); SizeLength=6; IndexLength=2; IndexDeltaLength=2;
Profile=1 Profile=1
The AudioSpecificConfig() specifies that audio stream type is AAC. The AudioSpecificConfig() specifies that the audio stream type is AAC.
3.3.6 High bit-rate AAC 3.3.6 High bit-rate AAC
This mode is signalled by mode=AAC-hbr. This mode supports transport This mode is signalled by mode=AAC-hbr. This mode supports transport
of one or more large variable size AAC frames in one RTP packet with of one or more large variable size AAC frames in one RTP packet with
optional support for interleaving and fragmenting. The maximum size of optional support for interleaving and fragmenting. The maximum size of
an AAC frame (fragment) in this mode is 8191 bytes. an AAC frame (fragment) in this mode is 8191 bytes.
In this mode the RTP payload consists of the AU Header Section, In this mode the RTP payload consists of the AU Header Section,
followed by one or more concatenated AAC frames. The Auxiliary Section followed by one or more concatenated AAC frames. The Auxiliary Section
is empty. For each AAC frame contained in the payload there is an is empty. For each AAC frame contained in the payload there is an
AU-header in the AU Header Section to provide : AU-header in the AU Header Section to provide :
(a) the size of each AAC frame in the payload and (a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing) of (b) index information for computing the sequence (and hence timing) of
each AAC frame. each AAC frame.
To code the maximum size of an AAC frame requires 13 bits. Therefore in To code the maximum size of an AAC frame requires 13 bits. Therefore in
this configuration 13 bits are allocated to the AU-size, and 3 bits this configuration 13 bits are allocated to the AU-size, and 3 bits
to the AU-Index(-delta) field. Thus each AU-header has a size of 2 to the AU-Index(-delta) field. Thus each AU-header has a size of 2
bytes. Each AU-Index field MUST be coded with the value 0. In the octets. Each AU-Index field MUST be coded with the value 0. In the
AU Header Section, the concatenated AU-headers are preceded by the AU Header Section, the concatenated AU-headers are preceded by the
16-bit AU-headers-length field, as specified in 3.2.1. 16-bit AU-headers-length field, as specified in 3.2.1.
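To make the 13 + 3 bit layout concrete, the following Python sketch splits an AAC-hbr payload into its Access Units. It is an illustration and not a normative procedure: the function name and the tuple layout are hypothetical, network byte order is assumed, and the AU-size is assumed to occupy the 13 most significant bits of each two octet AU-header, with the AU-Index(-delta) in the remaining 3 bits.

```python
import struct


def split_aac_hbr_payload(payload: bytes) -> list:
    """Return a list of (index_field, data, is_fragment) tuples."""
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    (headers_bits,) = struct.unpack("!H", payload[:2])
    if headers_bits == 0 or headers_bits % 16:
        raise ValueError("AU-headers-length is not a multiple of 16 bits")
    count = headers_bits // 16
    data_start = 2 + 2 * count
    if data_start > len(payload):
        raise ValueError("AU-headers exceed payload size")
    headers = struct.unpack("!%dH" % count, payload[2:data_start])
    data = payload[data_start:]
    # AU-size always gives the size of the entire AU, so a single header
    # announcing more data than is present marks a fragment.
    if count == 1 and (headers[0] >> 3) > len(data):
        return [(headers[0] & 0x7, data, True)]
    units, pos = [], 0
    for header in headers:
        size = header >> 3          # upper 13 bits: AU-size in octets
        if pos + size > len(data):
            raise ValueError("AU-size exceeds remaining data")
        units.append((header & 0x7, data[pos:pos + size], False))
        pos += size
    return units
```

The fragment test relies on the rule in section 3.2.1.1 that the AU-size of a fragment reports the size of the whole AU.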
Next to the required format parameters, the following parameters MUST Next to the required format parameters, the following parameters MUST
be present: be present:
SizeLength, IndexLength, and IndexDeltaLength. SizeLength, IndexLength, and IndexDeltaLength.
When interleaving is applied (AU-Index-delta coded with a value larger When interleaving is applied (AU-Index-delta coded with a value larger
than 0), also the parameter Profile MUST be present. than 0), also the parameter Profile MUST be present.
Example : Example :
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/44100/2 a=rtpmap:96 mpeg4-generic/44100/2
a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr; config= a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr; config=
AudioSpecificConfig(); SizeLength=13; IndexLength=3; IndexDeltaLength=3; AudioSpecificConfig(); SizeLength=13; IndexLength=3; IndexDeltaLength=3;
Profile=1 Profile=1
The AudioSpecificConfig() specifies that the audio stream type is AAC. The AudioSpecificConfig() specifies that the audio stream type is AAC.
4. Types and names 4. IANA considerations
This section describes the MIME types and names associated with this This payload format uses the same MIME types and names as defined
payload format. in RFC XXXX. However, some additional format parameters are defined.
Depending on the required payload configuration, format parameters may Depending on the required payload configuration, format parameters may
need to be available to the receiver. This is done using the parameters need to be available to the receiver. This is done using the parameters
described in the next section. The absence of any of these parameters described in the next section. The absence of any of these parameters
is equivalent to the associated field set to its default value, which is equivalent to the associated field set to its default value, which
is always zero. The absence of any such parameters resolves into a is always zero. The absence of any such parameters resolves into a
default "basic" configuration. default "basic" configuration.
In the MPEG-4 framework the SL stream configuration information is
carried using the Object Descriptor. When such information is present
both in an Object Descriptor and as a parameter of this payload format
it MUST be exactly the same.
4.1 MIME types
This specification uses exactly the same MIME types as [6], and hence
no further MIME type registration is required. [6] uses the MIME
media type names: "video" or "audio" or "application".
"video" SHOULD be used for any MPEG Video stream or any MPEG-4
System (ISO/IEC 14496-1) stream that conveys information needed for
an audio-visual presentation.
"audio" SHOULD be used for any MPEG Audio streams and any MPEG-4
System (ISO/IEC 14496-1) stream that conveys information needed for
an audio-only presentation.
"application" SHOULD be used for MPEG-4 Systems streams
(ISO/IEC14496-1) that serve other purposes than audio/visual
presentation, e.g. in some cases when MPEG-J streams are transmitted.
MIME subtype name: mpeg4-generic MIME subtype name: mpeg4-generic
Required parameters: Required parameters:
StreamType: StreamType:
The integer value that indicates the type of MPEG-4 stream that is The integer value that indicates the type of MPEG-4 stream that is
carried; its coding corresponds to the values of the streamType as carried; its coding corresponds to the values of the streamType as
defined for the DecoderConfigDescriptor in ISO/IEC 14496-1. defined for the DecoderConfigDescriptor in ISO/IEC 14496-1.
skipping to change at line 933 skipping to change at line 930
14496-3. 14496-3.
For visual streams, config is the MPEG-4 Visual configuration For visual streams, config is the MPEG-4 Visual configuration
information, as defined in subclause 6.2.1 Start codes of information, as defined in subclause 6.2.1 Start codes of
ISO/IEC14496-2. The configuration information indicated by this ISO/IEC14496-2. The configuration information indicated by this
parameter SHALL be the same as the configuration information in the parameter SHALL be the same as the configuration information in the
corresponding MPEG-4 Visual stream, except for first-half-vbv- corresponding MPEG-4 Visual stream, except for first-half-vbv-
occupancy and latter-half-vbv-occupancy, if it exists, which may occupancy and latter-half-vbv-occupancy, if it exists, which may
vary in the repeated configuration information inside an MPEG-4 vary in the repeated configuration information inside an MPEG-4
Visual stream (See 6.2.1 Start codes of ISO/IEC14496-2). Visual stream (See 6.2.1 Start codes of ISO/IEC14496-2).
Optional parameters:
Mode: Mode:
The mode in which this specification is used. The following modes The mode in which this specification is used. The following modes
can be signalled : can be signalled :
mode=CELP-cbr, mode=CELP-cbr,
mode=CELP-vbr, mode=CELP-vbr,
mode=AAC-lbr and mode=AAC-lbr and
mode=AAC-hbr. mode=AAC-hbr.
Other modes are expected to be defined in future RFCs. When defining Other modes are expected to be defined in future RFCs. When defining
a new mode care MUST be taken that an implementation of all features a new mode care MUST be taken that an implementation of all features
of this specification can decode the payload format corresponding to of this specification can decode the payload format corresponding to
this new mode. For this reason a mode MUST NOT specify new default this new mode. For this reason a mode MUST NOT specify new default
values for MIME parameters; in particular, MIME parameters MUST be values for MIME parameters; in particular, MIME parameters MUST be
present (unless they have the default value), even if it is redundant present (unless they have the default value), even if it is redundant
in case the mode assigns fixed values. A mode may define additionally in case the mode assigns fixed values. A mode may define additionally
that some MIME parameters are required instead of optional, that some that some MIME parameters are required instead of optional, that some
MIME parameters have fixed values (or ranges), and that there are MIME parameters have fixed values (or ranges), and that there are
rules restricting the usage. rules restricting the usage.
Optional parameters:
ConstantSize: ConstantSize:
The constant size in bytes of each Access Unit for this stream. The constant size in octets of each Access Unit for this stream.
Simultaneous presence of ConstantSize and the SizeLength Simultaneous presence of ConstantSize and the SizeLength
parameters is not permitted. parameters is not permitted.
SizeLength: SizeLength:
The number of bits on which the AU-size field is encoded in the The number of bits on which the AU-size field is encoded in the
AU-header. Simultaneous presence of SizeLength and the ConstantSize AU-header. Simultaneous presence of SizeLength and the ConstantSize
parameter is not permitted. parameter is not permitted.
IndexLength: IndexLength:
The number of bits on which the AU-Index is encoded in the first The number of bits on which the AU-Index is encoded in the first
skipping to change at line 1064 skipping to change at line 1061
Multiple parameters SHOULD be expressed as a MIME media type string, Multiple parameters SHOULD be expressed as a MIME media type string,
in the form of a semicolon-separated list of parameter=value pairs in the form of a semicolon-separated list of parameter=value pairs
(for parameter usage examples see Appendix A). (for parameter usage examples see Appendix A).
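As an illustration only, the semicolon-separated list described above could be assembled as in the following sketch. The function name and the parameter values are hypothetical and are not taken from this specification. The check mirrors the rule that ConstantSize and SizeLength are not permitted simultaneously.

```python
def format_mime_params(params):
    """Join parameters into a semicolon-separated parameter=value list.

    `params` is a dict mapping parameter names to values (illustrative
    helper, not defined by the draft).
    """
    names = {name.lower() for name in params}
    # ConstantSize and SizeLength must not be present at the same time.
    if "constantsize" in names and "sizelength" in names:
        raise ValueError("ConstantSize and SizeLength must not both be present")
    return "; ".join(f"{name}={value}" for name, value in params.items())


# Example values are assumptions chosen for illustration only.
example = format_mime_params({"mode": "AAC-hbr", "SizeLength": 13, "IndexLength": 3})
```

The helper rejects the forbidden combination up front, so an invalid parameter string is never produced.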
4.3 Usage of SDP 4.3 Usage of SDP
4.3.1 The a=fmtp keyword 4.3.1 The a=fmtp keyword
It is assumed that one typical way to transport the above-described It is assumed that one typical way to transport the above-described
parameters associated with this payload format is via an SDP message parameters associated with this payload format is via an SDP message
for example transported to the client in reply to an RTSP DESCRIBE or [7] for example transported to the client in reply to an RTSP DESCRIBE
via SAP. In that case the (a=fmtp) keyword MUST be used as described or via SAP. In that case the (a=fmtp) keyword MUST be used as
in RFC 2327 [8, section 6]. The syntax is then: described in RFC 2327 [7, section 6]. The syntax is then:
a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>] a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]
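A receiver might split such an a=fmtp line into its format and parameters roughly as sketched below. This is a minimal sketch and not a complete SDP parser. The function name, the payload type 96, and the parameter values in the usage line are assumptions made for illustration.

```python
def parse_fmtp(line):
    """Split 'a=fmtp:<format> name=value[; name=value]' into (format, params)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    # The format is separated from the parameter list by the first space.
    fmt, _, rest = line[len(prefix):].strip().partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"parameter without a value: {item!r}")
        params[name.strip()] = value.strip()
    return fmt, params


# Hypothetical input; 96 is an arbitrary dynamic payload type.
fmt, params = parse_fmtp("a=fmtp:96 mode=AAC-hbr; SizeLength=13; IndexLength=3")
```

Values are kept as strings here. Converting them to integers is left to the caller, since which parameters are numeric depends on the parameter definitions.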
5. Security Considerations 5. Security Considerations
RTP packets using the payload format defined in this specification No additional security considerations apply beyond those discussed in
are subject to the security considerations discussed in the RTP RFC 1889 and RFC XXXX.
specification [2]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed on the compressed data so there is no conflict between the
two operations. The packet processing complexity of this payload
type (i.e. excluding media data processing) does not exhibit any
significant non-uniformity in the receiver side to cause a denial-
of-service threat.
However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers which
might compromise the functionality of the receiver or even crash it.
This is especially true for end-to-end systems like MPEG where the
buffer models are precisely defined.
MPEG-4 Systems supports stream types including commands that are
executed on the terminal like OD commands, BIFS commands, etc. and
programmatic content like MPEG-J (Java(TM) Byte Code) and
ECMASCRIPT. It is possible to use one or more of the above in a
manner non-compliant to MPEG to crash or temporarily make the
receiver unavailable.
Authentication mechanisms can be used to validate the sender and
the data to prevent security problems due to non-compliant malignant
MPEG-4 streams.
A security model is defined in MPEG-4 Systems streams carrying MPEG-J 6. Acknowledgements
access units which comprises Java(TM) classes and objects. MPEG-J
defines a set of Java APIs and a secure execution model. MPEG-J
content can call this set of APIs and Java(TM) methods from a set of
Java packages supported in the receiver within the defined security
model. According to this security model, downloaded byte code is
forbidden to load libraries, define native methods, start programs,
read or write files, or read system properties.
Receivers can implement intelligent filters to validate the buffer This document evolved through several revisions thanks to contributions
requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J, from people from the ISMA forum, from the IETF AVT working group and
ECMAScript) commands in the streams. However, this can increase the the 4-on-IP ad-hoc group within MPEG. The authors wish to thank all
complexity significantly. involved people, and in particular Colin Perkins, Stephan Wenger and
Dorairaj V for their valuable comments and support.
6. References 7. References
[1] ISO/IEC International Standard 14496 (MPEG-4); "Information [1] ISO/IEC International Standard 14496 (MPEG-4); "Information
technology - Coding of audio-visual objects", January 2000 technology - Coding of audio-visual objects", January 2000
[2] Schulzrinne, Casner, Frederick, Jacobson RTP: A Transport [2] Schulzrinne, Casner, Frederick, Jacobson RTP: A Transport
Protocol for Real Time Applications RFC 1889, Internet Engineering Protocol for Real Time Applications RFC 1889, Internet Engineering
Task Force, January 1996. Task Force, January 1996.
[3] S. Bradner, Key words for use in RFCs to Indicate Requirement [3] S. Bradner, Key words for use in RFCs to Indicate Requirement
Levels, RFC 2119, March 1997. Levels, RFC 2119, March 1997.
skipping to change at line 1136 skipping to change at line 1102
[4] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, RTP payload [4] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, RTP payload
format for MPEG1/MPEG2 Video, RFC 2250, January 1998. format for MPEG1/MPEG2 Video, RFC 2250, January 1998.
[5] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP [5] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
payload format for MPEG-4 Audio/Visual streams, RFC 3016. payload format for MPEG-4 Audio/Visual streams, RFC 3016.
[6] Avaro, Basso, Casner, Civanlar, Gentric, Herpel, Lim, Perkins, [6] Avaro, Basso, Casner, Civanlar, Gentric, Herpel, Lim, Perkins,
van der Meer, RTP payload format for MPEG-4 streams, work in progress, van der Meer, RTP payload format for MPEG-4 streams, work in progress,
draft-gentric-avt-mpeg4-multiSL-01.txt, January 2001. draft-gentric-avt-mpeg4-multiSL-01.txt, January 2001.
[7] D. Singer, Y Lim, A Framework for the delivery of MPEG-4 over [7] Handley, Jacobson, SDP: Session Description Protocol, RFC 2327,
IP-based Protocols, work in progress, draft-singer-mpeg4-ip-
01.txt,October 2000.
[8] Handley, Jacobson, SDP: Session Description Protocol, RFC 2327,
Internet Engineering Task Force, April 1998. Internet Engineering Task Force, April 1998.
7. Author Addresses 7. Author Addresses
Jan van der Meer Jan van der Meer
Philips Digital Networks Philips Digital Networks
Cederlaan 4 Cederlaan 4
5600 JB Eindhoven 5600 JB Eindhoven
Netherlands Netherlands
Email : jan.vandermeer@philips.com Email : jan.vandermeer@philips.com