Internet Engineering Task Force                              Basso-AT&T
Internet Draft                                            Civanlar-AT&T
                                                        Gentric-Philips
                                                         Herpel-Thomson
                                                      Lifshitz-Optibase
                                                            Lim-mp4cast
                                                            Perkins-ISI
                                                   Van Der Meer-Philips
                                                           November 2001
                                                        Expires May 2002
Document: draft-ietf-avt-mpeg4-multisl-03.txt

                 RTP Payload Format for MPEG-4 Streams

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet- Drafts
   as reference material or to cite them other than as "work in
   progress."

   This specification is a product of the Audio/Video Transport working
   group within the Internet Engineering Task Force and ISO/IEC MPEG-4
   ad hoc group on MPEG-4 over Internet. Comments are solicited and
   should be addressed to the working group's mailing list at
   avt@ietf.org and/or the authors.

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This document contains a MIME type registration form that is
   intended to be taken as-is and therefore makes reference to this
   document, using the temporary placeholder: <self-reference-to-this>.

Abstract

   This document describes a payload format for transporting MPEG-4
   encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for
   the coding of natural and synthetic audio-visual data. Several
   services provided by RTP are beneficial for MPEG-4 encoded data
   transport over the Internet. Additionally, the use of RTP makes it
   possible to synchronize MPEG-4 data with other real-time data types.

Gentric et al.            Expires March 2002                         1
                RTP Payload Format for MPEG-4 Streams  September 2001

1. Introduction

   MPEG-4 is a recent standard from ISO/IEC for the coding of natural
   and synthetic audio-visual data in the form of audiovisual objects
   that are arranged into an audiovisual scene by means of a scene
   description [1][2][3][4]. This draft specifies an RTP [5] payload
   format for transporting MPEG-4 encoded data streams.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119 [6].

   The benefits of using RTP for MPEG-4 data stream transport include:

   i. Ability to synchronize MPEG-4 streams with other RTP payloads

   ii. Monitoring MPEG-4 delivery performance through RTCP

   iii. Combining MPEG-4 and other real-time data streams received from
   multiple end-systems into a set of consolidated streams through RTP
   mixers

   iv. Converting data types, etc. through the use of RTP translators.

1.1 Overview of MPEG-4 End-System Architecture

   Two types of terminals can use this specification. One case is a
   complete MPEG-4 terminal, i.e. a terminal implementing the MPEG-4
   system [1] specification and possibly also MPEG-4 video [2] and
   audio [3]. Another possibility is a terminal implementing only a
   part of this set of MPEG-4 specifications; one example is a terminal
   using MPEG-4 video [2] but not MPEG-4 systems, as in RFC3016.

   This document is structured so as to be understandable from both
   points of view (with or without MPEG-4 systems). The target is also
   that services deployed for one type of terminal can be adapted for
   the other type through a minor session description change, because
   the recorded streams are the same. Another key assumption is that
   the properties of streams of various types (video, audio, scene
   description) can be described with the same Elementary Stream model,
   so that this same payload format can transport any MPEG-4 stream.

1.1.1 The simplified MPEG-4 model

   In the simplified MPEG-4 model MPEG-4 systems [1] is not used.
   However the concept of Elementary Stream remains, i.e. both MPEG-4
   video [2] and MPEG-4 audio [3] describe how video and audio bit
   streams, respectively, are fragmented into pieces that are called
   Access Units. Each Access Unit has by definition a number of media
   independent basic properties:
   . composition time stamp
   . framing
   . possibly decoding time stamp

   Furthermore both the video [2] and audio [3] specifications also
   define how Access Units (AU) are themselves to be fragmented: in
   the spirit of Application Level Framing, AUs SHOULD be fragmented in
   such a way that decoders can resume processing packets immediately
   after a packet loss. In this case the signaling of Access Unit
   fragment boundaries is also required.

   In order to be understandable from this point of view this payload
   format is described in terms of Access Units (AU) and Access Unit
   fragments, without reference to media specific properties (but for a
   few exceptions).

1.1.2 The complete MPEG-4 model

   Fig. 1 below shows the layered architecture of a terminal, which
   implements the complete MPEG-4 systems model. The Compression Layer
   processes individual audio-visual media streams. The MPEG-4
   compression schemes are defined in the ISO/IEC specifications 14496-
   2 [2] and 14496-3 [3]. The compression schemes in MPEG-4 achieve
   efficient encoding over a bandwidth ranging from a few kbps to many
   Mbps. The audio-visual content compressed by this layer is organized
   into Elementary Streams (ESs).

   The MPEG-4 standard specifies MPEG-4 compliant streams. Within the
   constraint of this compliance the compression layer is unaware of a
   specific delivery technology, but it can be made to react to the
   characteristics of a particular delivery layer such as the path-MTU
   or loss characteristics. Also, some compressors can be designed to
   be delivery specific for implementation efficiency. In such cases
   the compressor may work in a non-optimal fashion with delivery
   technologies that are different than the one it is specifically
   designed to operate with.

   The hierarchical relations, location and properties of ESs in a
   presentation are described by a dynamic set of Object Descriptors
   (ODs). Each OD groups one or more ES Descriptors referring to a
   single content item (audio-visual object). Hence, multiple
   alternative or hierarchical representations of each content item are
   possible.

   ODs are themselves conveyed through one or more ESs. A complete set
   of ODs can be seen as an MPEG-4 resource or session description at a
   stream level. The resource description may itself be hierarchical,
   i.e. an ES conveying an OD may describe other ESs conveying other
   ODs.

   The session description is accompanied by a dynamic scene
   description, Binary Format for Scene (BIFS), again conveyed through
   one or more ESs. At this level, content is identified in terms of
   audio-visual objects. The spatio-temporal location of each object is
   defined by BIFS. The audio-visual content of those objects that are
   synthetic and static is described by BIFS also. Natural and
   animated synthetic objects may refer to an OD that points to one or
   more ESs that carry the coded representation of the object or its
   animation data.

   media aware        +-----------------------------------------+
   delivery unaware   |           COMPRESSION LAYER             |
   14496-2 Visual     |streams from as low as Kbps to multi-Mbps|
   14496-3 Audio      +-----------------------------------------+

                                                      Elementary
                                                      Stream
   ===================================================Interface
                                                      (ESI)
                     +-------------------------------------------+
   media and         |              SYNC LAYER                   |
   delivery unaware  | manages elementary streams, their synch-  |
   14496-1 Systems   | ronization and hierarchical relations     |
                     +-------------------------------------------+

                                                       DMIF
                                                       Application
   ====================================================Interface
                                                       (DAI)
                     +-------------------------------------------+
   delivery aware    |               DELIVERY LAYER              |
   media  unaware    |provides transparent access to and delivery|
   14496-6 DMIF      | of content irrespective of delivery       |
                     |                technology                 |
                     +-------------------------------------------+

   Figure 1: Conceptual MPEG-4 terminal architecture

   By conveying the session (or resource) description as well as the
   scene (or content composition) description through their own ESs, it
   is made possible to change portions of the content composition and
   the number and properties of media streams that carry the audio-
   visual content separately and dynamically at well known instants in
   time.

   One or more initial Scene Description streams and the corresponding
   OD stream are pointed to by an initial object descriptor (IOD). In
   this context the IOD needs to be made available to the receivers
   through some out-of-band means that are out of scope of this payload
   specification. However in the context of transport on IP networks it
   is defined in a separate document [9]. Note that for applications
   that only use audio and/or video this payload format can also be
   used without IOD and OD streams (decoder configuration is then
   transported as MIME parameters, see section 4.1).

   The Compression Layer organizes the ESs in Access Units (AU), the
   smallest elements that can be attributed individual timestamps. The
   Access Units concept defines the boundary between media specific
   processing and delivery specific processing. That is to say
   transport should not depend on the nature of the media data but only
   on AU properties.

1.1.3 The Sync Layer

   The Sync Layer (SL) that primarily provides the synchronization
   between streams defines a homogeneous encapsulation of ESs carrying
   media or control data (ODs, BIFS). Integer or fractional AUs are
   then encapsulated in SL packets; in many cases an SL packet payload
   is actually an entire Access Unit, i.e. an encoded media frame.

   All consecutive data from one stream is called an SL-packetized
   stream. The interface between the compression layer and the SL is
   called the Elementary Stream Interface (ESI). The ESI is informative
   i.e. it is extremely useful in order to define concepts and
   mechanisms but does not have to be implemented. It is important to
   note however that an SL stream can be configured so that SL packets
   are reduced to the media (compressed) data, and in that case
   implementations do not need to be aware of the SL at all.

   The Delivery Layer in MPEG-4 consists of the Delivery Multimedia
   Integration Framework defined in ISO/IEC 14496-6 [4]. This layer is
   media unaware but delivery technology aware. It provides transparent
   access to and delivery of content irrespective of the technologies
   used.  The interface between the SL and DMIF is called the DMIF
   Application Interface (DAI). It offers content location independent
   procedures for establishing MPEG-4 sessions and access to transport
   channels. This payload format can be used as an instance of the
   MPEG-4 Delivery Layer but is otherwise not tied to DMIF.

   The ESs from the encoders are fed into the SL with indications of AU
   boundaries, random access points, desired composition time and the
   current time. The Sync Layer fragments the ESs into SL packets, each
   containing a header that encodes information conveyed through the
   ESI. If the AU is larger than an SL packet, subsequent packets
   containing remaining parts of the AU are generated with subset
   headers until the complete AU is packetized. One SL packet describes
   an Access Unit or a fragment thereof: the SL packet header contains
   extended timing and framing information; the SL packet payload
   contains the bit stream frame (AU) or fragment. For the complete
   list of features of the Sync Layer refer to the MPEG-4 systems
   specification [1]. The syntax of the Sync Layer is configurable and
   can be adapted to the needs of the stream to be transported. This
   includes the possibility to select the presence or absence of
   individual syntax elements as well as configuration of their length
   in bits. The configuration for each individual stream is conveyed in
   a SLConfigDescriptor, which is an integral part of the ES Descriptor
   for this stream. The MPEG-4 SLConfigDescriptor, being configuration
   information, is not carried by the media stream itself but is rather
   transported via an ObjectDescriptor Stream encoded using the MPEG-4
   Object Description framework. This can be done in a separate stream
   using this payload format (see section 5.2 for details). The
   SLConfigDescriptor MAY also be transported by other means (for
   example as a parameter, see section 4.1).
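
   As a rough illustration of the fragmentation step described above,
   the following Python sketch splits one Access Unit into fragments no
   larger than a chosen size and records which fragment starts and
   which fragment ends the AU. The names Fragment,
   fragment_access_unit and max_payload are hypothetical and are not
   defined by the MPEG-4 systems specification [1]; real SL packet
   headers carry more information than the two flags shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    """One piece of an Access Unit (hypothetical illustration type)."""
    au_start: bool  # True for the first fragment of the AU
    au_end: bool    # True for the last fragment of the AU
    data: bytes


def fragment_access_unit(au: bytes, max_payload: int) -> List[Fragment]:
    """Split an AU into consecutive fragments of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if not au:
        # An empty AU is returned as one fragment that is both start and end.
        return [Fragment(True, True, b"")]
    fragments = []
    for offset in range(0, len(au), max_payload):
        fragments.append(Fragment(
            au_start=(offset == 0),
            au_end=(offset + max_payload >= len(au)),
            data=au[offset:offset + max_payload],
        ))
    return fragments
```

   An AU that already fits in max_payload yields a single fragment with
   both flags set, which corresponds to the case of a complete Access
   Unit.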

   It is important to note that this draft could just as well have been
   written entirely in terms of SL packets instead of Access Units and
   Access Unit fragments. However this could have created confusion for
   implementers who only need basic properties and do not want to cope
   with the additional complexity of the Sync Layer. Instead this
   specification refers to the Sync Layer only when needed.

1.1.4 Where the two models meet

   In basic cases an Elementary Stream is such that SL packets are
   reduced to the media (compressed) data (empty headers) and in that
   case implementations do not actually need to be aware of the Sync
   Layer at all. In these cases it is logically equivalent to say that
   the Sync Layer is not implemented or to say that the SL packet
   headers are completely empty (or fully map into the RTP headers).
   The Sync Layer can then be seen as a purely conceptual construction
   that does not have to be implemented at all.

   The above described MPEG-4 system model also deals with session
   setup through Object Descriptors. In cases where the complete MPEG-4
   system framework is not used a replacement for this key
   functionality is required. In fact for simple (audio/video) systems
   only the knowledge of the decoder configuration is needed; this
   specification defines options so that the decoder configuration can
   be signaled without MPEG-4 systems.

   In conclusion this payload format is intended to be capable of
   transporting data formatted according to the Sync Layer
   specification but is also useful without the Sync Layer, or when the
   Sync Layer is invisible, which is equivalent to not using it.

2. Analysis of the carriage of MPEG-4 over IP

   As explained above, when transporting MPEG-4 audio and video,
   applications may or may not require the use of MPEG-4 systems. To
   achieve the highest level of interoperability between all MPEG-4
   applications, it is desirable that (a) in both cases the same MPEG-4
   transport format can be used and that (b) receivers that have no
   MPEG-4 system knowledge can easily skip the MPEG-4 system specific
   information, if any.

2.1 The Sync Layer point of view

   RTP is perfectly suitable to transport MPEG-4 audio and MPEG-4
   video, but when using MPEG-4 systems a problem arises from the fact
   that both RTP and MPEG-4 systems contain a synchronization layer.
   In particular, the RTP header duplicates some of the information
   provided in SL packet headers such as the composition timestamps
   (CTSs) and Access Unit boundaries.

   To avoid unnecessary overhead and potential interoperability risks
   when transporting MPEG-4 systems, it is desirable to remove the
   redundancy between the SL packet header and the RTP packet header.

   To be independent of the use of MPEG-4 systems, synchronization can
   rely on the parameters provided in the RTP header.
   Another desired property is to have compatibility with RFC3016 for
   MPEG-4 video transport.
   In case SL headers are used, the redundant fields are removed from
   the SL header, producing "reduced SL headers". The remaining
   information from the SL header, if any, is contained inside the RTP
   packet payload, together with the SL packet payload.
   The combination of RTP packet headers and reduced SL packet headers
   can be used to logically map the RTP packets to complete SL packets.

   Some of the information contained in the reduced SL headers is also
   useful for transport over RTP when MPEG-4 systems is not used.

   For that reason the information in the "reduced" SL headers is split
   into "general useful information" and "MPEG-4 systems only
   information".

   The "general useful information", hereinafter called the Payload
   Header, is carried by a number of fields configurable using
   parameters defined in section 4.1; all receivers MUST parse these
   fields.

   The "MPEG-4 systems only information", if any, is contained in an
   auxiliary header, hereinafter called Remaining SL Packet Header
   (RSLH), also configured using parameters (see section 4.1) and
   preceded by a length field, so that non-MPEG-4-system devices MAY
   skip this information.

   This is depicted in figure 2a.

                                           +------------+
                extended framing and       | AU or AU   |
                  timing information       | fragment   |
                                           +------------+
                                   |              |
                                   |              |
                                   |              |
                                   |              |
                                   V              V

                            <----------SL Packet-------->

                            +---------------------------+
                            |   SL Packet   | SL Packet |
                            |    Header     | Payload   |
                            +---------------------------+
                                  |                |
                                  |                |
         +-------------+----------+---+            |
         |             |              |            |
         V             V              V            V
   +-----------+ +-----------+ +-------------+ +-----------+
   |RTP Packet | |  Payload  | | Remaining SL| | SL Packet |
   |  Header   | |  Header   | |    Header   | | Payload   |
   +-----------+ +-----------+ +-------------+ +-----------+

                 <----RTP Packet Payload------------------->

   Figure 2a: Mapping of ES into SL, then SL Packet into RTP packet

2.2 The Elementary Stream point of view

   Another way to see the mapping of Elementary Streams (i.e. Access
   Units or AU fragments) into RTP packets is depicted in figure 2b.
   In this view the "basic" timing and fragmentation information listed
   in section 1.1.1 is obtained directly at the codec interfaces and
   mapped into the RTP header or the RTP Payload Header.

   For example this RTP payload format has been designed so that it is
   by default configured to be identical to RFC 3016 for the
   recommended MPEG-4 video configurations; specifically, in this case
   the Payload Header is empty. Hence receivers that comply with this
   payload specification can decode such RTP payload without knowledge
   about the Sync Layer (see the example in Appendix 1). In a similar
   fashion but with non-empty Payload Headers, MPEG-4 audio (see
   Appendix 3 and 4 for examples) can be transported without explicit
   use of the Sync Layer.

                               +------------+
        basic framing and      | AU or AU   |
        timing information     | fragment   |
                               +------------+
                |                    |
                |                    |
         +-------------+             |
         |             |             |
         V             V             V
   +-----------+ +-----------+ +-----------+
   |RTP Packet | |  Payload  | |           |
   |  Header   | |  Header   | | Payload   |
   +-----------+ +-----------+ +-----------+

                 <----RTP Packet Payload--->

   Figure 2b: Direct mapping of Elementary Streams into RTP packet

2.3 How the two views reconcile

   A simple concept makes it possible to unify these apparently
   antagonistic points of view: a "no-SL" terminal can skip (ignore)
   the Remaining SL Header, if present.
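
   The skipping behaviour can be sketched in Python as follows. This is
   only an illustration under strong simplifying assumptions that are
   NOT taken from this specification: the payload is assumed to be
   byte-aligned, the Payload Header is assumed to have a fixed length
   known to the receiver, and the RSLH is assumed to be preceded by a
   one-byte length field that counts bytes. The actual field sizes are
   configured through the parameters of section 4.1. The function name
   split_payload and its arguments are hypothetical.

```python
from typing import Tuple


def split_payload(rtp_payload: bytes, payload_header_len: int) -> Tuple[bytes, bytes]:
    """Return (payload_header, media_data), ignoring the RSLH.

    Assumed layout (hypothetical, byte-aligned):
    [Payload Header][1-byte RSLH length][RSLH][media data]
    """
    if payload_header_len < 0:
        raise ValueError("payload_header_len must not be negative")
    if len(rtp_payload) < payload_header_len + 1:
        raise ValueError("payload too short for header and RSLH length")
    header = rtp_payload[:payload_header_len]
    rslh_len = rtp_payload[payload_header_len]  # assumed 1-byte length field
    media_start = payload_header_len + 1 + rslh_len
    if media_start > len(rtp_payload):
        raise ValueError("RSLH length exceeds payload size")
    # A "no-SL" receiver never interprets the RSLH bytes, it just skips them.
    return header, rtp_payload[media_start:]
```

   A receiver that does implement MPEG-4 systems would parse the
   skipped bytes instead of discarding them; the media data obtained
   is the same in both cases.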

3. Payload Format

   The RTP Payload corresponds to an integer number of Access Units or
   Access Unit fragments.

   The RTP payload is composed of 3 sections:
   . a Payload Header section
   . a RSLH section
   . a Payload Section.

   The AU and AU fragment boundaries and timing information are
   transported in the Payload Header.

   When transporting SL streams, SL Packet Headers are transformed into
   a Remaining SL Header (RSLH), with some fields extracted to be
   mapped in the RTP header and others extracted to be mapped in the
   corresponding Payload Header.

   The AU or AU fragment data (SL packet payload), i.e. the Elementary
   Stream codec data, is unchanged.

   This payload format has two modes. The "Single" mode is a mode where
   a single AU or AU fragment is transported per RTP packet. The
   "Multiple" mode is a mode where possibly more than one AU or AU
   fragment is transported per RTP packet. The default mode is the
   "Single" mode.

   In the "Multiple" mode, AUs or AU fragments MUST be in decoding
   order inside one RTP packet. Decoding order is defined by the
   relevant codec specification. Decoding order may be different from
   presentation order, for example for video streams containing B
   frames. According to the MPEG-4 system model this order is
   quantified using decoding time stamps (DTS).

   RTP Packets SHOULD be sent in decoding order. In case of
   interleaving the first AU or AU fragment of each RTP packet is used
   as reference, as in the following examples of RTP packets containing
   interleaved SL packets.
   This sequence is correct: [0,2,4][1,3,5]
   This sequence is correct: [0,3,6][1,2][4,5]
   This sequence is correct: [0,3,6][1,4][2,5]
   This sequence is prohibited: [0,4,2][1,5,3]
   This sequence is prohibited: [1,3,5][0,2,4]
   This sequence is prohibited: [0,3,6][2,5][1,4]
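
   The two rules illustrated by these sequences (decoding order inside
   each RTP packet, and RTP packets ordered by their first element) can
   be checked with a short Python sketch. It assumes that each AU or AU
   fragment is represented by its index in decoding order, as in the
   examples above; the function name is_valid_packet_sequence is
   hypothetical.

```python
from typing import Sequence


def is_valid_packet_sequence(packets: Sequence[Sequence[int]]) -> bool:
    """Check a sequence of RTP packets given as lists of decoding-order indices."""
    previous_first = None
    for packet in packets:
        if not packet:
            return False  # every RTP packet carries at least one AU or fragment
        # Inside one RTP packet the indices must follow decoding order.
        if any(b <= a for a, b in zip(packet, packet[1:])):
            return False
        # Across RTP packets the first element of each packet is the reference.
        if previous_first is not None and packet[0] <= previous_first:
            return False
        previous_first = packet[0]
    return True
```

   Applied to the six sequences listed above, the function accepts the
   first three and rejects the last three.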

   In the "Multiple" mode senders MUST make sure that no fields undergo
   roll over inside one RTP packet. This may limit the number of SL
   packets inside one RTP packet and, when interleaving, may limit the
   interleaving period as detailed below.

   The size and/or number of the payload(s) SHOULD be adjusted such
   that the resulting RTP packet is not larger than the path-MTU. To
   handle larger packets, this payload format relies on lower layers
   for fragmentation, which may not be desirable.
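
   One possible way for a sender to respect the path-MTU in the
   "Multiple" mode is a greedy grouping of consecutive Access Units, as
   in the Python sketch below. It is a simplification: the hypothetical
   overhead argument lumps all header bytes (lower layers, RTP header,
   Payload Header) into one fixed value per packet, whereas in practice
   the Payload Header may grow with the number of AUs. An AU that does
   not fit on its own is reported rather than fragmented. The function
   name pack_access_units is also hypothetical.

```python
from typing import List, Sequence


def pack_access_units(au_sizes: Sequence[int], mtu: int, overhead: int) -> List[List[int]]:
    """Group consecutive AU indices so that each RTP packet fits in the MTU."""
    budget = mtu - overhead  # bytes left for AU data in one packet
    if budget <= 0:
        raise ValueError("overhead leaves no room for payload")
    packets: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, size in enumerate(au_sizes):
        if size > budget:
            raise ValueError("AU %d does not fit and needs fragmentation" % index)
        if current and used + size > budget:
            # Close the current packet and start a new one.
            packets.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        packets.append(current)
    return packets
```

   Because AUs are taken in their original order, the grouping keeps
   decoding order both inside each packet and across packets.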

3.1 RTP Header Fields Usage

   Payload Type (PT): The assignment of an RTP payload type for this
   new packet format is outside the scope of this document, and will
   not be specified here. It is expected that the RTP profile for a
   particular class of applications will assign a payload type for this
   encoding, or if that is not done then a payload type in the dynamic
   range shall be chosen.

   Marker (M) bit: The M bit is set to 1 when all AU fragments in the
   RTP packet are Access Unit ends.

   Specifically the M bit is set to 0 when the RTP packet contains one
   or more AU fragments that are not Access Unit ends, and the M bit is
   set to 1 for RTP packets that contain either:
   . A single complete Access Unit
   . The last fragment of an Access Unit
   . Several complete Access Units
   . Several last fragments of Access Units
   . A mix of complete Access Units and last fragments of Access Units

   Therefore for streams where all SL packets are complete Access Units
   the M bit is 1 for all RTP packets. Note also that in terms of the
   Sync Layer this means that the M bit is related to the
   accessUnitEndFlag.
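
   The M bit rule above can be sketched as follows. This is a minimal
   illustration, not normative text; the function name marker_bit and
   its boolean-list input are assumptions made for this example.

```python
def marker_bit(ends_access_unit):
    """Return the RTP M bit for one RTP packet.

    ends_access_unit holds one boolean per AU or AU fragment carried by
    the packet: True for a complete Access Unit or for the last fragment
    of an Access Unit, False for any other fragment.
    (Hypothetical helper, named for illustration only.)
    """
    if not ends_access_unit:
        raise ValueError("an RTP packet carries at least one AU or AU fragment")
    # M is 1 only when every item in the packet is an Access Unit end.
    return 1 if all(ends_access_unit) else 0
```

   A packet mixing complete Access Units and last fragments yields 1,
   while a packet with any first or middle fragment yields 0.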

   Extension (X) bit: Defined by the RTP profile used.

   Sequence Number: The RTP sequence number should be generated by the
   sender with a constant random offset.

   Timestamp: Set to the value in the compositionTimeStamp field of the
   first AU or AU fragment in the RTP packet, if present.

   If compositionTimeStamp has less than 32 bits length, the RTP
   timestamp is generated to extend it out to 32 bits. If
   compositionTimeStamp has more than 32 bits length, the RTP timestamp
   uses the 32 LSB of it. When using the Sync Layer the resolution of
   the timestamp (timeStampLength) is available from the SL
   configuration data and shall be used by receivers to reconstruct
   compositionTimeStamps with the original bit length. In all other
   cases it is RECOMMENDED to use timeStampLength=32.

   In all cases, the sender SHALL always make sure that RTP time stamps
   are identical only for RTP packets transporting fragments of the
   same Access Unit.

   In case compositionTimeStamp is not present in the current SL
   packet, but has been present in a previous AU or AU fragment, the
   reason is that this is the same Access Unit that has been
   fragmented; therefore the same timestamp value MUST be taken as RTP
   timestamp.

   If compositionTimeStamp is never present in SL packets for this
   stream, the RTP packetizer SHOULD convey a reading of a local clock
   at the time the RTP packet is created.

   According to RFC1889 [5, Section 5.1] timestamps are recommended to
   start at a random value for security reasons. However then, a
   receiver is not in the general case able to reconstruct the original
   MPEG-4 Time Stamps (CTS, DTS, OCR) which can be of use for
   applications where streams from multiple sources are to be
   synchronized (for example one stream from local storage, another
   from a streaming server). Therefore the usage of such a random
   offset SHOULD be avoided.

   Note that since RTP devices may re-stamp the stream, all time stamps
   inside of the RTP payload (CTS and DTS in the Payload Header, OCR in
   the RSLH) MUST be expressed as a difference to the RTP time stamp.
   Since this subtraction may lead to negative values, the offset MUST
   be encoded as a two's complement signed integer in network octet
   order. Note that these offsets (delta) typically require far fewer
   bits to be encoded than the original length, which is another
   justification.
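
   The delta encoding described above can be sketched as follows. The
   function names are hypothetical, and the sketch assumes that time
   stamps are compared modulo 2^32 (the width of the RTP timestamp);
   this is an illustration under those assumptions, not a normative
   algorithm.

```python
def timestamp_delta(time_stamp, rtp_timestamp):
    """Signed difference (time_stamp - rtp_timestamp), modulo 2**32.

    Assumption of this sketch: both values are 32-bit quantities and the
    true difference is smaller than 2**31 in magnitude.
    """
    half = 1 << 31
    return ((time_stamp - rtp_timestamp + half) % (1 << 32)) - half


def encode_delta(delta, length):
    """Encode a signed delta as a two's complement field of `length` bits."""
    low, high = -(1 << (length - 1)), (1 << (length - 1)) - 1
    if not low <= delta <= high:
        raise OverflowError("delta does not fit on %d bits" % length)
    return delta & ((1 << length) - 1)


def decode_delta(field, length):
    """Recover the signed delta from a two's complement field."""
    sign_bit = 1 << (length - 1)
    return (field ^ sign_bit) - sign_bit
```

   A receiver would add the decoded delta to the RTP time stamp of the
   packet to recover the original time stamp.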

   When startCompositionTimeStamp is signaled in the SLConfigDescriptor
   the RTP time stamps MUST start with this value.

   SSRC, CC and CSRC fields are used as described in RFC 1889 [5].

   RTCP SHOULD be used as defined in RFC 1889 [5].

   RTP timestamps in RTCP SR packets: according to the RTP timing
   model, the RTP timestamp that is carried into an RTCP SR packet is
   the same as the compositionTimeStamp that would be applied to an RTP
   packet for data that was sampled at the instant the SR packet is
   being generated and sent. The RTP timestamp value is calculated from
   the NTP timestamp for the current time, which also goes in the RTCP
   SR packet. To perform that calculation, an implementation needs to
   periodically establish a correspondence between the CTS value of a
   data packet and the NTP time at which that data was sampled.
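
   The calculation of the RTP timestamp carried in an RTCP SR packet
   can be sketched as follows. The function name, the use of seconds
   expressed as floating point numbers and the clock_rate argument
   (time stamp ticks per second) are assumptions made for this example.

```python
def sr_rtp_timestamp(ref_cts, ref_ntp_seconds, now_ntp_seconds, clock_rate):
    """RTP timestamp to place in an RTCP SR generated at now_ntp_seconds.

    (ref_cts, ref_ntp_seconds) is the correspondence established earlier
    between the CTS of a data packet and the NTP time at which that data
    was sampled. clock_rate is the number of time stamp ticks per second
    (an assumption of this sketch).
    """
    elapsed_ticks = round((now_ntp_seconds - ref_ntp_seconds) * clock_rate)
    # The RTP timestamp is a 32-bit field, so the result wraps around.
    return (ref_cts + elapsed_ticks) & 0xFFFFFFFF
```

   The correspondence should be refreshed periodically so that clock
   drift between the media clock and the NTP clock does not accumulate.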

3.2 RTP payload structure

   The packet payload structure consists of 3 octet-aligned sections.

   The first section is the Payload Header Section and contains Payload
   Headers. Each Payload Header contains basic fragmentation and timing
   information for one AU or AU fragment. The Payload Header structure
   is described in 3.3. In the "Single" mode this section is empty by
   default.

   The second section is the RSLH Section and contains Remaining SL
   Headers (RSLH). The RSLH structure is described in 3.5. By default
   this section is empty.

   The last section (Payload Section) contains the AU or AU fragment
   payloads, i.e. codec bit stream fragments. This section is never
   empty.

   The Nth Payload Header in the Payload Header Section, the Nth RSLH
   in the RSLH Section and the Nth AU or AU fragment payload in the
   Payload Section correspond to the Nth AU or AU fragment transported
   by the RTP packet.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Gentric et al.            Expires March 2002                         9
                RTP Payload Format for MPEG-4 Streams  September 2001
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :            contributing source (CSRC) identifiers             :
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   |              Payload Header Section (octet aligned)           |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                                               |
   |              RSLH Section (octet aligned)                     |
   |                                                               |
   |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |                                                               |
   |              Payload Section (octet aligned)                  |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3: An RTP packet for MPEG-4

3.3 Payload Header Section structure

   If the Payload Header Section consumes a non-integer number of
   octets, up to 7 zero-valued padding bits MUST be inserted at the end
   in order to achieve octet-alignment. The size field described below
   excludes the padding bits, if any.

   In the "Single" mode the Payload Header Section consists of a single
   Payload Header.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Payload Header (x bits )  : padding bits|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4: Payload Header Section structure in "Single" mode

   In the "Multiple" mode the Payload Header Section consists of a 2
   octets field giving the size in bits (in network octet order) of the
   following block of bit-wise concatenated Payload Headers.

   This size field is absent in the "Single" mode not because it is not
   needed (which would be a minor gain) but for compatibility with RFC
   3016.

   This size field is also absent when the value would always be zero
   because the Payload Header is always empty, which may happen when a
   constant payload size is signaled using ConstantSize (see below).

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Payload Header section size   | etc                           |
   | in bits                       |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   | as many bit-wise concatenated Payload Headers                 |
   | as AU or AU fragments in this RTP packet                      |
   |                                               +-+-+-+-+-+-+-+-+
   |                                               : padding bits  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 5: Payload Header Section structure in "Multiple" mode
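
   The layout of the Payload Header Section in the "Multiple" mode can
   be sketched as follows. For readability each Payload Header is given
   as a string of '0' and '1' characters; this representation and the
   function name are assumptions made for this example, which also does
   not cover the case where the size field is omitted.

```python
def build_payload_header_section(header_bits):
    """Assemble the Payload Header Section for the "Multiple" mode.

    header_bits: list of bit strings (e.g. "0101"), one per Payload
    Header, in the order of the AU or AU fragments in the RTP packet.
    (Hypothetical helper, named for illustration only.)
    """
    bits = "".join(header_bits)
    size_in_bits = len(bits)          # excludes the padding bits
    if size_in_bits > 0xFFFF:
        raise ValueError("size does not fit in the 2 octets size field")
    bits += "0" * (-size_in_bits % 8)  # up to 7 zero-valued padding bits
    body = int(bits, 2).to_bytes(len(bits) // 8, "big") if bits else b""
    # 2 octets size field in network octet order, then the padded block.
    return size_in_bits.to_bytes(2, "big") + body
```

   For example two Payload Headers of 3 and 2 bits give a size field of
   5 followed by a single octet holding the 5 bits and 3 padding bits.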

3.4 Payload Header structure

   The Payload Header content depends on parameters (as described in
   section 4.1); by default it is empty for the "Single" mode and,
   except when ConstantSize is signaled, contains at least the
   PayloadSize field in the "Multiple" mode.

   When all options are used the Payload Header structure and the
   relationship with the related parameters is given in table 1.

   +===========================+=================================+
   | Fields of Payload Header  | Number of bits (parameters)     |
   +===========================+=================================+
   | PayloadSize               | SizeLength                      |
   +---------------------------+---------------------------------+
   | Index                     | IndexLength                     |
   +---------------------------+---------------------------------+
   | IndexDelta                | IndexDeltaLength                |
   +---------------------------+---------------------------------+
   | CTSFlag                   | 1      If (CTSDeltaLength > 0)  |
   +---------------------------+---------------------------------+
   | CTSDelta                  | CTSDeltaLength If (CTSFlag==1)  |
   +---------------------------+---------------------------------+
   | DTSFlag                   | 1      If (DTSDeltaLength > 0)  |
   +---------------------------+---------------------------------+
   | DTSDelta                  | DTSDeltaLength If (DTSFlag==1)  |
   +---------------------------+---------------------------------+

   Table 1: Payload Header fields and parameters giving the sizes

   In the general case a receiver can only discover the size of a
   Payload Header by parsing it since for example the presence of
   CTSDelta is signaled by the value of CTSFlag.
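
   The parsing of one Payload Header can be sketched as follows,
   assuming the fields appear in the order PayloadSize, Index or
   IndexDelta, CTSFlag, CTSDelta, DTSFlag, DTSDelta. The BitReader
   class, the function name and the dictionary of parameters are
   assumptions made for this example.

```python
class BitReader:
    """Reads big-endian bit fields from a bytes object (illustrative)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            octet = self.data[self.pos // 8]
            value = (value << 1) | ((octet >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def parse_payload_header(reader, params, first):
    """Parse one Payload Header; params maps parameter names to lengths."""
    header = {}
    if params.get("SizeLength", 0):
        header["PayloadSize"] = reader.read(params["SizeLength"])
    # Index is carried by the first header, IndexDelta by the others.
    if first:
        name, length = "Index", "IndexLength"
    else:
        name, length = "IndexDelta", "IndexDeltaLength"
    if params.get(length, 0):
        header[name] = reader.read(params[length])
    cts_len = params.get("CTSDeltaLength", 0)
    if cts_len and reader.read(1):  # CTSFlag
        raw = reader.read(cts_len)
        sign = 1 << (cts_len - 1)
        header["CTSDelta"] = (raw ^ sign) - sign  # two's complement
    dts_len = params.get("DTSDeltaLength", 0)
    if dts_len and reader.read(1):  # DTSFlag
        header["DTSDelta"] = reader.read(dts_len)
    return header
```

   The reader position after each call gives the size in bits of the
   header that was just parsed.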

3.4.1 Fields of Payload Header

   PayloadSize: Indicates the size in octets of the associated Payload,
   which can be found in the Payload Section of the RTP packet. The
   length in bits of this field is signaled by the SizeLength parameter
   (see section 4.1).

   There is an exception to that. In the case that the RTP packet
   contains only one AU or AU fragment in the "Multiple" mode, the
   PayloadSize field SHALL contain the size of the entire corresponding
   Access Unit. There are two reasons: firstly the size of the fragment
   is not needed when there is only one fragment in the RTP packet;
   secondly this is useful in order to detect if a full Access Unit has
   been received after the loss of a packet carrying a M bit set to 1.

   Index, IndexDelta: encode the serial number of the associated AU or
   AU fragment. IndexDelta is useful for interleaving (see section
   3.8). When transporting a SL stream, Index and IndexDelta SHALL be
   used to encode the SL Packet Header packetSequenceNumber field.

   Index is optional and -if present- appears in the first Payload
   Header of a RTP packet.

   The length in bits of the Index field is defined by the IndexLength
   parameter (see section 4.1).

   IndexDelta is optional and -if present- appears in subsequent
   (non-first) Payload Headers of a RTP packet.

   The length in bits of the IndexDelta field is defined by the
   IndexDeltaLength parameter (see section 4.1).

   Both Index and IndexDelta MUST be incremented so that 2 consecutive
   AU or AU fragments can be distinguished. One exception for Index
   is described in 3.8.1.

   If the parameter IndexDeltaLength is defined, non-first AU or AU
   fragments inside a RTP packet have their serial number encoded as a
   difference (thus the name IndexDelta). This difference is relative
   to the previous AU or AU fragment in the RTP packet according to
   (with i>=0):
   Serial number(0) = Index(0)
   Serial number(i+1) = Serial number(i) + IndexDelta(i+1) + 1

   If the parameter IndexDeltaLength is not defined the default value
   is zero and then the IndexDelta field is not present for non-first
   AU or AU fragments. Nevertheless receivers SHALL then apply the
   above formula with IndexDelta equal to zero. In other words by
   default the serial number is incremented by 1 for each AU or AU
   fragment in the RTP packet.
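
   The reconstruction of serial numbers from Index and IndexDelta can
   be sketched as follows. The function name and the list-based inputs
   are assumptions made for this example.

```python
def serial_numbers(index, index_deltas):
    """Serial numbers of the AU or AU fragments of one RTP packet.

    index: value of the Index field of the first Payload Header.
    index_deltas: IndexDelta values of the following Payload Headers;
    pass zeros when IndexDeltaLength is not defined.
    (Hypothetical helper, named for illustration only.)
    """
    serials = [index]
    for delta in index_deltas:
        # Serial number(i+1) = Serial number(i) + IndexDelta(i+1) + 1
        serials.append(serials[-1] + delta + 1)
    return serials
```

   With all IndexDelta values equal to zero the serial numbers are
   consecutive; with IndexDelta equal to 2 every third AU is carried,
   as in an interleaved packet such as [0,3,6].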

   CTSFlag (1 bit): Indicates whether the CTSDelta field is present. A
   value of 1 indicates that the CTSDelta field is present, a value of
   0 that it is not present.

   If CTSDeltaLength is not zero, CTSFlag is present in all Payload
   Headers regardless of whether the AU fragment is an Access Unit
   start or not.

   CTSDelta (CTSDeltaLength bits): Specifies the value of the CTS as a
   2-complement offset (delta) from the timestamp in the RTP header of
   the RTP packet. The length in bits of each CTSDelta field is
   specified by the CTSDeltaLength parameter (see section 4.1).

   The CTSDelta field is present if CTSFlag is 1.

   For the first Payload Header of each RTP packet CTSFlag is always 0,
   since the composition time stamp of the first AU or AU fragment in
   the RTP packet is mapped to the RTP time stamp. When using the Sync
   Layer the sender MUST remove the compositionTimeStamp from the RSLH.

   Senders MUST NOT finish assembling a RTP packet for which CTSDelta
   would roll over since this would prevent the receiver from
   reconstructing the correct CTS. This can result in sub optimal RTP
   packets (smaller than the MTU) depending on the MTU, the AU or AU
   fragment sizes and CTSDeltaLength.

   DTSFlag (1 bit): Indicates whether the DTSDelta field is present. A
   value of 1 indicates that DTSDelta is present, a value of 0 that it
   is not present.

   If DTSDeltaLength is not zero, DTSFlag is present in all Payload
   Headers regardless of whether the AU fragment is an Access Unit
   start or not. When transporting SL streams the receiver needs this
   flag in order to reconstruct the decodingTimeStampFlag of SL Packet
   Headers.

   DTSDelta (DTSDeltaLength bits): encodes (compositionTimeStamp -
   decodingTimeStamp) for the same AU or AU fragment (always positive).
   The length in bits of each DTSDelta field is specified by the
   DTSDeltaLength parameter (see section 4.1).

   At the sender side the computation of DTSDelta MUST be performed by
   taking into account roll over. For example for a SL stream with the
   following (CTS, DTS) pairs (assuming timeStampLength=3):
   (4,3), (5,4), (6,5), (7,6), (0,7); DTSDelta for the last pair is
   logically (1) and not (-7) which would be illegal and could cause
   receivers implemented following section 5.1 to fail.

   Senders MUST make sure that DTSDeltaLength is large enough to encode
   the difference between CTS and DTS (otherwise the DTS computed by
   the receiver would be incorrect).

   The DTSDelta field appears when DTSFlag is 1. The sender MUST always
   remove the decodingTimeStamp from the RSLH.

   If DTSDelta is zero i.e. if decodingTimeStamp equals
   compositionTimeStamp then DTSFlag MUST be set to 0 and no DTSDelta
   field SHALL be present.
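
   The sender side computation of DTSDelta with roll over taken into
   account can be sketched as follows. The function name and argument
   names are assumptions made for this example; time stamps are assumed
   to be unsigned values of timeStampLength bits.

```python
def dts_delta(cts, dts, time_stamp_length, dts_delta_length):
    """Compute DTSDelta = CTS - DTS, taking time stamp roll over into account.

    cts and dts are unsigned time stamps of time_stamp_length bits.
    Raises OverflowError when the difference cannot be expressed on
    dts_delta_length bits. (Hypothetical helper for illustration.)
    """
    # Working modulo 2**time_stamp_length keeps the result non-negative
    # even when CTS has rolled over and DTS has not yet.
    delta = (cts - dts) % (1 << time_stamp_length)
    if delta >= (1 << dts_delta_length):
        raise OverflowError("DTSDeltaLength is too small for this stream")
    return delta
```

   With timeStampLength=3 the pair (0,7) gives a DTSDelta of 1, which
   matches the example above. A result of 0 means that DTSFlag is set
   to 0 and no DTSDelta field is written.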

3.5 RSLHSection structure

   This section is present only when using the Sync Layer, and then,
   when the rules in the previous section have left remaining fields.

   This section first consists of a field (RSLHSectionSize) giving the
   size in bits of the following block of bit-wise concatenated RSLHs
   (this size does not include padding bits).

   If the section consumes a non-integer number of octets, up to 7 zero
   padding bits MUST be inserted at the end in order to achieve
   octet-alignment.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RSLHSectionSize (RSLHSectionSizeLength bits)| RSLH (variable  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                 |
   | number of bits)                                               |
   |                                                               |
   |         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         | RSLH (variable number of bits)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | etc                                                           |
   | as many bit-wise concatenated RSLHs                           |
   | as SL Packets in this RTP packet                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RSLH (variable number of bits)                                |
   |                                                 +-+-+-+-+-+-+-+
   |                                                 : padding bits|
   |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 7: RSLHSection structure

   The length in bits of the RSLHSectionSize field is
   RSLHSectionSizeLength and is specified with a default value of zero
   indicating that the whole RSLHSection is absent. Note that for
   compatibility with RFC 3016 we need to be able to make the
   RSLHSection disappear completely, including the RSLHSectionSize
   field. This is the reason why there is such a variable length with a
   zero default value indicating the absence of the RSLHSectionSize
   field.

   +=================================+===============================+
   | Fields of RSLHSection           |         Number of bits        |
   +=================================+===============================+
   | RSLHSectionSize                 |       RSLHSectionSizeLength   |
   +---------------------------------+-------------------------------+
   | all bit-wise concatenated RSLHs |       RSLHSectionSize         |
   +---------------------------------+-------------------------------+

   Table 2: Sizes in bits inside RSLHSection

   Parsing of the bit-wise concatenated RSLHs requires MPEG-4 system
   awareness, specifically it requires understanding the MPEG-4
   Sync Layer (SL) syntax and the modifications to this syntax
   described in the next section.

   However thanks to the RSLHSectionSize field non-MPEG-4-system
   receivers can skip this part by rounding up RSLHSectionSize/8 to the
   next integer number of octets. This means that receivers not
   implementing the Sync Layer can process streams containing Sync
   Layer specific items by simply ignoring the parts they would not be
   able to parse.
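
   Skipping the RSLHSection without parsing it can be sketched as
   follows. The function name is hypothetical and the sketch assumes
   that RSLHSectionSizeLength is a multiple of 8, so that the size
   field itself occupies whole octets; other lengths would require a
   bit-level reader.

```python
def skip_rslh_section(payload, offset, size_length):
    """Return the offset of the first octet after the RSLHSection.

    payload: RTP payload as bytes; offset: where the RSLHSection starts;
    size_length: RSLHSectionSizeLength in bits (0 means the section is
    absent). Assumption of this sketch: size_length is a multiple of 8.
    """
    if size_length == 0:
        return offset  # whole section absent, nothing to skip
    if size_length % 8:
        raise ValueError("sketch only handles octet-aligned size fields")
    field_octets = size_length // 8
    size_bits = int.from_bytes(payload[offset:offset + field_octets], "big")
    # Round RSLHSectionSize/8 up to the next integer number of octets.
    return offset + field_octets + (size_bits + 7) // 8
```

   The returned offset is where the Payload Section begins, so a
   receiver that does not implement the Sync Layer can continue from
   there.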

3.6 RSLH structure

   RSLH is present only when using the Sync Layer, and then, when the
   rules in the previous section have left remaining fields.

   A Remaining SL Packet Header (RSLH) is what remains of an SL header
   after modifications for mapping into this payload format.

   The following modifications of the SL Packet Header MUST be applied.
   The other fields of the SL Packet Header MUST remain unchanged but
   are bit-shifted to fill in the gaps left by the operations specified
   below.

3.6.1 Removal of fields

   The following SL Packet Header fields -if present- are removed since
   they are mapped either in the RTP header or in the corresponding
   Payload Header:
   . compositionTimeStampFlag
   . compositionTimeStamp
   . decodingTimeStampFlag
   . decodingTimeStamp
   . packetSequenceNumber
   . AccessUnitEndFlag (in "Single" mode only)

   The AccessUnitEndFlag, when present for a given stream, MUST be
   removed from every RSLH when using the "Single" mode since it has
   the same meaning as the Marker bit (and for compatibility with RFC
   3016). However when using the "Multiple" mode, AccessUnitEndFlag
   MUST NOT be removed since it is useful to signal individual AU ends.

3.6.2 Mapping of OCR

   Furthermore if the SL Packet Header contains an OCR, then this field
   is encoded in the RSLH as a 2-complement difference (delta) exactly
   like a compositionTimeStamp or a decodingTimeStamp in the Payload
   Header. The length in bits of this difference is indicated by the
   OCRDeltaLength parameter (see section 4.1).

   With this payload format OCRs MUST have the same clock frequency as
   Time Stamps.

   If compositionTimeStamp is not present for a SL packet that has OCR
   then the OCR SHALL be encoded as a difference to the RTP time stamp.

3.6.3 Degradation Priority

   For streams that use the optional degradationPriority field in the
   SL Packet Headers, only SL packets with the same degradation
   priority SHALL be transported by one RTP packet so that components
   may dispatch the RTP packets according to appropriate QoS or
   protection schemes. Furthermore only the first RSLH of one RTP
   packet SHALL contain the degradationPriority field since it would be
   otherwise redundant.

3.7 Payload Section structure

   The Payload Section contains the concatenated AU or AU fragment
   Payloads. By definition AU or AU fragment Payloads are octet
   aligned.

   For efficiency SL packets do not carry their own payload size. This
   is not an issue for RTP packets that contain a single SL Packet.
   However in the "Multiple" mode the size of each AU or AU fragment
   payload MUST be available to the receiver.

   If the AU or AU fragment payload size is constant for a stream, the
   size information SHOULD NOT be transported in the RTP packet.
   However in that case it MUST be signaled using the ConstantSize
   parameter (see section 4.1).

   If the AU or AU fragment payload size is variable then the size of
   each AU or AU fragment payload MUST be indicated in the
   corresponding Payload Header. In order to do so the Payload Header
   MUST contain a PayloadSize field. The number of bits on which this
   PayloadSize field is encoded MUST be indicated using the SizeLength
   parameter (see section 4.1).

   The absence of both ConstantSize and SizeLength indicates the
   "Single" mode i.e. that a single AU or AU fragment is transported in
   each RTP packet for that stream.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | AU or AU fragment (variable number of octets)             |
   |                                                           |
   |                                                           |
   |                         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         | AU or AU fragment               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+ (variable number of octets)     |
   |                                                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | etc                                                       |
   | as many octet-wise concatenated AU or AU fragments        |
   | as required to finish the RTP packet                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 8: Payload Section structure

3.8 Interleaving

   Senders MAY perform interleaving. Receivers MUST support
   interleaving.

   Note for Sync Layer implementers: the AUSequenceNumber field of the
   SL Header MUST NOT be used for interleaving since firstly it may
   collide with the Scene Description Carousel usage described in
   section 5.2 and secondly it is not visible to receivers that do not
   implement the Sync Layer and would skip the RSLH section
   transporting AUSequenceNumber.

   When interleaving of AU or AU fragments is used it SHALL be
   implemented using the IndexDelta fields of the Payload Header.
   Senders MUST NOT make RTP packets for which IndexDelta rolls over.
   Therefore depending on the interleaving scheme (if any), the MTU and
   the AU or AU fragment sizes, senders wishing to make optimally sized
   RTP packets (i.e. close to the MTU) will need to set
   IndexDeltaLength to a properly large value.

   Senders SHALL use non zero values of IndexDeltaLength only for
   streams that MAY exhibit interleaving, so that this can be
   interpreted by receivers as an indication that interleaving may be
   present.

   There are, based on this, two ways for a receiver to implement
   de-interleaving, using either Index or timestamps. This is signaled
   using MIME parameters as in the following table, where TSBI and IBI
   stand respectively for Time-Stamp-Based-Interleaving (see section
   3.8.1) and Index-Based-Interleaving (see section 3.8.2). Note that
   the need for two methods arises from two facts: firstly the time
   stamp based method is more economical and in basic cases (no
   multiple AU fragments, CTS always defined) simpler to implement;
   secondly, unfortunately, this method does not always work, as
   explained below.

   ==================================================================
   |                | IndexDeltaLength = 0 | IndexDeltaLength !=  0 |
   ------------------------------------------------------------------
   | IndexLength=0  |   no interleaving    |          TSBI          |
   ------------------------------------------------------------------
   | IndexLength!=0 |   no interleaving,   |   Index=0  |  Index!=0 |
   |                |   SL.packetSeqNum    |-------------------------
   |                |    transport         |    TSBI    |    IBI    |
   ==================================================================
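
   The table above can be read as a small decision function. The
   following Python sketch is illustrative only: the function name and
   its arguments are hypothetical, and it assumes that the receiver
   collects the Index values it has observed so far, treating any
   non-zero Index as the indication of index based interleaving.

```python
from typing import Iterable


def deinterleaving_method(index_length: int,
                          index_delta_length: int,
                          observed_index_values: Iterable[int] = ()) -> str:
    """Return "none", "TSBI" or "IBI" following the table above.

    index_length and index_delta_length are the values of the
    IndexLength and IndexDeltaLength MIME parameters (0 when absent).
    observed_index_values is an assumption of this sketch: the Index
    values seen so far in received RTP packets.
    """
    if index_delta_length == 0:
        # Left column of the table: no interleaving at all.
        return "none"
    if index_length == 0:
        # No Index field is present: only time stamps can be used.
        return "TSBI"
    if any(value != 0 for value in observed_index_values):
        # Non-zero Index values indicate index based interleaving.
        return "IBI"
    return "TSBI"
```

   A receiver would typically evaluate this once the MIME parameters
   are known and again as packets arrive, since the distinction between
   TSBI and IBI in the lower right cell depends on the Index values
   actually received.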

3.8.1 Time stamp based interleaving (TSBI)

   The conjunction of RTP time stamp, IndexDelta and CTS may allow a
   receiver to unambiguously re-order AU or AU fragments based on
   their time stamps (CTS).

   This is possible and efficient for streams where only complete
   Access Units are transported and receivers can always compute the
   time stamp of each Access Unit.

   In case of Access Units of constant duration (e.g. audio streams)
   the explicit presence of CTS in the Payload Header is not even
   required; indeed then we have (i being the index of one AU in one
   RTP packet):
   CTS(0) = RTP-TS
   for (i >= 1): CTS(i) = CTS(i-1) + (IndexDelta(i)+1)*AU-duration

   AU-duration, when constant, can be either signaled in SLConfig or be
   deduced from the decoder configuration (see the config MIME
   parameter).
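
   As an illustration of the two formulas above, the following Python
   sketch computes the CTS of every AU in one RTP packet. The function
   and argument names are hypothetical, all values are assumed to be
   expressed in RTP time stamp units, and roll-over of the 32 bit RTP
   time stamp is deliberately not handled.

```python
from typing import List


def cts_of_access_units(rtp_ts: int,
                        index_deltas: List[int],
                        au_duration: int) -> List[int]:
    """Compute CTS(i) for all AUs of one RTP packet.

    rtp_ts       -- RTP time stamp of the packet, i.e. CTS(0)
    index_deltas -- IndexDelta(i) for i >= 1, in packet order
    au_duration  -- constant AU duration (assumed known from SLConfig
                    or from the decoder configuration)
    """
    cts = [rtp_ts]  # CTS(0) = RTP-TS
    for delta in index_deltas:
        # CTS(i) = CTS(i-1) + (IndexDelta(i)+1)*AU-duration
        cts.append(cts[-1] + (delta + 1) * au_duration)
    return cts


# Three AUs of duration 1024, interleaved with a stride of 3.
example = cts_of_access_units(1000, [2, 2], 1024)
```

   With these inputs the three AUs get the time stamps 1000, 4072 and
   7144, so a receiver can place them in presentation order even
   though no CTS was transmitted.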

   Senders MUST use either IndexLength=0 or set all Index values in all
   packets to zero so that receivers CAN detect this as an indication
   that de-interleaving SHOULD be performed using time stamps.

   When using the Sync Layer and when interleaving, senders MUST use
   for SL.timeStampLength values large enough to prevent the CTS from
   rolling over more often than a packet loss burst length.
   Pre-existing SL streams that do not comply with this requirement
   cannot be interleaved using this payload format (unless the method
   of section 3.8.2 is used).

3.8.2 Index based interleaving (IBI)

   The time stamp based interleaving algorithm described in 3.8.1 does
   not work when a CTS cannot always be computed for all AU or AU
   fragments (for example after a packet loss); this happens:
   . If the AU duration is not constant (SL durationFlag = 0) and CTS
   is not signaled (SL useTimeStampsFlag = 0).
   . When interleaving AU fragments.

   When interleaving, senders of such streams MUST use the index-based
   technique described in this section.

   The conjunction of RTP sequence number, Index and IndexDelta can
   produce a quasi-unique identifier for each AU or AU fragment so that
   a receiver can unambiguously reconstruct the original order even in
   case of out-of-order packets, packet loss or duplication (see the
   pseudo code in 3.4.1 and 5.1).

   This requires, however, that IndexLength is not too small. For that
   reason senders interleaving in this fashion MUST use for IndexLength
   values large enough to prevent Index from rolling over more often
   than a typical loss burst. Pre-existing SL streams that do not
   comply with this requirement (specifically if SL.packetSeqNumLength
   is too small) cannot be interleaved using this payload format
   (unless the method of section 3.8.1 is used).

   Receivers CAN interpret non-zero values in the Index field as an
   indication that de-interleaving CAN be performed using Index and
   IndexDelta and CANNOT be performed using timestamps.
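
   The index based technique can be sketched as follows in Python. This
   is a simplified illustration and not the normative algorithm: the
   names are hypothetical, each received RTP packet is assumed to be
   already parsed into its Index, its list of IndexDelta values and its
   payloads, and roll-over of Index, use of the RTP sequence number and
   duplicate removal are left out.

```python
from typing import List, Sequence, Tuple

# (Index, [IndexDelta(1), IndexDelta(2), ...], [payload 0, payload 1, ...])
ParsedPacket = Tuple[int, Sequence[int], Sequence[bytes]]


def serial_numbers(index: int, index_deltas: Sequence[int]) -> List[int]:
    """Serial number of every AU (or AU fragment) in one RTP packet."""
    serials = [index]
    for delta in index_deltas:
        # Each IndexDelta is the gap to the previous serial number, minus 1.
        serials.append(serials[-1] + delta + 1)
    return serials


def deinterleave(packets: Sequence[ParsedPacket]) -> List[bytes]:
    """Return all payloads sorted by their reconstructed serial number."""
    tagged = []
    for index, deltas, payloads in packets:
        for serial, payload in zip(serial_numbers(index, deltas), payloads):
            tagged.append((serial, payload))
    tagged.sort(key=lambda item: item[0])
    return [payload for _, payload in tagged]


# Two packets interleaving four AUs with a stride of 2, received in
# reverse order.
received = [
    (1, [1], [b"AU1", b"AU3"]),
    (0, [1], [b"AU0", b"AU2"]),
]
ordered = deinterleave(received)
```

   In this example the first packet carries serial numbers 1 and 3 and
   the second carries 0 and 2, so sorting restores the order AU0, AU1,
   AU2, AU3 regardless of arrival order.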

3.8.3 SL streams that cannot be interleaved

   SL streams for which both SL.timeStampLength and
   SL.packetSeqNumLength are too small cannot be interleaved with this
   payload format. Typically small values would cause a receiver to
   drop a large part of the stream in case of packet loss. The actual
   minimal value depends on network loss properties and on the expected
   quality of service.

3.9 Fragmentation Rules

   This section specifies rules for senders in order to prevent media
   decoding difficulties at the receiver end.

   MPEG-4 Access Units are the default fragments for MPEG-4 bitstreams
   and SHOULD be mapped directly into RTP packets of this format with
   two exceptions:
   - Access Units larger than the MTU
   - When using interleaving for better packet loss resilience.

   In all cases Access Unit start MUST be aligned with SL packet start.

   This section gives rules to apply when performing Access Unit
   fragmentation. Let us first explain the context before describing
   the rules.

   Some MPEG-4 codecs define optional syntax for Access Units sub-
   entities (fragments) that are independently decodable for error
   resilience purposes. Examples are Video Packets for video and Error
   Sensitivity Categories (ESC) for audio. This always corresponds to
   specific bitstream syntax, which is signaled in the
   DecoderSpecificInfo inside the DecoderConfig in SLConfig, and/or
   using the corresponding parameters as described in section 4.1.
   Thanks to that decoders are aware whether encoders are operating in
   such a mode or not (however since this codec configuration is an
   opaque data block this is not explicitly signaled by this payload
   format).

   If not operating in such a mode it is obvious that the decoder has
   to skip packets after a loss until an Access Unit start is received.
   Similarly decoder implementations that do not implement robust
   decoding of Access Units fragments have to discard all packets after
   a packet loss until an Access Unit start is received. In the same
   way decoder implementations that do not implement re-synchronization
   at any Access Units start have to discard all packets after a packet
   loss until a Random Access Point Access Unit is received. These are
   all obvious things that a good implementation would do.

   However serious problems would arise for decoder implementations
   that try to restart decoding after a packet loss if independently
   decodable fragments are signaled (in the decoder configuration) but
   the fragments actually received are not independently decodable
   because the RTP sender has made RTP packets on different boundaries
   than the fragments provided by the encoder (so this issue applies to
   the interface between the encoder and the RTP sender and to the RTP
   sender component itself), because the decoder has in general no way
   to detect such a faulty fragment.

   For this reason the following rules must be applied:

   In the spirit of ALF this payload format should transport either
   complete Access Units or fragments of Access Units that are
   independently decodable. Specifically when a given codec has an
   optional syntax for independently decodable Access Unit fragments
   this option SHOULD be used.

   Independently decodable Access Units fragments MUST NOT be split
   across several RTP packets.

   For example an MPEG-4 audio stream encoded using the ESC syntax MUST
   NOT split one ESC across 2 RTP packets.

   This rule is relaxed when using MPEG-4 Video Packets for two
   reasons: firstly Video Packets can be much larger than typical MTU
   and secondly all Video Packets start with a specific
   resynchronization marker that can be unambiguously detected.
   Therefore for video streams using the Video Packet syntax Video
   Packets MAY be split across several SL packets although it is
   strongly RECOMMENDED to always adapt the Video Packet size to fit
   the MTU. However a Video Packet start MUST always be aligned with an
   AU fragment start, except when a GOV is present, in which case the
   GOV and the first Video Packet of the following VOP MUST be included
   in the same SL packet.
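
   For streams whose fragments must not be split (for example ESC based
   audio), a sender can respect the rules above with a simple greedy
   aggregation. The Python sketch below is one possible approach under
   stated assumptions: the names are hypothetical, max_payload stands
   for the MTU minus all header overhead, and interleaving is not
   considered.

```python
from typing import List, Sequence


def pack_fragments(fragment_sizes: Sequence[int],
                   max_payload: int) -> List[List[int]]:
    """Group fragment numbers into RTP packets without splitting any.

    Returns one list of fragment numbers per RTP packet. A fragment
    larger than max_payload cannot be sent without splitting, which
    the rules above forbid, so it is reported as an error.
    """
    packets: List[List[int]] = []
    current: List[int] = []
    used = 0
    for number, size in enumerate(fragment_sizes):
        if size > max_payload:
            raise ValueError("fragment %d does not fit in one packet" % number)
        if current and used + size > max_payload:
            # Close the current packet; the fragment starts a new one.
            packets.append(current)
            current, used = [], 0
        current.append(number)
        used += size
    if current:
        packets.append(current)
    return packets


layout = pack_fragments([400, 500, 700, 300, 1400], 1400)
```

   Here the five fragments end up in three packets: fragments 0 and 1,
   fragments 2 and 3, and fragment 4 alone. An encoder that is aware of
   max_payload can avoid the error case by limiting fragment sizes.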

4. Types and Names

   This section describes the MIME types and names associated with this
   payload format. Section 4.1 registers the MIME types, as per RFC
   2048.

   This format may require additional information about the mapping to
   be made available to the receiver. This is done using parameters
   described in the next section. The absence of any of these fields is
   equivalent to a field set to the default value, which is always
   zero. The absence of any such parameters resolves into a default
   "basic" configuration compatible with RFC3016 for MPEG-4 video.

   In the MPEG-4 framework the SL stream configuration information is
   carried using the Object Descriptor. For compatibility with
   receivers that do not implement the full MPEG-4 system specification
   this information MAY also be signaled using parameters described
   here. When such information is present both in an Object Descriptor
   and as a parameter of this payload format it MUST be exactly the
   same.

   For transport of MPEG-4 audio and video without the use of MPEG-4
   systems, as well as to support non-MPEG-4 system receivers, it is
   also possible to transport information on the profile and level of
   the stream and on the decoder configuration. This is also described
   in the next section.

   Finally this MIME type also defines a mode parameter and a profile
   parameter that are intended for future derivations of this payload
   format.

4.1 MIME type registration

   MIME media type name:  "video" or "audio" or "application"

   "video" SHOULD be used for MPEG-4 Visual streams (i.e. video as
   defined in ISO/IEC 14496-2 [2] and/or graphics as defined in ISO/IEC
   14496-1 [1]) or MPEG-4 Systems streams that convey information
   needed for an audio/visual presentation.

   "audio" SHOULD be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or
   MPEG-4 Systems streams that convey information needed for an audio
   only presentation.

   "application" SHOULD be used for MPEG-4 Systems streams
   (ISO/IEC14496-1) that serve other purposes than audio/visual
   presentation, e.g. in some cases when MPEG-J streams are
   transmitted.

   MIME subtype name: mpeg4-generic

   Required parameters: none

   Optional parameters:

   mode:
   The mode in which this specification is used. This specification
   itself defines only the default mode (mode=default). When the mode
   parameter is not present the default mode SHALL be assumed. In the
   default mode all parameters are optional and as defined here. Other
   modes may be defined as needed in other RFCs. A mode MUST be a
   subset of this specification. Specifically when defining a mode care
   MUST be taken that an implementation of this specification can
   decode the payload format corresponding to this new mode. For this
   reason a mode MUST NOT specify new default values for MIME
   parameters and MIME parameters MUST be present (unless they have the
   default value) even if it is redundant in case the mode assigns
   fixed values. A mode may define additionally that some MIME
   parameters are required instead of optional, that some MIME
   parameters have fixed values (or ranges), and that there are rules
   restricting the usage (for example forbidding the carriage of
   multiple AU fragments in the same RTP packet).

   profile:
   The meaning of this parameter may be defined by a mode. This is
   meant to be used in order to define sub-configurations of a given
   mode, for example the maximum delay (and therefore the size of
   buffers) induced by the usage of interleaving. Implementations of
   this specification can ignore this parameter.

   DTSDeltaLength:
   The number of bits on which the DTSDelta field is encoded in each
   Payload Header. The default value is zero and indicates the absence
   of DTSFlag and DTSDelta in the Payload Header (the stream does not
   transport decodingTimeStamps). A value larger than zero indicates
   that there is a DTSFlag in each Payload Header. Since
   decodingTimeStamp, if present, must be encoded as a difference to
   the RTP time stamp, the DTSDeltaLength parameter MUST be present in
   order to transport decodingTimeStamps with this payload format.

   CTSDeltaLength:
   The number of bits on which the CTSDelta field is encoded. The
   default value is zero and indicates the absence of the CTSFlag and
   CTSDelta fields in the Payload Header. Non-zero values MUST NOT be
   signaled in the "Single" mode. Since compositionTimeStamps, if
   present, must be encoded as a difference to the RTP time stamp, the
   CTSDeltaLength parameter MUST be present in order to transport
   compositionTimeStamps using this payload format (in the "Multiple"
   mode). However CTSDeltaLength SHOULD be set to zero (or not
   signaled) for streams that have a constant Access Unit duration
   (which can be explicitly signaled using the DurationFlag and
   AccessUnitDuration field of SLConfigDescriptor).

   OCRDeltaLength:
   The number of bits on which the OCRDelta field is encoded in RSLH.
   The default value is zero and indicates the absence of OCR for this
   stream. Since objectClockReference -if present- must be encoded as a
   difference to the RTP time stamp, the OCRDeltaLength parameter MUST
   be present in order to transport objectClockReferences with this
   payload format.

   SizeLength:
   The number of bits on which the PayloadSize field of a Payload
   Header is encoded. The default value is zero and indicates the
   "Single" mode (unless ConstantSize is present). Simultaneous
   presence of this parameter and ConstantSize is illegal. Either the
   SizeLength or ConstantSize parameter MUST be present in order to
   signal the "Multiple" mode of this payload format.

   ConstantSize:
   The constant size in octets of each AU or AU fragment Payload for
   this stream. The default value is zero and indicates variable AU or
   AU fragment Payload size (or the "Single" mode if SizeLength is
   absent). Simultaneous presence of this parameter and SizeLength is
   illegal. Either the SizeLength or ConstantSize parameter MUST be
   present in order to signal the "Multiple" mode of this payload
   format. When ConstantSize is present the PayloadSize field of the
   Payload Header in the RTP packets MUST NOT be present.
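
   The interplay of SizeLength and ConstantSize can be summarized by
   the following Python sketch. The function name is hypothetical, and
   an absent parameter is represented by its default value of zero, as
   stated at the start of section 4.

```python
def payload_mode(size_length: int = 0, constant_size: int = 0) -> str:
    """Return "Single" or "Multiple" from the two size parameters.

    Raises ValueError when both parameters are signaled, which the
    parameter definitions above declare illegal.
    """
    if size_length != 0 and constant_size != 0:
        raise ValueError("SizeLength and ConstantSize are mutually exclusive")
    if size_length != 0 or constant_size != 0:
        return "Multiple"
    return "Single"
```

   A receiver could run such a check while parsing the MIME parameters,
   before any RTP packet is interpreted, since the mode decides whether
   a PayloadSize field is expected in each Payload Header.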

   IndexLength:
   The number of bits on which the Index is encoded in the first
   Payload Header of an RTP packet. The default value is zero and
   indicates the absence of Index and IndexDelta for all Payload
   Headers. Since SL.packetSequenceNumber, if present, must be mapped
   in the Payload Header, the IndexLength parameter MUST be present in
   order to transport SL.packetSequenceNumber with this payload format.

   IndexDeltaLength:
   The number of bits on which the IndexDelta field is encoded in any
   non-first Payload Header. The default value is zero and indicates
   that the serial number MUST be incremented by one for each AU or AU
   fragment in the RTP packet (see section 3.5). The IndexDeltaLength
   parameter MUST be present when using interleaving with this payload
   format.

   RSLHSectionSizeLength:
   The number of bits that is used to encode the RSLHSectionSize field.
   The default value is zero and indicates the absence of the whole
   RSLHSection for all RTP packets of this stream.

   SLConfigDescriptor:
   A base-64 encoding of the SLConfigDescriptor. This SHALL be the
   original SLConfigDescriptor and it SHALL be the same as the one
   transported by the OD framework, if any.

   profile-level-id:
   A decimal representation of the MPEG-4 Profile Level indication
   value. For audio this parameter indicates which MPEG-4 Audio tool
   subsets are applied to encode the audio stream and is defined in
   ISO/IEC 14496-1 [1]. For video this parameter indicates which MPEG-4
   Visual tool subsets are applied to encode the video stream and is
   defined in Table G-1 of ISO/IEC 14496-2 [2]. This parameter MAY be
   used in the capability exchange or session setup procedure to
   indicate MPEG-4 Profile and Level combination of which the relevant
   MPEG-4 media codec is capable. If this parameter is not specified
   its default value is 1 (Simple Profile/Level 1) for video (for
   compatibility with RFC 3016) and otherwise 0xFE (defined in ISO/IEC
   14496-1 [1] as being the generic default value).

   config:
   A hexadecimal representation of an octet string that expresses the
   media payload configuration. Configuration data is mapped onto the
   octet string in an MSB-first basis. The first bit of the
   configuration data SHALL be located at the MSB of the first octet.
   In the last octet, zero-valued padding bits, if necessary, shall
   follow the configuration data. For audio streams, config is the
   audio object type specific decoder configuration data
   AudioSpecificConfig() as defined in ISO/IEC 14496-3 [3]. For video
   this expresses the MPEG-4 Visual configuration information, as
   defined in subclause 6.2.1 Start codes of ISO/IEC14496-2 [2] and the
   configuration information indicated by this parameter SHALL be the
   same as the configuration information in the corresponding MPEG-4
   Visual stream, except for first-half-vbv-occupancy and latter-half-
   vbv-occupancy, if it exists, which may vary in the repeated
   configuration information inside an MPEG-4 Visual stream (See 6.2.1
   Start codes of ISO/IEC14496-2).
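
   The MSB-first mapping of the config parameter can be illustrated
   with the following Python sketch. The helper names are hypothetical
   and the sketch only exposes the raw bits; interpreting them requires
   the syntax of the relevant configuration structure, which is not
   reproduced here.

```python
def config_to_bits(config_hex: str) -> str:
    """Expand the hexadecimal config string into a string of bits.

    The first character of the result is the MSB of the first octet,
    i.e. the first bit of the configuration data.
    """
    octets = bytes.fromhex(config_hex)
    return "".join(format(octet, "08b") for octet in octets)


def read_bits(bits: str, offset: int, count: int) -> int:
    """Read count bits starting at offset as an unsigned integer."""
    return int(bits[offset:offset + count], 2)


# The config value used in the SDP example of section 4.3.2.
bits = config_to_bits("7866E7E6EF")
```

   For this value the bit string is 40 bits long and starts with
   01111000, the binary form of the first octet 0x78.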

   StreamType:
   The integer value that indicates the type of MPEG-4 stream that is
   carried; its coding corresponds to the values of the streamType as
   defined for the DecoderConfigDescriptor in ISO/IEC 14496-1.

   Encoding considerations:
   System bitstreams MUST be generated according to MPEG-4 System
   specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
   according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
   bitstreams MUST be generated according to MPEG-4 Audio
   specifications (ISO/IEC 14496-3). If the Sync Layer is used, SL
   streams MUST be generated according to MPEG-4 Sync Layer
   specifications (ISO/IEC 14496-1 section 10); in that case the
   SLConfigDescriptor is required in order to read the RSLH parts of
   this format.
   These bitstreams are binary data and MUST be encoded for non-binary
   transport (for Email, the Base64 encoding is sufficient).  This type
   is also defined for transfer via RTP.  The RTP packets MUST be
   packetized according to the RTP payload format defined in RFC
   <self-reference-to-this>.

   Security considerations:
   As in RFC <self-reference-to-this>.

   Interoperability considerations:
   MPEG-4 provides a large and rich set of tools for the coding of
   visual objects.  For effective implementation of the standard,
   subsets of the MPEG-4 tool sets have been provided for use in
   specific applications. These subsets, called 'Profiles', limit the
   size of the tool set a decoder is required to implement. In order to
   restrict computational complexity, one or more 'Levels' are set for
   each Profile. A Profile@Level combination allows:
   . a codec builder to implement only the subset of the standard he
   needs, while maintaining interoperability with other MPEG-4 devices
   included in the same combination, and
   . checking whether MPEG-4 devices comply with the standard
   ('conformance testing').
   A stream SHALL be compliant with the MPEG-4 Profile@Level specified
   by the parameter "profile-level-id". Interoperability between a
   sender and a receiver may be achieved by specifying the parameter
   "profile-level-id" in MIME content, or by arranging in the
   capability exchange/announcement procedure to set this parameter
   mutually to the same value.

   Published specification:
   The specifications for MPEG-4 streams are presented in ISO/IEC
   14496-1, 14496-2, and 14496-3.  The RTP payload format is described
   in RFC <self-reference-to-this>.

   Applications that use this media type:
   Multimedia streaming and conferencing tools, Internet messaging and
   Email applications.

   Additional information: none

   Magic number(s): none

   File extension(s):
   None. A file format with the extension .mp4 has been defined for
   MPEG-4 content but is not directly correlated with this MIME type
   whose sole purpose is RTP transport.

   Macintosh File Type Code(s): none

   Person & email address to contact for further information:
   Authors of RFC <self-reference-to-this>.

   Intended usage: COMMON

   Author/Change controller:
   Authors of RFC <self-reference-to-this>.

4.2 Concatenation of parameters

   Multiple parameters SHOULD be expressed as a MIME media type string,
   in the form of a semicolon-separated list of parameter=value pairs
   (see examples below).

4.3 Usage of SDP

4.3.1 The a=fmtp keyword

   It is assumed that one typical way to transport the above-described
   parameters associated with this payload format is via an SDP [10]
   message for example transported to the client in reply to a RTSP
   [13] DESCRIBE message or via SAP [14]. In that case the (a=fmtp)
   keyword MUST be used as described in RFC 2327 [10, section 6]. The
   syntax being then:

   a=fmtp:<format> <parameter name>=<value>
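
   A receiver could turn such a line into a parameter table along the
   lines of the following Python sketch. The function name is
   hypothetical, the whole attribute is assumed to be on a single line
   with parameters separated by semicolons as in section 4.2, and
   values are kept as strings so that the caller applies the default of
   zero for absent numeric parameters.

```python
from typing import Dict, Tuple


def parse_fmtp(line: str) -> Tuple[int, Dict[str, str]]:
    """Split an a=fmtp line into (payload type, {name: value})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp attribute")
    payload_type, _, parameter_list = line[len(prefix):].partition(" ")
    parameters: Dict[str, str] = {}
    for item in parameter_list.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        name, _, value = item.partition("=")
        parameters[name.strip()] = value.strip()
    return int(payload_type), parameters


fmt, params = parse_fmtp(
    "a=fmtp:98 StreamType=5; SizeLength=12; profile-level-id=1; "
    "config=7866E7E6EF")
# Absent numeric parameters default to zero.
dts_delta_length = int(params.get("DTSDeltaLength", 0))
```

   With the audio line of the SDP example in section 4.3.2 this yields
   payload type 98, a SizeLength of "12" and, since DTSDeltaLength is
   absent, a DTSDeltaLength of zero.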

4.3.2 SDP example

   The following is an example of SDP syntax for the description of a
   session containing one MPEG-4 video, one MPEG-4 audio stream and
   three MPEG-4 system streams, the first one being BIFS, the second
   one OD and the third one IPMP. All are transported using this format
   and the AVP profile [12]. Note the usage of some MIME parameters:
   all streams display their StreamType; the video stream uses DTS with
   DTSDelta encoded on 4 bits; the audio stream uses the "Multiple"
   mode with 12 bits to describe the size of each AU or AU fragment
   payload. See the Appendix for more examples.

   o= ....
   i= ....
   c=IN IP4 123.234.71.112

   m=video 1034 RTP/AVP 97
   a=fmtp:97 StreamType=4;DTSDeltaLength=4
   a=rtpmap:97 mpeg4-generic

   m=audio 1810  RTP/AVP 98
   a=fmtp:98 StreamType=5; SizeLength=12; profile-level-id=1;
   config=7866E7E6EF
   a=rtpmap:98 mpeg4-generic

   m=application 1234  RTP/AVP 99
   a=rtpmap:99 mpeg4-generic
   a=fmtp:99 StreamType=3

   m=application 1236  RTP/AVP 99
   a=rtpmap:99 mpeg4-generic
   a=fmtp:99 StreamType=1

   m=application 1238  RTP/AVP 99
   a=rtpmap:99 mpeg4-generic
   a=fmtp:99 StreamType=7

5. Other issues

5.1 SL packetized stream reconstruction

   The purpose of this section is to document how a receiver can
   reconstruct a valid SL packetized stream. Since this format directly
   transports SL packets this reconstruction is performed by reversing
   the payload structure rules (section 3). We explicitly describe here
   the most complex transformations.

   In the following let (i) be the index of SL packets inside one RTP
   packet (starting at zero for each RTP packet), let SLPacketHeader.x
   denote field x of the reconstructed SL packet header, let
   PayloadHeader.x denote field x of the received PayloadHeader, etc.

   SLPacketHeader.packetSequenceNumber is restored from
   PayloadHeader.Index and PayloadHeader.IndexDelta using:

   if ( IndexLength == 0 ) { // or is absent
      if ( SLConfig.packetSeqNumLength == 0 ) {
          // this stream does not have SL packet sequence number
      }
      else {
          // illegal, normally the sender MUST map
          // SLPacketHeader.packetSequenceNumber in PayloadHeader
          // and set a relevant IndexLength value;
          // otherwise it is unfortunately impossible for the receiver
          // to reconstruct the correct sequence
      }
   }
   else { // IndexLength is not zero
      if ( SLConfig.packetSeqNumLength == 0 ) {
          // the original SL stream does not have SL packet
          // sequence numbers, typically the sender inserted them
          // in order to implement interleaving at the RTP level;
          // they must be ignored for SL stream reconstruction
      }
      else {
         if ( i == 0 ) { // first SL packet in RTP packet
           SLPacketHeader.packetSequenceNumber(0) =
              PayloadHeader.Index(0);
         }
         else { // remaining SL packets
           SLPacketHeader.packetSequenceNumber(i) =
              SLPacketHeader.packetSequenceNumber(i-1)
              + PayloadHeader.IndexDelta(i)
              + 1;
         }
      }
   }

   All time stamps (CTS, DTS, OCR), when present, are restored from the
   delta values. Time stamp flags (CTSFlag, DTSFlag) in PayloadHeader
   are used to reconstruct respectively the compositionTimeStampFlag
   and decodingTimeStampFlag of SLPacketHeader. The function
   corrected(x) for the RTP time stamp transformation is the mapping
   from 32 bits to SLConfig.timeStampLength, which may be smaller or
   larger than 32 bits:

   if ( timeStampLength < 32 ) { // short SL time stamps
      corrected(x) = LSB(x); // only the timeStampLength LSBits of x
   }
   else if ( timeStampLength > 32 ) { // long SL time stamps
      // m starts at 0 and accumulates RTP time stamp roll-overs
      if ( x(i) < x(i-1) ) { // 32 bits RTPTS roll over has occurred
          m += 2^32;
      }
      corrected(x) = x + m;
   }
   else { // timeStampLength == 32, recommended value
      corrected(x) = x; // direct mapping
   }
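
   The corrected(x) mapping can also be written as a small stateful
   helper. The Python sketch below is an illustration under stated
   assumptions: the class name is hypothetical, RTP time stamps are
   assumed to be presented in increasing order so that a smaller value
   always means a 32 bit roll-over, and the roll-over offset is applied
   before the corrected value is computed.

```python
class TimeStampCorrector:
    """Map 32 bit RTP time stamps to SLConfig.timeStampLength bits."""

    def __init__(self, time_stamp_length: int) -> None:
        self.time_stamp_length = time_stamp_length
        self.offset = 0        # "m" in the pseudo code
        self.previous = None   # last RTP time stamp seen

    def corrected(self, rtp_ts: int) -> int:
        if self.time_stamp_length < 32:
            # Keep only the timeStampLength least significant bits.
            return rtp_ts & ((1 << self.time_stamp_length) - 1)
        if self.time_stamp_length == 32:
            return rtp_ts  # direct mapping
        # Long SL time stamps: extend across RTP roll-overs.
        if self.previous is not None and rtp_ts < self.previous:
            self.offset += 1 << 32
        self.previous = rtp_ts
        return rtp_ts + self.offset
```

   For example, with a timeStampLength of 40, the time stamp 0x10
   received after 0xFFFFFFF0 is mapped to 0x100000010. Streams whose
   RTP time stamps are not monotonic (for instance after re-ordering)
   would need a more careful roll-over test than the one assumed here.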

   if ( CTSDeltaLength == 0) { // or CTSDeltaLength is absent
      // CTS is not transported for this RTP stream
      if (i == 0){ // first SL packet in RTP packet
         if ( SLConfig.useTimeStamps == 1 ) {
            if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
               SLPacketHeader.compositionTimeStampFlag(0) = 1;
               SLPacketHeader.compositionTimeStamp(0) =
                   corrected(RTP TimeStamp);
            }
            else {
               // ignore
            }
         }
         else {
             // empty
         }
      }
      else { // non-first SL packets in RTP packet
         if ( SLConfig.useTimeStamps == 1 ) {
             if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
                SLPacketHeader.compositionTimeStampFlag(i) = 0;
             }
             else {
                // ignore
             }
         }
         else {
             // empty
         }
      }
   }
   else { // CTSDeltaLength is not zero
      // CTS is transported for this stream
      if ( SLConfig.useTimeStamps == 1 ) {
         if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
             SLPacketHeader.compositionTimeStampFlag(i) =
                      PayloadHeader.CTSFlag(i);
             SLPacketHeader.compositionTimeStamp(i) =
                    corrected(RTP TimeStamp)
                    + PayloadHeader.CTSDelta(i);
         }
         else {
            // ignore CTSFlag (which must be zero)
         }
      }
      else {
         // this is strange and sub-optimal at best
         // a receiver should ignore this
      }
   }

   if ( DTSDeltaLength == 0) { // or DTSDeltaLength is absent
      // DTS is not transported for this stream
      if ( SLConfig.useTimeStamps == 1 ) {
         if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
             SLPacketHeader.decodingTimeStampFlag(i) = 0;
         }
         else {
             // ignore
         }
      }
      else {
          // empty
      }
   }
   else {
      // DTS is transported for this stream
      if ( SLConfig.useTimeStamps == 1 ) {
         if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
              SLPacketHeader.decodingTimeStampFlag(i) =
                  PayloadHeader.DTSFlag(i);
              SLPacketHeader.decodingTimeStamp(i) =
                  SLPacketHeader.compositionTimeStamp(i)
                  - PayloadHeader.DTSDelta(i); // DTS <= CTS always
         }
         else {
             // ignore DTSFlag (which must be zero)
         }
      }
      else {
         // this is strange and sub-optimal at best
         // a receiver should ignore this
      }
   }

   if ( OCRDeltaLength == 0) { // or OCRDeltaLength is absent
      // the RTP stream does not transport any OCR
      if ( SLConfig.OCRLength == 0 ) {
          // this stream does not have any OCR
      }
      else {
          // illegal, normally the sender MUST detect
          // OCRs, replace them with OCRDelta and set
          // a relevant OCRDeltaLength value
      }
   }
   else {
      if ( SLConfig.OCRLength == 0 ) {
         // this is strange and sub-optimal at best
         // a receiver should ignore this
      }
      else {
          SLPacketHeader.OCRflag(i) = RSLH.OCRFlag(i);
          if ( SLPacketHeader.OCRflag(i) == 1) {
               SLPacketHeader.objectClockReference(i) =
                    corrected(RTP TimeStamp) + RSLH.OCRDelta(i);
          }
      }
   }

   In the "Single" mode the AccessUnitEndFlag, if needed, is restored
   from the M bit, as follows:

   if ( SLConfig.useAccessUnitEndFlag == 0 ) {
       // this SL stream does not signal access unit ends
   }
   else {
       SLPacketHeader.AccessUnitEndFlag = M bit;
   }

   In the "Multiple" mode the AccessUnitEndFlag is untouched in RSLH.

   The other SL packet header fields SHALL remain as found in RSLH.

   It is obvious that in the general case the reconstruction of the
   original SL packetized stream requires SL-awareness. However this
   payload format allows in all cases a receiver that does not know
   about the SL syntax to reconstruct the semantics of SL Elementary
   Streams for the following very useful features:
   - Packet order (decoding order)
   - Access Unit boundaries (using the M bit)
   - Access Unit fragments (fragment boundaries using PayloadSize)
   - Composition Time Stamps, according to:
      compositionTimeStamp(i) = RTP TimeStamp + CTSDelta(i);
   - Decoding Time Stamps, according to:
      decodingTimeStamp(i) = compositionTimeStamp(i) - DTSDelta(i);
   - Packet serial number, according to:
      if (i == 0) { // first SL packet in RTP packet
         packet serial number(0) = Index(0);
      }
      else { // remaining SL packets
         packet serial number(i) = packet serial number(i-1)
             + IndexDelta(i) + 1;
      }
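
   As a rough illustration of the rules listed above, the following
   Python sketch rebuilds the time stamps and packet serial numbers of
   the SL packets found in one RTP packet without parsing any SL
   header. The function name and the dictionary keys are hypothetical,
   all values are assumed to be expressed in RTP clock ticks already,
   and counter wrap-around is ignored.

```python
def restore_semantics(rtp_timestamp, entries):
    """Rebuild CTS, DTS and serial number for each SL packet.

    entries holds one dict per SL packet, in packet order. The keys
    'cts_delta' and 'dts_delta' default to 0 when absent; the first
    entry carries 'index', the remaining ones carry 'index_delta'.
    These key names are illustrative only.
    """
    restored = []
    serial = None
    for i, entry in enumerate(entries):
        cts = rtp_timestamp + entry.get("cts_delta", 0)
        dts = cts - entry.get("dts_delta", 0)  # DTS <= CTS always
        if i == 0:
            serial = entry["index"]
        else:
            serial = serial + entry["index_delta"] + 1
        restored.append({"cts": cts, "dts": dts, "serial": serial})
    return restored


packets = restore_semantics(
    9000,
    [
        {"index": 4},
        {"index_delta": 0, "cts_delta": 3000, "dts_delta": 1500},
        {"index_delta": 2, "cts_delta": 6000},
    ],
)
```

   With these inputs the second SL packet is assigned serial number 5
   and the third one serial number 8, since an IndexDelta of 2 means
   that two packets were skipped in between.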

5.2 Handling of scene description streams

   MPEG-4 introduces new stream types as described in section 1 namely
   Object Descriptors and BIFS. In the following both OD and BIFS are
   discussed on the same basis i.e. as "scene description".

   Considering scene description as a "stream-able" type of content is
   a rather new concept and for that reason some specific comments are
   needed.

   Typically scene descriptions are encoded in such a way that
   information loss would in the general case cripple the presentation
   beyond any hope of repair by the receiver. Still this is well suited
   for a number of multimedia applications where the scene is first
   made available via reliable channels to the client and then played.
   This payload format is not intended for this type of application,
   for which download of MPEG-4 interchange (.mp4) files is typical.
   However this payload format can also be used. It is then RECOMMENDED
   that the RTP packets be transported using TCP (for example inside
   RTSP as described in [13, section 10.12]) or any other reliable
   protocol.

   On the other hand MPEG-4 has introduced the possibility to
   dynamically change the scene description by sending animation
   information (changes in parameters) and structural change
   information (updates). Since this information has to be sent in a
   timely fashion MPEG-4 has defined a number of techniques in order to
   encode the scene description in a manner that makes it behave
   similarly to other temporal encoding schemes such as audio and
   video. This payload format is intended for this usage.

   Note that in many cases the application will consist of first the
   reliable transmission of a static initial scene followed by the
   streaming of animations and updates. For this reason the usage of
   this payload format is attractive since it offers a unique solution.

   Senders must be aware that suitable schemes should be used when
   scene description streams transport sensitive configuration
   information. For example, if the RTP packet transporting an
   OD-update command is lost, the corresponding media stream will not
   be accessible by the receiver.

   Redundancy is one possibility; it may be added by tools operating
   above this payload format, e.g. packet based FEC, re-transmission,
   or similar tools. In such a case, the general congestion control
   principles have to be observed.

   Since BIFS and OD streams may be modified during the session with
   update commands, there is a need to send both update commands and
   full BIFS/OD refresh. For that reason MPEG-4 defines Random Access
   Points (RAP) for scene description streams (OD and BIFS) where by
   definition a decoder can restart decoding i.e. receives a "full
   update" of the scene. This mechanism is called Scene and Object
   Description Carousel. The AU Sequence Number field of SL Packet
   Header is used to support this behavior at the Sync Layer. When two
   access units are sent consecutively with the same AU Sequence
   Number, the second one is assumed to be a semantic repetition of the
   first. If a receiver starts to listen in the middle of a session or
   has detected losses, it can skip all received Access Units until
   such a RAP. The periodicity of transmission of these RAPs should be
   chosen/adjusted depending on the application and the network it is
   deployed on; i.e. exactly like Intra-coded frames for video, it is
   the responsibility of the sender to make sure the periodicity of
   RAPs is suitable.
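
   The receiver behavior described above can be sketched as follows in
   Python. This is only an illustration under stated assumptions: the
   AccessUnit tuple and its field names are hypothetical, and
   re-synchronization after a detected loss is not shown.

```python
from collections import namedtuple

# Hypothetical view of a received access unit; names are illustrative.
AccessUnit = namedtuple("AccessUnit", "au_seq_num is_rap payload")


def filter_carousel(access_units):
    """Yield the access units that a decoder should process.

    Access units are skipped until the first Random Access Point. Once
    decoding has started, an access unit carrying the same AU Sequence
    Number as the previous one is treated as a repetition and dropped.
    """
    synchronized = False
    last_seq = None
    for au in access_units:
        if not synchronized:
            if not au.is_rap:
                continue  # still waiting for a full update
            synchronized = True
        elif au.au_seq_num == last_seq:
            continue  # semantic repetition of the previous access unit
        last_seq = au.au_seq_num
        yield au


received = [
    AccessUnit(7, False, b"update"),  # joined mid-session: skipped
    AccessUnit(8, True, b"scene"),    # RAP: decoding starts here
    AccessUnit(8, True, b"scene"),    # carousel repetition: dropped
    AccessUnit(9, False, b"update"),
]
decoded = [au.au_seq_num for au in filter_carousel(received)]
```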

5.3 Multiplexing

   An advanced MPEG-4 session may involve a large number of objects,
   possibly as many as a few hundred, so transporting each ES as an
   individual RTP stream may not always be practical. Allocating and
   controlling hundreds of destination addresses for each MPEG-4
   session may pose insurmountable session administration problems.
   The input/output processing overhead at the end-points will also be
   extremely high. Additionally, low delay transmission of low
   bitrate data streams, e.g. facial animation parameters, results in
   extremely high header overheads.

   To solve these problems, MPEG-4 data transport requires a
   multiplexing scheme that allows selective bundling of several ESs.
   This is beyond the scope of the payload format defined here.

   MPEG-4's FlexMux multiplexing scheme may be used for this
   purpose and a specific RTP payload format is being developed [11].

   Another approach may be to develop a generic RTP multiplexing scheme
   usable for MPEG-4 data. The multiplexing scheme reported in [8] may
   be a candidate for this approach.

   For MPEG-4 applications, the multiplexing technique needs to address
   the following requirements:

   i. The ESs multiplexed in one stream can change frequently during a
   session. Consequently, the coding type, individual packet size and
   temporal relationships between the multiplexed data units must be
   handled dynamically.

   ii. The multiplexing scheme should have a mechanism to determine the
   ES identifier (ES_ID) for each of the multiplexed packets. ES_ID is
   not a part of the SL header.

   iii. In general, an SL packet does not contain information about its
   size. The multiplexing scheme should be able to delineate the
   multiplexed packets whose lengths may vary from a few octets to
   close to the path-MTU.

5.5 Overlap with RFC 3016

   This payload format has been designed to have a (large) overlap with
   RFC 3016 [7]. The conditions for this overlap are:
   Conditions for RFC 3016:
   i. MPEG-4 video elementary streams only
   ii. There MUST be a single VOP or Video Packet per RTP packet (only
   recommended in RFC 3016)
   iii. The decoder configuration MUST be signaled out-of-band either
   using the Config mime parameter or using the OD framework
   Conditions for this payload format:
   i. No structural parameters defined (or all set to zero), i.e.
   "Single" mode with empty Payload Header and empty RSLH.
   ii. Receivers MUST be ready to accept (and ignore) video
   configuration headers (e.g. VOSH, VO and VOL) and visual-object-
   sequence-end-code transported in-band.

6. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [5]. This implies that confidentiality of the media
   streams is achieved by encryption. Because the data compression used
   with this payload format is applied end-to-end, encryption may be
   performed on the compressed data so there is no conflict between the
   two operations. The packet processing complexity of this payload
   type (i.e. excluding media data processing) does not exhibit any
   significant non-uniformity in the receiver side to cause a denial-
   of-service threat.

   However, it is possible to inject non-compliant MPEG streams (Audio,
   Video, and Systems) to overload the receiver/decoder's buffers which
   might compromise the functionality of the receiver or even crash it.
   This is especially true for end-to-end systems like MPEG where the
   buffer models are precisely defined.

   MPEG-4 Systems supports stream types including commands that are
   executed on the terminal like OD commands, BIFS commands, etc. and
   programmatic content like MPEG-J (Java(TM) Byte Code) and
   ECMAScript. It is possible to use one or more of the above in a
   manner non-compliant to MPEG to crash or temporarily make the
   receiver unavailable.

   Authentication mechanisms can be used to validate the sender and
   the data to prevent security problems due to non-compliant or
   malicious MPEG-4 streams.

   A security model is defined for MPEG-4 Systems streams carrying
   MPEG-J access units, which comprise Java(TM) classes and objects.
   MPEG-J defines a set of Java APIs and a secure execution model.  MPEG-J
   content can call this set of APIs and Java(TM) methods from a set of
   Java packages supported in the receiver within the defined security
   model. According to this security model, downloaded byte code is
   forbidden to load libraries, define native methods, start programs,
   read or write files, or read system properties.

   Receivers can implement intelligent filters to validate the buffer
   requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
   ECMAScript) commands in the streams. However, this can increase the
   complexity significantly.

7. Acknowledgements

   This document evolved across several years thanks to contributions
   from a large number of people since it is based on work within the
   IETF AVT working group and various ISO MPEG working groups,
   especially the 4-on-IP ad-hoc group. The authors wish to thank
   Olivier Avaro, Stephen Casner, Guido Fransceschini, Art Howarth,
   Dave Mackie, Dave Singer, and Stephan Wenger for their valuable
   comments and support. Attentive readers and early implementers also
   found flaws and bugs; thank you all.

8. References

   [1] ISO/IEC 14496-1:2001 MPEG-4 Systems

   [2] ISO/IEC 14496-2:2001 MPEG-4 Visual

   [3] ISO/IEC 14496-3:2001 MPEG-4 Audio

   [4] ISO/IEC 14496-6:2001 Delivery Multimedia Integration Framework.

   [5] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: A
   Transport Protocol for Real Time Applications, RFC 1889, Internet
   Engineering Task Force, January 1996.

   [6] S. Bradner, Key words for use in RFCs to Indicate Requirement
   Levels, RFC 2119, Internet Engineering Task Force, March 1997.

   [7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
   payload format for MPEG-4 Audio/Visual streams, Internet Engineering
   Task Force, RFC 3016.

   [8] B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed
   RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-04.txt, July
   2001.

   [9] D. Singer, Y. Lim, A Framework for the delivery of MPEG-4 over
   IP-based Protocols, work in progress, draft-singer-mpeg4-ip-02.txt,
   May 2001.

   [10] M. Handley, V. Jacobson, SDP: Session Description Protocol, RFC
   2327, Internet Engineering Task Force, April 1998.

   [11] C. Roux et al., RTP Payload Format for MPEG-4 FlexMultiplexed
   Streams, work in progress, draft-curet-avt-rtp-mpeg4-flexmux-00.txt,
   February 2001.

   [12] H. Schulzrinne, RTP Profile for Audio and Video Conferences
   with Minimal Control, RFC 1890, Internet Engineering Task Force,
   January 1996.

   [13] H. Schulzrinne, A. Rao, R. Lanphier, Real Time Streaming
   Protocol, RFC 2326, Internet Engineering Task Force, April 1998.

   [14] M. Handley, C. Perkins, E. Whelan, Session Announcement
   Protocol, RFC 2974, Internet Engineering Task Force, October 2000.

9. Authors' Addresses

   Andrea Basso
   AT&T Labs Research
   200 Laurel Avenue
   Middletown, NJ 07748
   USA
   e-mail: basso@research.att.com

   M. Reha Civanlar
   AT&T Labs - Research
   200 Laurel Ave. South, A5 4D04
   Middletown, NJ 07748
   USA
   e-mail: civanlar@research.att.com

   Philippe Gentric
   Philips Digital Networks, MP4Net
   51 rue Carnot
   92156 Suresnes
   France
   e-mail: philippe.gentric@philips.com

   Carsten Herpel
   THOMSON multimedia
   Karl-Wiechert-Allee 74
   30625 Hannover
   Germany
   e-mail: herpelc@thmulti.com

   Zvi Lifshitz
   Optibase Ltd.
   7 Shenkar St.
   Herzliya 46120
   Israel
   e-mail: zvil@optibase.com

   Young-kwon Lim
   mp4cast (MPEG-4 Internet Broadcasting Solution Consortium)
   1001-1 Daechi-Dong Gangnam-Gu
   Seoul, 305-333,
   Korea
   e-mail : young@techway.co.kr

   Colin Perkins
   USC Information Sciences Institute
   3811 N. Fairfax Drive suite 200
   Arlington, VA 22203
   USA
   e-mail : csp@isi.edu

   Jan van der Meer
   Philips Digital Networks
   Building WDB-1
   Prof Holstlaan 4
   5656 AA Eindhoven
   Netherlands
   e-mail : jan.vandermeer@philips.com

APPENDIX: Examples of usage

   This payload format has been designed to transport efficiently a
   very versatile packetization scheme: the MPEG-4 Sync Layer; as a
   result its complexity is larger than that of the average RTP
   payload format. For this reason this section describes a number of
   key examples of how this payload format can be used, either with or
   without the Sync Layer. In all examples, however, the Sync Layer
   syntax is given, which shows how it becomes invisible in cases 1, 3,
   4 and 5.

   A C++-like syntax called SDL (Syntactic Description Language)
   defined in [1, section 14] is used to economically describe MPEG-4
   system data structures.

   These examples assume that the (a=fmtp) SDP syntax is used to convey
   the MIME parameters of the payload format.

Appendix.1 RFC 3016 compatible MPEG-4 Video (no SL)

   This is an example of a video stream where the SL is configured to
   produce RTP packets compatible with RFC 3016.

SLConfigDescriptor

   In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
    bit(8) predefined;
    if (predefined==0) {
     bit(1) useAccessUnitStartFlag; = 0
     bit(1) useAccessUnitEndFlag; = 1
     bit(1) useRandomAccessPointFlag; = 0
     bit(1) hasRandomAccessUnitsOnlyFlag; = 0
     bit(1) usePaddingFlag; = 0
     bit(1) useTimeStampsFlag; = 0
     bit(1) useIdleFlag; = 0
     bit(1) durationFlag; = 0
     bit(32) timeStampResolution; = 0
     bit(32) OCRResolution; = 0
     bit(8) timeStampLength; = 0
     bit(8) OCRLength; = 0
     bit(8) AU_Length; = 0
     bit(8) instantBitrateLength; = 0
     bit(4) degradationPriorityLength; = 0
     bit(5) AU_seqNumLength; = 0
     bit(5) packetSeqNumLength; = 0
     bit(2) reserved=0b11;
    }
    if (durationFlag) {
     bit(32) timeScale; // NOT USED
     bit(16) accessUnitDuration;  // NOT USED
     bit(16) compositionUnitDuration;  // NOT USED
    }
    if (!useTimeStampsFlag) {
     bit(timeStampLength) startDecodingTimeStamp; = 0
     bit(timeStampLength) startCompositionTimeStamp; = 0
    }
   }

SL Packet Header structure

   With this configuration we have the following SL packet header
   structure:

   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
    if (SL.useAccessUnitEndFlag) {
     bit(1) accessUnitEndFlag; // 1 bit
    }
   }

   In this case this payload format produces RTP packets that are
   exactly conformant to RFC 3016 and the SL is reduced to a purely
   logical construction that neither sender nor receiver need to
   implement.

Parameters

   This configuration is the default one; no parameters are required.

RTP packet structure

   Note that accessUnitEndFlag is mapped to the RTP header M bit.

   +=========================================+=============+
   | Field                                   |  size       |
   +=========================================+=============+
   | RTP header                              |    -        |
   +-----------------------------------------+-------------+
   | Access Unit or AU fragment              | 1400 octets |
   +-----------------------------------------+-------------+

Overhead

   In this example we have an RTP overhead of 40 octets for 1400 octets
   of payload, i.e. 3 % overhead.

Appendix.2 MPEG-4 Video with SL

   Let us consider the case of a 30 frames per second MPEG-4 video
   stream whose bit rate is high enough that Access Units have to be
   split in several SL packets (typically above 300 kb/s).

   Let us assume also that the video codec generates in that case Video
   Packets suitable to fit in one SL packet, i.e. that the video codec
   is MTU aware and the MTU is 1500 octets. We assume furthermore that
   this stream contains B frames and that decodingTimeStamps are
   present.

SLConfigDescriptor

   In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
    bit(8) predefined;
    if (predefined==0) {
     bit(1) useAccessUnitStartFlag; = 1
     bit(1) useAccessUnitEndFlag; = 0
     bit(1) useRandomAccessPointFlag; = 1
     bit(1) hasRandomAccessUnitsOnlyFlag; = 0
     bit(1) usePaddingFlag; = 0
     bit(1) useTimeStampsFlag; = 1
     bit(1) useIdleFlag; = 0
     bit(1) durationFlag; = 0
     bit(32) timeStampResolution; = 30
     bit(32) OCRResolution; = 0
     bit(8) timeStampLength; = 32
     bit(8) OCRLength; = 0
     bit(8) AU_Length; = 0
     bit(8) instantBitrateLength; = 0
     bit(4) degradationPriorityLength; = 0
     bit(5) AU_seqNumLength; = 0
     bit(5) packetSeqNumLength; = 0
     bit(2) reserved=0b11;
    }
    if (durationFlag) {
     bit(32) timeScale; // NOT USED
     bit(16) accessUnitDuration;  // NOT USED
     bit(16) compositionUnitDuration;  // NOT USED
    }
    if (!useTimeStampsFlag) {
     bit(timeStampLength) startDecodingTimeStamp; // NOT USED
     bit(timeStampLength) startCompositionTimeStamp; // NOT USED
    }
   }

   The useRandomAccessPointFlag is set so that the
   randomAccessPointFlag can indicate that the corresponding SL packet
   contains a GOV and the first Video Packet of an Intra coded frame.

SL Packet Header structure

   With this configuration we have the following SL packet header
   structure:

   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
    bit(1) accessUnitStartFlag; // 1 bit
    if (accessUnitStartFlag) {
      bit(1) randomAccessPointFlag; // 1 bit
      bit(1) decodingTimeStampFlag; // 1 bit
      bit(1) compositionTimeStampFlag; // 1 bit
      if (decodingTimeStampFlag) {
         bit(SL.timeStampLength) decodingTimeStamp;
      }
      if (compositionTimeStampFlag) {
         bit(SL.timeStampLength) compositionTimeStamp;
      }
    }
   }

Parameters

   decodingTimeStamps are encoded on 32 bits, which is much more than
   needed for delta. Therefore the sender will use DTSDeltaLength to
   signal that only 7 bits are used for the coding of relative DTS in
   the RTP packet.

   The RSLHSectionSize cannot exceed 4 (bits), which is encoded on 3
   bits and signaled by RSLHSectionSizeLength. The resulting
   concatenated fmtp line is:

   a=fmtp:<format> DTSDeltaLength=7;RSLHSectionSizeLength=3
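
   A receiver has to turn such a line into parameter values before it
   can parse the payload. The following Python sketch shows one way to
   do so; the function name is hypothetical, the payload type 96 is
   only an example value for <format>, and no validation of parameter
   names is attempted.

```python
def parse_fmtp(line):
    """Split an 'a=fmtp:' line into (format, {parameter: value})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute")
    fmt, _, params = line[len(prefix):].partition(" ")
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        value = value.strip()
        # Length parameters are decimal integers; keep anything else
        # as text.
        result[name.strip()] = int(value) if value.isdigit() else value
    return fmt, result


fmt, params = parse_fmtp(
    "a=fmtp:96 DTSDeltaLength=7;RSLHSectionSizeLength=3"
)
```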

RTP packet structure

   Two cases can occur; for packets that transport first fragments of
   Access Units we have:

   +=========================================+=============+
   | Field                                   |  size       |
   +=========================================+=============+
   | RTP header                              |    -        |
   +-----------------------------------------+-------------+
   | DTSFlag = (1)                           |  1 bit      |
   +-----------------------------------------+-------------+
   | DTSDelta                                |  7 bits     |
   +-----------------------------------------+-------------+
   | bits to octet alignment                 |  0 bits     |
   +-----------------------------------------+-------------+
   | RSLHSectionSize = (100)                 |  3 bits     |
   +-----------------------------------------+-------------+
   | accessUnitStartFlag = (1)               |  1 bit      |
   +-----------------------------------------+-------------+
   | randomAccessPointFlag                   |  1 bit      |
   +-----------------------------------------+-------------+
   | decodingTimeStampFlag                   |  1 bit      |
   +-----------------------------------------+-------------+
   | compositionTimeStampFlag                |  1 bit      |
   +-----------------------------------------+-------------+
   | bits to octet alignment = (0)           |  1 bit      |
   +-----------------------------------------+-------------+
   | SL packet payload                       |  N octets   |
   +-----------------------------------------+-------------+

   For packets that transport non-first fragments of Access Units we
   have:

   +=========================================+=============+
   | Field                                   |  size       |
   +=========================================+=============+
   | RTP header                              |    -        |
   +-----------------------------------------+-------------+
   | DTSFlag = (0)                           |  1 bit      |
   +-----------------------------------------+-------------+
   | bits to octet alignment = (0000000)     |  7 bits     |
   +-----------------------------------------+-------------+
   | RSLHSectionSize = (001)                 |  3 bits     |
   +-----------------------------------------+-------------+
   | accessUnitStartFlag = (0)               |  1 bit      |
   +-----------------------------------------+-------------+
   | bits to octet alignment = (0000)        |  4 bits     |
   +-----------------------------------------+-------------+
   | SL packet payload                       |  N octets   |
   +-----------------------------------------+-------------+
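
   The two layouts above can be reproduced with a few lines of Python.
   This is a sketch under the assumptions of this example only
   (DTSDeltaLength=7, RSLHSectionSizeLength=3, each section padded
   with zero bits to an octet boundary); the function names are
   hypothetical.

```python
def pack_bits(fields):
    """Pack (value, width) pairs MSB first, zero-padded to an octet."""
    bits = ""
    for value, width in fields:
        if value < 0 or value >> width:
            raise ValueError("value does not fit in the given width")
        bits += format(value, "0{}b".format(width))
    bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def first_fragment_header(dts_delta, rap, dts_flag, cts_flag):
    """Header octets for a packet carrying the first fragment of an AU."""
    timing = pack_bits([(1, 1), (dts_delta, 7)])      # DTSFlag, DTSDelta
    rslh = pack_bits([(4, 3),                         # RSLHSectionSize
                      (1, 1), (rap, 1),               # AU start, RAP
                      (dts_flag, 1), (cts_flag, 1)])  # time stamp flags
    return timing + rslh


def next_fragment_header():
    """Header octets for a packet carrying a non-first fragment."""
    timing = pack_bits([(0, 1)])                      # DTSFlag = 0
    rslh = pack_bits([(1, 3), (0, 1)])                # size 1, AU start 0
    return timing + rslh


first = first_fragment_header(dts_delta=5, rap=1, dts_flag=1, cts_flag=1)
later = next_fragment_header()
```

   Both headers occupy two octets, which is the "+ 2" term of the
   overhead estimation for this example.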

Overhead estimation

   In this example we have an RTP overhead of 40 + 2 octets for 1400
   octets of payload, i.e. 3 % overhead.

Appendix.3 Low delay MPEG-4 Audio (no SL)

   This example is for a low delay audio service. For this reason a
   single Access Unit is transported in each RTP packet (in terms of
   Sync Layer each SL packet contains a complete Access Unit).

SLConfigDescriptor

   Since CTS=DTS and Access Unit duration is constant, signaling of
   MPEG-4 time stamps is not needed (the durationFlag of SLConfig is
   set).

   We also assume here an audio Object Type for which all Access Units
   are Random Access Points, which is signaled using the
   hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor.

   We assume furthermore a mode where the Access Unit size is constant
   and equal to 5 octets (which is signaled with AU_Length).

   In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
    bit(8) predefined;
    if (predefined==0) {
     bit(1) useAccessUnitStartFlag; = 0
     bit(1) useAccessUnitEndFlag; = 0
     bit(1) useRandomAccessPointFlag; = 0
     bit(1) hasRandomAccessUnitsOnlyFlag; = 1
     bit(1) usePaddingFlag; = 0
     bit(1) useTimeStampsFlag; = 0
     bit(1) useIdleFlag; = 0
     bit(1) durationFlag; = 1 // signals constant AU duration
     bit(32) timeStampResolution; = 0
     bit(32) OCRResolution; = 0
     bit(8) timeStampLength; = 0
     bit(8) OCRLength; = 0
     bit(8) AU_Length; = 5
     bit(8) instantBitrateLength; = 0
     bit(4) degradationPriorityLength; = 0
     bit(5) AU_seqNumLength; = 0
     bit(5) packetSeqNumLength; = 0
     bit(2) reserved=0b11;
    }
    if (durationFlag) {
     bit(32) timeScale; = 1000 // for milliseconds
     bit(16) accessUnitDuration; = 10 // ms
     bit(16) compositionUnitDuration; = 10 // ms
    }
    if (!useTimeStampsFlag) {
     bit(timeStampLength) startDecodingTimeStamp; = 0
     bit(timeStampLength) startCompositionTimeStamp; = 0
    }
   }

SL packet header

   With this configuration the SL packet header is empty. The Sync
   Layer is reduced to a purely logical construction that neither
   sender nor receiver need to implement.

Parameters

   No parameters are required.

RTP packet structure

   Note that the RTP header M bit should always be set to 1.

   +=========================================+=============+
   | Field                                   |  size       |
   +=========================================+=============+
   | RTP header                              |    -        |
   +-----------------------------------------+-------------+
   | Access Unit                             |  5 octets   |
   +-----------------------------------------+-------------+

Overhead estimation

   The overhead is extremely large, i.e. 800 %, since 40 octets of
   headers are required to transport 5 octets of data. Note
   however that RTP header compression would work well since time
   stamp increments are constant.
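
   The arithmetic behind this figure is simple enough to check with a
   short Python sketch. The function name is hypothetical, and the 40
   header octets are assumed here to be the IPv4 (20), UDP (8) and RTP
   (12) headers without any header compression.

```python
def header_overhead(header_octets, payload_octets):
    """Return the header overhead as a percentage of the payload size."""
    return 100.0 * header_octets / payload_octets


# One 5-octet Access Unit per RTP packet, as in this example.
low_delay = header_overhead(40, 5)

# A 1400-octet payload, given for comparison.
bulk = header_overhead(40, 1400)
```

   The first value is 800 % while the second one stays below 3 %,
   which is why low delay transport of very small Access Units is so
   costly.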

Appendix.4 Media delivery MPEG-4 Audio (no SL)

   This example is for a media delivery service where delay is not an
   issue but efficiency is. In this case several Access Units are
   transported in each RTP packet.

SLConfigDescriptor

   Similar to previous example.

SL packet header

   With this configuration the SL packet header is empty. The Sync
   Layer is reduced to a purely logical construction that neither
   sender nor receiver need to implement.

Parameters

   The absence of RSLHSectionSizeLength indicates that the RSLHSection
   is empty.

   The size of SL Packets (which are all complete Access Units in this
   case) is constant and is indicated with:

   a=fmtp:<format> ConstantSize=5

   This also indicates to the receiver that the Multiple mode will be
   used. The 2-octet field that would give the size of the
   PayloadHeaderSection is omitted, since in this case this field
   would always contain zero (the PayloadHeaderSection is always empty
   due to the absence of any other MIME parameter).

RTP packet structure

   Note that the RTP header M bit is always set to 1, which indicates
   to the receiver that only complete Access Units are transported.

   +=========================================+=============+
   | Field                                   |  size       |
   +=========================================+=============+
   | RTP header                              |    -        |
   +-----------------------------------------+-------------+
   | Access Unit data                        |  5 octets   |
   +-----------------------------------------+-------------+
   | Access Unit data                        |  5 octets   |
   +-----------------------------------------+-------------+
   | etc, until MTU is reached                             |
   +-----------------------------------------+-------------+
   | Access Unit data                        |  5 octets   |
   +-----------------------------------------+-------------+

Overhead estimation

   The overhead is 3% i.e. minimal.
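
   The two overhead figures (one Access Unit per packet versus as many
   as fit) can be reproduced with a short calculation. This is only a
   sketch: the 1500-octet MTU is an assumption borrowed from the later
   examples, the 40 octets of headers are taken from the previous
   example, and the function names are hypothetical.

```python
def rtp_overhead(au_size, aus_per_packet, header_octets=40):
    """Return header octets divided by payload octets for one RTP packet."""
    payload_octets = au_size * aus_per_packet
    return header_octets / payload_octets


def max_aus_per_packet(au_size, mtu=1500, header_octets=40):
    """Number of fixed-size Access Units that fit under the assumed MTU."""
    return (mtu - header_octets) // au_size


# One 5-octet Access Unit per packet (previous example).
single = rtp_overhead(au_size=5, aus_per_packet=1)

# As many 5-octet Access Units as fit in an assumed 1500-octet MTU.
count = max_aus_per_packet(au_size=5)
packed = rtp_overhead(au_size=5, aus_per_packet=count)

print(f"single AU: {single:.0%}, {count} AUs per packet: {packed:.1%}")
```

   Under these assumptions the packed case comes out slightly below
   3 %, which is consistent with the rounded figure given above.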

Appendix.5 AAC with interleaving (no SL)

   Let us consider AAC at 128 kb/s where each Access Unit is on
   average 320 octets. Interleaving is applied with a continuous
   interleaving scheme (see table below) where 4 Access Units are
   used to construct each RTP packet in order to match an MTU of 1500
   octets.

   IndexDelta is constant and equal to 2 (since +1 is automatically
   added); it is encoded on 2 bits.

   As explained in section 3.8 this is a time stamp based interleaving
   (TSBI) scheme (IndexLength=0); indeed receivers know that each SL packet
   payload is a complete Access Unit because all RTP packets have the M
   bit set to 1 and therefore, since Access Unit duration is constant,
   Access Unit timestamps can be computed from RTP timestamps and
   IndexDelta values; this can be used for de-interleaving even in case
   of losses.
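
   The timestamp computation described above can be sketched as
   follows. The function name is hypothetical, and the Access Unit
   duration of 1024 ticks used in the usage line is an assumption
   (one AAC frame of 1024 samples with an RTP clock equal to the
   sampling rate); it is not stated in this example.

```python
def au_timestamps(rtp_timestamp, index_deltas, au_duration):
    """Return one timestamp per Access Unit carried in an RTP packet.

    rtp_timestamp applies to the first Access Unit. index_deltas holds
    the IndexDelta value of every following Access Unit; the real gap,
    counted in Access Units, is IndexDelta + 1.
    """
    timestamps = [rtp_timestamp]
    for delta in index_deltas:
        timestamps.append(timestamps[-1] + (delta + 1) * au_duration)
    return timestamps


# A packet with four Access Units and IndexDelta = 2 for the last three.
stamps = au_timestamps(rtp_timestamp=4096, index_deltas=[2, 2, 2],
                       au_duration=1024)
print(stamps)
```

   Because each timestamp depends only on the packet that carries the
   Access Unit, a receiver can place Access Units in time order even
   when neighbouring packets are lost.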

   Note that it would also be possible to use IndexLength=2 so as to
   maintain octet alignment in the Payload Header portions; in this
   case however the value of these two bits MUST be zero as stated in
   3.8.1.

   +-----------------------------------------------------------------+
   | RTP packet | RTP Timestamp |    AUs          |    IndexDelta    |
   +-----------------------------------------------------------------+
   |    1       |   CTS(AU1)    |             1   |  -               |
   +-----------------------------------------------------------------+
   |    2       |   CTS(AU2)    |          2, 5   |  -,2             |
   +-----------------------------------------------------------------+
   |    3       |   CTS(AU3)    |       3, 6, 9   |  -,2,2           |
   +-----------------------------------------------------------------+
   |    4       |   CTS(AU4)    |    4, 7,10,13   |  -,2,2,2         |
   +-----------------------------------------------------------------+
   |    5       |   CTS(AU8)    |    8,11,14,17   |  -,2,2,2         |
   +-----------------------------------------------------------------+
   |    6       |   CTS(AU12)   |   12,15,18,21   |  -,2,2,2         |
   +-----------------------------------------------------------------+
   |    7       |   CTS(AU16)   |   16,19,22,25   |  -,2,2,2         |
   +-----------------------------------------------------------------+
   |    8       |   CTS(AU20)   |   20,23,26,29   |  -,2,2,2         |
   +-----------------------------------------------------------------+
   |    9       |   CTS(AU24)   |   24,27,30,33   |  -,2,2,2         |
   +-----------------------------------------------------------------+
   |    10      |   CTS(AU28)   |   28,31,34,37   |  -,2,2,2         |
   +-----------------------------------------------------------------+
   |                              etc                                |
   +-----------------------------------------------------------------+
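
   The table can be regenerated programmatically. The sketch below is
   one possible construction, not a normative algorithm: it treats
   every RTP packet as 4 slots spaced IndexDelta + 1 Access Units
   apart and drops slots that fall before AU 1, which yields the
   shorter packets at the start of the stream. The function name and
   the "virtual start" idea are assumptions made for illustration.

```python
def interleave(num_packets, aus_per_packet=4, index_delta=2):
    """Return the Access Unit numbers carried by each RTP packet."""
    stride = index_delta + 1  # +1 is automatically added to IndexDelta
    # Lowest multiple of aus_per_packet whose last slot still reaches AU 1.
    first_start = -((stride * (aus_per_packet - 1) - 1)
                    // aus_per_packet) * aus_per_packet
    packets = []
    for k in range(num_packets):
        start = first_start + k * aus_per_packet
        slots = [start + j * stride for j in range(aus_per_packet)]
        # Slots numbered below 1 do not exist at the start of the stream.
        packets.append([au for au in slots if au >= 1])
    return packets


table = interleave(10)
for number, aus in enumerate(table, start=1):
    deltas = ["-"] + ["2"] * (len(aus) - 1)
    print(number, f"CTS(AU{aus[0]})", aus, ",".join(deltas))
```

   With the default values each Access Unit appears in exactly one
   packet, because 4 and 3 have no common factor.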

SLConfigDescriptor

   Similar to previous example.

SL Packet Header

   Similar to previous example (empty).

Parameters

   The resulting concatenated fmtp line is:

   a=fmtp:<format> SizeLength=9; IndexDeltaLength=2;

RTP packet structure

   +=========================================+=============+
   | Field                                   |  size       |
   +=========================================+=============+
   | RTP header                              |    -        |
   +-----------------------------------------+-------------+
                      Payload Header Section
   +=========================================+=============+
   | PayloadHeaderSection size in bits = 42  |  2 octets   |
   +-----------------------------------------+-------------+
   | PayloadSize                             |  9 bits     |
   +-----------------------------------------+-------------+
   | PayloadSize                             |  9 bits     |
   +-----------------------------------------+-------------+
   | IndexDelta                              |  2 bits     |
   +-----------------------------------------+-------------+
   | PayloadSize                             |  9 bits     |
   +-----------------------------------------+-------------+
   | IndexDelta                              |  2 bits     |
   +-----------------------------------------+-------------+
   | PayloadSize                             |  9 bits     |
   +-----------------------------------------+-------------+
   | IndexDelta                              |  2 bits     |
   +-----------------------------------------+-------------+
   | bits to octet alignment = (000000)      |  6 bits     |
   +-----------------------------------------+-------------+
                         Payload Section
   +=========================================+=============+
   | AAC Access Unit                         |   x octets  |
   +-----------------------------------------+-------------+
   | AAC Access Unit                         |   x octets  |
   +-----------------------------------------+-------------+
   | AAC Access Unit                         |   x octets  |
   +-----------------------------------------+-------------+
   | AAC Access Unit                         |   x octets  |
   +-----------------------------------------+-------------+

Overhead estimation

   The PayloadHeaderSection is 8 octets; in this example we have
   therefore an RTP overhead of 40 + 8 octets for 1400 octets (approx)
   of payload, i.e. around 4 % overhead.
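
   The 42-bit size and the 6 alignment bits in the table can be
   checked by summing field widths. The helper names below are
   hypothetical; the widths come from the fmtp line of this example
   (SizeLength=9, IndexDeltaLength=2, no Index field).

```python
def payload_header_bits(num_aus, size_length, index_length,
                        index_delta_length):
    """Bits used by the per-Access-Unit fields of the header section."""
    first = size_length + index_length          # first AU carries Index
    others = (num_aus - 1) * (size_length + index_delta_length)
    return first + others


def padding_bits(bit_count):
    """Zero bits needed to reach the next octet boundary."""
    return -bit_count % 8


bits = payload_header_bits(num_aus=4, size_length=9, index_length=0,
                           index_delta_length=2)
pad = padding_bits(bits)
# The 2-octet size field precedes the per-Access-Unit fields.
section_octets = (16 + bits + pad) // 8
print(bits, pad, section_octets)
```

   The result, 8 octets including the 2-octet size field, matches the
   figure used in the overhead estimation above.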

Appendix.6 AAC with Index-based interleaving and SL

   Let us consider AAC around 130 kb/s where each Access Unit is split
   in 4 SL packets corresponding to Error Sensitivity Categories (ESC)
   of maximum 90 octets for which interleaving is very useful in terms
   of error resilience. We thus use an interleaving scheme where 15 SL
   Packets (extracted from 15 consecutive Access Units) are used to
   construct each RTP packet in order to match an MTU of 1500 octets.
   Note that since ESC fragments are not octet aligned we also use the
   paddingFlag and paddingBits features of the Sync Layer. The
   interleaving sequence is 4 RTP packets and 350 ms long, which is too
   long for conferencing but perfectly OK for Internet radio.

   Since the sequence contains 60 SL packets, IndexLength is set to 16
   bits so as to provide a safe margin in case of long loss bursts.
   This will also indicate to the receiver that this is an
   Index-Based-Interleaving scheme (indeed CTS cannot be computed for
   SL packets that are not AU starts).

   2 bits are enough for IndexDelta, which is constant and equal to 3
   (since +1 is automatically added).

   Note that the 4th RTP packet in each sequence has its M bit set to 1
   since it contains 15 SL packets transporting the end of 15
   consecutive Access Units.
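
   The layout of one interleaving sequence can be sketched as follows.
   Numbering the 60 SL packets of a sequence from 0 and the constant
   names are assumptions made for illustration; the 23.22 ms Access
   Unit duration is the value given in the SLConfigDescriptor of this
   example.

```python
AUS_PER_SEQUENCE = 15
SL_PACKETS_PER_AU = 4    # one per Error Sensitivity Category
INDEX_DELTA = 3          # the actual step is IndexDelta + 1
AU_DURATION_MS = 23.22


def sequence_packets():
    """Return the SL packet indexes carried by each RTP packet."""
    step = INDEX_DELTA + 1
    total = AUS_PER_SEQUENCE * SL_PACKETS_PER_AU
    return [list(range(first, total, step)) for first in range(step)]


packets = sequence_packets()

# Index of the last SL packet of every Access Unit in the sequence.
au_ends = [au * SL_PACKETS_PER_AU + SL_PACKETS_PER_AU - 1
           for au in range(AUS_PER_SEQUENCE)]

# Only the 4th RTP packet carries Access Unit ends, hence its M bit.
m_bits = [1 if packet == au_ends else 0 for packet in packets]
duration_ms = AUS_PER_SEQUENCE * AU_DURATION_MS
print(m_bits, round(duration_ms))
```

   The computed duration of about 348 ms agrees with the 350 ms
   quoted for the interleaving sequence.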

   With this scheme a sender (for example upon reception of RTCP
   reports indicating high loss rates) can (for example) choose to
   duplicate for each interleaving sequence the first RTP packet that
   contains the most useful data in terms of ESC or apply other error
   protection techniques, with due care to congestion issues.

   In this example we will also show several other SL features (OCR, AU
   boundary flags, padding, as detailed below).

   One feature demonstrated by this example is the degradation
   priority. We assume degradation priority can take 4 different
   values, mapped to Error Sensitivity Categories, and is encoded on 2
   bits. This interleaving scheme makes sure that only SL packets of
   identical degradation priorities are grouped in the same RTP packet
   (3.6.3) and that only the first RSLH of each RTP packet transports
   the degradation priority.

   We also assume that for each last SL packet of each RTP packet the
   server inserts an OCR.

SLConfigDescriptor

   In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
    bit(8) predefined;
    if (predefined==0) {
     bit(1) useAccessUnitStartFlag; = 1
     bit(1) useAccessUnitEndFlag; = 1
     bit(1) useRandomAccessPointFlag; = 0
     bit(1) hasRandomAccessUnitsOnlyFlag; = 1
     bit(1) usePaddingFlag; = 1 // we need to signal padding bits
     bit(1) useTimeStampsFlag; = 0
     bit(1) useIdleFlag; = 0
     bit(1) durationFlag; = 1
     bit(32) timeStampResolution; = 0
     bit(32) OCRResolution; = 30
     bit(8) timeStampLength; = 0
     bit(8) OCRLength; = 32
     bit(8) AU_Length; = 0
     bit(8) instantBitrateLength; = 0
     bit(4) degradationPriorityLength; = 2
     bit(5) AU_seqNumLength; = 0
     bit(5) packetSeqNumLength; = 6
     bit(2) reserved=0b11;
    }
    if (durationFlag) {
     bit(32) timeScale; = 1000 // milliseconds
     bit(16) accessUnitDuration; = 23.22 // ms
     bit(16) compositionUnitDuration; = 23.22 // ms
    }
    if (!useTimeStampsFlag) {
     bit(timeStampLength) startDecodingTimeStamp; = 0
     bit(timeStampLength) startCompositionTimeStamp; = 0
    }
   }

SL Packet Header structure

   With this configuration we have the following SL packet header
   structure:

   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
    bit(1) accessUnitStartFlag;
    bit(1) accessUnitEndFlag;
    bit(1) OCRflag;
    bit(1) paddingFlag;
    if (paddingFlag) bit(3) paddingBits;
    bit(SL.packetSeqNumLength) packetSequenceNumber;
    bit(1) DegPrioflag;
    if (DegPrioflag) {
     bit(SL.degradationPriorityLength) degradationPriority;}
    if (OCRflag) {
     bit(SL.OCRLength) objectClockReference;}
   }

Parameters

   The resulting concatenated fmtp line is:

   a=fmtp:<format> SizeLength=7; RSLHSectionSizeLength=8;
   IndexLength=16; IndexDeltaLength=2; OCRDeltaLength=16

RTP packet structure

   +=========================================+=============+
   | Field                                   |  size       |
   +=========================================+=============+
   | RTP header                              |    -        |
   +-----------------------------------------+-------------+
                       Payload Header Section
   +=========================================+=============+
   | PayloadHeaderSection size in bits = 149 |  2 octets   |
   +-----------------------------------------+-------------+
   | PayloadSize                             |  7 bits     |
   +-----------------------------------------+-------------+
   | Index                                   |  16 bits    |
   +-----------------------------------------+-------------+
   | PayloadSize                             |  7 bits     |
   +-----------------------------------------+-------------+
   | IndexDelta = (11)                       |  2 bits     |
   +-----------------------------------------+-------------+
   |            etc + 12 times 9 bits                      |
   +-----------------------------------------+-------------+
   | PayloadSize                             |  7 bits     |
   +-----------------------------------------+-------------+
   | IndexDelta = (11)                       |  2 bits     |
   +-----------------------------------------+-------------+
   | bits to octet alignment = (000)         |  3 bits     |
   +-----------------------------------------+-------------+
                         RSLHSection
   +=========================================+=============+
   | RSLHSectionSize =  (10000111)           |  8 bits     |
   +-----------------------------------------+-------------+
   | accessUnitStartFlag                     |  1 bit      |
   +-----------------------------------------+-------------+
   | accessUnitEndFlag                       |  1 bit      |
   +-----------------------------------------+-------------+
   | OCRFlag = (0)                           |  1 bit      |
   +-----------------------------------------+-------------+
   | paddingFlag = (1)                       |  1 bit      |
   +-----------------------------------------+-------------+
   | paddingBits                             |  3 bits     |
   +-----------------------------------------+-------------+
   | DegPrioflag = (1)                       |  1 bit      |
   +-----------------------------------------+-------------+
   | degradationPriority                     |  2 bits     |
   +-----------------------------------------+-------------+
   | accessUnitStartFlag                     |  1 bit      |
   +-----------------------------------------+-------------+
   | accessUnitEndFlag                       |  1 bit      |
   +-----------------------------------------+-------------+
   | OCRFlag = (0)                           |  1 bit      |
   +-----------------------------------------+-------------+
   | paddingFlag = (1)                       |  1 bit      |
   +-----------------------------------------+-------------+
   | paddingBits                             |  3 bits     |
   +-----------------------------------------+-------------+
   | DegPrioflag = (0)                       |  1 bit      |
   +-----------------------------------------+-------------+
   |              etc + 12 times 8 bits                    |
   +-----------------------------------------+-------------+
   | accessUnitStartFlag                     |  1 bit      |
   +-----------------------------------------+-------------+
   | accessUnitEndFlag                       |  1 bit      |
   +-----------------------------------------+-------------+
   | OCRFlag = (1)                           |  1 bit      |
   +-----------------------------------------+-------------+
   | OCRDelta                                |  16 bits    |
   +-----------------------------------------+-------------+
   | paddingFlag = (0)                       |  1 bit      |
   +-----------------------------------------+-------------+
   | DegPrioflag = (0)                       |  1 bit      |
   +-----------------------------------------+-------------+
   | bits to octet alignment = (000)         |  3 bits     |
   +-----------------------------------------+-------------+
                         Payload Section
   +=========================================+=============+
   | SL packet payload                       |max 90 octets|
   +-----------------------------------------+-------------+
   |             etc + 13  SL packets                      |
   +-----------------------------------------+-------------+
   | SL packet payload                       |max 90 octets|
   +-----------------------------------------+-------------+

   Note that in the above table the last SL packet in the RTP packet
   has a payload that is octet-aligned (at the end). When this happens
   paddingFlag is set to zero and the paddingBits field is omitted.
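
   The RSLHSectionSize value in the table, (10000111) or 135 bits, can
   be checked by summing the widths of the fields listed for each of
   the 15 reduced SL headers. The function name is hypothetical; the
   field widths and flag values are those shown in the table above.

```python
def rslh_bits(ocr, padding, deg_prio,
              ocr_delta_length=16, deg_prio_length=2):
    """Bits used by one reduced SL header as laid out in the table."""
    bits = 2                      # accessUnitStartFlag, accessUnitEndFlag
    bits += 1                     # OCRFlag
    if ocr:
        bits += ocr_delta_length  # OCRDelta
    bits += 1                     # paddingFlag
    if padding:
        bits += 3                 # paddingBits
    bits += 1                     # DegPrioflag
    if deg_prio:
        bits += deg_prio_length   # degradationPriority
    return bits


first = rslh_bits(ocr=False, padding=True, deg_prio=True)
middle = rslh_bits(ocr=False, padding=True, deg_prio=False)
last = rslh_bits(ocr=True, padding=False, deg_prio=False)

# 1 first header, 13 middle headers, 1 last header = 15 SL packets.
total = first + 13 * middle + last
print(total, format(total, "08b"))
```

   The first header is the only one carrying the degradation priority
   and the last is the only one carrying an OCR, which is why their
   sizes differ from the 8 bits of the other 13 headers.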

Overhead estimation

   The PayloadHeaderSection is 19 octets, the RSLHSection is 16 octets;
   in this example we have therefore an RTP overhead of 40 + 35 octets
   for 1350 octets of payload, i.e. around 6 % overhead.
