Network Working Group                                       G. Zorn, Ed.
Internet-Draft                                               Network Zen
Intended status: Standards Track                                 Y. Wang
Expires: June 11, 2011                               Huawei Technologies
                                                            A. Lakaniemi
                                                                   Nokia
                                                        December 8, 2010

               RTP Payload Format for G.718 Speech/Audio
                     draft-ietf-avt-rtp-g718-04.txt

Abstract

   This document specifies the Real-Time Transport Protocol (RTP)
   payload format for the Embedded Variable Bit-Rate (EV-VBR)
   speech/audio codec, specified in ITU-T G.718.  A media type
   registration for this RTP payload format is also included.

Status of This Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 11, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Background
     3.1.  The G.718 Codec
     3.2.  Benefits of Layered Design
     3.3.  Transmitting Layered Data
     3.4.  Scaling Scenarios and Rate Control
   4.  G.718 RTP Payload Format
     4.1.  Payload Structure
       4.1.1.  Payload Header
       4.1.2.  G.718 Transport Blocks
     4.2.  Handling the Encoded Data
     4.3.  G.718 Scaling
     4.4.  CRC Verification
     4.5.  G.718 Session
     4.6.  Cross-stream/Cross-layer Timing Synchronization
     4.7.  RTP Header Usage
   5.  Payload Format Parameters
     5.1.  Media Type Registration
     5.2.  Mapping to SDP Parameters
     5.3.  Offer/Answer Considerations
     5.4.  Declarative Usage of SDP
     5.5.  SDP Examples
       5.5.1.  Example 1
       5.5.2.  Example 2
       5.5.3.  Example 3
   6.  Congestion Control
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. References
     10.1. Normative References
     10.2. Informative References
   Appendix A.  Payload Examples
     A.1.  Simple Payload Examples
       A.1.1.  All the Layers in the Same Payload
       A.1.2.  Layers in Separate RTP Streams
     A.2.  Advanced Examples
       A.2.1.  Different Update Rate for Subset of Layers
       A.2.2.  Redundant Frames With Limited Set of Layers

1.  Introduction

   The International Telecommunication Union (ITU-T) Recommendation
   G.718 [ITU.G718.2008] specifies the Embedded Variable Bit Rate
   (EV-VBR) speech/audio codec.  This document specifies the Real-time
   Transport Protocol (RTP) [RFC3550] payload format for this codec.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Background

3.1.  The G.718 Codec

   G.718 is an embedded variable rate speech codec having a layered
   design.  The bitstream of the G.718 core codec consists of a core
   layer, denoted as L1, and four enhancement layers, denoted as L2-L5.
   The bit-rates of the G.718 core codec range from 8 kbit/s (core layer
   only) to 32 kbit/s (with all layers up to L5).  Furthermore, the
   G.718 codec also supports discontinuous transmission (DTX) and
   comfort noise generation (CNG) by sending Silence Descriptor (SID)
   frames during periods of non-active input signal, resulting in a
   reduced bit-rate.  The sampling frequency of the core codec is 16 kHz
   and the codec operates on 20 ms frames.  The G.718 codec is also
   capable of narrowband operation with audio input and/or output at 8
   kHz sampling frequency.

   While transmitting/receiving the core layer L1 is enough for
   successful decoding of the audio content, each of the enhancement
   layers Ln (n being 2 to 5, inclusive) provides an improvement to
   reconstructed audio quality.  Thus, the core layer ensures the basic
   communication while the enhancement layers can be used to improve the
   perceptual quality.  Furthermore, enhancement layers are dependent on
   all the lower layers in the sense that successful decoding of layer
   Ln also requires all the layers Lm with m<n to be available.
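
   As an illustration of the dependency rule above, the following Python
   sketch returns the highest layer that can be decoded from a set of
   received layers.  The function name and the use of plain integers
   for layer numbers are assumptions made for this example only; they
   are not defined by G.718 or by this payload format.

```python
def highest_decodable_layer(received):
    """Return the highest n such that layers L1..Ln are all present.

    `received` is an iterable of layer numbers (1 to 5).  The result is
    0 when the core layer L1 is missing, since no enhancement layer can
    be decoded without all the layers below it.
    """
    present = set(received)
    n = 0
    while n + 1 in present:
        n += 1
    return n
```

   For instance, a receiver holding layers 1 and 3 but not 2 can only
   decode up to L1, because L3 depends on the missing L2.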

   The sizes, sampling rates and possible outputs of the G.718 core
   codec layers L1-L5 are summarized in Table 1 below, where the "Bytes"
   column indicates the number of bytes per encoded data unit for a
   layer.  NB and WB denote narrowband and wideband, respectively.  The
   "Bytes" column in other tables has the same meaning.  Note that for
   layers L1 and L2, the corresponding output may either be NB or WB,
   depending on the rendering device and the application requirement,
   regardless of the sampling rate of the encoded data.

                          Table 1: G.718 Layers

        Layer   Bytes   Cumulative bit-rate   Sampling rate   Output
      ----------------------------------------------------------------
         L1       20        8 kbit/s           8 or 16 kHz    NB or WB
         L2       10       12 kbit/s           8 or 16 kHz    NB or WB
         L3       10       16 kbit/s           16 kHz         WB
         L4       20       24 kbit/s           16 kHz         WB
         L5       20       32 kbit/s           16 kHz         WB

   The G.718 codec also includes an operating mode that is compatible
   with the Adaptive Multi-Rate Wideband (AMR-WB) codec [AMR-WB], for
   which the RTP payload format is specified in [RFC4867].  In this
   AMR-WB interoperable mode, layers L1 and L2 are replaced by L1',
   which consists of AMR-WB encoded data.  Furthermore, together with
   L1', a modified layer L3' is used instead of L3.  The usage of layers
   L4 and L5 is not affected by transmitting AMR-WB data in the lower
   layers.  If layer L3' is present in the encoded bit-stream, the base
   layer L1' must use AMR-WB mode 2 with a bit-rate of 12.65 kbit/s.
   Otherwise (the encoded bit-stream contains only the L1' layer), any
   of the nine AMR-WB coding modes 0, 1, 2, 3, 4, 5, 6, 7, and 8, which
   correspond to bit-rates of 6.60, 8.85, 12.65, 14.25, 15.85, 18.25,
   19.85, 23.05, and 23.85 kbit/s, respectively, may be in use.  Table 2
   summarizes the AMR-WB interoperable mode when more than one layer may
   be present.

           Table 2: G.718 layers in the AMR-WB interoperable mode

        Layer   Bytes   Cumulative bit-rate   Sampling rate   Output
      ----------------------------------------------------------------
         L1'      32       12.8 kbit/s           16 kHz          WB
         L3'       9       16.4 kbit/s           16 kHz          WB
         L4       20       24.4 kbit/s           16 kHz          WB
         L5       20       32.4 kbit/s           16 kHz          WB

   Note that the bit-rate for the raw bit-stream of AMR-WB mode 2 is
   12.65 kbit/s.  However, after counting the padding bits that make
   each encoded data unit byte-aligned, as in the octet-aligned mode
   specified in [RFC4867], the resulting bit-rate is 12.8 kbit/s.
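
   The cumulative bit-rates in Table 1 and Table 2 follow directly from
   the per-layer byte counts and the 20 ms frame duration.  The Python
   sketch below reproduces that arithmetic; the constant and function
   names are chosen for this example only.

```python
FRAME_MS = 20  # G.718 frame duration in milliseconds

# Bytes per encoded data unit, copied from Table 1 and Table 2.
CORE_LAYER_BYTES = [("L1", 20), ("L2", 10), ("L3", 10),
                    ("L4", 20), ("L5", 20)]
AMRWB_LAYER_BYTES = [("L1'", 32), ("L3'", 9), ("L4", 20), ("L5", 20)]


def cumulative_bitrates(layer_bytes):
    """Return [(layer, cumulative bit-rate in bit/s)] in layer order."""
    rates = []
    total_bytes = 0
    for name, size in layer_bytes:
        total_bytes += size
        # bytes per frame -> bits per second
        rates.append((name, total_bytes * 8 * 1000 // FRAME_MS))
    return rates
```

   With the 32-byte octet-aligned L1' unit this gives 12800 bit/s, which
   is the 12.8 kbit/s figure quoted above rather than the raw 12.65
   kbit/s of AMR-WB mode 2.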

   In the AMR-WB interoperable mode, when the base layer L1' is
   transported in its own RTP packet stream, the packetisation specified
   in [RFC4867] MUST be used, to enable legacy RFC4867 receivers to
   receive the base layer L1'.

   ITU-T SG16 is currently working on a set of extension layers in order
   to provide so-called super-wideband (SWB) audio and stereophonic
   encoding extensions on top of the G.718 core codec.  Further details
   and the usage of these layers are undetermined at this time.

   The main application of the G.718 codec is telephony.  Other expected
   applications include audio/video conferencing and streaming.

3.2.  Benefits of Layered Design

   Layered design enables simple scalability of the transmitted stream
   by conveying a suitable number of layers.  The number of layers used
   in a session may be selected, for example, based on the capacity of
   the transmission channel, current transmission conditions,
   characteristics of the source signal, or available processing
   capacity.

   Another obvious benefit of the layered codec design is the
   possibility of exploiting the scalability to support congestion
   control by transmitting or dropping some of the (higher) enhancement
   layers in order to alleviate congestion in the network.  See
   Section 6 for a more detailed discussion of congestion control.

   Furthermore, the layered design implicitly provides the possibility
   of unequal error detection/protection by employing different levels
   of protection on the core layer and the enhancement layers.

3.3.  Transmitting Layered Data

   In principle there are two basic approaches to carrying the data from
   a layered encoder:

   1.  All the layers are carried within a single RTP session.

   2.  The encoded data is divided over multiple RTP sessions, each
       session carrying a subset of layers.  This is also referred to as
       Multi-Session Transmission (MST).

   The first choice is the most efficient in terms of exploitation of
   transmission bandwidth.  Furthermore, using only one packet to carry
   all encoded data layers of a frame also requires fewer resources from
   the end-systems (and intermediate systems), since the number of
   packets is kept at a minimum and only a single RTP packet stream
   needs to be handled.  However, this option requires any intermediate
   network element performing the scaling operation to be fully
   media-aware, since removing encoded layers requires modification of
   the payload.  Furthermore, the intermediate network element needs to
   be within the security context to enable meaningful manipulation of
   the payload in case secure transport is employed.  This might not be
   feasible in all systems/scenarios, but some special-purpose devices,
   such as media gateways in cellular telephone systems, may be able to
   implement this kind of media-aware functionality.

   The second alternative, transmitting selected subsets of layers in
   separate RTP sessions, facilitates simple scalability in intermediate
   network elements without the requirement of being fully media-aware.
   One use case of this alternative is layered multicast [McCanne].  On
   the other hand, this approach introduces separate packet header
   overhead for each subset of layers in those low-delay application
   scenarios wherein aggregation of data from multiple frames is not
   ideal.  In this case, when the size of the encoded data block per
   single layer is in the range of 10 to 20 bytes, the packetisation may
   result in a relatively high amount of protocol overhead, which might
   be an expensive solution on bandwidth-limited links.  Another
   drawback of this approach is a somewhat more complex session setup
   and the additional complexity associated with handling several
   concurrent RTP sessions.  However, this is a trade-off that enables
   simple scalability also by intermediate network elements that are not
   aware of the details of the transmitted media.

3.4.  Scaling Scenarios and Rate Control

   In principle there are three different ways to make use of the
   layered design to control the bandwidth usage:

   1.  A sender decides to change the number of layers it is
       transmitting (for example due to congestion control constraints)

   2.  A receiver or an intermediate network element instructs a sender
       to change the number of layers it is transmitting

   3.  An intermediate network element passes through only a subset of
       the layers it receives

   The most appropriate mechanism depends on the application and the
   employed network topology.  For example, a point-to-point
   conversational audio connection can easily introduce rate control by
   changing the number of transmitted layers, while in a centralized
   audio/video conferencing scenario the conference server is a more
   appropriate point to implement the rate control than the transmitting
   end-point.  Please refer to [RFC5117] for an extensive discussion of
   the different topologies and their implications for the transmission.
   However, the fundamental difference between these choices is that
   method 1 does not necessarily need any feedback from the receiver(s),
   while methods 2 and 3 require a signaling mechanism to support rate
   control.

4.  G.718 RTP Payload Format

   The basic G.718 source data unit is one layer of an encoded frame.
   Since the term layer generally refers to a time series of data
   representing a certain encoding layer, this specification uses the
   term Encoded Data Unit (EDU) to refer to a single layer of data from
   a single encoded frame.  Thus, each EDU has a (conceptual) frame
   number indicating its location in encoding/decoding order and a layer
   number indicating the encoding layer the EDU represents.

4.1.  Payload Structure

   The G.718 payload format consists of a payload header, followed by
   one or more transport blocks (TB) forming the actual payload data.

   +-----------------+----------+----------+- /// -+----------+
   | Payload header  |  TB(1)   |  TB(2)   |          TB(n)   |
   +-----------------+----------+----------+- /// -+----------+

4.1.1.  Payload Header

   The payload header consists of an 8-bit payload CRC checksum:

   +-+-+-+-+-+-+-+-+
   |     CRC       |
   +-+-+-+-+-+-+-+-+

   On the transmitting end the payload checksum is computed over the
   primary transport block (specified in Section 4.1.2) of the payload
   using the generator polynomial

   C(z) = z^8 + z^4 + z^3 + z^2 + 1

   Subsequent transport blocks are prepared in such a way that the
   payload checksum is valid for any integer number of contiguous
   transport blocks within one RTP packet starting from the beginning of
   the primary transport block.

   On the receiving end the payload CRC checksum can be used to verify
   the correct reception of any contiguous subset of transport blocks
   within one RTP packet starting from the beginning of the primary
   transport block (see Section 4.4 for a detailed description).
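
   A minimal Python sketch of an 8-bit CRC with the generator polynomial
   C(z) given above is shown below.  The text in this section fixes only
   the polynomial; the zero initial value, the most-significant-bit-
   first processing order and the absence of a final XOR are assumptions
   made for this illustration and are not taken from this document.

```python
# z^8 + z^4 + z^3 + z^2 + 1 with the z^8 term left implicit.
CRC8_POLY = 0x1D


def crc8(data, crc=0x00):
    """Bitwise CRC-8 over `data` (bytes-like), MSB first.

    Assumed conventions: initial value 0, no bit reflection and no
    final XOR.
    """
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ CRC8_POLY) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc
```

   Under these assumptions a sender would place crc8() of the primary
   transport block octets into the payload header.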

4.1.2.  G.718 Transport Blocks

   The basic building block of the G.718 RTP payload data is a G.718
   transport block (TB).  There are two types of transport blocks:
   primary and secondary.

   The structure of the primary transport block is depicted below.

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+----------------------------+
   |   L-ID    |NF | Encoded data               |
   +-+-+-+-+-+-+-+-+----------------------------+

   The structure of the secondary transport block is depicted below.

    0 1 2 3 4 5 6 7                              0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+----------------------------+-+-+-+-+-+-+-+-+
   |   L-ID    |NF | Encoded data               |     Tail      |
   +-+-+-+-+-+-+-+-+----------------------------+-+-+-+-+-+-+-+-+

   The layer ID (L-ID) and the NF fields form the transport block
   header.  The L-ID field is used to identify the layer structure of
   the encoded data carried in this G.718 transport block, and the NF
   field indicates the number of encoded frames with this layer
   structure carried in the Encoded data part following the transport
   block header.  The Tail field of the secondary transport block
   carries a modified 8-bit CRC checksum computed over the transport
   block, as specified below.
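
   The one-octet transport block header can be packed and unpacked as in
   the following Python sketch.  It assumes the usual RFC diagram
   convention that bit 0 is the most significant bit, so that L-ID
   occupies the upper six bits and NF the lower two; the function names
   are hypothetical.  The frame count returned by the unpacking helper
   is NF plus one, as defined for the NF field in this section.

```python
def pack_tb_header(l_id, nf):
    """Build the transport block header octet from L-ID and NF."""
    if not 0 <= l_id <= 63:
        raise ValueError("L-ID must fit in 6 bits")
    if not 0 <= nf <= 3:
        raise ValueError("NF must fit in 2 bits")
    return (l_id << 2) | nf


def unpack_tb_header(octet):
    """Return (L-ID, number of frames) for a header octet."""
    l_id = (octet >> 2) & 0x3F
    nf = octet & 0x03
    return l_id, nf + 1
```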

   A G.718 RTP packet payload SHALL include exactly one primary
   transport block, which MAY be followed by one or more secondary
   transport blocks.  The data fields of both transport block types are
   described below.

   L-ID (6 bits)
      Identification of the encoded data carried in this transport
      block.  Table 3 below specifies the mapping between L-ID and the
      encoded data.  Note that L-ID is treated as an unsigned integer.

                Table 3: Layer Identification (L-ID) Values

                             L-ID    Encoded data
                           --------------------------
                               0     Empty frame
                               1     L1
                               2     L1-L2
                               3     L1-L3
                               4     L1-L4
                               5     L1-L5
                               6     L2
                               7     L2-L3
                               8     L2-L4
                               9     L2-L5
                              10     L3
                              11     L3-L4
                              12     L3-L5
                              13     L4
                              14     L4-L5
                              15     L5
                              16     L1'
                              17     L1', L3'
                              18     L1', L3', L4
                              19     L1', L3', L4-L5
                              20     G.718 SID
                              21     AMR-WB SID
                              22-63  Reserved

   NF (2 bits)
      Number of frames in this transport block decreased by one.  The
      number of frames is equal to the value of NF incremented by one.
      For example, NF=0 indicates that the transport block carries one
      frame, and NF=3 indicates that the transport block carries four
      frames.  If the sender wants to encapsulate more than four frames
      per payload, several transport blocks need to be used.

   Encoded data (variable length)
      Encoded data consists of EDUs as specified by the values of the
      L-ID and NF fields, arranged according to the rules given in
      Section 4.2.  When L-ID is equal to 0 (empty frame), the encoded
      data field is not present (i.e., it consists of zero octets).

   Tail (8 bits)
      The 8-bit Tail field of the secondary transport block carries a bit
      field that is needed to modify the partial CRC checksum over the
      payload data up to the end of this TB to match the payload CRC
      field value carried in the payload header.

      In the transmitter the Tail bits for a secondary TB(n) are
      computed by first computing the CRC checksum CRC(n) over the
      payload data from the beginning of the primary TB up to the end of
      TB(n) using the generator polynomial C(z) given above.  The bits
      of the Tail field of TB(n) are set to zero value for the CRC
      computation.  The transmitted value of the Tail field in TB(n) is
      obtained by bitwise XOR operation between the payload CRC field
      value carried in the payload header and the CRC(n) computed for
      TB(n).
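
   The transmitter steps for the Tail field can be sketched in Python as
   follows.  The CRC conventions (zero initial value, MSB first, no
   final XOR) and all function names are assumptions of this example.
   The receiver-side check shown here simply mirrors the sender steps
   and is likewise an assumption; the normative verification procedure
   is the one described in Section 4.4.

```python
CRC8_POLY = 0x1D  # z^8 + z^4 + z^3 + z^2 + 1, z^8 term implicit


def crc8(data):
    """Bitwise CRC-8, MSB first, assumed zero initial value."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            top = crc & 0x80
            crc = (crc << 1) & 0xFF
            if top:
                crc ^= CRC8_POLY
    return crc


def tail_for_secondary_tb(header_crc, data_through_tb):
    """Compute the Tail value of TB(n).

    `data_through_tb` holds the octets from the start of the primary TB
    through the end of TB(n); its last octet is the Tail position and
    is treated as zero for the CRC computation.
    """
    zeroed = bytes(data_through_tb[:-1]) + b"\x00"
    return header_crc ^ crc8(zeroed)


def check_through_tb(header_crc, data_through_tb):
    """Assumed receiver check mirroring the sender computation."""
    tail = data_through_tb[-1]
    zeroed = bytes(data_through_tb[:-1]) + b"\x00"
    return (crc8(zeroed) ^ tail) == header_crc
```

   In this sketch the payload header CRC is crc8() of the primary
   transport block alone, and the same value is reused for every
   secondary transport block in the packet.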

4.2.  Handling the Encoded Data

   In order to provide unique mapping of EDUs to encoded frames, the
   following rules on sequence of frames and sequence of layers need to
   be followed when creating a payload:

   o  The frames within a payload MUST form a set of contiguous frames
      in decoding order, i.e. if a payload carries frames n and n+N, all
      frames between n and n+N in decoding order MUST also be present in
      the payload.

   o  The layers within a frame MUST form a contiguous set of layers,
      i.e. if layers Lx and Ly of a frame are included in the payload,
      all layers between Lx and Ly MUST also be present.

   The EDUs within a transport block are arranged according to the
   following rules:

   o  The EDUs within a transport block MUST be arranged in increasing
      order of layer number

   o  The EDUs with the same layer number within a transport block MUST
      be arranged in decoding order
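
   The two arrangement rules above amount to sorting the EDUs of a
   transport block first by layer number and then by decoding order.  A
   small Python sketch follows; the function name and the (frame, layer)
   tuple representation are assumptions of this example.

```python
def edu_order(num_frames, lowest_layer, highest_layer):
    """Return the (frame, layer) pairs of one transport block in the
    order in which their EDUs appear: increasing layer number first,
    decoding order within each layer.  Frames are numbered from 1.
    """
    return [(frame, layer)
            for layer in range(lowest_layer, highest_layer + 1)
            for frame in range(1, num_frames + 1)]


def edu_labels(num_frames, lowest_layer, highest_layer):
    """Same order, written in the Fx-Ly notation."""
    return ["F%d-L%d" % (frame, layer)
            for frame, layer in edu_order(num_frames, lowest_layer,
                                          highest_layer)]
```

   For two frames with layers L1-L3 this yields F1-L1, F2-L1, F1-L2,
   F2-L2, F1-L3, F2-L3.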

   Explicit timing information for the transport blocks is not needed,
   since the ordering of EDUs in the payload and their mapping to
   transport blocks can be used to implicitly carry this information.
   The following rules apply:

   o  If the highest layer carried in transport block k is n, and the
      lowest layer carried by transport block k+1 is n+1, then the EDUs
      of transport blocks k and k+1 belong to the same encoded frame.
      Furthermore, if transport blocks k and k+1 carry EDUs belonging to
      the same encoded frame(s), these transport blocks MUST include the
      same number of EDUs.

   o  If the highest layer carried in transport block k is n, and the
      lowest layer carried by transport block k+1 is smaller than or
      equal to n, the EDUs of transport blocks k and k+1 belong to two
      separate encoded frames, which are contiguous in decoding order.

   o  Multiple copies of an EDU MUST NOT be included in the payload.
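
   The implicit timing rules above can be sketched as a small Python
   routine that assigns a first-frame index to each transport block.
   The tuple layout and the function name are hypothetical, and the
   sketch reads the second rule as meaning that transport block k+1
   starts immediately after the last frame carried in transport block
   k; that reading is an assumption of this example.

```python
def assign_first_frames(blocks):
    """Return the 0-based index of the first frame of each block.

    `blocks` lists one (lowest_layer, highest_layer, num_frames) tuple
    per transport block, in payload order.
    """
    starts = []
    for k, (low, high, count) in enumerate(blocks):
        if k == 0:
            starts.append(0)
            continue
        _, prev_high, prev_count = blocks[k - 1]
        if low == prev_high + 1:
            # Next layer up: same encoded frame(s) as the previous block.
            if count != prev_count:
                raise ValueError("frame counts differ for the same frames")
            starts.append(starts[-1])
        elif low <= prev_high:
            # Layer number does not increase: following frame(s).
            starts.append(starts[-1] + prev_count)
        else:
            raise ValueError("layers are not contiguous")
    return starts
```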

   A set of EDUs can be allocated to transport blocks in several ways.
   For example, each EDU can be encapsulated in its own transport block,
   all EDUs can be carried in a single transport block, EDUs belonging
   to the same encoded frame can be encapsulated in a dedicated
   transport block, or EDUs representing the same layer can be carried
   in their own transport blocks.  Three examples of this, each with two
   frames with layers L1-L3, are given below.  The first example
   illustrates the case using a single transport block for the whole
   payload, while the second example introduces separate transport
   blocks for each of the EDUs.  The third example shows an approach
   where each layer is carried in a dedicated transport block.  The
   notation Fx-Ly is used to denote layer y of frame x.

   Example 1: All EDUs in a single transport block

   +---------+-----+-------+-------+-------+-------+-------+--------+
   | L-ID=3  |NF=1 | F1-L1 | F2-L1 | F1-L2 | F2-L2 | F1-L3 | F2-L3  |
   +---------+-----+-------+-------+-------+-------+-------+--------+

             Author's note: Currently, it is mandated that lower layer
             EDUs of later frames go before higher layer EDUs of
             earlier frames. This way is friendlier to adaptation
             (dropping of higher layers). However, if all layers are
             received, then the depacketizer needs to reorder the EDUs
             to their decoding order before feeding them to the decoder.
             Therefore, the other way around (i.e. lower layer EDUs of
             later frames go after higher layer EDUs of earlier frames,
             or EDUs in transport blocks are placed in decoding order)
             is more friendly to the depacketizer. Another benefit of
             the latter is that it does not introduce any end-to-end
             delay. Which way to be specified (or both allowed if
             needed) is FFS.

   Example 2: All EDUs in separate transport blocks

   +---------+-----+-------+---------+-----+-------+
   | L-ID=1  |NF=0 | F1-L1 | L-ID=1  |NF=0 | F2-L1 |
   +---------+-----+-------+---------+-----+-------+
   | L-ID=8  |NF=0 | F1-L2 | L-ID=8  |NF=0 | F2-L2 |
   +---------+-----+-------+---------+-----+-------+
   | L-ID=14 |NF=0 | F1-L3 | L-ID=14 |NF=0 | F2-L3 |
   +---------+-----+-------+---------+-----+-------+

   Example 3: Dedicated transport block for the EDUs of each layer

   +---------+-----+-------+-------+---------+-----+-------+-------+
   | L-ID=1  |NF=1 | F1-L1 | F2-L1 | L-ID=6  |NF=1 | F1-L2 | F2-L2 |
   +---------+-----+-------+-------+---------+-----+-------+-------+
   | L-ID=10 |NF=1 | F1-L3 | F2-L3 |
   +---------+-----+-------+-------+

   The first example, which carries data from all layers in the same
   transport block, consumes less bandwidth than the other two.  The
   second example, with a separate transport block for each EDU, and the
   third example, with a dedicated transport block for each layer,
   instead make scaling simple.  In the first case, removing e.g. layer
   L3 from each frame in the payload requires changing the value of the
   L-ID in addition to removing the corresponding EDU(s).  In the second
   and third cases it is enough to remove all transport blocks carrying
   L3 data; the remaining part of the payload can be left untouched,
   although the packet size information in higher-layer protocol headers
   needs to change.

4.3.  G.718 Scaling

   Some Media-Aware Network Elements (MANEs) MAY modify the G.718
   bitstream by dropping some of the layers in case congestion control
   or e.g. access link bandwidth requires such scaling to take place.
   Such MANEs are RTP translators (with the topology Topo-Translator as
   described in [RFC5117]), for which the rules for RTP translators
   specified in [RFC3550] apply.

   A payload can be either completely dropped or some of the transport
   blocks it carries can be discarded.  In case full payloads are
   dropped to implement scaling, a packet containing the core layer L1
   SHOULD NOT be discarded, since the decoding of higher layers of the
   same encoded frame is not possible without the core layer data being
   available.  This means that payloads with L-ID values equal to 1 to
   5, inclusive and 16 to 19, inclusive, SHOULD NOT be completely
   discarded.

      Author's note: To be checked whether the case of dropping a subset
      of the transport blocks in one packet also strictly follows the
      topology Topo-Translator.

   In case the payload is forwarded with modified content, at least the
   primary transport block MUST be preserved in the payload, while some
   of the secondary transport blocks at the end of the payload MAY be
   discarded.
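
   The rule above can be illustrated with a minimal sketch.  It is not
   part of the payload format: the TransportBlock class and the
   scale_payload function are hypothetical names, and the layers
   carried by each block are given explicitly here, whereas a real
   MANE would have to derive them from the L-ID field.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TransportBlock:
    """Hypothetical in-memory view of one transport block (not the wire format)."""
    layers: List[int]   # layer numbers carried, e.g. [1] or [2, 3]
    edus: List[bytes]   # EDU octets in payload order


def scale_payload(blocks: List[TransportBlock], max_layer: int) -> List[TransportBlock]:
    """Drop trailing secondary transport blocks carrying layers above max_layer."""
    kept = list(blocks)
    # Only blocks at the end of the payload are removed; the primary
    # transport block (index 0) is always preserved.
    while len(kept) > 1 and max(kept[-1].layers) > max_layer:
        kept.pop()
    return kept
```

   With the per-layer layout of Example 3 in Section 4.2, calling
   scale_payload(blocks, 2) in this sketch would remove only the last
   block (the one carrying L3) and leave the rest of the payload
   untouched.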

4.4.  CRC Verification

   Both UDP-Lite [RFC3828] and DCCP [RFC4340] provide partial checksum
   options, in which partially damaged payloads can be delivered to the
   application layer.  In cases wherein such a transport layer operation
   is in use, and the partial checksum service by the transport layer
   protects up to the RTP header and the payload header, the CRC
   checksum provided in the payload header can be used to verify whether
   an RTP packet payload contains corrupt transport blocks.

   On the receiving end the CRC verification is made in such a way that
   the CRC computation is started from the beginning of the primary TB,
   i.e. from the MSB of the first octet of TB(1), and the computation is
   continued until the end of the payload data or until an erroneous TB
   is encountered.  At the end of each TB a check MAY be performed: if
   the CRC value at the end of TB(n) matches the payload CRC value
   received in the payload header, the verification is successful and
   the data up to TB(n) is valid.  If the CRC value at the end of TB(n)
   does not match the payload CRC value received in the payload header,
   there is an error in TB(n) and it MUST be discarded as corrupted.
   Furthermore, if the verification indicates a corrupted TB(n), all
   subsequent transport blocks TB(m) with m>n MUST also be discarded.

4.5.  G.718 Session

   A G.718 session consists of one or several RTP sessions carrying
   G.718 data encoded according to the payload format specified in
   Section 4.1.

4.6.  Cross-stream/Cross-layer Timing Synchronization

   In the case where a G.718 session consists of multiple RTP sessions,
   the RTP packets transmitted on separate RTP sessions need to be
   synchronized in order to enable reconstruction of the frames at the
   receiving end.  Since each of the RTP sessions uses its own random
   initial value for the RTP timestamp, there is also a random offset
   between the RTP timestamp values of packets carrying the EDUs
   belonging to the same encoded frame in different RTP sessions.

   The receiver MUST use the traditional RTCP-based mechanism to
   synchronize streams by using the RTP and NTP timestamps of the RTCP
   Sender Reports (SR) it receives.
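
   As an illustration only, the mapping from an RTP timestamp to the
   sender's wallclock time can be sketched as follows.  The function
   and parameter names are hypothetical, the NTP timestamp of the SR is
   assumed to have been converted to seconds already, and the 32 kHz
   clock is the timestamp clock frequency given in Section 4.7.

```python
RTP_CLOCK_HZ = 32000  # timestamp clock frequency used for G.718 in this document


def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int, sr_ntp_seconds: float,
                     clock_hz: int = RTP_CLOCK_HZ) -> float:
    """Map an RTP timestamp to sender wallclock seconds using one Sender Report."""
    # Interpret the 32-bit difference as signed so that timestamp
    # wrap-around between the SR and the packet is handled.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_ntp_seconds + diff / clock_hz
```

   EDUs received on different RTP sessions that map to the same
   wallclock time belong to the same encoded frame, regardless of the
   random timestamp offset of each session.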

4.7.  RTP Header Usage

   This section specifies the usage of some fields of the RTP header
   (specified in Section 5 of [RFC3550]) with the G.718 RTP payload
   format.  The settings for other RTP header fields are as specified in
   [RFC3550].

   The RTP timestamp corresponds to the sampling instant of the first
   encoded sample of the earliest frame in the payload.  The timestamp
   clock frequency is 32 kHz.

   The marker bit (M) of each of the RTP streams of the session SHALL be
   set to value 1 if the payload carries an EDU belonging to the first
   frame after an inactive period, i.e. an EDU from the first frame of a
   talkspurt.  For all other packets the marker bit is set to value 0.

5.  Payload Format Parameters

   This section defines the parameters that may be used to configure
   optional features in the G.718 RTP transmission.

   The parameters are defined here as part of the media subtype
   registration for the G.718 codec.  Mapping of the parameters into the
   Session Description Protocol (SDP) [RFC4566] is also provided for
   those applications that use SDP.  In control protocols that do not
   use MIME or SDP, the media type parameters MUST be mapped to the
   appropriate format used with that control protocol.

5.1.  Media Type Registration

   This registration is done using the template defined in RFC 4288
   [RFC4288] and following RFC 4855 [RFC4855].

   Type name:  audio

   Subtype name:  G718

   Required parameters:  none

   Optional parameters:

         mode:        This parameter MAY be used to indicate whether the
                      mode with layer L1 being present or the AMR-WB
                      compatible mode (with layer L1' being present) is
                      in use.  If this parameter is not present or the
                      value of this parameter is equal to 0, the mode
                      with layer L1 being present is in use.  Otherwise,
                      the AMR-WB compatible mode is in use.  When this
                      parameter is present, the value MUST be either 0
                      or 1.

                      NOTE: When the upcoming stereo and SWB options are
                            present, the semantics of this parameter may
                            change.

         layers:      The numbers of the layers (in the range from 1 to
                      5, denoting layers from L1 to L5, respectively)
                      transmitted in this session, expressed as a
                      comma-separated list of layer numbers.  If the
                      parameter is present, at least layer L1 or L1'
                      MUST be included in the list of layers in one of
                      the RTP sessions included in the G.718 session.
                      If the parameter is not present, all layers up to
                      layer L5 MAY be used in the session.

                      NOTE: Why not use semantics similarly as L-ID?

         ptime:       The recommended length of time (in milliseconds)
                      represented by the media in a packet.  See Section
                      6 of [RFC4566].

         maxptime:    The maximum length of time (in milliseconds) that
                      can be encapsulated in a packet.  See Section 6 of
                      [RFC4566].

      NOTE: Some further study is needed to see if separate parameters
            for sending and receiving capabilities/preferences are
            needed -- especially for upcoming stereo and SWB options.

      NOTE: Support for upcoming SWB and stereo options needs to be
            taken into account.  Basically we can either 1) extend the
            parameter "layers" to cover also this aspect, or 2) define
            separate parameter(s) for these new options when more
            details on the stereo/SWB support are available.

   Encoding considerations:
      This media type is framed and contains binary data; see Section
      4.8 of [RFC4288].

   Security considerations:  See Section 7 of RFC XXXX.
      [RFC Editor: Upon publication as an RFC, please replace "XXXX"
      with the number assigned to this document and remove this note.]

   Interoperability considerations:  None.

   Published specification:  RFC XXXX.
      [RFC Editor: Upon publication as an RFC, please replace "XXXX"
      with the number assigned to this document and remove this note.]

   Applications which use this media type:
      For example: Voice over IP, audio and video conferencing, audio
      streaming and voice messaging.

   Additional information:  None.

   Person & email address to contact for further information:
      Ari Lakaniemi, ari.lakaniemi@nokia.com

   Intended usage:  COMMON

   Restrictions on usage:
      This media type depends on RTP framing, and hence is only defined
      for transfer via RTP [RFC3550].

   Author:  Ari Lakaniemi, ari.lakaniemi@nokia.com

   Change controller:
      IETF Audio/Video Transport Working Group delegated from the IESG.

5.2.  Mapping to SDP Parameters

   The information carried in the media type specification has a
   specific mapping to fields of the SDP [RFC4566], which is commonly
   used to describe RTP sessions.  When SDP is used to specify sessions
   employing the G.718 codec, the mapping is as follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.

   o  The media subtype ("G718") goes in SDP "a=rtpmap" as the encoding
      name.  The RTP clock rate in "a=rtpmap" MUST be 32000 for G.718.

      NOTE:  The current choice for the RTP clock rate is a
         'placeholder'.  The clock rate needs to be set according to SWB
         sampling rate, which is still T.B.D. Since the core codec
         employs 16000 Hz sampling rate, an integer multiple of 16000 Hz
         seems to be a preferable choice.

   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.

   o  Any remaining parameters go in the SDP "a=fmtp" attribute by
      copying them directly from the media type string as a semicolon
      separated list of parameter=value pairs.
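
   The mapping rules above can be sketched as a small helper that
   builds the SDP lines for one RTP session.  This is an illustration
   under stated assumptions: g718_sdp_lines and its parameters are
   hypothetical names, and the "/1" channel count in "a=rtpmap" simply
   follows the examples in Section 5.5.

```python
from typing import Dict, List, Optional


def g718_sdp_lines(port: int, payload_type: int,
                   fmtp: Optional[Dict[str, str]] = None,
                   ptime: Optional[int] = None,
                   maxptime: Optional[int] = None,
                   profile: str = "RTP/AVPF") -> List[str]:
    """Build the media-level SDP lines for one G.718 RTP session."""
    lines = [
        # Media type "audio" goes in "m=" as the media name.
        f"m=audio {port} {profile} {payload_type}",
        # Subtype "G718" is the encoding name; the clock rate is 32000.
        f"a=rtpmap:{payload_type} G718/32000/1",
    ]
    if fmtp:
        # Remaining parameters form a semicolon-separated list in "a=fmtp".
        params = ";".join(f"{name}={value}" for name, value in fmtp.items())
        lines.append(f"a=fmtp:{payload_type} {params}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```

   For instance, g718_sdp_lines(49120, 97, fmtp={"layers": "1,2"})
   yields the three lines of the offer in the second SDP example of
   Section 5.5.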

5.3.  Offer/Answer Considerations

   The following considerations apply when using the SDP offer/answer
   [RFC3264] mechanism to negotiate the G.718 transport.  The parameter
   "layers" MAY be used to indicate, for each RTP session belonging to
   the current G.718 session, the layer configuration that the end-point
   making the offer is ready to transmit and wishes to receive.

   o  In case the G.718 session consists of a single RTP session, it is
      RECOMMENDED not to impose any layer restrictions for the session
      but to use the rate control functionality to set possible
      restrictions on usage of the higher or highest layers.  If the
      offer includes a layer configuration parameter, the answer MAY use
      a different configuration, but the highest layer in the answer
      MUST NOT be higher than the highest layer of the offered
      configuration.

      NOTE:  Support for answer modifying the layer configuration is
         FFS.

   o  In case the G.718 session consists of multiple RTP sessions, the
      answer MUST use the layer configurations provided in the offer for
      the sessions it accepts.

5.4.  Declarative Usage of SDP

   In declarative usage, such as SDP in RTSP [RFC2326] or SAP [RFC2974],
   the parameter "layers" SHALL be interpreted to provide a set of
   layers that the sender MAY use in the session.

5.5.  SDP Examples

   Some example SDP session descriptions utilizing G.718 encodings are
   provided below.

5.5.1.  Example 1

   The first example illustrates the simple case where a G.718 session
   employing a single RTP session and the AVPF profile is offered, and
   the answer accepts the offer without any changes.

   Offer:

      m=audio 49120 RTP/AVPF 97
      a=rtpmap:97 G718/32000/1

   Answer:

      m=audio 49120 RTP/AVPF 97
      a=rtpmap:97 G718/32000/1

5.5.2.  Example 2

   This example shows a slightly more complex case where a G.718 session
   using a single RTP session and the AVPF profile is offered with the
   restriction to send and receive only layers L1 and L2.  The answer
   indicates that the other end-point is willing to receive (and send)
   layers up to L5.

   Offer:

      m=audio 49120 RTP/AVPF 97
      a=rtpmap:97 G718/32000/1
      a=fmtp:97 layers=1,2

   Answer:

      m=audio 49120 RTP/AVPF 97
      a=rtpmap:97 G718/32000/1
      a=fmtp:97 layers=1,2,3,4,5

5.5.3.  Example 3

   The third example shows a G.718 session using multiple RTP sessions
   with the AVPF profile.  The answerer wishes to use only layers up to
   L3.

   Offer:

      m=audio 49120 RTP/AVPF 97
      a=rtpmap:97 G718/32000/1
      a=fmtp:97 layers=1,2
      a=mid:1

      m=audio 49122 RTP/AVPF 98
      a=rtpmap:98 G718/32000/1
      a=fmtp:98 layers=3
      a=mid:2
      a=depend:lay 1

      m=audio 49124 RTP/AVPF 99
      a=rtpmap:99 G718/32000/1
      a=fmtp:99 layers=4,5
      a=mid:3
      a=depend:lay 1 2

   Answer:

      m=audio 49120 RTP/AVPF 97
      a=rtpmap:97 G718/32000/1
      a=fmtp:97 layers=1,2
      a=mid:1

      m=audio 49120 RTP/AVPF 98
      a=rtpmap:98 G718/32000/1
      a=fmtp:98 layers=3
      a=mid:2
      a=depend:lay 1

   Note that the dependency signaling described in [RFC5583] is used in
   the third example above to indicate the relationship between the
   layers distributed into separate RTP sessions.

6.  Congestion Control

   As a scalable codec, G.718 implicitly provides means for congestion
   control by providing a possibility for 'thinning' the bitstream.  The
   RTP payload format according to this specification provides several
   different means for reducing the G.718 session bandwidth.  The most
   appropriate mechanism (in terms of impact to the user experience)
   depends on the employed payload structure and also on the employed
   session configuration (single RTP session or multiple RTP sessions).
   The following means (in no particular order) can be used to assist
   congestion control procedures -- either by the sender or by the
   intermediate node.

   o  The transport blocks carrying the EDUs representing the highest
      layers within the payload may be dropped.

   o  The payloads carrying the EDUs representing the highest layers in
      a G.718 session may be dropped.

   o  Transport blocks or payloads carrying EDUs belonging to redundant
      frames included in the payload may be dropped.

7.  Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC3550], and in any appropriate RTP profile (for
   example [RFC3551] or [RFC4585]).  This implies that confidentiality
   of the media streams is achieved by encryption; for example, through
   the application of SRTP [RFC3711].  Because the data compression used
   with this payload format is applied end-to-end, any encryption needs
   to be performed after compression.

   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load.  The attacker can inject pathological datagrams
   into the stream that will increase the processing load of the decoder
   and may cause the receiver to be overloaded.  For example, inserting
   additional EDUs representing the higher enhancement layers on top of
   the ones actually transmitted may increase the decoder load.
   However, the G.718 codec is not particularly vulnerable to such an
   attack, since the majority of the computational load in a G.718
   session is associated with the encoder.  Another possible form of
   attack is the forging of codec bit-rate control messages, which may
   result in the encoder employing a higher number of enhancement
   layers than originally intended and thereby requiring a larger amount
   of computational resources.  Therefore, the usage of data origin
   authentication and data integrity protection of at least the RTP
   packet is RECOMMENDED; for example, with SRTP [RFC3711].

   Note that the appropriate mechanism to ensure confidentiality and
   integrity of RTP packets and their payloads is very dependent on the
   application and on the transport and signaling protocols employed.
   Thus, although SRTP is given as an example above, other possible
   choices exist.

   Note that end-to-end security with either authentication, integrity
   or confidentiality protection will prevent a network element not
   within the security context from performing media-aware operations
   other than discarding complete packets.  To allow any (media-aware)
   intermediate network element to perform its operations, it is
   required to be a trusted entity which is included in the security
   context establishment.

8.  IANA Considerations

   IANA is kindly requested to register a media type for the G.718 codec
   for RTP transport, as specified in Section 5.1 of this document.

9.  Acknowledgements

   Thanks to Qin Wu for useful review and commentary.

10.  References

10.1.  Normative References

   [AMR-WB]         3GPP, "Speech codec speech processing functions;
                    Adaptive Multi-Rate - Wideband (AMR-WB) speech
                    codec; General description", 3GPP TS 26.171 5.0.0,
                    April 2001.

   [ITU.G718.2008]  International Telecommunications Union, "Frame Error
                    Robust Narrowband and Wideband Embedded Variable
                    Bit-Rate Coding of Speech and Audio from 8-32
                    Kbit/s", ITU-T Recommendation G.718, May 2008.

   [RFC2119]        Bradner, S., "Key words for use in RFCs to Indicate
                    Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3264]        Rosenberg, J. and H. Schulzrinne, "An Offer/Answer
                    Model with Session Description Protocol (SDP)",
                    RFC 3264, June 2002.

   [RFC3550]        Schulzrinne, H., Casner, S., Frederick, R., and V.
                    Jacobson, "RTP: A Transport Protocol for Real-Time
                    Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]        Schulzrinne, H. and S. Casner, "RTP Profile for
                    Audio and Video Conferences with Minimal Control",
                    STD 65, RFC 3551, July 2003.

   [RFC3711]        Baugher, M., McGrew, D., Naslund, M., Carrara, E.,
                    and K. Norrman, "The Secure Real-time Transport
                    Protocol (SRTP)", RFC 3711, March 2004.

   [RFC4288]        Freed, N. and J. Klensin, "Media Type Specifications
                    and Registration Procedures", BCP 13, RFC 4288,
                    December 2005.

   [RFC4566]        Handley, M., Jacobson, V., and C. Perkins, "SDP:
                    Session Description Protocol", RFC 4566, July 2006.

   [RFC4585]        Ott, J., Wenger, S., Sato, N., Burmeister, C., and
                    J. Rey, "Extended RTP Profile for Real-time
                    Transport Control Protocol (RTCP)-Based Feedback
                    (RTP/AVPF)", RFC 4585, July 2006.

   [RFC4855]        Casner, S., "Media Type Registration of RTP Payload
                    Formats", RFC 4855, February 2007.

   [RFC4867]        Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q.
                    Xie, "RTP Payload Format and File Storage Format for
                    the Adaptive Multi-Rate (AMR) and Adaptive Multi-
                    Rate Wideband (AMR-WB) Audio Codecs", RFC 4867,
                    April 2007.

   [RFC5104]        Wenger, S., Chandra, U., Westerlund, M., and B.
                    Burman, "Codec Control Messages in the RTP Audio-
                    Visual Profile with Feedback (AVPF)", RFC 5104,
                    February 2008.

   [RFC5583]        Schierl, T. and S. Wenger, "Signaling Media Decoding
                    Dependency in the Session Description Protocol
                    (SDP)", RFC 5583, July 2009.

10.2.  Informative References

   [McCanne]        McCanne, S., Jacobson, V., and M. Vetterli,
                    "Receiver-driven layered multicast", ACM SIGCOMM
                    Computer Communication Review Volume 26 Issue 4,
                    October 1996.

   [RFC2326]        Schulzrinne, H., Rao, A., and R. Lanphier, "Real
                    Time Streaming Protocol (RTSP)", RFC 2326,
                    April 1998.

   [RFC2974]        Handley, M., Perkins, C., and E. Whelan, "Session
                    Announcement Protocol", RFC 2974, October 2000.

   [RFC3828]        Larzon, L-A., Degermark, M., Pink, S., Jonsson,
                    L-E., and G. Fairhurst, "The Lightweight User
                    Datagram Protocol (UDP-Lite)", RFC 3828, July 2004.

   [RFC4340]        Kohler, E., Handley, M., and S. Floyd, "Datagram
                    Congestion Control Protocol (DCCP)", RFC 4340,
                    March 2006.

   [RFC5117]        Westerlund, M. and S. Wenger, "RTP Topologies",
                    RFC 5117, January 2008.

Authors' Addresses

   Ari Lakaniemi
   Nokia
   P.O.Box 407
   FIN-00045 Nokia Group, FINLAND

   Phone: +358-71-8008000
   Email: ari.lakaniemi@nokia.com

   Ye-Kui Wang
   Huawei Technologies
   400 Somerset Corp Blvd, Suite 602
   Bridgewater, NJ 08807, USA

   Phone: +1-908-541-3518
   EMail: yekuiwang@huawei.com

Appendix A.  Payload Examples

   The G.718 payload structure enables flexible transport either by
   carrying all layers in the RFC Editor function is currently provided same payload or separating the layers into
   separate payloads.  The following subsections illustrate different
   possibilities for transport by simple examples.  Note that examples
   do not show the
   Internet Society.

9. Open Issues

   1) Support of super-wideband (SWB) audio and stereophonic encoding
      extensions full payload structure to ITU-T G.718 currently being worked on by ITU-T keep the illustration
   simple.

A.1.  Simple Payload Examples

A.1.1.  All The Layers in The Same Payload

   The illustration below shows layers L1-L3 from two encoded frames
   encapsulated into separate payloads, each using a single transport
   block.

    +-------+--------+-----+------+------+------+
    | RTP1  | L-ID=3 |NF=0 |F1-L1 |F1-L2 |F1-L3 |
    +-------+--------+-----+------+------+------+

    +-------+--------+-----+------+------+------+
    | RTP2  | L-ID=3 |NF=0 |F2-L1 |F2-L2 |F2-L3 |
    +-------+--------+-----+------+------+------+

   In the case where the same layers from two input frames are
   encapsulated into one payload using a single transport block, the
   structure is as shown below.

    +-------+--------+-----+------+------+------+------+------+------+
    | RTP1  | L-ID=3 |NF=1 |F1-L1 |F2-L1 |F1-L2 |F2-L2 |F1-L3 |F2-L3 |
    +-------+--------+-----+------+------+------+------+------+------+

   The third example illustrates the case where the layers L1-L3 from
   two input frames are encapsulated into one payload using two separate
   transport blocks, the first one carrying L1 and the other one
   containing L2 and L3.

    +-------+--------+-----+------+------+
    | RTP1  | L-ID=1 |NF=1 |F1-L1 |F2-L1 |
    +-------+--------+-----+------+------+------+------+
            | L-ID=7 |NF=1 |F1-L2 |F2-L2 |F1-L3 |F2-L3 |
            +--------+-----+------+------+------+------+
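
   The EDU ordering used in the illustrations above can be reproduced
   with a short sketch.  The Python fragment below is illustrative only
   and is not part of the payload format: the function name edu_order
   and its arguments are assumptions made for this illustration, and
   the NF value is interpreted as the number of frames minus one, as
   the examples in this appendix suggest.  EDUs are listed layer by
   layer, with all frames of one layer preceding the next layer.

```python
def edu_order(layers, nf):
    """Return the EDU labels of one transport block, in payload order.

    layers: layer numbers carried in the block, e.g. [1, 2, 3]
    nf:     NF value as used in the examples (number of frames minus one)
    """
    frames = range(1, nf + 2)  # NF=0 -> one frame, NF=1 -> two frames
    # Layer-major order: every frame of a layer precedes the next layer.
    return ["F%d-L%d" % (f, l) for l in layers for f in frames]


# Layers L1-L3 of two frames in a single transport block.
print(edu_order([1, 2, 3], 1))
```

   For layers L1-L3 and NF=1 the sketch lists F1-L1, F2-L1, F1-L2,
   F2-L2, F1-L3, F2-L3.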

A.1.2.  Layers in Separate RTP Streams

   In this case the data for each layer is transmitted in its own
   payload.

   In the first example each transport block including a single EDU is
   carried in its own RTP payload.

    +-------+--------+-----+-----+    +-------+--------+-----+-----+
    | RTP1a | L-ID=1 |NF=0 |F1-L1|    | RTP1b | L-ID=6 |NF=0 |F1-L2|
    +-------+--------+-----+-----+    +-------+--------+-----+-----+

    +-------+--------+-----+-----+    +-------+--------+-----+-----+
    | RTP1c |L-ID=10 |NF=0 |F1-L3|    | RTP2a | L-ID=1 |NF=0 |F2-L1|
    +-------+--------+-----+-----+    +-------+--------+-----+-----+

    +-------+--------+-----+-----+    +-------+--------+-----+-----+
    | RTP2b | L-ID=6 |NF=0 |F2-L2|    | RTP2c |L-ID=10 |NF=0 |F2-L3|
    +-------+--------+-----+-----+    +-------+--------+-----+-----+

   If the payloads carry data from two consecutive input frames, the
   same encoded data as in the previous example is arranged as follows.

    +-------+--------+-----+-----+-----+
    | RTP1a | L-ID=1 |NF=1 |F1-L1|F2-L1|
    +-------+--------+-----+-----+-----+

    +-------+--------+-----+-----+-----+
    | RTP1b | L-ID=6 |NF=1 |F1-L2|F2-L2|
    +-------+--------+-----+-----+-----+

    +-------+--------+-----+-----+-----+
    | RTP1c |L-ID=10 |NF=1 |F1-L3|F2-L3|
    +-------+--------+-----+-----+-----+

A.2.  Advanced Examples

A.2.1.  Different Update Rate for Subset of Layers

   The example below employs different update rates (i.e. a different
   number of frames per packet) for selected subsets of layers.  All
   core codec layers L1-L5 are shown.

    +-------+--------+-----+-----+-----+-----+-----+
    | RTP1  | L-ID=1 |NF=3 |F1-L1|F2-L1|F3-L1|F4-L1|
    +-------+--------+-----+-----+-----+-----+-----+

    +-------+--------+-----+-----+-----+-----+-----+
    | RTP2a | L-ID=7 |NF=1 |F1-L2|F2-L2|F1-L3|F2-L3|
    +-------+--------+-----+-----+-----+-----+-----+

    +-------+--------+-----+-----+-----+
    | RTP3a |L-ID=14 |NF=0 |F1-L4|F1-L5|
    +-------+--------+-----+-----+-----+

    +-------+--------+-----+-----+-----+
    | RTP3b |L-ID=14 |NF=0 |F2-L4|F2-L5|
    +-------+--------+-----+-----+-----+

    +-------+--------+-----+-----+-----+-----+-----+
    | RTP2b | L-ID=7 |NF=1 |F3-L2|F4-L2|F3-L3|F4-L3|
    +-------+--------+-----+-----+-----+-----+-----+

    +-------+--------+-----+-----+-----+
    | RTP3c |L-ID=14 |NF=0 |F3-L4|F3-L5|
    +-------+--------+-----+-----+-----+

    +-------+--------+-----+-----+-----+
    | RTP3d |L-ID=14 |NF=0 |F4-L4|F4-L5|
    +-------+--------+-----+-----+-----+
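
   The packetization in this example can be sketched as follows.  The
   Python fragment is illustrative only: the function packet_schedule,
   the stream labels and the tuple layout are assumptions made for this
   illustration, not part of the payload format.  Packets are listed
   grouped by stream, so the sketch does not reproduce the interleaved
   sending order shown in the illustration.

```python
def packet_schedule(groups, total_frames):
    """List the packets needed to carry total_frames frames.

    groups: list of (label, layers, frames_per_packet) tuples, one per
            subset of layers that has its own update rate.
    Returns a list of (label, nf, edus) tuples, where nf is the number
    of frames in the packet minus one, as in the examples.
    """
    packets = []
    for label, layers, per_packet in groups:
        for start in range(1, total_frames + 1, per_packet):
            end = min(start + per_packet, total_frames + 1)
            frames = range(start, end)
            # Layer-major EDU order inside the transport block.
            edus = ["F%d-L%d" % (f, l) for l in layers for f in frames]
            packets.append((label, len(frames) - 1, edus))
    return packets


# L1 with four frames per packet, L2-L3 with two, L4-L5 with one.
groups = [("RTP1", [1], 4), ("RTP2", [2, 3], 2), ("RTP3", [4, 5], 1)]
for packet in packet_schedule(groups, 4):
    print(packet)
```

   For four input frames the sketch yields one packet for L1, two
   packets for L2-L3 and four packets for L4-L5, matching the seven
   payloads illustrated above.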

A.2.2.  Redundant Frames With Limited Set of Layers

   An example transmitting layers L1-L3 as primary data and L1 (of the
   previous frame) as redundant data is shown below.  Each payload
   carries one primary (i.e. new) frame in one transport block and one
   redundant frame, which in this example is the frame preceding the
   primary frame, in another transport block.

    +-------+--------+-----+-----+--------+-----+-----+-----+-----+
    | RTP1  | L-ID=1 |NF=0 |F0-L1| L-ID=3 |NF=0 |F1-L1|F1-L2|F1-L3|
    +-------+--------+-----+-----+--------+-----+-----+-----+-----+

    +-------+--------+-----+-----+--------+-----+-----+-----+-----+
    | RTP2  | L-ID=1 |NF=0 |F1-L1| L-ID=3 |NF=0 |F2-L1|F2-L2|F2-L3|
    +-------+--------+-----+-----+--------+-----+-----+-----+-----+

    +-------+--------+-----+-----+--------+-----+-----+-----+-----+
    | RTP3  | L-ID=1 |NF=0 |F2-L1| L-ID=3 |NF=0 |F3-L1|F3-L2|F3-L3|
    +-------+--------+-----+-----+--------+-----+-----+-----+-----+

   Alternatively, a payload that also carries redundant data for a
   subset of layers can be arranged differently, as shown in the example
   below.

    +-------+--------+-----+-----+-----+-----+--------+-----+-----+
    | RTP1  | L-ID=3 |NF=0 |F0-L1|F0-L2|F0-L3| L-ID=1 |NF=0 |F1-L1|
    +-------+--------+-----+-----+-----+-----+--------+-----+-----+

    +-------+--------+-----+-----+-----+-----+--------+-----+-----+
    | RTP2  | L-ID=3 |NF=0 |F1-L1|F1-L2|F1-L3| L-ID=1 |NF=0 |F2-L1|
    +-------+--------+-----+-----+-----+-----+--------+-----+-----+

    +-------+--------+-----+-----+-----+-----+--------+-----+-----+
    | RTP3  | L-ID=3 |NF=0 |F2-L1|F2-L2|F2-L3| L-ID=1 |NF=0 |F3-L1|
    +-------+--------+-----+-----+-----+-----+--------+-----+-----+

   Now the first transport block carries the primary data and the second
   transport block carries the redundant data, which in this case covers
   the frame following the primary frame.  The benefit of this approach
   is that the redundant data is included in the last (secondary)
   transport block of the payload, which might be beneficial for
   possible payload scaling operations within the network.
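
   The two arrangements of this subsection can be compared with a short
   sketch.  The Python fragment below is illustrative only: the function
   redundant_payloads and its redundant_first argument are assumptions
   made for this illustration, not part of the payload format.  Each
   payload is represented as a list of transport blocks, and each
   transport block as a list of EDU labels.

```python
def redundant_payloads(num_payloads, redundant_first=True):
    """Build the transport blocks of payloads RTP1..RTPn.

    redundant_first=True:  L1 of frame n-1 (redundant) comes first,
                           followed by L1-L3 of frame n (primary).
    redundant_first=False: L1-L3 of frame n-1 (primary) comes first,
                           followed by L1 of frame n (redundant).
    """
    payloads = []
    for n in range(1, num_payloads + 1):
        if redundant_first:
            blocks = [([1], n - 1), ([1, 2, 3], n)]
        else:
            blocks = [([1, 2, 3], n - 1), ([1], n)]
        payloads.append(
            [["F%d-L%d" % (frame, layer) for layer in layers]
             for layers, frame in blocks]
        )
    return payloads


for payload in redundant_payloads(3, redundant_first=False):
    print(payload)
```

   With redundant_first=False the redundant L1 data is always the last
   transport block, so an element that scales the payload could drop it
   by truncating the payload after the first transport block.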

Authors' Addresses

   Glen Zorn (editor)
   Network Zen
   227/358 Thanon Sanphawut
   Bang Na, Bangkok  10260
   Thailand

   Phone: +66 (0) 87-040-4617
   EMail: gwz@net-zen.net

   Ye-Kui Wang
   Huawei Technologies
   400 Somerset Corp Blvd.
   Suite 402
   Bridgewater, NJ  08807
   USA

   Phone: +1 (908) 541-3518
   EMail: yekuiwang@huawei.com

   Ari Lakaniemi
   Nokia
   P.O.Box 407
   FIN-00045 Nokia Group
   Finland

   Phone: +358-71-8008000
   EMail: ari.lakaniemi@nokia.com