Internet Engineering Task Force                            G. Liebl,
                                                          T.Stockhammer
  Internet Draft                                  LNT, Munich Univ. of
                                                             Technology
  Document: draft-ietf-avt-uxp-03.txt
  June 2002                                       M. Wagner, J.Pandel,
                                                     W. Weng, G. Baese,
                                                  M. Nguyen, F. Burkert
  Expires: December 2002                           Siemens AG, Munich

       An RTP Payload Format for Erasure-Resilient Transmission of
                     Progressive Multimedia Streams

  Status of this Memo
     This document is an Internet-Draft and is in full conformance
     with all provisions of Section 10 of RFC2026.

     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups. Note that
     other groups may also distribute working documents as Internet-
     Drafts. Internet-Drafts are draft documents valid for a maximum
     of six months and may be updated, replaced, or obsoleted by other
     documents at any time. It is inappropriate to use Internet-
     Drafts as reference material or to cite them other than as "work
     in progress."
     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt
     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.

  Abstract
     This document specifies an efficient way to ensure
     erasure-resilient transmission of progressively encoded
     multimedia sources via RTP using Reed-Solomon (RS) codes together
     with interleaving. The level of erasure protection can be
     explicitly adapted to the importance of the respective parts in
     the source stream, thus allowing a graceful degradation of
     application quality with increasing packet loss rate on the
     network. Hence, this type of unequal erasure protection (UXP)
     scheme is intended to cope with the rapidly varying channel
     conditions on wireless access links to the Internet backbone.
     Nevertheless, backward compatibility with currently standardized
     non-progressive multimedia codecs is ensured, since equal erasure
     protection (EXP) represents a subset of generic UXP. By applying
     interleaving and RS codes, a comparably simple payload format is
     defined, which can be easily integrated into the existing
     framework for RTP.

Liebl,Stockhammer,Wagner,Pandel,Weng,Baese,Nguyen,Burkert      [Page1]

  1. Introduction

     Due to the increasing popularity of high-quality multimedia
     applications over the Internet and the high level of public
     acceptance of existing mobile communication systems, there is a
     strong demand for a future combination of these two technologies:
     One possible scenario consists of an integrated communication
     environment, where users can set up multimedia connections
     anytime and anywhere via radio access links to the Internet.

     For this reason, several packet-oriented transmission modes have
     been proposed for next generation wireless standards like EGPRS
     (Enhanced General Packet Radio Service) or UMTS (Universal Mobile
     Telecommunications System), which are mostly based on the same
     principle: Long message blocks, i.e. IP packets, that enter the
     wireless part of the network are split up into segments of
     desired length, which can be multiplexed onto link layer packets
     of fixed size. The latter are then transmitted sequentially over
     the wireless link, reassembled, and passed on to the next network
     element.

     However, compared to the rather benign channel characteristics on
     today's fixed networks, wireless links generally suffer from
     severe fading, noise, and interference, thus resulting in a
     comparably high residual bit error rate after detection and
     decoding. By use of efficient CRC mechanisms, these bit errors
     are usually detected with very high probability, and every
     corrupted segment, i.e. every segment which contains at least one
     erroneous bit, is discarded to prevent error propagation through
     the network. But if even a single segment is missing at the
     reassembly stage, the upper layer IP packet can no longer be
     reconstructed. The result is a significant increase in packet
     loss rate at IP level.

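     The effect described above can be quantified with a short
     calculation. Assuming, for illustration only, that an IP packet
     is split into m segments and that each segment is discarded
     independently with probability p (real wireless channels often
     produce correlated losses), the packet survives only if all m
     segments arrive, so its loss probability is 1-(1-p)^m. The
     following Python sketch evaluates this; the function name is
     hypothetical.

```python
def ip_packet_loss_rate(segment_loss: float, segments_per_packet: int) -> float:
    """IP-level loss probability for a packet split into
    `segments_per_packet` segments, assuming each segment is discarded
    independently with probability `segment_loss`."""
    if not 0.0 <= segment_loss <= 1.0:
        raise ValueError("segment_loss must be a probability")
    if segments_per_packet < 1:
        raise ValueError("a packet consists of at least one segment")
    # A single missing segment makes reassembly fail, so the packet
    # survives only if every one of its segments survives.
    return 1.0 - (1.0 - segment_loss) ** segments_per_packet


if __name__ == "__main__":
    for m in (1, 5, 20):  # illustrative packet sizes in segments
        print(m, round(ip_packet_loss_rate(0.01, m), 4))
```

     With a segment loss rate of 1%, a packet spanning 20 segments is
     lost roughly 18% of the time under this independence assumption.
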
     Since most multimedia applications can only recover from a very
     limited number of lost message blocks, it is vitally necessary to
     keep packet loss at IP level within a certain acceptable range,
     depending on the individual quality-of-service requirements.
     However, due to the delay constraints typically imposed by most
     audio or video codecs, the use of ARQ schemes is often prohibited
     both at link level and at transport level. In addition,
     retransmission strategies cannot be applied to any broadcast or
     multicast scenarios. Thus, forward erasure correction strategies
     have to be considered, which provide a simple means to
     reconstruct the content of lost packets at the receiver from the
     redundancy that has been spread out over a certain number of
     subsequent packets.

     There already exist some previous studies and proposals regarding
     erasure-resilient packet transmission [1,8]. Since most of them
     are based on the assumption that all parts in a message block are
     equally important to the receiver, i.e. the respective
     application cannot operate on partly complete blocks, they were
     optimized with respect to assigning equal erasure protection over
     the whole message block. However, recent developments both in
     audio and video coding have introduced the notion of
     progressively encoded media streams, for which unequal erasure
     protection strategies seem to be more promising, as will be
     explained in more detail below. Although the scheme defined in
     [1] is in principle capable of supporting some kind of unequal
     erasure protection, possible implementations seem to be quite
     complex relative to the achievable gain in performance. Finally,
     in [1] it is assumed that subsequent RTP packets can have
     variable length, which would cause significant segmentation
     overhead at the link layer of almost all wireless systems.

     This document defines a payload format for RTP, such that
     different elements in a progressively encoded multimedia stream
     can be protected against packet erasures according to their
     respective quality-of-service requirements. The general
     principle, including the use of Reed-Solomon codes together with
     an appropriate interleaving scheme for adding redundancy, follows
     the ideas already presented in [2], but allows for finer
     granularity in the structure of the progressive media stream. The
     proposed scheme is generic in the sense that it (1) is
     independent of the type of media stream, be it audio or video,
     and (2) can be adapted to varying transmission quality very
     quickly by use of inband signaling.

  2. Conventions used in this document

     The following terms are used throughout this document:
     1.)  Message block: a higher layer transport unit (e.g. an IP
          packet) that enters/leaves the segmentation/reassembly
          stage at the interface to wireless data link layers.
     2.)  Segment: denotes a link layer transport unit.
     3.)  CRC: Cyclic Redundancy Check, usually added to transport
          units at the sender to detect the existence of erroneous
          bits in a transport unit at the receiver.
     4.)  Segmentation/Reassembly Process: If the size of transport
          units at the link layer is smaller than that at the upper
          layers, message blocks have to be split up into several
          parts, i.e. segments, which are then transmitted
          successively over the link. If nothing is lost, the
          original message block can be restored at the receiving
          entity (reassembly).
     5.)  Quality-of-service: application-dependent criterion to
          define a certain desired operation point.
     6.)  Codec: denotes a functional pair consisting of a source
          encoding unit at the sender and a corresponding source
          decoding unit at the receiver; usually standardized for
          different multimedia applications like audio or video.
     7.)  Media stream: a bitstream which results at the output of an
          encoder for a specific media type, e.g. H.263, MPEG-4-video.
     8.)  Progressive media stream: a media stream which can be
          divided into successive elements. The distinct elements are
          of different importance to the reconstruction process at
          the decoder and are commonly ordered from highest to least
          importance, where the latter elements depend on the
          previous ones.
     9.)  Progressive source coding: results in a progressive media
          stream.
     10.) Reed-Solomon (RS) code: belongs to the class of linear
          nonbinary block codes, and is uniquely specified by the
          block length n, the number of parity symbols t, and the
          symbol alphabet.
     11.) n: is a variable, which denotes both the block length of an
          RS codeword and the number of columns in a TB (see 19).
     12.) k: is a variable, which denotes the number of information
          symbols in an RS codeword.
     13.) t: is a variable, which denotes the number of parity symbols
          in an RS codeword.
     14.) Erasure: When a packet is lost during transmission, an
          erasure is said to have happened. Since the position of the
          erased packet in a sequence is usually known, a
          corresponding erasure marker can be set at the receiving
          entity.
     15.) Base layer: comprises the first and most important elements
          of the progressively encoded source, without which all
          subsequent information is useless.
     16.) Enhancement layer: comprises one or more sets of the less
          important subsequent elements of the progressively encoded
          source. A specific enhancement layer can be decoded if and
          only if the base layer and all previous enhancement layer
          data (of higher importance) are available.
     17.) Info stream: denotes the bitstream which has to be
          protected by the UXP scheme. It usually consists of the
          media stream (progressively source encoded or not), which
          is arranged according to a desired syntax (e.g. to achieve
          an appropriate framing, see 6.3). In any case, it is
          assumed that every info stream is already octet-aligned
          according to the standard procedures defined in the context
          of the used syntax specifications.
     18.) Info octet: denotes one element of the info stream.
     19.) Transmission block (TB): denotes a memory array of L rows
          and n columns. Each row of a TB represents an RS codeword,
          whereas each column, together with the respective UXP
          header (see 36) in front, forms the payload of a single RTP
          packet. Each TB consists of at least two distinct
          transmission sub blocks (TSB, see 20): The first L_s rows
          belong to the signaling TSB, whereas the last L_d=(L-L_s)
          rows belong to one or more data TSBs.
     20.) Transmission sub block (TSB): denotes a memory array of
          0<l<L rows and n columns, which is a horizontal slice of a
          TB. Depending on whether the info octet positions are
          filled with descriptors (see 31) or media data, the TSB is
          of type signaling or data, respectively.
     21.) L: is a variable, which denotes both the number of rows in a
          TB and the payload length (without UXP header, see 36) of
          an RTP packet in octets.
     22.) Unequal erasure protection (UXP): denotes a specific
          strategy which varies the level of erasure protection
          across a TB according to a given redundancy profile.
     23.) Equal erasure protection (EXP): is a subset of UXP, for
          which the level of erasure protection is kept constant
          across a TB.
     24.) Redundancy profile: describes the size of the different
          erasure protection classes in a TB, i.e. the number of rows
          (codewords) per class.
     25.) Erasure protection class: contains a set of rows
          (codewords) of the TB with the same erasure correction
          capability.
     26.) i: is a variable, which denotes the number of parity
          symbols for each row in erasure protection class i.
     27.) EPC_i: is a variable, which denotes the set of rows
          contained in erasure protection class i.
     28.) R_i: is a variable, which denotes the total number of rows
          contained in erasure protection class i, i.e. the
          cardinality of EPC_i.
     29.) T: is a variable, which denotes the number of parity
          symbols for each row in the highest erasure protection
          class (with respect to application data) in a TB.
     30.) EPV: denotes the erasure protection vector of length (T+1)
          used to describe a certain redundancy profile.
     31.) DP: descriptor used for in-band signaling of the erasure
          protection vector.
     32.) SI: stuffing indicator, which contains the number of media
          stuffing symbols at the end of a data TSB (see 34).
     33.) Descriptor Stuffing: insertion of otherwise unused
          descriptor values (i.e. 0x00) at the end of the signaling
          TSB. Descriptor stuffing is performed if the final sequence
          of descriptors and stuffing indicators for a valid
          redundancy profile is shorter than the space initially
          reserved for it in the signaling TSB.
     34.) Media Stuffing: insertion of additional symbols at the end
          of a data TSB. Media stuffing is performed if the info
          stream (see 17) is shorter than the space reserved for it
          in the data TSB for a desired redundancy profile. Since the
          number of stuffing symbols is signaled in the respective
          SI, any octet value may be used (e.g. 0x00).
     35.) Interleaver: performs the spreading of a codeword, i.e. a
          row in the TB, over n successive packets, such that the
          probability of an erasure burst in a codeword is kept
          small.
     36.) UXP header: is the additional header information contained
          in each RTP packet after UXP has been applied. It is always
          present at the start of the payload section of an RTP
          packet.
     37.) X: denotes a currently unused extension field of 1 bit in
          the UXP header.
     38.) P: is a variable which denotes the number of parity symbols
          per row used to protect the inband signaling of the
          redundancy profile.
     39.) ceil(.): denotes the ceiling function, i.e. rounding up to
          the next integer.

     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
     NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
     "OPTIONAL" in this document are to be interpreted as described
     in RFC-2119.

  3. Reed-Solomon Codes

     Reed-Solomon (RS) codes are a special class of linear nonbinary
     block codes, which are known to offer maximum erasure correction
     capability with minimum amount of redundancy.
     An arbitrary t-erasure-correcting (n,k) RS code defined over
     Galois field GF(q) has the following parameters [3]:
     - Block length:                                      n=q-1
     - No. of information symbols in a codeword:          k
     - No. of parity-check symbols in a codeword:         n-k=t
     - Minimum distance:                                  d=t+1

     In what follows, only systematic RS codes over GF(2^8) shall be
     considered, i.e. the symbols of interest can be directly related
     to a tuple of eight bits, which is commonly called an octet in
     packet transmission. The principal structure of a codeword is
     shown in Fig. 1.
     By shortening the initial (255,255-t) RS code, any desired
     (n',n'-t) RS code with n'<255 for a given erasure correction
     capability t may be obtained.

       block of n octets
     <----------------->
     +-+-+-+-+-+-+-+-+-+
     |&|&|&|&|&|&|&|*|*|
     +-+-+-+-+-+-+-+-+-+
     <------------><--->
         k=n-t       t
       (&:info)     (*:parity)

     Fig. 1: Structure of a systematic RS codeword
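
     To illustrate the erasure correction property (since d=t+1, any
     k of the n octets of a codeword suffice to restore it), the
     following Python sketch implements a systematic RS erasure code
     over GF(2^8) in its evaluation form. The k info octets are the
     values of a polynomial of degree less than k at the points
     0..k-1, and the t parity octets are its values at the next t
     points. The field polynomial 0x11D, the evaluation points, and
     all function names are assumptions for illustration. The sketch
     is not claimed to be bit-compatible with the RS encoder a UXP
     implementation has to use. A codeword length n up to 255 is
     obtained directly by choosing n evaluation points, which yields
     the same (n,n-t) parameters as shortening.

```python
# Systematic RS erasure code over GF(2^8), evaluation (Lagrange) form.
# Assumptions: field polynomial 0x11D and evaluation points 0..n-1.
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a primitive polynomial
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _x  # doubled table avoids a modulo in gf_mul
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM_POLY


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def _interpolate(points, x):
    """Evaluate, at field element x, the unique polynomial of degree
    < len(points) passing through the given (x_j, y_j) pairs."""
    result = 0
    for j, (xj, yj) in enumerate(points):
        num = den = 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = gf_mul(num, x ^ xm)   # subtraction in GF(2^8) is XOR
                den = gf_mul(den, xj ^ xm)
        result ^= gf_mul(yj, gf_div(num, den))  # addition is XOR as well
    return result


def rs_encode(info: bytes, t: int) -> bytes:
    """Append t parity octets to k info octets (systematic (k+t, k) code)."""
    k, n = len(info), len(info) + t
    if k == 0 or n > 255:
        raise ValueError("need 1 <= k and k + t <= 255")
    points = list(enumerate(info))            # info octet j sits at point j
    parity = [_interpolate(points, k + j) for j in range(t)]
    return bytes(info) + bytes(parity)


def rs_decode_erasures(received, k: int) -> bytes:
    """Recover the k info octets; erased positions are marked with None.
    Succeeds whenever at most n - k = t positions are erased."""
    known = [(pos, sym) for pos, sym in enumerate(received) if sym is not None]
    if len(known) < k:
        raise ValueError("more than t erasures: codeword not recoverable")
    basis = known[:k]                          # any k symbols determine the codeword
    return bytes(_interpolate(basis, j) for j in range(k))


if __name__ == "__main__":
    cw = rs_encode(b"UXP!", t=3)               # (7, 4) code, t = 3
    damaged = list(cw)
    for pos in (0, 2, 5):                      # erase three of the seven octets
        damaged[pos] = None
    print(rs_decode_erasures(damaged, k=4))
```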

  4. Progressive Source Coding

     The output of an encoder for a specific media type, e.g. H.263
     or MPEG-4-video, is said to be a media stream. If the media
     stream consists of several distinct elements which are of
     different importance with respect to the quality of the
     reconstruction process at the receiver, then the media stream is
     progressive.
     The progressive media stream is often organized in separate
     layers. In this case, there exists at least one layer, often
     called base layer, without which reconstruction fails
     completely, whereas all the other layers, often called
     enhancement layers, just help to continually improve the
     quality. Consequently, the different layers are usually
     contained in the media stream in decreasing order of importance,
     i.e. the base layer data is followed by the various enhancement
     layers.
     An example can be found in the fine granular scalability modes
     which have been proposed to various standardization bodies like
     MPEG-4, where the resolution of the scaling process in the
     progressive source encoder is as low as one symbol in the
     enhancement layer [4]. Another example is given by data
     partitioning, which can be applied to the ITU/MPEG H.26L
     standard [5], MPEG-4, and H.263++. Also, the existence of I, P,
     and B frames in streams which comply with standards like MPEG-2
     can be interpreted as progressive.
     From the above definition, it is quite obvious that the most
     important base layer data must be protected as strongly as
     possible against packet loss during transmission. However, the
     protection of the enhancement layers could be continually
     lowered, since a loss at this stage has only minor consequences
     for the reconstruction process. Thus, by using a suitable unequal
     erasure protection strategy across a progressive media stream,
     the overhead due to redundancy spent per (channel-)encoded block
     is reduced compared to protecting the whole stream as strongly as
     the base layer. Furthermore, if channel conditions get worse
     during transmission, only more and more enhancement layers are
     lost, i.e. a graceful degradation in application quality at the
     receiver is achieved [6].
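
     The graceful degradation argued above can be sketched
     numerically. In the UXP scheme every RTP packet carries one
     column of a TB (see section 5), so each lost packet erases
     exactly one octet in every row. A row of an erasure protection
     class with i parity octets therefore stays decodable as long as
     no more than i packets of the TB are lost. The Python sketch
     below assumes, for illustration only, that a redundancy profile
     is held in memory as a mapping from i to R_i; the wire format of
     the EPV is not modeled. It counts the info octets that remain
     recoverable per class. The function name and the example profile
     are hypothetical.

```python
from typing import Dict


def recoverable_info_octets(n: int, profile: Dict[int, int],
                            lost_packets: int) -> Dict[int, int]:
    """For each erasure protection class, keyed by its number i of parity
    octets per row and mapped to its row count R_i, return how many info
    octets can still be restored when `lost_packets` of the n packets of a
    TB are erased."""
    result = {}
    for parity, rows in profile.items():
        if not 0 <= parity < n:
            raise ValueError("parity octets per row must lie in [0, n)")
        info_per_row = n - parity              # each row is an (n, n-i) RS codeword
        # One erased octet per row per lost packet: a row with `parity`
        # parity octets survives at most `parity` erasures.
        result[parity] = rows * info_per_row if lost_packets <= parity else 0
    return result


if __name__ == "__main__":
    # Hypothetical profile for n = 20 packets: 10 strongly protected rows
    # (base layer), 30 medium rows, 60 weakly protected rows.
    profile = {8: 10, 4: 30, 1: 60}
    for lost in (0, 2, 5, 9):
        print(lost, recoverable_info_octets(20, profile, lost))
```

     For the example profile, two lost packets affect only the weakly
     protected rows, five lost packets leave only the strongest class
     intact, and nine lost packets exceed every class. This is why the
     base layer should be carried in the strongest class.
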
     Nevertheless, it should be mentioned that the specific structure
     of the media stream strongly depends on the actual media codec in
     use and does not always provide suitable mechanisms for transport
     over data networks, like framing (see also 6.3). In order to keep
     the description of the unequal erasure protection strategy in
     section 5 as general as possible, the final bitstream which has
     to be protected by the proposed UXP scheme will be called "info
     stream" in the following. Furthermore, it is assumed that every
     info stream is already octet-aligned according to the standard
     procedures defined in the context of the used syntax
     specifications.

  5. General Structure of UXP schemes

     In this section, the principal features of the proposed UXP
     scheme are described with a special focus on the protection and
     reconstruction procedure which is applied to the info stream. In
     addition, the behavior of the sender and receiver is specified as
     far as it concerns the reconstruction of the info stream.
     However, the complete UXP payload structure, including the
     additional UXP header, is described in section 6.

     The reason for using the term "info stream" as well as the
     details of its construction are described in Section 6.3. For
     now, we assume that an info stream which has to be protected is
     given.

     Fig. 1 already illustrated the structure of a systematic
     codeword, which shall be represented by a single row with n
     successive symbols that contain the information and the parity
     octets. This structure shall now be extended by forming a
     transmission block (TB) consisting of L codewords of length n
     octets each, which amounts to a total of L rows and n columns
     [7]: Each column, together with the respective UXP header in
     front, shall represent the payload of an RTP packet, i.e. the
     whole data of a TB is transmitted via a sequence of n RTP packets
     all carrying a payload of length (L+2) octets (UXP header
     included).

     The value of L should be chosen in such a way that the whole
     length of the resulting IP packet (i.e. RTP payload plus the sum
     of the RTP, UDP, and IP headers) equals a multiple of the segment
     size on the wireless link to avoid stuffing at the data link
     layer.
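     As an illustration of this rule, the following Python sketch
     computes the largest admissible L for a given link segment size.
     The header sizes (IPv4 without options, UDP, RTP without CSRC
     list or header extension) and the helper name choose_tb_height
     are assumptions made for this example only; they are not defined
     by this document.

```python
# Hypothetical helper; the header sizes are assumptions, not part of UXP.
IPV4_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8
RTP_HEADER = 12    # fixed RTP header, no CSRCs, no header extension
UXP_HEADER = 2     # UXP header, see section 6.2


def choose_tb_height(segment_size: int, max_ip_packet: int) -> int:
    """Return the largest TB height L (column octets per packet, UXP
    header excluded) such that the whole IP packet length
    L + UXP + RTP + UDP + IP is a multiple of segment_size and does
    not exceed max_ip_packet."""
    overhead = IPV4_HEADER + UDP_HEADER + RTP_HEADER + UXP_HEADER
    # Largest multiple of the segment size within the size limit.
    total = (max_ip_packet // segment_size) * segment_size
    tb_height = total - overhead
    if tb_height < 1:
        raise ValueError("no multiple of segment_size leaves room for a TB row")
    return tb_height


# Example: 160-octet segments, IP packets of at most 1500 octets.
print(choose_tb_height(160, 1500))  # 1440 - 42 = 1398
```

     Taking the largest fitting multiple simply maximizes L; any
     smaller multiple of the segment size would avoid link-layer
     stuffing as well.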

     Each TB usually consists of two or more horizontal sub blocks,
     the so-called transmission sub blocks (TSB), as can be seen in
     Fig. 2: The first L_s rows always belong to the signaling TSB,
     which is used to convey the actual redundancy profile in the data
     part to the receiver (see 6.4). The following L_d=(L-L_s) rows
     belong to one or more data TSBs, which contain the interleaved
     and RS encoded info stream, as will be described below.

     Transmission Block (TB)

                  /\ +-+-+-+-+-+-+-+-+-+ /\
                  |  |  signaling TSB  |  |  L_s octets
                  |  +-+-+-+-+-+-+-+-+-+ \/
                  |  |                 | /\               /\
                  |  +   data TSB #1   +  |  L_d(1) octets |
                  |  |                 |  |                |
                  |  +-+-+-+-+-+-+-+-+-+ \/                |
     L octets     |  |                 | /\                |
     payload      |  +   data TSB #2   +  |  L_d(2) octets |
     per packet   |  +                 |  |                |  L_d octets
                  |  +-+-+-+-+-+-+-+-+-+ \/                |
                  |  |        .        |  .                |
                  |  +        .        +  .                |
                  |  |        .        |  .                |
                  |  +-+-+-+-+-+-+-+-+-+ /\                |
                  |  |   data TSB #z   |  |  L_d(z) octets |
                  \/ +-+-+-+-+-+-+-+-+-+ \/               \/
                     <----------------->
                           n packets

     Fig. 2: General structure of a TB

     Since the UXP procedure is mainly applied to the data TSBs, it
     will be described next, whereas the content and syntax of the
     signaling TSB will be defined in section 6.4.
     For simplicity, only one single data TSB will be assumed
     throughout the following explanation of the encoding and
     decoding procedure. However, an extension to more than one data
     TSB per TB is straightforward, and will be shown in section 6.5.
     As depicted in Fig. 3, the rows of a transmission sub block shall
     be partitioned into T+1 different classes EPC_i, where i=0...T,
     such that each class contains exactly R_i=|EPC_i| consecutive
     rows of the matrix, where the R_i have to satisfy the following
     relationship:

     R_0+R_1+...+R_T=L_d

     Data Transmission Sub Block (data TSB)

                                   T
                               <------->
                  /\ +-+-+-+-+-+-+-+-+-+ /\
                  |  |&|&|&|&|&|*|*|*|*|  |
                  |  +-+-+-+-+-+-+-+-+-+  |  R_T=3
                  |  |&|&|&|&|&|*|*|*|*|  |
                  |  +-+-+-+-+-+-+-+-+-+  |
     L_d octets   |  |&|&|&|&|&|*|*|*|*| \/
     per packet   |  +-+-+-+-+-+-+-+-+-+ /\
                  |  |%|%|%|%|%|%|*|*|*|  |  R_(T-1)=1
                  |  +-+-+-+-+-+-+-+-+-+ \/
                  |  |$|$|$|$|$|$|$|*|*|  .
                  |  +-+-+-+-+-+-+-+-+-+  .
                  |  |!|!|!|!|!|!|!|!|*|  .
                  |  +-+-+-+-+-+-+-+-+-+ /\
                  |  |#|#|#|#|#|#|#|#|#|  |  R_0=1
                  \/ +-+-+-+-+-+-+-+-+-+ \/
                     <----------------->
                           n packets

     &,%,$,!,# : info octets belonging to a certain info stream in
                 decreasing order of importance
     * :         parity octets gained from Reed-Solomon coding

     Fig. 3: General structure for coding with unequal erasure
     protection

     Furthermore, all rows in a particular class EPC_i shall contain
     exactly the same number of parity octets, which is equal to the
     index i of the class. For each row in a certain class EPC_i, the
     same (n,n-i) RS code shall be applied.
     As can be observed from Fig. 3, class EPC_T contains the largest
     number of parity octets per row, i.e. offers the highest erasure
     protection capability in the block. Consequently, the most
     important element in the info stream must be assigned to class
     EPC_T, where the value of T should be chosen according to the
     desired outage threshold of the application given a certain
     packet erasure rate on the link.
     All other classes EPC_(T-1)...EPC_0 shall be sequentially filled
     with the remaining elements of the info stream in decreasing
     order of importance, where the optimal choice for the size of
     each class (0 or more rows), i.e. the structure of the redundancy
     profile, should depend on the quality-of-service requirements for
     the various (progressively-encoded) layers.
     The following set of rules contains a compact description of all
     the operations that must be performed for each transmission
     block:
     1.) The total number of columns n of the TB shall be chosen
     according to the actual delay constraints of the application.
     2.) Next, the expected number of rows reserved for the signaling
     TSB has to be selected, which limits the data TSB to L_d=(L-L_s)
     rows.

     3.) The maximum erasure correction capability T in the data TSB
     should be chosen according to the desired outage threshold of the
     application given the actual packet erasure rate on the link.
     4.) The redundancy profile for the rest of the data TSB should
     depend on the size and number of the various layers in the info
     stream, as well as the desired probability of successful decoding
     for each of them (quality-of-service requirement).
     5.) Any suitable optimization algorithm may be used for deriving
     an adequate redundancy profile. However, the result has to
     satisfy the following constraints:
     a) All available info octet positions in the data TSB have to be
     completely filled. If the info stream is too short for a desired
     profile, media stuffing may be applied to the empty info octet
     positions at the end of the data TSB by appending a sufficient
     number of octets (with arbitrary value, e.g. 0x00). The actual
     number of stuffing symbols per data TSB is then signaled via the
     respective stuffing indicator (see 6.4). However, before
     resorting to any stuffing, it should be checked whether it is
     possible to strengthen the protection of certain rows instead,
     thus improving the overall robustness of the decoding process.
     b) The info stream should be fully contained within the data TSB
     (unless cutting it off at a specific point is explicitly allowed
     by the properties of the used media codec).
     c) The number of required descriptors and stuffing indicators
     (see section 6.4.) to signal the profile shall not exceed the
     space initially reserved for them in the signaling TSB.
     Constraints a) and b) should be already incorporated in the
     optimization algorithm. However, if constraint c) is not met, the
     data TSB has to be reduced by one row in favor of the signaling
     TSB to accommodate more space for the descriptors and stuffing
     indicators, i.e. steps 2-5 have to be repeated until a valid
     redundancy profile has been obtained.
     6.) For each nonempty class EPC_i, i=T...0, in the data TSB, the
     following steps have to be performed:
     a) All rows of this specific class shall be filled from left to
     right and top to bottom with data octets of the info stream in
     decreasing order of importance (i.e. starting with the most
     important element).
     b) For each row in the class, the required i parity-check octets
     are computed with the same (n,n-i) RS code and filled in the
     empty positions at the end of the row. Thus, every row in the
     class constitutes a valid codeword of the chosen RS code.

     7.) After having filled the whole data TSB with information and
     parity octets, the respective redundancy profile is mapped to the
     signaling TSB as described in section 6.4.
     8.) Each column of the resulting TB is now read out octet-wise
     from top to bottom and, together with the respective UXP header
     (see section 6.2) in front, is mapped onto the payload section
     of one and only one RTP packet.

     9.) The n resulting RTP packets shall be transmitted subsequently
     to the remote host, starting with the leftmost one.
     10.) At the corresponding protocol entity at the remote host, the
     payload (without the UXP header) of all successfully received RTP
     packets belonging to the same sending TB shall be filled into a
     similar receiving TB column-wise from top to bottom and left to
     right.
     11.) For every erased packet of a received TB, the respective
     column in the TB shall be filled with a suitable erasure marker.
     12.) Before any other operations can be performed, the redundancy
     profile has to be restored from the signaling TSB according to
     the procedure defined in section 6.4. If the attempt fails
     because of too many lost packets, the whole TB shall be discarded
     and the receiving entity should wait for the next incoming TB.
     13.) If the attempt to recover the redundancy profile has been
     successful, a decoding operation shall be performed for each row
     of the data TSB by applying any suitable algorithm for erasure
     decoding.
     14.) For all rows of the data TSB for which the decoding
     operation has been successful, the reconstructed data octets are
     read out from left to right and top to bottom, and appended to
     the reconstructed version of the info stream.
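     The following Python sketch walks through rules 6 and 8 to 14 for
     a single data TSB without a signaling TSB, so that interleaving
     and row-wise erasure decoding can be followed end to end. The RS
     construction is an assumption: each (n,n-i) code is realized as a
     systematic evaluation code over GF(2^8) (primitive polynomial
     0x11D), where the info octets are the values of a polynomial at
     points 0..n-i-1 and the parity octets its values at the remaining
     points; erasures are recovered by Lagrange interpolation from any
     n-i surviving octets. This is a valid RS construction, but not
     necessarily the one an interoperable implementation must use, and
     all function names are hypothetical.

```python
# Sketch of UXP rules 6 and 8-14 for one data TSB (no signaling TSB).
# ASSUMPTION: (n, n-i) RS code = systematic evaluation code over
# GF(2^8) with primitive polynomial 0x11D; names are hypothetical.
EXP = [0] * 512
LOG = [0] * 256
_v = 1
for _e in range(255):
    EXP[_e] = _v
    LOG[_v] = _e
    _v <<= 1
    if _v & 0x100:
        _v ^= 0x11D
for _e in range(255, 512):
    EXP[_e] = EXP[_e - 255]


def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gdiv(a, b):  # b must be nonzero
    return 0 if a == 0 else EXP[LOG[a] - LOG[b] + 255]


def interpolate(points, x0):
    """Value at x0 of the polynomial of degree < len(points) passing
    through the (x, y) points (Lagrange; '-' is XOR in GF(2^8))."""
    result = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = gmul(num, x0 ^ xm)
                den = gmul(den, xj ^ xm)
        result ^= gmul(yj, gdiv(num, den))
    return result


def encode_tsb(info, n, epv):
    """Rule 6: fill EPC_T ... EPC_0 row by row (most important octets
    first) and append i parity octets to every row of class EPC_i."""
    if len(info) != sum(r * (n - i) for i, r in enumerate(epv)):
        raise ValueError("info length must fill the data TSB exactly (rule 5a)")
    rows, pos = [], 0
    for i in range(len(epv) - 1, -1, -1):
        k = n - i
        for _ in range(epv[i]):
            data = list(info[pos:pos + k])
            pos += k
            points = list(enumerate(data))
            rows.append(data + [interpolate(points, k + p) for p in range(i)])
    return rows


def to_packets(rows, n):
    """Rule 8: column c of the TB is the payload (minus UXP header) of packet c."""
    return [bytes(row[c] for row in rows) for c in range(n)]


def decode_tsb(packets, n, epv):
    """Rules 10-14: None marks an erased packet (rule 11). Returns the
    info octets of all decodable rows, top to bottom."""
    out, r = [], 0
    for i in range(len(epv) - 1, -1, -1):
        k = n - i
        for _ in range(epv[i]):
            row = [None if p is None else p[r] for p in packets]
            known = [(x, y) for x, y in enumerate(row) if y is not None][:k]
            if len(known) == k:  # at most i erasures: row is decodable
                out += [y if y is not None else interpolate(known, x)
                        for x, y in enumerate(row[:k])]
            r += 1
    return bytes(out)


# Fig. 3 example: n=9, (R_0, ..., R_4) = (1, 1, 1, 1, 3), 45 info octets.
epv = (1, 1, 1, 1, 3)
info = bytes(range(45))
packets = to_packets(encode_tsb(info, 9, epv), 9)
packets[0] = packets[5] = None          # two packets erased
print(decode_tsb(packets, 9, epv) == info[:28])  # True: EPC_4..EPC_2 recovered
```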

     One can easily realize that the above rules describe an
     interleaver, i.e. at the sender a single codeword of a TB is
     spread out over n successive packets. Thus, each codeword of a
     transmitted TB experiences the same number of erasures at exactly
     the same positions.
     Two important conclusions can be drawn from this:
     a) Since the same RS code is applied to all rows contained in a
     specific class, either all of them can be correctly decoded or
     none. Hence, there exist no partly decodable classes at the
     receiver.
     b) If decoding is successful for a certain class EPC_i, all the
     classes EPC_(i+1)...EPC_T can also be decoded, since they are
     protected by at least one more parity octet per row. Together
     with rule 6, it is therefore always ensured that, in case a
     decodable enhancement layer exists, all other layers it depends
     on can also be reconstructed!
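     Both conclusions can be checked with a few lines of Python. The
     hypothetical helpers below take the class sizes (R_0,...,R_T)
     and the number of erased packets of a TB; since an (n,n-i) RS
     code recovers up to i erasures and every row sees the same
     erasure pattern, the decodable classes are always the top part of
     the data TSB.

```python
def decodable_classes(sizes, erased):
    """Indices i of the non-empty classes EPC_i that can still be
    decoded when `erased` packets of the TB are lost.
    sizes = (R_0, ..., R_T); an (n, n-i) RS code recovers i erasures."""
    return [i for i, r in enumerate(sizes) if r > 0 and i >= erased]


def recovered_octets(sizes, n, erased):
    """Number of info octets delivered: all rows of every decodable class."""
    return sum(sizes[i] * (n - i) for i in decodable_classes(sizes, erased))


# Fig. 3 profile with n=9: R_0..R_4 = 1, 1, 1, 1, 3
print(decodable_classes((1, 1, 1, 1, 3), 2))    # [2, 3, 4]
print(recovered_octets((1, 1, 1, 1, 3), 9, 2))  # 7 + 6 + 15 = 28
```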

     Given the maximum erasure protection value T, the redundancy
     profile for a data TSB of size (L_d x n) shall be denoted by a
     so-called erasure protection vector EPV of length (T+1), where

     EPV:=(R_0,R_1,...,R_(T-1),R_T)

     From the above definition, it is easy to realize that the trivial
     cases of no erasure protection and EXP are a subset of UXP:
     a) no erasure protection at all: all application data is mapped
        onto class EPC_0, i.e. EPV=(L_d,0,0,...,0).
     b) EXP: all application data is mapped onto class EPC_T, i.e.
        EPV=(0,0,...,0,R_T=L_d).
     Hence, backward compatibility to currently standardized
     non-progressive multimedia codecs is achieved.
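     A simple consistency check of an EPV can be sketched as follows.
     The function names are hypothetical; the check T<=P anticipates
     the limit introduced in section 6.4, and the returned value is
     the number of info octets the data TSB holds, i.e. the length the
     info stream (plus any media stuffing) must have.

```python
def epv_capacity(epv, l_d, n, p):
    """Validate EPV=(R_0, ..., R_T) for a data TSB of L_d rows and n
    columns, and return its number of info octets."""
    t = len(epv) - 1
    if any(r < 0 for r in epv) or sum(epv) != l_d:
        raise ValueError("the R_i must be non-negative and sum to L_d")
    if t > p:
        raise ValueError("T must not exceed P (see section 6.4)")
    return sum(r * (n - i) for i, r in enumerate(epv))


def no_protection_epv(l_d, t):
    return (l_d,) + (0,) * t          # all rows in EPC_0


def exp_epv(l_d, t):
    return (0,) * t + (l_d,)          # all rows in EPC_T (EXP)


print(epv_capacity((1, 1, 1, 1, 3), 7, 9, 5))          # 45 (Fig. 3)
print(epv_capacity(exp_epv(7, 4), 7, 9, 5))            # 35
print(epv_capacity(no_protection_epv(7, 4), 7, 9, 5))  # 63
```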

  6. RTP payload structure

     This section is organized as follows. First, the specific
     settings in the RTP header are shown. Next, the header for UXP
     (the so-called UXP header) is specified. After that, the
     structure of the bitstream which is protected by UXP, the
     so-called info stream, is discussed. Finally, the in-band
     signaling of the erasure protection vector is introduced.

     For every packet, the UXP payload is formed by reading out a
     column of the TB and prefixing it with the UXP header. Thus, an
     UXP-compliant RTP packet looks as follows:

     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
     |RTP Header| UXP Header| one column of the TB      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

  6.1 Specific settings in the RTP header

     The timestamp of each RTP packet is set to the sampling time of
     the first octet of the progressive media stream in the
     corresponding TB. If several data TSBs are included in one TB,
     the sampling time of data TSB #1 is relevant. This results in the
     TS value being the same for all RTP packets belonging to a
     specific TB.
     The payload type is of dynamic type, and obtained through
     out-of-band signaling similar to [1]. End systems that cannot
     recognize a payload type must discard it.
     The marker bit is set to 1 for the last packet of each TB.
     Otherwise, its value is 0.
     All other fields in the RTP header are set to those values
     proposed for regular multimedia transmission using the RTP
     payload format of the media stream which is protected by UXP.
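     The RTP header settings above can be summarized by a small,
     hypothetical helper that yields the sequence number, timestamp
     and marker bit for the n packets of one TB. That sequence numbers
     increase by one per packet modulo 2^16 follows from RTP itself;
     how the timestamp value is derived from the sampling time is left
     to the media's payload format and is simply passed in here.

```python
def rtp_fields_for_tb(first_seq, timestamp, n):
    """(sequence number, timestamp, marker) for the n packets of a TB:
    consecutive 16-bit sequence numbers, one timestamp shared by all
    packets of the TB, marker bit set only on the last packet."""
    return [((first_seq + c) & 0xFFFF, timestamp, 1 if c == n - 1 else 0)
            for c in range(n)]


print(rtp_fields_for_tb(65534, 90000, 3))
# [(65534, 90000, 0), (65535, 90000, 0), (0, 90000, 1)]
```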

  6.2. Structure of the UXP header

     The UXP header shall consist of 2 octets, and is shown in Fig. 4:

      0                   1 1 1 1 1 1
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |X|  block PT   | block length n|
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Fig. 4: Proposed UXP header

     The fields in the header shall be defined as follows:
     - X (bit 0): extension bit, reserved for future enhancements,
       currently not in use -> default value: 0
     - block PT (bits 1-7): regular RTP payload type to indicate the
       media type contained in the info stream
     - block length n (bits 8-15): indicates the total number of RTP
       packets resulting from one TB (which equals the number of
       columns of the TB)
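     The two-octet header can be built and parsed with Python's
     standard struct module, as in the following sketch; bit 0 in
     Fig. 4 is taken to be the most significant bit of the first octet
     (network bit order). The function names are illustrative only.

```python
import struct


def pack_uxp_header(block_pt, n, x=0):
    """Build the UXP header of Fig. 4: X (1 bit), block PT (7 bits),
    block length n (8 bits)."""
    if x not in (0, 1) or not 0 <= block_pt <= 127 or not 1 <= n <= 255:
        raise ValueError("UXP header field out of range")
    return struct.pack("!BB", (x << 7) | block_pt, n)


def unpack_uxp_header(payload):
    """Return (X, block PT, n) from the first two octets of a payload."""
    first, n = struct.unpack("!BB", payload[:2])
    return first >> 7, first & 0x7F, n


hdr = pack_uxp_header(34, 9)   # e.g. block PT 34 (H.263), n = 9 packets
print(hdr.hex(), unpack_uxp_header(hdr))  # 2209 (0, 34, 9)
```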

     The syntax of the info stream which is protected by UXP is
     specified by the RTP payload type field contained in the UXP
     header. The details of the info stream are described in Sec. 6.3.
     For example, payload type H.263 means that the info stream
     conforms to the RTP payload format for H.263 and does not
     represent the "raw" H.263 media stream produced by an H.263
     encoder.
     However, UXP can also be applied to the "raw" media stream (in
     case it is already octet-aligned), if this can be signaled to the
     receiver via other means, e.g. by use of H.245 or SDP.
     Based on the RTP sequence number, the marker bit, and the
     repetition of the block length n in each UXP header, the
     receiving entity is able to recognize both TB boundaries and the
     actual position of lost packets in the TB.
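     One possible way for a receiver to place packets into columns is
     sketched below. It assumes that the sequence number of the last
     packet of the previous TB (the one carrying the marker bit) is
     known; recovering from a lost marker packet, e.g. with the help
     of the repeated block length n, is not shown. The helper names
     are hypothetical.

```python
def column_index(seq, last_seq_prev_tb, n):
    """Column (0..n-1) of a packet in the TB that follows the TB whose
    last packet (marker bit set) had sequence number last_seq_prev_tb.
    The subtraction is done modulo 2^16 to survive wrap-around."""
    offset = (seq - last_seq_prev_tb - 1) & 0xFFFF
    if offset >= n:
        raise ValueError("packet does not belong to this TB")
    return offset


def erased_columns(received_seqs, last_seq_prev_tb, n):
    """Columns of the TB for which no packet arrived (rule 11)."""
    got = {column_index(s, last_seq_prev_tb, n) for s in received_seqs}
    return sorted(set(range(n)) - got)


# Previous TB ended at 65533; this TB (n=4) uses 65534, 65535, 0, 1.
print(erased_columns([65535, 1], 65533, 4))  # [0, 2]
```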

  6.3 Framing and Timing Mechanism in UXP: The info stream.

     As described in section 5, UXP creates its own packetization
     scheme by interleaving. The regular framing and timing structure
     of RTP is therefore destroyed. This section describes which kinds
     of problems arise with interleaving and how they can be solved.
     This finally leads to the specification of the info stream.
     The timestamp of an RTP packet usually describes the sampling
     time of the first octet included in the data packet. This is in
     principle also true for UXP RTP packets. According to the
     timestamp definition in 6.1, every packet contains the sampling
     time of the first octet in the corresponding TB. Therefore, all
     packets which belong to one TB contain the same timestamp. This
     can lead to problems, since a sufficiently large TB can contain
     data from different sampling instants, e.g. several video frames.
     The timing information of the later frames then has to be
     determined from the media stream itself and not from the RTP
     timestamp.
     A second problem arising with interleaving is that the framing
     mechanism of RTP is not supported. Consider a media encoder which
     does not create a fully decodable bitstream, e.g. H.26L with the
     video coding layer (VCL) and network adaptation layer (NAL)
     concept [9]. In this concept, the VCL creates slices which are
     prepared for transmission over several networks at the NAL.
     Consequently, in case of RTP transmission, the header information
     needed to decode the slices is included only in the RTP packets.
     Thus, filling an UXP TB with the "raw" media stream from the VCL
     can lead, even without packet losses, to a non-decodable stream.
     The framing problem can be solved in two ways:
     One solution could be to use the RTP payload specification of the
     media stream to create a bitstream with an appropriate framing,
     the so-called info stream. For example, to create an H.263 info
     stream, the following steps are necessary:
     1.)  Generate an H.263-compliant media stream, i.e. take a slice
          or a video frame directly from the H.263 encoder.
     2.)  Apply the H.263 payload specification (e.g. RFC 2429) to
          create the header for only one packet.
     3.)  Insert the latter row by row into one data TSB.
     It is possible to apply the procedure mentioned above several
     times for different data TSBs (see 6.5.). Due to the in-band
     signaling, it is possible to determine the beginning and end of
     every TSB without parsing the whole TB. This allows a fast
     decomposition of the TB into the different TSBs.

     Another solution of the framing problem would be to rely on the
     framing mechanism of the media stream. This is, for example,
     possible for media streams which contain start codes.
     The timing problem can be solved in two ways.
     One solution is to comply with the RTP payload specification of
     the media stream. If the specification allows octets which belong
     to different sampling times to be put into one packet, this
     should also be allowed for a TB.
     The second solution for the timing problem is to rely on the
     timing information contained in the media stream itself, if
     available.
     Therefore, there are two different modes for framing:
     1.)  RTP payload framing (if an RTP payload specification exists
          for the media stream),
     2.)  pure media stream framing (if framing is contained in the
          media stream),

     and two different modes for timing:
     1.)  timing rules of the RTP payload specification for the media
          stream,
     2.)  timing information within the media stream.

     All combinations of timing and framing modes are possible, but
     framing mode 1 and timing mode 1 represent the default mode of
     operation for UXP. The use of other timing and framing modes has
     to be signaled by non-RTP means.

     The info stream is thus defined by the media stream together with
     framing and timing rules.
     In the following, some examples will be given:
     1.)  The info stream for MPEG-4 video according to RFC 3016 is
          the pure MPEG-4 compliant media stream, since, for video,
          RFC 3016 specifies that the MPEG-4 compliant video stream is
          taken as payload.
     2.)  The info stream for H.263+ can be created according to
          RFC 2429 as follows:
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
     |H.263+ payload| H.263+ compliant stream (possibly changed with|
     |header        | respect to RFC 2429) containing a slice/frame |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

     This info stream is inserted into one single data TSB.
     If necessary, for example, if the slices are too short to achieve
     a reasonable TB size, several info streams can be inserted in one
     TB by concatenating several data TSBs (see 6.5.).

  6.4. In-band signaling of the structure of the redundancy profile

     To enable a dynamic adaptation to varying link conditions, the
     actual redundancy profile used in the data TSB as well as the
     beginning and end of a TSB must be signaled to the receiving
     entity. Since out-of-band signaling either results in excessive
     additional control traffic, or prevents quick changes of the
     profile between successive TBs, an in-band signaling procedure is
     desired.
     As the decoding process cannot be applied to any of the erasure
     protection classes without knowledge of the correct redundancy
     profile, the profile has to be protected against packet loss at
     least as strongly as the most important element in the info
     stream. Therefore, an additional class EPC_P is used in the
     signaling TSB, where the number of parity symbols is by default
     set to the following value:
     P=ceil(n/2)
     Hence, the redundancy profile can still be recovered as long as
     no more than P, i.e. roughly 50%, of the n RTP packets of a TB
     are lost. This seems to be a reasonable value for the lowest
     point of operation over a lossy link. Alternatively, P may be
     explicitly signaled during session setup by means of SDP or H.245
     protocol.

     Consequently, since all other classes must have equal or less
     erasure protection capability, the maximum erasure protection T
     of class EPC_T in the data TSB is now limited to T<=P.
     The signaling of the erasure protection vector is accomplished by
     means of descriptors. For each class EPC_i with R_i>0, there is a
     descriptor DP_i providing information about the size of class
     EPC_i (i.e. the value of R_i) and establishing a relationship
     between the erasure protection of class EPC_i and that of the
     first preceding class EPC_(i+j) with R_(i+j)>0, where j>0. A
     descriptor DP_i is mapped onto one octet, which is sub-divided
     into two half-octets (i.e. the higher and the lower four bits).
     The first half-octet is of type unsigned and contains the 4-bit
     representation of the decimal value R_i. The second half-octet is
     of type signed and contains the difference in erasure protection
     between class EPC_i and class EPC_(i+j), i.e. the signed 4-bit
     representation of the decimal value (-j) (where the MSB denotes
     the sign, and the lower three bits the absolute value). Note that
     the erasure protection P of class EPC_P is fixed, whereas its
     size R_P may vary.
     Thus, the data to be filled into class EPC_P shall consist of a
     sequence of descriptors separated by stuffing indicators (see
     below), where the number of descriptors is primarily given by the
     number of protection classes EPC_i, 0<=i<=T, in the data TSB with
     R_i>0.
     Without a-priori knowledge, the initial value for the size of the
     signaling TSB should be set to one (row). When the number of
     necessary descriptors and stuffing indicators exceeds the (n-P)
     information positions, one or more additional rows have to be
     reserved. This is usually done by increasing the value of L_s
     (i.e. the number R_P of rows in class EPC_P) beyond 1, i.e. the
     data TSB is reduced to (L-R_P) rows. Hence, in order to indicate
     the actual size of the signaling TSB, an additional descriptor is
     inserted at the very beginning, which takes on the value 0xq0,
     where q denotes the four-bit (hexadecimal) representation of the
     decimal value R_P.
     Furthermore, the end of each data TSB is signaled by the
     otherwise unused descriptor value 0x00, followed by exactly one
     stuffing indicator (SI). The latter is mapped onto a byte, an octet,
     which is of type unsigned and contains the 8-bit representation
     of the decimal value of the number of media stuffing symbols used
     at the end of the respective data TSB.
     The (extended) sequence of descriptors and stuffing indicators is
     then mapped to the info octet positions in the A_P rows of the
     signaling TSB from left to right and top to bottom. Each row is
     then encoded with the same (n,n-P) RS code.
     If, however, the number of descriptors and stuffing indicators is
     less than the number of available info octet positions, the empty
     positions in class EPC_P may be filled up with the otherwise
     unused descriptor 0x00.
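     The construction of the signaling TSB described above can be
     summarized by the following Python sketch. It is not normative:
     the function names, the list representation of the erasure
     protection vectors (epvs[y][i] = R_i of data TSB y), and the
     assumption that the first protection difference is taken relative
     to the signaling class EPC_P (as in the examples of this section)
     are choices made for illustration. RS encoding of the rows is not
     shown.

```python
def signed_nibble(value):
    """Sign-magnitude half-octet: MSB = sign, lower three bits = |value|."""
    if not -7 <= value <= 7:
        raise ValueError("protection difference out of range")
    return (0x8 | -value) if value < 0 else value


def build_signaling_tsb(epvs, stuffing, n, P):
    """Return (A_P, info octets of the signaling TSB).

    epvs[y][i] is R_i (rows with erasure protection i) of data TSB y;
    stuffing[y] is the number of media stuffing symbols of data TSB y.
    """
    k = n - P  # info positions per row of the signaling TSB
    if k <= 0:
        raise ValueError("P must be smaller than n")
    body = []
    prev = P  # first difference is taken relative to class EPC_P
    for epv, si in zip(epvs, stuffing):
        if not 0 <= si <= 255:
            raise ValueError("SI must fit into one octet")
        for i in reversed(range(len(epv))):  # decreasing protection
            if epv[i] > 0:
                if epv[i] > 15:
                    raise ValueError("R_i must fit into four bits")
                body.append((epv[i] << 4) | signed_nibble(i - prev))
                prev = i
        body += [0x00, si]  # end-of-TSB marker followed by its SI
    rows = 1  # start with one row, add rows while the octets do not fit
    while rows * k < len(body) + 1:  # +1 for the leading 0xq0 descriptor
        rows += 1
    if rows > 15:
        raise ValueError("A_P must fit into four bits")
    octets = [rows << 4] + body
    octets += [0x00] * (rows * k - len(octets))  # descriptor stuffing
    return rows, octets
```

     Applied to the single-TSB example given later in this section
     (n=20, P=10, three media stuffing symbols), the sketch returns
     A_P=1 and the ten info octets listed there.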

     At the receiving entity, the sequence of descriptors shall be
     recovered by performing erasure decoding on the first row of the
     TB (which definitely belongs to the signaling TSB) using the same
     algorithm as later for the data TSB. If successful, the very
     first descriptor now indicates the number of rows of the
     signaling TSB, and the next (A_P-1) rows are decoded to
     reconstruct the redundancy profile for the data TSB(s), together
     with the number of media stuffing symbols denoted by the
     respective SI(s).
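     A receiver-side counterpart can be sketched as follows, assuming
     that all A_P rows of the signaling TSB have already been
     erasure-decoded and their info octets concatenated row by row.
     The function name and the return format (per data TSB, a mapping
     from protection level i to R_i together with its SI) are
     hypothetical. Since a descriptor for a class with R_i>0 is never
     0x00, a 0x00 found directly after an SI is treated as descriptor
     stuffing.

```python
def parse_signaling_tsb(octets, P):
    """Recover A_P and, per data TSB, ({i: R_i}, SI) from the info
    octets of an erasure-decoded signaling TSB."""
    rows = octets[0] >> 4  # leading 0xq0 descriptor carries A_P
    tsbs, classes, prev, pos = [], {}, P, 1
    while pos < len(octets):
        d = octets[pos]
        if d == 0x00:
            if not classes:  # 0x00 outside a data TSB: descriptor stuffing
                break
            tsbs.append((classes, octets[pos + 1]))  # SI follows 0x00
            classes, pos = {}, pos + 2
            continue
        # Second half-octet: sign-magnitude difference to the previous class.
        diff = -(d & 0x7) if d & 0x8 else d & 0x7
        prev += diff
        classes[prev] = d >> 4  # first half-octet: R_i
        pos += 1
    return rows, tsbs
```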

     The complete structure of the TB is now depicted in Fig. 5.

     Transmission Block (TB)
                                  P
                             <--------->
                  /\ +-+-+-+-+-+-+-+-+-+ /\
                  |  |?|?|?|?|*|*|*|*|*|  |  A_P=1
                  |  +-+-+-+-+-+-+-+-+-+ \/
                  |  |&|&|&|&|&|*|*|*|*| /\
                  |  +-+-+-+-+-+-+-+-+-+  |  A_T=3
                  |  |&|&|&|&|&|*|*|*|*|  |
                  |  +-+-+-+-+-+-+-+-+-+  |
     L octets     |  |&|&|&|&|&|*|*|*|*| \/
     payload      |  +-+-+-+-+-+-+-+-+-+ /\
     per packet   |  |%|%|%|%|%|%|*|*|*|  |  A_(T-1)=1
                  |  +-+-+-+-+-+-+-+-+-+ \/
                  |  |$|$|$|$|$|$|$|*|*|  .
                  |  +-+-+-+-+-+-+-+-+-+  .
                  |  |!|!|!|!|!|!|!|!|*|  .
                  |  +-+-+-+-+-+-+-+-+-+ /\
                  |  |#|#|#|#|#|#|#|#|#|  |  A_0=1
                  \/ +-+-+-+-+-+-+-+-+-+ \/
                     <----------------->
                           n packets
     ? :          descriptors and stuffing indicators for in-band
                  signaling of the redundancy profile

     &,%,$,!,# :  info octets belonging to a certain element of the
                  info stream in decreasing order of importance

     * :          parity octets gained from Reed-Solomon coding

     Fig. 5: General structure for UXP with in-band signaling of the
     redundancy profile
     The following simple example is meant to illustrate the idea
     behind using descriptors: Let an erasure protection vector of
     length T+1=7 be given as follows:
     EPV=(A_0,A_1,...,A_5,A_6)=(7,0,2,2,0,3,10)
     Hence, the length L of the TB (including one row for the
     signaling TSB) is equal to 7+2+2+3+10+1=25 (rows/octets). If the
     width is assumed to be equal to 20 (columns/packets), then the
     erasure protection of the descriptors is P=10.
     The corresponding sequence of descriptors can be written as
     DP=(DP_6,DP_5,DP_3,DP_2,DP_0)=(0xAC,0x39,0x2A,0x29,0x7A),
     where the values of the descriptors are given in hexadecimal
     notation. Next, the descriptor indicating the length of the
     signaling TSB has to be inserted, the end of the data TSB has to
     be marked by 0x00, and the SI has to be appended. If the number
     of media stuffing symbols is assumed to be 3, the 10 info octets
     in the signaling TSB take on the following values (descriptor
     stuffing included):
     (0x10,0xAC,0x39,0x2A,0x29,0x7A,0x00,0x03,0x00,0x00)

  7.4. Optional Concatenation of Transmission Sub Blocks

     The following procedure may be applied if a single info stream
     would be too short to achieve an efficient mapping to a
     transmission block with respect to the fixed payload length L and
     the desired number of packets n. For example, intra-coded video
     frames (I-frames) are usually much larger than the following
     predicted ones (P-frames). In this case, each of a certain number
     z of successive small info streams should be mapped to a
     transmission sub block with length L_d(y), 1<=y<=z, and width n,
     such that L_d(1)+L_d(2)+...+L_d(z)=L_d.
     The resulting transmission sub blocks can then be easily
     concatenated to form a TB of size L x n having one common
     signaling TSB: Since the second half-octet of the descriptors is
     of type signed, we are able to incorporate both decreasing and
     increasing erasure protection profiles within one single
     signaling TSB.
     Note that once the lengths L_d(y) of the individual blocks have
     been fixed, the respective redundancy profiles can be determined
     independently of each other. However, the space initially
     reserved for the signaling TSB should already be large enough to
     avoid recalculating the profile of each data TSB in case the
     sequence of descriptors gets too long.
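     Whether a set of sub blocks can share one signaling TSB can be
     checked with a short sketch such as the one below. It assumes,
     consistent with the concatenation example that follows, that L_d
     denotes the total number of data rows (L minus A_P), and it uses
     the fact that a three-bit magnitude restricts every protection
     difference carried by a descriptor to -7..+7. The function names
     are hypothetical.

```python
def concatenation_diffs(epvs, P):
    """List the signed protection differences carried by the
    descriptors of concatenated data TSBs, starting from class EPC_P."""
    diffs, prev = [], P
    for epv in epvs:
        for i in reversed(range(len(epv))):  # decreasing protection
            if epv[i] > 0:
                diffs.append(i - prev)
                prev = i
    return diffs


def can_concatenate(epvs, P, L_d):
    """Check the half-octet range of every difference and that the
    sub block lengths L_d(y) = sum(epvs[y]) add up to L_d."""
    diffs = concatenation_diffs(epvs, P)
    return all(abs(d) <= 7 for d in diffs) and sum(map(sum, epvs)) == L_d
```

     For the two identical sub blocks of the following example, the
     differences are (-4,-1,-2,-1,+4,-1,-2,-1); the positive value +4
     marks where the protection profile increases again at the start
     of the second sub block.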

     Again, we will give a simple example to illustrate this idea: Let
     the erasure protection vectors for two concatenated data TSBs be
     given as follows:

     EPV1=(A1_0,A1_1,...,A1_5,A1_6)=(0,0,2,2,0,3,10),
     EPV2=(A2_0,A2_1,...,A2_5,A2_6)=(0,0,2,2,0,3,10).
     Hence, two identical data TSBs will be concatenated to
     form a TB of length L=2*(2+2+3+10)+2=36 (rows/octets). If the
     width is again assumed to be equal to 20 (columns/packets), then
     the erasure protection of the descriptors is P=10, and therefore
     a total of two rows for the signaling TSB have been reserved this
     time. The corresponding sequence of descriptors can now be
     written as DP=(0xAC,0x39,0x2A,0x29,0xA4,0x39,0x2A,0x29), where
     the values of the descriptors are given in hexadecimal notation.
     If the number of media stuffing symbols is assumed to be 3 for
     each data TSB, the 20 info octet positions in the signaling TSB
     are filled with the following values (descriptor stuffing
     included):

     (0x20,0xAC,0x39,0x2A,0x29,0x00,0x03,0xA4,0x39,0x2A,0x29,0x00,0x03,
      0x00,0x00,0x00,0x00,0x00,0x00,0x00)

  8. Security Considerations
     The payload of the RTP packets consists of an interleaved media
     and parity stream. Therefore, it is reasonable to encrypt the
     resulting stream with one key rather than using different keys
     for media and parity data. It should also be noted that
     encryption of the media data without encryption of the parity
     data could enable known-plaintext attacks.
     The overall proportion between parity octets and info octets
     should be chosen carefully if the packet loss is due to network
     congestion. If the proportion of parity octets per TB is
     increased in this case, it could lead to increasing network
     congestion. Therefore, the proportion between parity octets and
     info octets per TB MUST NOT be increased as packet loss increases
     due to network congestion.
     The overall ratio between parity and info octets MUST NOT be
     higher than 1:1, i.e. the absolute bitrate spent for redundancy
     must not be larger than the bitrate required for transmission of
     multimedia data itself.
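     The 1:1 limit can be checked per TB with a simple count, sketched
     below in Python. The sketch is illustrative only: it assumes a
     single data TSB described by its erasure protection vector (R_i
     rows carrying i parity octets each), counts stuffing octets as
     info octets, and does not address the congestion-related rule
     above, which concerns how the proportion changes over time. The
     function names are hypothetical.

```python
def parity_info_octets(epv, n, A_P, P):
    """Count parity and info octets of one TB.

    epv[i] rows carry (n - i) info and i parity octets each; the A_P
    rows of the signaling TSB carry (n - P) info and P parity octets.
    """
    parity = sum(rows * i for i, rows in enumerate(epv)) + A_P * P
    info = sum(rows * (n - i) for i, rows in enumerate(epv)) + A_P * (n - P)
    return parity, info


def ratio_ok(epv, n, A_P, P):
    """True if the parity:info ratio of the TB does not exceed 1:1."""
    parity, info = parity_info_octets(epv, n, A_P, P)
    return parity <= info
```

     For the 25 x 20 example TB with EPV=(7,0,2,2,0,3,10), A_P=1 and
     P=10, the count gives 95 parity and 405 info octets, well within
     the limit.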

  9. Application Statement
     There are currently two different schemes proposed for unequal
     error protection in the IETF-AVT: Unequal Level Protection (ULP)
     and Unequal Erasure Protection (UXP).
     Although both methods seem to address the same problem, the
     proposed solutions differ in many respects. This section tries to
     describe possible application scenarios and to show the strengths
     and weaknesses of both approaches.
     The main difference between both approaches is that while ULP
     preserves the structure of the packets which have to be protected
     and provides the redundancy in extra packets, UXP interleaves the
     info stream which has to be protected, inserts the redundancy
     information, and thus creates a totally new packet structure.
     Another difference concerns multicast compatibility: It cannot be
     assumed that all future terminals will be able to apply UXP/ULP.
     Therefore, backward compatibility could be an issue in some
     cases. Since ULP does not change the original packet structure,
     but only adds some extra packets, terminals which do not support
     ULP can simply discard the extra packets. In case of UXP,
     however, two separate streams with and without erasure protection
     have to be sent, which increases the overall data rate.
     Next, both approaches offer different mechanisms to adjust packet
     sizes, if necessary: UXP allows the packet size to be adjusted
     arbitrarily. This is an advantage in case the loss probability
     depends on the packet length, which happens, for example, if the
     end-to-end connection contains wireless links. In this case,
     proper adjustment of the packet size is an essential network
     adaptation technique. In addition, if a pre-encoded stream is
     sent over the network, the packet size can be adjusted
     independently of slice structures.
     Since ULP does not change the existing packetization scheme, this
     flexibility does not exist there.
     The ability of UXP to adjust the packet size arbitrarily can be
     especially exploited in a streaming scenario, if a delay of
     several hundred milliseconds is acceptable. It is then possible
     to fill several video frames into a single TB of desired size,
     e.g. a group of pictures consisting of an I-frame, P-frames and
     B-frames. The redundancy scheme can thus be selected in such a
     way as to guarantee the following property: In case of packet
     loss, the streams for P-frames are only recoverable if the
     I-frame, on which the decoding of P-frames depends, is
     recoverable. The same is true for B-frames, which can only be
     decoded if the respective P-frames are recoverable. This prevents
     situations in which, for example, the B-frames have been received
     correctly, but the P-frames have been lost, i.e. it assures a
     gradual decrease in application quality also on the frame level.
     Of course, a similar encoding is possible with ULP. But in this
     case one might have to send several frames within one packet,
     which leads to large packet sizes.

     Furthermore, decoding delay is also a crucial issue in
     communications. Again, both approaches have different delay
     properties: UXP introduces a decoding delay because a certain
     minimum number of correctly received packets is necessary to
     start decoding of a TB. The delay in general depends on the
     dimensions of the interleaver. This should be considered for any
     system design which includes UXP.
     With ULP, every correctly received media packet can be decoded
     right away. However, a significant delay is introduced if
     packets are corrupted, because in this case one has to wait for
     several redundancy packets. Thus, the delay is in general
     dependent on the actual ULP-FEC-packet scheme and cannot be
     considered in advance during the system design phase.
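     Finally, we want to point out that UXP uses RS codes, which are
     known to be the most efficient type of block codes in terms of
     erasure correction capability: being maximum distance separable
     (MDS), an (n,k) RS code can correct any n-k erasures per codeword.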

  10. Intellectual Property Considerations
     Siemens AG has filed patent applications that might possibly have
     technical relations to this contribution.
     On IPR related issues, Siemens AG refers to the Siemens Statement
     on Patent Licensing, see
     http://www.ietf.org/ietf/IPR/SIEMENS-General.

  11. References
     [1] J. Rosenberg and H. Schulzrinne, "An RTP Payload Format for
     Generic Forward Error Correction", Request for Comments 2733,
     Internet Engineering Task Force, Dec. 1999.
     [2] A. Albanese, J. Bloemer, J. Edmonds, M. Luby, and M. Sudan,
     "Priority encoding transmission", IEEE Trans. Inform. Theory,
     vol. 42, no. 6, pp. 1737-1744, Nov. 1996.
     [3] Shu Lin and Daniel J. Costello, Error Control Coding:
     Fundamentals and Applications, Prentice-Hall, Inc., Englewood
     Cliffs, N.J., 1983.
     [4] W. Li, "Streaming video profile in MPEG-4", IEEE Trans. on
     Circuits and Systems for Video Technology, vol. 11, no. 3, pp.
     301-317, Mar. 2001.
     [5] G. Blaettermann, G. Heising, and D. Marpe, "A Quality
     Scalable Mode for H.26L", ITU-T SG16, Q.15, Q15-J24, Osaka, May
     2000.
     [6] F. Burkert, T. Stockhammer, and J. Pandel, "Progressive A/V
     coding for lossy packet networks - a principle approach", Tech.
     Rep., ITU-T SG16, Q.15, Q15-I36, Red Bank, N.J., Oct. 1999.
     [7] Guenther Liebl, "Modeling, theoretical analysis, and coding
     for wireless packet erasure channels", Diploma Thesis, Inst. for
     Communications Engineering, Munich University of Technology,
     1999.
     [8] U. Horn, K. Stuhlmuller, M. Link, and B. Girod, "Robust
     Internet video transmission based on scalable coding and unequal
     error protection", Image Com., vol. 15, no. 1-2, pp. 77-94, Sep.
     1999.
     [9] S. Wenger, "H.26L over IP: The IP-Network Adaptation Layer",
     Packet Video 2002, Pittsburgh, Pennsylvania, USA, April 24-26,
     2002.

  12. Acknowledgments
     Many thanks to Thomas Stockhammer, who initially came up with the
     idea of unequal erasure protection to improve progressive video
     transmission over lossy networks, and to Philippe Gentric and
     Stephen Casner for helpful comments and improvements.

  13. Authors' Addresses
     Guenther Liebl, Thomas Stockhammer
     Institute for Communications Engineering (LNT)
     Munich University of Technology
     D-80290 Munich
     Germany
     Email: {liebl,tom}@lnt.e-technik.tu-muenchen.de

     Minh-Ha Nguyen, Frank Burkert
     Siemens AG - ICM D MP RD MCH 83/81
     D-81675 Munich
     Germany
     Email: {minhha.nguyen,frank.burkert}@mch.siemens.de

     Marcel Wagner, Juergen Pandel, Wenrong Weng, Gero Baese
     Siemens AG - Corporate Technology CT IC 2
     D-81730 Munich
     Germany
     Email:
     {marcel.wagner,juergen.pandel,wenrong.weng,gero.baese}@mchp.siemens.de

  Full Copyright Statement
     Copyright (C) The Internet Society (date). All Rights Reserved.
     This document and translations of it may be copied and furnished
     to others, and derivative works that comment on or otherwise
     explain it or assist in its implementation may be prepared,
     copied, published and distributed, in whole or in part, without
     restriction of any kind, provided that the above copyright notice
     and this paragraph are included on all such copies and derivative
     works. However, this document itself may not be modified in any
     way, such as by removing the copyright notice or references to
     the Internet Society or other Internet organizations, except as
     needed for the purpose of developing Internet standards in which
     case the procedures for copyrights defined in the Internet
     Standards process must be followed, or as required to translate
     it into languages other than English.
     The limited permissions granted above are perpetual and will not
     be revoked by the Internet Society or its successors or assigns.

     This document and the information contained herein is provided on
     an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
     ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
     IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE
     OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
     PURPOSE.