draft-ietf-avt-uxp-02.txt   draft-ietf-avt-uxp-03.txt 
Internet Engineering Task Force G. Liebl, Internet Engineering Task Force G. Liebl,
T.Stockhammer T.Stockhammer
Internet Draft LNT, Munich Univ. Internet Draft LNT, Munich Univ. of
of Technology Technology
Document: draft-ietf-avt-uxp-02.txt Document: draft-ietf-avt-uxp-03.txt
March 1, 2002 M. Wagner, J.Pandel, June 2002 M. Wagner, J.Pandel,
W. Weng, G. Baese, W. Weng, G. Baese,
M. Nguyen, F. Burkert M. Nguyen, F. Burkert
Expires: Sept. 1, 2002 Siemens AG, Munich Expires: December 2002 Siemens AG, Munich
An RTP Payload Format for Erasure-Resilient Transmission of
Progressive Multimedia Streams
Status of this Memo
This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts. Internet-Drafts are draft documents valid for a
maximum of six months and may be updated, replaced, or obsoleted
by other documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other than
as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This document specifies an efficient way to ensure
erasure-resilient transmission of progressively encoded
multimedia sources via RTP using Reed-Solomon (RS) codes together
with interleaving. The level of erasure protection can be
explicitly adapted to the importance of the respective parts in
the source stream, thus allowing a graceful degradation of
application quality with increasing packet loss rate on the
network. Hence, this type of unequal erasure protection (UXP)
scheme is intended to cope with the rapidly varying channel
conditions on wireless access links to the Internet backbone.
Nevertheless, backward compatibility with currently standardized
non-progressive multimedia codecs is ensured, since equal erasure
protection (EXP) represents a subset of generic UXP. By applying
interleaving and RS codes, a payload format is defined that can
be easily integrated into the existing framework for RTP.
Liebl,Stockhammer,Wagner,Pandel,Weng,Baese,Nguyen,Burkert [Page1]
2. Conventions used in this document
The following terms are used throughout this document:
1.) Message block: a higher layer transport unit (e.g. an IP
packet), that enters/leaves the segmentation/reassembly stage at the
interface to wireless data link layers.
2.) Segment: denotes a link layer transport unit.
3.) CRC: Cyclic Redundancy Check, usually added to transport units
at the sender to detect the existence of erroneous bits in a
transport unit at the receiver.
4.) Segmentation/Reassembly Process: If the size of the transport
units at the link layer is smaller than that at the upper layers,
message blocks have to be split up into several parts, i.e.
segments, which are then transmitted subsequently over the link. If
nothing is lost, the original message block can be restored at the
receiving entity (reassembly).
5.) Quality-of-service: application-dependent criterion to define a
certain desired operation point.
6.) Codec: denotes a functional pair consisting of a source encoding
unit at the sender and a corresponding source decoding unit at the
receiver; usually standardized for different multimedia applications
like audio or video.
7.) Progressive source coding: results in successive blocks of
(source-)encoded data (e.g. a single video or audio frame), each of
which can be viewed as a bitstream of certain length, whose distinct
elements are of different importance to the reconstruction process
at the decoder. Elements are commonly ordered from highest to least
importance, where the latter elements depend on the previous.
8.) Reed-Solomon (RS) code: belongs to the class of linear nonbinary
block codes, and is uniquely specified by the block length n, the
number of parity symbols t, and the symbol alphabet.
9.) n: is a variable, which denotes both the block length of a RS
codeword, and the number of columns in a TB (see 16).
10.) k: is a variable, which denotes the number of information
symbols in a RS codeword.
11.) t: is a variable, which denotes the number of parity symbols in
a RS codeword.
12.) Erasure: When a packet is lost during transmission, an erasure
is said to have happened. Since the position of the erased packet in
a sequence is usually known, a corresponding erasure marker can be
set at the receiving entity.
13.) Base layer: comprises the first and most important elements in
a progressively encoded bitstream, without which all subsequent
information is useless.
14.) Enhancement layer: comprises one or more sets of the less
important subsequent elements in a progressively encoded bitstream.
A specific enhancement layer can be decoded, if and only if the base
layer and all previous enhancement layer data (of higher importance)
is available.
15.) Info stream: denotes the final bitstream which has to be
protected by the proposed UXP scheme. It usually consists of the
(source-encoded) bitstream (progressive or not), which is already
arranged according to a desired syntax (e.g. as specified in the
respective RTP profile for the media codec in use).
In any case, it is assumed that every info stream is already octet-
aligned according to the standard procedures defined in the context
of the used syntax specifications.
16.) Transmission block (TB): denotes a memory array of L rows and n
columns. Each row of a TB represents a RS codeword, whereas each
column, together with the respective UXP header (see 33) in front,
forms the payload of a single RTP packet.
Each TB consists of at least two distinct transmission sub blocks
(TSB, see 17): The first L_s rows belong to the signaling TSB,
whereas the last L_d=(L-L_s) rows belong to one or more data TSB.
17.) Transmission sub block (TSB): denotes a memory array of 0<l<L
rows and n columns, which is a horizontal slice of a TB. Depending
on whether the info byte positions are filled with descriptors (see
28) or media data, the TSB is of type signaling or data,
respectively.
18.) L: is a variable, which denotes both the number of rows in a TB
and the payload length (without UXP header) of an RTP packet in
bytes.
19.) Unequal erasure protection (UXP): denotes a specific strategy
which varies the level of erasure protection across a TB according
to a given redundancy profile.
20.) Equal erasure protection (EXP): is a subset of UXP, for which
the level of erasure protection is kept constant across a TB.
21.) Redundancy profile: describes the size of the different erasure
protection classes in a TB, i.e. the number of rows (codewords) per
class.
22.) Erasure protection class: contains a set of rows (codewords) of
the TB with same erasure correction capability.
23.) i: is a variable, which denotes the number of parity bytes for
each row in erasure protection class i.
24.) CA_i: is a variable, which denotes the set of rows contained in
erasure protection class i.
25.) A_i: is a variable, which denotes the total number of rows
contained in erasure protection class i, i.e. the cardinality of
CA_i.
26.) T: is a variable, which denotes the number of parity bytes for
each row in the highest erasure protection class (with respect to
application data) in a TB.
27.) AV: denotes the erasure protection vector of length (T+1) used
to describe a certain redundancy profile.
28.) DP: descriptor used for in-band signaling of the erasure
protection vector.
29.) SI: stuffing indicator, which contains the number of media
stuffing symbols at the end of a data TSB (see 31).
30.) Descriptor Stuffing: insertion of otherwise unused descriptor
values (i.e. 0x00) at the end of the signaling TSB. Descriptor
stuffing is performed, if the final sequence of descriptors and
stuffing indicators for a valid redundancy profile is shorter than
the space initially reserved for it in the signaling TSB.
31.) Media Stuffing: insertion of additional symbols at the end of a
data TSB. Media stuffing is performed, if the info stream (see 15)
is shorter than the space reserved for it in the data TSB for a
desired redundancy profile. Since the number of stuffing symbols is
signaled in the respective SI, any byte value may be used (e.g.
0x00).
32.) Interleaver: performs the spreading of a codeword, i.e. a row
in the TB, over n successive packets, such that the probability of
an erasure burst in a codeword is kept small.
33.) UXP header: is the additional header information contained in
each RTP packet after UXP has been applied. It is always present at
the start of the payload section of an RTP packet.
34.) X: denotes a currently not used extension field of 1 bit in the
UXP header.
35.) P: is a variable which denotes the number of parity symbols per
row used to protect the inband signaling of the redundancy profile.
36.) ceil(.): denotes the ceiling function, i.e. rounding up to the
next integer.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC-2119.
1. Introduction
Due to the increasing popularity of high-quality multimedia
applications over the Internet and the high level of public
acceptance of existing mobile communication systems, there is a
strong demand for a future combination of these two techniques:
One possible scenario consists of an integrated communication
environment, where users can set up multimedia connections
anytime and anywhere via radio access links to the Internet.
For this reason, several packet-oriented transmission modes have
been proposed for next generation wireless standards like EGPRS
(Enhanced General Packet Radio Service) or UMTS (Universal Mobile
Telecommunications System), which are mostly based on the same
principle: Long message blocks, i.e. IP packets, that enter the
wireless part of the network are split up into segments of
desired length, which can be multiplexed onto link layer packets
of fixed size. The latter are then transmitted sequentially over
the wireless link, reassembled, and passed on to the next network
element.
However, compared to the rather benign channel characteristics on
today's fixed networks, wireless links suffer from severe fading,
noise, and interference conditions in general, thus resulting in
a comparably high residual bit error rate after detection and
decoding. By use of efficient CRC-mechanisms, these bit errors
are usually detected with very high probability, and every
corrupted segment, i.e. one which contains at least one erroneous
bit, is discarded to prevent error propagation through the
network. But if only one single segment is missing at the
reassembly stage, the upper layer IP packet cannot be
reconstructed anymore. The result is a significant increase in
packet loss rate at IP level.
Since most multimedia applications can only recover from a very
limited number of lost message blocks, it is vitally necessary to
keep packet loss at IP level within a certain acceptable range
depending on the individual quality-of-service requirements.
However, due to the delay constraints typically imposed by most
audio or video codecs, the use of ARQ-schemes is often prohibited
both at link level and at transport level. In addition,
retransmission strategies cannot be applied to any broadcast or
multicast scenarios. Thus, forward erasure correction strategies
have to be considered, which provide a simple means to
reconstruct the content of lost packets at the receiver from the
redundancy that has been spread out over a certain number of
subsequent packets.
There already exist some previous studies and proposals regarding
erasure-resilient packet transmission [1,8]. Since most of them
are based on the assumption that all parts in a message block are
equally important to the receiver, i.e. the respective
application cannot operate on partly complete blocks, they were
optimized with respect to assigning equal erasure protection over
the whole message block. However, recent developments both in
audio and video coding have introduced the notion of
progressively encoded media streams, for which unequal erasure
protection strategies seem to be more promising, as will be
explained in more detail below. Although the scheme defined in
[1] is in principle capable of supporting some kind of unequal
erasure protection, possible implementations seem to be quite
complex with respect to the gain in performance. Finally, in [1]
it is assumed that subsequent RTP packets can have variable
length, which would cause significant segmentation overhead at
the link layer of almost all wireless systems.
This document defines a payload format for RTP, such that
different elements in a progressively encoded multimedia stream
can be protected against packet erasures according to their
respective quality-of-service requirement. The general principle,
including the use of Reed-Solomon codes together with an
appropriate interleaving scheme for adding redundancy, follows
the ideas already presented in [2], but allows for finer
granularity in the structure of the progressive media stream. The
proposed scheme is generic in the way that it (1) is independent
of the type of media stream, be it audio or video, and (2) can be
adapted to varying transmission quality very quickly by use of
inband-signaling.
2. Conventions used in this document
The following terms are used throughout this document:
1.) Message block: a higher layer transport unit (e.g. an IP
packet), that enters/leaves the segmentation/reassembly
stage at the interface to wireless data link layers.
2.) Segment: denotes a link layer transport unit.
3.) CRC: Cyclic Redundancy Check, usually added to transport
units at the sender to detect the existence of erroneous
bits in a transport unit at the receiver.
4.) Segmentation/Reassembly Process: If the size of the
transport units at the link layer is smaller than that at
the upper layers, message blocks have to be split up into
several parts, i.e. segments, which are then transmitted
subsequently over the link. If nothing is lost, the original
message block can be restored at the receiving entity
(reassembly).
5.) Quality-of-service: application-dependent criterion to
define a certain desired operation point.
6.) Codec: denotes a functional pair consisting of a source
encoding unit at the sender and a corresponding source
decoding unit at the receiver; usually standardized for
different multimedia applications like audio or video.
7.) Media stream: A bitstream which results at the output of an
encoder for a specific media type, e.g. H.263, MPEG-4-video.
8.) Progressive media stream: A media stream which can be
divided into successive elements. The distinct elements
are of different importance to the reconstruction process at
the decoder and are commonly ordered from highest to least
importance, where the latter elements depend on the
previous.
9.) Progressive source coding: results in a progressive media
stream.
10.) Reed-Solomon (RS) code: belongs to the class of linear
nonbinary block codes, and is uniquely specified by the
block length n, the number of parity symbols t, and the
symbol alphabet.
11.) n: is a variable, which denotes both the block length of a
RS codeword, and the number of columns in a TB (see 19).
12.) k: is a variable, which denotes the number of information
symbols in a RS codeword.
13.) t: is a variable, which denotes the number of parity symbols
in a RS codeword.
14.) Erasure: When a packet is lost during transmission, an
erasure is said to have happened. Since the position of the
erased packet in a sequence is usually known, a
corresponding erasure marker can be set at the receiving
entity.
15.) Base layer: comprises the first and most important elements
of the progressively encoded source, without which all
subsequent information is useless.
16.) Enhancement layer: comprises one or more sets of the less
important subsequent elements of the progressively encoded
source. A specific enhancement layer can be decoded, if and
only if the base layer and all previous enhancement layer
data (of higher importance) is available.
17.) Info stream: denotes the bitstream which has to be
protected by the UXP scheme. It usually consists of the
media stream (progressively source encoded or not), which is
arranged according to a desired syntax (e.g. to achieve an
appropriate framing, see 6.3). In any case, it is assumed
that every info stream is already octet-aligned according to
the standard procedures defined in the context of the used
syntax specifications.
18.) Info octet: Denotes one element of the info stream.
19.) Transmission block (TB): denotes a memory array of L rows
and n columns. Each row of a TB represents a RS codeword,
whereas each column, together with the respective UXP header
(see 36) in front, forms the payload of a single RTP packet.
Each TB consists of at least two distinct transmission sub
blocks (TSB, see 20): The first L_s rows belong to the
signaling TSB, whereas the last L_d=(L-L_s) rows belong to
one or more data TSB.
20.) Transmission sub block (TSB): denotes a memory array of
0<l<L rows and n columns, which is a horizontal slice of a
TB. Depending on whether the info octet positions are filled
with descriptors (see 31) or media data, the TSB is of type
signaling or data, respectively.
21.) L: is a variable, which denotes both the number of rows in a
TB and the payload length (without UXP header, see 36) of an
RTP packet in octets.
22.) Unequal erasure protection (UXP): denotes a specific
strategy which varies the level of erasure protection across
a TB according to a given redundancy profile.
23.) Equal erasure protection (EXP): is a subset of UXP, for
which the level of erasure protection is kept constant
across a TB.
24.) Redundancy profile: describes the size of the different
erasure protection classes in a TB, i.e. the number of rows
(codewords) per class.
25.) Erasure protection class: contains a set of rows (codewords)
of the TB with same erasure correction capability.
26.) i: is a variable, which denotes the number of parity
symbols for each row in erasure protection class i.
27.) EPC_i: is a variable, which denotes the set of rows
contained in erasure protection class i.
28.) R_i: is a variable, which denotes the total number of rows
contained in erasure protection class i, i.e. the
cardinality of EPC_i.
29.) T: is a variable, which denotes the number of parity
symbols for each row in the highest erasure protection class
(with respect to application data) in a TB.
30.) EPV: denotes the erasure protection vector of length (T+1)
used to describe a certain redundancy profile.
31.) DP: descriptor used for in-band signaling of the erasure
protection vector.
32.) SI: stuffing indicator, which contains the number of media
stuffing symbols at the end of a data TSB (see 34).
33.) Descriptor Stuffing: insertion of otherwise unused
descriptor values (i.e. 0x00) at the end of the signaling
TSB. Descriptor stuffing is performed, if the final sequence
of descriptors and stuffing indicators for a valid
redundancy profile is shorter than the space initially
reserved for it in the signaling TSB.
34.) Media Stuffing: insertion of additional symbols at the end
of a data TSB. Media stuffing is performed, if the info
stream (see 17) is shorter than the space reserved for it in
the data TSB for a desired redundancy profile. Since the
number of stuffing symbols is signaled in the respective SI,
any octet value may be used (e.g. 0x00).
35.) Interleaver: performs the spreading of a codeword, i.e. a
row in the TB, over n successive packets, such that the
probability of an erasure burst in a codeword is kept small.
36.) UXP header: is the additional header information contained
in each RTP packet after UXP has been applied. It is always
present at the start of the payload section of an RTP
packet.
37.) X: denotes a currently not used extension field of 1 bit in
the UXP header.
38.) P: is a variable which denotes the number of parity symbols
per row used to protect the inband signaling of the
redundancy profile.
39.) ceil(.): denotes the ceiling function, i.e. rounding up to
the next integer.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC-2119.
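The relation between a TB, its rows (codewords), and its columns
(packet payloads) defined in the terms above can be illustrated with
a short sketch. It is only an illustration under stated assumptions:
the function names are hypothetical, the row contents are placeholder
values instead of real RS codewords, and the UXP header that precedes
each column in an RTP packet is omitted.

```python
# Illustrative sketch only: the TB is modelled as a list of L rows, each row
# standing in for one codeword of n octets. RS encoding and the UXP header
# are deliberately left out; the octet values below are placeholders.

def tb_to_payloads(tb):
    """Return the n column payloads (each L octets long) of an L x n TB."""
    n = len(tb[0])
    assert all(len(row) == n for row in tb), "every row must have n columns"
    return [bytes(row[col] for row in tb) for col in range(n)]


def payloads_to_tb(payloads, L):
    """Rebuild the TB row by row; a lost packet (None) yields one erasure
    marker (None) at the same column position in every row."""
    return [[None if p is None else p[row] for p in payloads]
            for row in range(L)]


L, n = 3, 5
tb = [[16 * r + c for c in range(n)] for r in range(L)]  # placeholder octets
payloads = tb_to_payloads(tb)        # n payloads of L octets each

received = list(payloads)
received[2] = None                   # assume the third packet is lost
rebuilt = payloads_to_tb(received, L)
erasures_per_row = [row.count(None) for row in rebuilt]
```

Because every packet carries exactly one column, the loss of a single
packet shows up as a single erasure at a known position in each row,
which is the property the interleaver is meant to provide.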
3. Reed-Solomon Codes
Reed-Solomon (RS) codes are a special class of linear nonbinary
block codes, which are known to offer maximum erasure correction
capability with minimum amount of redundancy.
An arbitrary t-erasure-correcting (n,k) RS code defined over
Galois field GF(q) has the following parameters [3]:
- Block length: n=q-1
- No. of information symbols in a codeword: k
- No. of parity-check symbols in a codeword: n-k=t
- Minimum distance: d=t+1
In what follows, only systematic RS codes over GF(2^8) shall be
considered, i.e. the symbols of interest can be directly related
to a tuple of eight bits, which is commonly called an octet in
packet transmission. The principal structure of a codeword is
shown in Fig. 1.
By shortening the initial (n=255,n-t) RS code, any desired
(n',n'-t) RS code for a given erasure correction capability t may
be obtained.
 block of n octets
<----------------->
+-+-+-+-+-+-+-+-+-+
|&|&|&|&|&|&|&|*|*|
+-+-+-+-+-+-+-+-+-+
<------------><--->
    k=n-t       t
(&:info) (*:parity)
Fig. 1: Structure of a systematic RS codeword
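The parameter relations listed above can be checked with a small
calculation. The following is a minimal sketch, assuming a
hypothetical helper named shortened_rs; it only derives the
parameters of an (n',n'-t) code shortened from the (255,255-t) code
over GF(2^8) and does not perform any RS encoding or decoding.

```python
def shortened_rs(n_short, t, n_mother=255):
    """Parameters of an (n', n'-t) RS code shortened from (255, 255-t).

    n_short is n', t is the number of parity symbols. The helper name and
    the returned dictionary layout are illustrative assumptions.
    """
    if not 0 <= t < n_short <= n_mother:
        raise ValueError("need 0 <= t < n' <= 255")
    k_mother = n_mother - t          # info symbols of the unshortened code
    k_short = n_short - t            # info symbols after shortening
    return {
        "n": n_short,
        "k": k_short,
        "t": t,
        "min_distance": t + 1,       # d = t + 1
        "removed_info_symbols": k_mother - k_short,
    }


params = shortened_rs(n_short=20, t=4)
```

For n'=20 and t=4 this gives k=16 and d=5, i.e. each codeword of 20
octets carries 16 info octets and tolerates up to 4 erasures.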
4. Progressive Source Coding
If the output of a multimedia codec, be it audio or video, is said
to be progressive, the encoded bitstream must consist of several
distinct elements, often organized in separate layers. The latter
shall be defined via their relative importance with respect to the
quality of the reconstruction process at the receiver. Hence, there
exists at least one layer, often called base layer, without which
reconstruction fails at all, whereas all the other layers, often
called enhancement layers, just help to continually improve the
quality. Consequently, the different layers are usually contained in
the (source-)encoded bitstream in decreasing order of importance,
i.e. the base layer data is followed by the various enhancement
layers.
An example can be found in the fine granular scalability modes which
have been proposed to various standardization bodies like MPEG-4 [4]
or ITU (H.26L) [5], where the resolution of the scaling process in
the progressive source encoder is as low as one symbol in the
enhancement layer.
The output of an encoder for a specific media type, e.g. H.263 or
MPEG-4-video is said to be a media stream. If the media stream
consists of several distinct elements, which are of different
importance with respect to the quality of the reconstruction
process at the receiver, then the media stream is progressive.
The progressive media stream is often organized in separate
layers. Hence, there exists at least one layer, often called base
layer, without which reconstruction fails at all, whereas all the
other layers, often called enhancement layers, just help to
continually improve the quality. Consequently, the different
layers are usually contained in the (source-)encoded media stream
in decreasing order of importance, i.e. the base layer data is
followed by the various enhancement layers.
An example can be found in the fine granular scalability modes
which have been proposed to various standardization bodies like
MPEG-4, where the resolution of the scaling process in the
progressive source encoder is as low as one symbol in the
enhancement layer [4]. Another example is given by data
partitioning which can be applied to the ITU/MPEG H.26L standard
[5], MPEG-4, and H.263++. Also, the existence of I, P, and B
frames in streams which comply with standards like MPEG-2 can be
interpreted as progressive.
From the above definition, it is quite obvious that the most
important base layer data must be protected as strongly as
possible against packet loss during transmission. However, the
protection of the enhancement layers could be continually
lowered, since a loss at this stage has only minor consequences
for the reconstruction process. Thus, by using a suitable unequal
erasure protection strategy across a progressive media stream,
the overhead due to redundancy is reduced. Furthermore, if
channel conditions get worse during transmission, only more and
more enhancement layers are lost, i.e. a graceful degradation in
application quality at the receiver is achieved [6].
Nevertheless, it should be mentioned that the specific structure
Nevertheless, it should be mentioned that the specific structure of of the media stream strongly depends on the actual media codec in
a (source-)encoded bitstream strongly depends on the actual media use and does not always provide suitable mechanisms for transport
codec in use, and the desired syntax which is used for adapting the over data networks, like framing (see also 6.3). In order to
output of the codec to a suitable transport level format (see also keep the description of the unequal erasure protection strategy
7.3). In order to keep the description of the unequal erasure in section 5 as general as possible, the final bitstream which
protection strategy in section 6 as general as possible, the final has to be protected by the proposed UXP scheme will be called
bitstream which has to be protected by the proposed UXP scheme will "info stream" in the following. Furthermore, it is assumed that
be called "info stream" in the following. Furthermore, it is assumed every info stream is already octet-aligned according to the
that every info stream is already octet-aligned according to the
standard procedures defined in the context of the used syntax standard procedures defined in the context of the used syntax
specifications. specifications.
6. General Structure of UXP schemes 5. General Structure of UXP schemes
In this section, the principal features of the proposed UXP scheme In this section, the principal features of the proposed UXP
are described with a special focus on the protection and scheme are described with a special focus on the protection and
reconstruction procedure which is applied to the info stream. In reconstruction procedure which is applied to the info stream. In
addition, the behavior of the sender and receiver is specified as addition, the behavior of the sender and receiver is specified as
far as it concerns the reconstruction of the info stream. However, far as it concerns the reconstruction of the info stream.
the complete UXP payload structure, including the additional UXP However, the complete UXP payload structure, including the
header, is described in section 7. additional UXP header, is described in section 6.
Fig. 1 already illustrated the structure of a systematic codeword, The reason for using the term "info stream" as well as the
which shall be represented by a single row and n successive columns details of the construction are described in Section 6.3. For
that contain the information and the parity bytes. This structure now, we assume that we have an info stream which has to be
shall now be extended by forming a transmission block (TB) protected.
consisting of L codewords of length n bytes each, which amounts to a
total of L rows and n columns [7]: Each column, together with the
respective UXP header in front, shall represent the payload of an
RTP packet, i.e. the whole data of a TB is transmitted via a
sequence of n RTP packets all carrying a payload of length (L+2)
bytes (UXP header included).
The value of L should be chosen in such a way that the whole length Fig. 1 already illustrated the structure of a systematic
of the resulting IP packet (i.e. RTP payload plus sum of RTP, UDP, codeword, which shall be represented by a single row with n
and IP header) equals a multiple of the segment size on the wireless successive symbols that contain the information and the parity
link to avoid stuffing at the data link layer. octets. This structure shall now be extended by forming a
transmission block (TB) consisting of L codewords of length n
octets each, which amounts to a total of L rows and n columns
[7]: Each column, together with the respective UXP header in
front, shall represent the payload of an RTP packet, i.e. the
whole data of a TB is transmitted via a sequence of n RTP packets
all carrying a payload of length (L+2) octets (UXP header
included).
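The relationship between the TB dimensions and the resulting packet sizes can be sketched as follows. This is only an illustrative calculation: the function name `tb_packet_sizes` and the assumed lower-layer header sizes (fixed RTP header without CSRC entries, UDP, IPv4 without options) are assumptions made for this example and are not part of the payload format.

```python
# Assumed header sizes in octets (illustrative only).
RTP_HEADER = 12   # fixed RTP header, no CSRC entries (assumption)
UDP_HEADER = 8
IPV4_HEADER = 20  # IPv4 without options (assumption)
UXP_HEADER = 2    # UXP header length as stated in the text


def tb_packet_sizes(L, n):
    """Return per-packet and per-TB sizes for a TB with L rows and n columns."""
    if L < 1 or n < 1:
        raise ValueError("L and n must be positive")
    payload = L + UXP_HEADER  # one TB column plus UXP header
    ip_packet = payload + RTP_HEADER + UDP_HEADER + IPV4_HEADER
    return {
        "rtp_payload": payload,           # (L+2) octets per RTP packet
        "ip_packet": ip_packet,           # size seen below the IP layer
        "tb_total_payload": n * payload,  # all n packets of one TB
    }
```

For example, under these assumptions a TB with L=50 rows and n=9 columns yields RTP payloads of 52 octets each.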
Each TB usually consists of two or more horizontal slices, the so- Each TB usually consists of two or more horizontal sub blocks,
called transmission sub blocks (TSB), as can be seen in Fig. 2: The the so-called transmission sub blocks (TSB), as can be seen in
first L_s rows always belong to the signaling TSB, which is used to Fig. 2: The first L_s rows always belong to the signaling TSB,
convey the actual redundancy profile in the data part to the which is used to convey the actual redundancy profile in the data
receiver (see 7.3). The following L_d=(L-L_s) rows belong to one or part to the receiver (see 6.4.). The following L_d=(L-L_s) rows
more data TSBs, which contain the interleaved and RS encoded info belong to one or more data TSBs, which contain the interleaved
stream, as will be described below. and RS encoded info stream, as will be described below.
Transmission Block (TB) Transmission Block (TB)
/\ +-+-+-+-+-+-+-+-+-+ /\ /\ +-+-+-+-+-+-+-+-+-+ /\
| | signaling TSB | | L_s bytes | | signaling TSB | | L_s octets
| +-+-+-+-+-+-+-+-+-+ \/ | +-+-+-+-+-+-+-+-+-+ \/
| | | /\ /\ | | | /\ /\
| + data TSB #1 + | L_d(1) bytes | | + data TSB #1 + | L_d(1) octets |
| | | | | | | | | |
| +-+-+-+-+-+-+-+-+-+ \/ | | +-+-+-+-+-+-+-+-+-+ \/ |
L bytes | | | /\ | L octets | | | /\ |
payload | + data TSB #2 + | L_d(2) bytes | payload | + data TSB #2 + | L_d(2) octets |
per packet | + | | | L_d bytes per packet | + | | | L_d octets
| +-+-+-+-+-+-+-+-+-+ \/ | | +-+-+-+-+-+-+-+-+-+ \/ |
| | . | . | | | . | . |
| + . + . | | + . + . |
| | . | . | | | . | . |
| +-+-+-+-+-+-+-+-+-+ /\ | | +-+-+-+-+-+-+-+-+-+ /\ |
| | data TSB #z | | L_d(z) bytes | | | data TSB #z | | L_d(z) octets |
\/ +-+-+-+-+-+-+-+-+-+ \/ \/ \/ +-+-+-+-+-+-+-+-+-+ \/ \/
<-----------------> <----------------->
n packets n packets
Fig. 2: General structure of a TB Fig. 2: General structure of a TB
Since the UXP procedure is mainly applied to the data TSBs, it will Since the UXP procedure is mainly applied to the data TSBs, it
be described next, whereas the content and syntax of the signaling will be described next, whereas the content and syntax of the
TSB will be defined in section 7.3. signaling TSB will be defined in section 6.4.
For means of simplification, only one single data TSB will be For means of simplification, only one single data TSB will be
assumed throughout the following explanation of the encoding and assumed throughout the following explanation of the encoding and
decoding procedure. However, an extension to more than one data TSB decoding procedure. However, an extension to more than one data
per TB is straightforward, and will be shown in section 7.4. TSB per TB is straightforward, and will be shown in section 6.5.
As depicted in Fig. 3, the rows of a transmission sub block shall
As depicted in Fig. 3, the rows of a transmission sub block shall be be partitioned into T+1 different classes EPC_i, where i=0...T,
partitioned into T+1 different classes CA_i, where i=0...T, such such that each class contains exactly A_i=|EPC_i| consecutive
that each class contains exactly A_i=|CA_i| consecutive rows of the rows of the matrix, where the A_i have to satisfy the following
matrix, where the A_i have to satisfy the following relationship: relationship:
A_0+A_1+...+A_T=L_d A_0+A_1+...+A_T=L_d
Data Transmission Sub Block (data TSB) Data Transmission Sub Block (data TSB)
T T
<-------> <------->
/\ +-+-+-+-+-+-+-+-+-+ /\ /\ +-+-+-+-+-+-+-+-+-+ /\
| |&|&|&|&|&|*|*|*|*| | | |&|&|&|&|&|*|*|*|*| |
| +-+-+-+-+-+-+-+-+-+ | A_T=3 | +-+-+-+-+-+-+-+-+-+ | A_T=3
| |&|&|&|&|&|*|*|*|*| | | |&|&|&|&|&|*|*|*|*| |
| +-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+ |
L_d bytes | |&|&|&|&|&|*|*|*|*| \/ L_d octets | |&|&|&|&|&|*|*|*|*| \/
per packet | +-+-+-+-+-+-+-+-+-+ /\ per packet | +-+-+-+-+-+-+-+-+-+ /\
| |%|%|%|%|%|%|*|*|*| | A_(T-1)=1 | |%|%|%|%|%|%|*|*|*| | A_(T-1)=1
| +-+-+-+-+-+-+-+-+-+ \/ | +-+-+-+-+-+-+-+-+-+ \/
| |$|$|$|$|$|$|$|*|*| . | |$|$|$|$|$|$|$|*|*| .
| +-+-+-+-+-+-+-+-+-+ . | +-+-+-+-+-+-+-+-+-+ .
| |!|!|!|!|!|!|!|!|*| . | |!|!|!|!|!|!|!|!|*| .
| +-+-+-+-+-+-+-+-+-+ /\ | +-+-+-+-+-+-+-+-+-+ /\
| |#|#|#|#|#|#|#|#|#| | A_0=1 | |#|#|#|#|#|#|#|#|#| | A_0=1
\/ +-+-+-+-+-+-+-+-+-+ \/ \/ +-+-+-+-+-+-+-+-+-+ \/
<-----------------> <----------------->
n packets n packets
&,%,$,!,# : info octets belonging to a certain info stream in
&,%,$,!,# : info bytes belonging to a certain info stream in
decreasing order of importance decreasing order of importance
* : parity bytes gained from Reed-Solomon coding * : parity octets gained from Reed-Solomon coding
Fig. 3: General structure for coding with unequal erasure
Fig. 3: General structure for coding with unequal erasure protection protection
Furthermore, all rows in a particular class CA_i shall contain
exactly the same number of parity bytes, which is equal to the index
i of the class. For each row in a certain class CA_i, the same (n,n-
i) RS code shall be applied.
As can be observed from Fig. 3, class CA_T contains the largest
number of parity bytes per row, i.e. offers the highest erasure
protection capability in the block. Consequently, the most important
element in the info stream must be assigned to class CA_T, where the
value of T should be chosen according to the desired outage
threshold of the application given a certain packet erasure rate on
the link.
All other classes CA_(T-1)...CA_0 shall be sequentially filled with
the remaining elements of the info stream in decreasing order of
importance, where the optimal choice for the size of each class (0
or more rows), i.e. the structure of the redundancy profile, should
depend on the quality-of-service requirements for the various
(progressively-encoded) layers.
The following set of rules contains a compact description of all the
operations that must be performed for each transmission block:
Furthermore, all rows in a particular class EPC_i shall contain
exactly the same number of parity octets, which is equal to the
index i of the class. For each row in a certain class EPC_i, the
same (n,n-i) RS code shall be applied.
As can be observed from Fig. 3, class EPC_T contains the largest
number of parity octets per row, i.e. offers the highest erasure
protection capability in the block. Consequently, the most
important element in the info stream must be assigned to class
EPC_T, where the value of T should be chosen according to the
desired outage threshold of the application given a certain
packet erasure rate on the link.
All other classes EPC_(T-1)...EPC_0 shall be sequentially filled
with the remaining elements of the info stream in decreasing
order of importance, where the optimal choice for the size of
each class (0 or more rows), i.e. the structure of the redundancy
profile, should depend on the quality-of-service requirements for
the various (progressively-encoded) layers.
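To illustrate how the class sizes determine the room available for info octets, the following sketch computes the number of rows and the info capacity of a data TSB. The function name `info_capacity` and the list representation of the profile (index i holds the number of rows A_i that carry i parity octets) are assumptions chosen for this example.

```python
def info_capacity(class_sizes, n):
    """Return (L_d, info_octets) for a data TSB with n columns.

    class_sizes[i] is A_i, the number of rows protected by an
    (n, n-i) RS code, i.e. rows with i parity octets.
    """
    for i, a in enumerate(class_sizes):
        if a < 0:
            raise ValueError("class sizes must be non-negative")
        if a > 0 and i >= n:
            raise ValueError("a row cannot hold n or more parity octets")
    L_d = sum(class_sizes)  # A_0 + A_1 + ... + A_T
    info_octets = sum(a * (n - i) for i, a in enumerate(class_sizes))
    return L_d, info_octets
```

Applied to the profile drawn in Fig. 3 (n=9, T=4, A_4=3 and one row in each of the other classes), this gives L_d=7 rows and 45 info octet positions.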
The following set of rules contains a compact description of all
the operations that must be performed for each transmission
block:
1.) The total number of columns n of the TB shall be chosen 1.) The total number of columns n of the TB shall be chosen
according to the actual delay constraints of the application. according to the actual delay constraints of the application.
2.) Next, the expected number of rows reserved for the signaling
2.) Next, the expected number of rows reserved for the signaling TSB TSB has to be selected, which limits the data TSB to L_d=(L-L_s)
has to be selected, which limits the data TSB to L_d=(L-L_s) rows. rows.
3.) The maximum erasure correction capability T in the data TSB 3.) The maximum erasure correction capability T in the data TSB
should be chosen according to the desired outage threshold of the should be chosen according to the desired outage threshold of the
application given the actual packet erasure rate on the link. application given the actual packet erasure rate on the link.
4.) The redundancy profile for the rest of the data TSB should 4.) The redundancy profile for the rest of the data TSB should
depend on the size and number of the various layers in the info depend on the size and number of the various layers in the info
stream, as well as the desired probability of successful decoding stream, as well as the desired probability of successful decoding
for each of them (quality-of-service requirement). for each of them (quality-of-service requirement).
5.) Any suitable optimization algorithm may be used for deriving
5.) Any suitable optimization algorithm may be used for deriving an an adequate redundancy profile. However, the result has to
adequate redundancy profile. However, the result has to satisfy the satisfy the following constraints:
following constraints: a) All available info octet positions in the data TSB have to be
a) All available info byte positions in the data TSB have to be
completely filled. If the info stream is too short for a desired completely filled. If the info stream is too short for a desired
profile, media stuffing may be applied to the empty info byte profile, media stuffing may be applied to the empty info octet
positions at the end of the data TSB by appending a sufficient positions at the end of the data TSB by appending a sufficient
number of bytes (with arbitrary value, e.g. 0x00). The actual number number of octets (with arbitrary value, e.g. 0x00). The actual
of stuffing symbols per data TSB is then signaled via the respective number of stuffing symbols per data TSB is then signaled via the
stuffing indicator (see 7.3). However, before resorting to any respective stuffing indicator (see 6.4.). However, before
stuffing, it should be checked whether it is possible to strengthen resorting to any stuffing, it should be checked whether it is
the protection of certain rows instead, thus improving the overall possible to strengthen the protection of certain rows instead,
robustness of the decoding process. thus improving the overall robustness of the decoding process.
b) The info stream should be fully contained within the data TSB b) The info stream should be fully contained within the data TSB
(unless cutting it off at a specific point is explicitly allowed by (unless cutting it off at a specific point is explicitly allowed
the properties of the used media codec). by the properties of the used media codec).
c) The number of required descriptors and stuffing indicators (see c) The number of required descriptors and stuffing indicators
section 7.3) to signal the profile shall not exceed the space (see section 6.4.) to signal the profile shall not exceed the
initially reserved for them in the signaling TSB. space initially reserved for them in the signaling TSB.
Constraints a) and b) should be already incorporated in the Constraints a) and b) should be already incorporated in the
optimization algorithm. However, if constraint c) is not met, the optimization algorithm. However, if constraint c) is not met, the
data TSB has to be reduced by one row in favor of the signaling TSB data TSB has to be reduced by one row in favor of the signaling
to accomodate more space for the descriptors and stuffing TSB to accommodate more space for the descriptors and stuffing
indicators, i.e. steps 2-5 have to be repeated until a valid indicators, i.e. steps 2-5 have to be repeated until a valid
redundancy profile has been obtained. redundancy profile has been obtained.
6.) For each nonempty class EPC_i, i=T...0, in the data TSB, the
6.) For each nonempty class CA_i, i=T...0, in the data TSB, the
following steps have to be performed: following steps have to be performed:
a) All rows of this specific class shall be filled from left to a) All rows of this specific class shall be filled from left to
right and top to bottom with data bytes of the info stream in right and top to bottom with data octets of the info stream in
decreasing order of importance (i.e. starting with the most decreasing order of importance (i.e. starting with the most
important element). important element).
b) For each row in the class, the required i parity-check bytes are b) For each row in the class, the required i parity-check octets
computed from the same set of codewords of an (n,n-i) RS code, and are computed from the same set of codewords of an (n,n-i) RS
filled in the empty positions at the end of each row. Thus, every code, and filled in the empty positions at the end of each row.
row in the class constitutes a valid codeword of the chosen RS code. Thus, every row in the class constitutes a valid codeword of the
chosen RS code.
7.) After having filled the whole data TSB with information and 7.) After having filled the whole data TSB with information and
parity bytes, the redundancy profile is mapped to the signaling TSB parity octets, the redundancy profile is mapped to the signaling
as described in section 7.3. TSB as described in section 6.4.
8.) Each column of the resulting TB is now read out octet-wise
8.) Each column of the resulting TB is now read out byte-wise from from top to bottom and, together with the respective UXP header
top to bottom and, together with the respective UXP header (see (see section 6.2.) in front, is mapped onto the payload section
section 7.2) in front, is mapped onto the payload section of one and of one and only one RTP packet.
only one RTP packet.
9.) The n resulting RTP packets shall be transmitted subsequently to
the remote host, starting with the leftmost one.
9.) The n resulting RTP packets shall be transmitted subsequently
to the remote host, starting with the leftmost one.
10.) At the corresponding protocol entity at the remote host, the 10.) At the corresponding protocol entity at the remote host, the
payload (without the UXP header) of all successfully received RTP payload (without the UXP header) of all successfully received RTP
packets belonging to the same sending TB shall be filled into a packets belonging to the same sending TB shall be filled into a
similar receiving TB column-wise from top to bottom and left to similar receiving TB column-wise from top to bottom and left to
right. right.
11.) For every erased packet of a received TB, the respective
11.) For every erased packet of a received TB, the respective column column in the TB shall be filled with a suitable erasure marker.
in the TB shall be filled with a suitable erasure marker.
12.) Before any other operations can be performed, the redundancy 12.) Before any other operations can be performed, the redundancy
profile has to be restored from the signaling TSB according to the profile has to be restored from the signaling TSB according to
procedure defined in section 7.3. If the attempt fails because of the procedure defined in section 6.4. If the attempt fails
too many lost packets, the whole TB shall be discarded and the because of too many lost packets, the whole TB shall be discarded
receiving entity should wait for the next incoming TB (the source and the receiving entity should wait for the next incoming TB.
decoder may be informed about the missing info stream, if required).
13.) If the attempt to recover the redundancy profile has been 13.) If the attempt to recover the redundancy profile has been
successful, a decoding operation shall be performed for each row of successful, a decoding operation shall be performed for each row
the data TSB by applying any suitable algorithm for erasure of the data TSB by applying any suitable algorithm for erasure
decoding. decoding.
14.) For all rows of the data TSB for which the decoding
operation has been successful, the reconstructed data octets are
read out from left to right and top to bottom, and appended to
the reconstructed version of the info stream.
14.) For all rows of the data TSB for which the decoding operation One can easily realize that the above rules describe an
has been successful, the reconstructed data bytes are read out from interleaver, i.e. at the sender a single codeword of a TB is
left to right and top to bottom, and appended to the reconstructed spread out over n successive packets. Thus, each codeword of a
version of the info stream. transmitted TB experiences the same number of erasures at exactly
the same positions.
15.) For all rows of the data TSB for which the decoding operation
has failed, a sufficient number of suitable dummy symbols may be
added to the reconstructed info stream to inform the source decoder
about the missing symbols.
One can easily realize that the above rules describe an interleaver,
i.e. at the sender a single codeword of a TB is spread out over n
successive packets. Thus, each codeword of a transmitted TB
experiences the same number of erasures at exactly the same
positions.
Two important conclusions can be drawn from this: Two important conclusions can be drawn from this:
a) Since the same RS code is applied to all rows contained in a a) Since the same RS code is applied to all rows contained in a
specific class, either all of them can be correctly decoded or not. specific class, either all of them can be correctly decoded or
Hence, there exist no partly decodable classes at the receiver. not. Hence, there exist no partly decodable classes at the
b) If decoding is successful for a certain class CA_i, all the receiver.
classes CA_(i+1)...CA_T can also be decoded, since they are b) If decoding is successful for a certain class EPC_i, all the
protected by at least one more parity byte per row. Together with classes EPC_(i+1)...EPC_T can also be decoded, since they are
rule 6, it is therefore always ensured, that in case a decodable protected by at least one more parity octet per row. Together
enhancement layer exists, all other layers it depends on can also be with rule 6, it is therefore always ensured, that in case a
reconstructed! decodable enhancement layer exists, all other layers it depends
on can also be reconstructed!
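The interleaving behaviour and the two conclusions above can be sketched as follows. This is a simplified illustration, not an implementation of the scheme: it omits the UXP header, the signaling TSB, and the actual RS encoding and decoding. The names `read_out_columns`, `fill_receiving_tb`, `decodable_classes`, and the `ERASED` marker are assumptions made for this example. The decodability test relies on the property that an (n,n-i) RS code can recover up to i erased positions per codeword.

```python
ERASED = None  # erasure marker for lost columns (assumption)


def read_out_columns(tb):
    """Sender: read each TB column top to bottom into one payload."""
    n = len(tb[0])
    return [bytes(row[j] for row in tb) for j in range(n)]


def fill_receiving_tb(payloads, n, L):
    """Receiver: rebuild the TB column-wise; lost columns stay erased.

    payloads maps a column index to the received column octets.
    """
    tb = [[ERASED] * n for _ in range(L)]
    for j, column in payloads.items():
        for r in range(L):
            tb[r][j] = column[r]
    return tb


def decodable_classes(class_sizes, erasures):
    """Return the indices i of all nonempty classes that can be decoded.

    A class with i parity octets per row tolerates up to i erasures,
    and every row sees the same number of erasures.
    """
    return [i for i, a in enumerate(class_sizes) if a > 0 and i >= erasures]


# Usage: a 2x3 TB loses its middle packet.
tb = [[1, 2, 3], [4, 5, 6]]
cols = read_out_columns(tb)
rx = fill_receiving_tb({0: cols[0], 2: cols[2]}, n=3, L=2)
```

In this usage example every row of `rx` has its erasure at the same position, and with two lost packets the profile of Fig. 3 would still allow classes 2, 3, and 4 to be decoded.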
Given the maximum erasure protection value T, the redundancy profile
for a data TSB of size (L_d x n) shall be denoted by a so-called
erasure protection vector AV of length (T+1), where
AV:=(A_0,A_1,...,A_(T-1),A_T) Given the maximum erasure protection value T, the redundancy
profile for a data TSB of size (L_d x n) shall be denoted by a
so-called erasure protection vector EPV of length (T+1), where
EPV:=(A_0,A_1,...,A_(T-1),A_T)
From the above definition, it is easy to realize that the trivial From the above definition, it is easy to realize that the trivial
cases of no erasure protection and EXP are a subset of UXP: cases of no erasure protection and EXP are a subset of UXP:
a) no erasure protection at all: all application data is mapped onto a) no erasure protection at all: all application data is mapped
class CA_0, i.e. AV=(L_d,0,0,...,0). onto
b) EXP: all application data is mapped onto class CA_T, i.e. class EPC_0, i.e. EPV=(L_d,0,0,...,0).
AV=(0,0,...,0,A_T=L_d).
b) EXP: all application data is mapped onto class EPC_T, i.e.
EPV=(0,0,...,0,A_T=L_d).
Hence, backward compatibility to currently standardized non- Hence, backward compatibility to currently standardized non-
progressive multimedia codecs is definitely achieved. progressive multimedia codecs is definitely achieved.
7. RTP payload structure 6. RTP payload structure
For every packet whose payload is formed by reading out a column of
the TB, the RTP header must be followed by an UXP header.
7.1. Specific settings in the RTP header
The timestamp of each RTP packet resulting from reading out a TB is This section is organized as follows. First, the specific
set to the time instant when the first byte of the progressive settings in the RTP header are shown. Next, the RTP payload header
source data stream has been written into the TB. This results in the for UXP (the so-called UXP header) is specified. After that, the
TS value being the same for all RTP packets belonging to a specific structure of the bitstream which is protected by UXP, the so-
TB. called info stream, is discussed. Finally, the in-band signaling
of the erasure protection vector is introduced.
The payload type is of dynamic type, and obtained through out-of- For every packet, the UXP payload is formed by reading out a
band signaling similar to [1]. The signaling protocol must establish column of the TB and prefixing it with the UXP header. Thus, an
a payload length to be associated with the payload type value. End UXP-compliant RTP packet looks as follows:
systems, which cannot recognize a payload type, must discard it.
The marker bit is set to 1 for every last packet in a TB. Otherwise, +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
its value is 0. |RTP Header| UXP Header| one column of the TB |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
All other fields in the RTP header are set to those values proposed 6.1 Specific settings in the RTP header
for regular multimedia transmission using the same source codecs,
but no erasure protection scheme enabled.
The RTP payload shall consist of the UXP header followed by one The timestamp of each RTP packet is set to the sampling time of
column of the TB. the first octet of the progressive media stream in the
corresponding TB. If several data TSBs are included in one TB,
the sampling time of data TSB #1 is relevant. This results in the
TS value being the same for all RTP packets belonging to a
specific TB.
The payload type is of dynamic type, and obtained through out-of-
band signaling similar to [1]. End systems, which cannot
recognize a payload type, must discard it.
The marker bit is set to 1 for every last packet in a TB.
Otherwise, its value is 0.
All other fields in the RTP header are set to those values
proposed for regular multimedia transmission using the RTP-format
of the media stream which is protected by UXP.
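The timestamp and marker bit settings described above can be sketched as follows. The function name `rtp_fields_for_tb` and the dictionary layout are assumptions for illustration; the sketch further assumes that sequence numbers increase by one per packet and wrap at 16 bits, as is usual for RTP.

```python
def rtp_fields_for_tb(first_seq, timestamp, n):
    """Return the RTP header fields for the n packets of one TB.

    All packets share the same timestamp; only the last packet
    of the TB has the marker bit set.
    """
    return [
        {
            "seq": (first_seq + j) % 65536,  # 16-bit sequence number
            "timestamp": timestamp,           # identical within a TB
            "marker": 1 if j == n - 1 else 0,
        }
        for j in range(n)
    ]
```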
7.2. Structure of the UXP header 6.2. Structure of the UXP header
The UXP header shall consist of 2 octets, and is shown in Fig. 4: The UXP header shall consist of 2 octets, and is shown in Fig. 4:
0 1 1 1 1 1 1 0 1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|X| block PT | block length n| |X| block PT | block length n|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fig. 4: Proposed UXP header Fig. 4: Proposed UXP header
The fields in the header shall be defined as follows: The fields in the header shall be defined as follows:
- X (bit 0): extension bit, reserved for future enhancements, - X (bit 0): extension bit, reserved for future enhancements,
currently not in use -> default value: 0 currently not in use -> default value: 0
- block PT (bits 1-7): regular RTP payload type to indicate the - block PT (bits 1-7): regular RTP payload type to indicate the
media type contained in the info stream media type contained in the info stream
- block length n (bits 8-15): indicates total number of RTP
- block length n (bits 8-15): indicates total number of RTP packets - packets
resulting from one TB (which equals resulting from one TB (which equals
the number of columns of the TB) the number of columns of the TB)
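A minimal sketch of building and parsing the 2-octet header of Fig. 4 is given below. The function names `pack_uxp_header` and `parse_uxp_header` are assumptions for this example, and bit 0 is taken to be the most significant bit of the first octet, as is conventional for header diagrams of this kind.

```python
import struct


def pack_uxp_header(block_pt, n, x=0):
    """Pack X (1 bit), block PT (7 bits) and block length n (8 bits)."""
    if x not in (0, 1):
        raise ValueError("X must be 0 or 1")
    if not 0 <= block_pt <= 127:
        raise ValueError("block PT must fit into 7 bits")
    if not 0 <= n <= 255:
        raise ValueError("block length n must fit into 8 bits")
    return struct.pack("!BB", (x << 7) | block_pt, n)


def parse_uxp_header(data):
    """Return (X, block PT, n) from the first two octets of a payload."""
    first, n = struct.unpack("!BB", data[:2])
    return first >> 7, first & 0x7F, n
```

The payload type value used by a caller (for example 96) would be whatever value was negotiated out-of-band; it is not fixed by this sketch.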
The syntax of the info stream which is protected by UXP is specified The syntax of the info stream which is protected by UXP is
by the RTP payload type field contained in the UXP header. For specified by the RTP payload type field contained in the UXP
example, payload type H.263 means that the info stream conforms to header. The details of the info stream are described in Sec. 6.3.
the specifications of the RTP profile for H.263, but does not For example, payload type H.263 means that the info stream
represent the "raw" H.263 stream produced by a H.263 encoder. conforms to the specifications of the RTP profile for H.263 and
However, UXP can also be applied to the raw output of the media does not represent the "raw" H.263 media stream produced by an
codec (in case it is already octet-aligned), if this can be signaled H.263 encoder.
to the receiver via other means, e.g. by use of H.245 or SDP. However, UXP can also be applied to the "raw" media stream (in
case it is already octet-aligned), if this can be signaled to the
receiver via other means, e.g. by use of H.245 or SDP.
Based on the RTP sequence number, the marker bit, and the
repetition of the block length n in each UXP header, the
receiving entity is able to recognize both TB boundaries and the
actual position of lost packets in the TB.
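One possible way for a receiver to map received packets to TB columns is sketched below. It assumes that the packet carrying the marker bit (the last packet of the TB) has been received, so that its sequence number `last_seq` is known; handling a lost marker packet, for example by using the marker packet of the previous TB, is not covered. The function name `locate_columns` is an assumption for this example.

```python
def locate_columns(received_seqs, last_seq, n):
    """Map received sequence numbers to column indices of one TB.

    last_seq is the sequence number of the marker packet, n the
    block length from the UXP header. Returns (received, lost),
    where received maps column index -> sequence number and lost
    lists the erased column indices.
    """
    received = {}
    for s in received_seqs:
        back = (last_seq - s) % 65536  # distance to the last packet, wrap-safe
        if back < n:                   # packet belongs to this TB
            received[n - 1 - back] = s
    lost = [j for j in range(n) if j not in received]
    return received, lost
```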
Based on the RTP sequence number, the marker bit, and the repetition 6.3 Framing and Timing Mechanism in UXP: The info stream.
of the block length n in each UXP header, the receiving entity is
able to recognize both TB boundaries and the actual position of lost
packets in the TB. Furthermore, the specific choice of equal TS
values for all RTP packets belonging to a TB allows for overcoming
possible sequence number overflow.
7.3. In-band signaling of the structure of the redundancy profile As described in section 5, UXP creates its own packetization
scheme by interleaving. The regular framing and timing structure
of RTP is therefore destroyed. This section describes which kind
of problems arise with interleaving and how they can be solved.
This finally leads to the specification of the info stream.
The timestamp of an RTP packet usually describes the sampling
time of the first octet included in the RTP data packet. This is
in principle also true for UXP RTP packets. According to the
timestamp definition in 6.1, every packet carries the sampling
time of the first octet in the corresponding TB.
Therefore, all packets which belong to one TB contain the same
timestamp. This can lead to problems: because a TB can be large
(up to its theoretical size limit), it can contain data from
different sampling time instants, e.g. several video frames. The
timing information of the later frames then has to be determined
from the media stream itself and not from the RTP timestamp.
A second problem arising with interleaving is that the framing
mechanism of RTP is not supported. Consider a media encoder,
which does not create a fully decodable bitstream, e.g. H.26L
with the video coding layer (VCL) and network adaptation layer
(NAL) concept [9]. In this concept the VCL creates slices, which
are prepared for transmission over various networks at the
NAL. Consequently, in case of RTP transmission, the header
information needed to decode the slices is included only in
the RTP packets. Thus, filling an UXP TB with the "raw" media
stream from the VCL can lead, even without packet losses, to a
non-decodable stream.
The framing problem can be solved in two ways:
One solution could be to use the RTP payload specification of the
media stream to create a bitstream with an appropriate framing,
the so-called info stream. For example, to create an H.263 info
stream, the following steps are necessary:
1.) Generate an H.263-compliant media stream, i.e. take a slice
or a video frame directly from the H.263 encoder.
2.) Apply the H.263 payload specification (e.g. RFC 2429) to
create the RTP payload for only one packet.
3.) Insert the latter row by row into one data TSB.
It is possible to apply the procedure mentioned above several
times for different data TSBs (see 6.5.). Due to the in-band
signaling, it is possible to determine the beginning and end of
every TSB without parsing the whole TB. This allows a fast
decomposition of the TB into the different TSBs.
Another solution of the framing problem would be to rely on the
framing mechanism of the media stream. This is, for example,
possible for media streams which contain start codes.
The timing problem can be solved in two ways.
One solution is to comply with the RTP payload specification of
the media stream. If the specification allows octets which belong
to different sampling times to be put into one packet, this
should also be allowed for a TB.
The second solution for the timing problem is to rely on the
timing information contained in the media stream itself, if
available.
Therefore, there are two different modes for framing:
1.) RTP payload framing (if an RTP payload specification exists
for the media stream),
2.) pure media stream framing (if framing is contained in the
media stream),
and two different modes for timing:
1.) timing rules of the RTP payload specification for the media
stream,
2.) timing information within the media stream.
All combinations of timing and framing modes are possible, but
framing mode 1 and timing mode 1 represent the default mode of
operation for UXP. The use of other timing and framing modes has
to be signaled by non-RTP means.
The info stream is thus defined by the media stream together with
framing and timing rules.
In the following, some examples will be given:
1.) The info stream for MPEG-4 video according to RFC 3016 is
the pure MPEG-4 compliant media stream, since RFC 3016
specifies (in case of video) to take the MPEG-4 compliant
video stream as payload.
2.) The info stream for H.263+ can be created according to RFC
2429 as follows:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
|H.263+ payload| H.263+ compliant stream (possibly changed with|
|header | respect to RFC 2429) containing a slice/frame |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
This info stream is inserted into one single data TSB.
If necessary, for example, if the slices are too short to achieve
a reasonable TB size, several info streams can be inserted in one
TB by concatenating several data TSBs to one TB (see 6.5.).
6.4. In-band signaling of the structure of the redundancy profile
To enable a dynamic adaptation to varying link conditions, the
actual redundancy profile used in the data TSB as well as the
beginning and end of a TSB must be signaled to the receiving
entity. Since out-of-band signaling either results in excessive
additional control traffic, or prevents quick changes of the
profile between successive TBs, an in-band signaling procedure is
desired.
Since the decoding process cannot be applied to any of the
erasure protection classes without knowledge of the correct
redundancy profile, the profile has to be protected at least as
strongly as the most important element in the info stream.
Therefore, an additional class EPC_P is used in the signaling TSB,
where the number of parity symbols is by default set to the
following value:
P=ceil(n/2)
Hence, up to 50% of the RTP packets can be lost, before the Hence, up to 50% of the RTP packets can be lost, before the
redundancy profile cannot be recovered anymore. This seems to be a redundancy profile cannot be recovered anymore. This seems to be
reasonable value for the lowest point of operation over a lossy a reasonable value for the lowest point of operation over a lossy
link. Alternatively, p may be explicitly signaled during session link. Alternatively, P may be explicitly signaled during session
setup by means of SDP or H.245 protocol. setup by means of SDP or H.245 protocol.
Consequently, since all other classes must have equal or less Consequently, since all other classes must have equal or less
erasure protection capability, the maximum allowable value for class erasure protection capability, the maximum allowable value for
CA_T in the data TSB is now limited to T<=P. class EPC_T in the data TSB is now limited to T<=P.
The signaling of the erasure protection vector is accomplished by The signaling of the erasure protection vector is accomplished by
means of descriptors. For each class CA_i with A_i>0, there is a means of descriptors. For each class EPC_i with R_i>0, there is a
descriptor DP_i providing information about the size of class CA_i descriptor DP_i providing information about the size of class
(i.e. the value of A_i) and establishing a relationship between the EPC_i (i.e. the value of R_i) and establishing a relationship
erasure protection of class CA_i and that of the first preceding between the erasure protection of class EPC_i and that of the
class CA_(i+j) with A_(i+j)>0, where j>0. A descriptor DP_i is first preceding class EPC_(i+j) with A_(i+j)>0, where j>0. A
mapped onto one byte, which is sub-divided into two half-bytes (i.e. descriptor DP_i is mapped onto one octet, which is sub-divided
the higher and the lower four bits). The first half-byte is of type into two half-octets (i.e. the higher and the lower four bits).
unsigned and contains the 4-bit representation of the decimal value The first half-octet is of type unsigned and contains the 4-bit
A_i. The second half-byte is of type signed and contains the representation of the decimal value R_i. The second half-octet is
difference in erasure protection between class CA_i and class of type signed and contains the difference in erasure protection
CA_(i+j), i.e. the signed 4-bit representation of the decimal value between class EPC_i and class EPC_(i+j), i.e. the signed 4-bit
(-j) (where the MSB denotes the sign, and the lower three bits the representation of the decimal value (-j) (where the MSB denotes
absolute value). Note that the erasure protection p of class CA_p is the sign, and the lower three bits the absolute value). Note that
fixed, whereas the size A_p may vary. the erasure protection P of class EPC_p is fixed, whereas the
size A_P may vary.
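The octet layout of a descriptor described above can be sketched as
follows. This is only an illustration and not part of the payload
format; the function names pack_descriptor and unpack_descriptor are
hypothetical. It assumes the sign-magnitude layout stated in the text
(MSB of the lower half-octet is the sign, the lower three bits are the
absolute value).

```python
from typing import Tuple


def pack_descriptor(size: int, diff: int) -> int:
    """Pack a class size and a protection difference into one octet.

    Upper half-octet: unsigned class size (number of rows), 1..15.
    Lower half-octet: sign bit followed by a three-bit magnitude.
    """
    if not 1 <= size <= 15:
        raise ValueError("size must be > 0 and fit into four unsigned bits")
    if not -7 <= diff <= 7:
        raise ValueError("difference must fit into sign bit plus three bits")
    sign = 0x8 if diff < 0 else 0x0
    return (size << 4) | sign | abs(diff)


def unpack_descriptor(octet: int) -> Tuple[int, int]:
    """Return (size, diff) from a descriptor octet."""
    size = octet >> 4
    magnitude = octet & 0x7
    diff = -magnitude if octet & 0x8 else magnitude
    return size, diff
```

For instance, a class of 10 rows whose erasure protection is 4 lower
than that of the preceding class maps to 0xAC, while the same class
with a protection that is 4 higher maps to 0xA4.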
Thus, the data to be filled into class CA_p shall consist of a Thus, the data to be filled into class EPC_P shall consist of a
sequence of descriptors separated by stuffing indicators (see sequence of descriptors separated by stuffing indicators (see
below), where the number of descriptors is primarily given by the below), where the number of descriptors is primarily given by the
number of protection classes CA_i, 0<=i<=T, in the data TSB with number of protection classes EPC_i, 0<=i<=T, in the data TSB with
A_i>0. R_i>0.
Without a-priori knowledge, the initial value for the size of the
signaling TSB should be set to one (row). When the number of
necessary descriptors and stuffing indicators exceeds the (n-P)
information positions, one or more additional rows have to be
reserved. This is usually done by increasing the value for L_s to
A_P>1, i.e. the data TSB is reduced to (L-A_P) rows. Hence, in
order to indicate the actual size of the signaling TSB, an
additional descriptor is inserted at the very beginning, which
takes on the value 0xq0, where q denotes the (hexadecimal) four
bit representation of the decimal value A_P.
Furthermore, the end of each data TSB is signaled by the
Furthermore, the end of each data TSB is signaled by the otherwise otherwise unused descriptor value 0x00, followed by exactly one
unused descriptor value 0x00, followed by exactly one stuffing stuffing indicator (SI). The latter is mapped onto an octet,
indicator (SI). The latter is mapped onto a byte, which is of type which is of type unsigned and contains the 8-bit representation
unsigned and contains the 8-bit representation of the decimal value of the decimal value of the number of media stuffing symbols used
of the number of media stuffing symbols used at the end of the at the end of the respective data TSB.
respective data TSB.
The (extended) sequence of descriptors and stuffing indicators is The (extended) sequence of descriptors and stuffing indicators is
then mapped to the info byte positions in the A_p rows of the then mapped to the octet positions in the A_P rows of the
signaling TSB from left to right and top to bottom. Each row is then signaling TSB from left to right and top to bottom. Each row is
encoded with the same (n,n-p) RS code. then encoded with the same (n,n-P) RS code.
If the number of descriptors and stuffing indicators is less than If the number of descriptors and stuffing indicators is less than
the available info byte positions, however, empty positions in class the available octet positions, however, empty positions in class
CA_p may be filled up with the otherwise unused descriptor 0x00. EPC_P may be filled up with the otherwise unused descriptor 0x00.
At the receiving entity, the sequence of descriptors shall be At the receiving entity, the sequence of descriptors shall be
recovered by performing erasure decoding on the first row of the TB recovered by performing erasure decoding on the first row of the
(which definitely belongs to the signaling TSB) using the same TB (which definitely belongs to the signaling TSB) using the same
algorithm as later for the data TSB. If successful, the very first algorithm as later for the data TSB. If successful, the very
descriptor now indicates the number of rows of the signaling TSB, first descriptor now indicates the number of rows of the
and the next (A_p-1) rows are decoded to reconstruct the redundancy signaling TSB, and the next (A_P-1) rows are decoded to
profile for the data TSB(s), together with the number of media reconstruct the redundancy profile for the data TSB(s), together
stuffing symbols denoted by the respective SI(s). with the number of media stuffing symbols denoted by the
respective SI(s).
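The receiver-side interpretation of the recovered signaling octets
might be sketched as shown below. The sketch assumes that erasure
decoding of the signaling rows has already succeeded, that the
protection differences accumulate starting from P and continue across
concatenated data TSBs (which is consistent with the worked examples
in this document), and that a 0x00 octet directly after a stuffing
indicator is descriptor stuffing. The name parse_signaling is
hypothetical.

```python
def parse_signaling(octets, P):
    """Parse the info octets of the signaling TSB.

    Returns (rows, tsbs), where rows is the size A_P of the signaling
    TSB and tsbs is a list with one entry per data TSB. Each entry is
    (classes, stuffing); classes lists (protection, size) pairs in the
    order in which the descriptors were sent.
    """
    rows = octets[0] >> 4              # leading descriptor 0xq0, q = A_P
    tsbs = []
    classes = []
    protection = P                     # running erasure protection value
    i = 1
    while i < len(octets):
        octet = octets[i]
        if octet == 0x00:
            if not classes:            # no open data TSB: descriptor stuffing
                break
            # End of a data TSB: 0x00 is followed by exactly one SI octet.
            tsbs.append((classes, octets[i + 1]))
            classes = []
            i += 2
            continue
        size = octet >> 4
        magnitude = octet & 0x7
        protection += -magnitude if octet & 0x8 else magnitude
        classes.append((protection, size))
        i += 1
    return rows, tsbs
```

With P=10, the octets (0x10,0xAC,0x39,0x2A,0x29,0x7A,0x00,0x03,0x00,
0x00) yield one signaling row and a single data TSB with classes 6, 5,
3, 2 and 0 of sizes 10, 3, 2, 2 and 7, followed by 3 media stuffing
symbols.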
The complete structure of the TB is now depicted in Fig. 5. The complete structure of the TB is now depicted in Fig. 5.
Transmission Block (TB) Transmission Block (TB)
P P
<---------> <--------->
/\ +-+-+-+-+-+-+-+-+-+ /\ /\ +-+-+-+-+-+-+-+-+-+ /\
| |?|?|?|?|*|*|*|*|*| | A_P=1 | |?|?|?|?|*|*|*|*|*| | A_P=1
| +-+-+-+-+-+-+-+-+-+ \/ | +-+-+-+-+-+-+-+-+-+ \/
| |&|&|&|&|&|*|*|*|*| /\ | |&|&|&|&|&|*|*|*|*| /\
| +-+-+-+-+-+-+-+-+-+ | A_T=3 | +-+-+-+-+-+-+-+-+-+ | A_T=3
| |&|&|&|&|&|*|*|*|*| | | |&|&|&|&|&|*|*|*|*| |
| +-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+ |
L bytes | |&|&|&|&|&|*|*|*|*| \/ L octets | |&|&|&|&|&|*|*|*|*| \/
payload | +-+-+-+-+-+-+-+-+-+ /\ payload | +-+-+-+-+-+-+-+-+-+ /\
per packet | +%|%|%|%|%|%|*|*|*| | A_(T-1)=1 per packet | |%|%|%|%|%|%|*|*|*| | A_(T-1)=1
| +-+-+-+-+-+-+-+-+-+ \/ | +-+-+-+-+-+-+-+-+-+ \/
| |$|$|$|$|$|$|$|*|*| . | |$|$|$|$|$|$|$|*|*| .
| +-+-+-+-+-+-+-+-+-+ . | +-+-+-+-+-+-+-+-+-+ .
| |!|!|!|!|!|!|!|!|*| . | |!|!|!|!|!|!|!|!|*| .
| +-+-+-+-+-+-+-+-+-+ /\ | +-+-+-+-+-+-+-+-+-+ /\
| |#|#|#|#|#|#|#|#|#| | A_0=1 | |#|#|#|#|#|#|#|#|#| | A_0=1
\/ +-+-+-+-+-+-+-+-+-+ \/ \/ +-+-+-+-+-+-+-+-+-+ \/
<-----------------> <----------------->
n packets n packets
? : descriptors and stuffing indicators for in-band ? : descriptors and stuffing indicators for in-band
signaling of the redundancy profile signaling of the redundancy profile
&,%,$,!,# : info bytes belonging to a certain element of the &,%,$,!,# : info octets belonging to a certain element of the
info stream in decreasing order of importance info stream in decreasing order of importance
* : parity bytes gained from Reed-Solomon coding * : parity octets gained from Reed-Solomon coding
Fig. 5: General structure for UXP with in-band signaling of the Fig. 5: General structure for UXP with in-band signaling of the
redundancy profile redundancy profile
The following simple example is meant to illustrate the idea
The following simple example is meant to illustrate the idea behind behind using descriptors: Let an erasure protection vector of
using descriptors: Let an erasure protection vector of length T+1=7 length T+1=7 be given as follows:
be given as follows: EPV=(A_0,A_1,...,A_5,A_6)=(7,0,2,2,0,3,10)
AV=(A_0,A_1,...,A_5,A_6)=(7,0,2,2,0,3,10) Hence, the length L of the TB (including one row for the
Hence, the length L of the TB (including one row for the signaling signaling TSB) is equal to 7+2+2+3+10+1=25 (rows/octets). If the
TSB) is equal to 7+2+2+3+10+1=25 (rows/bytes). If the width is width is assumed to be equal to 20 (columns/packets), then the
assumed to be equal to 20 (columns/packets), then the erasure erasure protection of the descriptors is P=10.
protection of the descriptors is p=10.
The corresponding sequence of descriptors can be written as The corresponding sequence of descriptors can be written as
DP=(DP_6,DP_5,DP_3,DP_2,DP_0)=(0xAC,0x39,0x2A,0x29,0x7A), DP=(DP_6,DP_5,DP_3,DP_2,DP_0)=(0xAC,0x39,0x2A,0x29,0x7A),
where the values of the descriptors are given in hexadecimal where the values of the descriptors are given in hexadecimal
notation. Next, the descriptor indicating the length of the notation. Next, the descriptor indicating the length of the
signaling TSB has to be inserted, the end of the data TSB has to be signaling TSB has to be inserted, the end of the data TSB has to
marked by 0x00, and the SI has to be appended. If the number of be marked by 0x00, and the SI has to be appended. If the number
media stuffing symbols is assumed to be 3, the 10 info bytes in the of media stuffing symbols is assumed to be 3, the 10 info octets
signaling TSB take on the following values (descriptor stuffing in the signaling TSB take on the following values (descriptor
included): stuffing included):
(0x10,0xAC,0x39,0x2A,0x29,0x7A,0x00,0x03,0x00,0x00) (0x10,0xAC,0x39,0x2A,0x29,0x7A,0x00,0x03,0x00,0x00)
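The construction in this example can be reproduced with the following
sketch. It is meant as an illustration under stated assumptions, not
as a normative procedure: the function name build_signaling and its
parameters are hypothetical, a single data TSB is assumed, and the
first descriptor is taken relative to the protection P of the
signaling class, as the values of the example suggest.

```python
def build_signaling(epv, n, P, stuffing, rows=1):
    """Build the info octets of the signaling TSB for one data TSB.

    epv[i] is the number of rows in class i (i parity symbols per row),
    n is the number of packets, P the protection of the signaling class,
    stuffing the number of media stuffing symbols.
    """
    if not 1 <= rows <= 15:
        raise ValueError("signaling TSB size must fit into four bits")
    octets = [rows << 4]                       # leading descriptor 0xq0
    previous = P
    for protection in range(len(epv) - 1, -1, -1):   # class T down to class 0
        size = epv[protection]
        if size == 0:
            continue                           # empty classes get no descriptor
        diff = protection - previous
        if size > 15 or abs(diff) > 7:
            raise ValueError("value does not fit into a half-octet")
        sign = 0x8 if diff < 0 else 0x0
        octets.append((size << 4) | sign | abs(diff))
        previous = protection
    octets += [0x00, stuffing]                 # end marker plus stuffing indicator
    capacity = rows * (n - P)
    if len(octets) > capacity:
        raise ValueError("signaling TSB too small, reserve more rows")
    return octets + [0x00] * (capacity - len(octets))   # descriptor stuffing


example = build_signaling([7, 0, 2, 2, 0, 3, 10], n=20, P=10, stuffing=3)
print(", ".join("0x%02X" % o for o in example))
```

The printed sequence can be compared octet by octet with the values
listed above.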
7.4 Optional Concatenation of Transmission Sub Blocks: 6.5. Optional Concatenation of Transmission Sub Blocks:
The following procedure may be applied if a single info stream would The following procedure may be applied if a single info stream
be too short to achieve an efficient mapping to a transmission block would be too short to achieve an efficient mapping to a
with respect to the fixed payload length L and the desired number of transmission block with respect to the fixed payload length L and
packets n. For example, intra-coded video frames (I-frames) are the desired number of packets n. For example, intra-coded video
usually much larger than the following predicted ones (P-frames). In frames (I-frames) are usually much larger than the following
this case, a certain number z of successive small info streams predicted ones (P-frames). In this case, a certain number z of
should be each mapped to a transmission sub block with length L_d(y) successive small info streams should be each mapped to a
and width n, such that L_d(1)+L_d(2)+?+L_d(z)=L_d. transmission sub block with length L_d(y) and width n, such that
L_d(1)+L_d(2)+...+L_d(z)=L_d.
The resulting transmission sub blocks can then be easily The resulting transmission sub blocks can then be easily
concatenated to form a TB of size L x n having one common signaling concatenated to form a TB of size L x n having one common
TSB: Since the second half-byte of the descriptors is of type signaling TSB: Since the second half-octet of the descriptors is
signed, we are able to incorporate both decreasing and increasing of type signed, we are able to incorporate both decreasing and
erasure protection profiles within one single signaling TSB. increasing erasure protection profiles within one single
Note that once the lengths L_d(y) of the individual blocks have been signaling TSB.
fixed, the respective redundancy profiles can be determined Note that once the lengths L_d(y) of the individual blocks have
independently of each other. However, the space initially reserved been fixed, the respective redundancy profiles can be determined
for the signaling TSB should be already large enough to avoid independently of each other. However, the space initially
profile recalculation for each of the data TSBs in case the sequence reserved for the signaling TSB should be already large enough to
of descriptors gets too long! avoid profile recalculation for each of the data TSBs in case the
sequence of descriptors gets too long!
Again, we will give a simple example to illustrate this idea: Let
the erasure protection vectors for two concatenated data TSBs be
given as follows:
EPV1=(A1_0,A1_1,...,A1_5,A1_6)=(0,0,2,2,0,3,10),
EPV2=(A2_0,A2_1,...,A2_5,A2_6)=(0,0,2,2,0,3,10).
Hence, two single identical data TSBs will be concatenated to
form a TB of length L=2*(2+2+3+10)+2=36 (rows/octets). If the
width is again assumed to be equal to 20 (columns/packets), then
the erasure protection of the descriptors is P=10. Since the 13
octets needed for descriptors and stuffing indicators exceed the
n-P=10 information positions of one row, a total of two rows for
the signaling TSB have been reserved this time. The corresponding
sequence of descriptors can now be written as
DP=(0xAC,0x39,0x2A,0x29,0xA4,0x39,0x2A,0x29), where
the values of the descriptors are given in hexadecimal notation.
If the number of media stuffing symbols is assumed to be 3 for
each data TSB, the 20 info octet positions in the signaling TSB
are filled with the following values (descriptor stuffing
included):
(0x20,0xAC,0x39,0x2A,0x29,0x00,0x03,0xA4,0x39,0x2A,0x29,0x00,0x03,
0x00,0x00,0x00,0x00,0x00,0x00,0x00)
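The arithmetic behind the number of signaling rows and the TB length
can be checked with a short sketch. The helper name tb_dimensions is
hypothetical; the sketch assumes the default P=ceil(n/2), one leading
size descriptor, one descriptor per non-empty class, and one end
marker plus one stuffing indicator per data TSB.

```python
import math


def tb_dimensions(epvs, n, P=None):
    """Return (P, needed, rows, L) for a list of erasure protection vectors.

    needed: signaling octets before descriptor stuffing
    rows:   rows reserved for the signaling TSB (A_P)
    L:      total TB length in rows, i.e. payload octets per packet
    """
    if P is None:
        P = math.ceil(n / 2)                   # default protection of class EPC_P
    needed = 1                                 # leading descriptor 0xq0
    for epv in epvs:
        needed += sum(1 for size in epv if size > 0)   # one descriptor per class
        needed += 2                            # 0x00 end marker and one SI
    rows = math.ceil(needed / (n - P))         # (n-P) info positions per row
    L = rows + sum(sum(epv) for epv in epvs)
    return P, needed, rows, L


single = tb_dimensions([(7, 0, 2, 2, 0, 3, 10)], n=20)
double = tb_dimensions([(0, 0, 2, 2, 0, 3, 10)] * 2, n=20)
print(single, double)
```

For the single-TSB example this gives 8 signaling octets, one row and
L=25; for the concatenated example it gives 13 signaling octets, two
rows and L=36.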
8. Security Considerations 8. Security Considerations
The payload of the RTP-packets consists of an interleaved media
The payload of the RTP-packets consists of an interleaved multimedia
and parity stream. Therefore, it is reasonable to encrypt the and parity stream. Therefore, it is reasonable to encrypt the
resulting stream with one key rather than using different keys for resulting stream with one key rather than using different keys
multimedia and parity data. It should also be noted that encryption for media and parity data. It should also be noted that
of the multimedia data without encryption of the parity data could encryption of the media data without encryption of the parity
enable known-plaintext attacks. data could enable known-plaintext attacks.
The overall proportion between parity octets and info octets
The overall proportion between parity bytes and info bytes should be should be chosen carefully if the packet loss is due to network
chosen carefully if the packet loss is due to network congestion. If congestion. If the proportion of parity octets per TB is
the proportion of parity bytes per TB is increased in this case, it increased in this case, it could lead to increasing network
could lead to increasing network congestion. Therefore, the congestion. Therefore, the proportion between parity octets and
proportion between parity bytes and info bytes per TB MUST NOT be info octets per TB MUST NOT be increased as packet loss increases
increased as packet loss increases due to network congestion. due to network congestion.
The overall ratio between parity and info octets MUST NOT be
The overall ratio between parity and info bytes MUST NOT be higher higher than 1:1, i.e. the absolute bitrate spent for redundancy
than 1:1, i.e. the absolute bitrate spent for redundancy must not be must not be larger than the bitrate required for transmission of
larger than the bitrate required for transmission of multimedia data multimedia data itself.
itself.
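A sender could verify the 1:1 limit before transmitting a TB, for
example as sketched below. This is only one possible way of counting
and rests on assumptions that are not stated in this section: each row
of class EPC_i is taken to carry i parity octets and n-i info octets
(as Fig. 5 suggests), and the octets of the signaling TSB as well as
media stuffing symbols are counted as info octets. The function name
parity_and_info is hypothetical.

```python
def parity_and_info(epvs, n, P, signaling_rows):
    """Count parity and info octets of one TB.

    epvs: one erasure protection vector per data TSB, epv[i] = rows in class i
    """
    parity = signaling_rows * P                # signaling rows use an (n, n-P) code
    info = signaling_rows * (n - P)
    for epv in epvs:
        for protection, size in enumerate(epv):
            parity += size * protection        # i parity octets per row of class i
            info += size * (n - protection)
    return parity, info


parity, info = parity_and_info([(7, 0, 2, 2, 0, 3, 10)], n=20, P=10,
                               signaling_rows=1)
within_limit = parity <= info                  # ratio must not exceed 1:1
print(parity, info, within_limit)
```

For the single-TSB example of section 6.4 (25 rows, 20 packets) this
counts 95 parity octets against 405 info octets, which is well within
the limit.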
9. Application Statement 9. Application Statement
There are currently two different schemes proposed for unequal
There are currently two different schemes proposed for unequal error error protection in the IETF-AVT: Unequal Level Protection (ULP)
protection in the IETF-AVT: Unequal Level Protection (ULP) and and Unequal Erasure Protection (UXP).
Unequal Erasure Protection (UXP). Although both methods seem to address the same problem, the
Although both methods seem to address the same problem, the proposed proposed solutions differ in many respects. This section tries to
solutions differ in many respects. This section tries to describe describe possible application scenarios and to show the strength
possible application scenarios and to show the strength and and weaknesses of both approaches.
weaknesses of both approaches.
The main difference between both approaches is that while ULP The main difference between both approaches is that while ULP
preserves the structure of the packets which have to protected and preserves the structure of the packets which have to be protected
provides the redundancy in extra packets, UXP interleaves the info and provides the redundancy in extra packets, UXP interleaves the
stream which has to be protected, inserts the redundancy information, info stream which has to be protected, inserts the redundancy
and thus creates a totally new packet structure. information, and thus creates a totally new packet structure.
Another difference concerns multicast compatibility: It cannot be Another difference concerns multicast compatibility: It cannot be
assumed that all future terminals will be able to apply UXP/ULP. assumed that all future terminals will be able to apply UXP/ULP.
Therefore, backward compatibility could be an issue in some
cases. Since ULP does not change the original packet structure,
but only adds some extra packets, it is possible for terminals
which do not support ULP to discard the extra packets. In case of
UXP, however, two separate streams with and without erasure
protection have to be sent, which increases the overall data rate.
Next, both approaches offer different mechanisms to adjust packet
sizes, if necessary: UXP allows to adjust the packet sizes sizes, if necessary: UXP allows to adjust the packet sizes
arbitrarily. This is an advantage in case the loss probability is arbitrarily. This is an advantage in case the loss probability is
dependent on the packet length, which happens, for example, if the dependent on the packet length, which happens, for example, if
end-to-end connection contains wireless links. In this case proper the end-to-end connection contains wireless links. In this case
adjustment of the packet size is one essential network adaption proper adjustment of the packet size is one essential network
technique. In addition, if a preencoded stream is sent over the adaptation technique. In addition, if a preencoded stream is sent
network, the packet size can be adjusted independently of slice over the network, the packet size can be adjusted independently
structures. of slice structures.
Since ULP does not change the existing packetization scheme, this Since ULP does not change the existing packetization scheme, this
flexibility does not exist. flexibility does not exist.
The ability of UXP to adjust the packet size arbitrarily can be The ability of UXP to adjust the packet size arbitrarily can be
especially exploited in a streaming scenario, if a delay of several especially exploited in a streaming scenario, if a delay of
hundred milliseconds is acceptable. It is then possible to fill several hundred milliseconds is acceptable. It is then possible
several video frames into a single TB of desired size, e.g. a group to fill several video frames into a single TB of desired size,
of pictures consisting of I-frame, P-frames and B-frames. The e.g. a group of pictures consisting of I-frame, P-frames and B-
redundancy scheme can thus be selected in such a way as to guarantee frames. The redundancy scheme can thus be selected in such a way
the following property: In case of packet loss, the streams for P- as to guarantee the following property: In case of packet loss,
frames are only recoverable, if the I-frame, on which the decoding of the streams for P-frames are only recoverable, if the I-frame, on
P-frames depends, is recoverable. The same is true for B-frames, which the decoding of P-frames depends, is recoverable. The same
which can only be decoded if the respective P-frames are recoverable. is true for B-frames, which can only be decoded if the respective
This prevents situations in which, for example, the B-frames have P-frames are recoverable. This prevents situations in which, for
been received correctly, but the P-frames have been lost, i.e. example, the B-frames have been received correctly, but the P-
assures a gradual decrease in application quality also on the frame frames have been lost, i.e. assures a gradual decrease in
level. Of course, a similar encoding is possible with ULP. But in application quality also on the frame level. Of course, a similar
this case one might have to send several frames within one packet encoding is possible with ULP. But in this case one might have to
which leads to large packet sizes. send several frames within one packet which leads to large packet
sizes.
Furthmore, decoding delay is also a crucial issue in communications. Furthermore, decoding delay is also a crucial issue in
Again, both approaches have different delay properties: UXP communications. Again, both approaches have different delay
introduces a decoding delay because a reasonable amount of correctly properties: UXP introduces a decoding delay because a reasonable
received packets are necessary to start decoding of a TB. The delay amount of correctly received packets are necessary to start
in general depends on the dimensions of the interleaver. This should decoding of a TB. The delay in general depends on the dimensions
be considered for any system design which includes UXP. of the interleaver. This should be considered for any system
With ULP, every correctly received media packet can be decoded right design which includes UXP.
away. However, a significant delay is introduced, if packets are With ULP, every correctly received media packet can be decoded
corrupted, because in this case one has to wait for several right away. However, a significant delay is introduced, if
redundancy packets. Thus, the delay is in general dependent on the packets are corrupted, because in this case one has to wait for
actual ULP-FEC-packet scheme and cannot be considered in advance several redundancy packets. Thus, the delay is in general
during the system design phase. dependent on the actual ULP-FEC-packet scheme and cannot be
considered in advance during the system design phase.
Finally, we want to point out that UXP uses RS-codes which are
known to be the most efficient type of block codes in terms of
erasure correction capability.
10. Intellectual Property Considerations 10. Intellectual Property Considerations
Siemens AG has filed patent applications that might possibly have Siemens AG has filed patent applications that might possibly have
technical relations to this contribution. technical relations to this contribution.
On IPR related issues, Siemens AG refers to the Siemens Statement
on Patent Licensing, see
http://www.ietf.org/ietf/IPR/SIEMENS-General.
11. References 11. References
[1] J. Rosenberg and H. Schulzrinne, "An RTP Payload Format for [1] J. Rosenberg and H. Schulzrinne, "An RTP Payload Format for
Generic Forward Error Correction", Request for Comments 2733, Generic Forward Error Correction", Request for Comments 2733,
Internet Engineering Task Force, Dec. 1999. Internet Engineering Task Force, Dec. 1999.
[2] A. Albanese, J. Bloemer, J. Edmonds, M. Luby, and M. Sudan, [2] A. Albanese, J. Bloemer, J. Edmonds, M. Luby, and M. Sudan,
"Priority encoding transmission", IEEE Trans. Inform. Theory, vol. "Priority encoding transmission", IEEE Trans. Inform. Theory,
42, no. 6, pp. 1737-1744, Nov. 1996. vol. 42, no. 6, pp. 1737-1744, Nov. 1996.
[3] Shu Lin and Daniel J. Costello, Error Control Coding: [3] Shu Lin and Daniel J. Costello, Error Control Coding:
Fundamentals and Applications, Prentice-Hall, Inc., Englewood Fundamentals and Applications, Prentice-Hall, Inc., Englewood
Cliffs, N.J., 1983. Cliffs, N.J., 1983.
[4] W. Li: "Streaming video profile in MPEG-4", IEEE Trans. on
Circuits and Systems for Video Technology, vol. 11, no. 3,
pp. 301-317, Mar. 2001.
[4] W. Li: "Fine Granularity Scalability Using Bit-Plane Coding of
DCT Coefficients", ISO/IEC JTC1/SC29/WG11, Doc. MPEG98/M4204, Dec.
1998.
[5] G. Blaettermann, G. Heising, and D. Marpe: "A Quality
Scalable Mode for H.26L", ITU-T SG16, Q.15, Q15-J24, Osaka, May
2000.
[6] F. Burkert, T. Stockhammer, and J. Pandel, "Progressive A/V [6] F. Burkert, T. Stockhammer, and J. Pandel, "Progressive A/V
coding for lossy packet networks - a principle approach", Tech. coding for lossy packet networks - a principle approach", Tech.
Rep., ITU-T SG16, Q.15, Q15-I36, Red Bank, N.J., Oct. 1999. Rep., ITU-T SG16, Q.15, Q15-I36, Red Bank, N.J., Oct. 1999.
[7] Guenther Liebl, "Modeling, theoretical analysis, and coding
for wireless packet erasure channels", Diploma Thesis, Inst. for
Communications Engineering, Munich University of Technology,
1999.
[8] U. Horn, K. Stuhlmuller, M. Link, and B. Girod, "Robust
Internet video transmission based on scalable coding and unequal
error protection", Image Com., vol. 15, no. 1-2, pp. 77-94, Sep.
1999.
[9] S. Wenger, "H.26L over IP: The IP-Network Adaptation Layer",
Packet Video 2002, Pittsburgh, Pennsylvania, USA, April 24-26,
2002.
12. Acknowledgments 12. Acknowledgments
Many thanks to Thomas Stockhammer, who initially came up with the Many thanks to Philippe Gentric and Stephen Casner for helpful
idea of unequal erasure protection to improve progressive video comments and improvements.
transmission over lossy networks.
13. Author's Addresses 13. Author's Addresses
Guenther Liebl, Thomas Stockhammer Guenther Liebl, Thomas Stockhammer
Institute for Communications Engineering (LNT) Institute for Communications Engineering (LNT)
Munich University of Technology Munich University of Technology
D-80290 Munich D-80290 Munich
Germany Germany
Email: {liebl,tom}@lnt.e-technik.tu-muenchen.de Email: {liebl,tom}@lnt.e-technik.tu-muenchen.de
Minh-Ha Nguyen, Frank Burkert Minh-Ha Nguyen, Frank Burkert
Siemens AG - ICM D MP RD MCH 83/81 Siemens AG - ICM D MP RD MCH 83/81
D-81675 Munich D-81675 Munich
Germany Germany
Email: {minhha.nguyen,frank.burkert}@mch.siemens.de Email: {minhha.nguyen,frank.burkert}@mch.siemens.de
Marcel Wagner, Juergen Pandel, Wenrong Weng, Gero Baese Marcel Wagner, Juergen Pandel, Wenrong Weng, Gero Baese
Siemens AG - Corporate Technology CT IC 2 Siemens AG - Corporate Technology CT IC 2
D-81730 Munich D-81730 Munich
Germany Germany
Email: Email:
{marcel.wagner,juergen.pandel,wenrong.weng,gero.baese}@mchp.siemens.de {marcel.wagner,juergen.pandel,wenrong.weng,gero.baese}@mchp.siemens.de
Full Copyright Statement Full Copyright Statement
"Copyright (C) The Internet Society (date). All Rights Reserved. "Copyright (C) The Internet Society (date). All Rights Reserved.
This document and translations of it may be copied and furnished to This document and translations of it may be copied and furnished
others, and derivative works that comment on or otherwise explain it to others, and derivative works that comment on or otherwise
or assist in its implementation may be prepared, copied, published explain it or assist in its implementation may be prepared,
and distributed, in whole or in part, without restriction of any copied, published and distributed, in whole or in part, without
kind, provided that the above copyright notice and this paragraph restriction of any kind, provided that the above copyright notice
are included on all such copies and derivative works. However, this and this paragraph are included on all such copies and derivative
document itself may not be modified in any way, such as by removing works. However, this document itself may not be modified in any
the copyright notice or references to the Internet Society or other way, such as by removing the copyright notice or references to
Internet organizations, except as needed for the purpose of the Internet Society or other Internet organizations, except as
developing Internet standards in which case the procedures for needed for the purpose of developing Internet standards in which
copyrights defined in the Internet Standards process must be case the procedures for copyrights defined in the Internet
followed, or as required to translate it into languages other than Standards process must be followed, or as required to translate
English. it into languages other than English.
The limited permissions granted above are perpetual and will not be The limited permissions granted above are perpetual and will not
revoked by the Internet Society or its successors or assigns. be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an This document and the information contained herein is provided on
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
TASK FORCE DISCLAIMS ALL WARRANTIES; EXPRESS OR IMPLIED; INCLUDING ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES; EXPRESS OR
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF INFORMATION HEREIN IMPLIED; INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE
WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF OF INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.
 End of changes. 
