draft-ietf-avt-rtp-ipmr-12.txt   draft-ietf-avt-rtp-ipmr-13.txt 
Audio/Video Transport Working Group S. Ikonin Audio/Video Transport Working Group S. Ikonin
Internet Draft SPIRIT DSP Internet Draft SPIRIT DSP
Intended status: Proposed Standard February 04, 2010 Intended status: Proposed Standard September 20, 2010
RTP Payload Format for IP-MR Speech Codec draft-ietf-avt-rtp-ipmr-12.txt RTP Payload Format for IP-MR Speech Codec
draft-ietf-avt-rtp-ipmr-13.txt
Status of this Memo Abstract
This Internet-Draft is submitted to IETF in full conformance with the This document specifies the payload format for packetization of
provisions of BCP 78 and BCP 79. SPIRIT IP-MR encoded speech signals into the real-time transport
protocol (RTP). The payload format supports transmission of multiple
frames per packet and introduced redundancy for robustness against
packet loss and bit errors.
Copyright (c) 2010 IETF Trust and the persons identified as the document Status of this Memo
authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions This Internet-Draft is submitted to IETF in full conformance with the
Relating to IETF Documents (http://trustee.ietf.org/license-info) provisions of BCP 78 and BCP 79.
in effect on the date of publication of this document. Please
review these documents carefully, as they describe your rights and
restrictions with respect to this document. Code Components
extracted from this document must include Simplified BSD License
text as described in Section 4.e of the Trust Legal Provisions and
are provided without warranty as described in the Simplified BSD
License.
The source codes included in this document are provided under BSD Internet-Drafts are working documents of the Internet Engineering
license (http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf). Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are working documents of the Internet Engineering Task Internet-Drafts are draft documents valid for a maximum of six months
Force (IETF), its areas, and its working groups. Note that other groups and may be updated, replaced, or obsoleted by other documents at any
may also distribute working documents as Internet-Drafts. time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
Internet-Drafts are draft documents valid for a maximum of six months The list of current Internet-Drafts can be accessed at
and may be updated, replaced, or obsoleted by other documents at any http://www.ietf.org/1id-abstracts.html
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/1id-abstracts.html http://www.ietf.org/shadow.html
The list of Internet-Draft Shadow Directories can be accessed at This Internet-Draft will expire on December 18, 2010.
http://www.ietf.org/shadow.html
This Internet-Draft will expire on June 04, 2010. Copyright Notice
Abstract Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document specifies the payload format for packetization of SPIRIT This document is subject to BCP 78 and the IETF Trust's Legal
IP-MR encoded speech signals into the Real-time Transport Protocol Provisions Relating to IETF Documents
(RTP). The payload format supports transmission of multiple frames per (http://trustee.ietf.org/license-info) in effect on the date of
payload and introduced redundancy for robustness against packet loss. publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
The source codes included in this document are provided under BSD
license (http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf).
Table of Contents Table of Contents
1. Introduction 1. Introduction
2. IP-MR Codec Description 2. IP-MR Codec Description
3. Payload Format 3. Payload Format
3.1. RTP Header Usage 3.1. RTP Header Usage
3.2. Payload Format Structure 3.2. RTP Payload Structure
3.3. Payload Header 3.3. Speech Payload Header
3.4. Speech Table of Contents 3.4. Speech Payload Table of Contents
3.5. Speech Data 3.5. Speech Payload Data
3.6. Redundancy Header 3.6. Redundancy Payload Header
3.7. Redundancy Table of Contents 3.7. Redundancy Payload Table of Contents
3.8. Redundancy Data 3.8. Redundancy Payload Data
4. Payload Examples 4. Payload Examples
4.1. Payload Carrying a Single Frame 4.1. Payload Carrying a Single Frame
4.2. Payload Carrying Multiple Frames with Redundancy 4.2. Payload Carrying Multiple Frames with Redundancy
5. Media Type Registration 5. Congestion Control
5.1. Registration of media subtype audio/ip-mr_v2.5 6. Security Considerations
5.2. Mapping Media Type Parameters into SDP 7. Payload Format Parameters
6. Security Considerations 7.1. Media Type Registration
7. Congestion Control 7.2. Mapping Media Type Parameters into SDP
8. IANA Considerations 8. IANA Considerations
9. Normative References 9. Normative References
10. Author(s) Information 10. Disclaimer
11. Disclaimer 11. Legal Terms
12. Legal Terms 12. Authors' Addresses
APPENDIX A. RETRIEVING FRAME INFORMATION APPENDIX A. RETRIEVING FRAME INFORMATION
A.1. get_frame_info.c A.1. get_frame_info.c
Authors' Addresses
1. Introduction 1. Introduction
This document specifies the payload format for packetization of SPIRIT This document specifies the payload format for packetization of
IP-MR encoded speech signals into the Real-time Transport Protocol SPIRIT IP-MR encoded speech signals into the real-time transport
(RTP). The payload format supports transmission of multiple frames per protocol (RTP). The payload format supports transmission of multiple
payload and introduced redundancy for robustness against packet loss. frames per packet and introduced redundancy for robustness against
packet loss and bit errors.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC 2119]. document are to be interpreted as described in RFC 2119 [RFC 2119].
2. IP-MR Codec Description 2. IP-MR Codec Description
The IP-MR codec is a scalable adaptive multi-rate wideband speech codec IP-MR is a wideband speech codec designed by SPIRIT for conferencing
designed by SPIRIT for use in IP-based networks. This codec is suitable services over packet-switched networks such as the Internet.
for real-time communications such as telephony and videoconferencing.
The codec operates on 20 ms frames at 16 kHz sampling rate and has an
algorithmic delay of 25ms.
IP-MR supports six wideband speech coding modes with respective bit
rates ranging from about 7.7 to about 34.2 kbps. The coding mode can be
changed at any 20 ms frame boundary, making it possible to dynamically
adjust the speech encoding rate during a session to adapt to varying
transmission conditions.
A coded frame consists of multiple coding layers: a base (or core)
layer and several enhancement layers, which are coded independently.
Only the core layer is mandatory to decode understandable speech; the
upper layers provide quality enhancement. The enhancement layers
may be omitted, and the remaining base layer can still be meaningfully
decoded without artifacts. This makes the bit stream scalable and allows
the bit rate to be reduced during transmission without re-encoding.
This memo specifies an optional form of redundancy coding within RTP
for protection against packet loss. It is based on the commonly known
scheme in which previously transmitted frames are aggregated together
with new ones. Each frame is retransmitted once in the following
RTP payload packet. Below, f(n-2)...f(n+4) denotes a sequence of speech
frames, and p(n-1)...p(n+4) is a sequence of payload packets:
--+--------+--------+--------+--------+--------+--------+--------+--
| f(n-2) | f(n-1) | f(n) | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
--+--------+--------+--------+--------+--------+--------+--------+--
<---- p(n-1) ----> IP-MR is a scalable codec. This means that not only can the source
<----- p(n) -----> change the transmission rate on the fly, but a gateway is also
<---- p(n+1) ----> able to decrease bandwidth at any time without performance overhead.
<---- p(n+2) ----> There are 6 coding rates available, from 7.7 to 34.2 kbps.
<---- p(n+3) ---->
<---- p(n+4) ---->
However, because of the scalable nature of the IP-MR codec, there is no need to The codec operates on a frame-by-frame basis with a frame size of 20 ms
duplicate the whole previous frame - only the core layer may be at a 16 kHz sampling rate, with a total end-to-end delay of 25 ms. Each
retransmitted. This reduces redundancy overhead while keeping compressed frame is represented as a sequence of layers. The first
efficiency. Moreover, the speech bits encoded in the core layer are divided (base) layer is mandatory, while the other (enhancement) layers can be safely
into six classes (from A to F) of perceptual sensitivity to errors. Using discarded. Information about a particular frame's structure is available
these classes as the introduced redundancy makes it possible to adjust the trade-off from the payload header. In order to adjust outgoing bandwidth, the
between overhead and robustness against packet loss. gateway MUST read the frame structure from the payload header, decide
which enhancement layers to discard, and compose a new RTP packet
according to this specification.
The mechanism described does not require signaling at session Not all bits within a frame are equally tolerant to
setup. The sender is responsible for selecting an appropriate amount of distortion. IP-MR defines 6 classes ('A'-'F') of sensitivity to bit
redundancy based on feedback about the channel conditions. errors. Any damage to class 'A' bits causes significant reconstruction
artifacts, while a loss in class 'F' may not even be perceived by
the listener. Note that only the base layer in a bitstream is represented as
a set of classes.
The main codec characteristics can be summarized as follows: The IP-MR payload format allows frames to be duplicated across packets
to improve robustness against packet loss (Section 3.6). The base layer
can be retransmitted completely or as several sensitivity classes.
Enhancement layers cannot be retransmitted.
o Wideband, 16 kHz, speech codec The fine-grained redundancy, in conjunction with bitrate scalability,
allows the application to adjust the trade-off between overhead and
robustness against packet loss. Note that this approach is supported
natively within a packet and requires no out-of-band signals or
session initialization procedures.
o Adaptive multi rate with six modes from about 7.7 to 34.2 kbps The main IP-MR features are as follows:
o Bit rate scalable o High quality wideband speech codec.
o Variable bit rate changing in accordance with actual speech o Bitrate scalable with 6 average rates from 7.7 to 34.2 kbps.
content
o Discontinuous Transmission (DTX), silence suppression and o Built-in discontinuous transmission (DTX) and comfort noise
comfort noise generation generation (CNG) support.
o In-band redundancy scheme for protection against packet loss o Flexible in-band redundancy control scheme for packet loss
protection.
3. Payload Format 3. Payload Format
The main purpose of the payload design for IP-MR is to maximize the The payload format consists of the RTP header and the IP-MR payload.
potential of the codec with as little overhead as possible. The payload
format allows changing parameters of the codec (such as bit rate,
level of scalability, DTX and redundancy mode) without re-negotiation
at any packet boundary. This makes it possible to dynamically adjust streaming
parameters in accordance with changing network conditions. The payload
format also supports aggregation of multiple consecutive frames
(up to 4) in a payload. This allows control of the trade-off between
delay and header overhead.
3.1. RTP Header Usage 3.1. RTP Header Usage
The RTP timestamp corresponds to the sampling instant of the first The format of the RTP header is specified in RFC 3550. This payload
sample encoded for the first frame-block in the packet. The timestamp format uses the fields of the header in a manner consistent with that
clock frequency SHALL be 16 kHz. The duration of one frame is 20 ms, specification.
corresponding to 320 samples at 16 kHz. Thus the timestamp is increased
by 320 for each consecutive frame. The timestamp is also used to recover
the correct decoding order of the frame-blocks.
The RTP header marker bit (M) SHALL be set to 1 whenever the first The RTP timestamp corresponds to the sampling instant of the first
frame-block carried in the packet is the first frame-block in a sample encoded for the first frame-block in the packet. The timestamp
talkspurt (see definition of the talkspurt in Section 4.1 [RFC 3551]). clock frequency SHALL be 16 kHz. The duration of one frame is 20 ms,
For all other packets, the marker bit SHALL be set to zero (M=0). corresponding to 320 samples per frame. Thus the timestamp is
increased by 320 for each consecutive frame. The timestamp is also
used to recover the correct decoding order of the frame-blocks.
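The timestamp arithmetic above can be sketched in C as follows. This is a minimal illustration, not part of the draft: the function name ipmr_next_timestamp and the macro IPMR_SAMPLES_PER_FRAME are hypothetical, and the sketch assumes back-to-back packets with no transmission gap between them (for example, no silence period skipped in DTX mode).

```c
#include <assert.h>
#include <stdint.h>

/* 20 ms at the 16 kHz RTP clock gives 320 samples per frame (Section 3.1). */
#define IPMR_SAMPLES_PER_FRAME 320u

/*
 * Hypothetical helper: timestamp of the packet that immediately follows
 * a packet stamped `ts` and carrying `frames_in_packet` frames (1..4).
 * Unsigned 32-bit arithmetic wraps modulo 2^32, as RTP timestamps do.
 */
uint32_t ipmr_next_timestamp(uint32_t ts, unsigned frames_in_packet)
{
    assert(frames_in_packet >= 1 && frames_in_packet <= 4);
    return ts + IPMR_SAMPLES_PER_FRAME * frames_in_packet;
}
```

A packet carrying one frame advances the timestamp by 320; a packet carrying the maximum of four frames advances it by 1280.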
The assignment of an RTP payload type for the format defined in this The RTP header marker bit (M) SHALL be set to 1 whenever the first
memo is outside the scope of this document. The RTP profiles in use frame-block carried in the packet is the first frame-block in a
currently mandate binding the payload type dynamically for this payload talkspurt (see definition of the talkspurt in Section 4.1 [RFC
format. This is basically necessary because the payload type expresses 3551]). For all other packets, the marker bit SHALL be set to zero
the configuration of the payload itself, i.e. basic or interleaved mode, (M=0).
and the number of channels carried.
The remaining RTP header fields are used as specified in [RFC 3550]. The assignment of an RTP payload type for the format defined in this
memo is outside the scope of this document. The RTP profiles in use
currently mandate binding the payload type dynamically for this
payload format. This is basically necessary because the payload type
expresses the configuration of the payload itself, i.e. basic or
interleaved mode, and the number of channels carried.
3.2. Payload Format Structure The remaining RTP header fields are used as specified in [RFC 3550].
The IP-MR payload format consists of a payload header with general 3.2. RTP Payload Structure
information about packet, a speech table of contents (TOC), and speech
data. An optional redundancy section follows after speech data. The
redundancy section consists of redundancy header, redundancy TOC and
redundancy data payload.
The following diagram shows the standard payload format layout: The IP-MR payload is composed of two payloads, one for the current
speech and one for redundancy. Both payloads are represented in the
form: Header, Table of Contents (TOC), and Data. The redundancy payload
carries data for the preceding and pre-preceding packets.
+---------+--------+--------+- - - - - - +- - - - - - +- - - - - - + +--------+-----+----------------------+- - - - +- - +- - - - - +
| payload | speech | speech | redundancy | redundancy | redundancy | | Header | TOC | Data | Header | TOC | Data |
| header | TOC | data | header | TOC | data | +--------+-----+----------------------+- - - - +- - +- - - - - +
+---------+--------+--------+- - - - - - +- - - - - - +- - - - - - + |<- Speech -------------------------->|<- Redundancy (opt) ---->|
3.3. Payload Header 3.3. Speech Payload Header
The payload header has the following format: This header carries parameters which are common for all frames in the
packet:
0 1 0 1
0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+
|T| CR | BR |D|A|GR |R| |T| CR | BR |D|A|GR |R|
+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+
o T (1 bit): Reserved for compatibility with future extensions. MUST o T (1 bit): Reserved. MUST always be set to 0. The receiver SHOULD
be set to 0. discard the packet if the 'T' bit is not equal to 0.
o CR (3 bits): coding rate of frame(s) in this packet, as per the o CR (3 bits): Coding rate index - top enhancement layer
following table: available. The CR value 7 (NO_DATA) indicates that there is no
speech data (and speech TOC accordingly) in the payload. This MAY
be used to transmit redundancy data only.
+-------+--------------+ o BR (3 bits): Base rate index - base layer bitrate. Speech
| CR | avg. bitrate | payload can be scaled to any rate index between BR and CR. Packets
+-------+--------------+ with BR = 6 or BR > CR MUST be discarded. Redundancy data is also
| 0 | 7.7 kbps | considered as having a base rate of BR.
| 1 | 9.8 kbps |
| 2 | 14.3 kbps |
| 3 | 20.8 kbps |
| 4 | 27.9 kbps |
| 5 | 34.2 kbps |
| 6 | (reserved) |
| 7 | NO_DATA |
+-------+--------------+
The CR value 7 (NO_DATA) indicates that there is no speech data (and o D (1 bit): Reserved. MUST be always set to 1. Receiver MAY
speech TOC accordingly) in the payload. This MAY be used to transmit discard packet if 'D' bit is zero.
redundancy data only. The value 6 is reserved. If this value is received,
the packet MUST be discarded.
o BR (3 bits): base rate for core layer of frame(s) in this packet o A (1 bit): Byte alignment. The value of 1 specifies that padding
using the table for CR. The base rate is the lowest rate for bits were added so that each compressed frame (3.5) starts on
scalability, so the speech payload cannot be scaled down below the BR a byte (8-bit) boundary. The value of 0 specifies unaligned
value. Packets with BR = 6 or BR > CR MUST be discarded. frames. Note that the speech payload is always padded to a byte boundary
regardless of the 'A' bit value.
o D (1 bit): reserved. Must be always set to 1. o GR (2 bits): Number of frames in packet (grouping size). Actual
Previously, this bit indicated DTX mode availability, but in fact grouping size is GR + 1, thus maximum grouping supported is 4.
payload duplicates this information.
o A (1 bit): reserved. Must be always set to 1. o R (1 bit): Redundancy presence. Value of 1 indicates redundancy
Previously, this bit indicated aligned mode, but this mode has payload presence.
never been used and was always set to 1.
o GR (2 bits): number of frames in packet (grouping size). Actual Note that the values of the 'T' and 'D' bits are fixed; any other values are
grouping size is GR + 1, thus maximum grouping supported is 4. not allowed by this specification. The value of the padding bits is not
specified.
o R (1 bit): redundancy presence bit. If R=1 then the packet The following table defines mapping between rate index and rate
contains redundancy information for lost packets recovery. value:
In this case after speech data the redundancy section is present.
3.4. Speech Table of Contents +------------+--------------+
| rate index | avg. bitrate |
+------------+--------------+
| 0 | 7.7 kbps |
| 1 | 9.8 kbps |
| 2 | 14.3 kbps |
| 3 | 20.8 kbps |
| 4 | 27.9 kbps |
| 5 | 34.2 kbps |
| 6 | (reserved) |
| 7 | NO_DATA |
+------------+--------------+
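As an illustration of the header layout and the discard rules in this section, the following C sketch unpacks the 12 header bits from the first two payload octets. The names ipmr_header, ipmr_parse_header and ipmr_header_ok are hypothetical and do not come from the draft. The sketch assumes the usual RFC bit ordering, in which bit 0 of the diagram is the most significant bit of the first octet, and it treats the SHOULD-discard rule for the 'T' bit as a hard discard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the 12-bit speech payload header (hypothetical struct). */
struct ipmr_header {
    unsigned t, cr, br, d, a, gr, r;
};

/* Unpack the header; diagram bit 0 is assumed to be the MSB of buf[0]. */
struct ipmr_header ipmr_parse_header(const uint8_t *buf)
{
    assert(buf != NULL);
    /* The 12 header bits: all of buf[0] plus the top nibble of buf[1]. */
    unsigned h = ((unsigned)buf[0] << 4) | ((unsigned)buf[1] >> 4);
    struct ipmr_header hdr;
    hdr.t  = (h >> 11) & 0x1;  /* reserved, expected 0        */
    hdr.cr = (h >> 8)  & 0x7;  /* coding rate index           */
    hdr.br = (h >> 5)  & 0x7;  /* base rate index             */
    hdr.d  = (h >> 4)  & 0x1;  /* reserved, expected 1        */
    hdr.a  = (h >> 3)  & 0x1;  /* byte-alignment flag         */
    hdr.gr = (h >> 1)  & 0x3;  /* grouping size minus one     */
    hdr.r  = h & 0x1;          /* redundancy payload present  */
    return hdr;
}

/* Returns 1 if the packet may be processed, 0 if it is to be discarded. */
int ipmr_header_ok(const struct ipmr_header *hdr)
{
    if (hdr->t != 0) return 0;                   /* SHOULD discard      */
    if (hdr->cr == 6 || hdr->br == 6) return 0;  /* reserved rate index */
    if (hdr->br > hdr->cr) return 0;             /* BR > CR is invalid  */
    return 1;
}
```

For example, the octets 0x11 0x08 decode to T=0, CR=1, BR=0, D=1, A=0, GR=0, R=0, which passes the checks. The optional discard on D=0 is left to the caller.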
The speech TOC contains entries for each frame in packet (grouping size The value of 6 is reserved. If this value is received, the packet MUST
in total). Each entry contains a single field: be discarded.
0 3.4. Speech Payload Table of Contents
+-+
|E|
+-+
o E (1 bit): frame existence indicator. If set to 0, this indicates The speech TOC is a bit mask indicating the presence of each frame in
the corresponding frame is absent and the receiver should set the packet. TOC is only available if 'CR' value is not equal to 7
special LOST_FRAME flag for decoder. This can be followed by the (NO_DATA).
lost frame itself or by empty frames generated by the encoder
during silence intervals in DTX mode.
Note that if CR flag from payload header is 7 (NO_DATA) then speech TOC 0 1 2 3
is empty. +-+-+-+-+
|E|E|E|E|
+-+-+-+-+
|<----->| <-- #(GR+1)
3.5. Speech Data o E (1 bit): Frame existence indicator. The value of 0 indicates
that speech data is not present for the corresponding frame. The IP-MR
encoder sets the E flag to 0 for periods of silence in DTX mode.
The application MUST set this bit to 0 if the frame is known to be
damaged.
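Reading the speech TOC can be sketched as below. The helper names ipmr_get_bits and ipmr_read_speech_toc are hypothetical. The sketch assumes that the TOC starts at bit offset 12, immediately after the speech payload header, and that bits are read most significant bit first; the single-frame example in Section 4.1 is consistent with this, but the offset is an interpretation rather than quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read n bits (1..16) starting at bit offset pos, MSB first. */
unsigned ipmr_get_bits(const uint8_t *buf, size_t pos, unsigned n)
{
    assert(buf != NULL && n >= 1 && n <= 16);
    unsigned v = 0;
    for (unsigned i = 0; i < n; i++, pos++)
        v = (v << 1) | ((buf[pos >> 3] >> (7 - (pos & 7))) & 1u);
    return v;
}

/*
 * Fill e[0..gr] with the E flags of the speech TOC and return the
 * number of TOC bits consumed. When CR is 7 (NO_DATA) there is no
 * speech TOC, so nothing is read and 0 is returned.
 */
unsigned ipmr_read_speech_toc(const uint8_t *buf, unsigned cr,
                              unsigned gr, unsigned e[4])
{
    assert(gr <= 3);
    if (cr == 7)
        return 0;
    for (unsigned i = 0; i <= gr; i++)
        e[i] = ipmr_get_bits(buf, 12 + i, 1);  /* assumed TOC offset */
    return gr + 1;
}
```

A frame whose E flag is 0 contributes no bits to the speech data that follows.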
Speech data of a payload contains one or more speech frames or comfort 3.5. Speech Payload Data
noise frames, as specified in the speech TOC of the payload.
Each speech frame represents 20 ms of speech encoded with the rate Speech data contains (GR+1) compressed IP-MR frames (20 ms of data each).
indicated in the CR field and the base rate indicated in the BR field of the payload A compressed frame has zero length if the corresponding TOC flag is zero.
header.
The size of a coded speech frame is variable due to the nature of the codec. The beginning of each compressed frame is aligned if the 'A' bit is
The encoder's algorithm decides the size of each frame and returns nonzero, while the end of the speech payload is always aligned to a byte
it after encoding. In order to save bandwidth, the size is not placed (8-bit) boundary:
into the payload explicitly. The frame size can be determined from the frame's
content using a special service function specified in Appendix A.
This function provides complete information about the coded frame, including
its size, the number of layers, the size of each layer, and the sizes of the perceptual
sensitivity classes.
3.6. Redundancy Header +- - -+------------+------------+------------+------------+
| TOC | Frame1 | Frame2 | Frame3 | Frame4 |
+- - -+------------+------------+------------+------------+ ALWAYS
|<- aligned |<- aligned |<- aligned |<- aligned |<- ALIGNED
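The alignment rule for the speech payload can be worked through with a short C sketch. The function names ipmr_align8 and ipmr_speech_payload_bits are hypothetical. The sketch assumes that a speech TOC is present (CR is not 7), that the header and TOC occupy 12 + (GR+1) bits, and that with A=1 padding is inserted wherever the next frame would otherwise start off a byte boundary. The frame sizes in bits are taken as inputs; in practice they come from the helper routine in Appendix A.

```c
#include <assert.h>
#include <stddef.h>

/* Round a bit count up to the next multiple of 8. */
size_t ipmr_align8(size_t bits)
{
    return (bits + 7u) & ~(size_t)7u;
}

/*
 * Total size in bits of the speech payload (header, TOC, frames and
 * padding). frame_bits[i] is the size of frame i, 0 for absent frames.
 */
size_t ipmr_speech_payload_bits(unsigned gr, unsigned a,
                                const size_t frame_bits[])
{
    assert(gr <= 3 && frame_bits != NULL);
    size_t pos = 12 + (gr + 1);      /* header plus TOC            */
    if (a)
        pos = ipmr_align8(pos);      /* first frame starts aligned */
    for (unsigned i = 0; i <= gr; i++) {
        pos += frame_bits[i];
        if (a)
            pos = ipmr_align8(pos);  /* next frame starts aligned  */
    }
    return ipmr_align8(pos);         /* end is always byte aligned */
}
```

For a single unaligned frame of 194 bits, this gives 12 + 1 + 194 = 207 bits, padded to 208 bits (26 octets), which matches the single padding bit shown in the example of Section 4.1.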
If a packet contains redundancy (R field of payload header is 1) the Marked regions MUST be aligned (padded) only if 'A' bit is set to '1'.
speech data is followed by redundancy header:
0 1 2 3 4 5 The compressed frame structure is the following:
+-+-+-+-+-+-+
| CL1 | CL2 |
+-+-+-+-+-+-+
The redundancy header consists of two fields. Each field contains a class |<---- sensitive classes ------>|<----- enhancement layers --------->|
specifier for the amount of redundancy taken from the preceding +-------------------------------+----+-----+------+- - - - - +-------+
packet (CL1) and the pre-preceding packet (CL2), i.e. distant from the | L1 (Base Layer) | L2 | L3 | L4 | | LN |
current packet by 1 and 2 packets respectively. The values are listed +-------------------------------+----+-----+------+- - - - - +-------+
in the table below: |<- A --->|<- B ->| ... |<- F ->| |
|<- BR rate ------------------->| |
|<- CR rate -------------------------------------------------------->|
+-------+-------------------+ Appendix A of this document provides a helper routine written in C,
| CL | amount redundancy | which MUST be used to extract the sensitivity class and enhancement
+-------+-------------------+ layer bounds from the compressed frame data.
| 0 | NONE |
| 1 | CLASS A |
| 2 | CLASS B |
| 3 | CLASS C |
| 4 | CLASS D |
| 5 | CLASS E |
| 6 | CLASS F |
| 7 | (reserved) |
+-------+-------------------+
Each specifier takes 3 bits, thus the total redundancy header size is 6 3.6. Redundancy Payload Header
bits.
These classes indicate the subjective importance of bits from the core layer. The presence of the redundancy payload is signaled by the R bit of the speech
Class A contains the bits most sensitive to errors; loss of these payload header. The redundancy header is composed of two fields of 3 bits
bits results in a corrupted speech frame, which should not be decoded each:
without applying a packet loss concealment (PLC) procedure. Class B is
less sensitive than class A, and so on down to F. Together, the bit classes
from A to F compose the core layer.
Putting some part (classes of bits) of a previous frame into the current 0 1 2 3 4 5
packet makes it possible to partially decode the previous frame if +-+-+-+-+-+-+
it is lost. The more information is delivered, the less the speech quality | CL1 | CL2 |
degrades. The CL1 and CL2 fields specify how many classes from +-+-+-+-+-+-+
previous frames the current packet contains. For example, CL1=3 (class C) means
that the packet contains bits from classes A, B and C of the previous frame.
If CL1=6 (class F), then the whole core layer is included.
3.7. Redundancy Table of Contents The 'CL1' and 'CL2' fields specify the sensitivity classes
available for the preceding and pre-preceding packets, respectively.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-------+--------------------+
| Pkt1 Entries| Pkt2 Entries| | CL | Redundancy classes |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | available |
+-------+--------------------+
| 0 | NONE |
| 1 | A |
| 2 | A-B |
| 3 | A-C |
| 4 | A-D |
| 5 | A-E |
| 6 | A-F |
| 7 | (reserved) |
+-------+--------------------+
The redundancy TOC contains entries for redundancy frames from the preceding The receiver can reconstruct the base layer of preceding packets completely
and pre-preceding packets. Each entry takes 1 bit, like a speech TOC entry (CL=6) or partially (0<CL<6), based on the sensitivity classes delivered.
(3.4): The decoder MUST discard the redundancy payload if CL is equal to 0 or 7.
0 Note that the base rate index and grouping parameter are not
+-+ transmitted for the redundancy payload. The application MUST assume that 'BR'
|E| and 'GR' are the same as for the current packet.
+-+
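The redundancy header fields and the resulting redundancy TOC size can be sketched in C as follows. The names ipmr_red_header, ipmr_parse_red_header, ipmr_cl_usable and ipmr_red_toc_entries are hypothetical. The sketch takes the six header bits as an integer with CL1 in the upper three bits, and it assumes that a CL value of 0 or 7 contributes no TOC entries for the corresponding packet, which is one reading of the rules in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Class specifiers from the 6-bit redundancy header (hypothetical struct). */
struct ipmr_red_header {
    unsigned cl1;  /* classes available for the preceding packet     */
    unsigned cl2;  /* classes available for the pre-preceding packet */
};

/* Split the six header bits; CL1 is assumed to be transmitted first. */
struct ipmr_red_header ipmr_parse_red_header(unsigned six_bits)
{
    struct ipmr_red_header rh;
    assert(six_bits <= 0x3Fu);
    rh.cl1 = (six_bits >> 3) & 0x7;
    rh.cl2 = six_bits & 0x7;
    return rh;
}

/* CL values 1..6 carry data; 0 (NONE) and 7 (reserved) do not. */
int ipmr_cl_usable(unsigned cl)
{
    return cl >= 1 && cl <= 6;
}

/* Number of 1-bit redundancy TOC entries, using the current packet's GR. */
unsigned ipmr_red_toc_entries(const struct ipmr_red_header *rh, unsigned gr)
{
    unsigned n = 0;
    assert(rh != NULL && gr <= 3);
    if (ipmr_cl_usable(rh->cl1)) n += gr + 1;
    if (ipmr_cl_usable(rh->cl2)) n += gr + 1;
    return n;
}
```

With CL1=3 (A-C), CL2=2 (A-B) and GR=3, the redundancy TOC holds 8 entries, the maximum.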
o E (1 bit): frame existence indicator. If set to 0, this indicates 3.7. Redundancy Payload Table of Contents
the corresponding frame is absent.
o For each preceding and pre-preceding packet the number of entries The redundancy TOC is a bit mask indicating the presence of each
is equal to the grouping size of the current packet. Thus the maximum frame in the redundancy payload. The redundancy TOC is only present if the
number of entries is 4*2 = 8. 'CL' value is not equal to 0 or 7.
o If the class specifier in the redundancy header is CL=0 (NONE), 0 1 ...
then there are no entries for the corresponding packet's redundancy. +-+-+-+-+-+-+-+-+
|E|E|E|E|E|E|E|E|
+-+-+-+-+-+-+-+-+
| |<----->| pre-preceding payload #(GR+1)
|<----->| preceding payload #(GR+1)
3.8. Redundancy Data o E (1 bit): Redundancy frame existence indicator. The value of 0
indicates that redundancy data is not present for the corresponding frame.
Redundancy data of a payload contains redundancy information for one or 3.8. Redundancy Payload Data
more speech frames or comfort noise frames that may have been lost during
transmission, as specified in the redundancy TOC of the payload. The
redundancy is the most important part of the preceding frames, each representing
20 ms of speech. This data MAY be used for partial reconstruction of
lost frames. The amount of available redundancy is specified by the CL field
in the redundancy header (3.6). This field SHOULD be passed to the
decoder. The size of a redundancy frame is variable and can be obtained
using the service function specified in Appendix A.
4. Payload Examples IP-MR defines 6 classes ('A'-'F') of sensitivity to bit errors. Any
damage to class 'A' bits causes significant reconstruction artifacts,
while a loss in class 'F' may not even be perceived by the
listener. Note that only the base layer in a bitstream is represented as a
set of classes. Together, sensitivity classes and
redundancy allow IP-MR to duplicate frames across packets to
improve robustness against packet loss.
A few examples to highlight the payload format follow. Redundancy data carries a number of sensitivity classes for the preceding
and pre-preceding packets, as indicated by the 'CL1' and 'CL2' fields of the
redundancy header. The sensitivity class data is available
individually for each frame only if the corresponding 'E' bit of the
redundancy TOC is nonzero:
4.1. Payload Carrying a Single Frame +---+---+----+----|-----+-----+-----+-----+-----+-----+-----+
|A-C|A-B|1000|1001|cl_A1|cl_B1|cl_C1|cl_A1|cl_B1|cl_A4|cl_B4|
+---+---+----+----|-----+-----+-----+-----+-----+-----+-----+
|<- CL >|<- TOC ->|<- preceding --->|<- pre-preceding ----->|
The following diagram shows a standard IP-MR payload carrying a single Redundancy data is only available if the base (BR) and coding (CR) rates of
speech frame without redundancy: the preceding and pre-preceding packets are the same as for the current
packet.
0 1 2 3 The receiver MAY use redundancy data to compensate for packet loss; in this
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 case, the 'CL' field MUST also be passed to the decoder. The helper routine
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ provided in Appendix A MUST be used to extract the sensitivity class
|0|CR=1 |BR=0 |0|0|0 0|0|1|sp(0) | lengths for each frame. The following pseudocode describes the
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ sequence of operations:
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| sp(193)|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
In the payload the speech frame is not damaged at the IP origin (E=1), int sensitivityBits[numOfRedundancyFrames][6];
the coding rate is 9.8 kbps (CR=1), the base rate is 7.7 kbps (BR=0), and int redundancyBits [numOfRedundancyFrames];
the DTX mode is off. There is no byte alignment (A=0) and no redundancy for(i = 0 ; i < numOfRedundancyFrames; i++) {
(R=0). The encoded speech bits - s(0) to s(193) - are placed immediately GetFrameInfo(CR, BR, pRedundancyPayloadData, dummy,
after TOC. Finally, one zero bit is added at the end as padding to make sensitivityBits[i], dummy);
the payload byte aligned. redundancyBits[i] = 0;
for(j = 0; j < CL[i]; j++ ) {
redundancyBits[i] += sensitivityBits[i][j];
}
flushBits(pRedundancyPayloadData, redundancyBits[i]);
}
4.2. Payload Carrying Multiple Frames with Redundancy 4. Payload Examples
The following diagram shows a payload that contains three frames, one of This section provides detailed examples of IP-MR payload format.
them with no speech data. The coding rate is 7.7 kbps (CR=0), the base
rate is 7.7 kbps (BR=0), and the DTX mode is on. The speech frames are
byte aligned (A=1), so 1 zero bit is added at the end of the header.
Besides the speech frames the payload contains six redundancy frames
(three per each delayed packet).
The first speech frame consists of bits sp1(0) to sp1(92). After that 3 4.1. Payload Carrying a Single Frame
bits are added for byte alignment. The second frame does not contain any
speech information that is represented in the payload by its TOC entry.
The third frame consists of bits sp3(0) to sp3(171).
The redundancy header follows after speech data. The one-packet-delayed The following diagram shows a typical IP-MR payload carrying one
redundancy contains class A+B bits (CL1=2), and two-packet-delayed (GR=0) non-aligned (A=0) speech frame without redundancy (R=0). The
redundancy contains class A bits (CL2=1). The one-packet-delayed base layer is coded at 7.8 kbps (BR=0) while the coding rate is 9.7
redundancy contains three frames with 20, 39 and 35 bits respectively. kbps (CR=1). The 'E' bit value of 1 signals that the compressed frame
bits s(0) - s(193) are present. A padding bit 'P' is added to keep
the speech payload byte aligned.
The first frame of two-packet-delayed redundancy is absent, it is 0 1 2 3
represented in its TOC entry, and two other frames have sizes 15 and 19 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
bits. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|CR=1 |BR=0 |1|0|0 0|0|1|s(0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| s(193)|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
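The size of the payload in Section 4.1 can be checked with a few lines of C. This is only a sketch under assumptions read off the diagram: the fixed header occupies the 12 bits that precede the TOC, each frame contributes one TOC bit, and non-aligned frames are packed back to back with padding only at the very end. The names IPMR_HEADER_BITS, ipmr_unaligned_bits and ipmr_pad_bits are illustrative and are not part of the codec sources.

```c
#include <stddef.h>

/* Assumption: 12 header bits precede the TOC, as drawn in the diagram
   (1 + 3 for CR + 3 for BR + 1 + 1 for A + 2 for GR + 1 for R). */
enum { IPMR_HEADER_BITS = 12 };

/* Bits occupied by a non-aligned (A=0) payload without redundancy (R=0),
   before the final padding is added. */
size_t ipmr_unaligned_bits(const size_t *frame_bits, size_t num_frames)
{
    size_t bits = IPMR_HEADER_BITS + num_frames; /* one TOC 'E' bit per frame */
    size_t i;

    for (i = 0; i < num_frames; i++) {
        bits += frame_bits[i];                   /* frames are packed back to back */
    }
    return bits;
}

/* Number of zero bits needed to reach the next byte boundary. */
size_t ipmr_pad_bits(size_t bits)
{
    return (8 - bits % 8) % 8;
}
```

For the single 194-bit frame shown above this gives 12 + 1 + 194 = 207 bits, one padding bit and a 26-byte payload, which matches the diagram.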
Note that all speech frames are padded with zero bits for byte 4.2. Payload Carrying Multiple Frames with Redundancy
alignment.
0 1 2 3 The following diagram shows a payload carrying 3 (GR=2) aligned (A=1)
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 speech frames with redundancy (R=1). The TOC value of '101' indicates
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ that speech data is present for the first (bits sp1(0)-sp1(92)) and
|0|CR=0 |BR=0 |1|1|1 0|1|1 0 1|P|sp1(0) | third (bits sp3(0)-sp3(171)) frames. There are no enhancement layers
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ because the base and coding rates are equal (BR=CR=0). Padding bits
| | 'P' are inserted to maintain the necessary alignment.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| sp1(92)|P|P|P|sp3(0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| sp3(171)|P|P|P|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|CL1=2|CL2=1|1 1 1|0 1 1|red1_1(0) red1_1(19)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|red1_2(0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| red1_2(38)|red1_3(0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| red1_3(34)|red2_2(0) red2_2(14)|red2_3(0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| red2_3(18)|P|P|P|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
5. Media Type Registration Redundancy payload is present for both the preceding and pre-preceding
packets (CL1=A-B, CL2=A), but redundancy data is available for only
5 (TOC='111011') of the 6 (2*(GR+1)) frames. There are 20, 39 and 35
bits of redundancy data for the three frames of the preceding packet,
and 15 and 19 bits for two frames of the pre-preceding packet.
This section describes the media types and names associated with this 0 1 2 3
payload format. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|CR=0 |BR=0 |1|1|1 0|1|1 0 1|P|sp1(0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| sp1(92)|P|P|P|sp3(0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| sp3(171)|P|P|P|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|CL1=2|CL2=1|1 1 1|0 1 1|red1_1_AB(0) red1_1_AB(19)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|red1_2_AB(0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|red1_2_AB(38)|red1_3_AB(0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| red1_3_AB(34)|red2_2_A(0) red2_2_A(14)|red2_3_A(0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| red2_3_A(18)|P|P|P|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
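The bit counts in the Section 4.2 diagram can be reproduced with the sketch below. It assumes what the diagram shows: with A=1 the header plus TOC and every speech frame are each padded to a byte boundary, the redundancy header consists of two 3-bit CL fields followed by 2*(GR+1) TOC bits, and the redundancy data is packed back to back with padding only at the end. All function names are hypothetical.

```c
#include <stddef.h>

/* Round a bit count up to the next multiple of 8. */
size_t ipmr_align8(size_t bits)
{
    return (bits + 7) / 8 * 8;
}

/* Speech part of an aligned (A=1) payload. frame_bits[i] is 0 for a frame
   whose TOC bit is 0. Assumes a 12-bit header, as drawn in the diagram. */
size_t ipmr_aligned_speech_bits(const size_t *frame_bits, size_t num_frames)
{
    size_t bits = ipmr_align8(12 + num_frames);  /* header + TOC + padding */
    size_t i;

    for (i = 0; i < num_frames; i++) {
        bits += ipmr_align8(frame_bits[i]);      /* each frame padded separately */
    }
    return bits;
}

/* Redundancy part: CL1 and CL2 (3 bits each), one TOC bit per frame of the
   preceding and pre-preceding packets, then the data, then final padding.
   red_bits holds 2 * num_frames entries, 0 for absent frames. */
size_t ipmr_redundancy_bits(const size_t *red_bits, size_t num_frames)
{
    size_t bits = 3 + 3 + 2 * num_frames;
    size_t i;

    for (i = 0; i < 2 * num_frames; i++) {
        bits += red_bits[i];
    }
    return ipmr_align8(bits);
}
```

With speech frames of 93, 0 and 172 bits the speech part is 16 + 96 + 176 = 288 bits. With redundancy sizes 20, 39, 35, 0, 15 and 19 the redundancy part is 12 + 128 = 140 bits, padded to 144. The total is 432 bits, or 54 bytes, in agreement with the diagram.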
5.1. Registration of media subtype audio/ip-mr_v2.5 5. Congestion Control
The general congestion control considerations for transporting RTP
data apply to IP-MR speech over RTP (see RTP [RFC 3550] and any
applicable RTP profile like AVP [RFC 3551]).
However, the multi-rate capability of IP-MR speech coding provides a
mechanism that may help to control congestion, since the bandwidth
demand can be adjusted by selecting a different encoding mode.
Type name: audio The number of frames encapsulated in each RTP payload highly
influences the overall bandwidth of the RTP stream due to header
overhead constraints. Packetizing more frames in each RTP payload can
reduce the number of packets sent and hence the overhead from
IP/UDP/RTP headers, at the expense of increased delay.
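The effect of packetization on header overhead can be estimated with the sketch below. It assumes the minimum header sizes of 20 bytes for IPv4 without options, 8 bytes for UDP and 12 bytes for RTP without CSRC entries, 40 bytes in total; other transports (IPv6, header extensions, SRTP) change the figure. The function name is illustrative.

```c
/* Assumed minimum IPv4 + UDP + RTP header size in bytes (20 + 8 + 12). */
enum { IP_UDP_RTP_HEADER_BYTES = 40 };

/* Header overhead in bits per second when one packet is sent every
   ptime_ms milliseconds. */
double header_overhead_bps(unsigned header_bytes, unsigned ptime_ms)
{
    return header_bytes * 8.0 * 1000.0 / ptime_ms;
}
```

Under these assumptions the overhead is 16 kbps at ptime=20 and drops to 4 kbps at ptime=80, at the cost of 60 ms of additional packetization delay.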
Subtype name: ip-mr_v2.5 Due to the scalable nature of the IP-MR codec, the transmission rate
can be reduced at any transport stage to fit the channel bandwidth. The
minimal rate is specified by the BR field of the payload header and can
be as low as 7.7 kbps. It is up to the application to balance coding
quality (high BR) against bitstream scalability (small BR). Because
coding quality depends on the coding rate (CR) rather than the base
rate (BR), it is not recommended to use high BR values for real-time
communications.
Required parameters: none An application MAY utilize bitstream redundancy to combat packet loss.
However, a gateway is free to choose any option to reduce the
transmission rate: coding layers or redundancy bits can be dropped.
For this reason it is NOT RECOMMENDED for an application to increase
the total bitrate when adding redundancy in response to packet loss.
Optional parameters: 6. Security Considerations
* ptime: Gives the length of time in milliseconds represented by the
media in a packet. Allowed values are: 20, 40, 60 and 80.
Encoding considerations: This media type is framed binary data (see RFC RTP packets using the payload format defined in this specification
4288, Section 4.8). are subject to the security considerations discussed in the RTP
specification [RFC3550] and in any applicable RTP profile. As this
format transports encoded audio, the main security issues include
confidentiality, integrity protection, and data origin authentication
of the audio itself.
Security considerations: See RFC 3550 [RFC 3550] The payload format itself does not have any built-in security
mechanisms. Any suitable external mechanism, such as SRTP
[RFC 3711], MAY be used.
Interoperability considerations: none This payload format does not exhibit any significant non-uniformity
in the receiver side computational complexity for packet processing
and thus is unlikely to pose a denial-of-service threat due to the
receipt of pathological data.
Published specification: RFC XXXX 7. Payload Format Parameters
Applications that use this media type: Real-time audio applications like This section describes the media types and names associated with this
voice over IP and teleconference, and multi-media streaming. payload format. Note that the IP-MR bitstream has been frozen since
internal release version 2.5. Currently the terms 'IP-MR' and
'IP-MR v2.5' are synonyms.
Additional information: none 7.1. Media Type Registration
Person & email address to contact for further information: Media Type name: audio
Yury Morzeev
morzeev@spiritdsp.com
Intended usage: COMMON Media Subtype name: ip-mr_v2.5
Restrictions on usage: This media type depends on RTP framing, and hence Required parameters: none
is only defined for transfer via RTP [RFC 3550].
Authors: Optional parameters:
Sergey Ikonin <info@spiritdsp.com> These parameters apply to RTP transfer only.
Change controller: IETF Audio/Video Transport working group delegated ptime: The media packet length in milliseconds. Allowed values
from the IESG. are: 20, 40, 60 and 80.
5.2. Mapping Media Type Parameters into SDP Encoding considerations:
This media type is framed binary data (see RFC4288, Section 4.8).
The information carried in the media type specification has a specific Security considerations:
mapping to fields in the Session Description Protocol (SDP) [RFC 4566], See section 6 of RFC XXXX (RFC editor please replace with this RFC
which is commonly used to describe RTP sessions. When SDP is used to number).
specify sessions employing the IP-MR codec, the mapping is as follows:
o The media type ("audio") goes in SDP "m=" as the media name. Interoperability considerations:
none
o The media subtype (payload format name) goes in SDP "a=rtpmap" Published specification:
as the encoding name. The RTP clock rate in "a=rtpmap" MUST be 16000. RFC XXXX (RFC editor please replace with this RFC number)
o The parameter "ptime" goes in the SDP "a=ptime" attributes. Applications that use this media type:
Real-time audio applications like voice over IP and
teleconference, and multi-media streaming.
Any remaining parameters go in the SDP "a=fmtp" attribute by copying Additional information:
them directly from the media type parameter string as a semicolon- none
separated list of parameter=value pairs.
Note that the payload format (encoding) names are commonly shown in Person & email address to contact for further information:
upper case. Media subtypes are commonly shown in lower case. These Dmitry Yudin <yudin@spiritdsp.com>
names are case-insensitive in both places.
6. Security Considerations Intended usage:
COMMON
RTP packets using the payload format defined in this specification Restrictions on usage:
are subject to the security considerations discussed in the RTP This media type depends on RTP framing, and hence is only defined
specification [RFC 3550] and in any applicable RTP profile. The main for transfer via RTP [RFC 3550].
security considerations for the RTP packet carrying the RTP payload
format defined within this memo are confidentiality, integrity, and
source authenticity. Confidentiality is achieved by encryption of the
RTP payload. Integrity of the RTP packets is achieved through a suitable
cryptographic integrity protection mechanism. Such a cryptographic
system may also allow the authentication of the source of the payload.
A suitable security mechanism for this RTP payload format should Authors:
provide confidentiality, integrity protection, and at least source Sergey Ikonin <info@spiritdsp.com> Dmitry Yudin
authentication capable of determining if an RTP packet is from a <yudin@spiritdsp.com>
member of the RTP session.
Note that the appropriate mechanism to provide security to RTP and Change controller:
payloads following this memo may vary. It is dependent on the IETF Audio/Video Transport working group delegated from the IESG.
application, the transport, and the signaling protocol employed.
Therefore, a single mechanism is not sufficient, although if suitable,
usage of the Secure Real-time Transport Protocol (SRTP) [RFC 3711] is
recommended. Other mechanisms that may be used are IPsec [RFC 4301]
and Transport Layer Security (TLS) [RFC 5246] (RTP over TCP); other
alternatives may exist.
This payload format does not exhibit any significant non-uniformity in 7.2. Mapping Media Type Parameters into SDP
the receiver side computational complexity for packet processing, and
thus is unlikely to pose a denial-of-service threat due to the receipt
of pathological data.
7. Congestion Control The information carried in the media type specification has a
specific mapping to fields in the Session Description Protocol (SDP)
[RFC 4566], which is commonly used to describe RTP sessions. When SDP
is used to specify sessions employing the IP-MR codec, the mapping is
as follows:
o The media type ("audio") goes in SDP "m=" as the media name.
The general congestion control considerations for transporting RTP data o The media subtype (payload format name) goes in SDP "a=rtpmap"
apply; see RTP [RFC 3550] and any applicable RTP profile like AVP as the encoding name. The RTP clock rate in "a=rtpmap" MUST be 16000.
[RFC 3551]. However, the multi-rate capability of IP-MR speech coding
provides a mechanism that may help to control congestion, since the
bandwidth demand can be adjusted by selecting a different encoding mode.
The number of frames encapsulated in each RTP payload highly o The parameter "ptime" goes in the SDP "a=ptime" attributes.
influences the overall bandwidth of the RTP stream due to header
overhead constraints. Packetizing more frames in each RTP payload
can reduce the number of packets sent and hence the overhead from
IP/UDP/RTP headers, at the expense of increased delay.
If in-band redundancy scheme is used to protect against packet loss, Any remaining parameters go in the SDP "a=fmtp" attribute by copying
the amount of introduced redundancy will need to be regulated so that them directly from the media type parameter string as a semicolon-
the use of redundancy itself does not cause a congestion problem. In separated list of parameter=value pairs.
other words, a sender SHALL NOT increase the total bitrate when adding
redundancy in response to packet loss, and needs instead to adjust it Note that the payload format (encoding) names are commonly shown in
down in accordance to the congestion control algorithm being run. Thus, upper case. Media subtypes are commonly shown in lower case. These
when adding redundancy, the media bitrate will need to be reduced to names are case-insensitive in both places.
provide room for the redundancy.
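The SDP mapping rules can be illustrated by a small helper that formats the media-level lines for one IP-MR stream. This is a sketch only: the function name is hypothetical, the port and dynamic payload type are chosen by the application, and a complete session description needs further lines (such as "v=", "o=", "s=" and "t=") that are not produced here.

```c
#include <stdio.h>

/* Writes the "m=", "a=rtpmap" and "a=ptime" lines for an IP-MR stream into
   out. The subtype name and the 16000 Hz clock rate follow the mapping rules
   in the text. Returns the value returned by snprintf. */
int ipmr_format_sdp(char *out, size_t out_size, unsigned port,
                    unsigned payload_type, unsigned ptime_ms)
{
    return snprintf(out, out_size,
                    "m=audio %u RTP/AVP %u\r\n"
                    "a=rtpmap:%u ip-mr_v2.5/16000\r\n"
                    "a=ptime:%u\r\n",
                    port, payload_type, payload_type, ptime_ms);
}
```

Calling ipmr_format_sdp(buf, sizeof buf, 49120, 97, 20) with example values for the port and payload type yields the lines "m=audio 49120 RTP/AVP 97", "a=rtpmap:97 ip-mr_v2.5/16000" and "a=ptime:20".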
8. IANA Considerations 8. IANA Considerations
One media type has been defined and needs registration in the media One media type has been defined and needs registration in the media
types registry. types registry.
9. Normative References 9. Normative References
[RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC 3550] Schulzrinne, H., Casner, S., Frederick, R., and [RFC 3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
V. Jacobson, "RTP: A Transport Protocol for Real-Time Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003. Applications", STD 64, RFC 3550, July 2003.
[RFC 3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio [RFC 3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
and Video Conferences with Minimal Control", STD 65, Video Conferences with Minimal Control", STD 65, RFC 3551,
RFC 3551, July 2003. July 2003.
[RFC 4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session [RFC 4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
Description Protocol", RFC 4566, July 2006. Description Protocol", RFC 4566, July 2006.
[RFC 3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., Norrman, [RFC 3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E.,
K., "The Secure Real-Time Transport Protocol (SRTP)", RFC Norrman, K., "The Secure Real-Time Transport Protocol
3711, March 2004. (SRTP)", RFC 3711, March 2004.
[RFC 5246] Dierks, T. and E. Rescorla, "The Transport Layer [RFC 5246] Dierks, T. and E. Rescorla, "The Transport Layer Security
Security (TLS) Protocol Version 1.2", RFC 5246, (TLS) Protocol Version 1.2", RFC 5246, August 2008.
August 2008.
[RFC 4301] Kent, S. and K. Seo, "Security Architecture for the [RFC 4301] Kent, S. and K. Seo, "Security Architecture for the
Internet Protocol", RFC 4301, December 2005. Internet Protocol", RFC 4301, December 2005.
10. Author(s) Information: 10. Disclaimer
Sergey Ikonin This document may contain material from IETF Documents or IETF
email: info@spiritdsp.com Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Russia 109004 11. Legal Terms
Building 27, A. Solzhenitsyna street
Tel: +7 495 661-2178
Fax: +7 495 912-6786
11. Disclaimer All IETF Documents and the information contained therein are provided
on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.
This document may contain material from IETF Documents or IETF The IETF Trust takes no position regarding the validity or scope of
Contributions published or made publicly available before November 10, any Intellectual Property Rights or other rights that might be
2008. The person(s) controlling the copyright in some of this material claimed to pertain to the implementation or use of the technology
may not have granted the IETF Trust the right to allow modifications of described in any IETF Document or the extent to which any license
such material outside the IETF Standards Process. Without obtaining an under such rights might or might not be available; nor does it
adequate license from the person(s) controlling the copyright in such represent that it has made any independent effort to identify any
materials, this document may not be modified outside the IETF Standards such rights.
Process, and derivative works of it may not be created outside the IETF
Standards Process, except to format it for publication as an RFC or to
translate it into languages other than English.
12. Legal Terms Copies of Intellectual Property disclosures made to the IETF
Secretariat and any assurances of licenses to be made available, or
the result of an attempt made to obtain a general license or
permission for the use of such proprietary rights by implementers or
users of this specification can be obtained from the IETF on-line IPR
repository at http://www.ietf.org/ipr.
All IETF Documents and the information contained therein are provided on The IETF invites any interested party to bring to its attention any
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS copyrights, patents or patent applications, or other proprietary
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND rights that may cover technology that may be required to implement
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR any standard or specification contained in an IETF Document. Please
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE address the information to the IETF at ietf-ipr@ietf.org.
INFORMATION THEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
The IETF Trust takes no position regarding the validity or scope of any The definitive version of an IETF Document is that published by, or
Intellectual Property Rights or other rights that might be claimed to under the auspices of, the IETF. Versions of IETF Documents that are
pertain to the implementation or use of the technology described in any published by third parties, including those that are translated into
IETF Document or the extent to which any license under such rights might other languages, should not be considered to be definitive versions
or might not be available; nor does it represent that it has made any of IETF Documents. The definitive version of these Legal Provisions
independent effort to identify any such rights. is that published by, or under the auspices of, the IETF. Versions of
these Legal Provisions that are published by third parties, including
those that are translated into other languages, should not be
considered to be definitive versions of these Legal Provisions.
Copies of Intellectual Property disclosures made to the IETF Secretariat For the avoidance of doubt, each Contributor to the IETF Standards
and any assurances of licenses to be made available, or the result of an Process licenses each Contribution that he or she makes as part of
attempt made to obtain a general license or permission for the use of the IETF Standards Process to the IETF Trust pursuant to the
such proprietary rights by implementers or users of this specification provisions of RFC 5378. No language to the contrary, or terms,
can be obtained from the IETF on-line IPR repository at conditions or rights that differ from or are inconsistent with the
http://www.ietf.org/ipr. rights and licenses granted under RFC 5378, shall have any effect and
shall be null and void, whether published or posted by such
Contributor, or included with or in such Contribution.
The IETF invites any interested party to bring to its attention any 12. Authors' Addresses
copyrights, patents or patent applications, or other proprietary rights
that may cover technology that may be required to implement any standard
or specification contained in an IETF Document. Please address the
information to the IETF at ietf-ipr@ietf.org.
The definitive version of an IETF Document is that published by, or SPIRIT DSP
under the auspices of, the IETF. Versions of IETF Documents that are Building 27, A. Solzhenitsyna street
published by third parties, including those that are translated into 109004, Moscow, RUSSIA
other languages, should not be considered to be definitive versions of
IETF Documents. The definitive version of these Legal Provisions is that
published by, or under the auspices of, the IETF. Versions of these
Legal Provisions that are published by third parties, including those
that are translated into other languages, should not be considered to be
definitive versions of these Legal Provisions.
For the avoidance of doubt, each Contributor to the IETF Standards Tel: +7 495 661-2178
Process licenses each Contribution that he or she makes as part of the Fax: +7 495 912-6786
IETF Standards Process to the IETF Trust pursuant to the provisions of EMail: info@spiritdsp.com
RFC 5378. No language to the contrary, or terms, conditions or rights
that differ from or are inconsistent with the rights and licenses
granted under RFC 5378, shall have any effect and shall be null and
void, whether published or posted by such Contributor, or included with
or in such Contribution.
APPENDIX A. RETRIEVING FRAME INFORMATION APPENDIX A. RETRIEVING FRAME INFORMATION
This appendix contains the C code implementing the frame parsing This appendix contains the C code implementing the frame parsing
function. The function extracts information about a coded frame, including function. The function extracts information about a coded frame,
the frame size, number of layers, size of each layer and size of each including the frame size, number of layers, size of each layer and
perceptual sensitivity class. size of each perceptual sensitivity class.
A.1. get_frame_info.c A.1. get_frame_info.c
/* /*
Copyright (c) <insert year> Copyright (c) 2010
IETF Trust and the persons identified as authors of the code. IETF Trust and the persons identified as authors of the code.
All rights reserved. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of Internet Society, IETF or IETF Trust, nor the names
of specific contributors, may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/
/******************************************************************
get_frame_info.c
Retrieving frame information for IP-MR Speech Codec
******************************************************************/ Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
- Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of Internet Society, IETF or IETF Trust, nor the names
of specific contributors, may be used to endorse or promote products
derived from this software without specific prior written permission.
#define RATES_NUM 6 // number of codec rates THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
#define SENSE_CLASSES 6 // number of sensitivity classes (A..F) AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
// frame types */
#define FT_SPEECH 0 // active speech
#define FT_DTX_SID 1 // silence insertion descriptor
// get specified bit from coded data /******************************************************************
int GetBit(unsigned char *data, int curBit)
{
return ((data[curBit >> 3] >> (curBit % 8)) & 1);
}
// retrieve frame information get_frame_info.c
int GetFrameInfo( // o: frame size in bits
short rate, // i: encoding rate (0..5)
short base_rate, // i: base (core) layer rate,
// if base_rate > rate, then assumed
// that base_rate = rate.
unsigned char *pCoded, // i: coded bit frame
short pLayerBits // o: number of bits in layers
[RATES_NUM],
short pSenseBits // o: number of bits in sensitivity classes
[SENSE_CLASSES],
short *nLayers // o: number of layers
)
{
static const short Bits_1[4] = {0, 9, 9, 15};
static const short Bits_2[16] = { 43,50,36,31,46,48,40,44,47,43,44,
45,43,44,47,36};
static const short Bits_3[2][6] = {{13, 11, 23, 33, 36, 31}, Retrieving frame information for IP-MR Speech Codec
{25, 0, 23, 32, 36, 31},};
int FrType; ******************************************************************/
int i,nBits; #define RATES_NUM 6 // number of codec rates
#define SENSE_CLASSES 6 // number of sensitivity classes (A..F)
if (rate < 0 || rate > 5) { // frame types
return 0; // incorrect stream #define FT_SPEECH 0 // active speech
} #define FT_DTX_SID 1 // silence insertion descriptor
for(i = 0; i < SENSE_CLASSES; i++) { // get specified bit from coded data
pSenseBits[i] = 0; int GetBit(unsigned char *data, int curBit)
} {
return ((data[curBit >> 3] >> (curBit % 8)) & 1);
}
nBits = 0; // retrieve frame information
// extract frame type bit if required int GetFrameInfo( // o: frame size in bits
FrType = GetBit(pCoded, nBits++) ? FT_SPEECH : FT_DTX_SID; short rate, // i: encoding rate (0..5)
short base_rate, // i: base (core) layer rate,
// if base_rate > rate, then assumed
// that base_rate = rate.
unsigned char *pCoded, // i: coded bit frame
short pLayerBits // o: number of bits in layers
[RATES_NUM],
short pSenseBits // o: number of bits in sensitivity classes
[SENSE_CLASSES],
short *nLayers // o: number of layers
)
{
static const short Bits_1[4] = {0, 9, 9, 15};
static const short Bits_2[16] = { 43,50,36,31,46,48,40,44,47,43,44,
45,43,44,47,36};
static const short Bits_3[2][6] = {{13, 11, 23, 33, 36, 31},
{25, 0, 23, 32, 36, 31},};
{ int FrType;
int cw_0; int i,nBits;
int b[14];
// extract meaning bits if (rate < 0 || rate > 5) {
for(i = 0 ; i < 14; i++) { return 0; // incorrect stream
b[i] = GetBit(pCoded, nBits++); }
}
// parse for(i = 0; i < SENSE_CLASSES; i++) {
if(FrType == FT_DTX_SID) { pSenseBits[i] = 0;
cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3); }
rate = 0;
pSenseBits[0] = 10 + Bits_2[cw_0];
} else {
int i, idx; nBits = 0;
int nFlag_1, nFlag_2, cw_1 = 0, cw_2 = 0; // extract frame type bit if required
FrType = GetBit(pCoded, nBits++) ? FT_SPEECH : FT_DTX_SID;
{
int cw_0;
int b[14];
nFlag_1 = b[0] + b[2] + b[4] + b[6]; // extract meaning bits
cw_1 = (cw_1 << 1) | b[0]; for(i = 0 ; i < 14; i++) {
cw_1 = (cw_1 << 1) | b[2]; b[i] = GetBit(pCoded, nBits++);
cw_1 = (cw_1 << 1) | b[4]; }
cw_1 = (cw_1 << 1) | b[6];
nFlag_2 = b[1] + b[3] + b[5] + b[7]; // parse
cw_2 = (cw_2 << 1) | b[1]; if(FrType == FT_DTX_SID) {
cw_2 = (cw_2 << 1) | b[3]; cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3);
cw_2 = (cw_2 << 1) | b[5]; rate = 0;
cw_2 = (cw_2 << 1) | b[7]; pSenseBits[0] = 10 + Bits_2[cw_0];
} else {
cw_0 = (b[10]<<0)|(b[11]<<1)|(b[12]<<2)|(b[13]<<3); int i, idx;
if (base_rate < 0) base_rate = 0; int nFlag_1, nFlag_2, cw_1, cw_2;
if (base_rate > rate) base_rate = rate;
idx = base_rate == 0 ? 0 : 1;
pSenseBits[0] = 15+Bits_2[cw_0]; nFlag_1 = b[0] + b[2] + b[4] + b[6];
pSenseBits[1] = Bits_1[(cw_1 >> 0)&0x3] + Bits_1[(cw_1>>2)&0x3]; cw_1 = (cw_1 << 1) | b[0];
pSenseBits[2] = nFlag_1*5; cw_1 = (cw_1 << 1) | b[2];
pSenseBits[3] = nFlag_2*30; cw_1 = (cw_1 << 1) | b[4];
pSenseBits[5] = (4 - nFlag_2)*(Bits_3[idx][0]); cw_1 = (cw_1 << 1) | b[6];
for (i = 1; i < rate+1; i++) { nFlag_2 = b[1] + b[3] + b[5] + b[7];
pLayerBits[i] = 4*(Bits_3[idx][i]); cw_2 = (cw_2 << 1) | b[1];
} cw_2 = (cw_2 << 1) | b[3];
} cw_2 = (cw_2 << 1) | b[5];
cw_2 = (cw_2 << 1) | b[7];
pLayerBits[0] = 0; cw_0 = (b[10]<<0)|(b[11]<<1)|(b[12]<<2)|(b[13]<<3);
for (i = 0; i < SENSE_CLASSES; i++) { if (base_rate < 0) base_rate = 0;
pLayerBits[0] += pSenseBits[i]; if (base_rate > rate) base_rate = rate;
} idx = base_rate == 0 ? 0 : 1;
*nLayers = rate+1; pSenseBits[0] = 15+Bits_2[cw_0];
} pSenseBits[1] = Bits_1[(cw_1 >> 0)&0x3] + Bits_1[(cw_1>>2)&0x3];
pSenseBits[2] = nFlag_1*5;
pSenseBits[3] = nFlag_2*30;
pSenseBits[5] = (4 - nFlag_2)*(Bits_3[idx][0]);
{ for (i = 1; i < rate+1; i++) {
// count total frame size pLayerBits[i] = 4*(Bits_3[idx][i]);
int payloadBitCount = 0; }
for (i = 0; i < *nLayers; i++) { }
payloadBitCount += pLayerBits[i];
}
return payloadBitCount;
}
}
Authors' Addresses pLayerBits[0] = 0;
for (i = 0; i < SENSE_CLASSES; i++) {
pLayerBits[0] += pSenseBits[i];
}
SPIRIT DSP *nLayers = rate+1;
Building 27, A. Solzhenitsyna street }
109004, Moscow, RUSSIA
Tel: +7 495 661-2178 {
Fax: +7 495 912-6786 // count total frame size
EMail: info@spiritdsp.com int payloadBitCount = 0;
for (i = 0; i < *nLayers; i++) {
payloadBitCount += pLayerBits[i];
}
return payloadBitCount;
}
}
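
The bit addressing used by GetBit in the listing above, and the way the DTX/SID branch of GetFrameInfo turns four header bits into a frame size, can be illustrated with a small sketch. This is not part of the draft: ReadBitsLsbFirst and SidFrameBits are hypothetical helper names introduced here only for illustration, the Bits_2 table is copied from the listing, and the two input bytes used in the note below are made-up values rather than a real IP-MR frame.

```c
// Same addressing as GetBit in the listing: byte index is curBit / 8,
// and bits inside a byte are counted from the least significant bit.
int GetBit(const unsigned char *data, int curBit)
{
    return (data[curBit >> 3] >> (curBit % 8)) & 1;
}

// Hypothetical helper (not in the draft): read n bits starting at
// position 'start'; the first bit read becomes bit 0 of the result.
// This mirrors cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3).
int ReadBitsLsbFirst(const unsigned char *data, int start, int n)
{
    int i, value = 0;
    for (i = 0; i < n; i++) {
        value |= GetBit(data, start + i) << i;
    }
    return value;
}

// Hypothetical helper (not in the draft): size in bits of a DTX/SID
// frame, following the FT_DTX_SID branch of GetFrameInfo. Returns -1
// when bit 0 is set, which the listing treats as an active speech frame.
int SidFrameBits(const unsigned char *data)
{
    // table copied from the listing
    static const short Bits_2[16] = {43,50,36,31,46,48,40,44,
                                     47,43,44,45,43,44,47,36};
    int cw_0;

    if (GetBit(data, 0) != 0) {
        return -1; // frame type bit set: not a DTX/SID frame
    }
    cw_0 = ReadBitsLsbFirst(data, 1, 4); // bits 1..4 follow the type bit
    return 10 + Bits_2[cw_0];
}
```

For the illustrative bytes { 0x16, 0x00 }, bit 0 is 0 (DTX/SID), bits 1..4 read LSB-first give cw_0 = 11, and the sketch returns 10 + Bits_2[11] = 55 bits, which is what GetFrameInfo would place in pSenseBits[0] and pLayerBits[0] for such a frame.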