Audio/Video Transport Working Group                            S. Ikonin
Internet Draft                                                SPIRIT DSP
Intended status: Proposed Standard                    September 20, 2010

               RTP Payload Format for IP-MR Speech Codec
                     draft-ietf-avt-rtp-ipmr-13.txt

Abstract

   This document specifies the payload format for packetization of
   SPIRIT IP-MR encoded speech signals into the real-time transport
   protocol (RTP). The payload format supports transmission of multiple
   frames per packet and introduced redundancy for robustness against
   packet loss and bit errors.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on December 18, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   The source codes included in this document are provided under BSD
   license (http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf).

Table of Contents

   1. Introduction
   2. IP-MR Codec Description
   3. Payload Format
      3.1. RTP Header Usage
      3.2. RTP Payload Format Structure
      3.3. Speech Payload Header
      3.4. Speech Payload Table of Contents
      3.5. Speech Payload Data
      3.6. Redundancy Payload Header
      3.7. Redundancy Payload Table of Contents
      3.8. Redundancy Payload Data
   4. Payload Examples
      4.1. Payload Carrying a Single Frame
      4.2. Payload Carrying Multiple Frames with Redundancy
   5. Congestion Control
   6. Security Considerations
   7. Payload Format Parameters
      7.1. Media Type Registration
      7.2. Mapping Media Type Parameters into SDP
   8. IANA Considerations
   9. Normative References
   10. Disclaimer
   11. Legal Terms
   12. Authors' Addresses
   APPENDIX A. RETRIEVING FRAME INFORMATION
      A.1. get_frame_info.c

1. Introduction

   This document specifies the payload format for packetization of
   SPIRIT IP-MR encoded speech signals into the real-time transport
   protocol (RTP). The payload format supports transmission of multiple
   frames per packet and introduced redundancy for robustness against
   packet loss and bit errors.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC 2119].

2. IP-MR Codec Description

   IP-MR is a wideband speech codec designed by SPIRIT for conferencing
   services over packet-switched networks such as the Internet.

   IP-MR is a scalable codec. This means that not only can the source
   change the transmission rate on the fly, but a gateway is also able
   to decrease the bandwidth at any time without performance overhead.
   There are 6 coding rates available, from 7.7 to 34.2 kbps.

   The codec operates on 20 ms frames at a 16 kHz sampling rate with a
   total end-to-end delay of 25 ms. Each compressed frame is represented
   as a sequence of layers. The first (base) layer is mandatory, while
   the other (enhancement) layers can be safely discarded. Information
   about the particular frame structure is available from the payload
   header. In order to adjust the outgoing bandwidth, the gateway MUST
   read the frame structure from the payload header, decide which
   enhancement layers to discard, and compose a new RTP packet according
   to this specification.

   Not all bits within a frame are equally tolerant to distortion.
   IP-MR defines 6 classes ('A'-'F') of sensitivity to bit errors. Any
   damage to class 'A' bits causes significant reconstruction artifacts,
   while a loss in class 'F' may not even be perceived by the listener.
   Note that only the base layer in a bitstream is represented as a set
   of classes.

   The IP-MR payload format allows frames to be duplicated across
   packets to improve robustness against packet loss (Section 3.6). The
   base layer can be retransmitted completely or as several sensitivity
   classes. Enhancement layers are not retransmittable.

   The fine-grained redundancy, in conjunction with bitrate scalability,
   allows an application to adjust the trade-off between overhead and
   robustness against packet loss. Note that this approach is supported
   natively within a packet and requires no out-of-band signals or
   session initialization procedures.

   The main IP-MR features are the following:

      o High-quality wideband speech codec.

      o Bitrate scalable with 6 average rates from 7.7 to 34.2 kbps.

      o Built-in discontinuous transmission (DTX) and comfort noise
      generation (CNG) support.

      o Flexible in-band redundancy control scheme for packet loss
      protection.

3. Payload Format

   The payload format consists of the RTP header and the IP-MR payload.

3.1. RTP Header Usage

   The format of the RTP header is specified in [RFC 3550]. This payload
   format uses the fields of the header in a manner consistent with that
   specification.

   The RTP timestamp corresponds to the sampling instant of the first
   sample encoded for the first frame-block in the packet. The timestamp
   clock frequency SHALL be 16 kHz. The duration of one frame is 20 ms,
   corresponding to 320 samples per frame. Thus the timestamp is
   increased by 320 for each consecutive frame. The timestamp is also
   used to recover the correct decoding order of the frame-blocks.
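
   As a rough illustration of the timestamp rule above, the following C
   sketch computes the timestamp of a packet that directly follows a
   packet carrying (GR + 1) frames. The macro and function names are
   hypothetical and are not defined by this specification; GR is the
   grouping field of the speech payload header (Section 3.3).

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define IPMR_SAMPLES_PER_FRAME 320u  /* 20 ms at the 16 kHz RTP clock */

/* Timestamp of the packet that directly follows a packet carrying
 * (gr + 1) frames. Unsigned 32-bit arithmetic wraps modulo 2^32,
 * which matches the wrap-around behaviour of the RTP timestamp. */
static uint32_t ipmr_next_timestamp(uint32_t ts, unsigned gr)
{
    return ts + IPMR_SAMPLES_PER_FRAME * (uint32_t)(gr + 1u);
}
```

   For example, a packet with GR=0 advances the timestamp by 320, and a
   packet with GR=3 advances it by 1280.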

   The RTP header marker bit (M) SHALL be set to 1 whenever the first
   frame-block carried in the packet is the first frame-block in a
   talkspurt (see definition of the talkspurt in Section 4.1 [RFC
   3551]). For all other packets, the marker bit SHALL be set to zero
   (M=0).

   The assignment of an RTP payload type for the format defined in this
   memo is outside the scope of this document. The RTP profiles in use
   currently mandate binding the payload type dynamically for this
   payload format. This is basically necessary because the payload type
   expresses the configuration of the payload itself, i.e. basic or
   interleaved mode, and the number of channels carried.

   The remaining RTP header fields are used as specified in [RFC 3550].

3.2. RTP Payload Format Structure

   The IP-MR payload is composed of two payloads, one for current speech
   and one for redundancy. Both payloads are represented in the form of
   a Header, a Table of Contents (TOC), and Data. The redundancy payload
   carries data for the preceding and pre-preceding packets.

   +--------+-----+----------------------+- - - - +- - -+- - - - - -+
   | Header | TOC | Data                 | Header | TOC | Data      |
   +--------+-----+----------------------+- - - - +- - -+- - - - - -+
   |<- Speech -------------------------->|<- Redundancy (opt) ----->|

3.3. Speech Payload Header

   This header carries parameters that are common to all frames in the
   packet:

                           0                   1
                           0 1 2 3 4 5 6 7 8 9 0 1
                          +-+-+-+-+-+-+-+-+-+-+-+-+
                          |T| CR  | BR  |D|A|GR |R|
                          +-+-+-+-+-+-+-+-+-+-+-+-+

      o T (1 bit): Reserved. MUST always be set to 0. The receiver SHOULD
      discard the packet if the 'T' bit is not equal to 0.

      o CR (3 bits): Coding rate index of the frame(s) in this packet,
      i.e. the top enhancement layer available, as per the following
      table:

                          +-------+--------------+
                          |  CR   | avg. bitrate |
                          +-------+--------------+
                          |   0   |   7.7 kbps   |
                          |   1   |   9.8 kbps   |
                          |   2   |  14.3 kbps   |
                          |   3   |  20.8 kbps   |
                          |   4   |  27.9 kbps   |
                          |   5   |  34.2 kbps   |
                          |   6   |  (reserved)  |
                          |   7   |   NO_DATA    |
                          +-------+--------------+

      The CR value 7 (NO_DATA) indicates that there is no speech data
      (and, accordingly, no speech TOC) in the payload. This MAY be used
      to transmit redundancy data only. The value 6 is reserved; a
      packet received with this value MUST be discarded.

      o BR (3 bits): Base rate index, i.e. the base layer bitrate, using
      the same table as CR. The speech payload can be scaled to any rate
      index between BR and CR. Packets with BR = 6 or BR > CR MUST be
      discarded. Redundancy data is also considered as having a base
      rate of BR.

      o D (1 bit): Reserved. MUST always be set to 1. The receiver MAY
      discard the packet if the 'D' bit is zero.

      o A (1 bit): Byte alignment. The value of 1 specifies that padding
      bits were added so that each compressed frame (Section 3.5) starts
      at a byte (8 bit) boundary. The value of 0 specifies unaligned
      frames. Note that the speech payload is always padded to a byte
      boundary regardless of the 'A' bit value.

      o GR (2 bits): Number of frames in the packet (grouping size). The
      actual grouping size is GR + 1, thus the maximum grouping supported
      is 4.

      o R (1 bit): Redundancy presence. The value of 1 indicates that a
      redundancy payload is present.

   Note that the values of the 'T' and 'D' bits are fixed; no other
   values are allowed by this specification. The values of padding bits
   are not specified.
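
   The field layout and the discard rules of this section can be
   sketched in C as follows. This is only an illustration under stated
   assumptions: the structure and function names are hypothetical, and
   bit 0 of the header diagram is assumed to be the most significant bit
   of the first payload octet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure; field names follow the header diagram. */
struct ipmr_hdr {
    unsigned t, cr, br, d, a, gr, r;
};

/* Parse the 12-bit speech payload header from the first two octets.
 * Assumes bit 0 of the diagram is the MSB of octet 0.
 * Returns 0 on success, -1 if the packet is to be discarded. */
static int ipmr_parse_hdr(const uint8_t *p, size_t len, struct ipmr_hdr *h)
{
    unsigned v;

    if (len < 2)
        return -1;
    v = ((unsigned)p[0] << 4) | ((unsigned)p[1] >> 4);  /* first 12 bits */
    h->t  = (v >> 11) & 1u;
    h->cr = (v >> 8) & 7u;
    h->br = (v >> 5) & 7u;
    h->d  = (v >> 4) & 1u;
    h->a  = (v >> 3) & 1u;
    h->gr = (v >> 1) & 3u;
    h->r  = v & 1u;

    if (h->t != 0u)                     /* 'T' is fixed to 0 */
        return -1;
    if (h->cr == 6u)                    /* reserved rate index */
        return -1;
    if (h->br == 6u || h->br > h->cr)   /* invalid base rate */
        return -1;
    return 0;                           /* 'D' == 0 is tolerated here */
}
```

   The sketch keeps packets whose 'D' bit is zero, since discarding them
   is only a MAY for the receiver; a stricter receiver could reject them
   as well.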

3.4. Speech Payload Table of Contents

   The speech TOC is a bit mask indicating the presence of each frame in
   the packet. The TOC is only available if the 'CR' value is not equal
   to 7 (NO_DATA).

                               0 1 2 3
                              +-+-+-+-+
                              |E|E|E|E|
                              +-+-+-+-+
                              |<----->| <-- #(GR+1)

      o E (1 bit): Frame existence indicator. The value of 0 indicates
      that speech data is not present for the corresponding frame. The
      IP-MR encoder sets the E flag to 0 for periods of silence in DTX
      mode. The application MUST set this bit to 0 if the frame is known
      to be damaged.
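
   A minimal C sketch of reading the speech TOC is given below. It
   assumes that the TOC starts at bit 12 of the payload, immediately
   after the speech payload header, and that bits are packed most
   significant bit first. The function names are hypothetical, and the
   caller is assumed to have checked that the buffer holds at least two
   octets.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one bit at absolute bit offset 'pos' (MSB-first order assumed). */
static unsigned ipmr_get_bit(const uint8_t *p, size_t pos)
{
    return (p[pos >> 3] >> (7u - (unsigned)(pos & 7u))) & 1u;
}

/* Fill e[0..gr] with the 'E' flags of the speech TOC and return the
 * number of TOC bits consumed. When CR is 7 (NO_DATA) the TOC is
 * absent: all flags are reported as 0 and 0 bits are consumed. */
static unsigned ipmr_read_speech_toc(const uint8_t *p, unsigned cr,
                                     unsigned gr, unsigned e[4])
{
    unsigned i, n = gr + 1u;

    if (cr == 7u) {
        for (i = 0; i < n; i++)
            e[i] = 0u;
        return 0u;
    }
    for (i = 0; i < n; i++)
        e[i] = ipmr_get_bit(p, 12u + i);   /* TOC follows the header */
    return n;
}
```

   A frame whose 'E' flag is 0 has no data in the speech payload, so a
   parser would skip it when walking the frames.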

3.5. Speech Payload Data

   Speech data contains (GR+1) compressed IP-MR frames (20 ms of data
   each). A compressed frame has zero length if the corresponding TOC
   flag is zero.

   The beginning of each compressed frame is aligned if the 'A' bit is
   nonzero, while the end of the speech payload is always aligned to a
   byte (8 bit) boundary:

   +- - -+------------+------------+------------+------------+
   | TOC | Frame1     | Frame2     | Frame3     | Frame4     |
   +- - -+------------+------------+------------+------------+   ALWAYS
         |<- aligned  |<- aligned  |<- aligned  |<- aligned  |<- ALIGNED

   Marked regions MUST be aligned (padded) only if the 'A' bit is set to
   '1'.

   The compressed frame structure is the following:

   |<---- sensitive classes ------>|<----- enhancement layers --------->|
   +-------------------------------+----+-----+------+- - - - - +-------+
   | L1 (Base Layer)               | L2 | L3  | L4   |          | LN    |
   +-------------------------------+----+-----+------+- - - - - +-------+
   |<- A --->|<- B ->| ... |<- F ->|                                    |
   |<- BR rate ------------------->|                                    |
   |<- CR rate -------------------------------------------------------->|

   Appendix A of this document provides a helper routine written in "C"
   which MUST be used to extract the bounds of the sensitivity classes
   and enhancement layers from the compressed frame data.
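
   The alignment rules of this section can be illustrated with the
   following C sketch, which computes the total size of the speech
   payload in bits. It is not a normative procedure: the function names
   are hypothetical, and the frame sizes are assumed to have been
   obtained beforehand (for example with the helper routine of Appendix
   A), with a size of 0 for frames whose TOC flag is zero.

```c
#include <stddef.h>

/* Number of padding bits needed to move bit offset 'pos' up to the
 * next byte (8 bit) boundary; 0 if 'pos' is already aligned. */
static size_t ipmr_pad_bits(size_t pos)
{
    return (8u - (pos & 7u)) & 7u;
}

/* Size in bits of the speech payload: 12-bit header, optional TOC,
 * (gr + 1) frames and padding. 'a' is the 'A' bit of the header and
 * 'has_toc' is nonzero unless CR is 7 (NO_DATA). */
static size_t ipmr_speech_payload_bits(const size_t *frame_bits,
                                       unsigned gr, int a, int has_toc)
{
    size_t pos = 12;                     /* speech payload header */
    unsigned i;

    if (has_toc)
        pos += gr + 1u;                  /* one 'E' bit per frame */
    for (i = 0; i <= gr; i++) {
        if (a)
            pos += ipmr_pad_bits(pos);   /* frame starts on a byte */
        pos += frame_bits[i];            /* 0 when the 'E' flag is 0 */
    }
    return pos + ipmr_pad_bits(pos);     /* payload end always aligned */
}
```

   For a single unaligned frame of 194 bits this gives 12 + 1 + 194 =
   207 bits plus one padding bit, i.e. 208 bits or 26 octets.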

3.6. Redundancy Payload Header

   The presence of the redundancy payload is signaled by the R bit of
   the speech payload header. The redundancy header is composed of two
   fields of 3 bits each:

                               0 1 2 3 4 5
                              +-+-+-+-+-+-+
                              | CL1 | CL2 |
                              +-+-+-+-+-+-+

   The 'CL1' and 'CL2' fields specify the sensitivity classes available
   for the preceding and pre-preceding packets, respectively.

                    +-------+--------------------+
                    |  CL   | Redundancy classes |
                    |       |      available     |
                    +-------+--------------------+
                    |   0   |       NONE         |
                    |   1   |        A           |
                    |   2   |        A-B         |
                    |   3   |        A-C         |
                    |   4   |        A-D         |
                    |   5   |        A-E         |
                    |   6   |        A-F         |
                    |   7   |    (reserved)      |
                    +-------+--------------------+

   The receiver can reconstruct the base layer of preceding packets
   completely (CL=6) or partially (0<CL<6) based on the sensitivity
   classes delivered. The decoder MUST discard the redundancy payload if
   CL is equal to 0 or 7.

   Note that the base rate index (BR) and the frame grouping parameter
   (GR) are not transmitted for the redundancy payload. The application
   MUST assume that 'BR' and 'GR' are the same as for the current
   packet.

3.7. Redundancy Payload Table of Contents

   The redundancy TOC is a bit mask indicating the presence of each
   frame in the redundancy payload. The redundancy TOC is only available
   if the 'CL' value is not equal to 0 or 7.

                 0 1 2 3 4 5 ...
                +-+-+-+-+-+-+-+-+
                |E|E|E|E|E|E|E|E|
                +-+-+-+-+-+-+-+-+
                |       |<----->| pre-preceding payload #(GR+1)
                |<----->| preceding payload #(GR+1)

      o E (1 bit): Redundancy frame existence indicator. The value of 0
      indicates that redundancy data is not present for the
      corresponding frame.
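
   The size of the redundancy TOC can be sketched in C as shown below.
   The sketch relies on one reading of the rule above, namely that the
   (GR + 1) entries for a packet are omitted when the CL value for that
   packet is 0 or 7; this interpretation and the function name are
   assumptions made for illustration.

```c
/* Number of 'E' bits in the redundancy TOC. Each of the preceding
 * (CL1) and pre-preceding (CL2) packets contributes (gr + 1) entries,
 * assumed to be present only when its CL value is in the range 1..6. */
static unsigned ipmr_red_toc_bits(unsigned cl1, unsigned cl2, unsigned gr)
{
    unsigned n = 0u;

    if (cl1 >= 1u && cl1 <= 6u)
        n += gr + 1u;
    if (cl2 >= 1u && cl2 <= 6u)
        n += gr + 1u;
    return n;
}
```

   With the maximum grouping (GR=3) and both CL fields valid, the TOC
   holds 8 entries, as in the diagram above.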

3.8. Redundancy Payload Data

   IP-MR defines 6 classes ('A'-'F') of sensitivity to bit errors. Any
   damage to class 'A' bits causes significant reconstruction artifacts,
   while a loss in class 'F' may not even be perceived by the listener.
   Note that only the base layer in a bitstream is represented as a set
   of classes. Together, the sensitivity classes and redundancy allow
   IP-MR to duplicate frames across packets to improve robustness
   against packet loss.

   Redundancy data carries a number of sensitivity classes for the
   preceding and pre-preceding packets, as indicated by the 'CL1' and
   'CL2' fields of the redundancy header. The sensitivity class data is
   available individually for each frame, and only if the corresponding
   'E' bit of the redundancy TOC is nonzero:

   +---+---+----+----|-----+-----+-----+-----+-----+-----+-----+
   |A-C|A-B|1000|1001|cl_A1|cl_B1|cl_C1|cl_A1|cl_B1|cl_A4|cl_B4|
   +---+---+----+----|-----+-----+-----+-----+-----+-----+-----+
   |<- CL >|<- TOC ->|<- preceding --->|<- pre-preceding ----->|

   Redundancy data is only available if the base (BR) and coding (CR)
   rates of the preceding and pre-preceding packets are the same as for
   the current packet.

   A receiver MAY use redundancy data to compensate for packet loss; in
   this case the 'CL' field MUST also be passed to the decoder. The
   helper routine provided in Appendix A MUST be used to extract the
   sensitivity class lengths for each frame. The following pseudo code
   describes the sequence of operations:

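      short sensitivityBits[numOfRedundancyFrames][6];
      int   redundancyBits [numOfRedundancyFrames];
      short dummyLayerBits[6], dummyLayers;  /* outputs not needed here */
      for (i = 0; i < numOfRedundancyFrames; i++) {
          /* obtain the per-class sizes of the i-th redundancy frame */
          GetFrameInfo(CR, BR, pRedundancyPayloadData, dummyLayerBits,
                       sensitivityBits[i], &dummyLayers);
          /* sum the sizes of the first CL[i] sensitivity classes */
          redundancyBits[i] = 0;
          for (j = 0; j < CL[i]; j++) {
               redundancyBits[i] += sensitivityBits[i][j];
          }
          /* advance to the next redundancy frame */
          flushBits(pRedundancyPayloadData, redundancyBits[i]);
      }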

4. Payload Examples

   This section provides detailed examples of the IP-MR payload format.

4.1. Payload Carrying a Single Frame

   The following diagram shows a typical IP-MR payload carrying one
   (GR=0) non-aligned (A=0) speech frame without redundancy (R=0). The
   base layer is coded at 7.8 kbps (BR=0) while the coding rate is 9.7
   kbps (CR=1). The 'E' bit value of 1 signals that compressed frame
   bits s(0) - s(193) are present. A padding bit 'P' is added to
   maintain speech payload size alignment.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0|CR=1 |BR=0 |1|0|0 0|0|1|s(0)                                 |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       s(193)|P|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.2. Payload Carrying Multiple Frames with Redundancy

   The following diagram shows a payload carrying 3 (GR=2) aligned (A=1)
   speech frames with redundancy (R=1). The TOC value of '101' indicates
   that speech data is present for the first (bits sp1(0)-sp1(92)) and
   third (bits sp3(0)-sp3(171)) frames. There are no enhancement layers
   because the base and coding rates are equal (BR=CR=0). Padding bits
   'P' are inserted to maintain the necessary alignment.

   The redundancy payload is present for both the preceding and
   pre-preceding payloads (CL1=A-B, CL2=A), but redundancy data is only
   available for 5 (TOC='111011') of the 6 (2*(GR+1)) frames. There are
   20, 39 and 35 bits of redundancy data for the three frames of the
   preceding packet, and 15 and 19 bits for two frames of the
   pre-preceding packet.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0|CR=0 |BR=0 |1|1|1 0|1|1 0 1|P|sp1(0)                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                  sp1(92)|P|P|P|sp3(0)                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                               sp3(171)|P|P|P|P|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |CL1=2|CL2=1|1 1 1|0 1 1|red1_1_AB(0)              red1_1_AB(19)|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |red1_2_AB(0)                                                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |red1_2_AB(38)|red1_3_AB(0)                                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      red1_3_AB(34)|red2_2_A(0)      red2_2_A(14)|red2_3_A(0)  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           red2_3_A(18)|P|P|P|P|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
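
   The sizes in the two examples of this section can be cross-checked
   with a little arithmetic. The C sketch below assumes, based on the
   diagrams rather than on normative text, that the payload header
   occupies 12 bits and that in aligned mode (A=1) the header plus TOC,
   each present speech frame, and the redundancy section as a whole are
   each padded to a byte boundary. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round a bit count up to the next multiple of 8. */
static int pad_to_byte(int bits)
{
    assert(bits >= 0);
    return (bits + 7) / 8 * 8;
}

/* Total size in bits of an aligned (A=1) payload, assuming the layout
 * read off the Section 4.2 diagram: header+TOC, every present speech
 * frame, and the whole redundancy section are padded separately.
 * Hypothetical helper, not part of the payload format. */
int aligned_payload_bits(int header_toc_bits, const int *frame_bits,
                         size_t num_frames, int redundancy_bits)
{
    int total = pad_to_byte(header_toc_bits);
    size_t i;

    for (i = 0; i < num_frames; i++) {
        total += pad_to_byte(frame_bits[i]);
    }
    return total + pad_to_byte(redundancy_bits);
}
```

   With the numbers from Section 4.1 (12 header bits, 1 TOC bit and 194
   frame bits), pad_to_byte() gives 208 bits, i.e. 26 bytes. For Section
   4.2 (15 header and TOC bits, frames of 93 and 172 bits, and 140 bits
   of redundancy header, TOC and data), the aligned total is 432 bits,
   i.e. 54 bytes, which matches the row counts in the diagrams.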

5. Media Type Registration

   This section describes the media types and names associated with this
   payload format.

5.1. Registration of media subtype audio/ip-mr_v2.5

   Type name: audio

   Subtype name: ip-mr_v2.5

   Required parameters: none

   Optional parameters:

      ptime: Gives the length of time in milliseconds represented by the
      media in a packet. Allowed values are: 20, 40, 60 and 80.

   Encoding considerations: This media type is framed binary data (see
   RFC 4288, Section 4.8).

   Security considerations: See RFC 3550 [RFC 3550].

   Interoperability considerations: none

   Published specification: RFC XXXX

   Applications that use this media type: Real-time audio applications
   like voice over IP and teleconference, and multi-media streaming.

   Additional information: none

   Person & email address to contact for further information:
   Yury Morzeev <morzeev@spiritdsp.com>

   Intended usage: COMMON

   Restrictions on usage: This media type depends on RTP framing, and
   hence is only defined for transfer via RTP [RFC 3550].

   Authors: Sergey Ikonin <info@spiritdsp.com>

   Change controller: IETF Audio/Video Transport working group delegated
   from the IESG.

5.2. Mapping Media Type Parameters into SDP

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [RFC 4566], which is commonly used to describe RTP sessions. When SDP
   is used to specify sessions employing the IP-MR codec, the mapping is
   as follows:

      o The media type ("audio") goes in SDP "m=" as the media name.

      o The media subtype (payload format name) goes in SDP "a=rtpmap"
      as the encoding name. The RTP clock rate in "a=rtpmap" MUST be
      16000.

      o The parameter "ptime" goes in the SDP "a=ptime" attribute.

   Any remaining parameters go in the SDP "a=fmtp" attribute by copying
   them directly from the media type parameter string as a
   semicolon-separated list of parameter=value pairs.

   Note that the payload format (encoding) names are commonly shown in
   upper case. Media subtypes are commonly shown in lower case. These
   names are case-insensitive in both places.

5. Congestion Control

   The general congestion control considerations for transporting RTP
   data are applicable to IP-MR speech over RTP (see RTP [RFC 3550] and
   any applicable RTP profile like AVP [RFC 3551]). However, the
   multi-rate capability of IP-MR speech coding provides a mechanism
   that may help to control congestion, since the bandwidth demand can
   be adjusted by selecting a different encoding mode.

   The number of frames encapsulated in each RTP payload highly
   influences the overall bandwidth of the RTP stream due to header
   overhead constraints. Packetizing more frames in each RTP payload can
   reduce the number of packets sent and hence the overhead from
   IP/UDP/RTP headers, at the expense of increased delay.

   Due to the scalable nature of the IP-MR codec, the transmission rate
   can be reduced at any transport stage to fit the channel bandwidth.
   The minimal rate is specified by the BR field of the payload header
   and can be as low as 7.7 kbps. It is up to the application to keep a
   balance between coding quality (high BR) and bitstream scalability
   (small BR). Because coding quality depends on the coding rate (CR)
   rather than on the base rate (BR), it is not recommended to use high
   BR values for real-time communications.

   An application MAY utilize bitstream redundancy to combat packet
   loss. However, a gateway is free to choose any option to reduce the
   transmission rate: either coding layers or redundancy bits can be
   dropped. For this reason, it is NOT RECOMMENDED for an application to
   increase the total bitrate when adding redundancy as a response to
   packet loss.
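
   To make the packetization trade-off concrete, the following C sketch
   estimates the bandwidth consumed by the IP/UDP/RTP headers alone as a
   function of the number of frames per packet. It assumes 20 ms frames
   (consistent with the allowed ptime values of 20 to 80 ms), IPv4
   without options (20 bytes), UDP (8 bytes) and the fixed RTP header
   without CSRC entries or extensions (12 bytes). The function name is
   hypothetical.

```c
#include <assert.h>

#define FRAME_MS        20  /* assumed frame duration in milliseconds */
#define HEADER_BYTES    40  /* IPv4 (20) + UDP (8) + RTP (12), assumed */

/* Header overhead in bits per second for a given number of frames
 * per RTP packet (1..4). Hypothetical helper for illustration. */
int header_overhead_bps(int frames_per_packet)
{
    assert(frames_per_packet >= 1 && frames_per_packet <= 4);
    return HEADER_BYTES * 8 * 1000 / (frames_per_packet * FRAME_MS);
}
```

   Under these assumptions, one frame per packet costs 16000 bit/s of
   header overhead, while four frames per packet cost 4000 bit/s, at the
   price of 60 ms of additional packetization delay.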

6. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC 3550] and in any applicable RTP profile. The main
   security considerations for the RTP packet carrying the RTP payload
   format defined within this memo are confidentiality, integrity, and
   source authenticity. Confidentiality is achieved by encryption of the
   RTP payload. Integrity of the RTP packets is achieved through a
   suitable cryptographic integrity protection mechanism. Such a
   cryptographic system may also allow the authentication of the source
   of the payload.

   The payload format itself does not have any built-in security
   mechanisms. A suitable security mechanism for this RTP payload format
   should provide confidentiality, integrity protection, and at least
   source authentication capable of determining if an RTP packet is from
   a member of the RTP session.

   Note that the appropriate mechanism to provide security to RTP and
   payloads following this memo may vary. It is dependent on the
   application, the transport, and the signaling protocol employed.
   Therefore, a single mechanism is not sufficient, although if
   suitable, usage of the Secure Real-time Transport Protocol (SRTP)
   [RFC 3711] is recommended. Other mechanisms that may be used are
   IPsec [RFC 4301] and Transport Layer Security (TLS) [RFC 5246] (RTP
   over TCP); other alternatives may exist.

   This payload format does not exhibit any significant non-uniformity
   in the receiver side computational complexity for packet processing,
   and thus is unlikely to pose a denial-of-service threat due to the
   receipt of pathological data.

7. Congestion Control

   The general congestion control considerations for transporting RTP
   data apply; see RTP [RFC 3550] and any applicable RTP profile like
   AVP [RFC 3551]. However, the multi-rate capability of IP-MR speech
   coding provides a mechanism that may help to control congestion,
   since the bandwidth demand can be adjusted by selecting a different
   encoding mode.

   The number of frames encapsulated in each RTP payload highly
   influences the overall bandwidth of the RTP stream due to header
   overhead constraints. Packetizing more frames in each RTP payload can
   reduce the number of packets sent and hence the overhead from
   IP/UDP/RTP headers, at the expense of increased delay.

   If an in-band redundancy scheme is used to protect against packet
   loss, the amount of introduced redundancy will need to be regulated
   so that the use of redundancy itself does not cause a congestion
   problem. In other words, a sender SHALL NOT increase the total
   bitrate when adding redundancy in response to packet loss, and needs
   instead to adjust it down in accordance with the congestion control
   algorithm being run. Thus, when adding redundancy, the media bitrate
   will need to be reduced to provide room for the redundancy.

7. Payload Format Parameters

   This section describes the media types and names associated with this
   payload format. Note that the IP-MR bitstream has been frozen since
   internal release version 2.5. Currently, the terms 'IP-MR' and
   'IP-MR v2.5' are synonyms.

7.1. Media Type Registration

   Media Type name:     audio

   Media Subtype name:  ip-mr_v2.5

   Required parameters: none

   Optional parameters:
      These parameters apply to RTP transfer only.

      ptime: The media packet length in milliseconds. Allowed values
      are: 20, 40, 60 and 80.

   Encoding considerations:
      This media type is framed binary data (see RFC 4288, Section 4.8).

   Security considerations:
      See section 6 of RFC XXXX (RFC editor please replace with this RFC
      number).

   Interoperability considerations:
      none

   Published specification:
      RFC XXXX (RFC editor please replace with this RFC number)

   Applications that use this media type:
      Real-time audio applications like voice over IP and
      teleconference, and multi-media streaming.

   Additional information:
      none

   Person & email address to contact for further information:
      Dmitry Yudin <yudin@spiritdsp.com>

   Intended usage:
      COMMON

   Restrictions on usage:
      This media type depends on RTP framing, and hence is only defined
      for transfer via RTP [RFC 3550].

   Authors:
      Sergey Ikonin <info@spiritdsp.com>
      Dmitry Yudin <yudin@spiritdsp.com>

   Change controller:
      IETF Audio/Video Transport working group delegated from the IESG.

7.2. Mapping Media Type Parameters into SDP

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [RFC 4566], which is commonly used to describe RTP sessions. When SDP
   is used to specify sessions employing the IP-MR codec, the mapping is
   as follows:

      o The media type ("audio") goes in SDP "m=" as the media name.

      o The media subtype (payload format name) goes in SDP "a=rtpmap"
      as the encoding name. The RTP clock rate in "a=rtpmap" MUST be
      16000.

      o The parameter "ptime" goes in the SDP "a=ptime" attribute.

   Any remaining parameters go in the SDP "a=fmtp" attribute by copying
   them directly from the media type parameter string as a
   semicolon-separated list of parameter=value pairs.

   Note that the payload format (encoding) names are commonly shown in
   upper case. Media subtypes are commonly shown in lower case. These
   names are case-insensitive in both places.
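
   As an illustration of the mapping rules above, an SDP description of
   an IP-MR stream might look as follows. The port number 49120, the
   dynamic payload type 97 and the ptime value of 20 are arbitrary
   choices made for this sketch and are not mandated by this document.

```
m=audio 49120 RTP/AVP 97
a=rtpmap:97 ip-mr_v2.5/16000
a=ptime:20
```

   Here "audio" is the media type, "ip-mr_v2.5" with a clock rate of
   16000 is the encoding name in "a=rtpmap", and "a=ptime:20" signals
   one 20 ms frame per packet.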

8. IANA Considerations

   One media type has been defined and needs registration in the media
   types registry.

9. Normative References

   [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC 3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC 3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC 4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC 3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E.,
              Norrman, K., "The Secure Real-Time Transport Protocol
              (SRTP)", RFC 3711, March 2004.

   [RFC 5246] Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

   [RFC 4301] Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

10. Author(s) Information

   Sergey Ikonin
   SPIRIT DSP
   Building 27, A. Solzhenitsyna street
   109004, Moscow, RUSSIA

   Tel: +7 495 661-2178
   Fax: +7 495 912-6786
   EMail: info@spiritdsp.com

11. Disclaimer

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

12. Legal Terms

   All IETF Documents and the information contained therein are provided
   on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

   The IETF Trust takes no position regarding the validity or scope of
   any Intellectual Property Rights or other rights that might be
   claimed to pertain to the implementation or use of the technology
   described in any IETF Document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights.

   Copies of Intellectual Property disclosures made to the IETF
   Secretariat and any assurances of licenses to be made available, or
   the result of an attempt made to obtain a general license or
   permission for the use of such proprietary rights by implementers or
   users of this specification can be obtained from the IETF on-line IPR
   repository at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   any standard or specification contained in an IETF Document. Please
   address the information to the IETF at ietf-ipr@ietf.org.

   The definitive version of an IETF Document is that published by, or
   under the auspices of, the IETF. Versions of IETF Documents that are
   published by third parties, including those that are translated into
   other languages, should not be considered to be definitive versions
   of IETF Documents. The definitive version of these Legal Provisions
   is that published by, or under the auspices of, the IETF. Versions of
   these Legal Provisions that are published by third parties, including
   those that are translated into other languages, should not be
   considered to be definitive versions of these Legal Provisions.

   For the avoidance of doubt, each Contributor to the IETF Standards
   Process licenses each Contribution that he or she makes as part of
   the IETF Standards Process to the IETF Trust pursuant to the
   provisions of RFC 5378. No language to the contrary, or terms,
   conditions or rights that differ from or are inconsistent with the
   rights and licenses granted under RFC 5378, shall have any effect and
   shall be null and void, whether published or posted by such
   Contributor, or included with or in such Contribution.

APPENDIX A. RETRIEVING FRAME INFORMATION

   This appendix contains C code implementing the frame parsing
   function. The function extracts information about a coded frame,
   including the frame size, the number of layers, the size of each
   layer and the sizes of the perceptual sensitivity classes.

A.1. get_frame_info.c

   /*
     Copyright (c) 2010
     IETF Trust and the persons identified as authors of the code.
     All rights reserved.

     Redistribution and use in source and binary forms, with or without
     modification, are permitted provided that the following conditions
     are met:
     - Redistributions of source code must retain the above copyright notice,
       this list of conditions and the following disclaimer.
     - Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.
     - Neither the name of Internet Society, IETF or IETF Trust, nor the names
       of specific contributors, may be used to endorse or promote products
       derived from this software without specific prior written permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
   AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
   ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
   LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
   CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
   SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
   CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
   ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
   POSSIBILITY OF SUCH DAMAGE.

   */

   /******************************************************************

     get_frame_info.c

     Retrieving frame information for IP-MR Speech Codec

   ******************************************************************/
   #define RATES_NUM       6   // number of codec rates
   #define SENSE_CLASSES   6   // number of sensitivity classes (A..F)

   // frame types
   #define FT_SPEECH       0   // active speech
   #define FT_DTX_SID      1   // silence insertion descriptor

   // get specified bit from coded data
   int GetBit(unsigned char *data, int curBit)
   {
     return ((data[curBit >> 3] >> (curBit % 8)) & 1);
   }

   // retrieve frame information
   int GetFrameInfo(           // o: frame size in bits
     short rate,               // i: encoding rate (0..5)
     short base_rate,          // i: base (core) layer rate,
                               //    if base_rate > rate, then assumed
                               //    that base_rate = rate.
     unsigned char *pCoded,    // i: coded bit frame
     short pLayerBits          // o: number of bits in layers
         [RATES_NUM],
     short pSenseBits          // o: number of bits in sensitivity classes
         [SENSE_CLASSES],
     short *nLayers            // o: number of layers
   )
   {
     static const short Bits_1[4]    = {0, 9, 9, 15};
     static const short Bits_2[16]   = { 43,50,36,31,46,48,40,44,47,43,44,
                                         45,43,44,47,36};
     static const short Bits_3[2][6] = {{13, 11, 23, 33, 36, 31},
                                        {25,  0, 23, 32, 36, 31},};

     int FrType;
     int i,nBits;

     if (rate < 0 || rate > 5) {
       return 0; // incorrect stream
     }

     for(i = 0; i < SENSE_CLASSES; i++) {
       pSenseBits[i] = 0;
     }

     nBits = 0;
     // extract frame type bit if required
     FrType = GetBit(pCoded, nBits++) ? FT_SPEECH : FT_DTX_SID;
     {
       int cw_0;
       int b[14];

       // extract meaning bits
       for(i = 0 ; i < 14; i++) {
           b[i] = GetBit(pCoded, nBits++);
       }

       // parse
       if(FrType == FT_DTX_SID) {
         cw_0 = (b[0]<<0)|(b[1]<<1)|(b[2]<<2)|(b[3]<<3);
         rate = 0;
         pSenseBits[0] = 10 + Bits_2[cw_0];
       } else {

         int idx;
         // code words are built by shifting, so they must start at zero
         int nFlag_1, nFlag_2, cw_1 = 0, cw_2 = 0;

         nFlag_1 = b[0] + b[2] + b[4] + b[6];
         cw_1 = (cw_1 << 1) | b[0];
         cw_1 = (cw_1 << 1) | b[2];
         cw_1 = (cw_1 << 1) | b[4];
         cw_1 = (cw_1 << 1) | b[6];

         nFlag_2 = b[1] + b[3] + b[5] + b[7];
         cw_2 = (cw_2 << 1) | b[1];
         cw_2 = (cw_2 << 1) | b[3];
         cw_2 = (cw_2 << 1) | b[5];
         cw_2 = (cw_2 << 1) | b[7];

         cw_0 = (b[10]<<0)|(b[11]<<1)|(b[12]<<2)|(b[13]<<3);
         if (base_rate < 0)    base_rate = 0;
         if (base_rate > rate) base_rate = rate;
         idx = base_rate == 0 ? 0 : 1;

         pSenseBits[0] = 15+Bits_2[cw_0];
         pSenseBits[1] = Bits_1[(cw_1 >> 0)&0x3] + Bits_1[(cw_1>>2)&0x3];
         pSenseBits[2] = nFlag_1*5;
         pSenseBits[3] = nFlag_2*30;
         pSenseBits[5] = (4 - nFlag_2)*(Bits_3[idx][0]);

         for (i = 1; i < rate+1; i++) {
           pLayerBits[i] = 4*(Bits_3[idx][i]);
         }
       }

       pLayerBits[0] = 0;
       for (i = 0; i < SENSE_CLASSES; i++) {
           pLayerBits[0] += pSenseBits[i];
       }

       *nLayers = rate+1;
     }

     {
       // count total frame size
       int payloadBitCount = 0;
       for (i = 0; i < *nLayers; i++) {
         payloadBitCount += pLayerBits[i];
       }
       return payloadBitCount;
     }
   }
