draft-ietf-avt-evrc-smv-02.txt   rfc3558.txt 
Internet Draft Adam H. Li Network Working Group A. Li
draft-ietf-avt-evrc-smv-02.txt UCLA Request for Comments: 3558 UCLA
June 7, 2002 Editor Category: Standards Track July 2003
Expires: December 7, 2002
RTP Payload Format for Enhanced Variable Rate Codecs (EVRC) and
Selectable Mode Vocoders (SMV)
STATUS OF THIS MEMO
This document is an Internet-Draft and is in full conformance with RTP Payload Format for Enhanced Variable Rate Codecs (EVRC)
all provisions of Section 10 of RFC 2026. and Selectable Mode Vocoders (SMV)
Internet-Drafts are working documents of the Internet Engineering Status of this Memo
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months This document specifies an Internet standards track protocol for the
and may be updated, replaced, or obsoleted by other documents at any Internet community, and requests discussion and suggestions for
time. It is inappropriate to use Internet- Drafts as reference improvements. Please refer to the current edition of the "Internet
material or to cite them other than as work in progress. Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
The list of current Internet-Drafts can be accessed at Copyright Notice
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at Copyright (C) The Internet Society (2003). All Rights Reserved.
http://www.ietf.org/shadow.html.
ABSTRACT Abstract
This document describes the RTP payload format for Enhanced Variable This document describes the RTP payload format for Enhanced Variable
Rate Codec (EVRC) Speech and Selectable Mode Vocoder (SMV) Speech. Rate Codec (EVRC) Speech and Selectable Mode Vocoder (SMV) Speech.
Two sub-formats are specified for different application scenarios. A Two sub-formats are specified for different application scenarios. A
bundled/interleaved format is included to reduce the effect of packet bundled/interleaved format is included to reduce the effect of packet
loss on speech quality and amortize the overhead of the RTP header loss on speech quality and amortize the overhead of the RTP header
over more than one speech frame. A non-bundled format is also over more than one speech frame. A non-bundled format is also
supported for conversational applications. supported for conversational applications.
Table of Contents Table of Contents
1. Introduction ................................................... 2 1. Introduction ................................................... 2
2. Background ..................................................... 2 2. Background ..................................................... 2
3. The Codecs Supported ........................................... 3 3. The Codecs Supported ........................................... 3
3.1. EVRC ......................................................... 3 3.1. EVRC ...................................................... 3
3.2. SMV .......................................................... 3 3.2. SMV ....................................................... 3
3.3. Other Frame-Based Vocoders ................................... 4 3.3. Other Frame-Based Vocoders ................................ 4
4. RTP/Vocoder Packet Format ...................................... 4 4. RTP/Vocoder Packet Format ...................................... 4
4.1. Interleaved/Bundled Packet Format ............................ 4 4.1. Interleaved/Bundled Packet Format ......................... 5
4.2. Header-Free Packet Format .................................... 6 4.2. Header-Free Packet Format ................................. 6
4.3. Determining the Format of Packets ............................ 6 4.3. Determining the Format of Packets ......................... 7
5. Packet Table of Contents Entries and Codec Data Frame Format ... 7 5. Packet Table of Contents Entries and Codec Data Frame Format ... 7
5.1. Packet Table of Contents entries ............................. 7 5.1. Packet Table of Contents entries .......................... 7
5.2. Codec Data Frames ............................................ 7 5.2. Codec Data Frames ......................................... 8
6. Interleaving Codec Data Frames ................................. 8 6. Interleaving Codec Data Frames ................................. 9
7. Bundling Codec Data Frames .................................... 11 7. Bundling Codec Data Frames .................................... 12
8. Handling Missing Codec Data Frames ............................ 11 8. Handling Missing Codec Data Frames ............................ 12
9. Implementation Issues ......................................... 11 9. Implementation Issues ......................................... 12
9.1. Interleaving Length ......................................... 11 9.1. Interleaving Length .......................................12
9.2. Validation of Received Packets .............................. 12 9.2. Validation of Received Packets ............................13
9.3. Processing the Late Packets ................................. 12 9.3. Processing the Late Packets ...............................13
10. Mode Request ................................................. 12 10. Mode Request ................................................. 13
11. Storage Format ............................................... 13 11. Storage Format ............................................... 14
12. IANA Considerations .......................................... 14 12. IANA Considerations .......................................... 15
12.1. Registration of Media Type EVRC ............................ 14 12.1. Registration of Media Type EVRC ..........................15
12.2. Registration of Media Type EVRC0 ........................... 15 12.2. Registration of Media Type EVRC0 .........................16
12.3. Registration of Media Type SMV ............................. 16 12.3. Registration of Media Type SMV ...........................17
12.4. Registration of Media Type SMV0 ............................ 17 12.4. Registration of Media Type SMV0 ..........................18
13. Mapping to SDP Parameters .................................... 18 13. Mapping to SDP Parameters .................................... 19
14. Security Considerations ...................................... 18 14. Security Considerations ...................................... 20
15. Adding Support of Other Frame-Based Vocoders ................. 19 15. Adding Support of Other Frame-Based Vocoders ................. 20
16. Acknowledgements ............................................. 19 16. Acknowledgements ............................................. 21
17. References ................................................... 20 17. References ................................................... 21
18. Authors' Address ............................................. 20 17.1 Normative ................................................ 21
17.2 Informative .............................................. 22
18. Author's Address ............................................. 22
19. Full Copyright Statement ..................................... 23
1. Introduction 1. Introduction
This document describes how speech compressed with EVRC [1] or SMV This document describes how speech compressed with EVRC [1] or SMV
[2] may be formatted for use as an RTP payload type. The format is [2] may be formatted for use as an RTP payload type. The format is
also extensible to other codecs that generate a similar set of frame also extensible to other codecs that generate a similar set of frame
types. Two methods are provided to packetize the codec data frames types. Two methods are provided to packetize the codec data frames
into RTP packets: an interleaved/bundled format and a zero-header into RTP packets: an interleaved/bundled format and a zero-header
format. The sender may choose the best format for each application format. The sender may choose the best format for each application
scenario, based on network conditions, bandwidth availability, delay scenario, based on network conditions, bandwidth availability, delay
skipping to change at page 3, line 23 skipping to change at page 3, line 20
This can simplify the protocol for transporting vocoder data frames This can simplify the protocol for transporting vocoder data frames
through RTP and reduce the complexity of implementations. through RTP and reduce the complexity of implementations.
3. The Codecs Supported 3. The Codecs Supported
3.1. EVRC 3.1. EVRC
The Enhanced Variable Rate Codec (EVRC) [1] compresses each 20 The Enhanced Variable Rate Codec (EVRC) [1] compresses each 20
milliseconds of 8000 Hz, 16-bit sampled speech input into output milliseconds of 8000 Hz, 16-bit sampled speech input into output
frames in one of the three different sizes: Rate 1 (171 bits), Rate frames in one of the three different sizes: Rate 1 (171 bits), Rate
1/2 (80 bits), or Rate 1/8 (16 bits). In addition, there are two zero 1/2 (80 bits), or Rate 1/8 (16 bits). In addition, there are two
bit codec frame types: null frames and erasure frames. Null frames zero bit codec frame types: null frames and erasure frames. Null
are produced as a result of the vocoder running at rate 0. Null frames are produced as a result of the vocoder running at rate 0.
frames are zero bits long and are normally not transmitted. Erasure Null frames are zero bits long and are normally not transmitted.
frames are the frames substituted by the receiver to the codec for Erasure frames are the frames substituted by the receiver to the
the lost or damaged frames. Erasure frames are also zero bits long codec for the lost or damaged frames. Erasure frames are also zero
and are normally not transmitted. bits long and are normally not transmitted.
The codec chooses the output frame rate based on analysis of the The codec chooses the output frame rate based on analysis of the
input speech and the current operating mode (either normal or one of input speech and the current operating mode (either normal or one of
several reduced rate modes). For typical speech patterns, this several reduced rate modes). For typical speech patterns, this
results in an average output of 4.2 kilobits/second for normal mode results in an average output of 4.2 kilobits/second for normal mode
and a lower average output for reduced rate modes. and a lower average output for reduced rate modes.
3.2. SMV 3.2. SMV
The Selectable Mode Vocoder (SMV) [2] compresses each 20 milliseconds The Selectable Mode Vocoder (SMV) [2] compresses each 20 milliseconds
of 8000 Hz, 16-bit sampled speech input into output frames of one of of 8000 Hz, 16-bit sampled speech input into output frames of one of
the four different sizes: Rate 1 (171 bits), Rate 1/2 (80 bits), Rate the four different sizes: Rate 1 (171 bits), Rate 1/2 (80 bits), Rate
1/4 (40 bits), or Rate 1/8 (16 bits). In addition, there are two zero 1/4 (40 bits), or Rate 1/8 (16 bits). In addition, there are two
bit codec frame types: null frames and erasure frames. Null frames zero bit codec frame types: null frames and erasure frames. Null
are produced as a result of the vocoder running at rate 0. Null frames are produced as a result of the vocoder running at rate 0.
frames are zero bits long and are normally not transmitted. Erasure Null frames are zero bits long and are normally not transmitted.
frames are the frames substituted by the receiver to the codec for Erasure frames are the frames substituted by the receiver to the
the lost or damaged frames. Erasure frames are also zero bits long codec for the lost or damaged frames. Erasure frames are also zero
and are normally not transmitted. bits long and are normally not transmitted.
The SMV codec can operate in four modes. Each mode may produce frames The SMV codec can operate in six modes. Each mode may produce frames
of any of the rates (full rate to 1/8 rate) for varying percentages of any of the rates (full rate to 1/8 rate) for varying percentages
of time, based on the characteristics of the speech samples and the of time, based on the characteristics of the speech samples and the
selected mode. The SMV mode can change on a frame-by-frame basis. The selected mode. The SMV mode can change on a
SMV codec does not need additional information other than the codec frame-by-frame basis. The SMV codec does not need additional
data frames to correctly decode the data of various modes; therefore, information other than the codec data frames to correctly decode the
the mode of the encoder does not need to be transmitted with the data of various modes; therefore, the mode of the encoder does not
encoded frames. need to be transmitted with the encoded frames.
The SMV codec chooses the output frame rate based on analysis of the The SMV codec chooses the output frame rate based on analysis of the
input speech and the current operating mode. For typical speech input speech and the current operating mode. For typical speech
patterns, this results in an average output of 4.2 kilobits/second patterns, this results in an average output of 4.2 kilobits/second
for Mode 0 in two-way conversation (approximately 50% active speech for Mode 0 in two-way conversation (approximately 50% active speech
time and 50% in eighth rate while listening) and lower for other time and 50% in eighth rate while listening) and lower for other
reduced rate modes. SMV is more bandwidth efficient than EVRC. EVRC reduced rate modes. SMV is more bandwidth efficient than EVRC. EVRC
is equivalent in performance to SMV mode 1. is equivalent in performance to SMV mode 1.
3.3. Other Frame-Based Vocoders 3.3. Other Frame-Based Vocoders
skipping to change at page 5, line 4 skipping to change at page 5, line 21
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP Header [4] | | RTP Header [4] |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|R|R| LLL | NNN | MMM | Count | TOC | ... | TOC |padding| |R|R| LLL | NNN | MMM | Count | TOC | ... | TOC |padding|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| one or more codec data frames, one per TOC entry | | one or more codec data frames, one per TOC entry |
| .... | | .... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The RTP header has the expected values as described in the RTP The RTP header has the expected values as described in the RTP
specification [4]. The RTP timestamp is in 1/8000 of a second units specification [4]. The RTP timestamp is in 1/8000 of a second units
for EVRC and SMV. For any other vocoders that use this packet format, for EVRC and SMV. For any other vocoders that use this packet
the timestamp unit needs to be defined explicitly. The M bit should format, the timestamp unit needs to be defined explicitly. The M bit
be set as specified in the applicable RTP profile, for example, RFC should be set as specified in the applicable RTP profile, for
1890 [5]. Note that RFC 1890 [5] specifies that if the sender does example, RFC 3551 [5]. Note that RFC 3551 [5] specifies that if the
not suppress silence, the M bit will always be zero. When multiple sender does not suppress silence, the M bit will always be zero.
codec data frames are present in a single RTP packet, the timestamp When multiple codec data frames are present in a single RTP packet,
is that of the oldest data represented in the RTP packet. The the timestamp is that of the oldest data represented in the RTP
assignment of an RTP payload type for this packet format is outside packet. The assignment of an RTP payload type for this packet format
the scope of this document; it is specified by the RTP profile under is outside the scope of this document; it is specified by the RTP
which this payload format is used. profile under which this payload format is used.
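The timestamp arithmetic implied above can be illustrated with a short sketch. It assumes the 20 millisecond frame duration given in Section 3 and a bundled packet without interleaving (LLL = 0), so that the frames in the packet are consecutive in time. The function name is hypothetical and used only for illustration.

```python
CLOCK_RATE_HZ = 8000   # RTP timestamp unit for EVRC and SMV: 1/8000 second
FRAME_MS = 20          # each codec data frame covers 20 ms of speech
TICKS_PER_FRAME = CLOCK_RATE_HZ * FRAME_MS // 1000  # 160 timestamp units


def frame_timestamps(rtp_timestamp: int, n_frames: int) -> list:
    """Timestamps of consecutive frames in one packet (assumes LLL = 0).

    The RTP timestamp belongs to the oldest frame; later frames follow at
    160-unit steps. RTP timestamps are 32 bits wide, hence the modulo.
    """
    return [(rtp_timestamp + i * TICKS_PER_FRAME) % 2**32
            for i in range(n_frames)]
```

With interleaving in use, the spacing between frames in a packet is different; this sketch does not cover that case.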
The first octet of an Interleaved/Bundled format packet is the The first octet of an Interleaved/Bundled format packet is the
Interleave Octet. The second octet contains the Mode Request and Interleave Octet. The second octet contains the Mode Request and
Frame Count fields. The Table of Contents (ToC) field then follows. Frame Count fields. The Table of Contents (ToC) field then follows.
The fields are specified as follows: The fields are specified as follows:
Reserved (RR): 2 bits Reserved (RR): 2 bits
Reserved bits. MUST be set to zero by sender, SHOULD be ignored Reserved bits. MUST be set to zero by sender, SHOULD be ignored
by receiver. by receiver.
skipping to change at page 5, line 38 skipping to change at page 6, line 12
bundling, a special case of interleaving. See Section 6 and bundling, a special case of interleaving. See Section 6 and
Section 7 for more detailed discussion. Section 7 for more detailed discussion.
Interleave Index (NNN): 3 bits Interleave Index (NNN): 3 bits
Indicates the index within an interleave group. MUST have a value Indicates the index within an interleave group. MUST have a value
less than or equal to the value of LLL. Values of NNN greater less than or equal to the value of LLL. Values of NNN greater
than the value of LLL are invalid. Packets with invalid NNN values than the value of LLL are invalid. Packets with invalid NNN values
SHOULD be ignored by the receiver. SHOULD be ignored by the receiver.
Mode Request (MMM): 3 bits Mode Request (MMM): 3 bits
The Mode Request field is used to signal Mode Request The Mode Request field is used to signal Mode Request information.
information. See Section 10 for details. See Section 10 for details.
Frame Count (Count): 5 bits Frame Count (Count): 5 bits
The number of ToC fields (and vocoder frames) present in the The number of ToC fields (and vocoder frames) present in the
packet is the value of the frame count field plus one. A value of packet is the value of the frame count field plus one. A value of
zero indicates that the packet contains one ToC field, while a zero indicates that the packet contains one ToC field, while a
value of 31 indicates that the packet contains 32 ToC fields. value of 31 indicates that the packet contains 32 ToC fields.
Padding (padding): 0 or 4 bits Padding (padding): 0 or 4 bits
This padding ensures that codec data frames start on an octet This padding ensures that codec data frames start on an octet
boundary. When the frame count is odd, the sender MUST add 4 bits boundary. When the frame count is odd, the sender MUST add 4 bits
skipping to change at page 6, line 21 skipping to change at page 6, line 45
packet using interleaving or bundling as described in Section 6 and packet using interleaving or bundling as described in Section 6 and
Section 7. Section 7.
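As an illustration of the field layout described in this section, the following sketch unpacks the Interleave Octet, the Mode Request and Frame Count octet, and the ToC entries. It is a minimal sketch, not a reference implementation: the class and function names are hypothetical, and the number of ToC octets is derived from the requirement that codec data frames start on an octet boundary (two 4-bit entries per octet, rounded up).

```python
from dataclasses import dataclass


@dataclass
class PayloadHeader:
    lll: int          # Interleave Length (3 bits)
    nnn: int          # Interleave Index (3 bits)
    mmm: int          # Mode Request (3 bits)
    frame_count: int  # number of ToC entries = Count field + 1
    toc: list         # one 4-bit frame type per codec data frame
    data_offset: int  # octet offset where the codec data frames begin


def parse_payload_header(payload: bytes) -> PayloadHeader:
    """Parse the Interleaved/Bundled payload header (RTP header removed)."""
    if len(payload) < 3:
        raise ValueError("payload too short for header and one ToC entry")
    first, second = payload[0], payload[1]
    # The two reserved bits at the top of the first octet are ignored.
    lll = (first >> 3) & 0x07
    nnn = first & 0x07
    if nnn > lll:
        # Invalid per the NNN definition; the caller should drop the packet.
        raise ValueError("invalid packet: NNN greater than LLL")
    mmm = (second >> 5) & 0x07
    frame_count = (second & 0x1F) + 1
    # Two ToC entries per octet; an odd number leaves 4 bits of padding.
    toc_octets = (frame_count + 1) // 2
    if len(payload) < 2 + toc_octets:
        raise ValueError("payload too short for the ToC")
    toc = []
    for i in range(frame_count):
        octet = payload[2 + i // 2]
        toc.append(octet >> 4 if i % 2 == 0 else octet & 0x0F)
    return PayloadHeader(lll, nnn, mmm, frame_count, toc, 2 + toc_octets)
```

The function raises an exception for malformed input so that a receiver can discard the packet rather than pass questionable data to the vocoder.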
4.2. Header-Free Packet Format 4.2. Header-Free Packet Format
The Header-Free Packet Format is designed for maximum bandwidth The Header-Free Packet Format is designed for maximum bandwidth
efficiency and low latency. Only one codec data frame can be sent in efficiency and low latency. Only one codec data frame can be sent in
each Header-Free format packet. None of the payload header fields each Header-Free format packet. None of the payload header fields
(LLL, NNN, MMM, Count) nor ToC entries are present. The codec rate (LLL, NNN, MMM, Count) nor ToC entries are present. The codec rate
for the data frame can be determined from the length of the codec for the data frame can be determined from the length of the codec
data frame, since there is only one codec data frame in each Header- data frame, since there is only one codec data frame in each
Free packet. Header-Free packet.
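A minimal sketch of how a receiver might infer the codec rate from the payload length, using the frame sizes listed in Section 5.1. The function name and the codec strings "EVRC" and "SMV" are hypothetical and chosen only for this illustration.

```python
# Payload length in octets -> codec rate, from the table in Section 5.1.
SIZE_TO_RATE = {2: "1/8", 5: "1/4", 10: "1/2", 22: "1"}


def header_free_rate(payload: bytes, codec: str) -> str:
    """Infer the rate of the single frame in a Header-Free packet."""
    rate = SIZE_TO_RATE.get(len(payload))
    if rate is None:
        raise ValueError(f"unexpected payload length {len(payload)}")
    if codec == "EVRC" and rate == "1/4":
        # EVRC has no Rate 1/4 frames, so a 5-octet payload is invalid.
        raise ValueError("Rate 1/4 is not valid for EVRC")
    return rate
```

Because zero-length Blank and Erasure frames cannot be told apart by length, the sketch treats an empty payload as an error.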
Use of the RTP header fields for Header-Free RTP/Vocoder Packet Use of the RTP header fields for Header-Free RTP/Vocoder Packet
Format is the same as described in Section 4.1 for Format is the same as described in Section 4.1 for
Interleaved/Bundled RTP/Vocoder Packet Format. The detailed format of Interleaved/Bundled RTP/Vocoder Packet Format. The detailed format
the codec data frame is specified in Section 5. of the codec data frame is specified in Section 5.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP Header [4] | | RTP Header [4] |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | | |
+ ONLY one codec data frame +-+-+-+-+-+-+-+-+ + ONLY one codec data frame +-+-+-+-+-+-+-+-+
| | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
4.3. Determining the Format of Packets 4.3. Determining the Format of Packets
All receivers SHOULD be able to process both packet formats. The All receivers SHOULD be able to process both packet formats. The
sender MAY choose to use one or both packet formats. sender MAY choose to use one or both packet formats.
A receiver MUST have prior knowledge of the packet format to A receiver MUST have prior knowledge of the packet format to
correctly decode the RTP packets. correctly decode the RTP packets. When packets of both formats are
When packets of both formats are used within the same session, used within the same session, different RTP payload type values MUST
different RTP payload type values MUST be used for each format to be used for each format to distinguish the packet formats. The
distinguish the packet formats. The association of payload type association of payload type number with the packet format is done
number with the packet format is done out-of-band, for example by SDP out-of-band, for example by SDP during the setup of a session.
during the setup of a session.
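The out-of-band association can be pictured as a lookup table that the receiver fills in during session setup. In the sketch below the payload type numbers 97 and 98 are hypothetical dynamic values, not assignments made by this document, and the function name is likewise illustrative.

```python
# Hypothetical payload type numbers; real values are negotiated via SDP.
PAYLOAD_FORMATS = {
    97: "interleaved/bundled",
    98: "header-free",
}


def packet_format(payload_type: int) -> str:
    """Return the packet format negotiated for an RTP payload type."""
    try:
        return PAYLOAD_FORMATS[payload_type]
    except KeyError:
        raise ValueError(f"unknown payload type {payload_type}") from None
```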
5. Packet Table of Contents Entries and Codec Data Frame Format 5. Packet Table of Contents Entries and Codec Data Frame Format
5.1. Packet Table of Contents entries 5.1. Packet Table of Contents entries
Each codec data frame in an Interleaved/Bundled packet has a Each codec data frame in an Interleaved/Bundled packet has a
corresponding Table of Contents (ToC) entry. The ToC entry indicates corresponding Table of Contents (ToC) entry. The ToC entry indicates
the rate of the codec frame. (Header-Free packets MUST NOT have a ToC the rate of the codec frame. (Header-Free packets MUST NOT have a
field.) ToC field.)
Each ToC entry occupies four bits. The format of the bits is Each ToC entry occupies four bits. The format of the bits is
indicated below: indicated below:
0 1 2 3 0 1 2 3
+-+-+-+-+ +-+-+-+-+
|fr type| |fr type|
+-+-+-+-+ +-+-+-+-+
Frame Type: 4 bits Frame Type: 4 bits
skipping to change at page 7, line 40 skipping to change at page 8, line 19
--------------------------------------------------------- ---------------------------------------------------------
0 Blank 0 (0 bit) 0 Blank 0 (0 bit)
1 1/8 2 (16 bits) 1 1/8 2 (16 bits)
2 1/4 5 (40 bits; not valid for EVRC) 2 1/4 5 (40 bits; not valid for EVRC)
3 1/2 10 (80 bits) 3 1/2 10 (80 bits)
4 1 22 (171 bits; 5 padded at end with zeros) 4 1 22 (171 bits; 5 padded at end with zeros)
5 Erasure 0 (SHOULD NOT be transmitted by sender) 5 Erasure 0 (SHOULD NOT be transmitted by sender)
All values not listed in the above table MUST be considered reserved. All values not listed in the above table MUST be considered reserved.
A ToC entry with a reserved Frame Type value SHOULD be considered A ToC entry with a reserved Frame Type value SHOULD be considered
invalid. Note that the EVRC codec does not have 1/4 rate frames, thus invalid. Note that the EVRC codec does not have 1/4 rate frames,
frame type value 2 MUST be considered a reserved value when the EVRC thus frame type value 2 MUST be considered a reserved value when the
codec is in use. EVRC codec is in use.
Other vocoders that use this packet format need to specify their own Other vocoders that use this packet format need to specify their own
table of frame types and corresponding codec data frames. table of frame types and corresponding codec data frames.
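The frame type table lends itself to a simple lookup. The sketch below maps each ToC frame type to its codec data frame size and uses that to split the frame area of an Interleaved/Bundled packet. It assumes the frames are stored back to back in ToC order; the function names and the codec strings "EVRC" and "SMV" are hypothetical.

```python
# Frame type -> codec data frame size in octets, per the table above.
FRAME_SIZES = {0: 0, 1: 2, 2: 5, 3: 10, 4: 22, 5: 0}


def frame_size(frame_type: int, codec: str) -> int:
    """Return the frame size in octets, rejecting reserved frame types."""
    if frame_type not in FRAME_SIZES:
        raise ValueError(f"reserved frame type {frame_type}")
    if codec == "EVRC" and frame_type == 2:
        raise ValueError("frame type 2 (Rate 1/4) is reserved for EVRC")
    return FRAME_SIZES[frame_type]


def split_frames(toc: list, data: bytes, codec: str) -> list:
    """Split the frame area into (frame_type, frame_octets) pairs."""
    frames = []
    pos = 0
    for frame_type in toc:
        size = frame_size(frame_type, codec)
        if pos + size > len(data):
            raise ValueError("packet truncated inside a codec data frame")
        frames.append((frame_type, data[pos:pos + size]))
        pos += size
    return frames
```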
5.2. Codec Data Frames 5.2. Codec Data Frames
The output of the vocoder MUST be converted into codec data frames The output of the vocoder MUST be converted into codec data frames
for inclusion in the RTP payload. The conversions for EVRC and SMV for inclusion in the RTP payload. The conversions for EVRC and SMV
codecs are specified below. (Note: Because the EVRC codec does not codecs are specified below. (Note: Because the EVRC codec does not
have Rate 1/4 frames, the specification of 1/4 frames does not apply have Rate 1/4 frames, the specification of 1/4 frames does not apply
skipping to change at page 10, line 6 skipping to change at page 10, line 45
the maximum acceptable bundling value B) they can handle in a single the maximum acceptable bundling value B) they can handle in a single
RTP packet using the OPTIONAL maxptime RTP mode parameter identified RTP packet using the OPTIONAL maxptime RTP mode parameter identified
in Section 12. in Section 12.
Receivers MAY signal the maximum interleave length (i.e., the maximum Receivers MAY signal the maximum interleave length (i.e., the maximum
acceptable LLL value in the Interleaving Octet) they will accept acceptable LLL value in the Interleaving Octet) they will accept
using the OPTIONAL maxinterleave RTP mode parameter identified in using the OPTIONAL maxinterleave RTP mode parameter identified in
Section 12. Section 12.
The parameters maxptime and maxinterleave are exchanged at the The parameters maxptime and maxinterleave are exchanged at the
initial setup of the session. In one-to-one sessions, the sender MUST initial setup of the session. In one-to-one sessions, the sender
respect these values set by the receiver, and MUST NOT MUST respect these values set by the receiver, and MUST NOT
interleave/bundle more packets than what the receiver signals that it interleave/bundle more packets than what the receiver signals that it
can handle. This ensures that the receiver can allocate a known can handle. This ensures that the receiver can allocate a known
amount of buffer space that will be sufficient for all amount of buffer space that will be sufficient for all
interleaving/bundling used in that session. During the session, the interleaving/bundling used in that session. During the session, the
sender may decrease the bundling value or interleaving length (so sender may decrease the bundling value or interleaving length (so
that less buffer space is required at the receiver), but never exceed that less buffer space is required at the receiver), but never exceed
the maximum value set by the receiver. This prevents the situation the maximum value set by the receiver. This prevents the situation
where a receiver needs to allocate more buffer space in the middle of where a receiver needs to allocate more buffer space in the middle of
a session but is unable to do so. a session but is unable to do so.
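A sender-side check of these limits might look like the following sketch. It assumes that maxptime is expressed in milliseconds, as is usual for SDP, and that each frame covers 20 milliseconds (Section 3). The function and parameter names are hypothetical.

```python
FRAME_MS = 20  # duration of one codec data frame


def within_receiver_limits(lll: int, frames_in_packet: int,
                           maxinterleave=None, maxptime=None) -> bool:
    """Check a planned packet against the receiver's signaled limits.

    A limit of None means the receiver did not signal that parameter.
    """
    if maxinterleave is not None and lll > maxinterleave:
        return False
    if maxptime is not None and frames_in_packet * FRAME_MS > maxptime:
        return False
    return True
```

A sender would run such a check before changing its interleaving length or bundling value in the middle of a session.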
skipping to change at page 10, line 34 skipping to change at page 11, line 24
than will fit in the MTU of the underlying network. than will fit in the MTU of the underlying network.
o Once beginning a session with a given maximum interleaving value o Once beginning a session with a given maximum interleaving value
set by maxinterleave in Section 12, MUST NOT increase the set by maxinterleave in Section 12, MUST NOT increase the
interleaving value (LLL) to exceed the maximum interleaving value interleaving value (LLL) to exceed the maximum interleaving value
that is signaled. that is signaled.
o MAY change the interleaving value, but MUST do so only between o MAY change the interleaving value, but MUST do so only between
interleave groups. interleave groups.
o Silence suppression MUST only be used between interleave groups. A o Silence suppression MUST only be used between interleave groups.
ToC with Frame Type 0 (Blank Frame, Section 5.1) MUST be used A ToC with Frame Type 0 (Blank Frame, Section 5.1) MUST be used
within interleaving groups if the codec outputs a blank frame. within interleaving groups if the codec outputs a blank frame.
The M bit in the RTP header is not set for these blank frames, as The M bit in the RTP header is not set for these blank frames, as
the stream is continuous in time. Because there is only one time the stream is continuous in time. Because there is only one time
stamp for each RTP packet, silence suppression used within an stamp for each RTP packet, silence suppression used within an
interleave group would cause ambiguities when reconstructing the interleave group would cause ambiguities when reconstructing the
speech at the receiver side, and thus is prohibited. speech at the receiver side, and thus is prohibited.
Given an RTP packet with sequence number S, interleave length (field Given an RTP packet with sequence number S, interleave length (field
LLL) L, interleave index value (field NNN) N, and bundling value B, LLL) L, interleave index value (field NNN) N, and bundling value B,
the interleave group consists of this RTP packet and other RTP the interleave group consists of this RTP packet and other RTP
skipping to change at page 11, line 30 skipping to change at page 12, line 30
indicated by maxptime (see Section 12) if it is signaled. indicated by maxptime (see Section 12) if it is signaled.
o SHOULD NOT bundle more codec data frames in a single RTP packet o SHOULD NOT bundle more codec data frames in a single RTP packet
than will fit in the MTU of the underlying network. than will fit in the MTU of the underlying network.
8. Handling Missing Codec Data Frames 8. Handling Missing Codec Data Frames
The vocoders covered by this payload format support erasure frames as The vocoders covered by this payload format support erasure frames as
an indication when frames are not available. The erasure frames are an indication when frames are not available. The erasure frames are
normally used internally by a receiver to advance the state of the normally used internally by a receiver to advance the state of the
voice decoder by exactly one frame time for each missing frame. Using voice decoder by exactly one frame time for each missing frame.
the information from packet sequence number, time stamp, and the M Using the information from packet sequence number, time stamp, and
bit, the receiver can detect missing codec data frames from RTP the M bit, the receiver can detect missing codec data frames from RTP
packet loss and/or silence suppression, and generate corresponding packet loss and/or silence suppression, and generate corresponding
erasure frames. Erasure frames MUST also be used in storage format to erasure frames. Erasure frames MUST also be used in storage format
record missing frames. to record missing frames.
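The gap detection described above can be sketched for the simple case of in-order packets without interleaving. The sketch assumes 160 timestamp units per frame (20 milliseconds at 8000 Hz) and does not distinguish packet loss from silence suppression; the function name is hypothetical.

```python
TICKS_PER_FRAME = 160  # 20 ms at the 8000 Hz RTP clock


def erasures_needed(prev_ts: int, cur_ts: int, frames_in_prev: int = 1) -> int:
    """Number of erasure frames to feed the decoder between two packets.

    prev_ts and cur_ts are the RTP timestamps of two packets received in
    order; frames_in_prev is the number of frames the earlier one carried.
    """
    gap = (cur_ts - prev_ts) % 2**32  # tolerate 32-bit timestamp wraparound
    return max(gap // TICKS_PER_FRAME - frames_in_prev, 0)
```

A receiver would pass one erasure frame to the vocoder for each missing frame so that the decoder state advances by exactly one frame time per gap.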
9. Implementation Issues 9. Implementation Issues
9.1. Interleaving Length 9.1. Interleaving Length
The vocoder interpolates the missing speech content when given an The vocoder interpolates the missing speech content when given an
erasure frame. However, the best quality is perceived by the listener erasure frame. However, the best quality is perceived by the
when erasure frames are not consecutive. This makes interleaving listener when erasure frames are not consecutive. This makes
desirable as it increases speech quality when packet loss occurs. interleaving desirable as it increases speech quality when packet
loss occurs.
On the other hand, interleaving can greatly increase the end-to-end On the other hand, interleaving can greatly increase the end-to-end
delay. Where an interactive session is desired, either delay. Where an interactive session is desired, either
Interleaved/Bundled packet format with interleaving length (field Interleaved/Bundled packet format with interleaving length (field
LLL) 0 or Header-Free packet format is RECOMMENDED. LLL) 0 or Header-Free packet format is RECOMMENDED.
When end-to-end delay is not a primary concern, an interleaving When end-to-end delay is not a primary concern, an interleaving
length (field LLL) of 4 or 5 is RECOMMENDED as it offers a reasonable length (field LLL) of 4 or 5 is RECOMMENDED as it offers a reasonable
compromise between robustness and latency. compromise between robustness and latency.
skipping to change at page 12, line 35 skipping to change at page 13, line 39
9.3. Processing the Late Packets 9.3. Processing the Late Packets
Assume that the receiver has begun playing frames from an interleave Assume that the receiver has begun playing frames from an interleave
group. The time has come to play frame x from packet n of the group. The time has come to play frame x from packet n of the
interleave group. Further assume that packet n of the interleave interleave group. Further assume that packet n of the interleave
group has not been received. As described in Section 8, an erasure group has not been received. As described in Section 8, an erasure
frame will be sent to the receiving vocoder. frame will be sent to the receiving vocoder.
Now, assume that packet n of the interleave group arrives before Now, assume that packet n of the interleave group arrives before
frame x+1 of that packet is needed. Receivers should use frame x+1 of frame x+1 of that packet is needed. Receivers should use frame x+1
the newly received packet n rather than substituting an erasure of the newly received packet n rather than substituting an erasure
frame. In other words, just because packet n was not available the frame. In other words, just because packet n was not available the
first time it was needed to reconstruct the interleaved speech, the first time it was needed to reconstruct the interleaved speech, the
receiver should not assume it is not available when it is receiver should not assume it is not available when it is
subsequently needed for interleaved speech reconstruction. subsequently needed for interleaved speech reconstruction.
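The behaviour described above can be sketched as a receive buffer that looks up the packet each time a frame is due, instead of remembering that the packet was missing earlier. This is only a minimal sketch; the class, its method names, and the use of None to stand for an erasure frame are assumptions made for illustration.

```python
ERASURE = None  # placeholder that the caller maps to an erasure frame


class InterleaveGroup:
    """Holds the packets received so far for one interleave group."""

    def __init__(self):
        self.packets = {}  # packet index n -> list of frames in that packet

    def on_packet(self, n, frames):
        # Keep a late packet even if one of its frames was already replaced
        # by an erasure frame; its later frames may still arrive in time.
        self.packets[n] = frames

    def frame_for_playout(self, n, x):
        # Check availability at every playout time rather than caching the
        # fact that packet n was missing when an earlier frame was needed.
        frames = self.packets.get(n)
        if frames is None or x >= len(frames):
            return ERASURE
        return frames[x]
```

With this structure, a call for frame x of packet n made before the packet arrives yields an erasure, while a later call for frame x+1 returns the real frame once the packet has been stored.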
10. Mode Request 10. Mode Request
The Mode Request signal requests a particular encoding mode for the The Mode Request signal requests a particular encoding mode for the
speech encoding in the reverse direction. All implementations are speech encoding in the reverse direction. All implementations are
RECOMMENDED to honor the Mode Request signal. The Mode Request signal RECOMMENDED to honor the Mode Request signal. The Mode Request
SHOULD only be used in one-to-one sessions. In multiparty sessions, signal SHOULD only be used in one-to-one sessions. In multi-party
any received Mode Request signals SHOULD be ignored. sessions, any received Mode Request signals SHOULD be ignored.
In addition, the Mode Request signal MAY also be sent through non-RTP In addition, the Mode Request signal MAY also be sent through non-RTP
means, which is out of the scope of this specification. means, which is out of the scope of this specification.
The three-bit Mode Request field is used to signal the receiver to The three-bit Mode Request field is used to signal the receiver to
set a particular encoding mode to its audio encoder. If the Mode set a particular encoding mode to its audio encoder. If the Mode
Request field is set to a non-zero value in RTP packets from node A Request field is set to a valid value in RTP packets from node A to
to node B, it is a request for node B to change to the requested node B, it is a request for node B to change to the requested
encoding mode for its audio encoder and therefore the bit rate of the encoding mode for its audio encoder and therefore the bit rate of the
RTP stream from node B to node A. Once a node sets this field to a RTP stream from node B to node A. Once a node sets this field to a
non-zero value it SHOULD continue to set the field to the same value value, it SHOULD continue to set the field to the same value in
in subsequent packets until the requested mode has changed. This subsequent packets until the requested mode is different. This
design helps to eliminate the scenario of getting the codec stuck in design helps to eliminate the scenario of getting the codec stuck in
an unintended state if one of the packets that carries the Mode an unintended state if one of the packets that carries the Mode
Request is lost. An otherwise silent node MAY send an RTP packet Request is lost. An otherwise silent node MAY send an RTP packet
containing a blank frame in order to send a Mode Request. containing a blank frame in order to send a Mode Request.
Each codec type using this format SHOULD define its own Each codec type using this format SHOULD define its own
interpretation of the Mode Request field. Codecs SHOULD follow the interpretation of the Mode Request field. Codecs SHOULD follow the
convention that higher values of the three-bit field correspond to an convention that higher values of the three-bit field correspond to an
equal or lower average output bit rate. equal or lower average output bit rate.
For the EVRC codec, the Mode Request field MUST be interpreted For the EVRC codec, the Mode Request field MUST be interpreted
according to Tables 2.2.1.2-1 and 2.2.1.2-2 of the EVRC codec according to Tables 2.2.1.2-1 and 2.2.1.2-2 of the EVRC codec
specifications [1]. Values above '100' (4) are currently reserved. specifications [1].
If an unknown value above '100' (4) is received, it MUST be handled
as if '100' (4) were received, for interoperability with potential
future revisions.
For SMV codec, the Mode Request field MUST be interpreted according For SMV codec, the Mode Request field MUST be interpreted according
to Table 2.2-2 of the SMV codec specifications [2]. Values above to Table 2.2-2 of the SMV codec specifications [2].
'101' (5) are currently reserved. If an unknown value above '101' (5)
is received, it MUST be handled as if '101' (5) were received, also
for interoperability with potential future revisions.
11. Storage Format 11. Storage Format
The storage format is used for storing speech frames, e.g., as a file The storage format is used for storing speech frames, e.g., as a file
or e-mail attachment. or e-mail attachment.
The file begins with a magic number to identify the vocoder that is The file begins with a magic number to identify the vocoder that is
used. The magic number for EVRC corresponds to the ASCII character used. The magic number for EVRC corresponds to the ASCII character
string "#!EVRC\n", i.e., "0x23 0x21 0x45 0x56 0x52 0x43 0x0A". The string "#!EVRC\n", i.e., "0x23 0x21 0x45 0x56 0x52 0x43 0x0A". The
magic number for SMV corresponds to the ASCII character string magic number for SMV corresponds to the ASCII character string
skipping to change at page 14, line 7 skipping to change at page 15, line 7
frame. The ToC field is extended to one octet by setting the four frame. The ToC field is extended to one octet by setting the four
most significant bits of the octet to zero. For example, a ToC value most significant bits of the octet to zero. For example, a ToC value
of 4 (a full-rate frame) is stored as 0x04. of 4 (a full-rate frame) is stored as 0x04.
Speech frames lost in transmission and non-received frames MUST be Speech frames lost in transmission and non-received frames MUST be
stored as erasure frames (frame type 5, see definition in Section stored as erasure frames (frame type 5, see definition in Section
5.1) to maintain synchronization with the original media. 5.1) to maintain synchronization with the original media.
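To make the storage layout concrete, the sketch below writes a magic number followed by a sequence of frames, each prefixed by its ToC value extended to one octet. It assumes that each stored frame is the ToC octet followed directly by the frame's data octets, and that an erasure frame carries no data octets; both points, the function name, and the placeholder frame bytes in the usage lines are illustrative assumptions rather than text from this section.

```python
import io

MAGIC = {"EVRC": b"#!EVRC\n", "SMV": b"#!SMV\n"}
TOC_ERASURE = 5  # frame type 5, stored for lost or non-received frames


def write_storage(out, codec, frames):
    """Write frames given as (toc, data) pairs; None marks a missing frame."""
    out.write(MAGIC[codec])
    for frame in frames:
        # Assumption: an erasure frame is stored with no data octets.
        toc, data = (TOC_ERASURE, b"") if frame is None else frame
        if not 0 <= toc <= 0x0F:
            raise ValueError("ToC value must fit in four bits")
        # The four most significant bits of the stored octet are zero.
        out.write(bytes([toc]))
        out.write(data)


# Usage: one frame with ToC value 4 (placeholder data, not a real
# full-rate frame) followed by one missing frame.
buf = io.BytesIO()
write_storage(buf, "EVRC", [(4, b"\xAA\xBB"), None])
```

Writing an erasure entry for every missing frame keeps the stored frame count aligned with the original media timeline, which is the synchronization property the text above requires.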
12. IANA Considerations 12. IANA Considerations
Four new MIME sub-types as described in this section are to be Four new MIME sub-types as described in this section have been
registered. registered by the IANA.
The MIME-names for the EVRC and SMV codec are allocated from the IETF The MIME-names for the EVRC and SMV codec are allocated from the IETF
tree since all the vocoders covered are expected to be widely used tree since all the vocoders covered are expected to be widely used
for Voice-over-IP applications. for Voice-over-IP applications.
12.1. Registration of Media Type EVRC 12.1. Registration of Media Type EVRC
Media Type Name: audio Media Type Name: audio
Media Subtype Name: EVRC Media Subtype Name: EVRC
Required Parameter: none Required Parameter: none
Optional parameters: Optional parameters:
The following parameters apply to RTP transfer only. The following parameters apply to RTP transfer only.
ptime: Defined as usual for RTP audio RFC 2327. ptime: Defined as usual for RTP audio (see RFC 2327).
maxptime: The maximum amount of media which can be encapsulated maxptime: The maximum amount of media which can be encapsulated in
in each packet, expressed as time in milliseconds. The time each packet, expressed as time in milliseconds. The time SHALL
SHALL be calculated as the sum of the time the media present be calculated as the sum of the time the media present in the
in the packet represents. The time SHOULD be a multiple of the packet represents. The time SHOULD be a multiple of the
duration of a single codec data frame (20 msec). If not duration of a single codec data frame (20 msec). If not
signaled, the default maxptime value SHALL be 200 signaled, the default maxptime value SHALL be 200 milliseconds.
milliseconds.
maxinterleave: Maximum number for interleaving length (field LLL maxinterleave: Maximum number for interleaving length (field LLL
in the Interleaving Octet). The interleaving lengths used in in the Interleaving Octet). The interleaving lengths used in
the entire session MUST NOT exceed this maximum value. If not the entire session MUST NOT exceed this maximum value. If not
signaled, the maxinterleave length SHALL be 5. signaled, the maxinterleave length SHALL be 5.
Encoding considerations: Encoding considerations:
This type is defined for transfer of EVRC-encoded data via RTP This type is defined for transfer of EVRC-encoded data via RTP
using the Interleaved/Bundled packet format specified in Sections using the Interleaved/Bundled packet format specified in Sections
4.1, 6, and 7 of RFC xxxx. It is also defined for other transfer 4.1, 6, and 7 of RFC 3558. It is also defined for other transfer
methods using the storage format specified in Section 11 of RFC methods using the storage format specified in Section 11 of RFC
xxxx. 3558.
Security considerations: Security considerations:
See Section 14 "Security Considerations" of RFC xxxx. See Section 14 "Security Considerations" of RFC 3558.
Public specification: Public specification:
The EVRC vocoder is specified in 3GPP2 C.S0014. The EVRC vocoder is specified in 3GPP2 C.S0014. Transfer methods
Transfer methods are specified in RFC xxxx. are specified in RFC 3558.
Additional information: Additional information:
The following information applies for storage format only. The following information applies for storage format only.
Magic number: #!EVRC\n (see Section 11 of RFC xxxx) Magic number: #!EVRC\n (see Section 11 of RFC 3558)
File extensions: evc, EVC File extensions: evc, EVC
Macintosh file type code: none Macintosh file type code: none
Object identifier or OID: none Object identifier or OID: none
Intended usage: Intended usage:
COMMON. It is expected that many VoIP applications (as well as COMMON. It is expected that many VoIP applications (as well as
mobile applications) will use this type. mobile applications) will use this type.
Person & email address to contact for further information: Person & email address to contact for further information:
Adam Li Adam Li
skipping to change at page 15, line 39 skipping to change at page 16, line 39
Media Subtype Name: EVRC0 Media Subtype Name: EVRC0
Required Parameters: none Required Parameters: none
Optional parameters: none Optional parameters: none
Encoding considerations: none Encoding considerations: none
This type is only defined for transfer of EVRC-encoded data via This type is only defined for transfer of EVRC-encoded data via
RTP using the Header-Free packet format specified in Section 4.2 RTP using the Header-Free packet format specified in Section 4.2
of RFC xxxx. of RFC 3558.
Security considerations: Security considerations:
See Section 14 "Security Considerations" of RFC xxxx. See Section 14 "Security Considerations" of RFC 3558.
Public specification: Public specification:
The EVRC vocoder is specified in 3GPP2 C.S0014. The EVRC vocoder is specified in 3GPP2 C.S0014. Transfer methods
Transfer methods are specified in RFC xxxx. are specified in RFC 3558.
Additional information: none Additional information: none
Intended usage: Intended usage:
COMMON. It is expected that many VoIP applications (as well as COMMON. It is expected that many VoIP applications (as well as
mobile applications) will use this type. mobile applications) will use this type.
Person & email address to contact for further information: Person & email address to contact for further information:
Adam Li Adam Li
adamli@icsl.ucla.edu adamli@icsl.ucla.edu
skipping to change at page 16, line 20 skipping to change at page 17, line 25
Media Type Name: audio Media Type Name: audio
Media Subtype Name: SMV Media Subtype Name: SMV
Required Parameter: none Required Parameter: none
Optional parameters: Optional parameters:
The following parameters apply to RTP transfer only. The following parameters apply to RTP transfer only.
ptime: Defined as usual for RTP audio 2327. ptime: Defined as usual for RTP audio (see RFC 2327).
maxptime: The maximum amount of media which can be encapsulated maxptime: The maximum amount of media which can be encapsulated
in each packet, expressed as time in milliseconds. The time in each packet, expressed as time in milliseconds. The time
SHALL be calculated as the sum of the time the media present SHALL be calculated as the sum of the time the media present
in the packet represents. The time SHOULD be a multiple of the in the packet represents. The time SHOULD be a multiple of the
duration of a single codec data frame (20 msec). If not duration of a single codec data frame (20 msec). If not
signaled, the default maxptime value SHALL be 200 signaled, the default maxptime value SHALL be 200
milliseconds. milliseconds.
maxinterleave: Maximum number for interleaving length (field LLL maxinterleave: Maximum number for interleaving length (field LLL
in the Interleaving Octet). The interleaving lengths used in in the Interleaving Octet). The interleaving lengths used in
the entire session MUST NOT exceed this maximum value. If not the entire session MUST NOT exceed this maximum value. If not
signaled, the maxinterleave length SHALL be 5. signaled, the maxinterleave length SHALL be 5.
Encoding considerations: Encoding considerations:
This type is defined for transfer of SMV-encoded data via RTP This type is defined for transfer of SMV-encoded data via RTP
using the Interleaved/Bundled packet format specified in Section using the Interleaved/Bundled packet format specified in Section
4.1, 6, and 7 of RFC xxxx. It is also defined for other transfer 4.1, 6, and 7 of RFC 3558. It is also defined for other transfer
methods using the storage format specified in Section 11 of RFC methods using the storage format specified in Section 11 of RFC
xxxx. 3558.
Security considerations: Security considerations:
See Section 14 "Security Considerations" of RFC xxxx. See Section 14 "Security Considerations" of RFC 3558.
Public specification: Public specification:
The SMV vocoder is specified in 3GPP2 C.S0030-0 v2.0. The SMV vocoder is specified in 3GPP2 C.S0030-0 v2.0.
Transfer methods are specified in RFC xxxx. Transfer methods are specified in RFC 3558.
Additional information: Additional information:
The following information applies to storage format only. The following information applies to storage format only.
Magic number: #!SMV\n (see Section 11 of RFC xxxx) Magic number: #!SMV\n (see Section 11 of RFC 3558)
File extensions: smv, SMV File extensions: smv, SMV
Macintosh file type code: none Macintosh file type code: none
Object identifier or OID: none Object identifier or OID: none
Intended usage: Intended usage:
COMMON. It is expected that many VoIP applications (as well as COMMON. It is expected that many VoIP applications (as well as
mobile applications) will use this type. mobile applications) will use this type.
Person & email address to contact for further information: Person & email address to contact for further information:
Adam Li Adam Li
adamli@icsl.ucla.edu adamli@icsl.ucla.edu
Author/Change controller: Author/Change controller:
Adam Li Adam Li
skipping to change at page 17, line 28 skipping to change at page 18, line 37
Media Type Name: audio Media Type Name: audio
Media Subtype Name: SMV0 Media Subtype Name: SMV0
Required Parameter: none Required Parameter: none
Optional parameters: none Optional parameters: none
Encoding considerations: none Encoding considerations: none
This type is only defined for transfer of SMV-encoded data via This type is only defined for transfer of SMV-encoded data via RTP
RTP using the Header-Free packet format specified in Section 4.2 using the Header-Free packet format specified in Section 4.2 of
of RFC xxxx. RFC 3558.
Security considerations: Security considerations:
See Section 14 "Security Considerations" of RFC xxxx. See Section 14 "Security Considerations" of RFC 3558.
Public specification: Public specification:
The SMV vocoder is specified in 3GPP2 C.S0030-0 v2.0. The SMV vocoder is specified in 3GPP2 C.S0030-0 v2.0. Transfer
Transfer methods are specified in RFC xxxx. methods are specified in RFC 3558.
Additional information: none Additional information: none
Intended usage: Intended usage:
COMMON. It is expected that many VoIP applications (as well as COMMON. It is expected that many VoIP applications (as well as
mobile applications) will use this type. mobile applications) will use this type.
Person & email address to contact for further information: Person & email address to contact for further information:
Adam Li Adam Li
adamli@icsl.ucla.edu adamli@icsl.ucla.edu
skipping to change at page 18, line 22 skipping to change at page 19, line 32
is as follows: is as follows:
o The MIME type ("audio") goes in SDP "m=" as the media name. o The MIME type ("audio") goes in SDP "m=" as the media name.
o The MIME subtype (payload format name) goes in SDP "a=rtpmap" o The MIME subtype (payload format name) goes in SDP "a=rtpmap"
as the encoding name. as the encoding name.
o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" o The parameters "ptime" and "maxptime" go in the SDP "a=ptime"
and "a=maxptime" attributes, respectively. and "a=maxptime" attributes, respectively.
o The parameter "maxinterleave" goes in the SDP "a=fmtp" o The parameter "maxinterleave" goes in the SDP "a=fmtp"
attribute by copying it directly from the MIME media type string attribute by copying it directly from the MIME media type
as "maxinterleave=value". string as "maxinterleave=value".
Some examples of SDP session descriptions for EVRC and SMV encodings Some examples of SDP session descriptions for EVRC and SMV encodings
follow below. follow below.
Example of usage of EVRC: Example of usage of EVRC:
m=audio 49120 RTP/AVP 97 m=audio 49120 RTP/AVP 97
a=rtpmap:97 EVRC a=rtpmap:97 EVRC/8000
a=fmtp:97 maxinterleave=2 a=fmtp:97 maxinterleave=2
a=maxptime:80 a=maxptime:80
Example of usage of SMV Example of usage of SMV
m=audio 49122 RTP/AVP 99 m=audio 49122 RTP/AVP 99
a=rtpmap:99 SMV0 a=rtpmap:99 SMV0/8000
a=fmtp:99 a=fmtp:99
Note that the payload format (encoding) names are commonly shown in Note that the payload format (encoding) names are commonly shown in
upper case. MIME subtypes are commonly shown in lower case. These upper case. MIME subtypes are commonly shown in lower case. These
names are case-insensitive in both places. Similarly, parameter names names are case-insensitive in both places. Similarly, parameter
are case-insensitive both in MIME types and in the default mapping to names are case-insensitive both in MIME types and in the default
the SDP a=fmtp attribute. mapping to the SDP a=fmtp attribute.
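The mapping rules in this section can be sketched as a small helper that turns the MIME parameters into SDP lines. The function name and its arguments are hypothetical. The default clock rate of 8000 follows the "EVRC/8000" form of the rtpmap example, and the order of the attribute lines simply mirrors that example.

```python
def sdp_media_lines(port, pt, subtype, clock_rate=8000,
                    ptime=None, maxptime=None, maxinterleave=None):
    """Build SDP lines for one audio stream from the MIME parameters."""
    lines = [
        f"m=audio {port} RTP/AVP {pt}",           # MIME type goes in "m="
        f"a=rtpmap:{pt} {subtype}/{clock_rate}",  # subtype is the encoding name
    ]
    if maxinterleave is not None:
        # Copied directly from the MIME media type string into "a=fmtp".
        lines.append(f"a=fmtp:{pt} maxinterleave={maxinterleave}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```

Calling sdp_media_lines(49120, 97, "EVRC", maxptime=80, maxinterleave=2) reproduces the four lines of the EVRC example above.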
14. Security Considerations 14. Security Considerations
RTP packets using the payload format defined in this specification RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP are subject to the security considerations discussed in the RTP
specification [4], and any appropriate profile (for example [5]). specification [4], and any appropriate profile (for example [5]).
This implies that confidentiality of the media streams is achieved by This implies that confidentiality of the media streams is achieved by
encryption. Because the data compression used with this payload encryption. Because the data compression used with this payload
format is applied end-to-end, encryption may be performed after format is applied end-to-end, encryption may be performed after
compression so there is no conflict between the two operations. compression so there is no conflict between the two operations.
A potential denial-of-service threat exists for data encoding using A potential denial-of-service threat exists for data encoding using
compression techniques that have non-uniform receiver-end compression techniques that have non-uniform receiver-end
computational load. The attacker can inject pathological datagrams computational load. The attacker can inject pathological datagrams
into the stream which are complex to decode and cause the receiver to into the stream which are complex to decode and cause the receiver to
become overloaded. However, the encodings covered in this document do become overloaded. However, the encodings covered in this document
not exhibit any significant non-uniformity. do not exhibit any significant non-uniformity.
As with any IP-based protocol, in some circumstances, a receiver may As with any IP-based protocol, in some circumstances, a receiver may
be overloaded simply by the receipt of too many packets, either be overloaded simply by the receipt of too many packets, either
desired or undesired. Network-layer authentication may be used to desired or undesired. Network-layer authentication may be used to
discard packets from undesired sources, but the processing cost of discard packets from undesired sources, but the processing cost of
the authentication itself may be too high. In a multicast the authentication itself may be too high. In a multicast
environment, pruning of specific sources may be implemented in environment, pruning of specific sources may be implemented in future
future versions of IGMP [7] and in multicast routing protocols to versions of IGMP [7] and in multicast routing protocols to allow a
allow a receiver to select which sources are allowed to reach it. receiver to select which sources are allowed to reach it.
Interleaving may affect encryption. Depending on the used encryption Interleaving may affect encryption. Depending on the used encryption
scheme there may be restrictions on for example the time when keys scheme there may be restrictions on, for example, the time when keys
can be changed. Specifically, the key change may need to occur at the can be changed. Specifically, the key change may need to occur at
boundary between interleave groups. the boundary between interleave groups.
15. Adding Support of Other Frame-Based Vocoders 15. Adding Support of Other Frame-Based Vocoders
As described above, the RTP packet format defined in this document is As described above, the RTP packet format defined in this document is
very flexible and designed to be usable by other frame-based very flexible and designed to be usable by other frame-based
vocoders. vocoders.
Additional vocoders using this format MUST have properties as Additional vocoders using this format MUST have properties as
described in Section 3.3. described in Section 3.3.
For an eligible vocoder to use the payload format mechanisms defined For an eligible vocoder to use the payload format mechanisms defined
in this document, a new RTP payload format document needs to be in this document, a new RTP payload format document needs to be
published as a standards track RFC. That document can simply refer to published as a standards track RFC. That document can simply refer
this document and then specify the following parameters: to this document and then specify the following parameters:
o Define the unit used for RTP time stamp; o Define the unit used for RTP time stamp;
o Define the meaning of the Mode Request bits; o Define the meaning of the Mode Request bits;
o Define corresponding codec data frame type values for ToC; o Define corresponding codec data frame type values for ToC;
o Define the conversion procedure for vocoders output data frame; o Define the conversion procedure for vocoders output data frame;
o Define a magic number for storage format, and complete the o Define a magic number for storage format, and complete the
corresponding MIME registration. corresponding MIME registration.
16. Acknowledgements 16. Acknowledgements
The following authors have made significant contributions to this The following authors have made significant contributions to this
document: Adam H. Li, John D. Villasenor, Dong-Seek Park, Jeong-Hoon document: Adam H. Li, John D. Villasenor, Dong-Seek Park, Jeong-Hoon
Park, Keith Miller, S. Craig Greer, David Leon, Nikolai Leung, Park, Keith Miller, S. Craig Greer, David Leon, Nikolai Leung,
Marcello Lioy, Kyle J. McKay, Magdalena L. Espelien, Randall Gellens, Marcello Lioy, Kyle J. McKay, Magdalena L. Espelien, Randall Gellens,
Tom Hiller, Peter J. McCann, Stinson S. Mathai, Michael D. Turner, Tom Hiller, Peter J. McCann, Stinson S. Mathai, Michael D. Turner,
Ajay Rajkumar, Dan Gal, Magnus Westerlund, Lars-Erik Jonsson, Greg Ajay Rajkumar, Dan Gal, Magnus Westerlund, Lars-Erik Jonsson, Greg
Sherwood, and Thomas Zeng. Sherwood, and Thomas Zeng.
17. References 17. References
17.1 Normative
[1] 3GPP2 C.S0014, "Enhanced Variable Rate Codec, Speech Service [1] 3GPP2 C.S0014, "Enhanced Variable Rate Codec, Speech Service
Option 3 for Wideband Spread Spectrum Digital Systems", January Option 3 for Wideband Spread Spectrum Digital Systems", January
1997. 1997.
[2] 3GPP2 C.S0030-0 v2.0, "Selectable Mode Vocoder, Service Option [2] 3GPP2 C.S0030-0 v2.0, "Selectable Mode Vocoder, Service Option
for Wideband Spread Spectrum Communication Systems", May 2002. for Wideband Spread Spectrum Communication Systems", May 2002.
[3] Bradner, S., "Key words for use in RFCs to Indicate Requirement [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997. Levels", BCP 14, RFC 2119, March 1997.
[4] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, [4] Schulzrinne, H., Casner, S., Jacobson, V. and R. Frederick,
"RTP: A Transport Protocol for Real-Time Applications", RFC "RTP: A Transport Protocol for Real-Time Applications", RFC
1889, January 1996. 3550, July 2003.
[5] Schulzrinne, H., "RTP Profile for Audio and Video Conferences [5] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
with Minimal Control", RFC 1890, January 1996. Conferences with Minimal Control", RFC 3551, July 2003.
[6] M. Handley and V. Jacobson, "SDP: Session Description Protocol", [6] Handley, M. and V. Jacobson, "SDP: Session Description
RFC 2327, April 1998. Protocol", RFC 2327, April 1998.
17.2 Informative
[7] Deering, S., "Host Extensions for IP Multicasting", STD 5, RFC [7] Deering, S., "Host Extensions for IP Multicasting", STD 5, RFC
1112, August 1989. 1112, August 1989.
18. Authors' Address 18. Author's Address
The editor will serve as the point of contact for technical issues.
Adam H. Li Adam H. Li
Image Communication Lab Image Communication Lab
Electrical Engineering Department Electrical Engineering Department
University of California University of California
Los Angeles, CA 90095 Los Angeles, CA 90095
USA USA
Phone: +1 310 825 5178 Phone: +1 310 825 5178
Email: adamli@icsl.ucla.edu EMail: adamli@icsl.ucla.edu
19. Full Copyright Statement
Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.