draft-ietf-avt-evrc-smv-00.txt   draft-ietf-avt-evrc-smv-01.txt 
Internet Draft Adam H. Li Internet Draft Adam H. Li
draft-ietf-avt-evrc-smv-00.txt UCLA draft-ietf-avt-evrc-smv-01.txt UCLA
February 4, 2002 Editor May 16, 2002 Editor
Expires: August 4, 2002 Expires: November 16, 2002
An RTP Payload Format for EVRC and SMV Vocoders RTP Payload Format for EVRC and SMV Vocoders
STATUS OF THIS MEMO STATUS OF THIS MEMO
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026. all provisions of Section 10 of RFC 2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts. groups may also distribute working documents as Internet-Drafts.
skipping to change at page 2, line 6 skipping to change at page 2, line 7
1. Introduction ................................................... 2 1. Introduction ................................................... 2
2. Background ..................................................... 2 2. Background ..................................................... 2
3. The Codecs Supported ........................................... 3 3. The Codecs Supported ........................................... 3
3.1. EVRC ......................................................... 3 3.1. EVRC ......................................................... 3
3.2. SMV .......................................................... 3 3.2. SMV .......................................................... 3
3.3. Other Frame-Based Vocoders ................................... 4 3.3. Other Frame-Based Vocoders ................................... 4
4. RTP/Vocoder Packet Format ...................................... 4 4. RTP/Vocoder Packet Format ...................................... 4
4.1. Type 1 Interleaved/Bundled Packet Format ..................... 4 4.1. Type 1 Interleaved/Bundled Packet Format ..................... 4
4.2. Type 2 Header-Free Packet Format ............................. 6 4.2. Type 2 Header-Free Packet Format ............................. 6
4.3. Detecting the Format of Packets .............................. 6 4.3. Determining the Format of Packets ............................ 6
5. Packet Table of Contents Entries and Codec Data Frame Format ... 7 5. Packet Table of Contents Entries and Codec Data Frame Format ... 7
5.1. Packet Table of Contents entries ............................. 7 5.1. Packet Table of Contents entries ............................. 7
5.2. Codec Data Frames ............................................ 8 5.2. Codec Data Frames ............................................ 8
6. Interleaving Codec Data Frames in Type 1 Packets ............... 9 6. Interleaving Codec Data Frames in Type 1 Packets ............... 9
6.1. Finding Interleave Group Boundaries ......................... 10 6.1. Finding Interleave Group Boundaries ......................... 11
6.2. Reconstructing Interleaved Speech ........................... 11 6.2. Additional Receiver Responsibilities ........................ 11
6.3. Receiving Invalid Interleaving Values ....................... 12 7. Bundling Codec Data Frames in Type 1 Packets .................. 11
6.4. Additional Receiver Responsibilities ........................ 12
7. Bundling Codec Data Frames in Type 1 Packets .................. 12
8. Handling Missing Codec Data Frames ............................ 12 8. Handling Missing Codec Data Frames ............................ 12
9. Implementation Issues ......................................... 13 9. Implementation Issues ......................................... 12
9.1. Interleaving Length ......................................... 13 9.1. Interleaving Length ......................................... 12
9.2. Mode Request ................................................ 13 9.2. Validation of Received Packets .............................. 12
10. IANA Considerations .......................................... 14 10. Mode Request ................................................. 13
10.1 Storage Mode ................................................ 14 11. Storage Mode ................................................. 13
10.2 EVRC MIME Registration ...................................... 15 12. IANA Considerations .......................................... 14
10.3 SMV MIME Registration ....................................... 16 12.1. Registration of Media Type EVRC ............................ 14
11. Mapping to SDP Parameters .................................... 17 12.2. Registration of Media Type EVRC0 ........................... 15
12. Security Considerations ...................................... 17 12.3. Registration of Media Type SMV ............................. 16
13. Adding Support of Other Frame-Based Vocoders ................. 18 12.4. Registration of Media Type SMV0 ............................ 17
14. Acknowledgements ............................................. 18 13. Mapping to SDP Parameters .................................... 17
15. References ................................................... 18 14. Security Considerations ...................................... 18
16. Authors' Address ............................................. 19 15. Adding Support of Other Frame-Based Vocoders ................. 19
16. Acknowledgements ............................................. 19
17. References ................................................... 20
18. Authors' Address ............................................. 20
1. Introduction 1. Introduction
This document describes how speech compressed with EVRC [1] or SMV This document describes how speech compressed with EVRC [1] or SMV
[2] may be formatted for use as an RTP payload type. The format is [2] may be formatted for use as an RTP payload type. The format is
also extensible to other codecs that generate a similar set of frame also extensible to other codecs that generate a similar set of frame
types. Two methods are provided to packetize the codec data frames types. Two methods are provided to packetize the codec data frames
into RTP packets: an interleaved/bundled format and a zero-header into RTP packets: an interleaved/bundled format and a zero-header
format. The sender may choose the best format for each application format. The sender may choose the best format for each application
scenario, based on network conditions, bandwidth availability, delay scenario, based on network conditions, bandwidth availability, delay
skipping to change at page 4, line 7 skipping to change at page 4, line 7
The SMV codec can operate in four modes. Each mode may produce frames The SMV codec can operate in four modes. Each mode may produce frames
of any of the rates (full rate to 1/8 rate) for varying percentages of any of the rates (full rate to 1/8 rate) for varying percentages
of time, based on the characteristics of the speech samples and the of time, based on the characteristics of the speech samples and the
selected mode. The SMV mode can change on a frame-by-frame basis. The selected mode. The SMV mode can change on a frame-by-frame basis. The
SMV codec does not need additional information other than the codec SMV codec does not need additional information other than the codec
data frames to correctly decode the data of various modes; therefore, data frames to correctly decode the data of various modes; therefore,
the mode of the encoder does not need to be transmitted with the the mode of the encoder does not need to be transmitted with the
encoded frames. encoded frames.
The percentages of different frame rates and the average data rate The percentages of different frame rates for the four SMV modes are
(ADR) for the four SMV modes are shown in the table below. shown in the table below.
Mode 0 Mode 1 Mode 2 Mode 3 Mode 0 Mode 1 Mode 2 Mode 3
------------------------------------------------------------- -------------------------------------------------------------
Rate 1 68.90% 38.14% 15.43% 07.49% Rate 1 68.90% 38.14% 15.43% 07.49%
Rate 1/2 06.03% 15.82% 38.34% 46.28% Rate 1/2 06.03% 15.82% 38.34% 46.28%
Rate 1/4 00.00% 17.37% 16.38% 16.38% Rate 1/4 00.00% 17.37% 16.38% 16.38%
Rate 1/8 25.07% 28.67% 29.85% 29.85% Rate 1/8 25.07% 28.67% 29.85% 29.85%
-------------------------------------------------------------
ADR 7205 bps 5182 bps 4073 bps 3692 bps
The SMV codec chooses the output frame rate based on an analysis of The SMV codec chooses the output frame rate based on an analysis of
the input speech and the current operating mode. For typical speech the input speech and the current operating mode. For typical speech
patterns, this results in an average output of 4.2 kilobits/second for patterns, this results in an average output of 4.2 kilobits/second for
Mode 0 and lower for other reduced rate modes. Mode 0 in two-way conversation (assuming 50% active speech time and
50% in eighth rate while listening) and lower for other reduced rate
modes.
SMV is more bandwidth efficient than EVRC. EVRC is equivalent in SMV is more bandwidth efficient than EVRC. EVRC is equivalent in
performance to SMV mode 1. performance to SMV mode 1.
3.3. Other Frame-Based Vocoders 3.3. Other Frame-Based Vocoders
Other frame-based vocoders can be carried in the packet format Other frame-based vocoders can be carried in the packet format
defined in this document, as long as they possess the following defined in this document, as long as they possess the following
properties: properties:
o The codec is frame-based; o The codec is frame-based;
o blank and erasure frames are supported; o blank and erasure frames are supported;
o the total number of rates is less than 17; o the total number of rates is less than 17;
o the maximum full rate frame can be transported in a single RTP o the maximum full rate frame can be transported in a single RTP
packet using this specific format. packet using this specific format.
Vocoders with the characteristics listed above can be transported Vocoders with the characteristics listed above can be transported
using the packet format specified in this document with some using the packet format specified in this document with some
additional specification work; the pieces that must be defined are additional specification work; the pieces that must be defined are
listed in Section 13. listed in Section 15.
4. RTP/Vocoder Packet Format 4. RTP/Vocoder Packet Format
The RTP payload data MUST be transmitted in packets of one of the In the packet format diagrams shown in this document, bit 0 is the
following two types. most significant bit. The vocoder speech data MUST be transmitted in
RTP packets of one of the following two types.
4.1. Type 1 Interleaved/Bundled Packet Format 4.1. Type 1 Interleaved/Bundled Packet Format
This format is used to send one or more vocoder frames per packet. This format is used to send one or more vocoder frames per packet.
Interleaving or bundling MAY be used. The RTP packet for this format Interleaving or bundling MAY be used. The RTP packet for this format
is as follows: is as follows:
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
skipping to change at page 5, line 24 skipping to change at page 5, line 24
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The RTP header has the expected values as described in the RTP The RTP header has the expected values as described in the RTP
specification [4]. The RTP timestamp is in 1/8000 of a second units specification [4]. The RTP timestamp is in 1/8000 of a second units
for EVRC and SMV. For any other vocoders that use this packet format, for EVRC and SMV. For any other vocoders that use this packet format,
the timestamp unit needs to be defined explicitly. The M bit should the timestamp unit needs to be defined explicitly. The M bit should
be set as specified in the applicable RTP profile, for example, RFC be set as specified in the applicable RTP profile, for example, RFC
1890 [5]. Note that RFC 1890 [5] specifies that if the sender does 1890 [5]. Note that RFC 1890 [5] specifies that if the sender does
not suppress silence, the M bit will always be zero. When multiple not suppress silence, the M bit will always be zero. When multiple
codec data frames are present in a single RTP packet, the timestamp codec data frames are present in a single RTP packet, the timestamp
is, as always, that of the oldest data represented in the RTP packet. is that of the oldest data represented in the RTP packet. The
The assignment of an RTP payload type for this new packet format is assignment of an RTP payload type for this new packet format is
outside the scope of this document, and will not be specified here. outside the scope of this document; it is specified by the RTP
It is expected that the RTP profile for a particular class of profile under which this payload format is used.
applications will assign a payload type for this encoding, or if that
is not done, then a payload type in the dynamic range shall be chosen
by the sender.
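The rule that a multi-frame packet carries the timestamp of its oldest frame can be sketched as below. This is a hypothetical helper, not part of either draft, and it assumes each EVRC/SMV frame covers 20 ms, i.e. 160 ticks of the 8000 Hz RTP clock (that frame duration is not stated in this section).

```python
# Hypothetical sketch: RTP timestamp for a packet carrying several frames.
# Assumption: one EVRC/SMV frame = 20 ms = 160 ticks of the 8000 Hz clock.
SAMPLES_PER_FRAME = 160


def packet_timestamp(base_ts: int, frame_indices: list[int]) -> int:
    """Return the timestamp of the oldest frame in the packet.

    frame_indices are time-order positions of the carried frames, where
    index 0 corresponds to base_ts. The result wraps to 32 bits, as the
    RTP timestamp field is 32 bits wide.
    """
    if not frame_indices:
        raise ValueError("a packet carries at least one frame")
    oldest = min(frame_indices)
    return (base_ts + oldest * SAMPLES_PER_FRAME) % 2**32
```

For example, an interleaved packet holding frames 1, 4 and 7 of a group starting at timestamp 1000 would be stamped 1160, the time of frame 1.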
The first octet of a Type 1 Interleaved/Bundled format packet is the The first octet of a Type 1 Interleaved/Bundled format packet is the
Interleave Octet. The second octet contains the Mode Request and Interleave Octet. The second octet contains the Mode Request and
Frame Count fields. The Table of Contents (ToC) field then follows. Frame Count fields. The Table of Contents (ToC) field then follows.
The fields are specified as follows: The fields are specified as follows:
Reserved (RR): 2 bits Reserved (RR): 2 bits
Reserved bits. MUST be set to zero by sender, SHOULD be ignored Reserved bits. MUST be set to zero by sender, SHOULD be ignored
by receiver. by receiver.
Interleave Length (LLL): 3 bits Interleave Length (LLL): 3 bits
skipping to change at page 5, line 54 skipping to change at page 5, line 50
Section 7 for more detailed discussion. Section 7 for more detailed discussion.
Interleave Index (NNN): 3 bits Interleave Index (NNN): 3 bits
Indicates the index within an interleave group. MUST have a value Indicates the index within an interleave group. MUST have a value
less than or equal to the value of LLL. Values of NNN greater less than or equal to the value of LLL. Values of NNN greater
than the value of LLL are invalid. Packets with invalid NNN values than the value of LLL are invalid. Packets with invalid NNN values
SHOULD be ignored by the receiver. SHOULD be ignored by the receiver.
Mode Request (FFF): 3 bits Mode Request (FFF): 3 bits
The Mode Request field is used to signal Mode Request The Mode Request field is used to signal Mode Request
information. See Section 9.2 for details. information. See Section 10 for details.
Frame Count (Count): 5 bits Frame Count (Count): 5 bits
Indicates the number of ToC fields (and therefore vocoder frames) The number of ToC fields (and vocoder frames) present in the
present. A value of zero indicates that the packet contains one packet is the value of the frame count field plus one. A value of
ToC field (and vocoder frame). A value of 31 indicates 32 ToC zero indicates that the packet contains one ToC field, while a
fields (and vocoder frames) are in the packet. The number of ToC value of 31 indicates that the packet contains 32 ToC fields.
fields (and vocoder frames) present is the value of the frame
count field plus one.
Padding (padding): 0 or 4 bits Padding (padding): 0 or 4 bits
This padding ensures that codec data frames start on an octet This padding ensures that codec data frames start on an octet
boundary. When the number of ToC entries (Count + 1) is odd, the sender MUST add boundary. When the number of ToC entries (Count + 1) is odd, the sender MUST add
4 bits of padding following the last ToC. When that number is even, 4 bits of padding following the last ToC. When that number is even,
the sender MUST NOT add padding bits. If padding is present, the the sender MUST NOT add padding bits. If padding is present, the
padding bits MUST be set to zero by sender, and SHOULD be ignored padding bits MUST be set to zero by sender, and SHOULD be ignored
by receiver. by receiver.
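The field list above can be read back with a small parser. This is a sketch under stated assumptions: the Interleave Octet is taken to hold RR, LLL and NNN in that order, the second octet FFF then Count, bit 0 is the most significant bit, and each ToC entry is 4 bits wide (as the 0-or-4-bit padding rule implies). The function name parse_type1_header is hypothetical.

```python
def parse_type1_header(payload: bytes) -> dict:
    """Hypothetical parser for the Type 1 Interleaved/Bundled header.

    Assumes MSB-first layout |RR|LLL|NNN| |FFF|Count| followed by
    4-bit ToC entries, padded with one zero nibble when their count
    is odd so that codec data frames start on an octet boundary.
    """
    if len(payload) < 2:
        raise ValueError("payload too short for the Type 1 header")
    b0, b1 = payload[0], payload[1]
    lll = (b0 >> 3) & 0x07  # Interleave Length; the two RR bits are ignored
    nnn = b0 & 0x07         # Interleave Index
    if nnn > lll:
        raise ValueError("NNN greater than LLL: packet should be ignored")
    mode_request = (b1 >> 5) & 0x07  # FFF
    n_frames = (b1 & 0x1F) + 1       # Count field plus one
    toc_octets = (n_frames + 1) // 2  # two 4-bit entries per octet
    if len(payload) < 2 + toc_octets:
        raise ValueError("payload too short for the ToC")
    toc = []
    for i in range(n_frames):
        octet = payload[2 + i // 2]
        # Even entries sit in the high nibble, odd entries in the low one.
        toc.append(octet >> 4 if i % 2 == 0 else octet & 0x0F)
    return {"lll": lll, "nnn": nnn, "mode_request": mode_request,
            "toc": toc, "data_offset": 2 + toc_octets}
```

With three frames (Count = 2) the ToC occupies three nibbles plus one padding nibble, so the codec data frames begin at octet 4 of the payload.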
The Table of Contents field (ToC) provides information on the codec The Table of Contents field (ToC) provides information on the codec
skipping to change at page 7, line 5 skipping to change at page 6, line 47
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP Header [4] | | RTP Header [4] |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | | |
+ ONLY one codec data frame +-+-+-+-+-+-+-+-+ + ONLY one codec data frame +-+-+-+-+-+-+-+-+
| | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
4.3. Detecting the Format of Packets 4.3. Determining the Format of Packets
All receivers MUST be able to process both types of packets. The All receivers SHOULD be able to process both types of packets. The
sender MAY choose to use one or both types of packets. sender MAY choose to use one or both types of packets.
A receiver MUST have prior knowledge of the packet type to correctly A receiver MUST have prior knowledge of the packet type to correctly
decode the RTP packets. The packet types used in an RTP session MUST decode the RTP packets. The packet types used in an RTP session MUST
be specified by the sender, and signaled through out-of-band means, be specified by the sender, and signaled through out-of-band means,
for example by SDP during the setup of a session. for example by SDP during the setup of a session.
When packets of both formats are used within the same session, When packets of both formats are used within the same session,
different RTP payload type values MUST be used for each format to different RTP payload type values MUST be used for each format to
distinguish the packet formats. The association of payload type distinguish the packet formats. The association of payload type
skipping to change at page 7, line 55 skipping to change at page 7, line 45
Value Rate Total codec data frame size (in octets) Value Rate Total codec data frame size (in octets)
--------------------------------------------------------- ---------------------------------------------------------
0 Blank 0 (0 bit) 0 Blank 0 (0 bit)
1 1/8 2 (16 bits) 1 1/8 2 (16 bits)
2 1/4 5 (40 bits; not valid for EVRC) 2 1/4 5 (40 bits; not valid for EVRC)
3 1/2 10 (80 bits) 3 1/2 10 (80 bits)
4 1 22 (171 bits; 5 padded at end with zeros) 4 1 22 (171 bits; 5 padded at end with zeros)
5 Erasure 0 (SHOULD NOT be transmitted by sender) 5 Erasure 0 (SHOULD NOT be transmitted by sender)
All values not listed in the above table MUST be considered All values not listed in the above table MUST be considered reserved.
reserved. A ToC entry with a reserved Frame Type value SHOULD be A ToC entry with a reserved Frame Type value SHOULD be considered
considered invalid and substituted with an erasure frame. Note invalid. Note that the EVRC codec does not have 1/4 rate frames, thus
that the EVRC codec does not have 1/4 rate frames, thus frame frame type value 2 MUST be considered a reserved value when the EVRC
type value 2 MUST be considered a reserved value when the EVRC
codec is in use. codec is in use.
Other vocoders that use this packet format need to specify their Other vocoders that use this packet format need to specify their own
own table of frame types and corresponding codec data frames. table of frame types and corresponding codec data frames.
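The frame type table lets a receiver locate each codec data frame from the ToC alone. The sketch below uses the sizes from the table and treats type 2 as reserved for EVRC; the names FRAME_OCTETS and split_frames are hypothetical. Because the size of a reserved type is unknown, this sketch chooses to reject the packet rather than guess where later frames begin.

```python
# Codec data frame sizes in octets, keyed by ToC Frame Type value.
FRAME_OCTETS = {0: 0, 1: 2, 2: 5, 3: 10, 4: 22, 5: 0}


def split_frames(payload: bytes, offset: int, toc: list[int],
                 codec: str = "SMV") -> list[bytes]:
    """Hypothetical helper: slice codec data frames out of a payload.

    offset is where the first codec data frame starts (after the ToC).
    Raises ValueError on a reserved frame type or a length mismatch.
    """
    frames = []
    pos = offset
    for ft in toc:
        if ft not in FRAME_OCTETS or (codec == "EVRC" and ft == 2):
            # Reserved value: its size is unknown, so stop parsing here.
            raise ValueError(f"reserved frame type {ft} for {codec}")
        size = FRAME_OCTETS[ft]
        frames.append(payload[pos:pos + size])
        pos += size
    if pos != len(payload):
        raise ValueError("payload length does not match the ToC")
    return frames
```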
5.2. Codec Data Frames 5.2. Codec Data Frames
The output of the vocoder MUST be converted into codec data frames The output of the vocoder MUST be converted into codec data frames
for inclusion in the RTP payload. The conversions for EVRC and SMV for inclusion in the RTP payload. The conversions for EVRC and SMV
codecs are specified below. (Note: Because the EVRC codec does not codecs are specified below. (Note: Because the EVRC codec does not
have Rate 1/4 frames, the specifications of Rate 1/4 frames do not apply have Rate 1/4 frames, the specifications of Rate 1/4 frames do not apply
to EVRC codec data frames). Other vocoders that use this packet to EVRC codec data frames). Other vocoders that use this packet
format need to specify how to convert vocoder output data into format need to specify how to convert vocoder output data into
frames. frames.
skipping to change at page 8, line 47 skipping to change at page 8, line 40
Following is a detailed listing showing a Rate 1 EVRC/SMV codec Following is a detailed listing showing a Rate 1 EVRC/SMV codec
output frame converted into a codec data frame: output frame converted into a codec data frame:
The codec data frame for an EVRC/SMV codec Rate 1 frame is 22 octets The codec data frame for an EVRC/SMV codec Rate 1 frame is 22 octets
long. Bits 1 through 171 from the EVRC/SMV codec Rate 1 frame are long. Bits 1 through 171 from the EVRC/SMV codec Rate 1 frame are
placed as indicated, with bits marked with "Z" set to zero. EVRC/SMV placed as indicated, with bits marked with "Z" set to zero. EVRC/SMV
codec Rate 1/8, Rate 1/4 and Rate 1/2 frames are converted similarly, codec Rate 1/8, Rate 1/4 and Rate 1/2 frames are converted similarly,
but do not require zero padding because they align on octet but do not require zero padding because they align on octet
boundaries. boundaries.
Rate 1 codec data frame (octets 0 - 3) Rate 1 codec data frame
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|3| |0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|3|
|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2| |1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Rate 1 codec data frame (octets 18 - 21) : :
1 1 1 1
4 5 6 7
4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1| | | | | | |1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1| | | | | |
|4|4|4|4|4|5|5|5|5|5|5|5|5|5|5|6|6|6|6|6|6|6|6|6|6|7|7|Z|Z|Z|Z|Z| |4|4|4|4|4|5|5|5|5|5|5|5|5|5|5|6|6|6|6|6|6|6|6|6|6|7|7|Z|Z|Z|Z|Z|
|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1| | | | | | |5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1| | | | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
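The bit placement in the diagrams amounts to writing codec bits 1 through N most-significant-bit first and zero-filling the rest of the last octet. A minimal sketch follows, assuming the vocoder output is available as a list of 0/1 values; pack_frame and unpack_frame are hypothetical names.

```python
def pack_frame(bits: list[int], n_octets: int) -> bytes:
    """Place codec bits 1..len(bits) MSB-first into n_octets octets,
    setting the unused trailing bits (the "Z" bits) to zero."""
    if len(bits) > 8 * n_octets:
        raise ValueError("too many bits for the codec data frame")
    value = 0
    for b in bits:
        value = (value << 1) | (b & 1)
    value <<= 8 * n_octets - len(bits)  # append the zero padding
    return value.to_bytes(n_octets, "big")


def unpack_frame(data: bytes, n_bits: int) -> list[int]:
    """Recover the first n_bits codec bits, ignoring trailing padding."""
    value = int.from_bytes(data, "big")
    total = 8 * len(data)
    return [(value >> (total - 1 - i)) & 1 for i in range(n_bits)]
```

For a Rate 1 frame, 171 bits fill 21 full octets plus 3 bits of octet 21, which is why the last 5 bits of the 22-octet frame are padding; codec bit 145 is the first bit of octet 18.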
6. Interleaving Codec Data Frames in Type 1 Packets 6. Interleaving Codec Data Frames in Type 1 Packets
As indicated in Section 4.1, more than one codec data frame MAY be As indicated in Section 4.1, more than one codec data frame MAY be
included in a single Type 1 Interleaved/Bundled packet by a sender. included in a single Type 1 Interleaved/Bundled packet by a sender.
This is accomplished by interleaving or bundling. This is accomplished by interleaving or bundling.
Bundling is used to spread the transmission overhead of the RTP and Bundling is used to spread the transmission overhead of the RTP and
payload header over multiple vocoder frames. Interleaving payload header over multiple vocoder frames. Interleaving
additionally reduces the listener's perception of data loss by additionally reduces the listener's perception of data loss by
spreading such loss over non-consecutive vocoder frames. EVRC, SMV, spreading such loss over non-consecutive vocoder frames. EVRC, SMV,
skipping to change at page 9, line 40 skipping to change at page 9, line 29
field to greater than zero. Interleaving is indicated by setting the field to greater than zero. Interleaving is indicated by setting the
LLL field to a value greater than zero. LLL field to a value greater than zero.
The discussion of general interleaving applies to bundling (which The discussion of general interleaving applies to bundling (which
can be viewed as a reduced case of interleaving) with reduced can be viewed as a reduced case of interleaving) with reduced
complexity. The bundling case is discussed in detail in Section 7. complexity. The bundling case is discussed in detail in Section 7.
Senders MAY support interleaving and/or bundling. All receivers MUST Senders MAY support interleaving and/or bundling. All receivers MUST
support interleaving and bundling. support interleaving and bundling.
Given a time-ordered sequence of output frames from the EVRC codec Given a time-ordered sequence of output frames from the codec
numbered 0..n, a bundling value B (in the Count field), and an numbered 0..n, a bundling value B (the value in the Count field plus
interleave length L where n = B * (L+1) - 1, the output frames are one), and an interleave length L where n = B * (L+1) - 1, the output
placed into RTP packets as follows (the values of the fields LLL and frames are placed into RTP packets as follows (the values of the
NNN are indicated for each RTP packet): fields LLL and NNN are indicated for each RTP packet):
First RTP Packet in Interleave group: First RTP Packet in Interleave group:
LLL=L, NNN=0 LLL=L, NNN=0
Frame 0, Frame L+1, Frame 2(L+1), Frame 3(L+1), ... for a total of Frame 0, Frame L+1, Frame 2(L+1), Frame 3(L+1), ... for a total of
B frames B frames
Second RTP Packet in Interleave group: Second RTP Packet in Interleave group:
LLL=L, NNN=1 LLL=L, NNN=1
Frame 1, Frame 1+L+1, Frame 1+2(L+1), Frame 1+3(L+1), ... for a Frame 1, Frame 1+L+1, Frame 1+2(L+1), Frame 1+3(L+1), ... for a
total of B frames total of B frames
skipping to change at page 10, line 20 skipping to change at page 10, line 8
Within each interleave group, the RTP packets making up the Within each interleave group, the RTP packets making up the
interleave group MUST be transmitted in increasing order of the interleave group MUST be transmitted in increasing order of the
NNN field. While this does not guarantee reduced end-to-end delay on NNN field. While this does not guarantee reduced end-to-end delay on
the receiving end, when packets are delivered in order by the the receiving end, when packets are delivered in order by the
underlying transport, delay will be reduced to the minimum possible. underlying transport, delay will be reduced to the minimum possible.
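The packet assignment above, and its inverse at the receiver, can be sketched with stride slicing: packet N takes every (L+1)-th frame starting at frame N. Packets are represented here as hypothetical (LLL, NNN, frames) tuples rather than real RTP packets, and the list returned by interleave is already in the increasing-NNN order in which the group must be sent.

```python
def interleave(frames: list, L: int, B: int) -> list[tuple]:
    """Split one interleave group of B*(L+1) time-ordered frames into
    L+1 packets; packet N carries frames N, N+(L+1), N+2(L+1), ..."""
    if len(frames) != B * (L + 1):
        raise ValueError("an interleave group holds exactly B*(L+1) frames")
    # Built in increasing NNN order, which is the required send order.
    return [(L, n, frames[n::L + 1]) for n in range(L + 1)]


def deinterleave(packets: list[tuple]) -> list:
    """Rebuild time order from the (LLL, NNN, frames) tuples of a group,
    accepting the packets in any arrival order."""
    packets = sorted(packets, key=lambda p: p[1])  # order by NNN
    L = packets[0][0]
    B = len(packets[0][2])
    out = [None] * (B * (L + 1))
    for _, n, frames in packets:
        for k, frame in enumerate(frames):
            out[n + k * (L + 1)] = frame
    return out
```

With L = 2 and B = 4, frames 0..11 go out as [0, 3, 6, 9], [1, 4, 7, 10] and [2, 5, 8, 11], so losing one packet costs every third frame rather than four consecutive ones.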
Receivers MAY signal the maximum number of codec data frames (i.e., Receivers MAY signal the maximum number of codec data frames (i.e.,
the maximum acceptable bundling value B) they can handle in a single the maximum acceptable bundling value B) they can handle in a single
RTP packet using the OPTIONAL maxptime RTP mode parameter identified RTP packet using the OPTIONAL maxptime RTP mode parameter identified
in Section 10. in Section 12.
Receivers MAY signal the maximum interleave length (i.e., the maximum Receivers MAY signal the maximum interleave length (i.e., the maximum
acceptable LLL value in the Interleaving Octet) they will accept acceptable LLL value in the Interleaving Octet) they will accept
using the OPTIONAL maxinterleave RTP mode parameter identified in using the OPTIONAL maxinterleave RTP mode parameter identified in
Section 10. Section 12.
The parameters maxptime and maxinterleave are exchanged at the
initial setup of the session. In one-to-one sessions, the sender MUST
respect these values set by the receiver, and MUST NOT
interleave or bundle beyond what the receiver signals that it
can handle. This ensures that the receiver can allocate a known
amount of buffer space that will be sufficient for all
interleaving/bundling used in that session. During the session, the
sender may decrease the bundling value or interleaving length (so
that less buffer space is required at the receiver), but never exceed
the maximum value set by the receiver. This prevents the situation
where a receiver needs to allocate more buffer space in the middle of
a session but is unable to do so.
Additionally, senders have the following restrictions: Additionally, senders have the following restrictions:
o MUST NOT bundle more codec data frames in a single RTP packet than o MUST NOT bundle more codec data frames in a single RTP packet than
indicated by maxptime (see Section 10) if it is signaled. indicated by maxptime (see Section 12) if it is signaled.
o SHOULD NOT bundle more codec data frames in a single RTP packet o SHOULD NOT bundle more codec data frames in a single RTP packet
than will fit in the MTU of the underlying network. than will fit in the MTU of the underlying network.
o Once beginning a session with a given maximum interleaving value o Once beginning a session with a given maximum interleaving value
set by maxinterleave in Section 10, MUST NOT increase the set by maxinterleave in Section 12, MUST NOT increase the
interleaving value (LLL) to exceed the maximum interleaving value interleaving value (LLL) to exceed the maximum interleaving value
that is signaled. that is signaled.
o MAY change the interleaving value only between interleave groups. o MAY change the interleaving value, but MUST do so only between
interleave groups.
o Silence suppression MAY only be used between interleave groups. A o Silence suppression MAY only be used between interleave groups. A
ToC with Frame Type 0 (Blank Frame, Section 5.1) MUST be used ToC with Frame Type 0 (Blank Frame, Section 5.1) MUST be used
within interleaving groups if the codec outputs a blank frame. within interleaving groups if the codec outputs a blank frame.
The M bit in the RTP header MUST NOT be set, as the stream is The M bit in the RTP header is not set for these blank frames,
continuous in time. Because there is only one time stamp for each as the stream is continuous in time. Because there is only one
RTP packet, silence suppression used within an interleave group time stamp for each RTP packet, silence suppression used within
will cause ambiguities when reconstructing the speech at the an interleave group would cause ambiguities when reconstructing
receiver side, and thus is prohibited. the speech at the receiver side, and thus is prohibited.
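The sender restrictions can be checked mechanically before a session's bundling and interleaving values are chosen. This is a hypothetical sketch: it treats maxptime as a frame count, as this section describes it, takes a 12-octet RTP header with no CSRCs or extensions, and sizes every frame as a worst-case 22-octet Rate 1 frame when estimating what fits in the MTU.

```python
RTP_HEADER = 12  # fixed RTP header, assuming no CSRCs or extensions
MAX_FRAME = 22   # Rate 1 codec data frame, the largest in the table


def max_bundle_for_mtu(available: int) -> int:
    """Largest bundling value B whose worst-case packet (RTP header,
    two header octets, 4-bit ToC entries, B Rate 1 frames) fits in
    `available` octets; returns 0 if even one frame does not fit."""
    best = 0
    for b in range(1, 33):  # the 5-bit Count field allows 1..32 frames
        size = RTP_HEADER + 2 + (b + 1) // 2 + b * MAX_FRAME
        if size <= available:
            best = b
    return best


def check_sender_params(B: int, L: int, maxptime_frames=None,
                        maxinterleave=None) -> list[str]:
    """Return the restrictions a proposed (B, L) would violate."""
    problems = []
    if not 1 <= B <= 32:
        problems.append("B must be 1..32 (Count field plus one)")
    if not 0 <= L <= 7:
        problems.append("L must be 0..7 (3-bit LLL field)")
    if maxptime_frames is not None and B > maxptime_frames:
        problems.append("B exceeds the receiver's maxptime")
    if maxinterleave is not None and L > maxinterleave:
        problems.append("L exceeds the receiver's maxinterleave")
    return problems
```

For instance, with 1472 octets left after IPv4 and UDP headers on a 1500-octet Ethernet MTU, even 32 Rate 1 frames (734 octets) fit, so maxptime rather than the MTU is usually the binding limit.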
6.1. Finding Interleave Group Boundaries 6.1. Finding Interleave Group Boundaries
Given an RTP packet with sequence number S, interleave length (field Given an RTP packet with sequence number S, interleave length (field
LLL) L, interleave index value (field NNN) N, and bundling value B, LLL) L, interleave index value (field NNN) N, and bundling value B,
the interleave group consists of this RTP packet and other RTP the interleave group consists of this RTP packet and other RTP
packets with sequence numbers from S-N to S-N+L inclusive. (The packets with sequence numbers from (S-N) mod 65536 to (S-N+L) mod 65536
sequence numbers used here are for illustrative purposes. When inclusive. In other words, the interleave group always consists of
wrapping around happens, the sequence numbers need to be adjusted
accordingly). In other words, the interleave group always consists of
L+1 RTP packets with sequential sequence numbers. The bundling value L+1 RTP packets with sequential sequence numbers. The bundling value
for all RTP packets in an interleave group MUST be the same. for all RTP packets in an interleave group MUST be the same.
The receiver determines the expected bundling value for all RTP The receiver determines the expected bundling value for all RTP
packets in an interleave group by the number of codec data frames packets in an interleave group by the number of codec data frames
bundled in the first RTP packet of the interleave group received. bundled in the first RTP packet of the interleave group received.
Note that this may not be the first RTP packet of the interleave Note that this may not be the first RTP packet of the interleave
group if packets are delivered out of order by the underlying group if packets are delivered out of order by the underlying
transport. transport.
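The group-boundary rule of Section 6.1 can be sketched as follows. This is a minimal illustration, not part of the specification: the function name is hypothetical, and treating an index NNN greater than LLL as invalid is our own reading of "invalid value" in the text.

```python
SEQ_MOD = 65536  # RTP sequence numbers are 16-bit and wrap modulo 65536


def interleave_group(seq: int, lll: int, nnn: int) -> list:
    """Return the L+1 sequence numbers of the interleave group containing
    the packet with sequence number `seq`, interleave length LLL and
    interleave index NNN (hypothetical helper)."""
    # Assumption: an index NNN larger than LLL is treated as invalid.
    if lll < 0 or not 0 <= nnn <= lll:
        raise ValueError("invalid LLL/NNN combination")
    first = (seq - nnn) % SEQ_MOD  # sequence number of packet 0 of the group
    return [(first + k) % SEQ_MOD for k in range(lll + 1)]
```

For example, a packet with sequence number 1, LLL = 3 and NNN = 2 belongs to the group 65535, 0, 1, 2, which shows why the modulo arithmetic matters near wrap-around.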
On receipt of an RTP packet in an interleave group with other than 6.2. Additional Receiver Responsibilities
the expected bundling value, the receiver MAY discard codec data
frames off the end of the RTP packet or add erasure codec data frames
to the end of the packet in order to manufacture a substitute packet
with the expected bundling value. The receiver MAY instead choose to
discard the whole interleave group.
6.2. Reconstructing Interleaved Speech
Given a set of RTP packets in an interleave group, ordered by RTP
sequence number and numbered 0..L, where L is the interleave length and
B is the bundling value, and codec data frames within each RTP packet
that are numbered in order from first to last with the numbers 0..B-1,
the original, time-ordered sequence of output frames from the EVRC
codec may be reconstructed as follows:
First L+1 frames:
Frame 0 from packet 0 of interleave group
Frame 0 from packet 1 of interleave group
And so on up to...
Frame 0 from packet L of interleave group
Second L+1 frames:
Frame 1 from packet 0 of interleave group
Frame 1 from packet 1 of interleave group
And so on up to...
Frame 1 from packet L of interleave group
And so on up to...
Bth L+1 frames:
Frame B-1 from packet 0 of interleave group
Frame B-1 from packet 1 of interleave group
And so on up to...
Frame B-1 from packet L of interleave group
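The reconstruction order above, together with the sender-side interleaving it inverts, might be sketched as below. This is an illustration under stated assumptions: frames are indexed from 0, a lost packet is represented as None, and the ERASURE placeholder is a hypothetical stand-in for an erasure frame handed to the decoder.

```python
ERASURE = "erasure"  # hypothetical placeholder for an erasure frame


def interleave(frames, lll):
    """Sender side: split (L+1)*B consecutive frames into L+1 packets."""
    group = lll + 1
    if len(frames) % group:
        raise ValueError("frame count must be a multiple of L+1")
    # Packet n carries frames n, n+(L+1), n+2(L+1), ...
    return [frames[n::group] for n in range(group)]


def deinterleave(packets, b):
    """Receiver side: rebuild time order from packets 0..L, each holding
    frames 0..B-1. A lost packet (None) yields erasure frames."""
    out = []
    for x in range(b):          # frame index within each packet
        for pkt in packets:     # packet index 0..L
            out.append(ERASURE if pkt is None else pkt[x])
    return out
```

Losing one packet of a group therefore produces B isolated erasures spread L+1 frames apart rather than B consecutive ones, which is the property Section 9.1 relies on.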
6.3. Receiving Invalid Interleaving Values
On receipt of an RTP packet with an invalid value of the LLL or NNN
fields, the RTP packet SHOULD be treated as lost by the receiver for
the purpose of generating erasure frames as described in Section 8.
6.4. Additional Receiver Responsibilities
Assume that the receiver has begun playing frames from an interleave Assume that the receiver has begun playing frames from an interleave
group. The time has come to play frame x from packet n of the group. The time has come to play frame x from packet n of the
interleave group. Further assume that packet n of the interleave interleave group. Further assume that packet n of the interleave
group has not been received. As described in Section 8, an erasure group has not been received. As described in Section 8, an erasure
frame will be sent to the receiving vocoder. frame will be sent to the receiving vocoder.
Now, assume that packet n of the interleave group arrives before Now, assume that packet n of the interleave group arrives before
frame x+1 of that packet is needed. Receivers SHOULD use frame x+1 of frame x+1 of that packet is needed. Receivers SHOULD use frame x+1 of
the newly received packet n rather than substituting an erasure the newly received packet n rather than substituting an erasure
frame. In other words, just because packet n was not available the frame. In other words, just because packet n was not available the
first time it was needed to reconstruct the interleaved speech, the first time it was needed to reconstruct the interleaved speech, the
receiver SHOULD NOT assume it is not available when it is receiver SHOULD NOT assume it is not available when it is
subsequently needed for interleaved speech reconstruction. subsequently needed for interleaved speech reconstruction.
skipping to change at page 12, line 42 skipping to change at page 11, line 53
Bundling means that multiple codec data frames are included Bundling means that multiple codec data frames are included
consecutively in a packet, with the interleaving length (LLL) set consecutively in a packet, with the interleaving length (LLL) set
to 0. The interleave group is thus reduced to a single RTP to 0. The interleave group is thus reduced to a single RTP
packet, and reconstructing the codec data frames from RTP packet, and reconstructing the codec data frames from RTP
packets becomes a much simpler process. packets becomes a much simpler process.
Furthermore, the additional restrictions on senders are reduced to: Furthermore, the additional restrictions on senders are reduced to:
o MUST NOT bundle more codec data frames in a single RTP packet than o MUST NOT bundle more codec data frames in a single RTP packet than
indicated by maxptime (see Section 10) if it is signaled. indicated by maxptime (see Section 12) if it is signaled.
o SHOULD NOT bundle more codec data frames in a single RTP packet o SHOULD NOT bundle more codec data frames in a single RTP packet
than will fit in the MTU of the underlying network. than will fit in the MTU of the underlying network.
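The maxptime restriction reduces to simple arithmetic on the 20 ms frame duration and the 200 ms default given in the maxptime definition of the MIME registration. The sketch below is illustrative only (the function name is hypothetical); the MTU limit is not modeled because it depends on per-rate frame sizes not reproduced here.

```python
FRAME_MS = 20              # duration of one codec data frame
DEFAULT_MAXPTIME_MS = 200  # maxptime default when not signaled


def max_bundle(maxptime_ms=None):
    """Largest number of codec data frames a sender may place in one packet."""
    ms = DEFAULT_MAXPTIME_MS if maxptime_ms is None else maxptime_ms
    return ms // FRAME_MS  # whole frames only
```

With "a=maxptime:80" a sender could bundle at most four frames per packet; with no maxptime signaled the limit would be ten.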
8. Handling Missing Codec Data Frames 8. Handling Missing Codec Data Frames
The vocoders covered by this payload format support erasure frames as The vocoders covered by this payload format support erasure frames as
an indication that frames are not available. While an erasure frame an indication that frames are not available. The erasure frames are
MUST NOT be transmitted by an RTP sender, it MAY be used internally normally used internally by a receiver to advance the state of the
by a receiver to advance the state of the voice decoder by exactly voice decoder by exactly one frame time for each missing frame. Using
one frame time for each missing frame. Using the information from the information from packet sequence number, time stamp, and the M
packet sequence number, time stamp, and the M bit, the receiver can bit, the receiver can detect missing codec data frames from RTP
detect missing codec data frames from RTP packet loss and/or silence packet loss and/or silence suppression, and generate corresponding
suppression, and generate corresponding erasure frames. Erasure erasure frames. Erasure frames MUST also be used in storage mode to
frames SHOULD also be used in storage mode to record missing frames. record missing frames.
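Detecting missing frames from the RTP time stamp might look like the sketch below, for the non-interleaved case (LLL = 0) with packets arriving in order. Two assumptions are not stated in this section and are labeled as such: an RTP clock of 8000 Hz (160 ticks per 20 ms frame) and a packet time stamp that refers to its first bundled frame. The function name is hypothetical.

```python
TICKS_PER_FRAME = 160  # assumption: 8000 Hz RTP clock, 20 ms frames
TS_MOD = 1 << 32       # RTP time stamps are 32-bit


def erasures_needed(prev_ts, prev_frames, cur_ts):
    """Erasure frames to insert between two in-order packets with LLL = 0.

    prev_ts/prev_frames: time stamp and frame count of the previous packet;
    cur_ts: time stamp of the packet just received."""
    expected = (prev_ts + prev_frames * TICKS_PER_FRAME) % TS_MOD
    gap = (cur_ts - expected) % TS_MOD
    # Assumes in-order arrival; a reordered packet would show up as a huge
    # wrapped gap and must be handled separately.
    return gap // TICKS_PER_FRAME
```

Using the time stamp rather than the sequence number lets the same arithmetic cover both packet loss and silence suppression gaps.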
9. Implementation Issues 9. Implementation Issues
9.1. Interleaving Length 9.1. Interleaving Length
The vocoder interpolates the missing speech content when given an The vocoder interpolates the missing speech content when given an
erasure frame. However, the best quality is perceived by the listener erasure frame. However, the best quality is perceived by the listener
when erasure frames are not consecutive. This makes interleaving when erasure frames are not consecutive. This makes interleaving
desirable as it increases speech quality when packet loss occurs. desirable as it increases speech quality when packet loss occurs.
On the other hand, interleaving can greatly increase the end-to-end On the other hand, interleaving can greatly increase the end-to-end
delay. Where an interactive session is desired, either Type 1 delay. Where an interactive session is desired, either Type 1
Interleaved/Bundled with interleaving length (field LLL) 0 or Type 2 Interleaved/Bundled with interleaving length (field LLL) 0 or Type 2
Header-Free RTP payload types are RECOMMENDED. Header-Free RTP payload types are RECOMMENDED.
When end-to-end delay is not a concern, an interleaving length (field When end-to-end delay is not a primary concern, an interleaving
LLL) of 4 or 5 is RECOMMENDED. length (field LLL) of 4 or 5 is RECOMMENDED as it offers a reasonable
compromise between robustness and latency.
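The latency side of this trade-off can be estimated from the packet layout of Section 6: a group spans (L+1)*B frames, and packet 0 carries frames 0, L+1, ..., (B-1)(L+1), so the sender must collect that much speech before packet 0 is complete. The sketch below is a rough estimate under these assumptions (function names hypothetical); it ignores encoder lookahead, network delay and jitter buffering.

```python
FRAME_MS = 20  # duration of one codec data frame


def group_span_ms(lll, b):
    """Media time covered by one interleave group of L+1 packets x B frames."""
    return (lll + 1) * b * FRAME_MS


def sender_wait_ms(lll, b):
    """Speech the sender must collect before packet 0 of a group is complete:
    packet 0 carries frames 0, L+1, ..., (B-1)(L+1)."""
    return ((b - 1) * (lll + 1) + 1) * FRAME_MS
```

For LLL = 4 and B = 2, a group spans 200 ms and the sender waits 120 ms before packet 0 can be sent, compared with 40 ms for plain bundling of two frames.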
The parameters maxptime and maxinterleave are exchanged at the 9.2. Validation of Received Packets
initial setup of the session so that the receiver can allocate a
known amount of buffer space that will be sufficient for all future
reception in that session. During the session, the sender may
decrease the bundling value or interleaving length (so that less
buffer space is required at the receiver), but never require more
buffer space. This prevents the situation where a receiver needs to
allocate more buffer space in the middle of a session but is unable
to do so.
9.2. Mode Request When receiving an RTP packet, the receiver SHOULD check the validity
of the ToC fields and match the length of the packet with what is
indicated by the ToC fields. If any invalidity or mismatch is
detected, it is RECOMMENDED to discard the received packet to avoid
potential severe degradation of the speech quality. The discarded
packet is treated following the same procedure as a lost packet, and
the discarded data will be replaced with erasure frames.
On receipt of an RTP packet with an invalid value of the LLL or NNN
fields, the RTP packet SHOULD be treated as lost by the receiver for
the purpose of generating erasure frames as described in Section 8.
On receipt of an RTP packet in an interleave group with other than
the expected frame count value, the receiver MAY discard codec data
frames off the end of the RTP packet or add erasure codec data frames
to the end of the packet in order to manufacture a substitute packet
with the expected bundling value. The receiver MAY instead choose to
discard the whole interleave group.
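The receiver checks above might be sketched as follows. The validity rule NNN <= LLL <= maxinterleave is our assumption about what counts as an "invalid value"; the padding and truncation follow the text directly. Both function names are hypothetical.

```python
def interleave_fields_valid(lll, nnn, maxinterleave=5):
    """Assumed validity rule for the LLL and NNN fields."""
    return 0 <= nnn <= lll <= maxinterleave


def fit_to_bundling(frames, expected_b, erasure):
    """Manufacture a substitute packet with exactly expected_b frames."""
    if len(frames) >= expected_b:
        return frames[:expected_b]  # discard frames off the end
    return frames + [erasure] * (expected_b - len(frames))  # pad with erasures
```

A packet failing interleave_fields_valid would then be treated as lost, and its frames replaced by erasure frames as described in Section 8.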
10. Mode Request
The Mode Request signal requests a particular encoding mode for the The Mode Request signal requests a particular encoding mode for the
speech encoding in the reverse direction. All implementations are speech encoding in the reverse direction. All implementations are
RECOMMENDED to honor the Mode Request signal. The Mode Request signal RECOMMENDED to honor the Mode Request signal. The Mode Request signal
SHOULD only be used in one-to-one sessions. In multiparty sessions, SHOULD only be used in one-to-one sessions. In multiparty sessions,
any received Mode Request signals SHOULD be ignored. any received Mode Request signals SHOULD be ignored.
In addition, the Mode Request signal MAY also be sent through non-RTP In addition, the Mode Request signal MAY also be sent through non-RTP
means, which is out of the scope of this specification. means, which is out of the scope of this specification.
skipping to change at page 14, line 16 skipping to change at page 13, line 38
Each codec type using this format SHOULD define its own Each codec type using this format SHOULD define its own
interpretation of the Mode Request field. Codecs SHOULD follow the interpretation of the Mode Request field. Codecs SHOULD follow the
convention that higher values of the three-bit field correspond to an convention that higher values of the three-bit field correspond to an
equal or lower average output bit rate. equal or lower average output bit rate.
For the EVRC codec, the Mode Request field MUST be interpreted For the EVRC codec, the Mode Request field MUST be interpreted
according to Tables 2.2.1.2-1 and 2.2.1.2-2 of the EVRC codec according to Tables 2.2.1.2-1 and 2.2.1.2-2 of the EVRC codec
specifications [1]. Values above '100' (4) are currently reserved. specifications [1]. Values above '100' (4) are currently reserved.
If an unknown value above '100' (4) is received, it MUST be handled If an unknown value above '100' (4) is received, it MUST be handled
as if '100' (4) were received. as if '100' (4) were received, for interoperability with potential
future revisions.
For the SMV codec, the Mode Request field MUST be interpreted according For the SMV codec, the Mode Request field MUST be interpreted according
to Table 2.2-2 of the SMV codec specifications [2]. Values above to Table 2.2-2 of the SMV codec specifications [2]. Values above
'101' (5) are currently reserved. If an unknown value above '101' (5) '101' (5) are currently reserved. If an unknown value above '101' (5)
is received, it MUST be handled as if '101' (5) were received. is received, it MUST be handled as if '101' (5) were received, also
for interoperability with potential future revisions.
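The handling of reserved Mode Request values amounts to clamping the received three-bit value to the highest defined value for each codec ('100' for EVRC, '101' for SMV). A minimal sketch, with a hypothetical function name:

```python
HIGHEST_DEFINED = {"EVRC": 4, "SMV": 5}  # highest non-reserved Mode Request value


def effective_mode(codec, mode_request):
    """Map a received 3-bit Mode Request to the value actually honored."""
    if not 0 <= mode_request <= 7:
        raise ValueError("Mode Request is a three-bit field")
    # Reserved values above the highest defined one are treated as that value.
    return min(mode_request, HIGHEST_DEFINED[codec])
```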
10. IANA Considerations
Two new MIME sub-types as described in this section are to be
registered.
The MIME-names for the EVRC and SMV codecs are allocated from the IETF
tree since all the vocoders covered are expected to be widely used
for Voice-over-IP applications.
The RTP mode has been described in the previous sections.
10.1. Storage Mode 11. Storage Mode
The storage mode is used for storing speech frames, e.g., as a file The storage mode is used for storing speech frames, e.g., as a file
or e-mail attachment. or e-mail attachment.
The file begins with a magic number to identify the vocoder that is The file begins with a magic number to identify the vocoder that is
used. The magic number for EVRC corresponds to the ASCII character used. The magic number for EVRC corresponds to the ASCII character
string "#!EVRC\n", i.e., "0x23 0x21 0x45 0x56 0x52 0x43 0x0A" in string "#!EVRC\n", i.e., "0x23 0x21 0x45 0x56 0x52 0x43 0x0A" in
network byte order. The magic number for SMV corresponds to the ASCII network byte order. The magic number for SMV corresponds to the ASCII
character string "#!SMV\n", i.e., "0x23 0x21 0x53 0x4D 0x56 0x0A" in character string "#!SMV\n", i.e., "0x23 0x21 0x53 0x4D 0x56 0x0A" in
network byte order. network byte order.
The codec data frames are stored in consecutive order, with a single The codec data frames are stored in consecutive order, with a single
ToC entry field, expanded to one octet, prefixing each codec data ToC entry field, extended to one octet, prefixing each codec data
frame. The ToC field is expanded to one octet by setting the left- frame. The ToC field is extended to one octet by setting the four
most four bits of the octet to zero. For example, a ToC value of 4 (a most significant bits of the octet to zero. For example, a ToC value
full-rate frame) is stored as 0x04. of 4 (a full-rate frame) is stored as 0x04.
Speech frames lost in transmission and non-received frames MUST be Speech frames lost in transmission and non-received frames MUST be
stored as erasure frames (frame type 5, see definition in Section stored as erasure frames (frame type 5, see definition in Section
5.1) to maintain synchronization with the original media. 5.1) to maintain synchronization with the original media.
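A storage-mode writer and magic-number check might look like the sketch below. The magic numbers, the one-octet ToC with its four most significant bits zero, and frame type 5 for erasures come from the text; storing an erasure as a bare ToC octet with no payload is our assumption, and the function names are hypothetical.

```python
MAGIC = {"EVRC": b"#!EVRC\n", "SMV": b"#!SMV\n"}
ERASURE_TYPE = 5  # frame type of an erasure frame


def write_storage(codec, frames):
    """frames: sequence of (frame_type, payload bytes), or None for a frame
    that was lost or never received."""
    out = bytearray(MAGIC[codec])
    for frame in frames:
        if frame is None:
            out.append(ERASURE_TYPE)  # assumption: erasure has no payload octets
            continue
        ftype, payload = frame
        if not 0 <= ftype <= 0x0F:
            raise ValueError("ToC frame type must fit in four bits")
        out.append(ftype)  # four most significant bits are zero
        out += payload
    return bytes(out)


def detect_codec(data):
    """Return the codec named by a storage file's magic number, or None."""
    for codec, magic in MAGIC.items():
        if data.startswith(magic):
            return codec
    return None
```

Reading a file back would additionally need the payload length for each frame type, which is defined in Section 5.1 and not reproduced here.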
10.2. EVRC MIME Registration 12. IANA Considerations
Four new MIME sub-types as described in this section are to be
registered.
The MIME-names for the EVRC and SMV codecs are allocated from the IETF
tree since all the vocoders covered are expected to be widely used
for Voice-over-IP applications.
12.1. Registration of Media Type EVRC
Media Type Name: audio Media Type Name: audio
Media Subtype Name: EVRC Media Subtype Name: EVRC
Type 1 Interleaved/Bundled packet format for EVRC
Required Parameter for RTP mode: Required Parameter: none
ptype: Indicates the Type of the RTP/Vocoder packets. The
valid values are 1 (Type 1 Interleaved/Bundled) or 2 (Type 2
Header-Free).
Optional parameters for RTP mode: Optional parameters:
The following parameters apply to RTP mode only.
ptime: Defined as usual for RTP audio [6]. ptime: Defined as usual for RTP audio [6].
maxptime: The maximum amount of media which can be encapsulated maxptime: The maximum amount of media which can be encapsulated
in each packet, expressed as time in milliseconds. The time in each packet, expressed as time in milliseconds. The time
SHALL be calculated as the sum of the time the media present SHALL be calculated as the sum of the time the media present
in the packet represents. The time SHOULD be a multiple of the in the packet represents. The time SHOULD be a multiple of the
duration of a single codec data frame (20 msec). If not duration of a single codec data frame (20 msec). If not
signaled, the default maxptime value SHALL be 200 signaled, the default maxptime value SHALL be 200
milliseconds. milliseconds.
maxinterleave: Maximum value of the interleaving length (field LLL maxinterleave: Maximum value of the interleaving length (field LLL
in the Interleaving Octet). The interleaving lengths used in in the Interleaving Octet). The interleaving lengths used in
the entire session MUST NOT exceed this maximum value. If not the entire session MUST NOT exceed this maximum value. If not
signaled, the default maxinterleave value SHALL be 5. signaled, the default maxinterleave value SHALL be 5.
Optional parameters for storage mode: none Encoding considerations:
For RTP mode, see Section 6 and Section 7 of RFC xxxx.
Encoding considerations for RTP mode: see Section 6 and Section 7 of For storage mode, see Section 11 of RFC xxxx.
RFC xxxx.
Encoding considerations for storage mode: see Section 10.1 of RFC Security considerations:
xxxx. See Section 14 "Security Considerations" of RFC xxxx.
Security considerations: see Section 12 "Security Considerations" of Public specification:
RFC xxxx. RFC xxxx.
Public specification: RFC xxxx. Additional information:
The following information applies to storage mode only.
Additional information for storage mode:
Magic number: #!EVRC\n Magic number: #!EVRC\n
File extensions: evc, EVC File extensions: evc, EVC
Macintosh file type code: none Macintosh file type code: none
Object identifier or OID: none Object identifier or OID: none
Intended usage: COMMON. It is expected that many VoIP applications Intended usage:
(as well as mobile applications) will use this type. COMMON. It is expected that many VoIP applications (as well as
mobile applications) will use this type.
Person & email address to contact for further information: Person & email address to contact for further information:
Adam Li Adam Li
adamli@icsl.ucla.edu adamli@icsl.ucla.edu
Author/Change controller: Author/Change controller:
Adam Li Adam Li
adamli@icsl.ucla.edu adamli@icsl.ucla.edu
IETF Audio/Video Transport Working Group IETF Audio/Video Transport Working Group
10.3. SMV MIME Registration 12.2. Registration of Media Type EVRC0
Media Type Name: audio Media Type Name: audio
Media Subtype Name: SMV Media Subtype Name: EVRC0
Type 2 Header-Free packet format for EVRC
Required Parameter for RTP mode: Required Parameter: none
ptype: Indicates the Type of the RTP/Vocoder packets. The Optional parameters: none
valid values are 1 (Type 1 Interleaved/Bundled) or 2 (Type 2
Header-Free).
Optional parameters for RTP mode: Encoding considerations: none
Security considerations:
See Section 14 "Security Considerations" of RFC xxxx.
Public specification:
RFC xxxx.
Additional information: none
Intended usage:
COMMON. It is expected that many VoIP applications (as well as
mobile applications) will use this type.
Person & email address to contact for further information:
Adam Li
adamli@icsl.ucla.edu
Author/Change controller:
Adam Li
adamli@icsl.ucla.edu
IETF Audio/Video Transport Working Group
12.3. Registration of Media Type SMV
Media Type Name: audio
Media Subtype Name: SMV
Type 1 Interleaved/Bundled packet format for SMV
Required Parameter: none
Optional parameters:
The following parameters apply to RTP mode only.
ptime: Defined as usual for RTP audio [6]. ptime: Defined as usual for RTP audio [6].
maxptime: The maximum amount of media which can be encapsulated maxptime: The maximum amount of media which can be encapsulated
in each packet, expressed as time in milliseconds. The time in each packet, expressed as time in milliseconds. The time
SHALL be calculated as the sum of the time the media present SHALL be calculated as the sum of the time the media present
in the packet represents. The time SHOULD be a multiple of the in the packet represents. The time SHOULD be a multiple of the
duration of a single codec data frame (20 msec). If not duration of a single codec data frame (20 msec). If not
signaled, the default maxptime value SHALL be 200 signaled, the default maxptime value SHALL be 200
milliseconds. milliseconds.
maxinterleave: Maximum value of the interleaving length (field LLL maxinterleave: Maximum value of the interleaving length (field LLL
in the Interleaving Octet). The interleaving lengths used in in the Interleaving Octet). The interleaving lengths used in
the entire session MUST NOT exceed this maximum value. If not the entire session MUST NOT exceed this maximum value. If not
signaled, the default maxinterleave value SHALL be 5. signaled, the default maxinterleave value SHALL be 5.
Optional parameters for storage mode: none Encoding considerations:
For RTP mode, see Section 6 and Section 7 of RFC xxxx.
Encoding considerations for RTP mode: see Section 6 and Section 7 of For storage mode, see Section 11 of RFC xxxx.
RFC xxxx.
Encoding considerations for storage mode: see Section 10.1 of RFC Security considerations:
xxxx. See Section 14 "Security Considerations" of RFC xxxx.
Security considerations: see Section 12 "Security Considerations" of Public specification:
RFC xxxx. RFC xxxx.
Public specification: RFC xxxx. Additional information:
The following information applies to storage mode only.
Additional information for storage mode:
Magic number: #!SMV\n Magic number: #!SMV\n
File extensions: smv, SMV File extensions: smv, SMV
Macintosh file type code: none Macintosh file type code: none
Object identifier or OID: none Object identifier or OID: none
Intended usage:
COMMON. It is expected that many VoIP applications (as well as
mobile applications) will use this type.
Intended usage: COMMON. It is expected that many VoIP applications Person & email address to contact for further information:
(as well as mobile applications) will use this type. Adam Li
adamli@icsl.ucla.edu
Author/Change controller:
Adam Li
adamli@icsl.ucla.edu
IETF Audio/Video Transport Working Group
12.4. Registration of Media Type SMV0
Media Type Name: audio
Media Subtype Name: SMV0
Type 2 Header-Free packet format for SMV
Required Parameter: none
Optional parameters: none
Encoding considerations: none
Security considerations:
See Section 14 "Security Considerations" of RFC xxxx.
Public specification:
RFC xxxx.
Additional information: none
Intended usage:
COMMON. It is expected that many VoIP applications (as well as
mobile applications) will use this type.
Person & email address to contact for further information: Person & email address to contact for further information:
Adam Li Adam Li
adamli@icsl.ucla.edu adamli@icsl.ucla.edu
Author/Change controller: Author/Change controller:
Adam Li Adam Li
adamli@icsl.ucla.edu adamli@icsl.ucla.edu
IETF Audio/Video Transport Working Group IETF Audio/Video Transport Working Group
11. Mapping to SDP Parameters 13. Mapping to SDP Parameters
Please note that this section applies to the RTP mode only. Please note that this section applies to the RTP mode only.
Parameters are mapped to SDP [6] as usual. The information carried in the MIME media type specification has a
Example usage in SDP: specific mapping to fields in the Session Description Protocol (SDP)
[6], which is commonly used to describe RTP sessions. When SDP is
used to specify sessions employing the EVRC or SMV codec, the mapping
is as follows:
o The MIME type ("audio") goes in SDP "m=" as the media name.
o The MIME subtype (payload format name) goes in SDP "a=rtpmap"
as the encoding name.
o The parameters "ptime" and "maxptime" go in the SDP "a=ptime"
and "a=maxptime" attributes, respectively.
o Any remaining parameters go in the SDP "a=fmtp" attribute by
copying them directly from the MIME media type string as a
semicolon-separated list of parameter=value pairs.
Some examples of SDP session descriptions for EVRC and SMV encodings
follow below.
Example of usage of EVRC:
m=audio 49120 RTP/AVP 97 m=audio 49120 RTP/AVP 97
a=rtpmap:97 EVRC/8000 a=rtpmap:97 EVRC/8000
a=fmtp:97 ptype=1; maxinterleave=2 a=fmtp:97 maxinterleave=2
a=maxptime:80 a=maxptime:80
12. Security Considerations Example of usage of SMV:
m=audio 49122 RTP/AVP 99
a=rtpmap:99 SMV0/8000
a=fmtp:99
Note that the payload format (encoding) names are commonly shown in
upper case. MIME subtypes are commonly shown in lower case. These
names are case-insensitive in both places. Similarly, parameter names
are case-insensitive both in MIME types and in the default mapping to
the SDP a=fmtp attribute.
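The mapping rules of this section can be sketched as a small generator of SDP lines. This is an illustration only: the function name is hypothetical, and the 8000 Hz clock rate in the a=rtpmap line is an assumption not stated in this section. The design choice here is to omit the a=fmtp line entirely when no remaining parameters are present.

```python
def sdp_for(port, pt, subtype, ptime=None, maxptime=None, **fmtp):
    """Build SDP lines for one EVRC/SMV payload type per the mapping above."""
    lines = [
        f"m=audio {port} RTP/AVP {pt}",     # MIME type -> SDP media name
        f"a=rtpmap:{pt} {subtype}/8000",     # assumption: 8000 Hz clock rate
    ]
    if fmtp:  # remaining parameters -> semicolon-separated a=fmtp list
        params = "; ".join(f"{k}={v}" for k, v in fmtp.items())
        lines.append(f"a=fmtp:{pt} {params}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```

Calling sdp_for(49120, 97, "EVRC", maxptime=80, maxinterleave=2) yields the m=, a=rtpmap, a=fmtp and a=maxptime lines of the EVRC example in that order.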
14. Security Considerations
RTP packets using the payload format defined in this specification RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP are subject to the security considerations discussed in the RTP
specification [4], and any appropriate profile (for example [5]). specification [4], and any appropriate profile (for example [5]).
This implies that confidentiality of the media streams is achieved by This implies that confidentiality of the media streams is achieved by
encryption. Because the data compression used with this payload encryption. Because the data compression used with this payload
format is applied end-to-end, encryption may be performed after format is applied end-to-end, encryption may be performed after
compression so there is no conflict between the two operations. compression so there is no conflict between the two operations.
A potential denial-of-service threat exists for data encoding using A potential denial-of-service threat exists for data encoding using
skipping to change at page 18, line 7 skipping to change at page 19, line 20
be overloaded simply by the receipt of too many packets, either be overloaded simply by the receipt of too many packets, either
desired or undesired. Network-layer authentication may be used to desired or undesired. Network-layer authentication may be used to
discard packets from undesired sources, but the processing cost of discard packets from undesired sources, but the processing cost of
the authentication itself may be too high. In a multicast the authentication itself may be too high. In a multicast
environment, pruning of specific sources may be implemented in environment, pruning of specific sources may be implemented in
future versions of IGMP [7] and in multicast routing protocols to future versions of IGMP [7] and in multicast routing protocols to
allow a receiver to select which sources are allowed to reach it. allow a receiver to select which sources are allowed to reach it.
Interleaving may affect encryption. Depending on the encryption Interleaving may affect encryption. Depending on the encryption
scheme used, there may be restrictions on, for example, the time when scheme used, there may be restrictions on, for example, the time when
keys can be changed. keys can be changed. Specifically, the key change may need to occur at the
boundary between interleave groups.
13. Adding Support of Other Frame-Based Vocoders 15. Adding Support of Other Frame-Based Vocoders
As described above, the RTP packet format defined in this document is As described above, the RTP packet format defined in this document is
very flexible and designed to be usable by other frame-based very flexible and designed to be usable by other frame-based
vocoders. vocoders.
Additional vocoders using this format MUST have properties as Additional vocoders using this format MUST have properties as
described in Section 3.3. described in Section 3.3.
The following need to be done in order for any eligible vocoders to For an eligible vocoder to use the payload format mechanisms defined
use the RTP payload format defined in this document: in this document, a new RTP payload format document needs to be
published as an RFC. That document can simply refer to this document
and then specify the following parameters:
o Define the unit used for RTP time stamp; o Define the unit used for RTP time stamp;
o Define the meaning of the Mode Request bits; o Define the meaning of the Mode Request bits;
o Define corresponding codec data frame type values for ToC; o Define corresponding codec data frame type values for ToC;
o Define the conversion procedure for vocoder output data frames; o Define the conversion procedure for vocoder output data frames;
o Define a magic number for storage mode, and complete the o Define a magic number for storage mode, and complete the
corresponding MIME registration. corresponding MIME registration.
14. Acknowledgements 16. Acknowledgements
The following authors have made significant contributions to this The following authors have made significant contributions to this
document: Adam H. Li, John D. Villasenor, Dong-Seek Park, Jeong-Hoon document: Adam H. Li, John D. Villasenor, Dong-Seek Park, Jeong-Hoon
Park, Keith Miller, S. Craig Greer, David Leon, Nikolai Leung, Park, Keith Miller, S. Craig Greer, David Leon, Nikolai Leung,
Marcello Lioy, Kyle J. McKay, Magdalena L. Espelien, Randall Gellens, Marcello Lioy, Kyle J. McKay, Magdalena L. Espelien, Randall Gellens,
Tom Hiller, Peter J. McCann, Stinson S. Mathai, Michael D. Turner, Tom Hiller, Peter J. McCann, Stinson S. Mathai, Michael D. Turner,
Ajay Rajkumar, Dan Gal, Magnus Westerlund, Lars-Erik Jonsson, Greg Ajay Rajkumar, Dan Gal, Magnus Westerlund, Lars-Erik Jonsson, Greg
Sherwood, and Thomas Zeng. Sherwood, and Thomas Zeng.
15. References 17. References
[1] 3GPP2 C.S0014, "Enhanced Variable Rate Codec, Speech Service [1] 3GPP2 C.S0014, "Enhanced Variable Rate Codec, Speech Service
Option 3 for Wideband Spread Spectrum Digital Systems", January Option 3 for Wideband Spread Spectrum Digital Systems", January
1997. 1997.
[2] 3GPP2 C.S0030, "Selectable Mode Vocoder", August 2001. [2] C.S0030-0 v2.0, "Selectable Mode Vocoder, Service Option for
Wideband Spread Spectrum Communication Systems", May 2002.
[3] Bradner, S., "Key words for use in RFCs to Indicate Requirement [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997. Levels", BCP 14, RFC 2119, March 1997.
[4] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, [4] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
"RTP: A Transport Protocol for Real-Time Applications", RFC "RTP: A Transport Protocol for Real-Time Applications", RFC
1889, January 1996. 1889, January 1996.
[5] Schulzrinne, H., "RTP Profile for Audio and Video Conferences [5] Schulzrinne, H., "RTP Profile for Audio and Video Conferences
with Minimal Control", RFC 1890, January 1996. with Minimal Control", RFC 1890, January 1996.
[6] M. Handley and V. Jacobson, "SDP: Session Description Protocol", [6] M. Handley and V. Jacobson, "SDP: Session Description Protocol",
RFC 2327, April 1998. RFC 2327, April 1998.
[7] Deering, S., "Host Extensions for IP Multicasting", STD 5, RFC [7] Deering, S., "Host Extensions for IP Multicasting", STD 5, RFC
1112, August 1989. 1112, August 1989.
16. Authors' Address 18. Authors' Address
The editor will serve as the point of contact for technical issues. The editor will serve as the point of contact for technical issues.
Adam H. Li Adam H. Li
Image Communication Lab Image Communication Lab
Electrical Engineering Department Electrical Engineering Department
University of California University of California
Los Angeles, CA 90095 Los Angeles, CA 90095
USA USA
Phone: +1 310 825 5178 Phone: +1 310 825 5178