draft-ietf-ecm-cm-04.txt   rfc3124.txt 
Internet Engineering Task Force Hari Balakrishnan
INTERNET DRAFT MIT LCS
Document: draft-ietf-ecm-cm-04.txt Srinivasan Seshan
CMU
May, 2001
Expires: November 2001
The Congestion Manager Network Working Group H. Balakrishnan
Request for Comments: 3124 MIT LCS
Category: Standards Track S. Seshan
CMU
June 2001
The Congestion Manager
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document specifies an Internet standards track protocol for the
all provisions of Section 10 of RFC-2026 [Bradner96]. Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Internet-Drafts are working documents of the Internet Engineering Copyright Notice
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet- Drafts
as reference material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
1. Abstract Copyright (C) The Internet Society (2001). All Rights Reserved.
Abstract
This document describes the Congestion Manager (CM), an end-system This document describes the Congestion Manager (CM), an end-system
module that: module that:
(i) Enables an ensemble of multiple concurrent streams from a (i) Enables an ensemble of multiple concurrent streams from a sender
sender destined to the same receiver and sharing the same destined to the same receiver and sharing the same congestion
congestion properties to perform proper congestion avoidance and properties to perform proper congestion avoidance and control, and
control, and
(ii) Allows applications to easily adapt to network congestion. (ii) Allows applications to easily adapt to network congestion.
The framework described in this document integrates congestion 1. Conventions used in this document:
management across all applications and transport protocols. The CM
maintains congestion parameters (available aggregate and per-stream
bandwidth, per-receiver round-trip times, etc.) and exports an API
that enables applications to learn about network characteristics,
pass information to the CM, share congestion information with each
other, and schedule data transmissions. This document focuses on
applications and transport protocols with their own independent
per-byte or per-packet sequence number information, and does not
require modifications to the receiver protocol stack. However, the
receiving application must provide feedback to the sending
application about received packets and losses, and the latter is
expected to use the CM API to update CM state. This document does
not address networks with reservations or service differentiation.
2. Conventions used in this document:
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
this document are to be interpreted as described in RFC-2119 document are to be interpreted as described in RFC-2119 [Bradner97].
[Bradner97].
STREAM STREAM
A group of packets that all share the same source and
destination IP address, IP type-of-service, transport A group of packets that all share the same source and destination
protocol, and source and destination transport-layer port IP address, IP type-of-service, transport protocol, and source and
numbers. destination transport-layer port numbers.
MACROFLOW MACROFLOW
A group of CM-enabled streams that all use the same congestion
management and scheduling algorithms, and share congestion A group of CM-enabled streams that all use the same congestion
state information. Currently, streams destined to different management and scheduling algorithms, and share congestion state
receivers belong to different macroflows. Streams destined to information. Currently, streams destined to different receivers
the same receiver MAY belong to different macroflows. When belong to different macroflows. Streams destined to the same
the Congestion Manager is in use, streams that experience receiver MAY belong to different macroflows. When the Congestion
identical congestion behavior and use the same congestion Manager is in use, streams that experience identical congestion
control algorithm SHOULD belong to the same macroflow. behavior and use the same congestion control algorithm SHOULD
belong to the same macroflow.
APPLICATION APPLICATION
Any software module that uses the CM. This includes
user-level applications such as Web servers or audio/video Any software module that uses the CM. This includes user-level
servers, as well as in-kernel protocols such as TCP [Postel81] applications such as Web servers or audio/video servers, as well
that use the CM for congestion control. as in-kernel protocols such as TCP [Postel81] that use the CM for
congestion control.
WELL-BEHAVED APPLICATION WELL-BEHAVED APPLICATION
An application that only transmits when allowed by the CM and
accurately accounts for all data that it has sent to the An application that only transmits when allowed by the CM and
receiver by informing the CM using the CM API. accurately accounts for all data that it has sent to the receiver
by informing the CM using the CM API.
PATH MAXIMUM TRANSMISSION UNIT (PMTU) PATH MAXIMUM TRANSMISSION UNIT (PMTU)
The size of the largest packet that the sender can transmit
without it being fragmented en route to the receiver. It The size of the largest packet that the sender can transmit
includes the sizes of all headers and data except the IP without it being fragmented en route to the receiver. It includes
header. the sizes of all headers and data except the IP header.
CONGESTION WINDOW (cwnd) CONGESTION WINDOW (cwnd)
A CM state variable that modulates the amount of outstanding
data between sender and receiver. A CM state variable that modulates the amount of outstanding data
between sender and receiver.
OUTSTANDING WINDOW (ownd) OUTSTANDING WINDOW (ownd)
The number of bytes that has been transmitted by the source,
but not known to have been either received by the destination The number of bytes that has been transmitted by the source, but
or lost in the network. not known to have been either received by the destination or lost
in the network.
INITIAL WINDOW (IW) INITIAL WINDOW (IW)
The size of the sender's congestion window at the beginning of
a macroflow. The size of the sender's congestion window at the beginning of a
macroflow.
DATA TYPE SYNTAX DATA TYPE SYNTAX
We use "u64" for unsigned 64-bit, "u32" for unsigned 32-
bit, "u16" for unsigned 16-bit, "u8" for unsigned 8-bit, "i32" for
signed 32-bit, "i16" for signed 16-bit quantities, "float" for IEEE
floating point values. The type "void" is used to indicate that no
return value is expected from a call. Pointers are referred to
using "*" syntax, following C language convention.
We emphasize that all the API functions described in this We use "u64" for unsigned 64-bit, "u32" for unsigned 32-bit, "u16"
document are "abstract" calls and that conformant CM for unsigned 16-bit, "u8" for unsigned 8-bit, "i32" for signed
implementations may differ in specific implementation details. 32-bit, "i16" for signed 16-bit quantities, "float" for IEEE
floating point values. The type "void" is used to indicate that
no return value is expected from a call. Pointers are referred to
using "*" syntax, following C language convention.
3. Introduction We emphasize that all the API functions described in this document
are "abstract" calls and that conformant CM implementations may
differ in specific implementation details.
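As an illustration only, the data type syntax above could be mapped onto the C99 exact-width integer types as follows. The typedef names come from the text; the use of <stdint.h> and the compile-time width checks are assumptions of this sketch, not part of the specification, and the checks assume 8-bit bytes.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Aliases for the names used in the data type syntax. */
typedef uint64_t u64;   /* unsigned 64-bit */
typedef uint32_t u32;   /* unsigned 32-bit */
typedef uint16_t u16;   /* unsigned 16-bit */
typedef uint8_t  u8;    /* unsigned 8-bit  */
typedef int32_t  i32;   /* signed 32-bit   */
typedef int16_t  i16;   /* signed 16-bit   */

/* Compile-time checks that each alias has the stated width
 * (assumes CHAR_BIT == 8). */
static_assert(sizeof(u64) == 8, "u64 must be 64 bits");
static_assert(sizeof(u32) == 4, "u32 must be 32 bits");
static_assert(sizeof(u16) == 2, "u16 must be 16 bits");
static_assert(sizeof(u8)  == 1, "u8 must be 8 bits");
static_assert(sizeof(i32) == 4, "i32 must be 32 bits");
static_assert(sizeof(i16) == 2, "i16 must be 16 bits");
```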
2. Introduction
The framework described in this document integrates congestion
management across all applications and transport protocols. The CM
maintains congestion parameters (available aggregate and per-stream
bandwidth, per-receiver round-trip times, etc.) and exports an API
that enables applications to learn about network characteristics,
pass information to the CM, share congestion information with each
other, and schedule data transmissions. This document focuses on
applications and transport protocols with their own independent per-
byte or per-packet sequence number information, and does not require
modifications to the receiver protocol stack. However, the receiving
application must provide feedback to the sending application about
received packets and losses, and the latter is expected to use the CM
API to update CM state. This document does not address networks with
reservations or service differentiation.
The CM is an end-system module that enables an ensemble of multiple The CM is an end-system module that enables an ensemble of multiple
concurrent streams to perform stable congestion avoidance and concurrent streams to perform stable congestion avoidance and
control, and allows applications to easily adapt their control, and allows applications to easily adapt their transmissions
transmissions to prevailing network conditions. It integrates to prevailing network conditions. It integrates congestion
congestion management across all applications and transport management across all applications and transport protocols. It
protocols. It maintains congestion parameters (available aggregate maintains congestion parameters (available aggregate and per-stream
and per-stream bandwidth, per-receiver round-trip times, etc.) and bandwidth, per-receiver round-trip times, etc.) and exports an API
exports an API that enables applications to learn about network that enables applications to learn about network characteristics,
characteristics, pass information to the CM, share congestion pass information to the CM, share congestion information with each
information with each other, and schedule data transmissions. When other, and schedule data transmissions. When the CM is used, all
the CM is used, all data transmissions subject to the CM must be data transmissions subject to the CM must be done with the explicit
done with the explicit consent of the CM via this API to ensure consent of the CM via this API to ensure proper congestion behavior.
proper congestion behavior.
Systems MAY choose to use CM, and if so they MUST follow this Systems MAY choose to use CM, and if so they MUST follow this
specification. specification.
This document focuses on applications and networks where the This document focuses on applications and networks where the
following conditions hold: following conditions hold:
1. Applications are well-behaved with their own independent 1. Applications are well-behaved with their own independent
per-byte or per-packet sequence number information, and use the per-byte or per-packet sequence number information, and use the
CM API to update internal state in the CM. CM API to update internal state in the CM.
skipping to change at line 158 skipping to change at page 4, line 22
paths with differing characteristics. paths with differing characteristics.
The Congestion Manager framework can be extended to support The Congestion Manager framework can be extended to support
applications that do not provide their own feedback and to applications that do not provide their own feedback and to
differentially-served networks. These extensions will be addressed differentially-served networks. These extensions will be addressed
in later documents. in later documents.
The CM is motivated by two main goals: The CM is motivated by two main goals:
(i) Enable efficient multiplexing. Increasingly, the trend on the (i) Enable efficient multiplexing. Increasingly, the trend on the
Internet is for unicast data senders (e.g., Web servers) to Internet is for unicast data senders (e.g., Web servers) to transmit
transmit heterogeneous types of data to receivers, ranging from heterogeneous types of data to receivers, ranging from unreliable
unreliable real-time streaming content to reliable Web pages and real-time streaming content to reliable Web pages and applets. As a
applets. As a result, many logically different streams share the result, many logically different streams share the same path between
same path between sender and receiver. For the Internet to remain sender and receiver. For the Internet to remain stable, each of
stable, each of these streams must incorporate control protocols these streams must incorporate control protocols that safely probe
that safely probe for spare bandwidth and react to for spare bandwidth and react to congestion. Unfortunately, these
congestion. Unfortunately, these concurrent streams typically compete concurrent streams typically compete with each other for network
with each other for network resources, rather than share them resources, rather than share them effectively. Furthermore, they do
effectively. Furthermore, they do not learn from each other about not learn from each other about the state of the network. Even if
the state of the network. Even if they each independently implement they each independently implement congestion control (e.g., a group
congestion control (e.g., a group of TCP connections each of TCP connections each implementing the algorithms in [Jacobson88,
implementing the algorithms in [Jacobson88, Allman99]), the Allman99]), the ensemble of streams tends to be more aggressive in
ensemble of streams tends to be more aggressive in the face of the face of congestion than a single TCP connection implementing
congestion than a single TCP connection implementing standard TCP standard TCP congestion control and avoidance [Balakrishnan98].
congestion control and avoidance [Balakrishnan98].
(ii) Enable application adaptation to congestion. Increasingly (ii) Enable application adaptation to congestion. Increasingly,
popular real-time streaming applications run over UDP using their popular real-time streaming applications run over UDP using their own
own user-level transport protocols for good application user-level transport protocols for good application performance, but
performance, but in most cases today do not adapt or react properly in most cases today do not adapt or react properly to network
to network congestion. By implementing a stable control algorithm congestion. By implementing a stable control algorithm and exposing
and exposing an adaptation API, the CM enables easy application an adaptation API, the CM enables easy application adaptation to
adaptation to congestion. Applications adapt the data they congestion. Applications adapt the data they transmit to the current
transmit to the current network conditions. network conditions.
The CM framework builds on recent work on TCP control block sharing The CM framework builds on recent work on TCP control block sharing
[Touch97], integrated TCP congestion control (TCP-Int) [Touch97], integrated TCP congestion control (TCP-Int)
[Balakrishnan98] and TCP sessions [Padmanabhan98]. [Touch97] [Balakrishnan98] and TCP sessions [Padmanabhan98]. [Touch97]
advocates the sharing of some of the state in the TCP control block advocates the sharing of some of the state in the TCP control block
to improve transient transport performance and describes sharing to improve transient transport performance and describes sharing
across an ensemble of TCP connections. [Balakrishnan98], across an ensemble of TCP connections. [Balakrishnan98],
[Padmanabhan98], and [Eggert00] describe several experiments that [Padmanabhan98], and [Eggert00] describe several experiments that
quantify the benefits of sharing congestion state, including quantify the benefits of sharing congestion state, including improved
improved stability in the face of congestion and better loss stability in the face of congestion and better loss recovery.
recovery. Integrating loss recovery across concurrent connections Integrating loss recovery across concurrent connections significantly
significantly improves performance because losses on one connection improves performance because losses on one connection can be detected
can be detected by noticing that later data sent on another by noticing that later data sent on another connection has been
connection has been received and acknowledged. The CM framework received and acknowledged. The CM framework extends these ideas in
extends these ideas in two significant ways: (i) it extends two significant ways: (i) it extends congestion management to non-TCP
congestion management to non-TCP streams, which are becoming streams, which are becoming increasingly common and often do not
increasingly common and often do not implement proper congestion implement proper congestion management, and (ii) it provides an API
management, and (ii) it provides an API for applications to adapt for applications to adapt their transmissions to current network
their transmissions to current network conditions. For an extended conditions. For an extended discussion of the motivation for the CM,
discussion of the motivation for the CM, its architecture, API, its architecture, API, and algorithms, see [Balakrishnan99]; for a
and algorithms, see [Balakrishnan99]; for a description of an description of an implementation and performance results, see
implementation and performance results, see [Andersen00]. [Andersen00].
The resulting end-host protocol architecture at the sender is shown The resulting end-host protocol architecture at the sender is shown
in Figure 1. The CM helps achieve network stability by in Figure 1. The CM helps achieve network stability by implementing
implementing stable congestion avoidance and control algorithms stable congestion avoidance and control algorithms that are "TCP-
that are "TCP-friendly" [Mahdavi98] based on algorithms described friendly" [Mahdavi98] based on algorithms described in [Allman99].
in [Allman99]. However, it does not attempt to enforce proper However, it does not attempt to enforce proper congestion behavior
congestion behavior for all applications (but it does not preclude for all applications (but it does not preclude a policer on the host
a policer on the host that performs this task). Note that while that performs this task). Note that while the policer at the end-
the policer at the end-host can use CM, the network has to be host can use CM, the network has to be protected against compromises
protected against compromises to the CM and the policer at the end to the CM and the policer at the end hosts, a task that requires
hosts, a task that requires router machinery [Floyd99a]. We do not router machinery [Floyd99a]. We do not address this issue further in
address this issue further in this document. this document.
|--------| |--------| |--------| |--------| |--------------| |--------| |--------| |--------| |--------| |--------------|
| HTTP | | FTP | | RTP 1 | | RTP 2 | | | | HTTP | | FTP | | RTP 1 | | RTP 2 | | |
|--------| |--------| |--------| |--------| | | |--------| |--------| |--------| |--------| | |
| | | ^ | ^ | | | | | ^ | ^ | |
| | | | | | | Scheduler | | | | | | | | Scheduler |
| | | | | | |---| | | | | | | | | |---| | |
| | | |-------|--+->| | | | | | | |-------|--+->| | | |
| | | | | |<--| | | | | | | |<--| |
v v v v | | |--------------| v v v v | | |--------------|
skipping to change at line 241 skipping to change at page 6, line 28
| | | | | | P |-->| | | | | | | | P |-->| |
| | | | | | | | | | | | | | | | | |
|---|------+---|--------------|------->| | | Congestion | |---|------+---|--------------|------->| | | Congestion |
| | | | I | | | | | | | I | | |
v v v | | | Controller | v v v | | | Controller |
|-----------------------------------| | | | | |-----------------------------------| | | | |
| IP |-->| | | | | IP |-->| | | |
|-----------------------------------| | | |--------------| |-----------------------------------| | | |--------------|
|---| |---|
Figure 1 Figure 1
The key components of the CM framework are (i) the API, (ii) the The key components of the CM framework are (i) the API, (ii) the
congestion controller, and (iii) the scheduler. The API is (in congestion controller, and (iii) the scheduler. The API is (in part)
part) motivated by the requirements of application-level framing motivated by the requirements of application-level framing (ALF)
(ALF) [Clark90], and is described in Section 4. The CM internals [Clark90], and is described in Section 4. The CM internals (Section
(Section 5) include a congestion controller (Section 5.1) and a 5) include a congestion controller (Section 5.1) and a scheduler to
scheduler to orchestrate data transmissions between concurrent orchestrate data transmissions between concurrent streams in a
streams in a macroflow (Section 5.2). The congestion controller macroflow (Section 5.2). The congestion controller adjusts the
adjusts the aggregate transmission rate between sender and receiver aggregate transmission rate between sender and receiver based on its
based on its estimate of congestion in the network. It obtains estimate of congestion in the network. It obtains feedback about its
feedback about its past transmissions from applications themselves past transmissions from applications themselves via the API. The
via the API. The scheduler apportions available bandwidth amongst scheduler apportions available bandwidth amongst the different
the different streams within each macroflow and notifies streams within each macroflow and notifies applications when they are
applications when they are permitted to send data. This document permitted to send data. This document focuses on well-behaved
focuses on well-behaved applications; a future one will describe applications; a future one will describe the sender-receiver protocol
the sender-receiver protocol and header formats that will handle and header formats that will handle applications that do not
applications that do not incorporate their own feedback to the CM. incorporate their own feedback to the CM.
4. CM API 3. CM API
By convention, the IETF does not treat Application Programming By convention, the IETF does not treat Application Programming
Interfaces as standards track. However, it is considered important Interfaces as standards track. However, it is considered important
to have the CM API and CM algorithm requirements in one coherent to have the CM API and CM algorithm requirements in one coherent
document. The following section on the CM API uses the terms MUST, document. The following section on the CM API uses the terms MUST,
SHOULD, etc. but the terms are meant to apply within the context of SHOULD, etc., but the terms are meant to apply within the context of
an implementation of the CM API. The section does not apply to an implementation of the CM API. The section does not apply to
congestion control implementations in general, only to those congestion control implementations in general, only to those
implementations offering the CM API. implementations offering the CM API.
Using the CM API, streams can determine their share of the available Using the CM API, streams can determine their share of the available
bandwidth, request and have their data transmissions scheduled, bandwidth, request and have their data transmissions scheduled,
inform the CM about successful transmissions, and be informed when inform the CM about successful transmissions, and be informed when
the CM's estimate of path bandwidth changes. Thus, the CM frees the CM's estimate of path bandwidth changes. Thus, the CM frees
applications from having to maintain information about the state of applications from having to maintain information about the state of
congestion and available bandwidth along any path. congestion and available bandwidth along any path.
The function prototypes below follow standard C language The function prototypes below follow standard C language convention.
convention. We emphasize that these API functions are abstract We emphasize that these API functions are abstract calls and
calls and conformant CM implementations may differ in specific conformant CM implementations may differ in specific details, as long
details, as long as equivalent functionality is provided. as equivalent functionality is provided.
When a new stream is created by an application, it passes some When a new stream is created by an application, it passes some
information to the CM via the cm_open(stream_info) API call. information to the CM via the cm_open(stream_info) API call.
Currently, stream_info consists of the following information: (i) Currently, stream_info consists of the following information: (i) the
the source IP address, (ii) the source port, (iii) the destination source IP address, (ii) the source port, (iii) the destination IP
IP address, (iv) the destination port, and (v) the IP protocol address, (iv) the destination port, and (v) the IP protocol number.
number.
4.1 State maintenance 3.1 State maintenance
1. Open: All applications MUST call cm_open(stream_info) before 1. Open: All applications MUST call cm_open(stream_info) before
using the CM API. This returns a handle, cm_streamid, for the using the CM API. This returns a handle, cm_streamid, for the
application to use for all further CM API invocations for that application to use for all further CM API invocations for that
stream. If the returned cm_streamid is -1, then the cm_open() stream. If the returned cm_streamid is -1, then the cm_open()
failed and that stream cannot use the CM. failed and that stream cannot use the CM.
All other calls to the CM for a stream use the cm_streamid All other calls to the CM for a stream use the cm_streamid
returned from the cm_open() call. returned from the cm_open() call.
2. Close: When a stream terminates, the application SHOULD invoke 2. Close: When a stream terminates, the application SHOULD invoke
cm_close(cm_streamid) to inform the CM about the termination cm_close(cm_streamid) to inform the CM about the termination
of the stream. of the stream.
3. Packet size: cm_mtu(cm_streamid) returns the estimated PMTU of 3. Packet size: cm_mtu(cm_streamid) returns the estimated PMTU of
the path between sender and receiver. Internally, this the path between sender and receiver. Internally, this
information SHOULD be obtained via path MTU discovery information SHOULD be obtained via path MTU discovery
[Mogul90]. It MAY be statically configured in the absence of [Mogul90]. It MAY be statically configured in the absence of
such a mechanism. such a mechanism.
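The three state maintenance calls can be sketched in C as below. This is a toy, in-memory illustration rather than a conformant CM: the stream_info layout (IPv4 addresses as u32), the table size CM_MAX_STREAMS, the statically configured CM_STATIC_PMTU, the u32 return type of cm_mtu(), and the convention of returning 0 for an unknown handle are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t  i32;
typedef uint32_t u32;
typedef uint16_t u16;
typedef uint8_t  u8;

#define CM_MAX_STREAMS 8     /* hypothetical table size */
#define CM_STATIC_PMTU 1480  /* hypothetical static PMTU, IP header excluded */

/* The five items listed for stream_info (IPv4 assumed for brevity). */
struct stream_info {
    u32 src_addr;
    u16 src_port;
    u32 dst_addr;
    u16 dst_port;
    u8  ip_proto;
};

struct cm_stream {
    int in_use;
    struct stream_info info;
};

static struct cm_stream cm_table[CM_MAX_STREAMS];

/* Nonzero if the handle refers to an open stream. */
int cm_valid(i32 cm_streamid)
{
    return cm_streamid >= 0 && cm_streamid < CM_MAX_STREAMS &&
           cm_table[cm_streamid].in_use;
}

/* Open: returns a handle for later calls, or -1 if the open failed. */
i32 cm_open(const struct stream_info *info)
{
    assert(info != NULL);
    for (i32 id = 0; id < CM_MAX_STREAMS; id++) {
        if (!cm_table[id].in_use) {
            cm_table[id].in_use = 1;
            cm_table[id].info = *info;
            return id;
        }
    }
    return -1;  /* no free slot: this stream cannot use the CM */
}

/* Close: tells the CM that the stream has terminated. */
void cm_close(i32 cm_streamid)
{
    if (cm_valid(cm_streamid))
        cm_table[cm_streamid].in_use = 0;
}

/* Packet size: returns the PMTU estimate. Here it is statically
 * configured; a real CM SHOULD use path MTU discovery instead. */
u32 cm_mtu(i32 cm_streamid)
{
    return cm_valid(cm_streamid) ? CM_STATIC_PMTU : 0;
}
```

An application would call cm_open() once per stream, check the result against -1, and pass the returned handle to every later call for that stream.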
4.2 Data transmission 3.2 Data transmission
The CM accommodates two types of adaptive senders, enabling The CM accommodates two types of adaptive senders, enabling
applications to dynamically adapt their content based on applications to dynamically adapt their content based on prevailing
prevailing network conditions, and supporting ALF-based network conditions, and supporting ALF-based applications.
applications.
1. Callback-based transmission. The callback-based transmission API 1. Callback-based transmission. The callback-based transmission API
puts the stream in firm control of deciding what to transmit at puts the stream in firm control of deciding what to transmit at each
each point in time. To achieve this, the CM does not buffer any point in time. To achieve this, the CM does not buffer any data;
data; instead, it allows streams the opportunity to adapt to instead, it allows streams the opportunity to adapt to unexpected
unexpected network changes at the last possible instant. Thus, network changes at the last possible instant. Thus, this enables
this enables streams to "pull out" and repacketize data upon streams to "pull out" and repacketize data upon learning about any
learning about any rate change, which is hard to do once the data rate change, which is hard to do once the data has been buffered.
has been buffered. The CM must implement a cm_request(i32 The CM must implement a cm_request(i32 cm_streamid) call for streams
cm_streamid) call for streams wishing to send data in this style. wishing to send data in this style. After some time, depending on
After some time, depending on the rate, the CM MUST the rate, the CM MUST invoke a callback using cmapp_send(), which is
invoke a callback using cmapp_send(), which is a grant for the stream to send up to PMTU bytes. The callback-style
a grant for the stream to send up to PMTU bytes. The API is the recommended choice for ALF-based streams. Note that
callback-style API is the recommended choice for ALF-based streams. cm_request() does not take the number of bytes or MTU-sized units as
Note that cm_request() does not take the number of bytes or an argument; each call to cm_request() is an implicit request for
MTU-sized units as an argument; each call to cm_request() is an sending up to PMTU bytes. The CM MAY provide an alternate interface,
implicit request for sending up to PMTU bytes. The CM MAY provide cm_request(int k). The cmapp_send callback for this request is
an alternate interface, cm_request(int k). The cmapp_send callback granted the right to send up to k PMTU sized segments. Section 4.3
for this request is granted the right to send up to k PMTU sized discusses the time duration for which the transmission grant is
segments. Section 4.3 discusses the time duration for which the valid, while Section 5.2 describes how these requests are scheduled
transmission grant is valid, while Section 5.2 describes how these and callbacks made.
requests are scheduled and callbacks made.
2. Synchronous-style. The above callback-based API accommodates a 2. Synchronous-style. The above callback-based API accommodates a
class of ALF streams that are "asynchronous." Asynchronous class of ALF streams that are "asynchronous." Asynchronous
transmitters do not transmit based on a periodic clock, but do so transmitters do not transmit based on a periodic clock, but do so
triggered by asynchronous events like file reads or captured triggered by asynchronous events like file reads or captured frames.
frames. On the other hand, there are many streams that are On the other hand, there are many streams that are "synchronous"
"synchronous" transmitters, which transmit periodically based on transmitters, which transmit periodically based on their own internal
their own internal timers (e.g., an audio sender that sends at a timers (e.g., an audio sender that sends at a constant sampling
constant sampling rate). While CM callbacks could be configured to rate). While CM callbacks could be configured to periodically
periodically interrupt such transmitters, the transmit loop of such interrupt such transmitters, the transmit loop of such applications
applications is less affected if they retain their original is less affected if they retain their original timer-based loop. In
timer-based loop. In addition, it complicates the CM API to have a addition, it complicates the CM API to have a stream express the
stream express the periodicity and granularity of its callbacks. periodicity and granularity of its callbacks. Thus, the CM MUST
Thus, the CM MUST export an API that allows such streams to be informed export an API that allows such streams to be informed of changes in
of changes in rates using the cmapp_update(u64 newrate, u32 srtt, rates using the cmapp_update(u64 newrate, u32 srtt, u32 rttdev)
u32 rttdev) callback function, where newrate is the new rate in callback function, where newrate is the new rate in bits per second
bits per second for this stream, srtt is the current smoothed round for this stream, srtt is the current smoothed round trip time
trip time estimate in microseconds, and rttdev is the smoothed estimate in microseconds, and rttdev is the smoothed linear deviation
linear deviation in the round-trip time estimate calculated using in the round-trip time estimate calculated using the same algorithm
the same algorithm as in TCP [Paxson00]. The newrate value reports as in TCP [Paxson00]. The newrate value reports an instantaneous
an instantaneous rate calculated, for example, by taking the ratio rate calculated, for example, by taking the ratio of cwnd and srtt,
of cwnd and srtt, and dividing by the fraction of that ratio and dividing by the fraction of that ratio allocated to the stream.
allocated to the stream. In response, the stream MUST adapt its
packet size or change its timer interval to conform to (i.e., not In response, the stream MUST adapt its packet size or change its
exceed) the allowed rate. Of course, it may choose not to use all timer interval to conform to (i.e., not exceed) the allowed rate. Of
of this rate. Note that the CM is not on the data path of the course, it may choose not to use all of this rate. Note that the CM
actual transmission. is not on the data path of the actual transmission.
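As a rough illustration of how a synchronous sender might react to a
cmapp_update() callback, the following Python sketch keeps the packet
size fixed and stretches the timer interval so that the stream does
not exceed newrate. The function name, its parameters, and the policy
of never sending faster than the application's native period are
assumptions made for illustration; they are not part of the CM API.

```python
def conforming_interval_us(newrate_bps, packet_bytes, native_interval_us):
    """Return a timer interval (microseconds) that keeps a stream of
    fixed-size packets at or below newrate_bps (bits per second)."""
    if newrate_bps <= 0:
        raise ValueError("no usable rate estimate")
    packet_bits = packet_bytes * 8
    # Smallest interval that does not exceed the allowed rate, rounded
    # up (ceiling division) so the stream errs toward sending less.
    min_interval_us = -(-packet_bits * 1_000_000 // newrate_bps)
    # The stream may use less than its allowance, so it never sends
    # faster than its own native period.
    return max(min_interval_us, native_interval_us)
```

A sender that prefers to keep its period constant could instead
shrink its packet size; the same rate constraint applies either way.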
To avoid unnecessary cmapp_update() callbacks that the application
will only ignore, the CM MUST provide a cm_thresh(float
rate_downthresh, float rate_upthresh, float rtt_downthresh, float
rtt_upthresh) function that a stream can use at any stage in its
execution. In response, the CM SHOULD invoke the callback only when
the rate decreases to less than (rate_downthresh * lastrate) or
increases to more than (rate_upthresh * lastrate), where lastrate is
the rate last notified to the stream, or when the round-trip time
changes correspondingly by the requisite thresholds. This
information is used as a hint by the CM, in the sense that
cmapp_update() can be called even if these conditions are not met.
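The threshold test described above can be sketched in Python as
follows. The function name and the lastsrtt/newsrtt parameters are
hypothetical, and applying the RTT thresholds as multipliers of the
last notified RTT is an assumption based on the phrase "changes
correspondingly"; the text only spells out the rate comparison.

```python
def should_invoke_update(lastrate, newrate, lastsrtt, newsrtt,
                         rate_downthresh, rate_upthresh,
                         rtt_downthresh, rtt_upthresh):
    """Return True when a cmapp_update() callback is warranted under
    the thresholds a stream supplied through cm_thresh()."""
    # Rate test, as given in the text.
    rate_crossed = (newrate < rate_downthresh * lastrate or
                    newrate > rate_upthresh * lastrate)
    # RTT test, assumed to mirror the rate test.
    rtt_crossed = (newsrtt < rtt_downthresh * lastsrtt or
                   newsrtt > rtt_upthresh * lastsrtt)
    return rate_crossed or rtt_crossed
```

Because the thresholds are only a hint, a CM remains free to call
cmapp_update() even when this check returns False.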
The CM MUST implement a cm_query(i32 cm_streamid, u64* rate, u32*
srtt, u32* rttdev) to allow an application to query the current CM
state. This sets the rate variable to the current rate estimate in
bits per second, the srtt variable to the current smoothed round-trip
time estimate in microseconds, and rttdev to the mean linear
deviation. If the CM does not have valid estimates for the
macroflow, it fills in negative values for the rate, srtt, and
rttdev.
Note that a stream can use more than one of the above transmission
APIs at the same time. In particular, the knowledge of sustainable
rate is useful for asynchronous streams as well as synchronous ones;
e.g., an asynchronous Web server disseminating images using TCP may
use cmapp_send() to schedule its transmissions and cmapp_update() to
decide whether to send a low-resolution or high-resolution image. A
TCP implementation using the CM is described in Section 6.1.1, where
the benefit of the cm_request() callback API for TCP will become
apparent.
The reader will notice that the basic CM API does not provide an
interface for buffered congestion-controlled transmissions. This is
intentional, since this transmission mode can be implemented using
the callback-based primitive. Section 6.1.2 describes how
congestion-controlled UDP sockets may be implemented using the CM
API.
4.3 Application notification (RFC 3124: Section 3.3)
When a stream receives feedback from receivers, it MUST use
cm_update(i32 cm_streamid, u32 nrecd, u32 nlost, u8 lossmode, i32
rtt) to inform the CM about events such as congestion losses,
successful receptions, type of loss (timeout event, Explicit
Congestion Notification [Ramakrishnan99], etc.) and round-trip time
samples. The nrecd parameter indicates how many bytes were
successfully received by the receiver since the last cm_update call,
while the nlost parameter identifies how many bytes were lost during
the same time period. The rtt value indicates the round-trip time
measured during the transmission of these bytes. The rtt value must
be set to -1 if no valid round-trip sample was obtained by the
application. The lossmode parameter provides an indicator of how a
loss was detected. A value of CM_NO_FEEDBACK indicates that the
application has received no feedback for all its outstanding data,
and is reporting this to the CM. For example, a TCP that has
experienced a timeout would use this parameter to inform the CM of
this. A value of CM_LOSS_FEEDBACK indicates that the application has
experienced some loss, which it believes to be due to congestion, but
not all outstanding data has been lost. For example, a TCP segment
loss detected using duplicate (selective) acknowledgments or other
data-driven techniques fits this category. A value of
CM_EXPLICIT_CONGESTION indicates that the receiver echoed an explicit
congestion notification message. Finally, a value of
CM_NO_CONGESTION indicates that no congestion-related loss has
occurred. The lossmode parameter MUST be reported as a bit-vector
where the bits correspond to CM_NO_FEEDBACK, CM_LOSS_FEEDBACK,
CM_EXPLICIT_CONGESTION, and CM_NO_CONGESTION. Note that over links
(paths) that experience losses for reasons other than congestion, an
application SHOULD inform the CM of losses, with the CM_NO_CONGESTION
field set.
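One way to picture the lossmode bit-vector and the rtt convention is
the Python sketch below. The bit positions are purely illustrative
(the text names the four flags but does not assign values), and
make_update() is a hypothetical helper that only assembles the
arguments following cm_streamid in a cm_update() call.

```python
from enum import IntFlag


class LossMode(IntFlag):
    # Illustrative bit positions; the text does not assign values.
    CM_NO_FEEDBACK = 0x1
    CM_LOSS_FEEDBACK = 0x2
    CM_EXPLICIT_CONGESTION = 0x4
    CM_NO_CONGESTION = 0x8


def make_update(nrecd, nlost, lossmode, rtt_sample=None):
    """Assemble (nrecd, nlost, lossmode, rtt) for cm_update()."""
    if not 0 <= int(lossmode) <= 0xFF:
        raise ValueError("lossmode must fit in a u8")
    # rtt is -1 when no valid round-trip sample was obtained.
    rtt = -1 if rtt_sample is None else rtt_sample
    return (nrecd, nlost, int(lossmode), rtt)
```

For example, a sender on a lossy wireless path that attributes a loss
to corruption rather than congestion could report the lost bytes with
only CM_NO_CONGESTION set.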
cm_notify(i32 cm_streamid, u32 nsent) MUST be called when data is
transmitted from the host (e.g., in the IP output routine) to inform
the CM that nsent bytes were just transmitted on a given stream.
This allows the CM to update its estimate of the number of
outstanding bytes for the macroflow and for the stream.
A cmapp_send() grant from the CM to an application is valid only for
an expiration time, equal to the larger of the round-trip time and an
implementation-dependent threshold communicated as an argument to the
cmapp_send() callback function. The application MUST NOT send data
based on this callback after this time has expired. Furthermore, if
the application decides not to send data after receiving this
callback, it SHOULD call cm_notify(stream_info, 0) to allow the CM to
permit other streams in the macroflow to transmit data. The CM
congestion controller MUST be robust to applications forgetting to
invoke cm_notify(stream_info, 0) correctly, or applications that
crash or disappear after having made a cm_request() call.
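The grant lifetime rule can be sketched in Python as shown below. The
function names and the microsecond clock are assumptions, as is the
choice to treat the exact expiry instant as still valid; the text
only says that data must not be sent after the time has expired.

```python
def grant_expiry_us(grant_time_us, rtt_us, threshold_us):
    """Expiry instant of a cmapp_send() grant: the grant lasts for the
    larger of the round-trip time and the threshold."""
    return grant_time_us + max(rtt_us, threshold_us)


def may_send(now_us, grant_time_us, rtt_us, threshold_us):
    """True while the grant issued at grant_time_us is still usable."""
    return now_us <= grant_expiry_us(grant_time_us, rtt_us, threshold_us)
```

An application that finds its grant expired, or that has nothing to
send, would call cm_notify(stream_info, 0) as described above.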
4.4 Querying (RFC 3124: Section 3.4)
If applications wish to learn about per-stream available bandwidth
and round-trip time, they can use the CM's cm_query(i32 cm_streamid,
i64* rate, i32* srtt, i32* rttdev) call, which fills in the desired
quantities. If the CM does not have valid estimates for the
macroflow, it fills in negative values for the rate, srtt, and
rttdev.
4.5 Sharing granularity (RFC 3124: Section 3.5)
One of the decisions the CM needs to make is the granularity at which
a macroflow is constructed, by deciding which streams belong to the
same macroflow and share congestion information. The API provides
two functions that allow applications to decide which of their
streams ought to belong to the same macroflow.
cm_getmacroflow(i32 cm_streamid) returns a unique i32 macroflow
identifier. cm_setmacroflow(i32 cm_macroflowid, i32 cm_streamid)
sets the macroflow of the stream cm_streamid to cm_macroflowid. If
the cm_macroflowid that is passed to cm_setmacroflow() is -1, then a
new macroflow is constructed and this is returned to the caller.
Each call to cm_setmacroflow() overrides the previous macroflow
association for the stream, should one exist.
The default suggested aggregation method is to aggregate by
destination IP address; i.e., all streams to the same destination
address are aggregated to a single macroflow by default. The
cm_getmacroflow() and cm_setmacroflow() calls can then be used to
change this as needed. We do note that there are some cases where
this may not be optimal, even over best-effort networks. For
example, when a group of receivers are behind a NAT device, the
sender will see them all as one address. If the hosts behind the NAT
are in fact connected over different bottleneck links, some of those
hosts could see worse performance than before. It is possible to
detect such hosts when using delay and loss estimates, although the
specific mechanisms for doing so are beyond the scope of this
document.
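The default aggregation and the two macroflow calls can be pictured
with the toy Python table below. The class name, the open_stream()
helper, and the use of small integers as identifiers are assumptions
for illustration; only the behavior of cm_getmacroflow() and
cm_setmacroflow(), including the -1 convention, comes from the text.

```python
class MacroflowTable:
    """Toy bookkeeping for stream-to-macroflow association."""

    def __init__(self):
        self._next_id = 0
        self._by_dest = {}     # destination address -> macroflow id
        self._stream_mf = {}   # cm_streamid -> macroflow id

    def _new_macroflow(self):
        mf = self._next_id
        self._next_id += 1
        return mf

    def open_stream(self, cm_streamid, dest_addr):
        # Default: streams to the same destination share a macroflow.
        if dest_addr not in self._by_dest:
            self._by_dest[dest_addr] = self._new_macroflow()
        self._stream_mf[cm_streamid] = self._by_dest[dest_addr]

    def cm_getmacroflow(self, cm_streamid):
        return self._stream_mf[cm_streamid]

    def cm_setmacroflow(self, cm_macroflowid, cm_streamid):
        # -1 requests a fresh macroflow, which is returned to the caller.
        if cm_macroflowid == -1:
            cm_macroflowid = self._new_macroflow()
        # Each call overrides any previous association for the stream.
        self._stream_mf[cm_streamid] = cm_macroflowid
        return cm_macroflowid
```

An application that knows two of its streams take different paths to
the same address (the NAT case mentioned above) could move one of
them out with cm_setmacroflow(-1, cm_streamid).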
The objective of this interface is to set up sharing of groups, not
sharing policy of relative weights of streams in a macroflow. The
latter requires the scheduler to provide an interface to set sharing
policy. However, because we want to support many different
schedulers (each of which may need different information to set
policy), we do not specify a complete API to the scheduler (but see
Section 5.2). A later guideline document is expected to describe a
few simple schedulers (e.g., weighted round-robin, hierarchical
scheduling) and the API they export to provide relative
prioritization.
5. CM internals (RFC 3124: Section 4)
This section describes the internal components of the CM. It
includes a Congestion Controller and a Scheduler, with well-defined,
abstract interfaces exported by them.
5.1 Congestion controller (RFC 3124: Section 4.1)
Associated with each macroflow is a congestion control algorithm; the
collection of all these algorithms comprises the congestion
controller of the CM. The control algorithm decides when and how
much data can be transmitted by a macroflow. It uses application
notifications (Section 4.3) from concurrent streams on the same
macroflow to build up information about the congestion state of the
network path used by the macroflow.
The congestion controller MUST implement a "TCP-friendly" [Mahdavi98]
congestion control algorithm. Several macroflows MAY (and indeed,
often will) use the same congestion control algorithm but each
macroflow maintains state about the network used by its streams.
The congestion control module MUST implement the following abstract
interfaces. We emphasize that these are not directly visible to
applications; they are within the context of a macroflow, and are
different from the CM API functions of Section 4.
- void query(u64 *rate, u32 *srtt, u32 *rttdev): This function
  returns the estimated rate (in bits per second) and smoothed
  round trip time (in microseconds) for the macroflow.
skipping to change at line 550 (draft-04); page 13, line 4 (RFC 3124)
  congestion control module whenever data is sent by an
  application. The nsent parameter indicates the number of bytes
  just sent by the application.
- void update(u32 nsent, u32 nrecd, u32 rtt, u32 lossmode): This
  function is called whenever any of the CM streams associated with
  a macroflow identifies that data has reached the receiver or has
  been lost en route. The nrecd parameter indicates the number of
  bytes that have just arrived at the receiver. The nsent
  parameter is the sum of the number of bytes just received and the
  number of bytes identified as lost en route. The rtt parameter is
  the estimated round trip time in microseconds during the
  transfer. The lossmode parameter provides an indicator of how a
  loss was detected (section 4.3).
Although these interfaces are not visible to applications, the
congestion controller MUST implement these abstract interfaces to
provide for modular inter-operability with different
separately-developed schedulers.
The congestion control module MUST also call the associated
scheduler's schedule function (section 5.2) when it believes that the
current congestion state allows an MTU-sized packet to be sent.
5.2 Scheduler (RFC 3124: Section 4.2)
While it is the responsibility of the congestion control module to
determine when and how much data can be transmitted, it is the
responsibility of a macroflow's scheduler module to determine which
of the streams should get the opportunity to transmit data.
The Scheduler MUST implement the following interfaces:
- void schedule(u32 num_bytes): When the congestion control module
  determines that data can be sent, the schedule() routine MUST be
skipping to change at line 595 (draft-04); page 14, line 5 (RFC 3124)
- void notify(i32 cm_streamid, u32 nsent): This interface is used
  to notify the scheduler module whenever data is sent by a CM
  application. The nsent parameter indicates the number of bytes
  just sent by the application.
The Scheduler MAY implement many additional interfaces. As
experience with CM schedulers increases, future documents may
make additions and/or changes to some parts of the scheduler
API.
6. Examples (RFC 3124: Section 5)
6.1 Example applications (RFC 3124: Section 5.1)
This section describes three possible uses of the CM API by
applications. We describe two asynchronous applications---an
implementation of a TCP sender and an implementation of
congestion-controlled UDP sockets, and a synchronous
application---a streaming audio server. More details of these
applications and CM implementation optimizations for efficient
operation are described in [Andersen00].
All applications that use the CM MUST incorporate feedback from the
receiver. For example, an application must periodically (typically
once or twice per round trip time) determine how many of its packets
arrived at the receiver. When the source gets this feedback, it MUST
use cm_update() to inform the CM of this new information. This
results in the CM updating ownd and may result in the CM changing its
estimates and calling cmapp_update() of the streams of the macroflow.
The protocols in this section are examples and suggestions for
implementation, rather than requirements for any conformant
implementation.
6.1.1 TCP (RFC 3124: Section 5.1.1)
A TCP implementation that uses CM should use the cmapp_send()
callback API. TCP only identifies which data it should send upon the
arrival of an acknowledgement or expiration of a timer. As a result,
it requires tight control over when and if new data or
retransmissions are sent.
When TCP either connects to or accepts a connection from another
host, it performs a cm_open() call to associate the TCP connection
with a cm_streamid.
Once a connection is established, the CM is used to control the
transmission of outgoing data. The CM eliminates the need for
tracking and reacting to congestion in TCP, because the CM and its
transmission API ensure proper congestion behavior. Loss recovery is
still performed by TCP based on fast retransmissions and recovery as
well as timeouts. In addition, TCP is also modified to have its own
outstanding window (tcp_ownd) estimate. Whenever data segments are
sent from its cmapp_send() callback, TCP updates its tcp_ownd value.
The ownd variable is also updated after each cm_update() call. TCP
also maintains a count of the number of outstanding segments
(pkt_cnt). At any time, TCP can calculate the average packet size
(avg_pkt_size) as tcp_ownd/pkt_cnt. The avg_pkt_size is used by TCP
to help estimate the amount of outstanding data. Note that this is
not needed if the SACK option is used on the connection, since this
information is explicitly available.
The TCP output routines are modified as follows:
1. All congestion window (cwnd) checks are removed.
2. When application data is available, the TCP output routines
   perform all non-congestion checks (Nagle algorithm,
   receiver-advertised window check, etc). If these checks pass, the
   output routine queues the data and calls cm_request() for the
   stream.
3. If incoming data or timers result in a loss being detected, the
   retransmission is also placed in a queue and cm_request() is
   called for the stream.
4. The cmapp_send() callback for TCP is set to an output routine.
   If any retransmission is enqueued, the routine outputs the
   retransmission. Otherwise, the routine outputs as much new data
   as the TCP connection state allows. However, the cmapp_send()
   never sends more than a single segment per call. This routine
   arranges for the other output computations to be done, such as
   header and options computations.
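The selection rule in step 4 can be sketched in Python as follows.
The class and its two queues are hypothetical stand-ins for TCP's
real send state; the sketch only shows that a queued retransmission
takes priority over new data and that at most one segment is
released per cmapp_send() grant.

```python
from collections import deque


class TcpSendQueues:
    """Hypothetical holder for segments awaiting a CM grant."""

    def __init__(self):
        self.retransmissions = deque()  # filled when a loss is detected
        self.new_data = deque()         # filled by the output routine

    def cmapp_send(self):
        """Return the single segment to send for this grant, or None."""
        if self.retransmissions:
            return self.retransmissions.popleft()
        if self.new_data:
            return self.new_data.popleft()
        return None
```

When the callback has nothing to send (None here), the stream would
be expected to call cm_notify(stream_info, 0) so that other streams
in the macroflow can use the grant.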
The IP output routine on the host calls cm_notify() when the packets
are actually sent out. Because it does not know which cm_streamid is
responsible for the packet, cm_notify() takes the stream_info as
argument (see Section 4 for what the stream_info should contain).
Because cm_notify() reports the IP payload size, TCP keeps track of
the total header size and incorporates these updates.
The TCP input routines are modified as follows: The TCP input routines are modified as follows:
1. RTT estimation is done as normal using either timestamps or 1. RTT estimation is done as normal using either timestamps or
Karn's algorithm. Any rtt estimate that is generated is passed Karn's algorithm. Any rtt estimate that is generated is passed to
to CM via the cm_update call. CM via the cm_update call.
2. All cwnd and slow start threshold (ssthresh) updates are 2. All cwnd and slow start threshold (ssthresh) updates are
removed. removed.
3. Upon the arrival of an ack for new data, TCP computes the 3. Upon the arrival of an ack for new data, TCP computes the value
value of in_flight (the amount of data in flight) as of in_flight (the amount of data in flight) as snd_max-ack-1
snd_max-ack-1 (i.e. MAX Sequence Sent - Current Ack - 1). TCP (i.e., MAX Sequence Sent - Current Ack - 1). TCP then calls
then calls cm_update(streamid, tcp_ownd - in_flight, 0, cm_update(streamid, tcp_ownd - in_flight, 0, CM_NO_CONGESTION,
CM_NO_CONGESTION, rtt). rtt).
4. Upon the arrival of a duplicate acknowledgement, TCP must 4. Upon the arrival of a duplicate acknowledgement, TCP must check
check its dupack count (dup_acks) to determine its action. If its dupack count (dup_acks) to determine its action. If dup_acks
dup_acks < 3, the TCP does nothing. If dup_acks == 3, TCP < 3, the TCP does nothing. If dup_acks == 3, TCP assumes that a
assumes that a packet was lost and that at least 3 packets packet was lost and that at least 3 packets arrived to generate
arrived to generate these duplicate acks. Therefore, it calls these duplicate acks. Therefore, it calls cm_update(streamid, 4 *
cm_update(streamid, 4 * avg_pkt_size, 3 * avg_pkt_size, avg_pkt_size, 3 * avg_pkt_size, CM_LOSS_FEEDBACK, rtt). The
CM_LOSS_FEEDBACK, rtt). The average packet size is used since the average packet size is used since the acknowledgments do not
acknowledgements do not indicate exactly how much data has indicate exactly how much data has reached the other end. Most
reached the other end. Most TCP implementations interpret a TCP implementations interpret a duplicate ACK as an indication
duplicate ACK as an indication that a full MSS has reached its that a full MSS has reached its destination. Once a new ACK is
destination. Once a new ACK is received, these TCP sender received, these TCP sender implementations may resynchronize with
implementations may resynchronize with the TCP receiver. The CM API the TCP receiver. The CM API does not provide a mechanism for TCP to
does not provide a mechanism for TCP to pass information from pass information from this resynchronization. Therefore, TCP can
this resynchronization. Therefore, TCP can only infer the only infer the arrival of an avg_pkt_size amount of data from each
arrival of an avg_pkt_size amount of data from each duplicate duplicate ack. TCP also enqueues a retransmission of the lost
ack. TCP also enqueues a retransmission of the lost segment and segment and calls cm_request(). If dup_acks > 3, TCP assumes that
calls cm_request(). If dup_acks > 3, TCP assumes that a packet a packet has reached the other end and caused this ack to be sent.
has reached the other end and caused this ack to be sent. As a As a result, it calls cm_update(streamid, avg_pkt_size,
result, it calls cm_update(streamid, avg_pkt_size, avg_pkt_size, avg_pkt_size, CM_NO_CONGESTION, rtt).
CM_NO_CONGESTION, rtt).
5. Upon the arrival of a partial acknowledgment (one that does 5. Upon the arrival of a partial acknowledgment (one that does not
not exceed the highest segment transmitted at the time the loss exceed the highest segment transmitted at the time the loss
occurred, as defined in [Floyd99b]), TCP assumes that a packet occurred, as defined in [Floyd99b]), TCP assumes that a packet was
was lost and that the retransmitted packet has reached the lost and that the retransmitted packet has reached the recipient.
recipient. Therefore, it calls cm_update(streamid, 2 * Therefore, it calls cm_update(streamid, 2 * avg_pkt_size,
avg_pkt_size, avg_pkt_size, CM_NO_CONGESTION, avg_pkt_size, CM_NO_CONGESTION, rtt). CM_NO_CONGESTION is used
rtt). CM_NO_CONGESTION is used since the loss period has already since the loss period has already been reported. TCP also
been reported. TCP also enqueues a retransmission of the lost enqueues a retransmission of the lost segment and calls
segment and calls cm_request(). cm_request().
When the TCP retransmission timer expires, the sender identifies When the TCP retransmission timer expires, the sender identifies that
that a segment has been lost and calls cm_update(streamid, a segment has been lost and calls cm_update(streamid, avg_pkt_size,
avg_pkt_size, 0, CM_NO_FEEDBACK, 0) to signify that no feedback has 0, CM_NO_FEEDBACK, 0) to signify that no feedback has been received
been received from the receiver and that one segment is sure to from the receiver and that one segment is sure to have "left the
have "left the pipe." TCP also enqueues a retransmission of the pipe." TCP also enqueues a retransmission of the lost segment and
lost segment and calls cm_request(). calls cm_request().
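The TCP input rules above can be summarized in a minimal sketch. It is an illustration, not the kernel implementation: the RecordingCM class, the handler names, and the numeric values of the loss-mode constants are hypothetical, and the parameter names nsent and nrecd are an assumption based on the duplicate-ack description (four packets resolved, three known to have arrived). Enqueueing the retransmission itself is omitted.

```python
# Loss-mode names come from the text; the numeric values are placeholders.
CM_NO_FEEDBACK, CM_NO_CONGESTION, CM_LOSS_FEEDBACK = 0, 1, 2


class RecordingCM:
    """Hypothetical stand-in for the CM that only records the calls it gets."""

    def __init__(self):
        self.calls = []

    def cm_update(self, streamid, nsent, nrecd, lossmode, rtt):
        self.calls.append(("cm_update", streamid, nsent, nrecd, lossmode, rtt))

    def cm_request(self, streamid):
        self.calls.append(("cm_request", streamid))


def on_new_ack(cm, streamid, tcp_ownd, snd_max, ack, rtt):
    # Step 3: in_flight is snd_max - ack - 1, as stated in the text.
    in_flight = snd_max - ack - 1
    cm.cm_update(streamid, tcp_ownd - in_flight, 0, CM_NO_CONGESTION, rtt)


def on_dup_ack(cm, streamid, dup_acks, avg_pkt_size, rtt):
    # Step 4: the action depends on the duplicate-ack count.
    if dup_acks < 3:
        return
    if dup_acks == 3:
        cm.cm_update(streamid, 4 * avg_pkt_size, 3 * avg_pkt_size,
                     CM_LOSS_FEEDBACK, rtt)
        cm.cm_request(streamid)  # ask permission to retransmit
    else:
        cm.cm_update(streamid, avg_pkt_size, avg_pkt_size,
                     CM_NO_CONGESTION, rtt)


def on_partial_ack(cm, streamid, avg_pkt_size, rtt):
    # Step 5: the loss period was already reported, so no loss is signalled.
    cm.cm_update(streamid, 2 * avg_pkt_size, avg_pkt_size,
                 CM_NO_CONGESTION, rtt)
    cm.cm_request(streamid)


def on_retransmission_timeout(cm, streamid, avg_pkt_size):
    # Timer expiry: no feedback, one segment assumed to have left the pipe.
    cm.cm_update(streamid, avg_pkt_size, 0, CM_NO_FEEDBACK, 0)
    cm.cm_request(streamid)
```

Only the third duplicate ack, partial acks, and timer expiry trigger cm_request() in this sketch, since those are the cases in which the text enqueues a retransmission.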
6.1.2 Congestion-controlled UDP 5.1.2 Congestion-controlled UDP
Congestion-controlled UDP is a useful CM application, which we Congestion-controlled UDP is a useful CM application, which we
describe in the context of Berkeley sockets [Stevens94]. CM UDP sockets describe in the context of Berkeley sockets [Stevens94]. CM UDP sockets
provide the same functionality as standard Berkeley UDP sockets, provide the same functionality as standard Berkeley UDP sockets, but
but instead of immediately sending the data from the kernel packet instead of immediately sending the data from the kernel packet queue
queue to lower layers for transmission, the buffered socket to lower layers for transmission, the buffered socket implementation
implementation makes calls to the API exported by the CM inside the makes calls to the API exported by the CM inside the kernel and gets
kernel and gets callbacks from the CM. When a CM UDP socket is callbacks from the CM. When a CM UDP socket is created, it is bound
created, it is bound to a particular stream. Later, when data is to a particular stream. Later, when data is added to the packet
added to the packet queue, cm_request() is called on the stream queue, cm_request() is called on the stream associated with the
associated with the socket. When the CM schedules this stream for socket. When the CM schedules this stream for transmission, it calls
transmission, it calls udp_ccappsend() in the UDP module. This udp_ccappsend() in the UDP module. This function transmits one MTU
function transmits one MTU from the packet queue, and schedules the from the packet queue, and schedules the transmission of any
transmission of any remaining packets. The in-kernel remaining packets. The in-kernel implementation of the CM UDP API
implementation of the CM UDP API should not require any additional should not require any additional data copies and should support all
data copies and should support all standard UDP options. Modifying standard UDP options. Modifying existing applications to use
existing applications to use congestion-controlled UDP requires the congestion-controlled UDP requires the implementation of a new socket
implementation of a new socket option on the socket. To work option on the socket. To work correctly, the sender must obtain
correctly, the sender must obtain feedback about congestion. This feedback about congestion. This can be done in at least two ways:
can be done in at least two ways: (i) the UDP receiver application (i) the UDP receiver application can provide feedback to the sender
can provide feedback to the sender application, which will inform application, which will inform the CM of network conditions using
the CM of network conditions using cm_update(); (ii) the UDP cm_update(); (ii) the UDP receiver implementation can provide
receiver implementation can provide feedback to the sending UDP. feedback to the sending UDP. Note that this latter alternative
Note that this latter alternative requires changes to the requires changes to the receiver's network stack and the sender UDP
receiver's network stack and the sender UDP cannot assume that all cannot assume that all receivers support this option without explicit
receivers support this option without explicit negotiation. negotiation.
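The request/callback flow of a buffered CM UDP socket might look like the following sketch. ToyCM and CMUdpSocket are hypothetical names, the "wire" list stands in for the lower layers, and the choice to keep at most one outstanding request per socket is an assumption made for illustration rather than something the text requires.

```python
from collections import deque


class ToyCM:
    """Hypothetical CM stand-in: queues requests and grants them one at a time."""

    def __init__(self):
        self.pending = deque()

    def cm_request(self, sock):
        self.pending.append(sock)

    def grant_one(self):
        """Schedule the next requesting socket; return False if none is waiting."""
        if not self.pending:
            return False
        self.pending.popleft().udp_ccappsend()
        return True


class CMUdpSocket:
    def __init__(self, cm, mtu):
        self.cm = cm
        self.mtu = mtu
        self.queue = deque()      # kernel packet queue
        self.wire = []            # what has been handed to lower layers
        self.requested = False    # True while a request is outstanding

    def send(self, payload):
        if len(payload) > self.mtu:
            raise ValueError("payload larger than one MTU")
        self.queue.append(payload)
        if not self.requested:
            self.requested = True
            self.cm.cm_request(self)

    def udp_ccappsend(self):
        """Callback from the CM: transmit one packet, reschedule the rest."""
        self.requested = False
        if self.queue:
            self.wire.append(self.queue.popleft())
        if self.queue:
            self.requested = True
            self.cm.cm_request(self)
```

Nothing reaches the wire until the CM grants a request, which is the difference from a standard UDP socket that the paragraph above describes.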
6.1.3 Audio server 5.1.3 Audio server
A typical audio application often has access to the sample in a A typical audio application often has access to the sample in a
multitude of data rates and qualities. The objective of the multitude of data rates and qualities. The objective of the
application is then to deliver the highest possible quality of application is then to deliver the highest possible quality of audio
audio (typically the highest data rate) to its clients. The selection (typically the highest data rate) to its clients. The selection of
of which version of audio to transmit should be based on the which version of audio to transmit should be based on the current
current congestion state of the network. In addition, the source congestion state of the network. In addition, the source will want
will want audio delivered to its users at a consistent sampling audio delivered to its users at a consistent sampling rate. As a
rate. As a result, it must send data at a regular rate, minimizing result, it must send data at a regular rate, minimizing delayed
delayed transmissions and reducing buffering before playback. To transmissions and reducing buffering before playback. To meet these
meet these requirements, this application can use the synchronous requirements, this application can use the synchronous sender API
sender API (Section 4.2). (Section 4.2).
When the source first starts, it uses the cm_query() call to get an When the source first starts, it uses the cm_query() call to get an
initial estimate of network bandwidth and delay. If some other initial estimate of network bandwidth and delay. If some other
streams on that macroflow have already been active, then it gets an streams on that macroflow have already been active, then it gets an
initial estimate that is valid; otherwise, it gets negative values, initial estimate that is valid; otherwise, it gets negative values,
which it ignores. It then chooses an encoding that does not exceed which it ignores. It then chooses an encoding that does not exceed
these estimates (or, in the case of an invalid estimate, uses these estimates (or, in the case of an invalid estimate, uses
application-specific initial values) and begins transmitting application-specific initial values) and begins transmitting data.
data. The application also implements the cmapp_update() callback. The application also implements the cmapp_update() callback. When
When the CM determines that network characteristics have changed, the CM determines that network characteristics have changed, it calls
it calls the application's cmapp_update() function and passes it a the application's cmapp_update() function and passes it a new rate
new rate and round-trip time estimate. The application must change and round-trip time estimate. The application must change its choice
its choice of audio encoding to ensure that it does not exceed of audio encoding to ensure that it does not exceed these new
these new estimates. estimates.
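As a rough sketch of this adaptation logic: the encoding rates, the default value, and the class and function names below are hypothetical, and falling back to the lowest encoding when none fits the estimate is an assumption, since the text does not say what the application does in that case.

```python
# Hypothetical set of available encodings, in bits per second.
ENCODINGS_BPS = (8_000, 16_000, 64_000, 128_000)
# Hypothetical application-specific initial value for an invalid estimate.
DEFAULT_BPS = 16_000


def choose_encoding(rate_bps, encodings=ENCODINGS_BPS, default=DEFAULT_BPS):
    """Pick the highest encoding that does not exceed the rate estimate."""
    if rate_bps < 0:
        # A negative estimate means no stream on the macroflow was active yet.
        return default
    fitting = [e for e in encodings if e <= rate_bps]
    # Assumption: use the lowest encoding if even that exceeds the estimate.
    return max(fitting) if fitting else min(encodings)


class AudioSource:
    def __init__(self, initial_rate_bps, initial_rtt):
        # initial_rate_bps and initial_rtt are what cm_query() returned.
        self.rtt = initial_rtt
        self.encoding = choose_encoding(initial_rate_bps)

    def cmapp_update(self, rate_bps, rtt):
        """Callback invoked by the CM when network characteristics change."""
        self.rtt = rtt
        self.encoding = choose_encoding(rate_bps)
```

A source created with a negative initial estimate starts at the default and moves to a fitting encoding on the first cmapp_update() callback.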
6.2 Example congestion control module 5.2 Example congestion control module
To illustrate the responsibilities of a congestion control module, To illustrate the responsibilities of a congestion control module,
the following describes some of the actions of a simple TCP-like the following describes some of the actions of a simple TCP-like
congestion control module that implements Additive Increase congestion control module that implements Additive Increase
Multiplicative Decrease congestion control (AIMD_CC): Multiplicative Decrease congestion control (AIMD_CC):
- query(): AIMD_CC returns the current congestion window (cwnd) - query(): AIMD_CC returns the current congestion window (cwnd)
divided by the smoothed rtt (srtt) as its bandwidth estimate. It divided by the smoothed rtt (srtt) as its bandwidth estimate. It
returns the smoothed rtt estimate as srtt. returns the smoothed rtt estimate as srtt.
- notify(): AIMD_CC adds the number of bytes sent to its - notify(): AIMD_CC adds the number of bytes sent to its
outstanding data window (ownd). outstanding data window (ownd).
- update(): AIMD_CC subtracts nsent from ownd. If the value of rtt - update(): AIMD_CC subtracts nsent from ownd. If the value of rtt
is non-zero, AIMD_CC updates srtt using the TCP srtt calculation. is non-zero, AIMD_CC updates srtt using the TCP srtt calculation.
If the update indicates that data has been lost, AIMD_CC sets If the update indicates that data has been lost, AIMD_CC sets
cwnd to 1 MTU if the loss_mode is CM_NO_FEEDBACK and to cwnd/2 cwnd to 1 MTU if the loss_mode is CM_NO_FEEDBACK and to cwnd/2
(with a minimum of 1 MTU) if the loss_mode is CM_LOSS_FEEDBACK or (with a minimum of 1 MTU) if the loss_mode is CM_LOSS_FEEDBACK or
CM_EXPLICIT_CONGESTION. AIMD_CC also sets its internal ssthresh CM_EXPLICIT_CONGESTION. AIMD_CC also sets its internal ssthresh
variable to cwnd/2. If no loss has occurred, AIMD_CC mimics TCP variable to cwnd/2. If no loss has occurred, AIMD_CC mimics TCP
slow start and linear growth modes. It increments cwnd by nsent slow start and linear growth modes. It increments cwnd by nsent
when cwnd < ssthresh (bounded by a maximum of ssthresh-cwnd) and when cwnd < ssthresh (bounded by a maximum of ssthresh-cwnd) and
by nsent * MTU/cwnd when cwnd > ssthresh. by nsent * MTU/cwnd when cwnd > ssthresh.
- When cwnd or ownd are updated and indicate that at least one MTU - When cwnd or ownd are updated and indicate that at least one MTU
may be transmitted, AIMD_CC calls the CM to schedule a may be transmitted, AIMD_CC calls the CM to schedule a
transmission. transmission.
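The AIMD_CC actions listed above could be sketched as follows. Several points are assumptions rather than statements from the text: ssthresh is set to half of the congestion window as it was before the reduction, linear growth is also used when cwnd equals ssthresh, query() returns negative values until an rtt sample exists, the srtt gain of 1/8 follows the usual TCP calculation, and the loss-mode constant values and the can_send() helper are hypothetical.

```python
# Loss-mode names come from the text; the numeric values are placeholders.
CM_NO_FEEDBACK, CM_NO_CONGESTION = 0, 1
CM_LOSS_FEEDBACK, CM_EXPLICIT_CONGESTION = 2, 3


class AimdCC:
    def __init__(self, mtu, cwnd, ssthresh):
        self.mtu = mtu
        self.cwnd = cwnd          # congestion window, bytes
        self.ssthresh = ssthresh  # slow start threshold, bytes
        self.ownd = 0             # outstanding data window, bytes
        self.srtt = None          # smoothed rtt, None until first sample

    def query(self):
        """Return (bandwidth estimate, srtt); negative if no estimate yet."""
        if not self.srtt:
            return (-1, -1)
        return (self.cwnd / self.srtt, self.srtt)

    def notify(self, nsent):
        self.ownd += nsent

    def update(self, nsent, nrecd, lossmode, rtt):
        self.ownd = max(0, self.ownd - nsent)
        if rtt:
            # Standard TCP smoothing: srtt = 7/8 * srtt + 1/8 * sample.
            self.srtt = rtt if self.srtt is None else 0.875 * self.srtt + 0.125 * rtt
        if lossmode == CM_NO_FEEDBACK:
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.mtu
        elif lossmode in (CM_LOSS_FEEDBACK, CM_EXPLICIT_CONGESTION):
            self.ssthresh = self.cwnd // 2
            self.cwnd = max(self.cwnd // 2, self.mtu)
        elif self.cwnd < self.ssthresh:
            # Slow start, bounded so that cwnd does not overshoot ssthresh.
            self.cwnd += min(nsent, self.ssthresh - self.cwnd)
        else:
            # Linear growth: roughly one MTU per window of data.
            self.cwnd += nsent * self.mtu // self.cwnd

    def can_send(self):
        """True when at least one MTU fits in the window; the module would
        then call the CM to schedule a transmission."""
        return self.cwnd - self.ownd >= self.mtu
```

The nrecd argument is accepted but unused here because the description above only refers to nsent, the loss mode, and rtt.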
6.3 Example Scheduler Module 5.3 Example Scheduler Module
To clarify the responsibilities of a scheduler module, the To clarify the responsibilities of a scheduler module, the following
following describes some of the actions of a simple round robin describes some of the actions of a simple round robin scheduler
scheduler module (RR_sched): module (RR_sched):
- schedule(): RR_sched schedules as many streams as possible in round - schedule(): RR_sched schedules as many streams as possible in round
robin fashion. robin fashion.
- query_share(): RR_sched returns 1/(number of streams in macroflow). - query_share(): RR_sched returns 1/(number of streams in macroflow).
- notify(): RR_sched does nothing. Round robin scheduling is not - notify(): RR_sched does nothing. Round robin scheduling is not
affected by the amount of data sent. affected by the amount of data sent.
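A small sketch of such a round robin scheduler follows. The bookkeeping of per-stream pending requests, the add_stream() and request() helpers, and the "grants" argument (how many transmissions the congestion controller currently allows) are hypothetical details added for illustration.

```python
from collections import deque


class RRSched:
    def __init__(self):
        self.streams = deque()  # rotation order of stream ids
        self.pending = {}       # stream id -> outstanding request count

    def add_stream(self, sid):
        self.streams.append(sid)
        self.pending[sid] = 0

    def request(self, sid):
        self.pending[sid] += 1

    def schedule(self, grants):
        """Return up to `grants` stream ids, visiting streams in rotation."""
        chosen = []
        idle = 0  # consecutive streams seen with nothing pending
        while len(chosen) < grants and idle < len(self.streams):
            sid = self.streams[0]
            self.streams.rotate(-1)
            if self.pending[sid] > 0:
                self.pending[sid] -= 1
                chosen.append(sid)
                idle = 0
            else:
                idle += 1
        return chosen

    def query_share(self):
        """Each stream gets an equal share of the macroflow."""
        return 1 / len(self.streams) if self.streams else 0.0

    def notify(self, sid, nsent):
        # Round robin scheduling is not affected by the amount of data sent.
        pass
```

The loop stops either when the grants are used up or after a full rotation finds no stream with a pending request.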
7. Security considerations 6. Security Considerations
The CM provides many of the same services that the congestion The CM provides many of the same services that the congestion control
control in TCP provides. As such, it is vulnerable to many of the in TCP provides. As such, it is vulnerable to many of the same
same security problems. For example, incorrect reports of losses security problems. For example, incorrect reports of losses and
and transmissions will give the CM an inaccurate picture of the transmissions will give the CM an inaccurate picture of the network's
network's congestion state. By giving CM a high estimate of congestion state. By giving CM a high estimate of congestion, an
congestion, an attacker can degrade the performance observed by attacker can degrade the performance observed by applications. For
applications. For example, a stream on a host can arbitrarily slow example, a stream on a host can arbitrarily slow down any other
down any other stream on the same macroflow, a form of denial of stream on the same macroflow, a form of denial of service.
service.
The more dangerous form of attack occurs when an application gives The more dangerous form of attack occurs when an application gives
the CM a low estimate of congestion. This would cause CM to be the CM a low estimate of congestion. This would cause CM to be
overly aggressive and allow data to be sent much more quickly than overly aggressive and allow data to be sent much more quickly than
sound congestion control policies would allow. sound congestion control policies would allow.
[Touch97] describes a number of the security problems that arise [Touch97] describes a number of the security problems that arise with
with congestion information sharing. An additional vulnerability congestion information sharing. An additional vulnerability (not
(not covered by [Touch97]) occurs because applications have access covered by [Touch97]) occurs because applications have access
through the CM API to control shared state that will affect other through the CM API to control shared state that will affect other
applications on the same computer. For instance, a poorly applications on the same computer. For instance, a poorly designed,
designed, possibly compromised, or intentionally malicious UDP possibly compromised, or intentionally malicious UDP application
application could misuse cm_update() to cause starvation and/or could misuse cm_update() to cause starvation and/or too-aggressive
too-aggressive behavior of others in the macroflow. behavior of others in the macroflow.
8. References 7. References
[Allman99] Allman, M. and Paxson, V., TCP Congestion Control, [Allman99] Allman, M. and Paxson, V., "TCP Congestion
RFC-2581, April 1999. Control", RFC 2581, April 1999.
[Andersen00] Andersen, D., Bansal, D., Curtis, D., Seshan, S., and [Andersen00] Balakrishnan, H., System Support for Bandwidth
Balakrishnan, H., System Support for Bandwidth Management and Management and Content Adaptation in Internet
Content Adaptation in Internet Applications, Proc. 4th Symp. on Applications, Proc. 4th Symp. on Operating Systems
Operating Systems Design and Implementation, San Diego, CA, Design and Implementation, San Diego, CA, October
October 2000. Available from 2000. Available from
http://nms.lcs.mit.edu/papers/cm-osdi2000.html http://nms.lcs.mit.edu/papers/cm-osdi2000.html
[Balakrishnan98] Balakrishnan, H., Padmanabhan, V., Seshan, S., [Balakrishnan98] Balakrishnan, H., Padmanabhan, V., Seshan, S.,
Stemm, M., and Katz, R., "TCP Behavior of a Busy Web Server: Stemm, M., and Katz, R., "TCP Behavior of a Busy
Analysis and Improvements," Proc. IEEE INFOCOM, San Francisco, Web Server: Analysis and Improvements," Proc. IEEE
CA, March 1998. INFOCOM, San Francisco, CA, March 1998.
[Balakrishnan99] Balakrishnan, H., Rahul, H., and Seshan, S., "An [Balakrishnan99] Balakrishnan, H., Rahul, H., and Seshan, S., "An
Integrated Congestion Management Architecture for Internet Integrated Congestion Management Architecture for
Hosts," Proc. ACM SIGCOMM, Cambridge, MA, September 1999. Internet Hosts," Proc. ACM SIGCOMM, Cambridge, MA,
September 1999.
[Bradner96] Bradner, S., "The Internet Standards Process --- [Bradner96] Bradner, S., "The Internet Standards Process ---
Revision 3", BCP 9, RFC-2026, October 1996. Revision 3", BCP 9, RFC 2026, October 1996.
[Bradner97] Bradner, S., "Key words for use in RFCs to Indicate [Bradner97] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC-2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[Clark90] Clark, D. and Tennenhouse, D., "Architectural [Clark90] Clark, D. and Tennenhouse, D., "Architectural
Consideration for a New Generation of Protocols", Proc. ACM Consideration for a New Generation of Protocols",
SIGCOMM, Philadelphia, PA, September 1990. Proc. ACM SIGCOMM, Philadelphia, PA, September
1990.
[Eggert00] Eggert, L., Heidemann, J., and Touch, J., "Effects of [Eggert00] Eggert, L., Heidemann, J., and Touch, J., "Effects
Ensemble TCP," ACM Computer Comm. Review, January 2000. of Ensemble TCP," ACM Computer Comm. Review,
January 2000.
[Floyd99a] Floyd, S. and Fall, K., "Promoting the Use of End-to-End [Floyd99a] Floyd, S. and Fall, K., "Promoting the Use of End-
Congestion Control in the Internet," IEEE/ACM Trans. on to-End Congestion Control in the Internet,"
Networking, 7(4), August 1999, pp. 458-472. IEEE/ACM Trans. on Networking, 7(4), August 1999,
pp. 458-472.
[Floyd99b] Floyd, S. and Henderson, T., "The NewReno Modification [Floyd99b] Floyd, S. and T. Henderson,"The New Reno
to TCP's Fast Recovery Algorithm," RFC-2582, April Modification to TCP's Fast Recovery Algorithm," RFC
1999. (Experimental.) 2582, April 1999.
[Jacobson88] Jacobson, V., "Congestion Avoidance and Control," [Jacobson88] Jacobson, V., "Congestion Avoidance and Control,"
Proc. ACM SIGCOMM, Stanford, CA, August 1988. Proc. ACM SIGCOMM, Stanford, CA, August 1988.
[Mahdavi98] Mahdavi, J. and Floyd, S., "The TCP Friendly Website," [Mahdavi98] Mahdavi, J. and Floyd, S., "The TCP Friendly
http://www.psc.edu/networking/tcp_friendly.html Website,"
http://www.psc.edu/networking/tcp_friendly.html
[Mogul90] Mogul, J. and Deering, S., "Path MTU Discovery," [Mogul90] Mogul, J. and S. Deering, "Path MTU Discovery," RFC
RFC-1191, November 1990. 1191, November 1990.
[Padmanabhan98] Padmanabhan, V., "Addressing the Challenges of Web [Padmanabhan98] Padmanabhan, V., "Addressing the Challenges of Web
Data Transport," PhD thesis, Univ. of California, Berkeley, Data Transport," PhD thesis, Univ. of California,
December 1998. Berkeley, December 1998.
[Paxson00] Paxson. V. and Allman, M., "Computing TCP's [Paxson00] Paxson, V. and M. Allman, "Computing TCP's
Retransmission Timer," Internet Draft Retransmission Timer", RFC 2988, November 2000.
draft-paxson-tcp-rto-01.txt, April 2000. (Expires October
2000.)
[Postel81] Postel, J. (ed.), "Transmission Control Protocol," [Postel81] Postel, J., Editor, "Transmission Control
RFC-793, September 1981. Protocol", STD 7, RFC 793, September 1981.
[Ramakrishnan98] Ramakrishnan, K. and Floyd, S., "A Proposal to Add [Ramakrishnan99] Ramakrishnan, K. and Floyd, S., "A Proposal to Add
Explicit Congestion Notification (ECN) to IP," RFC-2481. Explicit Congestion Notification (ECN) to IP," RFC
(Experimental.) 2481, January 1999.
[Stevens94] Stevens, W., TCP/IP Illustrated, Volume 1. [Stevens94] Stevens, W., TCP/IP Illustrated, Volume 1.
Addison-Wesley, Reading, MA, 1994. Addison-Wesley, Reading, MA, 1994.
[Touch97] Touch, J., "TCP Control Block Interdependence," RFC-2140, [Touch97] Touch, J., "TCP Control Block Interdependence", RFC
April 1997. (Informational.) 2140, April 1997.
9. Acknowledgments 8. Acknowledgments
We thank David Andersen, Deepak Bansal, and Dorothy Curtis for We thank David Andersen, Deepak Bansal, and Dorothy Curtis for their
their work on the CM design and implementation. We thank Vern work on the CM design and implementation. We thank Vern Paxson for
Paxson for his detailed comments, feedback, and patience, and Sally his detailed comments, feedback, and patience, and Sally Floyd, Mark
Floyd, Mark Handley, and Steven McCanne for useful feedback on the Handley, and Steven McCanne for useful feedback on the CM
CM architecture. Allison Mankin and Joe Touch provided several architecture. Allison Mankin and Joe Touch provided several useful
useful comments on previous drafts of this document. comments on previous drafts of this document.
10. Authors' addresses 9. Authors' Addresses
Hari Balakrishnan Hari Balakrishnan
Laboratory for Computer Science Laboratory for Computer Science
200 Technology Square 200 Technology Square
Massachusetts Institute of Technology Massachusetts Institute of Technology
Cambridge, MA 02139 Cambridge, MA 02139
Email: hari@lcs.mit.edu
EMail: hari@lcs.mit.edu
Web: http://nms.lcs.mit.edu/~hari/ Web: http://nms.lcs.mit.edu/~hari/
Srinivasan Seshan Srinivasan Seshan
School of Computer Science School of Computer Science
Carnegie Mellon University Carnegie Mellon University
5000 Forbes Ave. 5000 Forbes Ave.
Pittsburgh, PA 15213 Pittsburgh, PA 15213
Email: srini@cmu.edu
EMail: srini@cmu.edu
Web: http://www.cs.cmu.edu/~srini/ Web: http://www.cs.cmu.edu/~srini/
Full Copyright Statement Full Copyright Statement
"Copyright (C) The Internet Society (date). All Rights Reserved. Copyright (C) The Internet Society (2001). All Rights Reserved.
This document and translations of it may be copied and furnished to This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain others, and derivative works that comment on or otherwise explain it
it or assist in its implementation may be prepared, copied, or assist in its implementation may be prepared, copied, published
published and distributed, in whole or in part, without restriction and distributed, in whole or in part, without restriction of any
of any kind, provided that the above copyright notice and this kind, provided that the above copyright notice and this paragraph are
paragraph are included on all such copies and derivative works. included on all such copies and derivative works. However, this
However, this document itself may not be modified in any way, such document itself may not be modified in any way, such as by removing
as by removing the copyright notice or references to the Internet the copyright notice or references to the Internet Society or other
Society or other Internet organizations, except as needed for the Internet organizations, except as needed for the purpose of
purpose of developing Internet standards in which case the developing Internet standards in which case the procedures for
procedures for copyrights defined in the Internet Standards process copyrights defined in the Internet Standards process must be
must be followed, or as required to translate it into the final followed, or as required to translate it into languages other than
draft output. English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.
 End of changes. 125 change blocks. 
553 lines changed or deleted 546 lines changed or added
