draft-ietf-ecm-cm-03.txt   draft-ietf-ecm-cm-04.txt 
Internet Engineering Task Force Hari Balakrishnan Internet Engineering Task Force Hari Balakrishnan
INTERNET DRAFT MIT LCS INTERNET DRAFT MIT LCS
Document: draft-ietf-ecm-cm-03.txt Srinivasan Seshan Document: draft-ietf-ecm-cm-04.txt Srinivasan Seshan
CMU CMU
November, 2000 May, 2001
Expires: November 2001
The Congestion Manager The Congestion Manager
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC-2026 [Bradner96]. all provisions of Section 10 of RFC-2026 [Bradner96].
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
skipping to change at line 66 skipping to change at line 68
this document are to be interpreted as described in RFC-2119 this document are to be interpreted as described in RFC-2119
[Bradner97]. [Bradner97].
STREAM STREAM
A group of packets that all share the same source and A group of packets that all share the same source and
destination IP address, IP type-of-service, transport destination IP address, IP type-of-service, transport
protocol, and source and destination transport-layer port protocol, and source and destination transport-layer port
numbers. numbers.
MACROFLOW MACROFLOW
A group of streams that all use the same congestion management A group of CM-enabled streams that all use the same congestion
and scheduling algorithms, and share congestion state management and scheduling algorithms, and share congestion
information. Currently, streams destined to different state information. Currently, streams destined to different
receivers belong to different macroflows. Streams destined to receivers belong to different macroflows. Streams destined to
the same receiver MAY belong to different macroflows. Streams the same receiver MAY belong to different macroflows. When
that experience identical congestion behavior in the Internet the Congestion Manager is in use, streams that experience
and use the same congestion control algorithm SHOULD belong to identical congestion behavior and use the same congestion
the same macroflow. control algorithm SHOULD belong to the same macroflow.
APPLICATION APPLICATION
Any software module that uses the CM. This includes Any software module that uses the CM. This includes
user-level applications such as Web servers or audio/video user-level applications such as Web servers or audio/video
servers, as well as in-kernel protocols such as TCP [Postel81] servers, as well as in-kernel protocols such as TCP [Postel81]
that use the CM for congestion control. that use the CM for congestion control.
WELL-BEHAVED APPLICATION WELL-BEHAVED APPLICATION
An application that only transmits when allowed by the CM and An application that only transmits when allowed by the CM and
accurately accounts for all data that it has sent to the accurately accounts for all data that it has sent to the
skipping to change at line 128 skipping to change at line 130
The CM is an end-system module that enables an ensemble of multiple The CM is an end-system module that enables an ensemble of multiple
concurrent streams to perform stable congestion avoidance and concurrent streams to perform stable congestion avoidance and
control, and allows applications to easily adapt their control, and allows applications to easily adapt their
transmissions to prevailing network conditions. It integrates transmissions to prevailing network conditions. It integrates
congestion management across all applications and transport congestion management across all applications and transport
protocols. It maintains congestion parameters (available aggregate protocols. It maintains congestion parameters (available aggregate
and per-stream bandwidth, per-receiver round-trip times, etc.) and and per-stream bandwidth, per-receiver round-trip times, etc.) and
exports an API that enables applications to learn about network exports an API that enables applications to learn about network
characteristics, pass information to the CM, share congestion characteristics, pass information to the CM, share congestion
information with each other, and schedule data transmissions. All information with each other, and schedule data transmissions. When
data transmissions MUST be done with the explicit consent of the CM the CM is used, all data transmissions subject to the CM must be
via this API to ensure proper congestion behavior. done with the explicit consent of the CM via this API to ensure
proper congestion behavior.
Systems MAY choose to use CM, and if so they MUST follow this
specification.
This document focuses on applications and networks where the This document focuses on applications and networks where the
following conditions hold: following conditions hold:
1. Applications are well-behaved with their own independent 1. Applications are well-behaved with their own independent
per-byte or per-packet sequence number information, and use the per-byte or per-packet sequence number information, and use the
CM API to update internal state in the CM. CM API to update internal state in the CM.
2. Networks are best-effort without service discrimination or 2. Networks are best-effort without service discrimination or
reservations. In particular, it does not address situations reservations. In particular, it does not address situations
skipping to change at line 203 skipping to change at line 209
increasingly common and often do not implement proper congestion increasingly common and often do not implement proper congestion
management, and (ii) it provides an API for applications to adapt management, and (ii) it provides an API for applications to adapt
their transmissions to current network conditions. For an extended their transmissions to current network conditions. For an extended
discussion of the motivation for the CM, its architecture, API, discussion of the motivation for the CM, its architecture, API,
and algorithms, see [Balakrishnan99]; for a description of an and algorithms, see [Balakrishnan99]; for a description of an
implementation and performance results, see [Andersen00]. implementation and performance results, see [Andersen00].
The resulting end-host protocol architecture at the sender is shown The resulting end-host protocol architecture at the sender is shown
in Figure 1. The CM helps achieve network stability by in Figure 1. The CM helps achieve network stability by
implementing stable congestion avoidance and control algorithms implementing stable congestion avoidance and control algorithms
that are "TCP-friendly" [Mahdavi98] based on algorithms described in that are "TCP-friendly" [Mahdavi98] based on algorithms described
[Allman99]. However, it does not attempt to enforce proper in [Allman99]. However, it does not attempt to enforce proper
congestion behavior for all applications (but it does not preclude congestion behavior for all applications (but it does not preclude
a policer on the host that performs this task). Note that while a policer on the host that performs this task). Note that while
the policer at the end-host can use CM, the network has to be the policer at the end-host can use CM, the network has to be
protected against compromises to the CM and the policer at the end protected against compromises to the CM and the policer at the end
hosts, a task that requires router machinery [Floyd99a]. We do not hosts, a task that requires router machinery [Floyd99a]. We do not
address this issue further in this document. address this issue further in this document.
|--------| |--------| |--------| |--------| |--------------| |--------| |--------| |--------| |--------| |--------------|
| HTTP | | FTP | | RTP 1 | | RTP 2 | | | | HTTP | | FTP | | RTP 1 | | RTP 2 | | |
|--------| |--------| |--------| |--------| | | |--------| |--------| |--------| |--------| | |
skipping to change at line 256 skipping to change at line 262
feedback about its past transmissions from applications themselves feedback about its past transmissions from applications themselves
via the API. The scheduler apportions available bandwidth amongst via the API. The scheduler apportions available bandwidth amongst
the different streams within each macroflow and notifies the different streams within each macroflow and notifies
applications when they are permitted to send data. This document applications when they are permitted to send data. This document
focuses on well-behaved applications; a future one will describe focuses on well-behaved applications; a future one will describe
the sender-receiver protocol and header formats that will handle the sender-receiver protocol and header formats that will handle
applications that do not incorporate their own feedback to the CM. applications that do not incorporate their own feedback to the CM.
4. CM API 4. CM API
By convention, the IETF does not treat Application Programming
Interfaces as standards track. However, it is considered important
to have the CM API and CM algorithm requirements in one coherent
document. The following section on the CM API uses the terms MUST,
SHOULD, etc. but the terms are meant to apply within the context of
an implementation of the CM API. The section does not apply to
congestion control implementations in general, only to those
implementations offering the CM API.
Using the CM API, streams can determine their share of the available Using the CM API, streams can determine their share of the available
bandwidth, request and have their data transmissions scheduled, bandwidth, request and have their data transmissions scheduled,
inform the CM about successful transmissions, and be informed when inform the CM about successful transmissions, and be informed when
the CM's estimate of path bandwidth changes. Thus, the CM frees the CM's estimate of path bandwidth changes. Thus, the CM frees
applications from having to maintain information about the state of applications from having to maintain information about the state of
congestion and available bandwidth along any path. congestion and available bandwidth along any path.
The function prototypes below follow standard C language The function prototypes below follow standard C language
convention. We emphasize that these API functions are abstract convention. We emphasize that these API functions are abstract
calls and conformant CM implementations may differ in specific calls and conformant CM implementations may differ in specific
skipping to change at line 584 skipping to change at line 599
The Scheduler MAY implement many additional interfaces. As The Scheduler MAY implement many additional interfaces. As
experience with CM schedulers increases, future documents may experience with CM schedulers increases, future documents may
make additions and/or changes to some parts of the scheduler make additions and/or changes to some parts of the scheduler
API. API.
6. Examples 6. Examples
6.1 Example applications 6.1 Example applications
The following describes the possible use of the CM API by an This section describes three possible uses of the CM API by
asynchronous application (an implementation of a TCP sender) and a applications. We describe two asynchronous applications---an
synchronous application (an audio server). More details of these implementation of a TCP sender and an implementation of
congestion-controlled UDP sockets, and a synchronous
application---a streaming audio server. More details of these
applications and CM implementation optimizations for efficient applications and CM implementation optimizations for efficient
operation are described in [Andersen00]. We emphasize that the operation are described in [Andersen00].
protocols in this section are examples and suggestions for
implementation, rather than requirements of any conformant All applications that use the CM MUST incorporate feedback from the
receiver. For example, it must periodically (typically once or
twice per round trip time) determine how many of its packets
arrived at the receiver. When the source gets this feedback, it
MUST use cm_update() to inform the CM of this new information.
This results in the CM updating ownd and may result in the CM
changing its estimates and calling cmapp_update() of the streams of
the macroflow.
The protocols in this section are examples and suggestions for
implementation, rather than requirements for any conformant
implementation. implementation.
6.1.1 TCP 6.1.1 TCP
A TCP MUST use the cmapp_send() callback API. TCP only identifies A TCP implementation that uses CM should use the cmapp_send()
which data it should send upon the arrival of an acknowledgement or callback API. TCP only identifies which data it should send upon
expiration of a timer. As a result, it requires tight control over the arrival of an acknowledgement or expiration of a timer. As a
when and if new data or retransmissions are sent. result, it requires tight control over when and if new data or
retransmissions are sent.
When TCP either connects to or accepts a connection from another When TCP either connects to or accepts a connection from another
host, it performs a cm_open() call to associate the TCP connection host, it performs a cm_open() call to associate the TCP connection
with a cm_streamid. with a cm_streamid.
Once a connection is established, the CM is used to control the Once a connection is established, the CM is used to control the
transmission of outgoing data. The CM eliminates the need for transmission of outgoing data. The CM eliminates the need for
tracking and reacting to congestion in TCP, because the CM and its tracking and reacting to congestion in TCP, because the CM and its
transmission API ensure proper congestion behavior. Loss recovery transmission API ensure proper congestion behavior. Loss recovery
is still performed by TCP based on fast retransmissions and is still performed by TCP based on fast retransmissions and
skipping to change at line 719 skipping to change at line 747
but instead of immediately sending the data from the kernel packet but instead of immediately sending the data from the kernel packet
queue to lower layers for transmission, the buffered socket queue to lower layers for transmission, the buffered socket
implementation makes calls to the API exported by the CM inside the implementation makes calls to the API exported by the CM inside the
kernel and gets callbacks from the CM. When a CM UDP socket is kernel and gets callbacks from the CM. When a CM UDP socket is
created, it is bound to a particular stream. Later, when data is created, it is bound to a particular stream. Later, when data is
added to the packet queue, cm_request() is called on the stream added to the packet queue, cm_request() is called on the stream
associated with the socket. When the CM schedules this stream for associated with the socket. When the CM schedules this stream for
transmission, it calls udp_ccappsend() in the UDP module. This transmission, it calls udp_ccappsend() in the UDP module. This
function transmits one MTU from the packet queue, and schedules the function transmits one MTU from the packet queue, and schedules the
transmission of any remaining packets. The in-kernel transmission of any remaining packets. The in-kernel
implementation of the CM UDP API SHOULD NOT require any additional implementation of the CM UDP API should not require any additional
data copies and SHOULD support all standard UDP options. Modifying data copies and should support all standard UDP options. Modifying
existing applications to use congestion-controlled UDP requires the existing applications to use congestion-controlled UDP requires the
implementation of a new socket option on the socket. To work implementation of a new socket option on the socket. To work
correctly, the sender MUST obtain feedback about congestion. This correctly, the sender must obtain feedback about congestion. This
can be done in at least two ways: (i) the UDP receiver application can be done in at least two ways: (i) the UDP receiver application
can provide feedback to the sender application, which will inform can provide feedback to the sender application, which will inform
the CM of network conditions using cm_update(); (ii) the UDP the CM of network conditions using cm_update(); (ii) the UDP
receiver implementation can provide feedback to the sending UDP. receiver implementation can provide feedback to the sending UDP.
Note that this latter alternative requires changes to the Note that this latter alternative requires changes to the
receiver's network stack and the sender UDP cannot assume that all receiver's network stack and the sender UDP cannot assume that all
receivers support this option without explicit negotiation. receivers support this option without explicit negotiation.
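The queueing behaviour described above can be sketched in C as follows. This is a minimal illustration, not the draft's implementation: the socket structure, QUEUE_CAP, udp_enqueue(), and stub_cm_request() (a stand-in for cm_request() that only counts calls) are all hypothetical. The sketch also assumes that at most one request is outstanding per socket, which is one possible reading of "schedules the transmission of any remaining packets".

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 16   /* hypothetical queue limit, in packets */

/* Stand-in for the CM: counts how often a transmission was requested. */
int stub_requests = 0;

void stub_cm_request(int cm_streamid)
{
    (void)cm_streamid;
    stub_requests++;
}

/* Hypothetical per-socket state for a congestion-controlled UDP socket. */
struct cm_udp_socket {
    int    cm_streamid;      /* stream the socket was bound to at creation */
    size_t queued;           /* packets waiting in the packet queue */
    size_t sent;             /* packets handed to lower layers */
    int    request_pending;  /* assumption: one outstanding request at most */
};

/* Add one packet to the queue and ask the CM for permission to send. */
int udp_enqueue(struct cm_udp_socket *so)
{
    assert(so != NULL);
    if (so->queued >= QUEUE_CAP)
        return -1;                       /* queue full, caller must retry */
    so->queued++;
    if (!so->request_pending) {
        so->request_pending = 1;
        stub_cm_request(so->cm_streamid);
    }
    return 0;
}

/* Called when the CM schedules this stream: send one MTU-sized packet,
 * then request another transmission if packets remain. */
void udp_ccappsend(struct cm_udp_socket *so)
{
    assert(so != NULL);
    so->request_pending = 0;
    if (so->queued == 0)
        return;                          /* nothing left to send */
    so->queued--;
    so->sent++;
    if (so->queued > 0) {
        so->request_pending = 1;
        stub_cm_request(so->cm_streamid);
    }
}
```

With three packets queued, the stub sees one request at enqueue time and one more after each of the first two sends, so the queue drains one packet per CM grant.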
6.1.3 Audio server 6.1.3 Audio server
skipping to change at line 756 skipping to change at line 784
When the source first starts, it uses the cm_query() call to get an When the source first starts, it uses the cm_query() call to get an
initial estimate of network bandwidth and delay. If some other initial estimate of network bandwidth and delay. If some other
streams on that macroflow have already been active, then it gets an streams on that macroflow have already been active, then it gets an
initial estimate that is valid; otherwise, it gets negative values, initial estimate that is valid; otherwise, it gets negative values,
which it ignores. It then chooses an encoding that does not exceed which it ignores. It then chooses an encoding that does not exceed
these estimates (or, in the case of an invalid estimate, uses these estimates (or, in the case of an invalid estimate, uses
application-specific initial values) and begins transmitting application-specific initial values) and begins transmitting
data. The application also implements the cmapp_update() callback. data. The application also implements the cmapp_update() callback.
When the CM determines that network characteristics have changed, When the CM determines that network characteristics have changed,
it calls the application's cmapp_update() function and passes it a it calls the application's cmapp_update() function and passes it a
new rate and round-trip time estimate. The application MUST change new rate and round-trip time estimate. The application must change
its choice of audio encoding to ensure that it does not exceed its choice of audio encoding to ensure that it does not exceed
these new estimates. these new estimates.
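As a rough sketch of the adaptation step described above, the following C fragment picks the highest encoding that does not exceed the CM's rate estimate. The encoding table, choose_encoding(), and the parameter list of audio_cmapp_update() are assumptions made for illustration; the actual cmapp_update() prototype is given in the API section of the draft and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding table: bit rates in bits per second, ascending. */
static const double audio_rates_bps[] = {
    8000.0, 16000.0, 32000.0, 64000.0, 128000.0
};
#define NUM_RATES (sizeof(audio_rates_bps) / sizeof(audio_rates_bps[0]))

/* Return the index of the highest encoding whose rate does not exceed
 * rate_bps. A negative estimate is treated as invalid and the fallback
 * index is returned. If even the lowest encoding exceeds the estimate,
 * index 0 is returned; a real server might pause instead. */
size_t choose_encoding(double rate_bps, size_t fallback)
{
    assert(fallback < NUM_RATES);
    if (rate_bps < 0.0)
        return fallback;
    size_t best = 0;
    for (size_t i = 0; i < NUM_RATES; i++) {
        if (audio_rates_bps[i] <= rate_bps)
            best = i;
    }
    return best;
}

/* Encoding currently in use by the (hypothetical) audio server. */
size_t current_encoding = 0;

/* Sketch of the application's update callback. The argument list is an
 * assumption; only the rate estimate is used here. */
void audio_cmapp_update(int cm_streamid, double rate_bps, double srtt_sec)
{
    (void)cm_streamid;
    (void)srtt_sec;
    current_encoding = choose_encoding(rate_bps, current_encoding);
}
```

The same choose_encoding() helper could serve the start-up case: the application would pass the result of cm_query() and an application-specific fallback index for use when the estimate is negative.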
To use the CM, the application MUST incorporate feedback from the 6.2 Example congestion control module
receiver. In this example, it must periodically (typically once or
twice per round trip time) determine how many of its packets
arrived at the receiver. When the source gets this feedback, it
MUST use cm_update() to inform the CM of this new information.
This results in the CM updating ownd and may result in CM changing
its estimates and calling cmapp_update() of the streams of the
macroflow.
6.3 Example congestion control module
To illustrate the responsibilities of a congestion control module, To illustrate the responsibilities of a congestion control module,
the following describes some of the actions of a simple TCP-like the following describes some of the actions of a simple TCP-like
congestion control module that implements Additive Increase congestion control module that implements Additive Increase
Multiplicative Decrease congestion control (AIMD_CC): Multiplicative Decrease congestion control (AIMD_CC):
- query(): AIMD_CC returns the current congestion window (cwnd) - query(): AIMD_CC returns the current congestion window (cwnd)
divided by the smoothed rtt (srtt) as its bandwidth estimate. It divided by the smoothed rtt (srtt) as its bandwidth estimate. It
returns the smoothed rtt estimate as srtt. returns the smoothed rtt estimate as srtt.
skipping to change at line 798 skipping to change at line 817
CM_EXPLICIT_CONGESTION. AIMD_CC also sets its internal ssthresh CM_EXPLICIT_CONGESTION. AIMD_CC also sets its internal ssthresh
variable to cwnd/2. If no loss had occurred, AIMD_CC mimics TCP variable to cwnd/2. If no loss had occurred, AIMD_CC mimics TCP
slow start and linear growth modes. It increments cwnd by nsent slow start and linear growth modes. It increments cwnd by nsent
when cwnd < ssthresh (bounded by a maximum of ssthresh-cwnd) and when cwnd < ssthresh (bounded by a maximum of ssthresh-cwnd) and
by nsent * MTU/cwnd when cwnd > ssthresh. by nsent * MTU/cwnd when cwnd > ssthresh.
- When cwnd or ownd are updated and indicate that at least one MTU - When cwnd or ownd are updated and indicate that at least one MTU
may be transmitted, AIMD_CC calls the CM to schedule a may be transmitted, AIMD_CC calls the CM to schedule a
transmission. transmission.
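The window arithmetic described for AIMD_CC can be sketched in C as below. This is an illustrative fragment under stated assumptions, not the draft's module: the struct layout, function names, and the MTU value are hypothetical, windows are held in bytes as doubles, the case cwnd == ssthresh is treated as linear growth, and the reduction of cwnd on loss (set here to the new ssthresh, with a floor of one MTU) is an assumption because the corresponding text is not shown in this diff.

```c
#include <assert.h>
#include <stddef.h>

#define MTU 1500.0   /* assumed MTU in bytes */

/* Hypothetical state for the AIMD congestion control module. */
struct aimd_cc {
    double cwnd;      /* congestion window, bytes */
    double ssthresh;  /* slow start threshold, bytes */
    double srtt;      /* smoothed round-trip time, seconds */
};

/* query(): bandwidth estimate is cwnd divided by srtt (bytes/second). */
double aimd_query_bandwidth(const struct aimd_cc *cc)
{
    assert(cc != NULL && cc->srtt > 0.0);
    return cc->cwnd / cc->srtt;
}

/* update(): adjust the window after feedback covering nsent bytes. */
void aimd_update(struct aimd_cc *cc, double nsent, int loss)
{
    assert(cc != NULL && cc->cwnd > 0.0 && nsent >= 0.0);
    if (loss) {
        cc->ssthresh = cc->cwnd / 2.0;   /* as described in the text */
        cc->cwnd = cc->ssthresh;         /* assumption: TCP-like halving */
        if (cc->cwnd < MTU)
            cc->cwnd = MTU;              /* assumption: one-MTU floor */
        return;
    }
    if (cc->cwnd < cc->ssthresh) {
        /* Slow start: grow by nsent, but not past ssthresh. */
        double room = cc->ssthresh - cc->cwnd;
        cc->cwnd += (nsent < room) ? nsent : room;
    } else {
        /* Linear growth: about one MTU per window of data. */
        cc->cwnd += nsent * MTU / cc->cwnd;
    }
}
```

For example, starting from cwnd = 3000 and ssthresh = 6000, feedback for 1500 bytes raises cwnd to 4500; once cwnd reaches 6000, the same feedback adds only 1500 * 1500 / 6000 = 375 bytes.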
6.4 Example Scheduler Module 6.3 Example Scheduler Module
To clarify the responsibilities of a scheduler module, the To clarify the responsibilities of a scheduler module, the
following describes some of the actions of a simple round robin following describes some of the actions of a simple round robin
scheduler module (RR_sched): scheduler module (RR_sched):
- schedule(): RR_sched schedules as many streams as possible in round - schedule(): RR_sched schedules as many streams as possible in round
robin fashion. robin fashion.
- query_share(): RR_sched returns 1/(number of streams in macroflow). - query_share(): RR_sched returns 1/(number of streams in macroflow).
skipping to change at line 820 skipping to change at line 839
affected by the amount of data sent. affected by the amount of data sent.
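A round robin scheduler of this kind might look like the following C sketch. The structure, the fixed stream limit, and the function signatures are hypothetical and chosen only to keep the example self-contained; the draft's scheduler interface may pass different arguments.

```c
#include <assert.h>
#include <stddef.h>

#define RR_MAX_STREAMS 8   /* hypothetical limit on streams per macroflow */

/* Hypothetical scheduler state for one macroflow. */
struct rr_sched {
    int    pending[RR_MAX_STREAMS];  /* outstanding requests per stream */
    size_t nstreams;                 /* streams in the macroflow */
    size_t next;                     /* stream to visit next */
};

/* query_share(): every stream gets an equal share of the macroflow. */
double rr_query_share(const struct rr_sched *s)
{
    assert(s != NULL && s->nstreams > 0);
    return 1.0 / (double)s->nstreams;
}

/* schedule(): grant up to `grants` transmissions, visiting streams in
 * round robin order. Granted stream indices are written to `out`;
 * the return value is the number of grants made. */
size_t rr_schedule(struct rr_sched *s, size_t grants, size_t *out)
{
    assert(s != NULL && out != NULL);
    assert(s->nstreams > 0 && s->nstreams <= RR_MAX_STREAMS);
    size_t granted = 0;
    size_t idle = 0;   /* consecutive streams with nothing pending */
    while (granted < grants && idle < s->nstreams) {
        size_t i = s->next;
        s->next = (s->next + 1) % s->nstreams;
        if (s->pending[i] > 0) {
            s->pending[i]--;
            out[granted++] = i;
            idle = 0;
        } else {
            idle++;
        }
    }
    return granted;
}
```

The loop stops either when the grant budget is used up or after a full pass finds no stream with a pending request, so idle streams are skipped without blocking the others.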
7. Security considerations 7. Security considerations
The CM provides many of the same services that the congestion The CM provides many of the same services that the congestion
control in TCP provides. As such, it is vulnerable to many of the control in TCP provides. As such, it is vulnerable to many of the
same security problems. For example, incorrect reports of losses same security problems. For example, incorrect reports of losses
and transmissions will give the CM an inaccurate picture of the and transmissions will give the CM an inaccurate picture of the
network's congestion state. By giving CM a high estimate of network's congestion state. By giving CM a high estimate of
congestion, an attacker can degrade the performance observed by congestion, an attacker can degrade the performance observed by
applications. The more dangerous form of attack is giving CM a low applications. For example, a stream on a host can arbitrarily slow
estimate of congestion. This would cause CM to be overly down any other stream on the same macroflow, a form of denial of
aggressive and allow data to be sent much more quickly than sound service.
congestion control policies would allow. [Touch97] describes the
security problems that arise with congestion information sharing in The more dangerous form of attack occurs when an application gives
more detail. the CM a low estimate of congestion. This would cause CM to be
overly aggressive and allow data to be sent much more quickly than
sound congestion control policies would allow.
[Touch97] describes a number of the security problems that arise
with congestion information sharing. An additional vulnerability
(not covered by [Touch97]) occurs because applications have access
through the CM API to control shared state that will affect other
applications on the same computer. For instance, a poorly
designed, possibly compromised, or intentionally malicious UDP
application could misuse cm_update() to cause starvation and/or
too-aggressive behavior of others in the macroflow.
8. References 8. References
[Allman99] Allman, M. and Paxson, V., TCP Congestion Control, [Allman99] Allman, M. and Paxson, V., TCP Congestion Control,
RFC-2581, April 1999. RFC-2581, April 1999.
[Andersen00] Andersen, D., Bansal, D., Curtis, D., Seshan, S., and [Andersen00] Andersen, D., Bansal, D., Curtis, D., Seshan, S., and
Balakrishnan, H., System Support for Bandwidth Management and Balakrishnan, H., System Support for Bandwidth Management and
Content Adaptation in Internet Applications, Proc. 4th Symp. on Content Adaptation in Internet Applications, Proc. 4th Symp. on
Operating Systems Design and Implementation, San Diego, CA, Operating Systems Design and Implementation, San Diego, CA,
skipping to change at line 904 skipping to change at line 934
[Stevens94] Stevens, W., TCP/IP Illustrated, Volume 1. [Stevens94] Stevens, W., TCP/IP Illustrated, Volume 1.
Addison-Wesley, Reading, MA, 1994. Addison-Wesley, Reading, MA, 1994.
[Touch97] Touch, J., "TCP Control Block Interdependence," RFC-2140, [Touch97] Touch, J., "TCP Control Block Interdependence," RFC-2140,
April 1997. (Informational.) April 1997. (Informational.)
9. Acknowledgments 9. Acknowledgments
We thank David Andersen, Deepak Bansal, and Dorothy Curtis for We thank David Andersen, Deepak Bansal, and Dorothy Curtis for
their work on the CM design and implementation. We thank Vern their work on the CM design and implementation. We thank Vern
Paxson for his detailed comments and patience, and Sally Floyd, Paxson for his detailed comments, feedback, and patience, and Sally
Mark Handley, and Steven McCanne for useful feedback on the CM Floyd, Mark Handley, and Steven McCanne for useful feedback on the
architecture. CM architecture. Allison Mankin and Joe Touch provided several
useful comments on previous drafts of this document.
10. Authors' addresses 10. Authors' addresses
Hari Balakrishnan Hari Balakrishnan
Laboratory for Computer Science Laboratory for Computer Science
200 Technology Square 200 Technology Square
Massachusetts Institute of Technology Massachusetts Institute of Technology
Cambridge, MA 02139 Cambridge, MA 02139
Email: hari@lcs.mit.edu Email: hari@lcs.mit.edu
Web: http://nms.lcs.mit.edu/~hari/ Web: http://nms.lcs.mit.edu/~hari/
 End of changes. 17 change blocks. 
48 lines changed or deleted 79 lines changed or added