draft-ietf-ecm-cm-00.txt   draft-ietf-ecm-cm-01.txt 
Internet Engineering Task Force Hari Balakrishnan Internet Engineering Task Force Hari Balakrishnan
INTERNET DRAFT MIT LCS INTERNET DRAFT MIT LCS
Document: draft-ietf-ecm-cm-00.txt Srinivasan Seshan Document: draft-ietf-ecm-cm-01.txt Srinivasan Seshan
CMU CMU
July, 2000 July, 2000
Expires: January 2001 Expires: January 2001
The Congestion Manager The Congestion Manager
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC-2026 [Bradner96]. all provisions of Section 10 of RFC-2026 [Bradner96].
skipping to change at line 298 skipping to change at line 299
3. Packet size: cm_mtu(cm_streamid) returns the estimated PMTU of 3. Packet size: cm_mtu(cm_streamid) returns the estimated PMTU of
the path between sender and receiver. Internally, this the path between sender and receiver. Internally, this
information SHOULD be obtained via path MTU discovery information SHOULD be obtained via path MTU discovery
[Mogul90]. It MAY be statically configured in the absence of [Mogul90]. It MAY be statically configured in the absence of
such a mechanism. such a mechanism.
4.2 Data transmission 4.2 Data transmission
The CM accommodates two types of adaptive senders, enabling The CM accommodates two types of adaptive senders, enabling
applications to use ALF to dynamically adapt their content based on applications to dynamically adapt their content based on
prevailing network conditions. prevailing network conditions, and supporting ALF-based
applications.
1. Callback-based transmission. The callback-based transmission API 1. Callback-based transmission. The callback-based transmission API
puts the stream in firm control of deciding what to transmit at puts the stream in firm control of deciding what to transmit at
each point in time. To achieve this, the CM does not buffer any each point in time. To achieve this, the CM does not buffer any
data; instead, it allows streams the opportunity to adapt to data; instead, it allows streams the opportunity to adapt to
unexpected network changes at the last possible instant. Thus, unexpected network changes at the last possible instant. Thus,
this enables streams to "pull out" and repacketize data upon this enables streams to "pull out" and repacketize data upon
learning about any rate change, which is hard to do once the data learning about any rate change, which is hard to do once the data
has been buffered. A stream wishing to send data in this style has been buffered. A stream wishing to send data in this style
MUST call cm_request(i32 cm_streamid). After some time, depending on MUST call cm_request(i32 cm_streamid). After some time, depending
the rate, the CM invokes a callback using cmapp_send(), which is a on the rate, the CM invokes a callback using cmapp_send(), which is
grant for the stream to send up to PMTU bytes. The callback-style a grant for the stream to send up to PMTU bytes. The
API is the recommended choice for ALF-based streams. Note that callback-style API is the recommended choice for ALF-based streams.
cm_request() does not take the number of bytes or MTU-sized units Note that cm_request() does not take the number of bytes or
as an argument; each call to cm_request() is an implicit request MTU-sized units as an argument; each call to cm_request() is an
for sending up to PMTU bytes. Section 5.2 describes how these implicit request for sending up to PMTU bytes. Section 4.3
requests are scheduled and callbacks made. discusses the time duration for which the transmission grant is
valid, while Section 5.2 describes how these requests are scheduled
and callbacks made.
2. Synchronous-style. The above callback-based API accommodates a 2. Synchronous-style. The above callback-based API accommodates a
class of ALF streams that are "asynchronous." Asynchronous class of ALF streams that are "asynchronous." Asynchronous
transmitters do not transmit based on a periodic clock, but do so transmitters do not transmit based on a periodic clock, but do so
triggered by asynchronous events like file reads or captured triggered by asynchronous events like file reads or captured
frames. On the other hand, there are many streams that are frames. On the other hand, there are many streams that are
"synchronous" transmitters, which transmit periodically based on "synchronous" transmitters, which transmit periodically based on
their own internal timers (e.g., an audio sender that sends at a their own internal timers (e.g., an audio sender that sends at a
constant sampling rate). While CM callbacks could be configured to constant sampling rate). While CM callbacks could be configured to
periodically interrupt such transmitters, the transmit loop of such periodically interrupt such transmitters, the transmit loop of such
applications is less affected if they retain their original applications is less affected if they retain their original
timer-based loop. In addition, it complicates the CM API to have a timer-based loop. In addition, it complicates the CM API to have a
stream express the periodicity and granularity of its callbacks. stream express the periodicity and granularity of its callbacks.
Thus, the CM exports an API that allows such streams to be informed Thus, the CM exports an API that allows such streams to be informed
of changes in rates using the cmapp_update(u64 newrate, u32 srtt, of changes in rates using the cmapp_update(u64 newrate, u32 srtt,
u32 rttdev) callback function, where newrate is the new rate in u32 rttdev) callback function, where newrate is the new rate in
bits per second for this stream, srtt is the current smoothed round bits per second for this stream, srtt is the current smoothed round
trip time estimate in microseconds, and rttdev is the smoothed trip time estimate in microseconds, and rttdev is the smoothed
linear deviation in the round-trip time estimate. In response, the linear deviation in the round-trip time estimate. The newrate
stream MUST adapt its packet size or change its timer interval to value reports an instantaneous rate calculated, for example, by
conform to (not exceed) the allowed rate. Of course, it may choose taking the ratio of cwnd and srtt, and dividing by the fraction of
not to use all of this rate. that ratio allocated to the stream. In response, the stream MUST
adapt its packet size or change its timer interval to conform to
(i.e., not exceed) the allowed rate. Of course, it may choose not
to use all of this rate. Note that the CM is not on the data path
of the actual transmission.
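As a rough illustration of how a synchronous stream might react to a cmapp_update() callback, the sketch below derives either a timer interval or a packet size from newrate (bits per second, as stated above). The helper names timer_interval_s and max_packet_bytes are hypothetical and are not part of the CM API; the draft only requires that the stream not exceed the allowed rate.

```python
def timer_interval_s(packet_bytes, newrate_bps):
    """Shortest send interval (seconds) that keeps a stream of
    fixed-size packets at or below newrate_bps."""
    if newrate_bps <= 0:
        raise ValueError("newrate_bps must be positive")
    return packet_bytes * 8 / newrate_bps


def max_packet_bytes(interval_s, newrate_bps, pmtu):
    """Largest packet (bytes) a stream with a fixed timer interval
    may send per tick without exceeding newrate_bps or the PMTU."""
    budget_bytes = int(newrate_bps * interval_s) // 8
    return min(budget_bytes, pmtu)


# A 64 kbit/s allowance with 160-byte packets implies one packet
# every 20 ms; with a fixed 20 ms timer it implies 160-byte packets.
interval = timer_interval_s(160, 64000)
size = max_packet_bytes(0.02, 64000, 1500)
```

A stream may of course send less than this; the values are upper bounds only.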
To avoid unnecessary cmapp_update() callbacks that the application To avoid unnecessary cmapp_update() callbacks that the application
will only ignore, the stream can use the cm_thresh(float will only ignore, the stream can use the cm_thresh(float
rate_downthresh, float rate_upthresh, float rtt_downthresh, float rate_downthresh, float rate_upthresh, float rtt_downthresh, float
rtt_upthresh) function at any stage in its execution. In response, rtt_upthresh) function at any stage in its execution. In response,
the CM will invoke the callback only when the rate decreases to the CM will invoke the callback only when the rate decreases to
less than (rate_downthresh * lastrate) or increases to more than less than (rate_downthresh * lastrate) or increases to more than
(rate_upthresh * lastrate), where lastrate is the rate last (rate_upthresh * lastrate), where lastrate is the rate last
notified to the stream, or when the round-trip time changes notified to the stream, or when the round-trip time changes
correspondingly by the requisite thresholds. This information is correspondingly by the requisite thresholds. This information is
skipping to change at line 372 skipping to change at line 380
rate is useful for asynchronous streams as well as synchronous rate is useful for asynchronous streams as well as synchronous
ones; e.g., an asynchronous Web server disseminating images using ones; e.g., an asynchronous Web server disseminating images using
TCP may use cmapp_send() to schedule its transmissions and TCP may use cmapp_send() to schedule its transmissions and
cmapp_update() to decide whether to send a low-resolution or cmapp_update() to decide whether to send a low-resolution or
high-resolution image. A TCP implementation using the CM is high-resolution image. A TCP implementation using the CM is
described in Section 6.1.1, where the benefit of the cm_request() described in Section 6.1.1, where the benefit of the cm_request()
callback API for TCP will become apparent. callback API for TCP will become apparent.
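The cm_thresh() filtering rule described earlier in this section can be sketched as follows. This is a minimal model of the rate thresholds only (the round-trip time thresholds would be handled analogously), and it assumes that the first rate observed is always reported; the class name RateNotifier and that first-report behaviour are assumptions, not taken from the draft.

```python
class RateNotifier:
    """Decides whether a new rate warrants a cmapp_update() callback."""

    def __init__(self, rate_downthresh, rate_upthresh):
        self.rate_downthresh = rate_downthresh
        self.rate_upthresh = rate_upthresh
        self.lastrate = None  # rate last notified to the stream

    def should_notify(self, rate):
        # Assumption: with no previous notification, always report.
        if self.lastrate is None:
            self.lastrate = rate
            return True
        went_down = rate < self.rate_downthresh * self.lastrate
        went_up = rate > self.rate_upthresh * self.lastrate
        if went_down or went_up:
            self.lastrate = rate
            return True
        return False


# Notify only when the rate halves or doubles relative to lastrate.
notifier = RateNotifier(rate_downthresh=0.5, rate_upthresh=2.0)
decisions = [notifier.should_notify(r) for r in (1000, 800, 400, 700, 900)]
```

Note that lastrate only advances when a callback is made, so a slow drift eventually crosses a threshold and is reported.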
The reader will notice that the basic CM API does not provide an The reader will notice that the basic CM API does not provide an
interface for buffered congestion-controlled transmissions. This interface for buffered congestion-controlled transmissions. This
is intentional, since such a transmission mode can be implemented is intentional, since this transmission mode can be implemented
using the callback-based primitive. Section 6.1.2 describes how using the callback-based primitive. Section 6.1.2 describes how
congestion-controlled UDP sockets may be implemented using the CM congestion-controlled UDP sockets may be implemented using the CM
API. API.
4.3 Application notification 4.3 Application notification
When a stream receives feedback from receivers, it MUST use When a stream receives feedback from receivers, it MUST use
cm_update(i32 cm_streamid, u32 nrecd, u32 nlost, u8 lossmode, i32 cm_update(i32 cm_streamid, u32 nrecd, u32 nlost, u8 lossmode, i32
rtt) to inform the CM about events such as congestion losses, rtt) to inform the CM about events such as congestion losses,
successful receptions, type of loss (timeout event, Explicit successful receptions, type of loss (timeout event, Explicit
skipping to change at line 400 skipping to change at line 408
round-trip sample was obtained by the application. The lossmode round-trip sample was obtained by the application. The lossmode
parameter provides an indicator of how a loss was detected. A parameter provides an indicator of how a loss was detected. A
value of CM_PERSISTENT indicates that the application believes value of CM_PERSISTENT indicates that the application believes
congestion to be severe, e.g., a TCP that has experienced a congestion to be severe, e.g., a TCP that has experienced a
timeout. A value of CM_TRANSIENT indicates that the application timeout. A value of CM_TRANSIENT indicates that the application
believes that the congestion is not severe, e.g., a TCP loss believes that the congestion is not severe, e.g., a TCP loss
detected using duplicate (selective) acknowledgements or other detected using duplicate (selective) acknowledgements or other
data-driven techniques. A value of CM_ECN indicates that the data-driven techniques. A value of CM_ECN indicates that the
receiver echoed an explicit congestion notification message. receiver echoed an explicit congestion notification message.
Finally, a value of CM_NOLOSS indicates that no congestion-related Finally, a value of CM_NOLOSS indicates that no congestion-related
loss has occurred. loss has occurred. The lossmode parameter MUST be reported as a
bit-vector where the bits correspond to CM_PERSISTENT,
CM_TRANSIENT, and CM_ECN.
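One possible encoding of the lossmode bit-vector is sketched below. The draft names the flags but does not assign bit positions, so the numeric values here, and the choice of zero for CM_NOLOSS, are assumptions made purely for illustration.

```python
# Hypothetical bit assignments; the draft does not specify them.
CM_NOLOSS = 0x0
CM_PERSISTENT = 0x1
CM_TRANSIENT = 0x2
CM_ECN = 0x4

_FLAGS = (
    (CM_PERSISTENT, "CM_PERSISTENT"),
    (CM_TRANSIENT, "CM_TRANSIENT"),
    (CM_ECN, "CM_ECN"),
)


def describe_lossmode(lossmode):
    """Return the names of the flags set in a lossmode bit-vector."""
    names = [name for bit, name in _FLAGS if lossmode & bit]
    return names or ["CM_NOLOSS"]


# A feedback report covering both a duplicate-ACK loss and an ECN echo.
mode = CM_TRANSIENT | CM_ECN
names = describe_lossmode(mode)
```

With this layout the combined value still fits in the u8 lossmode argument of cm_update().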
cm_notify(i32 cm_streamid, u32 nsent) MUST be called when data is cm_notify(i32 cm_streamid, u32 nsent) MUST be called when data is
transmitted from the host (e.g., in the IP output routine) to transmitted from the host (e.g., in the IP output routine) to
inform the CM that nsent bytes were just transmitted on a given inform the CM that nsent bytes were just transmitted on a given
stream. This allows the CM to update its estimate of the number of stream. This allows the CM to update its estimate of the number of
outstanding bytes for the macroflow and for the stream. If a stream outstanding bytes for the macroflow and for the stream.
does not transmit any data upon a cmapp_send() callback invocation,
it SHOULD call cm_notify(stream_info, 0) to allow the CM to permit A cmapp_send() grant from the CM to an application is valid only
other streams in the macroflow to transmit data. The CM congestion for an expiration time, equal to the larger of the round-trip time
controller should be robust to applications forgetting to invoke and an implementation-dependent threshold communicated as an
cm_notify(stream_info, 0) correctly. argument to the cmapp_send() callback function. The application
MUST NOT send data based on this callback after this time has
expired. Furthermore, if the application decides not to send data
after receiving this callback, it SHOULD call
cm_notify(stream_info, 0) to allow the CM to permit other streams
in the macroflow to transmit data. The CM congestion controller
MUST be robust to applications forgetting to invoke
cm_notify(stream_info, 0) correctly, or applications that crash or
disappear after having made a cm_request() call.
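The grant-expiry rule in the newer revision can be sketched from the application's side as follows. This is an illustrative model only: the Grant class, the handle_grant function, the use of seconds as the time unit, and the way the callables are passed in are all assumptions. It also assumes that cm_notify() with the actual byte count is issued by the output path (as the text suggests) and therefore is not called here on a successful send.

```python
class Grant:
    """A cmapp_send() grant, valid for max(rtt, threshold) seconds."""

    def __init__(self, issued_at, rtt, threshold):
        self.expires_at = issued_at + max(rtt, threshold)

    def is_valid(self, now):
        return now < self.expires_at


def handle_grant(grant, now, payload, send, cm_notify, stream_id):
    """Send payload if the grant is still valid; otherwise release it.

    Returns True if data was sent. When nothing is sent, the stream
    calls cm_notify(stream_id, 0) so other streams in the macroflow
    can use the transmission opportunity.
    """
    if payload is not None and grant.is_valid(now):
        send(payload)
        return True
    cm_notify(stream_id, 0)
    return False


sent, released = [], []
grant = Grant(issued_at=10.0, rtt=0.05, threshold=0.2)
record = lambda stream, nsent: released.append((stream, nsent))
in_time = handle_grant(grant, 10.1, b"abc", sent.append, record, 7)
too_late = handle_grant(grant, 10.3, b"def", sent.append, record, 7)
```

Since applications may crash or forget the zero-byte notification, a CM implementation cannot rely on this call and needs its own timeout for outstanding grants.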
4.4 Querying 4.4 Querying
If applications wish to learn about per-stream available bandwidth If applications wish to learn about per-stream available bandwidth
and round-trip time, they can use the CM's cm_query(i32 and round-trip time, they can use the CM's cm_query(i32
cm_streamid, i64* rate, i32* srtt, i32* rttdev) call, which fills cm_streamid, i64* rate, i32* srtt, i32* rttdev) call, which fills
in the desired quantities. If the CM does not have valid estimates in the desired quantities. If the CM does not have valid estimates
for the macroflow, it fills in negative values for the rate, srtt, for the macroflow, it fills in negative values for the rate, srtt,
and rttdev. and rttdev.
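A caller of cm_query() has to check for the negative "no valid estimate" values before using the results. The sketch below models the three out-parameters as plain arguments; the function name usable_estimates is hypothetical.

```python
def usable_estimates(rate, srtt, rttdev):
    """Return (rate, srtt, rttdev) if the CM supplied valid estimates,
    or None if any value is negative (no estimate for the macroflow)."""
    if rate < 0 or srtt < 0 or rttdev < 0:
        return None
    return rate, srtt, rttdev


fresh_macroflow = usable_estimates(-1, -1, -1)
warm_macroflow = usable_estimates(1500000, 40000, 5000)
```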
skipping to change at line 439 skipping to change at line 457
cm_getmacroflow(i32 cm_streamid) returns a unique i32 macroflow cm_getmacroflow(i32 cm_streamid) returns a unique i32 macroflow
identifier. cm_setmacroflow(i32 cm_macroflowid, i32 cm_streamid) identifier. cm_setmacroflow(i32 cm_macroflowid, i32 cm_streamid)
sets the macroflow of the stream cm_streamid to cm_macroflowid. If the sets the macroflow of the stream cm_streamid to cm_macroflowid. If the
cm_macroflowid that is passed to cm_setmacroflow() is -1, then a cm_macroflowid that is passed to cm_setmacroflow() is -1, then a
new macroflow is constructed and this is returned to the caller. new macroflow is constructed and this is returned to the caller.
Each call to cm_setmacroflow() overrides the previous macroflow Each call to cm_setmacroflow() overrides the previous macroflow
association for the stream, should one exist. association for the stream, should one exist.
The default suggested aggregation method is to aggregate by The default suggested aggregation method is to aggregate by
destination; i.e., all streams to the same destination are destination IP address; i.e., all streams to the same destination
aggregated to a single macroflow by default. The cm_getmacroflow() address are aggregated to a single macroflow by default. The
and cm_setmacroflow() calls can then be used to change this as cm_getmacroflow() and cm_setmacroflow() calls can then be used to
needed. change this as needed.
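The default aggregation and the cm_getmacroflow()/cm_setmacroflow() semantics described above might be modelled as in the following sketch. The MacroflowTable class and its open_stream() method are hypothetical scaffolding; only the two cm_* method names and the meaning of -1 come from the text.

```python
class MacroflowTable:
    """Toy model of stream-to-macroflow bookkeeping."""

    def __init__(self):
        self._next_id = 0
        self._by_dest = {}    # destination address -> default macroflow
        self._stream_mf = {}  # stream id -> current macroflow

    def _new_macroflow(self):
        mf = self._next_id
        self._next_id += 1
        return mf

    def open_stream(self, stream_id, dest_ip):
        # Default policy: one macroflow per destination address.
        if dest_ip not in self._by_dest:
            self._by_dest[dest_ip] = self._new_macroflow()
        self._stream_mf[stream_id] = self._by_dest[dest_ip]

    def cm_getmacroflow(self, stream_id):
        return self._stream_mf[stream_id]

    def cm_setmacroflow(self, macroflow_id, stream_id):
        # -1 requests a fresh macroflow, which is returned to the caller.
        if macroflow_id == -1:
            macroflow_id = self._new_macroflow()
        # Any previous association for the stream is overridden.
        self._stream_mf[stream_id] = macroflow_id
        return macroflow_id


table = MacroflowTable()
table.open_stream(1, "192.0.2.1")
table.open_stream(2, "192.0.2.1")
table.open_stream(3, "198.51.100.7")
private_mf = table.cm_setmacroflow(-1, 2)  # split stream 2 off
```

Here streams 1 and 2 start in the same macroflow because they share a destination address, and stream 2 is then moved into a macroflow of its own.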
The objective of this interface is to set up sharing of groups, not
the sharing policy of relative weights of streams in a macroflow. The
latter requires the scheduler to provide an interface to set
sharing policy. However, because we want to support many different
schedulers (each of which may need different information to set
policy), we do not specify a complete API to the scheduler (but see
Section 5.2). A later guideline document intends to describe a few
simple schedulers (e.g., weighted round-robin, hierarchical
scheduling) and the API they export to provide relative
prioritization.
5. CM internals 5. CM internals
This section describes the internal components of the CM. It This section describes the internal components of the CM. It
includes a Congestion Controller and a Scheduler, with includes a Congestion Controller and a Scheduler, with
well-defined, abstract interfaces exported by them. well-defined, abstract interfaces exported by them.
5.1 Congestion controller 5.1 Congestion controller
Associated with each macroflow is a congestion control algorithm; Associated with each macroflow is a congestion control algorithm;
skipping to change at line 768 skipping to change at line 797
The CM provides many of the same services that the congestion The CM provides many of the same services that the congestion
control in TCP provides. As such, it is vulnerable to many of the control in TCP provides. As such, it is vulnerable to many of the
same security problems. For example, incorrect reports of losses same security problems. For example, incorrect reports of losses
and transmissions will give the CM an inaccurate picture of the and transmissions will give the CM an inaccurate picture of the
network's congestion state. By giving CM a high estimate of network's congestion state. By giving CM a high estimate of
congestion, an attacker can degrade the performance observed by congestion, an attacker can degrade the performance observed by
applications. The more dangerous form of attack is giving CM a low applications. The more dangerous form of attack is giving CM a low
estimate of congestion. This would cause CM to be overly estimate of congestion. This would cause CM to be overly
aggressive and allow data to be sent much more quickly than sound aggressive and allow data to be sent much more quickly than sound
congestion control policies would allow. congestion control policies would allow. [Touch97] describes the
security problems that arise with congestion information sharing in
more detail.
8. References 8. References
[Allman99] Allman, M. and Paxson, V., TCP Congestion Control, [Allman99] Allman, M. and Paxson, V., TCP Congestion Control,
RFC-2581, April 1999. RFC-2581, April 1999.
[Andersen00] Andersen, D., Bansal, D., Curtis, D., Seshan, S., and [Andersen00] Andersen, D., Bansal, D., Curtis, D., Seshan, S., and
Balakrishnan, H., System Support for Bandwidth Management and Balakrishnan, H., System Support for Bandwidth Management and
Content Adaptation in Internet Applications, Proc. 4th Symp. on Content Adaptation in Internet Applications, Proc. 4th Symp. on
Operating Systems Design and Implementation, San Diego, CA, Operating Systems Design and Implementation, San Diego, CA,
 End of changes. 10 change blocks. 
28 lines changed or deleted 59 lines changed or added
