draft-ietf-mptcp-congestion-00.txt   draft-ietf-mptcp-congestion-01.txt 
Internet Engineering Task Force C. Raiciu Internet Engineering Task Force C. Raiciu
Internet-Draft M. Handley Internet-Draft M. Handley
Intended status: Experimental D. Wischik Intended status: Experimental D. Wischik
Expires: January 7, 2011 University College London Expires: July 11, 2011 University College London
July 06, 2010 January 07, 2011
Coupled Multipath-Aware Congestion Control Coupled Congestion Control for Multipath Transport Protocols
draft-ietf-mptcp-congestion-00 draft-ietf-mptcp-congestion-01
Abstract Abstract
Often endpoints are connected by multiple paths, but communications Often endpoints are connected by multiple paths, but communications
are usually restricted to a single path per connection. Resource are usually restricted to a single path per connection. Resource
usage within the network would be more efficient were it possible for usage within the network would be more efficient were it possible for
these multiple paths to be used concurrently. Multipath TCP is a these multiple paths to be used concurrently. Multipath TCP is a
proposal to achieve multipath transport in TCP. proposal to achieve multipath transport in TCP.
New congestion control algorithms are needed for multipath transport New congestion control algorithms are needed for multipath transport
skipping to change at page 1, line 41 skipping to change at page 1, line 41
This document presents a congestion control algorithm which couples This document presents a congestion control algorithm which couples
the congestion control algorithms running on different subflows by the congestion control algorithms running on different subflows by
linking their increase functions, and dynamically controls the linking their increase functions, and dynamically controls the
overall aggresiveness of the multipath flow. The result is a overall aggresiveness of the multipath flow. The result is a
practical algorithm that is fair to TCP at bottlenecks while moving practical algorithm that is fair to TCP at bottlenecks while moving
traffic away from congested links. traffic away from congested links.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF), its areas, and its working groups. Note that
working documents as Internet-Drafts. The list of current Internet- other groups may also distribute working documents as Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 7, 2011. The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on July 11, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the BSD License.
Table of Contents Table of Contents
1. Requirements Language . . . . . . . . . . . . . . . . . . . . 4 1. Requirements Language . . . . . . . . . . . . . . . . . . . . 4
2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Coupled Congestion Control Algorithm . . . . . . . . . . . . . 5 3. Coupled Congestion Control Algorithm . . . . . . . . . . . . . 5
4. Implementation Considerations . . . . . . . . . . . . . . . . 7 4. Implementation Considerations . . . . . . . . . . . . . . . . 7
4.1. Implementation Considerations when CWND is Expressed 4.1. Implementation Considerations when CWND is Expressed
in Packets . . . . . . . . . . . . . . . . . . . . . . . . 9 in Packets . . . . . . . . . . . . . . . . . . . . . . . . 9
5. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 9 5. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 10
6. Security Considerations . . . . . . . . . . . . . . . . . . . 10 6. Security Considerations . . . . . . . . . . . . . . . . . . . 10
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
9.1. Normative References . . . . . . . . . . . . . . . . . . . 10 9.1. Normative References . . . . . . . . . . . . . . . . . . . 11
9.2. Informative References . . . . . . . . . . . . . . . . . . 10 9.2. Informative References . . . . . . . . . . . . . . . . . . 11
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11
1. Requirements Language 1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
2. Introduction 2. Introduction
skipping to change at page 5, line 39 skipping to change at page 5, line 39
deployable in the current Internet. If needed, new fairness criteria deployable in the current Internet. If needed, new fairness criteria
can be implemented by the same algorithm we propose by appropriately can be implemented by the same algorithm we propose by appropriately
scaling the overall aggressiveness. scaling the overall aggressiveness.
It is intended that the algorithm presented here can be applied to It is intended that the algorithm presented here can be applied to
other multipath transport protocols, such as alternative multipath other multipath transport protocols, such as alternative multipath
extensions to TCP, or indeed any other congestion-aware transport extensions to TCP, or indeed any other congestion-aware transport
protocols. However, for the purposes of example this document will, protocols. However, for the purposes of example this document will,
where appropriate, refer to the MPTCP protocol. where appropriate, refer to the MPTCP protocol.
It is foreseeable that different congestion controllers will be The design decisions and evaluation of the congestion control
implemented for Multipath transport, each aiming to achieve different algorithm are published in [NSDI].
properties in the resource pooling/fairness/stability design space.
In particular, solutions that give better resource pooling may be The algorithm presented here only extends TCP New Reno congestion
proposed. This algorithm is conservative from this point of view, control for multipath operation. It is foreseeable that other
sacrificing resource pooling for stability. congestion controllers will be implemented for multipath transport to
achieve the bandwidth-scaling properties of the newer congestion
control algorithms for regular TCP (such as Compound TCP and Cubic).
3. Coupled Congestion Control Algorithm 3. Coupled Congestion Control Algorithm
The algorithm we present only applies to the increase phase of the The algorithm we present only applies to the increase phase of the
congestion avoidance state specifying how the window inflates upon congestion avoidance state specifying how the window inflates upon
receiving an ack. The slow start, fast retransmit, and fast recovery receiving an ack. The slow start, fast retransmit, and fast recovery
algorithms, as well as the multiplicative decrease of the congestion algorithms, as well as the multiplicative decrease of the congestion
avoidance state are the same as in TCP [RFC5681]. avoidance state are the same as in TCP [RFC5681].
Let cwnd_i be the congestion window on the subflow i. Let tot_cwnd Let cwnd_i be the congestion window on the subflow i. Let tot_cwnd
skipping to change at page 7, line 9 skipping to change at page 7, line 10
We assume appropriate byte counting (ABC, [RFC3465]) is used, hence We assume appropriate byte counting (ABC, [RFC3465]) is used, hence
the bytes_acked variable records the number of bytes newly the bytes_acked variable records the number of bytes newly
acknowledged. If ABC is not used, bytes_acked SHOULD be set to acknowledged. If ABC is not used, bytes_acked SHOULD be set to
mss_i. mss_i.
To compute tot_cwnd, it is an easy mistake to sum up cwnd_i across To compute tot_cwnd, it is an easy mistake to sum up cwnd_i across
all subflows: when a flow is in fast retransmit, its cwnd is all subflows: when a flow is in fast retransmit, its cwnd is
typically inflated and no longer represents the real congestion typically inflated and no longer represents the real congestion
window. The correct behavior is to use the ssthresh value for flows window. The correct behavior is to use the ssthresh value for flows
in fast retransmit whe computing tot_cwnd. To cater for connections in fast retransmit when computing tot_cwnd. To cater for connections
that are app limited, the computation should consider the minimum that are app limited, the computation should consider the minimum
between flight_size_i and cwnd_i, and flight_size_i and ssthresh_i between flight_size_i and cwnd_i, and flight_size_i and ssthresh_i
where appropriate. where appropriate.
The total throughput of a multipath flow depends on the value of The total throughput of a multipath flow depends on the value of
alpha and the loss rates, maximum segment sizes and round trip times alpha and the loss rates, maximum segment sizes and round trip times
of its paths. Since we require that the total throughput is no worse of its paths. Since we require that the total throughput is no worse
than the throughput a single TCP would get on the best path, it is than the throughput a single TCP would get on the best path, it is
impossible to choose a-priori a single value of alpha that achieves impossible to choose a-priori a single value of alpha that achieves
the desired throughput in every ocasion. Hence, alpha must be the desired throughput in every ocasion. Hence, alpha must be
skipping to change at page 8, line 12 skipping to change at page 8, line 13
* tot_cwnd instead of tot_cwnd, and do all the operations in integer * tot_cwnd instead of tot_cwnd, and do all the operations in integer
arithmetic. arithmetic.
Then, scale down the increase per ack by alpha_scale. The algorithm Then, scale down the increase per ack by alpha_scale. The algorithm
is: is:
o For each ack received on subflow i, increase cwnd_i by min ( o For each ack received on subflow i, increase cwnd_i by min (
alpha*bytes_acked*mss_i/tot_cwnd/alpha_scale , bytes_acked*mss_i/ alpha*bytes_acked*mss_i/tot_cwnd/alpha_scale , bytes_acked*mss_i/
cwnd_i ) cwnd_i )
Observe that the error in computing the numerator or the denominator Observe that the errors in computing the numerator or the denominator
in the formula for alpha are quite small, as both the mss and cwnd in the formula for alpha are quite small, as both the mss and cwnd
are typically much larger than the RTT (measured in ms). Then, alpha are typically much larger than the RTT (measured in ms). Then, alpha
scale denotes the precision we want for computing alpha. scale denotes the precision we want for computing alpha. This is not
true when the stack maintains cwnd in packets rather than bytes; we
describe an appropriate scaling mechanism for packet-based stacks in
the next section.
With these two changes, all the operations can now be done using With these two changes, all the operations can now be done using
integer arithmetic. We propose alpha_scale be a small power of two, integer arithmetic. We propose alpha_scale be a small power of two,
to allow using faster shift operations instead of multiplication and to allow using faster shift operations instead of multiplication and
division. Our experiments show that using alpha_scale=512 works well division. Our experiments show that using alpha_scale=512 works well
in a wide range of scenarios. Increasing alpha_scale increases in a wide range of scenarios. Increasing alpha_scale increases
precision, but also increases the risk of overflow when computing precision, but also increases the risk of overflow when computing
alpha. Using 64bit operations would solve this issue. Another alpha. Using 64bit operations would solve this issue. Another
option is to dynamically adjust alpha_scale when computing alpha; in option is to dynamically adjust alpha_scale when computing alpha; in
this way we avoid overflow and obtain maximum precision. this way we avoid overflow and obtain maximum precision.
skipping to change at page 9, line 31 skipping to change at page 9, line 35
for multipath congestion control is a simple extension of the TCP New for multipath congestion control is a simple extension of the TCP New
Reno code. In TCP New Reno cwnd_cnt is an additional state variable Reno code. In TCP New Reno cwnd_cnt is an additional state variable
that tracks the number of bytes acked since the last cwnd increment; that tracks the number of bytes acked since the last cwnd increment;
cwnd is incremented only when cwnd_cnt > cwnd; then cwnd_cnt is set cwnd is incremented only when cwnd_cnt > cwnd; then cwnd_cnt is set
to 0. to 0.
In the multipath case, cwnd_cnt_i is maintained for each subflow as In the multipath case, cwnd_cnt_i is maintained for each subflow as
above, and cwnd_i is increased by 1 when cwnd_cnt_i > alpha_scale * above, and cwnd_i is increased by 1 when cwnd_cnt_i > alpha_scale *
tot_cwnd / alpha . tot_cwnd / alpha .
Computing alpha is a bit different for packet-based stacks. The
errors in computing the terms in the denominator are larger, because
cwnd is much smaller and mss and rtt may have the same order of
magnitude. Let max be the index of the subflow used in the
numerator. To reduce errors, it is easiest to move rtt_max (once
calculated) from the numerator to the denominator, obtaining the
equivalent formula below.
2
cwnd_max * mss_max
alpha = alpha_scale * tot_cwnd * -------------------------------
/ rtt_max * cwnd_i * mss_i \ 2
| sum -------------------------|
\ i rtt_i /
5. Discussion 5. Discussion
To achieve perfect resource pooling, one must couple both increase To achieve perfect resource pooling, one must couple both increase
and decrease of congestion windows across subflows, as in [KELLY]. and decrease of congestion windows across subflows, as in [KELLY].
Yet this tends to exhibit "flappiness": when the paths have similar Yet this tends to exhibit "flappiness": when the paths have similar
levels of congestion, the congestion controller will tend to allocate levels of congestion, the congestion controller will tend to allocate
all the window to one random subflow, and allocate zero window to the all the window to one random subflow, and allocate zero window to the
other subflows. The controller will perform random flips between other subflows. The controller will perform random flips between
these stable points. This doesn't seem desirable in general, and is these stable points. This doesn't seem desirable in general, and is
particularly bad when the achieved rates depend on the RTT (as in the particularly bad when the achieved rates depend on the RTT (as in the
skipping to change at page 10, line 47 skipping to change at page 11, line 25
Ford, A., Raiciu, C., Handley, M., and S. Barre, "TCP Ford, A., Raiciu, C., Handley, M., and S. Barre, "TCP
Extensions for Multipath Operation with Multiple Extensions for Multipath Operation with Multiple
Addresses", draft-ford-mptcp-multiaddressed-01 (work in Addresses", draft-ford-mptcp-multiaddressed-01 (work in
progress), July 2009. progress), July 2009.
[KELLY] Kelly, F. and T. Voice, "Stability of end-to-end [KELLY] Kelly, F. and T. Voice, "Stability of end-to-end
algorithms for joint routing and rate control", ACM algorithms for joint routing and rate control", ACM
SIGCOMM CCR vol. 35 num. 2, pp. 5-12, 2005, SIGCOMM CCR vol. 35 num. 2, pp. 5-12, 2005,
<http://portal.acm.org/citation.cfm?id=1064415>. <http://portal.acm.org/citation.cfm?id=1064415>.
[NSDI] Wischik, D., Raiciu, C., Greenhalgh, A., and M. Handley,
"Design, Implementation and Evaluation of Congestion
Control for Multipath TCP", Usenix NSDI , March 2011,
<http://ccr.sigcomm.org/online/files/p47-handleyA4.pdf>.
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, [RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, September 1981. RFC 793, September 1981.
[RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte
Counting (ABC)", RFC 3465, February 2003. Counting (ABC)", RFC 3465, February 2003.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, September 2009. Control", RFC 5681, September 2009.
[WISCHIK] Wischik, D., Handley, M., and M. Bagnulo Braun, "The [WISCHIK] Wischik, D., Handley, M., and M. Bagnulo Braun, "The
 End of changes. 15 change blocks. 
24 lines changed or deleted 55 lines changed or added

This html diff was produced by rfcdiff 1.40. The latest version is available from http://tools.ietf.org/tools/rfcdiff/