draft-ietf-mptcp-congestion-01.txt   draft-ietf-mptcp-congestion-02.txt 
Internet Engineering Task Force C. Raiciu Internet Engineering Task Force C. Raiciu
Internet-Draft M. Handley Internet-Draft M. Handley
Intended status: Experimental D. Wischik Intended status: Experimental D. Wischik
Expires: July 11, 2011 University College London Expires: September 15, 2011 University College London
January 07, 2011 March 14, 2011
Coupled Congestion Control for Multipath Transport Protocols Coupled Congestion Control for Multipath Transport Protocols
draft-ietf-mptcp-congestion-01 draft-ietf-mptcp-congestion-02
Abstract Abstract
Often endpoints are connected by multiple paths, but communications Often endpoints are connected by multiple paths, but communications
are usually restricted to a single path per connection. Resource are usually restricted to a single path per connection. Resource
usage within the network would be more efficient were it possible for usage within the network would be more efficient were it possible for
these multiple paths to be used concurrently. Multipath TCP is a these multiple paths to be used concurrently. Multipath TCP is a
proposal to achieve multipath transport in TCP. proposal to achieve multipath transport in TCP.
New congestion control algorithms are needed for multipath transport New congestion control algorithms are needed for multipath transport
protocols such as Multipath TCP, as single path algorithms have a protocols such as Multipath TCP, as single path algorithms have a
series of issues in the multipath context. One of the prominent series of issues in the multipath context. One of the prominent
problems is that running existing algorithms such as TCP New Reno problems is that running existing algorithms such as TCP New Reno
independently on each path would give the multipath flow more than independently on each path would give the multipath flow more than
its fair share at a bottleneck link traversed by more than one of its its fair share at a bottleneck link traversed by more than one of its
subflows. Further, it is desirable that a source with multiple paths subflows. Further, it is desirable that a source with multiple paths
available will transfer more traffic using the least congested of the available will transfer more traffic using the least congested of the
paths, hence achieving resource pooling. This would increase the paths, hence achieving resource pooling. This would increase the
overall utilization of the network and also its robustness to overall efficiency of the network and also its robustness to failure.
failure.
This document presents a congestion control algorithm which couples This document presents a congestion control algorithm which couples
the congestion control algorithms running on different subflows by the congestion control algorithms running on different subflows by
linking their increase functions, and dynamically controls the linking their increase functions, and dynamically controls the
overall aggresiveness of the multipath flow. The result is a overall aggresiveness of the multipath flow. The result is a
practical algorithm that is fair to TCP at bottlenecks while moving practical algorithm that is fair to TCP at bottlenecks while moving
traffic away from congested links. traffic away from congested links.
Status of this Memo Status of this Memo
skipping to change at page 2, line 14 skipping to change at page 2, line 13
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on July 11, 2011. This Internet-Draft will expire on September 15, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 16 skipping to change at page 3, line 16
1. Requirements Language . . . . . . . . . . . . . . . . . . . . 4 1. Requirements Language . . . . . . . . . . . . . . . . . . . . 4
2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Coupled Congestion Control Algorithm . . . . . . . . . . . . . 5 3. Coupled Congestion Control Algorithm . . . . . . . . . . . . . 5
4. Implementation Considerations . . . . . . . . . . . . . . . . 7 4. Implementation Considerations . . . . . . . . . . . . . . . . 7
4.1. Implementation Considerations when CWND is Expressed 4.1. Implementation Considerations when CWND is Expressed
in Packets . . . . . . . . . . . . . . . . . . . . . . . . 9 in Packets . . . . . . . . . . . . . . . . . . . . . . . . 9
5. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 10 5. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 10
6. Security Considerations . . . . . . . . . . . . . . . . . . . 10 6. Security Considerations . . . . . . . . . . . . . . . . . . . 10
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
9.1. Normative References . . . . . . . . . . . . . . . . . . . 11 9.1. Normative References . . . . . . . . . . . . . . . . . . . 11
9.2. Informative References . . . . . . . . . . . . . . . . . . 11 9.2. Informative References . . . . . . . . . . . . . . . . . . 11
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 12
1. Requirements Language 1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
2. Introduction 2. Introduction
Multipath TCP (MPTCP, [I-D.ford-mptcp-multiaddressed]) is a set of Multipath TCP (MPTCP, [I-D.ford-mptcp-multiaddressed]) is a set of
skipping to change at page 7, line 26 skipping to change at page 7, line 26
alpha and the loss rates, maximum segment sizes and round trip times alpha and the loss rates, maximum segment sizes and round trip times
of its paths. Since we require that the total throughput is no worse of its paths. Since we require that the total throughput is no worse
than the throughput a single TCP would get on the best path, it is than the throughput a single TCP would get on the best path, it is
impossible to choose a-priori a single value of alpha that achieves impossible to choose a-priori a single value of alpha that achieves
the desired throughput in every ocasion. Hence, alpha must be the desired throughput in every ocasion. Hence, alpha must be
computed for each multipath flow, based on the observed properties of computed for each multipath flow, based on the observed properties of
the paths. the paths.
The formula to compute alpha is: The formula to compute alpha is:
2 cwnd_i
cwnd_i * mss_i max --------
max --------------- i 2
i 2 rtt_i
rtt_i alpha = tot_cwnd * ----------------
alpha = tot_cwnd * ------------------------- / cwnd_i \ 2
/ cwnd_i * mss_i \ 2 | sum ---------|
| sum ----------------| \ i rtt_i /
\ i rtt_i /
The formula is derived by equalizing the rate of the multipath flow The formula is derived by equalizing the rate of the multipath flow
with the rate of a TCP running on the best path, and solving for with the rate of a TCP running on the best path, and solving for
alpha. alpha.
4. Implementation Considerations 4. Implementation Considerations
The formula for alpha above implies that alpha is a floating point The formula for alpha above implies that alpha is a floating point
value. This would require performing costly floating point value. This would require performing costly floating point
operations whenever an ACK is received, Further, in many kernels operations whenever an ACK is received, Further, in many kernels
skipping to change at page 8, line 13 skipping to change at page 8, line 12
* tot_cwnd instead of tot_cwnd, and do all the operations in integer * tot_cwnd instead of tot_cwnd, and do all the operations in integer
arithmetic. arithmetic.
Then, scale down the increase per ack by alpha_scale. The algorithm Then, scale down the increase per ack by alpha_scale. The algorithm
is: is:
o For each ack received on subflow i, increase cwnd_i by min ( o For each ack received on subflow i, increase cwnd_i by min (
alpha*bytes_acked*mss_i/tot_cwnd/alpha_scale , bytes_acked*mss_i/ alpha*bytes_acked*mss_i/tot_cwnd/alpha_scale , bytes_acked*mss_i/
cwnd_i ) cwnd_i )
Alpha scale denotes the precision we want for computing alpha.
Observe that the errors in computing the numerator or the denominator Observe that the errors in computing the numerator or the denominator
in the formula for alpha are quite small, as both the mss and cwnd in the formula for alpha are quite small, as the cwnd in bytes is
are typically much larger than the RTT (measured in ms). Then, alpha typically much larger than the RTT (measured in ms).
scale denotes the precision we want for computing alpha. This is not
true when the stack maintains cwnd in packets rather than bytes; we
describe an appropriate scaling mechanism for packet-based stacks in
the next section.
With these two changes, all the operations can now be done using With these changes, all the operations can be done using integer
integer arithmetic. We propose alpha_scale be a small power of two, arithmetic. We propose alpha_scale be a small power of two, to allow
to allow using faster shift operations instead of multiplication and using faster shift operations instead of multiplication and division.
division. Our experiments show that using alpha_scale=512 works well Our experiments show that using alpha_scale=512 works well in a wide
in a wide range of scenarios. Increasing alpha_scale increases range of scenarios. Increasing alpha_scale increases precision, but
precision, but also increases the risk of overflow when computing also increases the risk of overflow when computing alpha. Using
alpha. Using 64bit operations would solve this issue. Another 64bit operations would solve this issue. Another option is to
option is to dynamically adjust alpha_scale when computing alpha; in dynamically adjust alpha_scale when computing alpha; in this way we
this way we avoid overflow and obtain maximum precision. avoid overflow and obtain maximum precision.
It is possible to implement our algorithm by calculating tot_cwnd on It is possible to implement the algorithm by calculating tot_cwnd on
each ack, however this would be costly especially when the number of each ack, however this would be costly especially when the number of
subflows is large. To avoid this overhead the implementation MAY subflows is large. To avoid this overhead the implementation MAY
choose to maintain a new per connection state variable called choose to maintain a new per connection state variable called
tot_cwnd. If it does so, the implementation will update tot_cwnd tot_cwnd. If it does so, the implementation will update tot_cwnd
value whenever the individual subflows' windows are updated. value whenever the individual subflows' windows are updated.
Updating only requires one more addition or subtraction operation Updating only requires one more addition or subtraction operation
compared to the regular, per subflow congestion control code, so its compared to the regular, per subflow congestion control code, so its
performance impact should be minimal. performance impact should be minimal.
Computing alpha per ack is also costly. We propose alpha be a per Computing alpha per ack is also costly. We propose alpha be a per
skipping to change at page 9, line 22 skipping to change at page 9, line 18
averaged once per sawtooth into rtt_avg_i. This sampling ensures averaged once per sawtooth into rtt_avg_i. This sampling ensures
that there is no sampling bias for larger windows. that there is no sampling bias for larger windows.
Given tot_cwnd and alpha, the congestion control algorithm is run for Given tot_cwnd and alpha, the congestion control algorithm is run for
each subflow independently, with similar complexity to the standard each subflow independently, with similar complexity to the standard
TCP increase code [RFC5681]. TCP increase code [RFC5681].
4.1. Implementation Considerations when CWND is Expressed in Packets 4.1. Implementation Considerations when CWND is Expressed in Packets
When the congestion control algorithm maintains cwnd in packets When the congestion control algorithm maintains cwnd in packets
rather than bytes, the code to compute tot_cwnd remains unchanged. rather than bytes, the algorithms above must change to take into
account path mss.
To compute the increase when an ack is received, the implementation To compute the increase when an ack is received, the implementation
for multipath congestion control is a simple extension of the TCP New for multipath congestion control is a simple extension of the TCP New
Reno code. In TCP New Reno cwnd_cnt is an additional state variable Reno code. In TCP New Reno cwnd_cnt is an additional state variable
that tracks the number of bytes acked since the last cwnd increment; that tracks the number of bytes acked since the last cwnd increment;
cwnd is incremented only when cwnd_cnt > cwnd; then cwnd_cnt is set cwnd is incremented only when cwnd_cnt > cwnd; then cwnd_cnt is set
to 0. to 0.
In the multipath case, cwnd_cnt_i is maintained for each subflow as In the multipath case, cwnd_cnt_i is maintained for each subflow as
above, and cwnd_i is increased by 1 when cwnd_cnt_i > alpha_scale * above, and cwnd_i is increased by 1 when cwnd_cnt_i > alpha_scale *
tot_cwnd / alpha . tot_cwnd / alpha.
Computing alpha is a bit different for packet-based stacks. The When computing alpha for packet-based stacks, the errors in computing
errors in computing the terms in the denominator are larger, because the terms in the denominator are larger (this is because cwnd is much
cwnd is much smaller and mss and rtt may have the same order of smaller and rtt may be comparatively large). Let max be the index of
magnitude. Let max be the index of the subflow used in the the subflow used in the numerator. To reduce errors, it is easiest
numerator. To reduce errors, it is easiest to move rtt_max (once to move rtt_max (once calculated) from the numerator to the
calculated) from the numerator to the denominator, obtaining the denominator, obtaining the equivalent formula below.
equivalent formula below.
2 cwnd_max
cwnd_max * mss_max alpha = alpha_scale * tot_cwnd * -----------------------
alpha = alpha_scale * tot_cwnd * ------------------------------- / rtt_max * cwnd_i \ 2
/ rtt_max * cwnd_i * mss_i \ 2 | sum -----------------|
| sum -------------------------| \ i rtt_i /
\ i rtt_i /
Note that the formula for computing alpha does not take into account
path mss, and is the same for stacks that keep cwnd in bytes or
packets. With this formula, the algorithm for computing alpha will
match the rate of TCP on the best path in B/s for byte-oriented
stacks, and in packets/s in packet-based stacks. In practice, mss
rarely changes between paths so this shouldn't be a problem.
However, it is simple to derive formulae allowing packet-based stacks
to achieve byte rate fairness (and viceversa) if needed. In
particular, for packet-based stacks wanting byte-rate fairness, the
formula above changes as follows: cwnd_max is replaced by cwnd_max *
mss_max * mss_max, while cwnd_i is replaced with cwnd_i * mss_i.
5. Discussion 5. Discussion
To achieve perfect resource pooling, one must couple both increase To achieve perfect resource pooling, one must couple both increase
and decrease of congestion windows across subflows, as in [KELLY]. and decrease of congestion windows across subflows, as in [KELLY].
Yet this tends to exhibit "flappiness": when the paths have similar Yet this tends to exhibit "flappiness": when the paths have similar
levels of congestion, the congestion controller will tend to allocate levels of congestion, the congestion controller will tend to allocate
all the window to one random subflow, and allocate zero window to the all the window to one random subflow, and allocate zero window to the
other subflows. The controller will perform random flips between other subflows. The controller will perform random flips between
these stable points. This doesn't seem desirable in general, and is these stable points. This doesn't seem desirable in general, and is
skipping to change at page 10, line 38 skipping to change at page 10, line 44
6. Security Considerations 6. Security Considerations
None. None.
Detailed security analysis for the Multipath TCP protocol itself is Detailed security analysis for the Multipath TCP protocol itself is
included in [I-D.ford-mptcp-multiaddressed] and [REF] included in [I-D.ford-mptcp-multiaddressed] and [REF]
7. Acknowledgements 7. Acknowledgements
The authors are supported by Trilogy We thank Christoph Paasch for his suggestions for computing alpha in
packet-based stacks. The authors are supported by Trilogy
(http://www.trilogy-project.org), a research project (ICT-216372) (http://www.trilogy-project.org), a research project (ICT-216372)
partially funded by the European Community under its Seventh partially funded by the European Community under its Seventh
Framework Program. The views expressed here are those of the Framework Program. The views expressed here are those of the
author(s) only. The European Commission is not liable for any use author(s) only. The European Commission is not liable for any use
that may be made of the information in this document. that may be made of the information in this document.
8. IANA Considerations 8. IANA Considerations
None. None.
skipping to change at page 11, line 27 skipping to change at page 11, line 32
Addresses", draft-ford-mptcp-multiaddressed-01 (work in Addresses", draft-ford-mptcp-multiaddressed-01 (work in
progress), July 2009. progress), July 2009.
[KELLY] Kelly, F. and T. Voice, "Stability of end-to-end [KELLY] Kelly, F. and T. Voice, "Stability of end-to-end
algorithms for joint routing and rate control", ACM algorithms for joint routing and rate control", ACM
SIGCOMM CCR vol. 35 num. 2, pp. 5-12, 2005, SIGCOMM CCR vol. 35 num. 2, pp. 5-12, 2005,
<http://portal.acm.org/citation.cfm?id=1064415>. <http://portal.acm.org/citation.cfm?id=1064415>.
[NSDI] Wischik, D., Raiciu, C., Greenhalgh, A., and M. Handley, [NSDI] Wischik, D., Raiciu, C., Greenhalgh, A., and M. Handley,
"Design, Implementation and Evaluation of Congestion "Design, Implementation and Evaluation of Congestion
Control for Multipath TCP", Usenix NSDI , March 2011, Control for Multipath TCP", Usenix NSDI , March 2011, <htt
<http://ccr.sigcomm.org/online/files/p47-handleyA4.pdf>. p://www.cs.ucl.ac.uk/staff/c.raiciu/files/mptcp-nsdi.pdf>.
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, [RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, September 1981. RFC 793, September 1981.
[RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte
Counting (ABC)", RFC 3465, February 2003. Counting (ABC)", RFC 3465, February 2003.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, September 2009. Control", RFC 5681, September 2009.
 End of changes. 17 change blocks. 
51 lines changed or deleted 59 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/