draft-ietf-mptcp-congestion-02.txt | draft-ietf-mptcp-congestion-03.txt | |||
---|---|---|---|---|

Internet Engineering Task Force C. Raiciu | Internet Engineering Task Force C. Raiciu | |||

Internet-Draft M. Handley | Internet-Draft M. Handley | |||

Intended status: Experimental D. Wischik | Intended status: Experimental D. Wischik | |||

Expires: September 15, 2011 University College London | Expires: October 13, 2011 University College London | |||

March 14, 2011 | April 11, 2011 | |||

Coupled Congestion Control for Multipath Transport Protocols | Coupled Congestion Control for Multipath Transport Protocols | |||

draft-ietf-mptcp-congestion-02 | draft-ietf-mptcp-congestion-03 | |||

Abstract | Abstract | |||

Often endpoints are connected by multiple paths, but communications | Often endpoints are connected by multiple paths, but communications | |||

are usually restricted to a single path per connection. Resource | are usually restricted to a single path per connection. Resource | |||

usage within the network would be more efficient were it possible for | usage within the network would be more efficient were it possible for | |||

these multiple paths to be used concurrently. Multipath TCP is a | these multiple paths to be used concurrently. Multipath TCP is a | |||

proposal to achieve multipath transport in TCP. | proposal to achieve multipath transport in TCP. | |||

New congestion control algorithms are needed for multipath transport | New congestion control algorithms are needed for multipath transport | |||

skipping to change at page 2, line 13 | skipping to change at page 2, line 13 | |||

and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||

time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||

material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||

The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||

http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||

The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||

http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||

This Internet-Draft will expire on September 15, 2011. | This Internet-Draft will expire on October 13, 2011. | |||

Copyright Notice | Copyright Notice | |||

Copyright (c) 2011 IETF Trust and the persons identified as the | Copyright (c) 2011 IETF Trust and the persons identified as the | |||

document authors. All rights reserved. | document authors. All rights reserved. | |||

This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||

Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||

(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||

publication of this document. Please review these documents | publication of this document. Please review these documents | |||

carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||

to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||

include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||

the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||

described in the BSD License. | described in the BSD License. | |||

Table of Contents | Table of Contents | |||

1. Requirements Language . . . . . . . . . . . . . . . . . . . . 4 | 1. Requirements Language . . . . . . . . . . . . . . . . . . . . 4 | |||

2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||

3. Coupled Congestion Control Algorithm . . . . . . . . . . . . . 5 | 3. Coupled Congestion Control Algorithm . . . . . . . . . . . . . 6 | |||

4. Implementation Considerations . . . . . . . . . . . . . . . . 7 | 4. Implementation Considerations . . . . . . . . . . . . . . . . 7 | |||

4.1. Implementation Considerations when CWND is Expressed | 4.1. Implementation Considerations when CWND is Expressed | |||

in Packets . . . . . . . . . . . . . . . . . . . . . . . . 9 | in Packets . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||

5. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 5. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 10 | |||

6. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | |||

7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 | 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 | |||

8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 | |||

9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 | 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||

9.1. Normative References . . . . . . . . . . . . . . . . . . . 11 | 9.1. Normative References . . . . . . . . . . . . . . . . . . . 11 | |||

9.2. Informative References . . . . . . . . . . . . . . . . . . 11 | 9.2. Informative References . . . . . . . . . . . . . . . . . . 11 | |||

skipping to change at page 4, line 35 | skipping to change at page 4, line 35 | |||

Bottleneck fairness is just one requirement multipath congestion | Bottleneck fairness is just one requirement multipath congestion | |||

control should meet. The following three goals capture the desirable | control should meet. The following three goals capture the desirable | |||

properties of a practical multipath congestion control algorithm: | properties of a practical multipath congestion control algorithm: | |||

o Goal 1 (Improve Throughput) A multipath flow should perform at | o Goal 1 (Improve Throughput) A multipath flow should perform at | |||

least as well as a single path flow would on the best of the paths | least as well as a single path flow would on the best of the paths | |||

available to it. | available to it. | |||

o Goal 2 (Do no harm) A multipath flow should not take up more | o Goal 2 (Do no harm) A multipath flow should not take up more | |||

capacity on any one of its paths than if it was a single path flow | capacity from any of the resources shared by its different paths, | |||

using only that route. This guarantees it will not unduly harm | than if it was a single flow using only one of these paths. This | |||

other flows. | guarantees it will not unduly harm other flows. | |||

o Goal 3 (Balance congestion) A multipath flow should move as much | o Goal 3 (Balance congestion) A multipath flow should move as much | |||

traffic as possible off its most congested paths, subject to | traffic as possible off its most congested paths, subject to | |||

meeting the first two goals. | meeting the first two goals. | |||

Goals 1 and 2 together ensure fairness at the bottleneck. Goal 3 | Goals 1 and 2 together ensure fairness at the bottleneck. Goal 3 | |||

captures the concept of resource pooling [WISCHIK]: if each multipath | captures the concept of resource pooling [WISCHIK]: if each multipath | |||

flow sends more data through its least congested path, the traffic in | flow sends more data through its least congested path, the traffic in | |||

the network will move away from congested areas. This improves | the network will move away from congested areas. This improves | |||

robustness and overall throughput, among other things. The way to | robustness and overall throughput, among other things. The way to | |||

skipping to change at page 5, line 13 | skipping to change at page 5, line 13 | |||

We propose an algorithm that couples only the additive increase | We propose an algorithm that couples only the additive increase | |||

function of the subflows, and uses unmodified TCP New Reno behavior | function of the subflows, and uses unmodified TCP New Reno behavior | |||

in case of a drop. The algorithm relies on the traditional TCP | in case of a drop. The algorithm relies on the traditional TCP | |||

mechanisms to detect drops, to retransmit data, etc. | mechanisms to detect drops, to retransmit data, etc. | |||

Detecting shared bottlenecks reliably is quite difficult, but is just | Detecting shared bottlenecks reliably is quite difficult, but is just | |||

one part of a bigger question. This bigger question is how much | one part of a bigger question. This bigger question is how much | |||

bandwidth a multipath user should use in total, even if there is no | bandwidth a multipath user should use in total, even if there is no | |||

shared bottleneck. | shared bottleneck. | |||

Our solution sets the multipath flow's aggregate bandwidth to be the | The congestion controller aims to set the multipath flow's aggregate | |||

same bandwidth a regular TCP flow would get on the best path | bandwidth to be the same as a regular TCP flow would get on the best | |||

available to the multipath flow. To estimate the bandwidth of a | path available to the multipath flow. To estimate the bandwidth of a | |||

regular TCP flow, the multipath flow estimates loss rates and round | regular TCP flow, the multipath flow estimates loss rates and round | |||

trip times and computes the target rate. Then it adjusts the overall | trip times and computes the target rate. Then it adjusts the overall | |||

aggresiveness (parameter alpha) to achieve the desired rate. | aggresiveness (parameter alpha) to achieve the desired rate. | |||

We note that in cases with low statistical multiplexing (where the | While the mechanism above applies always, its effect depends on | |||

multipath flow influences the loss rates on the path) the multipath | whether the multipath TCP flow influences or does not influence the | |||

throughput will be strictly higher than a single TCP would get on any | link loss rates (high vs. low statistical multiplexing). If MPTCP | |||

of the paths. In particular, if using two idle paths, multipath | does not influence link loss rates, MPTCP will get the same | |||

throughput will be sum of the two paths' throughput. | throughput as TCP on the best path. In cases with low statistical | |||

multiplexing, where the multipath flow influences the loss rates on | ||||

the path, the multipath throughput will be strictly higher than a | ||||

single TCP would get on any of the paths. In particular, if using | ||||

two idle paths, multipath throughput will be sum of the two paths' | ||||

throughput. | ||||

This algorithm ensures bottleneck fairness and fairness in the | This algorithm ensures bottleneck fairness and fairness in the | |||

broader, network sense. We acknowledge that current TCP fairness | broader, network sense. We acknowledge that current TCP fairness | |||

criteria are far from ideal, but a multipath TCP needs to be | criteria are far from ideal, but a multipath TCP needs to be | |||

deployable in the current Internet. If needed, new fairness criteria | deployable in the current Internet. If needed, new fairness criteria | |||

can be implemented by the same algorithm we propose by appropriately | can be implemented by the same algorithm we propose by appropriately | |||

scaling the overall aggressiveness. | scaling the overall aggressiveness. | |||

It is intended that the algorithm presented here can be applied to | It is intended that the algorithm presented here can be applied to | |||

other multipath transport protocols, such as alternative multipath | other multipath transport protocols, such as alternative multipath | |||

skipping to change at page 6, line 35 | skipping to change at page 6, line 39 | |||

The increase formula takes the minimum between the computed increase | The increase formula takes the minimum between the computed increase | |||

for the multipath subflow (first argument to min), and the increase | for the multipath subflow (first argument to min), and the increase | |||

TCP would get in the same scenario (the second argument). In this | TCP would get in the same scenario (the second argument). In this | |||

way, we ensure that any multipath subflow cannot be more aggressive | way, we ensure that any multipath subflow cannot be more aggressive | |||

than a TCP flow in the same circumstances, hence achieving goal 2 (do | than a TCP flow in the same circumstances, hence achieving goal 2 (do | |||

no harm). | no harm). | |||

"alpha" is a parameter of the algorithm that describes the | "alpha" is a parameter of the algorithm that describes the | |||

aggresiveness of the multipath flow. To meet Goal 1 (improve | aggresiveness of the multipath flow. To meet Goal 1 (improve | |||

throughput), the value of alpha is chosen such that the aggregate | throughput), the value of alpha is chosen such that the aggregate | |||

throughput of the multipath flow is equal to the rate a TCP flow | throughput of the multipath flow is equal to the throughput a TCP | |||

would get if it ran on the best path. | flow would get if it ran on the best path. | |||

To get an intuition of what the algorithm is trying to do, let's take | To get an intuition of what the algorithm is trying to do, let's take | |||

the case where all the subflows have the same round trip time and | the case where all the subflows have the same round trip time and | |||

MSS. In this case the algorithm will grow the total window by | MSS. In this case the algorithm will grow the total window by | |||

approximately alpha*MSS per RTT. This increase is distributed to the | approximately alpha*MSS per RTT. This increase is distributed to the | |||

individual flows according to their instantaneous window size. | individual flows according to their instantaneous window size. | |||

Subflow i will increase by alpha*cwnd_i/tot_cwnd segments per RTT. | Subflow i will increase by alpha*cwnd_i/tot_cwnd segments per RTT. | |||

Note that, as in standard TCP, when tot_cwnd is large the increase | Note that, as in standard TCP, when tot_cwnd is large the increase | |||

may be 0. In this case the increase MUST be set to 1. We discuss | may be 0. In this case the increase MUST be set to 1. We discuss | |||

skipping to change at page 7, line 21 | skipping to change at page 7, line 25 | |||

that are app limited, the computation should consider the minimum | that are app limited, the computation should consider the minimum | |||

between flight_size_i and cwnd_i, and flight_size_i and ssthresh_i | between flight_size_i and cwnd_i, and flight_size_i and ssthresh_i | |||

where appropriate. | where appropriate. | |||

The total throughput of a multipath flow depends on the value of | The total throughput of a multipath flow depends on the value of | |||

alpha and the loss rates, maximum segment sizes and round trip times | alpha and the loss rates, maximum segment sizes and round trip times | |||

of its paths. Since we require that the total throughput is no worse | of its paths. Since we require that the total throughput is no worse | |||

than the throughput a single TCP would get on the best path, it is | than the throughput a single TCP would get on the best path, it is | |||

impossible to choose a-priori a single value of alpha that achieves | impossible to choose a-priori a single value of alpha that achieves | |||

the desired throughput in every ocasion. Hence, alpha must be | the desired throughput in every ocasion. Hence, alpha must be | |||

computed for each multipath flow, based on the observed properties of | computed based on the observed properties of the paths. | |||

the paths. | ||||

The formula to compute alpha is: | The formula to compute alpha is: | |||

cwnd_i | cwnd_i | |||

max -------- | max -------- | |||

i 2 | i 2 | |||

rtt_i | rtt_i | |||

alpha = tot_cwnd * ---------------- | alpha = tot_cwnd * ---------------- | |||

/ cwnd_i \ 2 | / cwnd_i \ 2 | |||

| sum ---------| | | sum ---------| | |||

skipping to change at page 8, line 9 | skipping to change at page 8, line 11 | |||

approximate the above calculations using integer arithmetic. | approximate the above calculations using integer arithmetic. | |||

Let alpha_scale be an integer. When computing alpha, use alpha_scale | Let alpha_scale be an integer. When computing alpha, use alpha_scale | |||

* tot_cwnd instead of tot_cwnd, and do all the operations in integer | * tot_cwnd instead of tot_cwnd, and do all the operations in integer | |||

arithmetic. | arithmetic. | |||

Then, scale down the increase per ack by alpha_scale. The algorithm | Then, scale down the increase per ack by alpha_scale. The algorithm | |||

is: | is: | |||

o For each ack received on subflow i, increase cwnd_i by min ( | o For each ack received on subflow i, increase cwnd_i by min ( | |||

alpha*bytes_acked*mss_i/tot_cwnd/alpha_scale , bytes_acked*mss_i/ | (alpha*bytes_acked*mss_i/tot_cwnd)/alpha_scale , | |||

cwnd_i ) | bytes_acked*mss_i/cwnd_i ) | |||

Alpha scale denotes the precision we want for computing alpha. | Alpha scale denotes the precision we want for computing alpha. | |||

Observe that the errors in computing the numerator or the denominator | Observe that the errors in computing the numerator or the denominator | |||

in the formula for alpha are quite small, as the cwnd in bytes is | in the formula for alpha are quite small, as the cwnd in bytes is | |||

typically much larger than the RTT (measured in ms). | typically much larger than the RTT (measured in ms). | |||

With these changes, all the operations can be done using integer | With these changes, all the operations can be done using integer | |||

arithmetic. We propose alpha_scale be a small power of two, to allow | arithmetic. We propose alpha_scale be a small power of two, to allow | |||

using faster shift operations instead of multiplication and division. | using faster shift operations instead of multiplication and division. | |||

Our experiments show that using alpha_scale=512 works well in a wide | Our experiments show that using alpha_scale=512 works well in a wide | |||

skipping to change at page 9, line 24 | skipping to change at page 9, line 26 | |||

4.1. Implementation Considerations when CWND is Expressed in Packets | 4.1. Implementation Considerations when CWND is Expressed in Packets | |||

When the congestion control algorithm maintains cwnd in packets | When the congestion control algorithm maintains cwnd in packets | |||

rather than bytes, the algorithms above must change to take into | rather than bytes, the algorithms above must change to take into | |||

account path mss. | account path mss. | |||

To compute the increase when an ack is received, the implementation | To compute the increase when an ack is received, the implementation | |||

for multipath congestion control is a simple extension of the TCP New | for multipath congestion control is a simple extension of the TCP New | |||

Reno code. In TCP New Reno cwnd_cnt is an additional state variable | Reno code. In TCP New Reno cwnd_cnt is an additional state variable | |||

that tracks the number of bytes acked since the last cwnd increment; | that tracks the number of segments acked since the last cwnd | |||

cwnd is incremented only when cwnd_cnt > cwnd; then cwnd_cnt is set | increment; cwnd is incremented only when cwnd_cnt > cwnd; then | |||

to 0. | cwnd_cnt is set to 0. | |||

In the multipath case, cwnd_cnt_i is maintained for each subflow as | In the multipath case, cwnd_cnt_i is maintained for each subflow as | |||

above, and cwnd_i is increased by 1 when cwnd_cnt_i > alpha_scale * | above, and cwnd_i is increased by 1 when cwnd_cnt_i > max(alpha_scale | |||

tot_cwnd / alpha. | * tot_cwnd / alpha, cwnd_i). | |||

When computing alpha for packet-based stacks, the errors in computing | When computing alpha for packet-based stacks, the errors in computing | |||

the terms in the denominator are larger (this is because cwnd is much | the terms in the denominator are larger (this is because cwnd is much | |||

smaller and rtt may be comparatively large). Let max be the index of | smaller and rtt may be comparatively large). Let max be the index of | |||

the subflow used in the numerator. To reduce errors, it is easiest | the subflow used in the numerator. To reduce errors, it is easiest | |||

to move rtt_max (once calculated) from the numerator to the | to move rtt_max (once calculated) from the numerator to the | |||

denominator, obtaining the equivalent formula below. | denominator, obtaining the equivalent formula below. | |||

cwnd_max | cwnd_max | |||

alpha = alpha_scale * tot_cwnd * ----------------------- | alpha = alpha_scale * tot_cwnd * ----------------------- | |||

End of changes. 12 change blocks. | ||||

27 lines changed or deleted | | 31 lines changed or added | ||

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |