draft-ietf-mptcp-congestion-07.txt | rfc6356.txt | |||
---|---|---|---|---|

Internet Engineering Task Force C. Raiciu | Internet Engineering Task Force (IETF) C. Raiciu | |||

Internet-Draft University Politehnica of | Request for Comments: 6356 Univ. Politehnica of Bucharest | |||

Intended status: Experimental Bucharest | Category: Experimental M. Handly | |||

Expires: January 30, 2012 M. Handley | ISSN: 2070-1721 D. Wischik | |||

D. Wischik | Univ. College London | |||

University College London | October 2011 | |||

July 29, 2011 | ||||

Coupled Congestion Control for Multipath Transport Protocols | Coupled Congestion Control for Multipath Transport Protocols | |||

draft-ietf-mptcp-congestion-07 | ||||

Abstract | Abstract | |||

Often endpoints are connected by multiple paths, but communications | Often endpoints are connected by multiple paths, but communications | |||

are usually restricted to a single path per connection. Resource | are usually restricted to a single path per connection. Resource | |||

usage within the network would be more efficient were it possible for | usage within the network would be more efficient were it possible for | |||

these multiple paths to be used concurrently. Multipath TCP is a | these multiple paths to be used concurrently. Multipath TCP is a | |||

proposal to achieve multipath transport in TCP. | proposal to achieve multipath transport in TCP. | |||

New congestion control algorithms are needed for multipath transport | New congestion control algorithms are needed for multipath transport | |||

protocols such as Multipath TCP, as single path algorithms have a | protocols such as Multipath TCP, as single path algorithms have a | |||

series of issues in the multipath context. One of the prominent | series of issues in the multipath context. One of the prominent | |||

problems is that running existing algorithms such as standard TCP | problems is that running existing algorithms such as standard TCP | |||

independently on each path would give the multipath flow more than | independently on each path would give the multipath flow more than | |||

its fair share at a bottleneck link traversed by more than one of its | its fair share at a bottleneck link traversed by more than one of its | |||

subflows. Further, it is desirable that a source with multiple paths | subflows. Further, it is desirable that a source with multiple paths | |||

available will transfer more traffic using the least congested of the | available will transfer more traffic using the least congested of the | |||

paths, achieving a property called resource pooling where a bundle of | paths, achieving a property called "resource pooling" where a bundle | |||

links effectively behaves like one shared link with bigger-capacity. | of links effectively behaves like one shared link with bigger | |||

This would increase the overall efficiency of the network and also | capacity. This would increase the overall efficiency of the network | |||

its robustness to failure. | and also its robustness to failure. | |||

This document presents a congestion control algorithm which couples | This document presents a congestion control algorithm that couples | |||

the congestion control algorithms running on different subflows by | the congestion control algorithms running on different subflows by | |||

linking their increase functions, and dynamically controls the | linking their increase functions, and dynamically controls the | |||

overall aggressiveness of the multipath flow. The result is a | overall aggressiveness of the multipath flow. The result is a | |||

practical algorithm that is fair to TCP at bottlenecks while moving | practical algorithm that is fair to TCP at bottlenecks while moving | |||

traffic away from congested links. | traffic away from congested links. | |||

Status of this Memo | Status of This Memo | |||

This Internet-Draft is submitted in full conformance with the | ||||

provisions of BCP 78 and BCP 79. | ||||

Internet-Drafts are working documents of the Internet Engineering | This document is not an Internet Standards Track specification; it is | |||

Task Force (IETF). Note that other groups may also distribute | published for examination, experimental implementation, and | |||

working documents as Internet-Drafts. The list of current Internet- | evaluation. | |||

Drafts is at http://datatracker.ietf.org/drafts/current/. | ||||

Internet-Drafts are draft documents valid for a maximum of six months | This document defines an Experimental Protocol for the Internet | |||

and may be updated, replaced, or obsoleted by other documents at any | community. This document is a product of the Internet Engineering | |||

time. It is inappropriate to use Internet-Drafts as reference | Task Force (IETF). It represents the consensus of the IETF | |||

material or to cite them other than as "work in progress." | community. It has received public review and has been approved for | |||

publication by the Internet Engineering Steering Group (IESG). Not | ||||

all documents approved by the IESG are a candidate for any level of | ||||

Internet Standard; see Section 2 of RFC 5741. | ||||

This Internet-Draft will expire on January 30, 2012. | Information about the current status of this document, any errata, | |||

and how to provide feedback on it may be obtained at | ||||

http://www.rfc-editor.org/info/rfc6356. | ||||

Copyright Notice | Copyright Notice | |||

Copyright (c) 2011 IETF Trust and the persons identified as the | Copyright (c) 2011 IETF Trust and the persons identified as the | |||

document authors. All rights reserved. | document authors. All rights reserved. | |||

This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||

Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||

(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||

publication of this document. Please review these documents | publication of this document. Please review these documents | |||

carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||

to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||

include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||

the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||

described in the Simplified BSD License. | described in the Simplified BSD License. | |||

Table of Contents | Table of Contents | |||

1. Requirements Language . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction ....................................................3 | |||

2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Requirements Language ...........................................5 | |||

3. Coupled Congestion Control Algorithm . . . . . . . . . . . . . 6 | 3. Coupled Congestion Control Algorithm ............................5 | |||

4. Implementation Considerations . . . . . . . . . . . . . . . . 7 | 4. Implementation Considerations ...................................7 | |||

4.1. Computing alpha in Practice . . . . . . . . . . . . . . . 8 | 4.1. Computing "alpha" in Practice ..............................7 | |||

4.2. Implementation Considerations when CWND is Expressed | 4.2. Implementation Considerations when CWND is | |||

in Packets . . . . . . . . . . . . . . . . . . . . . . . . 9 | Expressed in Packets .......................................8 | |||

5. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 5. Discussion ......................................................9 | |||

6. Security Considerations . . . . . . . . . . . . . . . . . . . 11 | 6. Security Considerations ........................................10 | |||

7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 | 7. Acknowledgements ...............................................11 | |||

8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 | 8. References .....................................................11 | |||

9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12 | 8.1. Normative References ......................................11 | |||

9.1. Normative References . . . . . . . . . . . . . . . . . . . 12 | 8.2. Informative References ....................................11 | |||

9.2. Informative References . . . . . . . . . . . . . . . . . . 12 | ||||

Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13 | ||||

1. Requirements Language | ||||

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||

"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | ||||

document are to be interpreted as described in RFC 2119 [RFC2119] . | ||||

2. Introduction | 1. Introduction | |||

Multipath TCP (MPTCP, [I-D.ietf-mptcp-multiaddressed]) is a set of | Multipath TCP (MPTCP, [MPTCP-MULTIADDRESSED]) is a set of extensions | |||

extensions to regular TCP [RFC0793] that allows one TCP connection to | to regular TCP [RFC0793] that allows one TCP connection to be spread | |||

be spread across multiple paths. MPTCP distributes load through the | across multiple paths. MPTCP distributes load through the creation | |||

creation of separate "subflows" across potentially disjoint paths. | of separate "subflows" across potentially disjoint paths. | |||

How should congestion control be performed for multipath TCP? First, | How should congestion control be performed for multipath TCP? First, | |||

each subflow must have its own congestion control state (i.e. cwnd) | each subflow must have its own congestion control state (i.e., cwnd) | |||

so that capacity on that path is matched by offered load. The | so that capacity on that path is matched by offered load. The | |||

simplest way to achieve this goal is to simply run standard TCP | simplest way to achieve this goal is to simply run standard TCP | |||

congestion control on each subflow. However this solution is | congestion control on each subflow. However, this solution is | |||

unsatisfactory as it gives the multipath flow an unfair share when | unsatisfactory as it gives the multipath flow an unfair share when | |||

the paths taken by its different subflows share a common bottleneck. | the paths taken by its different subflows share a common bottleneck. | |||

Bottleneck fairness is just one requirement multipath congestion | Bottleneck fairness is just one requirement multipath congestion | |||

control should meet. The following three goals capture the desirable | control should meet. The following three goals capture the desirable | |||

properties of a practical multipath congestion control algorithm: | properties of a practical multipath congestion control algorithm: | |||

o Goal 1 (Improve Throughput) A multipath flow should perform at | o Goal 1 (Improve Throughput) A multipath flow should perform at | |||

least as well as a single path flow would on the best of the paths | least as well as a single path flow would on the best of the paths | |||

available to it. | available to it. | |||

o Goal 2 (Do no harm) A multipath flow should not take up more | o Goal 2 (Do no harm) A multipath flow should not take up more | |||

capacity from any of the resources shared by its different paths, | capacity from any of the resources shared by its different paths | |||

than if it was a single flow using only one of these paths. This | than if it were a single flow using only one of these paths. This | |||

guarantees it will not unduly harm other flows. | guarantees it will not unduly harm other flows. | |||

o Goal 3 (Balance congestion) A multipath flow should move as much | o Goal 3 (Balance congestion) A multipath flow should move as much | |||

traffic as possible off its most congested paths, subject to | traffic as possible off its most congested paths, subject to | |||

meeting the first two goals. | meeting the first two goals. | |||

Goals 1 and 2 together ensure fairness at the bottleneck. Goal 3 | Goals 1 and 2 together ensure fairness at the bottleneck. Goal 3 | |||

captures the concept of resource pooling [WISCHIK]: if each multipath | captures the concept of resource pooling [WISCHIK]: if each multipath | |||

flow sends more data through its least congested path, the traffic in | flow sends more data through its least congested path, the traffic in | |||

the network will move away from congested areas. This improves | the network will move away from congested areas. This improves | |||

skipping to change at page 5, line 14 | skipping to change at page 4, line 11 | |||

of the subflows, and uses unmodified TCP behavior in case of a drop. | of the subflows, and uses unmodified TCP behavior in case of a drop. | |||

The algorithm relies on the traditional TCP mechanisms to detect | The algorithm relies on the traditional TCP mechanisms to detect | |||

drops, to retransmit data, etc. | drops, to retransmit data, etc. | |||

Detecting shared bottlenecks reliably is quite difficult, but is just | Detecting shared bottlenecks reliably is quite difficult, but is just | |||

one part of a bigger question. This bigger question is how much | one part of a bigger question. This bigger question is how much | |||

bandwidth a multipath user should use in total, even if there is no | bandwidth a multipath user should use in total, even if there is no | |||

shared bottleneck. | shared bottleneck. | |||

The congestion controller aims to set the multipath flow's aggregate | The congestion controller aims to set the multipath flow's aggregate | |||

bandwidth to be the same as a regular TCP flow would get on the best | bandwidth to be the same as that of a regular TCP flow would get on | |||

path available to the multipath flow. To estimate the bandwidth of a | the best path available to the multipath flow. To estimate the | |||

regular TCP flow, the multipath flow estimates loss rates and round | bandwidth of a regular TCP flow, the multipath flow estimates loss | |||

trip times and computes the target rate. Then it adjusts the overall | rates and round-trip times (RTTs) and computes the target rate. | |||

aggresiveness (parameter alpha) to achieve the desired rate. | Then, it adjusts the overall aggressiveness (parameter alpha) to | |||

achieve the desired rate. | ||||

While the mechanism above applies always, its effect depends on | While the mechanism above applies always, its effect depends on | |||

whether the multipath TCP flow influences or does not influence the | whether the multipath TCP flow influences or does not influence the | |||

link loss rates (low vs. high statistical multiplexing). If MPTCP | link loss rates (low versus high statistical multiplexing). If MPTCP | |||

does not influence link loss rates, MPTCP will get the same | does not influence link loss rates, MPTCP will get the same | |||

throughput as TCP on the best path. In cases with low statistical | throughput as TCP on the best path. In cases with low statistical | |||

multiplexing, where the multipath flow influences the loss rates on | multiplexing, where the multipath flow influences the loss rates on | |||

the path, the multipath throughput will be strictly higher than a | the path, the multipath throughput will be strictly higher than that | |||

single TCP would get on any of the paths. In particular, if using | a single TCP would get on any of the paths. In particular, if using | |||

two idle paths, multipath throughput will be sum of the two paths' | two idle paths, multipath throughput will be sum of the two paths' | |||

throughput. | throughput. | |||

This algorithm ensures bottleneck fairness and fairness in the | This algorithm ensures bottleneck fairness and fairness in the | |||

broader, network sense. We acknowledge that current TCP fairness | broader, network sense. We acknowledge that current TCP fairness | |||

criteria are far from ideal, but a multipath TCP needs to be | criteria are far from ideal, but a multipath TCP needs to be | |||

deployable in the current Internet. If needed, new fairness criteria | deployable in the current Internet. If needed, new fairness criteria | |||

can be implemented by the same algorithm we propose by appropriately | can be implemented by the same algorithm we propose by appropriately | |||

scaling the overall aggressiveness. | scaling the overall aggressiveness. | |||

It is intended that the algorithm presented here can be applied to | It is intended that the algorithm presented here can be applied to | |||

other multipath transport protocols, such as alternative multipath | other multipath transport protocols, such as alternative multipath | |||

extensions to TCP, or indeed any other congestion-aware transport | extensions to TCP, or indeed any other congestion-aware transport | |||

protocols. However, for the purposes of example this document will, | protocols. However, for the purposes of example, this document will, | |||

where appropriate, refer to the MPTCP protocol. | where appropriate, refer to the MPTCP. | |||

The design decisions and evaluation of the congestion control | The design decisions and evaluation of the congestion control | |||

algorithm are published in [NSDI]. | algorithm are published in [NSDI]. | |||

The algorithm presented here only extends standard TCP congestion | The algorithm presented here only extends standard TCP congestion | |||

control for multipath operation. It is foreseeable that other | control for multipath operation. It is foreseeable that other | |||

congestion controllers will be implemented for multipath transport to | congestion controllers will be implemented for multipath transport to | |||

achieve the bandwidth-scaling properties of the newer congestion | achieve the bandwidth-scaling properties of the newer congestion | |||

control algorithms for regular TCP (such as Compound TCP and Cubic). | control algorithms for regular TCP (such as Compound TCP and Cubic). | |||

2. Requirements Language | ||||

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||

"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | ||||

document are to be interpreted as described in RFC 2119 [RFC2119] . | ||||

3. Coupled Congestion Control Algorithm | 3. Coupled Congestion Control Algorithm | |||

The algorithm we present only applies to the increase phase of the | The algorithm we present only applies to the increase phase of the | |||

congestion avoidance state specifying how the window inflates upon | congestion avoidance state specifying how the window inflates upon | |||

receiving an ack. The slow start, fast retransmit, and fast recovery | receiving an ACK. The slow start, fast retransmit, and fast recovery | |||

algorithms, as well as the multiplicative decrease of the congestion | algorithms, as well as the multiplicative decrease of the congestion | |||

avoidance state are the same as in standard TCP[RFC5681]. | avoidance state are the same as in standard TCP [RFC5681]. | |||

Let cwnd_i be the congestion window on the subflow i. Let tot_cwnd | Let cwnd_i be the congestion window on the subflow i. Let cwnd_total | |||

be the sum of the congestion windows of all subflows in the | be the sum of the congestion windows of all subflows in the | |||

connection. Let p_i, rtt_i and mss_i be the loss rate, round trip | connection. Let p_i, rtt_i, and MSS_i be the loss rate, round-trip | |||

time (i.e. smoothed round trip time estimate used by TCP) and maximum | time (i.e., smoothed round-trip time estimate used by TCP), and | |||

segment size on subflow i. | maximum segment size on subflow i. | |||

We assume throughout this document that the congestion window is | We assume throughout this document that the congestion window is | |||

maintained in bytes, unless otherwise specified. We briefly describe | maintained in bytes, unless otherwise specified. We briefly describe | |||

the algorithm for packet-based implementations of cwnd in section | the algorithm for packet-based implementations of cwnd in section | |||

Section 4.2. | Section 4.2. | |||

Our proposed "Linked Increases" algorithm MUST: | Our proposed "Linked Increases" algorithm MUST: | |||

o For each ack received on subflow i, increase cwnd_i by | o For each ACK received on subflow i, increase cwnd_i by | |||

alpha * bytes_acked * mss_i bytes_acked * mss_i | alpha * bytes_acked * MSS_i bytes_acked * MSS_i | |||

min ( --------------------------- , ------------------- ) (1) | min ( --------------------------- , ------------------- ) (1) | |||

tot_cwnd cwnd_i | cwnd_total cwnd_i | |||

The increase formula (1) takes the minimum between the computed | The increase formula (1) takes the minimum between the computed | |||

increase for the multipath subflow (first argument to min), and the | increase for the multipath subflow (first argument to min), and the | |||

increase TCP would get in the same scenario (the second argument). | increase TCP would get in the same scenario (the second argument). | |||

In this way, we ensure that any multipath subflow cannot be more | In this way, we ensure that any multipath subflow cannot be more | |||

aggressive than a TCP flow in the same circumstances, hence achieving | aggressive than a TCP flow in the same circumstances, hence achieving | |||

goal 2 (do no harm). | Goal 2 (do no harm). | |||

"alpha" is a parameter of the algorithm that describes the | "alpha" is a parameter of the algorithm that describes the | |||

aggresiveness of the multipath flow. To meet Goal 1 (improve | aggressiveness of the multipath flow. To meet Goal 1 (improve | |||

throughput), the value of alpha is chosen such that the aggregate | throughput), the value of alpha is chosen such that the aggregate | |||

throughput of the multipath flow is equal to the throughput a TCP | throughput of the multipath flow is equal to the throughput a TCP | |||

flow would get if it ran on the best path. | flow would get if it ran on the best path. | |||

To get an intuition of what the algorithm is trying to do, let's take | To get an idea of what the algorithm is trying to do, let's take the | |||

the case where all the subflows have the same round trip time and | case where all the subflows have the same round-trip time and Maximum | |||

MSS. In this case the algorithm will grow the total window by | Segment Size (MSS). In this case, the algorithm will grow the total | |||

approximately alpha*MSS per RTT. This increase is distributed to the | window by approximately alpha*MSS per RTT. This increase is | |||

individual flows according to their instantaneous window size. | distributed to the individual flows according to their instantaneous | |||

Subflow i will increase by alpha*cwnd_i/tot_cwnd segments per RTT. | window size. Subflow i will increase by alpha*cwnd_i/cwnd_total | |||

segments per RTT. | ||||

Note that, as in standard TCP, when tot_cwnd is large the increase | Note that, as in standard TCP, when cwnd_total is large the increase | |||

may be 0. In this case the increase MUST be set to 1. We discuss | may be 0. In this case, the increase MUST be set to 1. We discuss | |||

how to implement this formula in practice in the next section. | how to implement this formula in practice in the next section. | |||

We assume implementations use an approach similar to appropriate byte | We assume implementations use an approach similar to appropriate byte | |||

counting (ABC, [RFC3465]), where the bytes_acked variable records the | counting (ABC, [RFC3465]), where the bytes_acked variable records the | |||

number of bytes newly acknowledged. If this is not the case, | number of bytes newly acknowledged. If this is not the case, | |||

bytes_acked SHOULD be set to mss_i. | bytes_acked SHOULD be set to MSS_i. | |||

To compute tot_cwnd, it is an easy mistake to sum up cwnd_i across | To compute cwnd_total, it is an easy mistake to sum up cwnd_i across | |||

all subflows: when a flow is in fast retransmit, its cwnd is | all subflows: when a flow is in fast retransmit, its cwnd is | |||

typically inflated and no longer represents the real congestion | typically inflated and no longer represents the real congestion | |||

window. The correct behavior is to use the ssthresh value for flows | window. The correct behavior is to use the ssthresh (slow start | |||

in fast retransmit when computing tot_cwnd. To cater for connections | threshold) value for flows in fast retransmit when computing | |||

that are app limited, the computation should consider the minimum | cwnd_total. To cater to connections that are app limited, the | |||

between flight_size_i and cwnd_i, and flight_size_i and ssthresh_i | computation should consider the minimum between flight_size_i and | |||

where appropriate. | cwnd_i, and flight_size_i and ssthresh_i, where appropriate. | |||

The total throughput of a multipath flow depends on the value of | The total throughput of a multipath flow depends on the value of | |||

alpha and the loss rates, maximum segment sizes and round trip times | alpha and the loss rates, maximum segment sizes, and round-trip times | |||

of its paths. Since we require that the total throughput is no worse | of its paths. Since we require that the total throughput is no worse | |||

than the throughput a single TCP would get on the best path, it is | than the throughput a single TCP would get on the best path, it is | |||

impossible to choose a-priori a single value of alpha that achieves | impossible to choose, a priori, a single value of alpha that achieves | |||

the desired throughput in every occasion. Hence, alpha must be | the desired throughput in every occasion. Hence, alpha must be | |||

computed based on the observed properties of the paths. | computed based on the observed properties of the paths. | |||

The formula to compute alpha is: | The formula to compute alpha is: | |||

cwnd_i | MAX (cwnd_i/rtt_i^2) | |||

max -------- | alpha = cwnd_total * ------------------------- (2) | |||

i 2 | (SUM (cwnd_i/rtt_i))^2 | |||

rtt_i | ||||

alpha = tot_cwnd * ---------------- (2) | Note: | |||

/ cwnd_i \ 2 | ||||

| sum ---------| | MAX (x_i) means the maximum value for any possible value of i. | |||

\ i rtt_i / | ||||

SUM (x_i) means the summation for all possible values of i. | ||||

The formula (2) is derived by equalizing the rate of the multipath | The formula (2) is derived by equalizing the rate of the multipath | |||

flow with the rate of a TCP running on the best path, and solving for | flow with the rate of a TCP running on the best path, and solving for | |||

alpha. | alpha. | |||

4. Implementation Considerations | 4. Implementation Considerations | |||

Equation (2) implies that alpha is a floating point value. This | Equation (2) implies that alpha is a floating point value. This | |||

would require performing costly floating point operations whenever an | would require performing costly floating point operations whenever an | |||

ACK is received, Further, in many kernels floating point operations | ACK is received. Further, in many kernels, floating point operations | |||

are disabled. There is an easy way to approximate the above | are disabled. There is an easy way to approximate the above | |||

calculations using integer arithmetic. | calculations using integer arithmetic. | |||

4.1. Computing alpha in Practice | 4.1. Computing "alpha" in Practice | |||

Let alpha_scale be an integer. When computing alpha, use alpha_scale | Let alpha_scale be an integer. When computing alpha, use alpha_scale | |||

* tot_cwnd instead of tot_cwnd, and do all the operations in integer | * cwnd_total instead of cwnd_total and do all the operations in | |||

arithmetic. | integer arithmetic. | |||

Then, scale down the increase per ack by alpha_scale. The resulting | Then, scale down the increase per ACK by alpha_scale. The resulting | |||

algorithm is a simple change from Equation (1): | algorithm is a simple change from Equation (1): | |||

o For each ack received on subflow i, increase cwnd_i by | o For each ACK received on subflow i, increase cwnd_i by: | |||

alpha * bytes_acked * mss_i bytes_acked * mss_i | alpha * bytes_acked * MSS_i bytes_acked * MSS_i | |||

min ( --------------------------- , ------------------- ) (3) | min ( --------------------------- , ------------------- ) (3) | |||

alpha_scale * tot_cwnd cwnd_i | alpha_scale * cwnd_total cwnd_i | |||

The alpha_scale parameter denotes the precision we want for computing | The alpha_scale parameter denotes the precision we want for computing | |||

alpha. Observe that the errors in computing the numerator or the | alpha. Observe that the errors in computing the numerator or the | |||

denominator in the formula for alpha are quite small, as the cwnd in | denominator in the formula for alpha are quite small, as the cwnd in | |||

bytes is typically much larger than the RTT (measured in ms). | bytes is typically much larger than the RTT (measured in ms). | |||

With these changes, all the operations can be done using integer | With these changes, all the operations can be done using integer | |||

arithmetic. We propose alpha_scale be a small power of two, to allow | arithmetic. We propose alpha_scale be a small power of two, to allow | |||

using faster shift operations instead of multiplication and division. | using faster shift operations instead of multiplication and division. | |||

Our experiments show that using alpha_scale=512 works well in a wide | Our experiments show that using alpha_scale=512 works well in a wide | |||

range of scenarios. Increasing alpha_scale increases precision, but | range of scenarios. Increasing alpha_scale increases precision, but | |||

also increases the risk of overflow when computing alpha. Using | also increases the risk of overflow when computing alpha. Using 64- | |||

64bit operations would solve this issue. Another option is to | bit operations would solve this issue. Another option is to | |||

dynamically adjust alpha_scale when computing alpha; in this way we | dynamically adjust alpha_scale when computing alpha; in this way, we | |||

avoid overflow and obtain maximum precision. | avoid overflow and obtain maximum precision. | |||

It is possible to implement the algorithm by calculating tot_cwnd on | It is possible to implement the algorithm by calculating cwnd_total | |||

each ack, however this would be costly especially when the number of | on each ack; however, this would be costly especially when the number | |||

subflows is large. To avoid this overhead the implementation MAY | of subflows is large. To avoid this overhead, the implementation MAY | |||

choose to maintain a new per connection state variable called | choose to maintain a new per-connection state variable called | |||

tot_cwnd. If it does so, the implementation will update tot_cwnd | "cwnd_total". If it does so, the implementation will update the | |||

value whenever the individual subflows' windows are updated. | cwnd_total value whenever the individual subflow's windows are | |||

Updating only requires one more addition or subtraction operation | updated. Updating only requires one more addition or subtraction | |||

compared to the regular, per subflow congestion control code, so its | operation compared to the regular, per-subflow congestion control | |||

performance impact should be minimal. | code, so its performance impact should be minimal. | |||

Computing alpha per ack is also costly. We propose alpha to be a per | Computing alpha per ACK is also costly. We propose alpha be a per- | |||

connection variable, computed whenever there is a drop and once per | connection variable, computed whenever there is a drop and once per | |||

RTT otherwise. More specifically, let cwnd_new be the new value of | RTT otherwise. More specifically, let cwnd_new be the new value of | |||

the congestion window after it is inflated or after a drop. Update | the congestion window after it is inflated or after a drop. Update | |||

alpha only if the quotient of cwnd_i/mss_i differs from the quotient | alpha only if the quotient of cwnd_i/MSS_i differs from the quotient | |||

of cwnd_new_i/mss_i. | of cwnd_new_i/MSS_i. | |||

In certain cases with small RTTs, computing alpha can still be | In certain cases with small RTTs, computing alpha can still be | |||

expensive. We observe that if RTTs were constant, it is sufficient | expensive. We observe that if RTTs were constant, it is sufficient | |||

to compute alpha once per drop, as alpha does not change between | to compute alpha once per drop, as alpha does not change between | |||

drops (the insight here is that cwnd_i/cwnd_j = constant as long as | drops (the insight here is that cwnd_i/cwnd_j = constant as long as | |||

both windows increase). Experimental results show that even if round | both windows increase). Experimental results show that even if | |||

trip times are not constant, using average round trip time per | round-trip times are not constant, using average round-trip time per | |||

sawtooth instead of instantaneous round trip time (i.e. TCP's | sawtooth instead of instantaneous round-trip time (i.e., TCP's | |||

smoothed RTT estimator) gives good precision for computing alpha. | smoothed RTT estimator) gives good precision for computing alpha. | |||

Hence, it is possible to compute alpha only once per drop using a | Hence, it is possible to compute alpha only once per drop using a | |||

modified version of equation (2) where rtt_i is replaced with | modified version of equation (2) where rtt_i is replaced with | |||

rtt_avg_i. | rtt_avg_i. | |||

If using average round trip time, rtt_avg_i will be computed by | If using average round-trip time, rtt_avg_i will be computed by | |||

sampling the rtt_i whenever the window can accommodate one more | sampling the rtt_i whenever the window can accommodate one more | |||

packet, i.e. when cwnd / mss < (cwnd+increase)/mss. The samples are | packet, i.e., when cwnd / MSS < (cwnd+increase)/MSS. The samples are | |||

averaged once per sawtooth into rtt_avg_i. This sampling ensures | averaged once per sawtooth into rtt_avg_i. This sampling ensures | |||

that there is no sampling bias for larger windows. | that there is no sampling bias for larger windows. | |||

Given tot_cwnd and alpha, the congestion control algorithm is run for | Given cwnd_total and alpha, the congestion control algorithm is run | |||

each subflow independently, with similar complexity to the standard | for each subflow independently, with similar complexity to the | |||

TCP increase code [RFC5681]. | standard TCP increase code [RFC5681]. | |||

4.2. Implementation Considerations when CWND is Expressed in Packets | 4.2. Implementation Considerations when CWND is Expressed in Packets | |||

When the congestion control algorithm maintains cwnd in packets | When the congestion control algorithm maintains cwnd in packets | |||

rather than bytes, the algorithms above must change to take into | rather than bytes, the algorithms above must change to take into | |||

account path mss. | account path MSS. | |||

To compute the increase when an ack is received, the implementation | To compute the increase when an ACK is received, the implementation | |||

for multipath congestion control is a simple extension of the | for multipath congestion control is a simple extension of the | |||

standard TCP code. In standard TCP cwnd_cnt is an additional state | standard TCP code. In standard, TCP cwnd_cnt is an additional state | |||

variable that tracks the number of segments acked since the last cwnd | variable that tracks the number of segments acked since the last cwnd | |||

increment; cwnd is incremented only when cwnd_cnt > cwnd; then | increment; cwnd is incremented only when cwnd_cnt > cwnd; then, | |||

cwnd_cnt is set to 0. | cwnd_cnt is set to 0. | |||

In the multipath case, cwnd_cnt_i is maintained for each subflow as | In the multipath case, cwnd_cnt_i is maintained for each subflow as | |||

above, and cwnd_i is increased by 1 when cwnd_cnt_i > max(alpha_scale | above, and cwnd_i is increased by 1 when cwnd_cnt_i > max(alpha_scale | |||

* tot_cwnd / alpha, cwnd_i). | * cwnd_total / alpha, cwnd_i). | |||

When computing alpha for packet-based stacks, the errors in computing | When computing alpha for packet-based stacks, the errors in computing | |||

the terms in the denominator are larger (this is because cwnd is much | the terms in the denominator are larger (this is because cwnd is much | |||

smaller and rtt may be comparatively large). Let max be the index of | smaller and rtt may be comparatively large). Let max be the index of | |||

the subflow used in the numerator. To reduce errors, it is easiest | the subflow used in the numerator. To reduce errors, it is easiest | |||

to move rtt_max (once calculated) from the numerator to the | to move rtt_max (once calculated) from the numerator to the | |||

denominator, changing equation (2) to obtain the equivalent formula | denominator, changing equation (2) to obtain the equivalent formula | |||

below. | below. | |||

(4) | ||||

cwnd_max | cwnd_max | |||

alpha = alpha_scale * tot_cwnd * ----------------------- (4) | alpha = alpha_scale * cwnd_total * ------------------------------------ | |||

/ rtt_max * cwnd_i \ 2 | (SUM ((rtt_max * cwnd_i) / rtt_i))^2 | |||

| sum -----------------| | ||||

\ i rtt_i / | ||||

Note that the calculation of alpha does not take into account path | Note that the calculation of alpha does not take into account path | |||

mss, and is the same for stacks that keep cwnd in bytes or packets. | MSS and is the same for stacks that keep cwnd in bytes or packets. | |||

With this formula, the algorithm for computing alpha will match the | With this formula, the algorithm for computing alpha will match the | |||

rate of TCP on the best path in B/s for byte-oriented stacks, and in | rate of TCP on the best path in B/s for byte-oriented stacks, and in | |||

packets/s in packet-based stacks. In practice, mss rarely changes | packets/s in packet-based stacks. In practice, MSS rarely changes | |||

between paths so this shouldn't be a problem. | between paths so this shouldn't be a problem. | |||

However, it is simple to derive formulae allowing packet-based stacks | However, it is simple to derive formulae allowing packet-based stacks | |||

to achieve byte rate fairness (and viceversa) if needed. In | to achieve byte rate fairness (and vice versa) if needed. In | |||

particular, for packet-based stacks wanting byte-rate fairness, | particular, for packet-based stacks wanting byte-rate fairness, | |||

equation (4) above changes as follows: cwnd_max is replaced by | equation (4) above changes as follows: cwnd_max is replaced by | |||

cwnd_max * mss_max * mss_max, while cwnd_i is replaced with cwnd_i * | cwnd_max * MSS_max * MSS_max, while cwnd_i is replaced with cwnd_i * | |||

mss_i. | MSS_i. | |||

5. Discussion | 5. Discussion | |||

The algorithm we've presented fully achieves Goals 1 and 2, but does | The algorithm we've presented fully achieves Goals 1 and 2, but does | |||

not achieve full resource pooling (Goal 3). Resource pooling | not achieve full resource pooling (Goal 3). Resource pooling | |||

requires that no traffic should be transferred on links with higher | requires that no traffic should be transferred on links with higher | |||

loss rates. To achieve perfect resource pooling, one must couple | loss rates. To achieve perfect resource pooling, one must couple | |||

both increase and decrease of congestion windows across subflows, as | both increase and decrease of congestion windows across subflows, as | |||

in [KELLY]. | in [KELLY]. | |||

There are a few problems with such a fully-coupled controller. | There are a few problems with such a fully coupled controller. | |||

First, it will probe insufficiently paths with high loss rates, and | First, it will insufficiently probe paths with high loss rates and | |||

will fail to detect free capacity when it becomes available. Second, | will fail to detect free capacity when it becomes available. Second, | |||

such controllers tend to exhibit "flappiness": when the paths have | such controllers tend to exhibit "flappiness": when the paths have | |||

similar levels of congestion, the congestion controller will tend to | similar levels of congestion, the congestion controller will tend to | |||

allocate all the window to one random subflow, and allocate zero | allocate all the window to one random subflow and allocate zero | |||

window to the other subflows. The controller will perform random | window to the other subflows. The controller will perform random | |||

flips between these stable points. This doesn't seem desirable in | flips between these stable points. This doesn't seem desirable in | |||

general, and is particularly bad when the achieved rates depend on | general, and is particularly bad when the achieved rates depend on | |||

the RTT (as in the current Internet): in such a case, the resulting | the RTT (as in the current Internet): in such a case, the resulting | |||

rate with fluctuate unpredictably depending on which state the | rate with fluctuate unpredictably depending on which state the | |||

controller is in, hence violating Goal 1. | controller is in, hence violating Goal 1. | |||

By only coupling increases our proposal probes high-loss paths, | By only coupling increases our proposal probes high loss paths, | |||

detecting free capacity quicker. Our proposal does not suffer from | detecting free capacity quicker. Our proposal does not suffer from | |||

flappiness but also achieves less resource pooling. The algorithm | flappiness but also achieves less resource pooling. The algorithm | |||

will allocate window to the subflows such that p_i * cwnd_i = | will allocate window to the subflows such that p_i * cwnd_i = | |||

constant, for all i. Thus, when the loss rates of the subflows are | constant, for all i. Thus, when the loss rates of the subflows are | |||

equal, each subflow will get an equal window, removing flappiness. | equal, each subflow will get an equal window, removing flappiness. | |||

When the loss rates differ, progressively more window will be | When the loss rates differ, progressively more windows will be | |||

allocated to the flow with the lower loss rate. In contrast, perfect | allocated to the flow with the lower loss rate. In contrast, perfect | |||

resource pooling requires that all the window should be allocated on | resource pooling requires that all the window should be allocated on | |||

the path with the lowest loss rate. Further details can be found in | the path with the lowest loss rate. Further details can be found in | |||

[NSDI]. | [NSDI]. | |||

6. Security Considerations | 6. Security Considerations | |||

One security concern relates to what we call the traffic-shifting | One security concern relates to what we call the traffic-shifting | |||

attack: on-path attackers can drop packets belonging to a multipath | attack: on-path attackers can drop packets belonging to a multipath | |||

subflow, which in turn makes the path seem congested and will force | subflow, which, in turn, makes the path seem congested and will force | |||

the sender's congestion controller to avoid that path and push more | the sender's congestion controller to avoid that path and push more | |||

data over alternate subflows. | data over alternate subflows. | |||

The attacker's goal is to create congestion on the corresponding | The attacker's goal is to create congestion on the corresponding | |||

alternative paths. This behaviour is entirely feasible, but will | alternative paths. This behavior is entirely feasible but will only | |||

only have minor effects: by design, the coupled congestion controller | have minor effects: by design, the coupled congestion controller is | |||

is less (or similarly) aggressive on any of its paths than a single | less (or similarly) aggressive on any of its paths than a single TCP | |||

TCP flow. Thus, the biggest effect this attack can have is to make a | flow. Thus, the biggest effect this attack can have is to make a | |||

multipath subflow be as aggressive as a single TCP flow. | multipath subflow be as aggressive as a single TCP flow. | |||

Another effect of the traffic-shifting attack is that the new path | Another effect of the traffic-shifting attack is that the new path | |||

can monitor all the traffic, whereas before it could only see a | can monitor all the traffic, whereas before it could only see a | |||

subset of traffic. We believe that if privacy is needed, splitting | subset of traffic. We believe that if privacy is needed, splitting | |||

traffic across multiple paths with MPTCP is not the right solution in | traffic across multiple paths with MPTCP is not the right solution in | |||

the first place; end-to-end encryption should be used instead. | the first place; end-to-end encryption should be used instead. | |||

Besides the traffic-shifting attack mentioned above, the coupled | Besides the traffic-shifting attack mentioned above, the coupled | |||

congestion control algorithm defined in this draft adds no other | congestion control algorithm defined in this document adds no other | |||

security considerations to those found in | security considerations to those found in [MPTCP-MULTIADDRESSED] and | |||

[I-D.ietf-mptcp-multiaddressed] and [RFC6181]. Detailed security | [RFC6181]. Detailed security analysis for the Multipath TCP protocol | |||

analysis for the Multipath TCP protocol itself is included in | itself is included in [MPTCP-MULTIADDRESSED] and [RFC6181]. | |||

[I-D.ietf-mptcp-multiaddressed] and [RFC6181]. | ||||

7. Acknowledgements | 7. Acknowledgements | |||

We thank Christoph Paasch for his suggestions for computing alpha in | We thank Christoph Paasch for his suggestions for computing alpha in | |||

packet-based stacks. The authors are supported by Trilogy | packet-based stacks. The authors are supported by Trilogy | |||

(http://www.trilogy-project.org), a research project (ICT-216372) | (http://www.trilogy-project.org), a research project (ICT-216372) | |||

partially funded by the European Community under its Seventh | partially funded by the European Community under its Seventh | |||

Framework Program. The views expressed here are those of the | Framework Program. The views expressed here are those of the | |||

author(s) only. The European Commission is not liable for any use | author(s) only. The European Commission is not liable for any use | |||

that may be made of the information in this document. | that may be made of the information in this document. | |||

8. IANA Considerations | 8. References | |||

This document does not require any action from IANA. | ||||

9. References | ||||

9.1. Normative References | 8.1. Normative References | |||

[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, | [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, | |||

RFC 793, September 1981. | RFC 793, September 1981. | |||

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||

Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||

[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||

Control", RFC 5681, September 2009. | Control", RFC 5681, September 2009. | |||

9.2. Informative References | 8.2. Informative References | |||

[I-D.ietf-mptcp-multiaddressed] | ||||

Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, | ||||

"TCP Extensions for Multipath Operation with Multiple | ||||

Addresses", draft-ietf-mptcp-multiaddressed-04 (work in | ||||

progress), July 2011. | ||||

[KELLY] Kelly, F. and T. Voice, "Stability of end-to-end | [KELLY] Kelly, F. and T. Voice, "Stability of end-to-end | |||

algorithms for joint routing and rate control", ACM | algorithms for joint routing and rate control", ACM | |||

SIGCOMM CCR vol. 35 num. 2, pp. 5-12, 2005, | SIGCOMM CCR vol. 35 num. 2, pp. 5-12, 2005, | |||

<http://portal.acm.org/citation.cfm?id=1064415>. | <http://portal.acm.org/citation.cfm?id=1064415>. | |||

[MPTCP-MULTIADDRESSED] | ||||

Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, | ||||

"TCP Extensions for Multipath Operation with Multiple | ||||

Addresses", Work in Progress, July 2011. | ||||

[NSDI] Wischik, D., Raiciu, C., Greenhalgh, A., and M. Handley, | [NSDI] Wischik, D., Raiciu, C., Greenhalgh, A., and M. Handley, | |||

"Design, Implementation and Evaluation of Congestion | "Design, Implementation and Evaluation of Congestion | |||

Control for Multipath TCP", Usenix NSDI , March 2011, <htt | Control for Multipath TCP", Usenix NSDI , March 2011, <htt | |||

p://www.cs.ucl.ac.uk/staff/c.raiciu/files/mptcp-nsdi.pdf>. | p://www.cs.ucl.ac.uk/staff/c.raiciu/files/mptcp-nsdi.pdf>. | |||

[RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | |||

Counting (ABC)", RFC 3465, February 2003. | Counting (ABC)", RFC 3465, February 2003. | |||

[RFC6181] Bagnulo, M., "Threat Analysis for TCP Extensions for | [RFC6181] Bagnulo, M., "Threat Analysis for TCP Extensions for | |||

Multipath Operation with Multiple Addresses", RFC 6181, | Multipath Operation with Multiple Addresses", RFC 6181, | |||

skipping to change at page 13, line 13 | skipping to change at page 12, line 18 | |||

<http://ccr.sigcomm.org/online/files/p47-handleyA4.pdf>. | <http://ccr.sigcomm.org/online/files/p47-handleyA4.pdf>. | |||

Authors' Addresses | Authors' Addresses | |||

Costin Raiciu | Costin Raiciu | |||

University Politehnica of Bucharest | University Politehnica of Bucharest | |||

Splaiul Independentei 313 | Splaiul Independentei 313 | |||

Bucharest | Bucharest | |||

Romania | Romania | |||

Email: costin.raiciu@cs.pub.ro | EMail: costin.raiciu@cs.pub.ro | |||

Mark Handley | Mark Handley | |||

University College London | University College London | |||

Gower Street | Gower Street | |||

London WC1E 6BT | London WC1E 6BT | |||

UK | UK | |||

Email: m.handley@cs.ucl.ac.uk | EMail: m.handley@cs.ucl.ac.uk | |||

Damon Wischik | Damon Wischik | |||

University College London | University College London | |||

Gower Street | Gower Street | |||

London WC1E 6BT | London WC1E 6BT | |||

UK | UK | |||

Email: d.wischik@cs.ucl.ac.uk | EMail: d.wischik@cs.ucl.ac.uk | |||

End of changes. 76 change blocks. | ||||

177 lines changed or deleted | | 171 lines changed or added | ||

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |