draft-ietf-tcpm-proportional-rate-reduction-02.txt   draft-ietf-tcpm-proportional-rate-reduction-03.txt 
TCP Maintenance Working Group M. Mathis TCP Maintenance Working Group M. Mathis
Internet-Draft N. Dukkipati Internet-Draft N. Dukkipati
Intended status: Experimental Y. Cheng Intended status: Experimental Y. Cheng
Expires: January 14, 2013 Google, Inc Expires: April 25, 2013 Google, Inc
July 13, 2012 Oct 22, 2012
Proportional Rate Reduction for TCP Proportional Rate Reduction for TCP
draft-ietf-tcpm-proportional-rate-reduction-02.txt draft-ietf-tcpm-proportional-rate-reduction-03.txt
Abstract Abstract
This document describes an experimental algorithm, Proportional Rate This document describes an experimental algorithm, Proportional Rate
Reduction (PPR) to improve the accuracy of the amount of data sent by Reduction (PPR) to improve the accuracy of the amount of data sent by
TCP during loss recovery. Standard Congestion Control requires that TCP during loss recovery. Standard Congestion Control requires that
TCP and other protocols reduce their congestion window in response to TCP and other protocols reduce their congestion window in response to
losses. This window reduction naturally occurs in the same round losses. This window reduction naturally occurs in the same round
trip as the data retransmissions to repair the losses, and is trip as the data retransmissions to repair the losses, and is
implemented by choosing not to transmit any data in response to some implemented by choosing not to transmit any data in response to some
ACKs arriving from the receiver. Two widely deployed algorithms are ACKs arriving from the receiver. Two widely deployed algorithms are
used to implement this window reduction: Fast Recovery and Rate used to implement this window reduction: Fast Recovery and Rate
Halving. Both algorithms are needlessly fragile under a number of Halving. Both algorithms are needlessly fragile under a number of
conditions, particularly when there is a burst of losses such that conditions, particularly when there is a burst of losses such that
the number of ACKs returning to the sender is small. Proportional the number of ACKs returning to the sender is small. Proportional
Rate Reduction minimizes these excess window reductions such that at Rate Reduction minimizes these excess window adjustments such that at
the end of recovery the actual window size will be as close as the end of recovery the actual window size will be as close as
possible to ssthresh, the window size determined by the congestion possible to ssthresh, the window size determined by the congestion
control algorithm. It is patterned after Rate Halving, but using the control algorithm. It is patterned after Rate Halving, but using the
fraction that is appropriate for target window chosen by the fraction that is appropriate for target window chosen by the
congestion control algorithm. congestion control algorithm.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
skipping to change at page 1, line 48 skipping to change at page 1, line 48
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 14, 2013. This Internet-Draft will expire on April 25, 2013.
Copyright Notice Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 5
3. Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1. Examples . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.1. Examples . . . . . . . . . . . . . . . . . . . . . . . . . 6
4. Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4. Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 9
5. Measurements . . . . . . . . . . . . . . . . . . . . . . . . . 11 5. Measurements . . . . . . . . . . . . . . . . . . . . . . . . . 11
6. Conclusion and Recommendations . . . . . . . . . . . . . . . . 12 6. Conclusion and Recommendations . . . . . . . . . . . . . . . . 12
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13
8. Security Considerations . . . . . . . . . . . . . . . . . . . 13 8. Security Considerations . . . . . . . . . . . . . . . . . . . 13
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13
10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13
10.1. Normative References . . . . . . . . . . . . . . . . . . . 13 10.1. Normative References . . . . . . . . . . . . . . . . . . . 13
10.2. Informative References . . . . . . . . . . . . . . . . . . 14 10.2. Informative References . . . . . . . . . . . . . . . . . . 14
Appendix A. Packet Conservation Bound . . . . . . . . . . . . . . 14 Appendix A. Strong Packet Conservation Bound . . . . . . . . . . 15
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 15 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 16
1. Introduction 1. Introduction
This document describes an experimental algorithm, Proportional Rate This document describes an experimental algorithm, Proportional Rate
Reduction (PPR) to improve the accuracy of the amount of data sent by Reduction (PPR) to improve the accuracy of the amount of data sent by
TCP during loss recovery. TCP during loss recovery.
Standard Congestion Control [RFC5681] requires that TCP (and other Standard Congestion Control [RFC5681] requires that TCP (and other
protocols) reduce their congestion window in response to losses. protocols) reduce their congestion window in response to losses.
Fast Recovery, described in the same document, is the reference Fast Recovery, described in the same document, is the reference
algorithm for making this adjustment. Its stated goal is to recover algorithm for making this adjustment. Its stated goal is to recover
TCP's self clock by relying on returning ACKs during recovery to TCP's self clock by relying on returning ACKs during recovery to
clock more data into the network. Fast Recovery adjusts the window clock more data into the network. Fast Recovery typically adjusts
by waiting for one half RTT of ACKs to pass before sending any data. the window by waiting for one half RTT of ACKs to pass before sending
It is fragile because it can not compensate for the implicit window any data. It is fragile because it can not compensate for the
reduction caused by the losses themselves, and is exposed to implicit window reduction caused by the losses themselves.
timeouts. For example if half of the data or ACKs are lost, Fast
Recovery's expected behavior would be to wait for a half of a window
of (remaining) ACKs to pass. It would then not receive any of the
ACKs needed for recovery and suffer a timeout.
The rate-halving algorithm improves this situation by sending data on RFC 3517 [RFC3517] makes Fast Recovery with SACK [RFC2018] more
alternate ACKs during recovery, such that after one RTT the window accurate by computing "pipe", a sender side estimate of the number of
has been halved. Rate-halving is implemented in Linux after only bytes still outstanding in the network. With RFC 3517, Fast Recovery
being informally published [RHweb], including an uncompleted is implemented by sending data as necessary on each ACK to prevent
Internet-Draft [RHID]. Rate-halving also does not adequately pipe from falling below ssthresh, the window size as determined by
compensate for the implicit window reduction caused by the losses and the congestion control algorithm. This protects Fast Recovery from
assumes a 50% window reduction, which was completely standard at the timeouts in many cases where there are heavy losses, although not if
time it was written, but not appropriate for modern congestion the entire second half of the window of data or ACKs are lost.
control algorithms such as Cubic [CUBIC], which reduce the window by However, a single ACK carrying a SACK option that reports of large
less than 50%. As a consequence rate-halving often allows the window quantity of missing data can cause a step discontinuity in the pipe
to fall further than necessary, reducing performance and increasing estimator, which can cause Fast Retransmit to send a burst of data.
the risk of timeouts if there are additional losses.
The rate-halving algorithm sends data on alternate ACKs during
recovery, such that after one RTT the window has been halved. Rate-
halving is implemented in Linux after only being informally published
[RHweb], including an uncompleted Internet-Draft [RHID]. Rate-
halving also does not adequately compensate for the implicit window
reduction caused by the losses and assumes a net 50% window
reduction, which was completely standard at the time it was written,
but not appropriate for modern congestion control algorithms such as
Cubic [CUBIC], which reduce the window by less than 50%. As a
consequence rate-halving often allows the window to fall further than
necessary, reducing performance and increasing the risk of timeouts
if there are additional losses.
Proportional Rate Reduction (PPR) avoids these excess window Proportional Rate Reduction (PPR) avoids these excess window
reductions such that at the end of recovery the actual window size adjustments such that at the end of recovery the actual window size
will be as close as possible to ssthresh, the window size determined will be as close as possible to ssthresh, the window size determined
by the congestion control algorithm. It is patterned after Rate by the congestion control algorithm. It is patterned after Rate
Halving, but using the fraction that is appropriate for the target Halving, but using the fraction that is appropriate for the target
window chosen by the congestion control algorithm. During PRR one of window chosen by the congestion control algorithm. During PRR one of
two additional reduction bound algorithms limits the total window two additional reduction bound algorithms limits the total window
reduction due to all mechanisms, including application stalls and the reduction due to all mechanisms, including transient application
losses themselves. stalls and the losses themselves.
We describe two slightly different reduction bound algorithms: We describe two slightly different reduction bound algorithms:
conservative reduction bound (CRB), which is strictly packet conservative reduction bound (CRB), which is strictly packet
conserving; and a slow start reduction bound (SSRB), which is more conserving; and a slow start reduction bound (SSRB), which is more
aggressive than CRB by at most one segment per ACK. PRR-CRB meets aggressive than CRB by at most one segment per ACK. PRR-CRB meets
the strong conservative bound described in Appendix A, however in the Strong Packet Conservation Bound described in Appendix A, however
real networks it does not perform as well as the algorithms described in real networks it does not perform as well as the algorithms
in RFC 3517, which prove to be non-conservative in a significant described in RFC 3517, which prove to be more aggressive in a
number of cases. SSRB offers a compromise by allowing TCP to send significant number of cases. SSRB offers a compromise by allowing
one additional segment per ACK relative to CRB in some situations. TCP to send one additional segment per ACK relative to CRB in some
Although SSRB is less aggressive than RFC 3517 (transmitting fewer situations. Although SSRB is less aggressive than RFC 3517
segments or taking more time to transmit them) it outperforms it, due (transmitting fewer segments or taking more time to transmit them) it
to the lower probability of additional losses during recovery. outperforms it, due to the lower probability of additional losses
during recovery.
PRR and both reduction bounds are based on common design principles, The Strong Packet Conservation Bound on which PRR and both reduction
derived from Van Jacobson's packet conservation principle: segments bounds are based is patterned after Van Jacobson's packet
delivered to the receiver are used as the clock to trigger sending conservation principle: segments delivered to the receiver are used
the same number of segments back into the network. As much as as the clock to trigger sending the same number of segments back into
possible Proportional Rate Reduction and the reduction bound rely on the network. As much as possible Proportional Rate Reduction and the
this self clock process, and are only slightly affected by the reduction bound algorithms rely on this self clock process, and are
accuracy of other estimators, such as pipe [RFC3517] and cwnd. This only slightly affected by the accuracy of other estimators, such as
is what gives the algorithms their precision in the presence of pipe [RFC3517] and cwnd. This is what gives the algorithms their
events that cause uncertainty in other estimators. precision in the presence of events that cause uncertainty in other
estimators.
The original definition of the packet conservation principle
[Jacobson88] treated packets that are presumed to be lost (e.g.
marked as candidates for retransmission) as having left the network.
This idea is reflected in the pipe estimator defined in RFC 3517 and
used here, but it is distinct from Strong Packet Conservation Bound
described in Appendix A, which is defined solely on the basis of data
arriving at the receiver.
We evaluated these and other algorithms in a large scale measurement We evaluated these and other algorithms in a large scale measurement
study, summarized below. The most important results from that study study presented in an companion paper [IMC11] and summarized in
are presented in an companion paper [IMC11]. PRR+SSRB outperforms Section 5. For authentic network traffic PRR+SSRB outperforms both
both RFC 3517 and Linux Rate Halving under authentic network traffic, RFC 3517 and Linux Rate Halving even though it is less aggressive
even though it is less aggressive than RFC 3517. than RFC 3517.
The algorithms are described as modifications to RFC 5681 [RFC5681], The algorithms are described as modifications to RFC 5681 [RFC5681],
TCP Congestion Control, using concepts drawn from the pipe algorithm TCP Congestion Control, using concepts drawn from the pipe algorithm
[RFC3517]. They are most accurate and more easily implemented with [RFC3517]. They are most accurate and more easily implemented with
SACK [RFC2018], but do not require SACK. SACK [RFC2018], but do not require SACK. The analysis and
measurement study presented here predates [RFC6675], which updates
RFC 3517.
2. Definitions 2. Definitions
The following terms, parameters and state variables are used as they The following terms, parameters and state variables are used as they
are defined in earlier documents: are defined in earlier documents:
RFC 793: snd.una
RFC 3517: covered (as in "covered sequence numbers") RFC 3517: covered (as in "covered sequence numbers")
RFC 5681: duplicate ACK, FlightSize, Sender Maximum Segment Size RFC 5681: duplicate ACK, FlightSize, Sender Maximum Segment Size
(SMSS) (SMSS)
Voluntary window reductions: choosing not to send data in response to Voluntary window reductions: choosing not to send data in response to
some ACKs, for the purpose of reducing the sending window size and some ACKs, for the purpose of reducing the sending window size and
data rate. data rate.
We define some additional variables: We define some additional variables:
skipping to change at page 6, line 34 skipping to change at page 6, line 43
prr_out += (data sent) // strictly less than or equal to sndcnt prr_out += (data sent) // strictly less than or equal to sndcnt
3.1. Examples 3.1. Examples
We illustrate these algorithms by showing their different behaviors We illustrate these algorithms by showing their different behaviors
for two scenarios: TCP experiencing either a single loss or a burst for two scenarios: TCP experiencing either a single loss or a burst
of 15 consecutive losses. In all cases we assume bulk data (no of 15 consecutive losses. In all cases we assume bulk data (no
application pauses), standard AIMD congestion control and cwnd = application pauses), standard AIMD congestion control and cwnd =
FlightSize = pipe = 20 segments, so ssthresh will be set to 10 at the FlightSize = pipe = 20 segments, so ssthresh will be set to 10 at the
beginning of recovery. We also assume standard Fast Retransmit and beginning of recovery. We also assume standard Fast Retransmit and
Limited Transmit, so TCP will send two new segments followed by one Limited Transmit [RFC3042], so TCP will send two new segments
retransmit in response to the first 3 duplicate ACKs following the followed by one retransmit in response to the first 3 duplicate ACKs
losses. following the losses.
Each of the diagrams below shows the per ACK response to the first Each of the diagrams below shows the per ACK response to the first
round trip for the various recovery algorithms when the zeroth round trip for the various recovery algorithms when the zeroth
segment is lost. The top line indicates the transmitted segment segment is lost. The top line indicates the transmitted segment
number triggering the ACKs, with an X for the lost segment. "cwnd" number triggering the ACKs, with an X for the lost segment. "cwnd"
and "pipe" indicate the values of these algorithms after processing and "pipe" indicate the values of these algorithms after processing
each returning ACK. "Sent" indicates how much 'N'ew or each returning ACK. "Sent" indicates how much 'N'ew or
'R'etransmitted data would be sent. Note that the algorithms for 'R'etransmitted data would be sent. Note that the algorithms for
deciding which data to send are out of scope of this document. deciding which data to send are out of scope of this document.
skipping to change at page 8, line 29 skipping to change at page 8, line 29
pipe: 19 19 4 4 4 pipe: 19 19 4 4 4
sent: N N R R R sent: N N R R R
RB: b b b RB: b b b
PRR-SSRB PRR-SSRB
ack# X X X X X X X X X X X X X X X 15 16 17 18 19 ack# X X X X X X X X X X X X X X X 15 16 17 18 19
pipe: 19 19 4 5 6 pipe: 19 19 4 5 6
sent: N N 2R 2R 2R sent: N N 2R 2R 2R
RB: bd d d RB: bd d d
In this specific situation, RFC 3517 is very non-conservative, In this specific situation, RFC 3517 is more aggressive, because once
because once fast retransmit is triggered (on the ACK for segment 17) fast retransmit is triggered (on the ACK for segment 17) TCP
TCP immediately retransmits sufficient data to bring pipe up to cwnd. immediately retransmits sufficient data to bring pipe up to cwnd.
Our measurement data (see Section 5) indicates that RFC 3517 Our measurement data (see Section 5) indicates that RFC 3517
significantly outperforms Rate Halving, PRR-CRB and some other significantly outperforms Rate Halving, PRR-CRB and some other
similarly conservative algorithms that we tested, suggesting that it similarly conservative algorithms that we tested, showing that it is
is significantly common for the actual losses to exceed the window significantly common for the actual losses to exceed the window
reduction determined by the congestion control algorithm. reduction determined by the congestion control algorithm.
The Linux implementation of Rate Halving includes an early version of The Linux implementation of Rate Halving includes an early version of
the conservative reduction bound [RHweb]. In this situation the five the conservative reduction bound [RHweb]. In this situation the five
ACKs trigger exactly one transmission each (2 new data, 3 old data), ACKs trigger exactly one transmission each (2 new data, 3 old data),
and cwnd is set to 5. At a window size of 5, it takes three round and cwnd is set to 5. At a window size of 5, it takes three round
trips to retransmit all 15 lost segments. Rate Halving does not trips to retransmit all 15 lost segments. Rate Halving does not
raise the window at all during recovery, so when recovery finally raise the window at all during recovery, so when recovery finally
completes, TCP will slowstart cwnd from 5 up to 10. In this example, completes, TCP will slowstart cwnd from 5 up to 10. In this example,
TCP operates at half of the window chosen by the congestion control TCP operates at half of the window chosen by the congestion control
skipping to change at page 10, line 33 skipping to change at page 10, line 33
specifically they cause prr_delivered-prr_out to be significantly specifically they cause prr_delivered-prr_out to be significantly
positive. If the application catches up while TCP is still in positive. If the application catches up while TCP is still in
recovery, TCP will send a partial window burst to catch up to exactly recovery, TCP will send a partial window burst to catch up to exactly
where it would have been, had the application never stalled. where it would have been, had the application never stalled.
Although this burst might be viewed as being hard on the network, Although this burst might be viewed as being hard on the network,
this is exactly what happens every time there is a partial RTT this is exactly what happens every time there is a partial RTT
application stall while not in recovery. We have made the partial application stall while not in recovery. We have made the partial
RTT stall behavior uniform in all states. Changing this behavior is RTT stall behavior uniform in all states. Changing this behavior is
out of scope for this document. out of scope for this document.
Proportional Rate Reduction with Reduction Bound is significantly Proportional Rate Reduction with Reduction Bound is less sensitive to
less sensitive to errors of the pipe estimator. While in recovery, errors in the pipe estimator. While in recovery, pipe is
pipe is intrinsically an estimator, using incomplete information to intrinsically an estimator, using incomplete information to estimate
guess if un-SACKed segments are actually lost or out-of-order in the if un-SACKed segments are actually lost or merely out-of-order in the
network. Under some conditions pipe can have significant errors, for network. Under some conditions pipe can have significant errors, for
example when a burst of reordered data is presumed to be lost and is example pipe is underestimated when when a burst of reordered data is
retransmitted, but then the original data arrives before the prematurely assumed to be lost and marked for retransmission. If the
retransmission. If the transmissions are regulated directly by pipe transmissions are regulated directly by pipe as they are with RFC
as they are in RFC 3517, then errors and discontinuities in the value 3517, such as step discontinuity in the pipe estimator causes a burst
of the pipe estimator can cause significant errors in the amount of of data, which can not be retracted once the pipe estimator is
data sent. With Proportional Rate Reduction with Reduction Bound, corrected a few ACKs later. For PRR, pipe merely determines which
pipe merely determines how sndcnt is computed from DeliveredData. algorithm, Proportional Rate Reduction or the reduction bound, is
Since short term errors in pipe are smoothed out across multiple ACKs used to compute sndcnt from DeliveredData. While pipe is
and both Proportional Rate Reduction and the reduction bound converge underestimated the algorithms are different by at most one segment
to the same final window, errors in the pipe estimator have less per ACK. Once pipe is updated they converge to the same final window
impact on the final outcome. at the end of recovery.
Under all conditions and sequences of events during recovery, PRR-CRB Under all conditions and sequences of events during recovery, PRR-CRB
strictly bounds the data transmitted to be equal to or less than the strictly bounds the data transmitted to be equal to or less than the
amount of data delivered to the receiver. We claim that this packet amount of data delivered to the receiver. We claim that this Strong
conservation bound is the most aggressive algorithm that does not Packet Conservation Bound is the most aggressive algorithm that does
lead to additional forced losses in some environments. It has the not lead to additional forced losses in some environments. It has
property that if there is a standing queue at a bottleneck with no the property that if there is a standing queue at a bottleneck with
cross traffic, the queue will maintain exactly constant length for no cross traffic, the queue will maintain exactly constant length for
the duration of the recovery, except for +1/-1 fluctuation due to the duration of the recovery, except for +1/-1 fluctuation due to
differences in packet arrival and exit times. See Appendix A for a differences in packet arrival and exit times. See Appendix A for a
detailed discussion of this property. detailed discussion of this property.
Although the packet Packet Conserving Bound in very appealing for a Although the Strong Packet Conserving Bound in very appealing for a
number of reasons, our measurements summarized in Section 5 number of reasons, our measurements summarized in Section 5
demonstrate that it is less aggressive and does not perform as well demonstrate that it is less aggressive and does not perform as well
as RFC3517, which permits large bursts of data when there are bursts as RFC3517, which permits large bursts of data when there are bursts
of losses. PRR-SSRB is a compromise that permits TCP to send one of losses. PRR-SSRB is a compromise that permits TCP to send one
extra segment per ACK as compared to the packet conserving bound. extra segment per ACK as compared to the packet conserving bound.
From the perspective of the packet conserving bound, PRR-SSRB does From the perspective of a strict packet conserving bound, PRR-SSRB
indeed open the window during recovery, however it is significantly does indeed open the window during recovery, however it is
less aggressive than RFC3517 in the presence of burst losses. significantly less aggressive than RFC3517 in the presence of burst
losses.
5. Measurements 5. Measurements
In a companion IMC11 paper [IMC11] we describe some measurements In a companion IMC11 paper [IMC11] we describe some measurements
comparing the various strategies for reducing the window during comparing the various strategies for reducing the window during
recovery. The results are summarized here. recovery. The experiments were performed on servers carrying Google
production traffic and are briefly summarized here.
The various window reduction algorithms and extensive instrumentation The various window reduction algorithms and extensive instrumentation
were all implemented in Linux 2.6. We used the uniform set of were all implemented in Linux 2.6. We used the uniform set of
algorithms present in the base Linux implementation, including CUBIC algorithms present in the base Linux implementation, including CUBIC
[CUBIC], limited transmit [RFC3042], threshold transmit from [FACK] [CUBIC], limited transmit [RFC3042], threshold transmit from [FACK]
and lost retransmission detection algorithms. We confirmed that the (a similar algorithm appears in [RFC6675]) and lost retransmission
behaviors of Rate Halving (the Linux default), RFC 3517 and PRR were detection algorithms. We confirmed that the behaviors of Rate
authentic to their respective specifications and that performance and Halving (the Linux default), RFC 3517 and PRR were authentic to their
features were comparable to the kernels in production use. The respective specifications and that performance and features were
different window reduction algorithms were all present in the same comparable to the kernels in production use. All of the different
kernel and could be selected with a sysctl, such that we had an window reduction algorithms were all present in a common kernel and
absolutely uniform baseline for comparing them. could be selected with a sysctl, such that we had an absolutely
uniform baseline for comparing them.
Our experiments included an additional algorithm, PRR with an Our experiments included an additional algorithm, PRR with an
unlimited bound (PRR-UB), which sends ssthresh-pipe bursts when pipe unlimited bound (PRR-UB), which sends ssthresh-pipe bursts when pipe
falls below ssthresh. This behavior parallels RFC 3517. falls below ssthresh. This behavior parallels RFC 3517.
An important detail of this configuration is that CUBIC only reduces An important detail of this configuration is that CUBIC only reduces
the window by 30%, as opposed to the 50% reduction used by the window by 30%, as opposed to the 50% reduction used by
traditional congestion control algorithms. This, in conjunction with traditional congestion control algorithms. This accentuates the
using only standard algorithms to trigger Fast Retransmit, tendency for RFC 3517 and PRR-UB to send a burst at the point when
accentuates the tendency for RFC 3517 and PRR-UB to send a burst at Fast Retransmit gets triggered because pipe is likely to already be
the point when Fast Retransmit gets triggered if pipe is already below ssthresh. Precisely this condition was observed for 32% of the
below ssthresh. recovery events: pipe fell below ssthresh before Fast Retransmit is
All experiments were performed on servers carrying production traffic
for multiple Google services.
In this configuration it is observed that for 32% of the recovery
events, pipe falls below ssthresh before Fast Retransmit is
triggered, thus the various PRR algorithms start in the reduction triggered, thus the various PRR algorithms start in the reduction
bound phase, and RFC 3517 send bursts of segments with the fast bound phase, and RFC 3517 sends bursts of segments with the fast
retransmit. retransmit.
In the companion paper we observe that PRR-SSRB spends the least time In the companion paper we observe that PRR-SSRB spends the least time
in recovery of all the algorithms tested, largely because it in recovery of all the algorithms tested, largely because it
experiences fewer timeouts once it is already in recovery. experiences fewer timeouts once it is already in recovery.
RFC 3517 experiences 29% more detected lost retransmissions and 2.6% RFC 3517 experiences 29% more detected lost retransmissions and 2.6%
more timeouts (presumably due to undetected lost retransmissions) more timeouts (presumably due to undetected lost retransmissions)
than PRR-SSRB. These results are representative of PRR-UB and other than PRR-SSRB. These results are representative of PRR-UB and other
algorithms that send bursts when pipe falls below ssthresh. algorithms that send bursts when pipe falls below ssthresh.
Rate Halving experiences 5% more timeouts and significantly smaller Rate Halving experiences 5% more timeouts and significantly smaller
final cwnd values at the end of recovery. The smaller cwnd sometimes final cwnd values at the end of recovery. The smaller cwnd sometimes
causes the recovery itself to take extra round trips. These results causes the recovery itself to take extra round trips. These results
are representative of PRR-CRB and other algorithms that implement are representative of PRR-CRB and other algorithms that implement
strict packet conservation during recovery. strict packet conservation during recovery.
6. Conclusion and Recommendations 6. Conclusion and Recommendations
Although the packet conserving bound is very appealing for a number Although the Strong Packet Conserving Bound used in PRR-CRB is very
of reasons, our measurements demonstrate that it is less aggressive appealing for a number of reasons, our measurements show that it is
and does not perform as well as RFC3517, which permits significant less aggressive and does not perform as well as RFC 3517, which
bursts of data when there are large bursts of losses. PRR-SSRB is a permits bursts of data when there are bursts of losses. RFC 3517 is
compromise that permits TCP to send one extra segment per ACK as conservative in the original sense of Van Jacobson's packet
relative to the packet conserving bound. From the perspective of the conservation principle, which included the assumption that presumed
packet conserving bound, PRR-SSRB does indeed open the window during lost segments have indeed left the network. PRR-CRB makes no such
recovery, however it is significantly less aggressive than RFC3517 in assumption, following instead a Strong Packet Conserving Bound, in
the presence of burst losses. Even so, it often out performs which only packets that have arrived at the receiver are considered
RFC3517, because it avoids some of the self inflicted losses caused to have left the network. PRR-SSRB is a compromise that permits TCP
by bursts. to send one extra segment per ACK relative to the Strong Packet
Conserving Bound, to partially compensate for excess losses.
From the perspective of the Strong Packet Conserving Bound, PRR-SSRB
does indeed open the window during recovery, however it is
significantly less aggressive than RFC 3517 in the presence of burst
losses. Even so, it often outperforms RFC 3517, because it avoids
some of the self inflicted losses caused by bursts.
At this time we see no reason not to test and deploy PRR-SSRB on a At this time we see no reason not to test and deploy PRR-SSRB on a
large scale. Implementers worried about any potential impact of large scale. Implementers worried about any potential impact of
raising the window during recovery may want to optionally support raising the window during recovery may want to optionally support
PRR-CRB (which is actually simpler to implement) for comparison PRR-CRB (which is actually simpler to implement) for comparison
studies. studies.
One final comment about terminology: we expect that common usage will One final comment about terminology: we expect that common usage will
drop "slow start reduction bound" from the algorithm name. This drop "slow start reduction bound" from the algorithm name. This
document needed to be pedantic about having distinct names for document needed to be pedantic about having distinct names for
proportional rate reduction and every variant of the reduction bound. proportional rate reduction and every variant of the reduction bound.
However, once paired they become one. However, we do not anticipate any future exploration of the
alternative reduction bounds.
7. Acknowledgements 7. Acknowledgements
This draft is based in part on previous incomplete work by Matt This draft is based in part on previous incomplete work by Matt
Mathis, Jeff Semke and Jamshid Mahdavi [RHID] and influenced by Mathis, Jeff Semke and Jamshid Mahdavi [RHID] and influenced by
several discussion with John Heffner. several discussion with John Heffner.
Monia Ghobadi and Sivasankar Radhakrishnan helped analyze the Monia Ghobadi and Sivasankar Radhakrishnan helped analyze the
experiments. experiments.
Ilpo Jarvinen reviewed the code. Ilpo Jarvinen reviewed the code.
Mark Allman improved the document through his insightful review.
8. Security Considerations 8. Security Considerations
Proportional Rate Reduction does not change the risk profile for TCP. Proportional Rate Reduction does not change the risk profile for TCP.
Implementers that change PRR from counting bytes to segments have to Implementers that change PRR from counting bytes to segments have to
be cautious about the effects of ACK splitting attacks [Savage99], be cautious about the effects of ACK splitting attacks [Savage99],
where the receiver acknowledges partial segments for the purpose of where the receiver acknowledges partial segments for the purpose of
confusing the sender's congestion accounting. confusing the sender's congestion accounting.
9. IANA Considerations 9. IANA Considerations
skipping to change at page 13, line 41 skipping to change at page 14, line 5
Note to RFC Editor: this section may be removed on publication as an Note to RFC Editor: this section may be removed on publication as an
RFC. RFC.
10. References 10. References
10.1. Normative References 10.1. Normative References
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, October 1996. Selective Acknowledgment Options", RFC 2018, October 1996.
[RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A
Conservative Selective Acknowledgment (SACK)-based Loss
Recovery Algorithm for TCP", RFC 3517, April 2003.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, September 2009. Control", RFC 5681, September 2009.
[RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
and Y. Nishida, "A Conservative Loss Recovery Algorithm
Based on Selective Acknowledgment (SACK) for TCP",
RFC 6675, August 2012.
10.2. Informative References 10.2. Informative References
[RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing [RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
TCP's Loss Recovery Using Limited Transmit", RFC 3042, TCP's Loss Recovery Using Limited Transmit", RFC 3042,
January 2001. January 2001.
[RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A
Conservative Selective Acknowledgment (SACK)-based Loss
Recovery Algorithm for TCP", RFC 3517, April 2003.
[IMC11] Dukkipati, N., Mathis, M., and Y. Cheng, "Proportional [IMC11] Dukkipati, N., Mathis, M., and Y. Cheng, "Proportional
Rate Reduction for TCP", ACM Internet Measurement Rate Reduction for TCP", ACM Internet Measurement
Conference IMC11, December 2011. Conference IMC11, December 2011.
[FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgment: [FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgment:
Refining TCP Congestion Control", ACM SIGCOMM SIGCOMM96, Refining TCP Congestion Control", ACM SIGCOMM SIGCOMM96,
August 1996. August 1996.
[RHID] Mathis, M., Semke, J., Mahdavi, J., and K. Lahey, "The [RHID] Mathis, M., Semke, J., Mahdavi, J., and K. Lahey, "The
Rate-Halving Algorithm for TCP Congestion Control", Rate-Halving Algorithm for TCP Congestion Control",
draft-mathis-tcp-ratehalving (work in progress), draft-mathis-tcp-ratehalving (work in progress),
June 1999. June 1999.
[RHweb] Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding [RHweb] Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding
Parameters", Web publication , December 1997. Parameters", Web publication , December 1997.
[CUBIC] Rhee, I. and L. Xu, "CUBIC: A new TCP-friendly high-speed [CUBIC] Rhee, I. and L. Xu, "CUBIC: A new TCP-friendly high-speed
TCP variant", PFLDnet 2005, Feb 2005. TCP variant", PFLDnet 2005, Feb 2005.
[Jacobson88]
Jacobson, V., "Congestion Avoidance and Control", SIGCOMM
Comput. Commun. Rev. 18(4), Aug 1988.
[Savage99] [Savage99]
Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, Savage, S., Cardwell, N., Wetherall, D., and T. Anderson,
"TCP congestion control with a misbehaving receiver", "TCP congestion control with a misbehaving receiver",
SIGCOMM Comput. Commun. Rev. 29(5), October 1999. SIGCOMM Comput. Commun. Rev. 29(5), October 1999.
Appendix A. Packet Conservation Bound Appendix A. Strong Packet Conservation Bound
PRR-CRB meets a conservative, philosophically pure and aesthetically PRR-CRB is based on a conservative, philosophically pure and
appealing notion of correct, described here. However, in real aesthetically appealing Strong Packet Conservation Bound, described
networks it does not perform as well as the algorithms described in here. Although inspired by Van Jacobson's packet conservation
RFC 3517, which proves to be non-conservative in a significant number principle [Jacobson88], it differs in how it treats segments that are
of cases. missing and presumed lost. Under all conditions and sequences of
events during recovery, PRR-CRB strictly bounds the data transmitted
to be equal to or less than the amount of data delivered to the
receiver. Note that the effects of presumed losses are included in
the pipe calculation, but do not affect the outcome of PRR-CRB, once
pipe has fallen below ssthresh.
Under all conditions and sequences of events during recovery, PRR-CRB We claim that this Strong Packet Conservation Bound is the most
strictly bounds the data transmitted to be equal to or less than the aggressive algorithm that does not lead to additional forced losses
amount of data delivered to the receiver. We claim that this packet in some environments. It has the property that if there is a
conservation bound is the most aggressive algorithm that does not standing queue at a bottleneck that is carrying no other traffic, the
lead to additional forced losses in some environments. It has the queue will maintain exactly constant length for the entire duration
property that if there is a standing queue at a bottleneck that is of the recovery, except for +1/-1 fluctuation due to differences in
carrying no other traffic, the queue will maintain exactly constant packet arrival and exit times. Any less aggressive algorithm will
length for the entire duration of the recovery, except for +1/-1 result in a declining queue at the bottleneck. Any more aggressive
fluctuation due to differences in packet arrival and exit times. Any algorithm will result in an increasing queue or additional losses if
less aggressive algorithm will result in a declining queue at the it is a full drop tail queue.
bottleneck. Any more aggressive algorithm will result in an
increasing queue or additional losses if it is a full drop tail
queue.
We demonstrate this property with a little thought experiment: We demonstrate this property with a little thought experiment:
Imagine a network path that has insignificant delays in both Imagine a network path that has insignificant delays in both
directions, except for the processing time and queue at a single directions, except for the processing time and queue at a single
bottleneck in the forward path. By insignificant delay, we mean when bottleneck in the forward path. By insignificant delay, we mean when
a packet is "served" at the head of the bottleneck queue, the a packet is "served" at the head of the bottleneck queue, the
following events happen in much less than one bottleneck packet time: following events happen in much less than one bottleneck packet time:
the packet arrives at the receiver; the receiver sends an ACK; which the packet arrives at the receiver; the receiver sends an ACK; which
arrives at the sender; the sender processes the ACK and sends some arrives at the sender; the sender processes the ACK and sends some
data; the data is queued at the bottleneck. data; the data is queued at the bottleneck.
If sndcnt is set to DeliveredData and nothing else is inhibiting If sndcnt is set to DeliveredData and nothing else is inhibiting
sending data, then clearly the data arriving at the bottleneck queue sending data, then clearly the data arriving at the bottleneck queue
will exactly replace the data that was served at the head of the will exactly replace the data that was served at the head of the
queue, so the queue will have a constant length. If queue is drop queue, so the queue will have a constant length. If queue is drop
tail and full then the queue will stay exactly full. Losses or tail and full then the queue will stay exactly full. Losses or
reordering on the ACK path only cause wider fluctuations in the queue reordering on the ACK path only cause wider fluctuations in the queue
size, but do not raise the peak size, independent of whether the data size, but do not raise its peak size, independent of whether the data
is in order or out-of-order (including loss recovery from an earlier is in order or out-of-order (including loss recovery from an earlier
RTT). Any more aggressive algorithm which sends additional data will RTT). Any more aggressive algorithm which sends additional data will
overflow the drop tail queue and cause loss. Any less aggressive overflow the drop tail queue and cause loss. Any less aggressive
algorithm will under fill the queue. Therefore setting sndcnt to algorithm will under fill the queue. Therefore setting sndcnt to
DeliveredData is the most aggressive algorithm that does not cause DeliveredData is the most aggressive algorithm that does not cause
forced losses in this simple network. Relaxing the assumptions (e.g. forced losses in this simple network. Relaxing the assumptions (e.g.
making delays more authentic and adding more flows, delayed ACKs, making delays more authentic and adding more flows, delayed ACKs,
etc) may increases the fine grained fluctuations in queue size but etc) are likely to increases the fine grained fluctuations in queue
does not change its basic behavior. size but do not change its basic behavior.
Note that the congestion control algorithm implements a broader Note that the congestion control algorithm implements a broader
notion of optimal that includes appropriately sharing the network. notion of optimal that includes appropriately sharing the network.
Typical congestion control algorithms are likely to reduce the data Typical congestion control algorithms are likely to reduce the data
sent relative to the packet conserving bound implemented by PRR sent relative to the packet conserving bound implemented by PRR
bringing TCP's actual window down to ssthresh. bringing TCP's actual window down to ssthresh.
Authors' Addresses Authors' Addresses
Matt Mathis Matt Mathis
 End of changes. 40 change blocks. 
143 lines changed or deleted 184 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/