draft-ietf-tcpm-early-rexmt-02.txt   draft-ietf-tcpm-early-rexmt-03.txt 
Internet Engineering Task Force Mark Allman Internet Engineering Task Force Mark Allman
INTERNET DRAFT ICSI INTERNET DRAFT ICSI
File: draft-ietf-tcpm-early-rexmt-02.txt Konstantin Avrachenkov File: draft-ietf-tcpm-early-rexmt-03.txt Konstantin Avrachenkov
Intended Status: Experimental INRIA Intended Status: Experimental INRIA
Urtzi Ayesta Urtzi Ayesta
LAAS-CNRS LAAS-CNRS
Josh Blanton Josh Blanton
Ohio University Ohio University
Per Hurtig Per Hurtig
Karlstad University Karlstad University
October 2009 November 2009
Expires: April 2010 Expires: May 2010
Early Retransmit for TCP and SCTP Early Retransmit for TCP and SCTP
Status of this Memo Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79. the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 39 skipping to change at page 1, line 39
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress." reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 27, 2010. This Internet-Draft will expire on May 18, 2010.
Copyright Notice Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 32 skipping to change at page 2, line 32
1 Introduction 1 Introduction
Many researchers have studied problems with TCP [RFC793,RFC5681] Many researchers have studied problems with TCP [RFC793,RFC5681]
when the congestion window is small and have outlined possible when the congestion window is small and have outlined possible
mechanisms to mitigate these problems mechanisms to mitigate these problems
[Mor97,BPS+98,Bal98,LK98,RFC3150,AA02]. SCTP's [RFC4960] loss [Mor97,BPS+98,Bal98,LK98,RFC3150,AA02]. SCTP's [RFC4960] loss
recovery and congestion control mechanisms are based on TCP and recovery and congestion control mechanisms are based on TCP and
therefore the same problems impact the performance of SCTP therefore the same problems impact the performance of SCTP
connections. When the transport detects a missing segment, the connections. When the transport detects a missing segment, the
connection enters a loss recovery phase. There are several variants connection enters a loss recovery phase. There are several variants
of the loss recovery phase depending on the TCP implemention. TCP of the loss recovery phase depending on the TCP implementation. TCP
can use slow start based recovery or Fast Recovery [RFC5681], can use slow start based recovery or Fast Recovery [RFC5681],
NewReno [RFC3782], and loss recovery based on selective NewReno [RFC3782], and loss recovery based on selective
acknowledgments (SACKs) [RFC2018,FF96,RFC3517]. SCTP's loss acknowledgments (SACKs) [RFC2018,FF96,RFC3517]. SCTP's loss
recovery is not as varied due to the built-in selective recovery is not as varied due to the built-in selective
acknowledgments. acknowledgments.
All the above variants have two methods for invoking loss recovery. All the above variants have two methods for invoking loss recovery.
First, if an acknowledgment (ACK) for a given segment is not First, if an acknowledgment (ACK) for a given segment is not
received in a certain amount of time a retransmission timer fires received in a certain amount of time a retransmission timer fires
and the segment is resent [RFC2988,RFC4960]. Second, the "Fast and the segment is resent [RFC2988,RFC4960]. Second, the "Fast
skipping to change at page 4, line 46 skipping to change at page 4, line 46
state to be kept by the TCP sender. In both cases we describe state to be kept by the TCP sender. In both cases we describe
SACK-based and non-SACK-based versions of the scheme (of course, the SACK-based and non-SACK-based versions of the scheme (of course, the
non-SACK version will not apply to SCTP). This document explicitly non-SACK version will not apply to SCTP). This document explicitly
does not prefer one variant over the other, but leaves the choice to does not prefer one variant over the other, but leaves the choice to
the implementer. the implementer.
2.1 Byte-based Early Retransmit 2.1 Byte-based Early Retransmit
A TCP or SCTP sender MAY use byte-based Early Retransmit. A TCP or SCTP sender MAY use byte-based Early Retransmit.
A sender employing byte-based Early Retransmit MUST use the Upon the arrival of an ACK, a sender employing byte-based Early
following two conditions to determine when an Early Retransmit is Retransmit MUST use the following two conditions to determine when
sent: an Early Retransmit is sent:
(2.a) The amount of outstanding data (ownd)---data sent but not yet (2.a) The amount of outstanding data (ownd)---data sent but not yet
acknowledged---is less than 4*SMSS bytes. acknowledged---is less than 4*SMSS bytes.
Note that in the byte-based variant of Early Retransmit Note that in the byte-based variant of Early Retransmit
'ownd' is equivalent to 'FlightSize' defined in [RFC5681]. We 'ownd' is equivalent to 'FlightSize' defined in [RFC5681]. We
use different notation because 'ownd' is not consistent with use different notation because 'ownd' is not consistent with
FlightSize throughout this document. FlightSize throughout this document.
Also note that in SCTP messages will have to be converted to Also note that in SCTP messages will have to be converted to
skipping to change at page 5, line 19 skipping to change at page 5, line 19
sender or the advertised receive window does not permit new sender or the advertised receive window does not permit new
segments to be transmitted. segments to be transmitted.
When the above two conditions hold and a TCP connection does not When the above two conditions hold and a TCP connection does not
support SACK the duplicate ACK threshold used to trigger a support SACK the duplicate ACK threshold used to trigger a
retransmission MUST be reduced to: retransmission MUST be reduced to:
ER_thresh = ceiling (ownd/SMSS) - 1 (1) ER_thresh = ceiling (ownd/SMSS) - 1 (1)
duplicate ACKs, where ownd is in terms of bytes. We call this duplicate ACKs, where ownd is in terms of bytes. We call this
reduced ACK threshold enabling "Early Retransimission". reduced ACK threshold enabling "Early Retransmission".
When conditions (2.a) and (2.b) hold and a TCP connection does When conditions (2.a) and (2.b) hold and a TCP connection does
support SACK or SCTP is in use, Early Retransmit MUST be used only support SACK or SCTP is in use, Early Retransmit MUST be used only
when "ownd - SMSS" bytes have been SACKed. when "ownd - SMSS" bytes have been SACKed.
When conditions (2.a) and (2.b) do not hold, the transport MUST NOT When conditions (2.a) and (2.b) do not hold, the transport MUST NOT
use Early Retransmit, but rather prefer the standard mechanisms, use Early Retransmit, but rather prefer the standard mechanisms,
including Fast Retransmit and Limited Transmit. including Fast Retransmit and Limited Transmit.
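The byte-based rules above can be read as a decision the sender makes on each arriving ACK. The following Python is a minimal sketch under stated assumptions, not a reference implementation:

- The function names and the parameters can_send_new, dupacks and sacked_bytes are hypothetical.
- ownd and SMSS are in bytes.
- The caller falls back to standard Fast Retransmit and Limited Transmit whenever the sketch returns False.

```python
def er_conditions_hold(ownd: int, smss: int, can_send_new: bool) -> bool:
    """(2.a) fewer than 4*SMSS bytes outstanding, and (2.b) no new data can
    be sent (nothing queued, or the advertised receive window is closed)."""
    return ownd < 4 * smss and not can_send_new


def er_thresh_bytes(ownd: int, smss: int) -> int:
    """Equation (1): ER_thresh = ceiling(ownd/SMSS) - 1, in integer math."""
    return (ownd + smss - 1) // smss - 1


def should_early_retransmit_bytes(ownd: int, smss: int, can_send_new: bool,
                                  sack_enabled: bool, dupacks: int = 0,
                                  sacked_bytes: int = 0) -> bool:
    """Return True if byte-based Early Retransmit fires on this ACK.

    False means Early Retransmit is not used here; the standard mechanisms
    (Fast Retransmit, Limited Transmit) apply instead."""
    if not er_conditions_hold(ownd, smss, can_send_new):
        return False
    if sack_enabled:
        # SACK-capable TCP or SCTP: fire only once ownd - SMSS bytes are
        # SACKed. The sacked_bytes > 0 guard is this sketch's own addition.
        return sacked_bytes > 0 and sacked_bytes >= ownd - smss
    # Non-SACK TCP: compare duplicate ACKs with the reduced threshold.
    # max(1, ...) is this sketch's own guard for ownd <= SMSS, where
    # equation (1) yields zero.
    return dupacks >= max(1, er_thresh_bytes(ownd, smss))
```

For instance, with an SMSS of 1460 bytes and three full-sized segments outstanding (ownd = 4380), equation (1) gives ER_thresh = 2. A non-SACK sender would therefore retransmit on the second duplicate ACK instead of the third.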
As noted above, the drawback of this byte-based variant is precision As noted above, the drawback of this byte-based variant is precision
skipping to change at page 5, line 53 skipping to change at page 5, line 53
In this case ER_thresh will be two, per equation (1). Thus, In this case ER_thresh will be two, per equation (1). Thus,
even though there are enough segments outstanding to trigger even though there are enough segments outstanding to trigger
Fast Retransmit with the standard duplicate ACK threshold Early Fast Retransmit with the standard duplicate ACK threshold Early
Retransmit will be triggered. This could cause or exacerbate Retransmit will be triggered. This could cause or exacerbate
performance problems caused by segment reordering in the network. performance problems caused by segment reordering in the network.
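The precision problem can be made concrete with a little arithmetic. The sketch below uses hypothetical numbers, not the draft's own example (whose details are elided above): an SMSS of 1460 bytes and eight 400-byte segments outstanding. The helper names are likewise hypothetical.

```python
from typing import Optional

SMSS = 1460
segments = [400] * 8  # hypothetical: eight small segments in flight


def byte_based_thresh(ownd: int, smss: int) -> Optional[int]:
    """Equation (1), or None when condition (2.a) rules Early Retransmit out."""
    if ownd >= 4 * smss:
        return None
    return (ownd + smss - 1) // smss - 1


def segment_based_thresh(oseg: int) -> Optional[int]:
    """Equation (2), or None when condition (3.a) rules Early Retransmit out."""
    if oseg >= 4:
        return None
    return oseg - 1


ownd = sum(segments)                          # 3200 bytes < 4 * 1460 = 5840
print(byte_based_thresh(ownd, SMSS))          # 2
print(segment_based_thresh(len(segments)))    # None
```

The byte-based variant sees only 3200 bytes outstanding and lowers the threshold to two duplicate ACKs. The segment-based variant sees eight segments, so it leaves the standard threshold of three duplicate ACKs in place. Those eight segments could supply enough duplicate ACKs to trigger standard Fast Retransmit.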
2.2 Segment-based Early Retransmit 2.2 Segment-based Early Retransmit
A TCP or SCTP sender MAY use segment-based Early Retransmit. A TCP or SCTP sender MAY use segment-based Early Retransmit.
A sender employing segment-based Early Retransmit MUST use the Upon the arrival of an ACK, a sender employing segment-based Early
following two conditions to determine when an Early Retransmit is Retransmit MUST use the following two conditions to determine when
sent: an Early Retransmit is sent:
(3.a) The number of outstanding segments (oseg)---segments sent but (3.a) The number of outstanding segments (oseg)---segments sent but
not yet acknowledged---is less than four. not yet acknowledged---is less than four.
(3.b) There is either no unsent data ready for transmission at the (3.b) There is either no unsent data ready for transmission at the
sender or the advertised receive window does not permit new sender or the advertised receive window does not permit new
segments to be transmitted. segments to be transmitted.
When the above two conditions hold and a TCP connection does not When the above two conditions hold and a TCP connection does not
support SACK the duplicate ACK threshold used to trigger a support SACK the duplicate ACK threshold used to trigger a
retransmission MUST be reduced to: retransmission MUST be reduced to:
ER_thresh = oseg - 1 (2) ER_thresh = oseg - 1 (2)
duplicate ACKs, where oseg represents the number of outstanding duplicate ACKs, where oseg represents the number of outstanding
segments. (We discuss tracking the number of outstanding segments segments. (We discuss tracking the number of outstanding segments
below.) We call this reduced ACK threshold enabling "Early below.) We call this reduced ACK threshold enabling "Early
Retransimission". Retransmission".
When conditions (3.a) and (3.b) hold and a TCP connection does When conditions (3.a) and (3.b) hold and a TCP connection does
support SACK or SCTP is in use, Early Retransmit MUST be used only support SACK or SCTP is in use, Early Retransmit MUST be used only
when "oseg - 1" segments have been SACKed. A segment is considered when "oseg - 1" segments have been SACKed. A segment is considered
to be SACKed when all its data bytes (TCP) or data chunks (SCTP) to be SACKed when all its data bytes (TCP) or data chunks (SCTP)
have been indicated as arrived by the receiver. have been indicated as arrived by the receiver.
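The rule that a segment counts as SACKed only when all of its data has been reported can be checked by walking the SACK blocks in sequence order. The following is a minimal sketch for TCP under these assumptions:

- Segments and SACK blocks are half-open [start, end) sequence ranges.
- 32-bit sequence-number wraparound is ignored.
- is_fully_sacked and count_sacked_segments are hypothetical helper names.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end) in sequence-number space


def is_fully_sacked(seg: Range, sack_blocks: List[Range]) -> bool:
    """True if every byte of seg lies in the union of the SACK blocks.
    Sequence-number wraparound is ignored for brevity."""
    start, end = seg
    covered = start  # first byte of seg not yet known to be SACKed
    for left, right in sorted(sack_blocks):
        if left > covered:
            break  # gap: byte `covered` has not been reported
        covered = max(covered, right)
        if covered >= end:
            return True
    return covered >= end


def count_sacked_segments(segs: List[Range], sack_blocks: List[Range]) -> int:
    """Number of outstanding segments whose data is entirely SACKed."""
    return sum(is_fully_sacked(s, sack_blocks) for s in segs)


segs = [(0, 1000), (1000, 2000), (2000, 3000)]  # oseg = 3, first one lost
print(count_sacked_segments(segs, [(1000, 3000)]))  # 2 == oseg - 1
print(count_sacked_segments(segs, [(1000, 2500)]))  # 1: third only partly SACKed
```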
When conditions (3.a) and (3.b) do not hold, the transport MUST NOT When conditions (3.a) and (3.b) do not hold, the transport MUST NOT
use Early Retransmit, but rather prefer the standard mechanisms, use Early Retransmit, but rather prefer the standard mechanisms,
including Fast Retransmit and Limited Transmit. including Fast Retransmit and Limited Transmit.
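Segment-based Early Retransmit follows the same shape as the byte-based variant, but compares counts of segments. As before, this is a minimal sketch under assumptions:

- The function names and the parameters can_send_new, dupacks and sacked_segments are hypothetical.
- oseg is assumed to come from a boundary-tracking scheme such as the one discussed below.
- sacked_segments counts segments whose data is entirely SACKed.

```python
def seg_er_conditions_hold(oseg: int, can_send_new: bool) -> bool:
    """(3.a) fewer than four segments outstanding, and (3.b) no new data
    can be sent (nothing queued, or the advertised receive window is closed)."""
    return oseg < 4 and not can_send_new


def should_early_retransmit_segments(oseg: int, can_send_new: bool,
                                     sack_enabled: bool, dupacks: int = 0,
                                     sacked_segments: int = 0) -> bool:
    """Return True if segment-based Early Retransmit fires on this ACK."""
    if not seg_er_conditions_hold(oseg, can_send_new):
        return False  # standard Fast Retransmit / Limited Transmit apply
    # Equation (2): ER_thresh = oseg - 1; the SACK rule uses the same count.
    # max(1, ...) is this sketch's own guard for oseg == 1, where the
    # formula yields zero.
    needed = max(1, oseg - 1)
    if sack_enabled:
        return sacked_segments >= needed
    return dupacks >= needed
```

With three segments outstanding and nothing new to send, a non-SACK sender retransmits on the second duplicate ACK. A SACK sender instead waits until two of the three segments are fully SACKed.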
skipping to change at page 6, line 43 skipping to change at page 6, line 43
form an understanding as to how many actual segments have been form an understanding as to how many actual segments have been
transmitted, but not acknowledged. This can be done by the sender transmitted, but not acknowledged. This can be done by the sender
tracking the boundaries of the three segments on the right side of tracking the boundaries of the three segments on the right side of
the current window (which involves tracking four sequence numbers in the current window (which involves tracking four sequence numbers in
TCP). This could be done by keeping a circular list of the segment TCP). This could be done by keeping a circular list of the segment
boundaries, for instance. Cumulative ACKs that do not fall within boundaries, for instance. Cumulative ACKs that do not fall within
this region indicate that at least four segments are outstanding and this region indicate that at least four segments are outstanding and
therefore Early Retransmit MUST NOT be used. When the outstanding therefore Early Retransmit MUST NOT be used. When the outstanding
window becomes small enough that Early Retransmit can be invoked, a window becomes small enough that Early Retransmit can be invoked, a
full understanding of the number of outstanding segments will be full understanding of the number of outstanding segments will be
available from the four sequence numbers retained. available from the four sequence numbers retained. (Note: the
implicit sequence number consumed by the TCP FIN can also be included
in the tracking of segment boundaries.)
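The boundary-tracking idea can be sketched with a fixed-size circular buffer holding the four right-most boundaries, which delimit the three right-most segments. This is one possible reading of the paragraph above, under these assumptions:

- The class and method names are hypothetical.
- Only transmissions of new data record boundaries.
- A FIN is recorded as consuming one sequence number.
- Sequence-number wraparound is ignored.

```python
from collections import deque
from typing import Optional


class SegmentBoundaryTracker:
    """Hypothetical helper keeping the last four segment boundaries, i.e.
    the edges of the three right-most segments transmitted."""

    def __init__(self, initial_seq: int) -> None:
        self.snd_nxt = initial_seq
        # deque with maxlen=4 acts as the circular list: old entries drop off.
        self.bounds = deque([initial_seq], maxlen=4)

    def on_send_new(self, length: int) -> None:
        """Record a newly transmitted segment of `length` sequence numbers
        (use length 1 for a bare FIN, per the note above)."""
        self.snd_nxt += length
        self.bounds.append(self.snd_nxt)

    def outstanding_segments(self, una: int) -> Optional[int]:
        """Return oseg for cumulative ACK `una`, or None when `una` lies left
        of the tracked region: then at least four segments are outstanding
        and Early Retransmit must not be used."""
        if una < self.bounds[0]:
            return None
        # A segment is outstanding while its right edge is beyond `una`.
        return sum(1 for b in list(self.bounds)[1:] if b > una)


t = SegmentBoundaryTracker(1000)
for _ in range(5):
    t.on_send_new(100)            # segments [1000,1100) ... [1400,1500)
print(t.outstanding_segments(1100))  # None: at least four outstanding
print(t.outstanding_segments(1350))  # 2: [1300,1400) partly ACKed, [1400,1500)
```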
3 Discussion 3 Discussion
In this section we discuss a number of issues surrounding the Early In this section we discuss a number of issues surrounding the Early
Retransmit algorithm. Retransmit algorithm.
3.1 SACK vs. non-SACK 3.1 SACK vs. non-SACK
The SACK variant of the Early Retransmit algorithm is preferred to The SACK variant of the Early Retransmit algorithm is preferred to
the non-SACK variant in TCP due to its robustness in the face of ACK the non-SACK variant in TCP due to its robustness in the face of ACK
skipping to change at page 11, line 16 skipping to change at page 11, line 17
Conservative Selective Acknowledgment (SACK)-based Loss Recovery Conservative Selective Acknowledgment (SACK)-based Loss Recovery
Algorithm for TCP. RFC 3517, April 2003. Algorithm for TCP. RFC 3517, April 2003.
[RFC3522] Reiner Ludwig, Michael Meyer. The Eifel Detection [RFC3522] Reiner Ludwig, Michael Meyer. The Eifel Detection
Algorithm for TCP. RFC 3522, April 2003. Algorithm for TCP. RFC 3522, April 2003.
[RFC3782] Sally Floyd, Tom Henderson, Andrei Gurtov. The NewReno [RFC3782] Sally Floyd, Tom Henderson, Andrei Gurtov. The NewReno
Modification to TCP's Fast Recovery Algorithm. RFC 3782, April Modification to TCP's Fast Recovery Algorithm. RFC 3782, April
2004. 2004.
[RFC4653] Sumitha Bhandarkar, A. L. Narasimha Reddy, Mark Allman,
Ethan Blanton. Improving the Robustness of TCP to
Non-Congestion Events. RFC 4653, August 2006.
Author's Addresses: Author's Addresses:
Mark Allman Mark Allman
International Computer Science Institute International Computer Science Institute
1947 Center Street, Suite 600 1947 Center Street, Suite 600
Berkeley, CA 94704-1198 Berkeley, CA 94704-1198
Phone: 440-235-1792 Phone: 440-235-1792
mallman@icir.org mallman@icir.org
http://www.icir.org/mallman/ http://www.icir.org/mallman/
End of changes. 10 change blocks.
14 lines changed or deleted 20 lines changed or added

This html diff was produced by rfcdiff 1.37a. The latest version is available from http://tools.ietf.org/tools/rfcdiff/