draft-ietf-tcpm-early-rexmt-00.txt   draft-ietf-tcpm-early-rexmt-01.txt 
Internet Engineering Task Force Mark Allman Internet Engineering Task Force Mark Allman
INTERNET DRAFT ICSI INTERNET DRAFT ICSI
File: draft-ietf-tcpm-early-rexmt-00.txt Konstantin Avrachenkov File: draft-ietf-tcpm-early-rexmt-01.txt Konstantin Avrachenkov
INRIA INRIA
Urtzi Ayesta Urtzi Ayesta
LAAS-CNRS LAAS-CNRS
Josh Blanton Josh Blanton
Ohio University Ohio University
Per Hurtig Per Hurtig
Karlstad University Karlstad University
August 2008 January 2009
Expires: February 2009
Early Retransmit for TCP and SCTP Early Retransmit for TCP and SCTP
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any This Internet-Draft is submitted to IETF in full conformance with
applicable patent or other IPR claims of which he or she is aware the provisions of BCP 78 and BCP 79.
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as other groups may also distribute working documents as Internet-
Internet-Drafts. Drafts.
Internet-Drafts are draft documents valid for a maximum of six Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress." reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on July 13, 2009.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document.
Abstract Abstract
This document proposes a new mechanism for TCP and SCTP that can be This document proposes a new mechanism for TCP and SCTP that can be
used to recover lost segments when a connection's congestion window used to recover lost segments when a connection's congestion window
is small. The "Early Retransmit" mechanism allows the transport to is small. The "Early Retransmit" mechanism allows the transport to
reduce, in certain special circumstances, the number of duplicate reduce, in certain special circumstances, the number of duplicate
acknowledgments required to trigger a fast retransmission. This acknowledgments required to trigger a fast retransmission. This
allows the transport to use fast retransmit to recover packet losses allows the transport to use fast retransmit to recover packet losses
that would otherwise require a lengthy retransmission timeout. that would otherwise require a lengthy retransmission timeout.
skipping to change at page 3, line 15 skipping to change at page 3, line 23
lost, the minimum RTO is conservatively chosen to be 1 second. lost, the minimum RTO is conservatively chosen to be 1 second.
Therefore, it behooves TCP senders to detect and recover from as Therefore, it behooves TCP senders to detect and recover from as
many losses as possible without incurring a lengthy timeout during many losses as possible without incurring a lengthy timeout during
which the connection remains idle. However, if not enough duplicate which the connection remains idle. However, if not enough duplicate
ACKs arrive from the receiver, the Fast Retransmit algorithm is ACKs arrive from the receiver, the Fast Retransmit algorithm is
never triggered---this situation occurs when the congestion window never triggered---this situation occurs when the congestion window
is small, if a large number of segments in a window are lost or at is small, if a large number of segments in a window are lost or at
the end of a transfer as data drains from the network. For the end of a transfer as data drains from the network. For
instance, consider a congestion window (cwnd) of three segments. If instance, consider a congestion window (cwnd) of three segments. If
one segment is dropped by the network, then at most two duplicate one segment is dropped by the network, then at most two duplicate
ACKs will arrive at the sender, assuming no ACK loss. Since three ACKs will arrive at the sender. Since three duplicate ACKs are
duplicate ACKs are required to trigger Fast Retransmit, a timeout required to trigger Fast Retransmit, a timeout will be required to
will be required to resend the dropped packet. resend the dropped packet.
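The arithmetic in the three-segment example can be sketched as follows. This is an illustration only and is not part of either draft revision; the function names are hypothetical, and the model assumes no ACK loss, no reordering, and no new data sent after the loss.

```python
DUPACK_THRESHOLD = 3  # duplicate ACKs needed by standard Fast Retransmit


def max_dup_acks(flight_segments: int, lost_segments: int) -> int:
    """Upper bound on duplicate ACKs the sender can receive for one flight.

    Each segment that reaches the receiver can trigger at most one
    duplicate ACK, so the bound is the number of delivered segments.
    """
    if flight_segments < 0 or not 0 <= lost_segments <= flight_segments:
        raise ValueError("need 0 <= lost_segments <= flight_segments")
    return flight_segments - lost_segments


def fast_retransmit_possible(flight_segments: int, lost_segments: int) -> bool:
    """True if enough duplicate ACKs can arrive to reach the threshold."""
    return max_dup_acks(flight_segments, lost_segments) >= DUPACK_THRESHOLD
```

With a flight of three segments and one loss the bound is two duplicate ACKs, so the threshold of three cannot be reached and the sender falls back to the RTO.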
[BPS+98] shows that roughly 56% of retransmissions sent by a busy [BPS+98] shows that roughly 56% of retransmissions sent by a busy
web server are sent after the RTO timer expires, while only 44% are web server are sent after the RTO timer expires, while only 44% are
handled by Fast Retransmit. In addition, only 4% of the RTO handled by Fast Retransmit. In addition, only 4% of the RTO
timer-based retransmissions could have been avoided with SACK, which timer-based retransmissions could have been avoided with SACK, which
has to continue to disambiguate reordering from genuine loss. has to continue to disambiguate reordering from genuine loss.
Furthermore, [All00] shows that for one particular web server the Furthermore, [All00] shows that for one particular web server the
median transfer size is less than four segments, indicating that median transfer size is less than four segments, indicating that
more than half of the connections will be forced to rely on the RTO more than half of the connections will be forced to rely on the RTO
timer to recover from any losses that occur. Thus, loss recovery timer to recover from any losses that occur. Thus, loss recovery
skipping to change at page 4, line 15 skipping to change at page 4, line 22
segments, but rather the number of outstanding bytes or messages. segments, but rather the number of outstanding bytes or messages.
Therefore, applying the intuitive notion of a transport with less Therefore, applying the intuitive notion of a transport with less
than four segments outstanding is more complicated than it first than four segments outstanding is more complicated than it first
appears. In section 2.1 we describe a "byte-based" variant of Early appears. In section 2.1 we describe a "byte-based" variant of Early
Retransmit that attempts to roughly map the number of outstanding Retransmit that attempts to roughly map the number of outstanding
bytes to a number of outstanding packets that is then used when bytes to a number of outstanding packets that is then used when
deciding whether to trigger Early Retransmit. In section 2.2 we deciding whether to trigger Early Retransmit. In section 2.2 we
describe a "packet-based" variant that represents a more precise describe a "packet-based" variant that represents a more precise
algorithm for triggering Early Retransmit. The precision comes at algorithm for triggering Early Retransmit. The precision comes at
the cost of requiring additional state to be kept by the TCP sender. the cost of requiring additional state to be kept by the TCP sender.
In both cases we described SACK-based and non-SACK-based versions of In both cases we describe SACK-based and non-SACK-based versions of
the scheme (of course, the non-SACK version will not apply to SCTP). the scheme (of course, the non-SACK version will not apply to SCTP).
2.1 Byte-based Early Retransmit 2.1 Byte-based Early Retransmit
A TCP or SCTP sender MAY use byte-based Early Retransmit. A TCP or SCTP sender MAY use byte-based Early Retransmit.
A sender employing byte-based Early Retransmit MUST use the A sender employing byte-based Early Retransmit MUST use the
following two conditions to determine when an Early Retransmit is following two conditions to determine when an Early Retransmit is
sent: sent:
(2.a) The amount of outstanding data (ownd)---data sent but not yet (2.a) The amount of outstanding data (ownd)---data sent but not yet
acknowledged---is less than 4*SMSS bytes. acknowledged---is less than 4*SMSS bytes.
(Note that in the byte-based variant of Early Retransmit
'ownd' is equivalent to 'FlightSize' defined in [RFC2581]. We
use different notation because 'ownd' is not consistent with
FlightSize through this document.)
(2.b) There is either no unsent data ready for transmission at the (2.b) There is either no unsent data ready for transmission at the
sender or the advertised window does not permit new segments sender or the advertised window does not permit new segments
to be transmitted. to be transmitted.
When the above two conditions hold and the connection does not When the above two conditions hold and the connection does not
support SACK the duplicate ACK threshold used to trigger a support SACK the duplicate ACK threshold used to trigger a
retransmission MUST be reduced to: retransmission MUST be reduced to:
ER_thresh = ceiling (ownd/SMSS) - 1 (1) ER_thresh = ceiling (ownd/SMSS) - 1 (1)
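A minimal sketch of conditions (2.a) and (2.b) together with equation (1) for the non-SACK case might look like the following. The function and parameter names are hypothetical, and returning None when nothing is outstanding is an assumption of this sketch rather than something the draft states.

```python
from typing import Optional


def non_sack_er_thresh(ownd: int, smss: int,
                       has_unsent_data: bool,
                       window_allows_new_segment: bool) -> Optional[int]:
    """Return the reduced duplicate ACK threshold, or None if Early
    Retransmit does not apply.

    ownd and smss are in bytes. The two booleans together express
    condition (2.b).
    """
    if smss <= 0 or ownd < 0:
        raise ValueError("smss must be positive and ownd non-negative")
    if ownd == 0:
        return None  # assumption: nothing outstanding, nothing to retransmit

    cond_2a = ownd < 4 * smss
    cond_2b = (not has_unsent_data) or (not window_allows_new_segment)
    if not (cond_2a and cond_2b):
        return None  # keep the standard threshold of three duplicate ACKs

    # Equation (1): ceiling(ownd / SMSS) - 1, using integer arithmetic.
    return -(-ownd // smss) - 1
```

For example, with SMSS = 1460 and three full-sized segments outstanding (ownd = 4380) and no unsent data, the sketch yields a threshold of 2.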
skipping to change at page 6, line 12 skipping to change at page 6, line 25
for instance. Cumulative ACKs that do not fall within this region for instance. Cumulative ACKs that do not fall within this region
indicate that at least four segments are outstanding and therefore indicate that at least four segments are outstanding and therefore
Early Retransmit MUST NOT be used. When the outstanding window Early Retransmit MUST NOT be used. When the outstanding window
becomes small enough that Early Retransmit can be invoked, a full becomes small enough that Early Retransmit can be invoked, a full
understanding of the number of outstanding packets will be understanding of the number of outstanding packets will be
available from the four sequence numbers retained. available from the four sequence numbers retained.
3 Discussion 3 Discussion
The SACK variant of the Early Retransmit algorithm is preferred to The SACK variant of the Early Retransmit algorithm is preferred to
the non-SACK variant due to its robustness in the face of ACK loss the non-SACK variant in TCP due to its robustness in the face of ACK
(since SACKs are sent redundantly) and due to interactions with the loss (since SACKs are sent redundantly) and due to interactions with
delayed ACK timer. Consider a flight of three segments, S1...S3, the delayed ACK timer (SCTP does not have a non-SACK mode and
with S2 being dropped by the network. When S1 arrives it is therefore naturally supports SACK-based Early Retransmit). Consider
in-order and so the receiver may or may not delay the ACK, leading a flight of three segments, S1...S3, with S2 being dropped by the
to two scenarios: network. When S1 arrives it is in-order and so the receiver may or
may not delay the ACK, leading to two scenarios:
(A) The ACK for S1 is delayed: In this case the arrival of S3 will (A) The ACK for S1 is delayed: In this case the arrival of S3 will
trigger an ACK to be transmitted covering segment S1 (which was trigger an ACK to be transmitted covering segment S1 (which was
previously unacknowledged). In this case Early Retransmit previously unacknowledged). In this case Early Retransmit
without SACK will not prevent an RTO because no duplicate ACKs without SACK will not prevent an RTO because no duplicate ACKs
will arrive. However, with SACK the ACK for S1 will also will arrive. However, with SACK the ACK for S1 will also
include SACK information indicating that S3 has arrived at the include SACK information indicating that S3 has arrived at the
receiver. The sender can then invoke Early Retransmit on this receiver. The sender can then invoke Early Retransmit on this
ACK because only one packet remains outstanding. ACK because only one packet remains outstanding.
skipping to change at page 7, line 43 skipping to change at page 7, line 58
still occur, ECN may allow a transport to perform better with small still occur, ECN may allow a transport to perform better with small
cwnd sizes because the sender will be required to detect less cwnd sizes because the sender will be required to detect less
segment loss [RFC2884]. segment loss [RFC2884].
[Bal98] outlines another solution to the problem of having no new [Bal98] outlines another solution to the problem of having no new
segments to transmit into the network when the first two duplicate segments to transmit into the network when the first two duplicate
ACKs arrive. In response to these duplicate ACKs, a TCP sender ACKs arrive. In response to these duplicate ACKs, a TCP sender
transmits zero-byte segments to induce additional duplicate ACKs. transmits zero-byte segments to induce additional duplicate ACKs.
This method preserves the robustness of the standard Fast Retransmit This method preserves the robustness of the standard Fast Retransmit
algorithm at the cost of injecting segments into the network that do algorithm at the cost of injecting segments into the network that do
not deliver any data (and, therefore are potentially wasting network not deliver any data, and therefore are potentially wasting network
resources). resources (at a time when there is a reasonable chance that the
resources are scarce).
5 Security Considerations 5 Security Considerations
The security considerations found in [RFC2581] apply to this The security considerations found in [RFC2581] apply to this
document. No additional security problems have been identified with document. No additional security problems have been identified with
Early Retransmit at this time. Early Retransmit at this time.
Acknowledgments Acknowledgments
We thank Sally Floyd for her feedback in discussions about Early We thank Sally Floyd for her feedback in discussions about Early
skipping to change at page 11, line 28 skipping to change at line 579
being transmitted. being transmitted.
MITIGATION A.3: Allow a connection to trigger Early Retransmit using MITIGATION A.3: Allow a connection to trigger Early Retransmit using
the criteria given in section 2, in addition to a "small" the criteria given in section 2, in addition to a "small"
timeout [Pax97]. For instance, a sender may have to wait for 2 timeout [Pax97]. For instance, a sender may have to wait for 2
duplicate ACKs and then T msec before Early Retransmit is duplicate ACKs and then T msec before Early Retransmit is
invoked. The added time gives reordered acknowledgments time to invoked. The added time gives reordered acknowledgments time to
arrive at the sender and avoid a needless retransmit. Designing arrive at the sender and avoid a needless retransmit. Designing
a method for choosing an appropriate timeout is part of the a method for choosing an appropriate timeout is part of the
research that would need to be involved in this scheme. research that would need to be involved in this scheme.
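One way Mitigation A.3 could be structured is sketched below. This is an illustration under stated assumptions, not a design from the draft: the class and method names are hypothetical, timestamps are passed in explicitly by the caller, and the value of T is left to the caller because the text treats its selection as open research.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelayedEarlyRetransmit:
    er_thresh: int                       # reduced duplicate ACK threshold, e.g. 2
    delay_ms: float                      # "T" in the text; no value is recommended
    dup_acks: int = 0
    armed_at_ms: Optional[float] = None  # time the threshold was first reached

    def on_dup_ack(self, now_ms: float) -> None:
        """Count a duplicate ACK and start the wait once the threshold is met."""
        self.dup_acks += 1
        if self.dup_acks >= self.er_thresh and self.armed_at_ms is None:
            self.armed_at_ms = now_ms

    def on_new_ack(self) -> None:
        """The cumulative ACK advanced (likely reordering): cancel the wait."""
        self.dup_acks = 0
        self.armed_at_ms = None

    def should_retransmit(self, now_ms: float) -> bool:
        """True once the threshold was reached and T msec have since elapsed."""
        if self.armed_at_ms is None:
            return False
        return now_ms - self.armed_at_ms >= self.delay_ms
```

A caller would invoke should_retransmit() from a timer or on each ACK arrival. If an ACK covering new data arrives during the wait, on_new_ack() cancels the pending retransmission, which is the needless retransmit the mitigation aims to avoid.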
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed
to pertain to the implementation or use of the technology described
in this document or the extent to which any license under such
rights might or might not be available; nor does it represent that
it has made any independent effort to identify any such rights.
Information on the procedures with respect to rights in RFC
documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use
of such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository
at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The IETF Trust (2008). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.
 End of changes. 12 change blocks. 
23 lines changed or deleted 36 lines changed or added
