--- 1/draft-ietf-tcpm-early-rexmt-00.txt 2009-01-14 20:12:08.000000000 +0100
+++ 2/draft-ietf-tcpm-early-rexmt-01.txt 2009-01-14 20:12:08.000000000 +0100
@@ -1,52 +1,58 @@
 Internet Engineering Task Force                              Mark Allman
 INTERNET DRAFT                                                      ICSI
-File: draft-ietf-tcpm-early-rexmt-00.txt          Konstantin Avrachenkov
+File: draft-ietf-tcpm-early-rexmt-01.txt          Konstantin Avrachenkov
                                                                    INRIA
                                                             Urtzi Ayesta
                                                                LAAS-CNRS
                                                             Josh Blanton
                                                          Ohio University
                                                              Per Hurtig
                                                      Karlstad University
-                                                             August 2008
-                                                  Expires: February 2009
-
+                                                            January 2009

                    Early Retransmit for TCP and SCTP

 Status of this Memo

-   By submitting this Internet-Draft, each author represents that any
-   applicable patent or other IPR claims of which he or she is aware
-   have been or will be disclosed, and any of which he or she becomes
-   aware will be disclosed, in accordance with Section 6 of BCP 79.
+   This Internet-Draft is submitted to IETF in full conformance with
+   the provisions of BCP 78 and BCP 79.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups. Note that
-   other groups may also distribute working documents as
-   Internet-Drafts.
+   other groups may also distribute working documents as Internet-
+   Drafts.

    Internet-Drafts are draft documents valid for a maximum of six months
    and may be updated, replaced, or obsoleted by other documents at any
    time. It is inappropriate to use Internet-Drafts as reference
    material or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt.

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.

+   This Internet-Draft will expire on July 13, 2009.
+
 Copyright Notice

-   Copyright (C) The IETF Trust (2008).
+   Copyright (c) 2009 IETF Trust and the persons identified as the
+   document authors. All rights reserved.
+
+   This document is subject to BCP 78 and the IETF Trust's Legal
+   Provisions Relating to IETF Documents
+   (http://trustee.ietf.org/license-info) in effect on the date of
+   publication of this document. Please review these documents
+   carefully, as they describe your rights and restrictions with
+   respect to this document.

 Abstract

    This document proposes a new mechanism for TCP and SCTP that can be
    used to recover lost segments when a connection's congestion window
    is small. The "Early Retransmit" mechanism allows the transport to
    reduce, in certain special circumstances, the number of duplicate
    acknowledgments required to trigger a fast retransmission. This
    allows the transport to use fast retransmit to recover packet losses
    that would otherwise require a lengthy retransmission timeout.
@@ -110,23 +117,23 @@
    lost, the minimum RTO is conservatively chosen to be 1 second.
    Therefore, it behooves TCP senders to detect and recover from as many
    losses as possible without incurring a lengthy timeout during which
    the connection remains idle. However, if not enough duplicate ACKs
    arrive from the receiver, the Fast Retransmit algorithm is never
    triggered---this situation occurs when the congestion window is
    small, if a large number of segments in a window are lost or at the
    end of a transfer as data drains from the network. For instance,
    consider a congestion window (cwnd) of three segments. If one
    segment is dropped by the network, then at most two duplicate
-   ACKs will arrive at the sender, assuming no ACK loss. Since three
-   duplicate ACKs are required to trigger Fast Retransmit, a timeout
-   will be required to resend the dropped packet.
+   ACKs will arrive at the sender. Since three duplicate ACKs are
+   required to trigger Fast Retransmit, a timeout will be required to
+   resend the dropped packet.

    [BPS+98] shows that roughly 56% of retransmissions sent by a busy web
    server are sent after the RTO timer expires, while only 44% are
    handled by Fast Retransmit.
    In addition, only 4% of the RTO
    timer-based retransmissions could have been avoided with SACK, which
    has to continue to disambiguate reordering from genuine loss.
    Furthermore, [All00] shows that for one particular web server the
    median transfer size is less than four segments, indicating that more
    than half of the connections will be forced to rely on the RTO timer
    to recover from any losses that occur. Thus, loss recovery
@@ -164,34 +170,39 @@
    segments, but rather the number of outstanding bytes or messages.
    Therefore, applying the intuitive notion of a transport with less
    than four segments outstanding is more complicated than it first
    appears. In section 2.1 we describe a "byte-based" variant of Early
    Retransmit that attempts to roughly map the number of outstanding
    bytes to a number of outstanding packets that is then used when
    deciding whether to trigger Early Retransmit. In section 2.2 we
    describe a "packet-based" variant that represents a more precise
    algorithm for triggering Early Retransmit. The precision comes at
    the cost of requiring additional state to be kept by the TCP sender.
-   In both cases we described SACK-based and non-SACK-based versions of
+   In both cases we describe SACK-based and non-SACK-based versions of
    the scheme (of course, the non-SACK version will not apply to SCTP).

 2.1 Byte-based Early Retransmit

    A TCP or SCTP sender MAY use byte-based Early Retransmit. A sender
    employing byte-based Early Retransmit MUST use the following two
    conditions to determine when an Early Retransmit is sent:

      (2.a) The amount of outstanding data (ownd)---data sent but not
            yet acknowledged---is less than 4*SMSS bytes.

+           (Note that in the byte-based variant of Early Retransmit
+           'ownd' is equivalent to 'FlightSize' defined in [RFC2581]. We
+           use different notation because 'ownd' is not consistent with
+           FlightSize through this document.)
+
      (2.b) There is either no unsent data ready for transmission at the
            sender or the advertised window does not permit new segments
            to be transmitted.

    When the above two conditions hold and the connection does not
    support SACK the duplicate ACK threshold used to trigger a
    retransmission MUST be reduced to:

        ER_thresh = ceiling (ownd/SMSS) - 1                          (1)
@@ -270,26 +280,27 @@
    for instance. Cumulative ACKs that do not fall within this region
    indicate that at least four segments are outstanding and therefore
    Early Retransmit MUST NOT be used. When the outstanding window
    becomes small enough that Early Retransmit can be invoked, a full
    understanding of the number of outstanding packets will be available
    from the four sequence numbers retained.

 3 Discussion

    The SACK variant of the Early Retransmit algorithm is preferred to
-   the non-SACK variant due to its robustness in the face of ACK loss
-   (since SACKs are sent redundantly) and due to interactions with the
-   delayed ACK timer. Consider a flight of three segments, S1...S3,
-   with S2 being dropped by the network. When S1 arrives it is
-   in-order and so the receiver may or may not delay the ACK, leading
-   to two scenarios:
+   the non-SACK variant in TCP due to its robustness in the face of ACK
+   loss (since SACKs are sent redundantly) and due to interactions with
+   the delayed ACK timer (SCTP does not have a non-SACK mode and
+   therefore naturally supports SACK-based Early Retransmit). Consider
+   a flight of three segments, S1...S3, with S2 being dropped by the
+   network. When S1 arrives it is in-order and so the receiver may or
+   may not delay the ACK, leading to two scenarios:

    (A) The ACK for S1 is delayed: In this case the arrival of S3 will
        trigger an ACK to be transmitted covering segment S1 (which was
        previously unacknowledged). In this case Early Retransmit
        without SACK will not prevent an RTO because no duplicate ACKs
        will arrive.
        However, with SACK the ACK for S1 will also
        include SACK information indicating that S3 has arrived at the
        receiver. The sender can then invoke Early Retransmit on this
        ACK because only one packet remains outstanding.

@@ -355,22 +366,23 @@
    still occur, ECN may allow a transport to perform better with small
    cwnd sizes because the sender will be required to detect less segment
    loss [RFC2884].

    [Bal98] outlines another solution to the problem of having no new
    segments to transmit into the network when the first two duplicate
    ACKs arrive. In response to these duplicate ACKs, a TCP sender
    transmits zero-byte segments to induce additional duplicate ACKs.
    This method preserves the robustness of the standard Fast Retransmit
    algorithm at the cost of injecting segments into the network that do
-   not deliver any data (and, therefore are potentially wasting network
-   resources).
+   not deliver any data, and therefore are potentially wasting network
+   resources (at a time when there is a reasonable chance that the
+   resources are scarce).

 5 Security Considerations

    The security considerations found in [RFC2581] apply to this
    document. No additional security problems have been identified with
    Early Retransmit at this time.

 Acknowledgments

    We thank Sally Floyd for her feedback in discussions about Early
@@ -556,56 +569,10 @@
    being transmitted.

    MITIGATION A.3: Allow a connection to trigger Early Retransmit using
        the criteria given in section 2, in addition to a "small" timeout
        [Pax97]. For instance, a sender may have to wait for 2 duplicate
        ACKs and then T msec before Early Retransmit is invoked. The
        added time gives reordered acknowledgments time to arrive at the
        sender and avoid a needless retransmit. Designing a method for
        choosing an appropriate timeout is part of the research that
        would need to be involved in this scheme.
-
-Intellectual Property Statement
-
-   The IETF takes no position regarding the validity or scope of any
-   Intellectual Property Rights or other rights that might be claimed
-   to pertain to the implementation or use of the technology described
-   in this document or the extent to which any license under such
-   rights might or might not be available; nor does it represent that
-   it has made any independent effort to identify any such rights.
-   Information on the procedures with respect to rights in RFC
-   documents can be found in BCP 78 and BCP 79.
-
-   Copies of IPR disclosures made to the IETF Secretariat and any
-   assurances of licenses to be made available, or the result of an
-   attempt made to obtain a general license or permission for the use
-   of such proprietary rights by implementers or users of this
-   specification can be obtained from the IETF on-line IPR repository
-   at http://www.ietf.org/ipr.
-
-   The IETF invites any interested party to bring to its attention any
-   copyrights, patents or patent applications, or other proprietary
-   rights that may cover technology that may be required to implement
-   this standard. Please address the information to the IETF at
-   ietf-ipr@ietf.org.
-
-Disclaimer of Validity
-
-   This document and the information contained herein are provided on
-   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
-   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
-   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
-   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
-   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
-   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
-   FOR A PARTICULAR PURPOSE.
-
-Copyright Statement
-
-   Copyright (C) The IETF Trust (2008). This document is subject
-   to the rights, licenses and restrictions contained in BCP 78, and
-   except as set forth therein, the authors retain all their rights.
-
-Acknowledgment
-
-   Funding for the RFC Editor function is currently provided by the
-   Internet Society.
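
Illustration (not part of either draft revision): the non-SACK threshold in formula (1) of section 2.1 can be sketched in Python as below. This is a minimal sketch under stated assumptions. The function name er_thresh, the can_send_new_data flag standing in for condition (2.b), the fallback to the standard threshold of three duplicate ACKs, and the guard against ownd being zero are all choices made for this example rather than anything the draft specifies.

```python
DEFAULT_DUPACK_THRESHOLD = 3  # standard Fast Retransmit threshold


def er_thresh(ownd: int, smss: int, can_send_new_data: bool) -> int:
    """Duplicate ACK threshold for a non-SACK sender, byte-based variant.

    ownd: outstanding data in bytes (sent but not yet acknowledged).
    smss: sender maximum segment size in bytes.
    can_send_new_data: hypothetical flag; True when unsent data exists
        and the advertised window permits sending it.
    """
    if smss <= 0 or ownd < 0:
        raise ValueError("smss must be positive and ownd non-negative")

    # Condition (2.a): less than 4*SMSS bytes outstanding.
    # The ownd > 0 guard is an assumption added for this sketch.
    small_window = 0 < ownd < 4 * smss
    # Condition (2.b): no new segment can be transmitted.
    nothing_to_send = not can_send_new_data

    if small_window and nothing_to_send:
        segments = -(-ownd // smss)  # integer ceiling(ownd/SMSS)
        return segments - 1          # formula (1)
    return DEFAULT_DUPACK_THRESHOLD


if __name__ == "__main__":
    smss = 1460
    for segments in (2, 3, 4):
        print(segments, er_thresh(segments * smss, smss, can_send_new_data=False))
```

With three full-sized segments outstanding and nothing new to send, the sketch returns a threshold of 2, which lines up with the cwnd-of-three example in the introduction: two duplicate ACKs are the most that can arrive after a single loss.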