draft-ietf-tcpm-rfc4138bis-03.txt   draft-ietf-tcpm-rfc4138bis-04.txt 
Internet Engineering Task Force P. Sarolahti Internet Engineering Task Force P. Sarolahti
INTERNET-DRAFT Nokia Research Center INTERNET-DRAFT Nokia Research Center
draft-ietf-tcpm-rfc4138bis-03.txt M. Kojo draft-ietf-tcpm-rfc4138bis-04.txt M. Kojo
Intended status: Proposed Standard University of Helsinki Intended status: Proposed Standard University of Helsinki
Expires: March 2009 K. Yamamoto Updates: 4138 K. Yamamoto
M. Hata Expires: April 2009 M. Hata
NTT Docomo NTT Docomo
9 September 2008 30 October 2008
Forward RTO-Recovery (F-RTO): An Algorithm for Detecting Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
Spurious Retransmission Timeouts with TCP Spurious Retransmission Timeouts with TCP
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
skipping to change at page 1, line 38 skipping to change at page 1, line 38
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress." reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on March 2009. This Internet-Draft will expire on April 2009.
Abstract Abstract
The purpose of this document is to move the F-RTO functionality for
TCP in RFC 4138 from Experimental to Standards Track status. The F-
RTO support for SCTP in RFC 4138 remains with Experimental status.
See Appendix B for the differences between this document and RFC
4138.
Spurious retransmission timeouts cause suboptimal TCP performance Spurious retransmission timeouts cause suboptimal TCP performance
because they often result in unnecessary retransmission of the last because they often result in unnecessary retransmission of the last
window of data. This document describes the F-RTO detection window of data. This document describes the F-RTO detection
algorithm for detecting spurious TCP retransmission timeouts. F-RTO algorithm for detecting spurious TCP retransmission timeouts. F-RTO
is a TCP sender-only algorithm that does not require any TCP options is a TCP sender-only algorithm that does not require any TCP options
to operate. After retransmitting the first unacknowledged segment to operate. After retransmitting the first unacknowledged segment
triggered by a timeout, the F-RTO algorithm of the TCP sender triggered by a timeout, the F-RTO algorithm of the TCP sender
monitors the incoming acknowledgments to determine whether the monitors the incoming acknowledgments to determine whether the
timeout was spurious. It then decides whether to send new segments timeout was spurious. It then decides whether to send new segments
or retransmit unacknowledged segments. The algorithm effectively or retransmit unacknowledged segments. The algorithm effectively
skipping to change at page 3, line 13 skipping to change at page 3, line 13
improves TCP performance in the case of a spurious timeout. improves TCP performance in the case of a spurious timeout.
Table of Contents Table of Contents
1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Conventions and Terminology. . . . . . . . . . . . . . . 5 1.1. Conventions and Terminology. . . . . . . . . . . . . . . 5
2. Basic F-RTO Algorithm . . . . . . . . . . . . . . . . . . . . 5 2. Basic F-RTO Algorithm . . . . . . . . . . . . . . . . . . . . 5
2.1. The Algorithm. . . . . . . . . . . . . . . . . . . . . . 6 2.1. The Algorithm. . . . . . . . . . . . . . . . . . . . . . 6
2.2. Discussion . . . . . . . . . . . . . . . . . . . . . . . 8 2.2. Discussion . . . . . . . . . . . . . . . . . . . . . . . 8
3. SACK-Enhanced Version of the F-RTO Algorithm. . . . . . . . . 10 3. SACK-Enhanced Version of the F-RTO Algorithm. . . . . . . . . 10
3.1. The Algorithm. . . . . . . . . . . . . . . . . . . . . . 10
3.2. Discussion . . . . . . . . . . . . . . . . . . . . . . . 12
4. Taking Actions after Detecting Spurious RTO . . . . . . . . . 12 4. Taking Actions after Detecting Spurious RTO . . . . . . . . . 12
5. Evaluation of RFC 4138 and Differences to this 5. Evaluation of RFC 4138. . . . . . . . . . . . . . . . . . . . 13
Document . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
6. Security Considerations . . . . . . . . . . . . . . . . . . . 14 6. Security Considerations . . . . . . . . . . . . . . . . . . . 14
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15
8. Acknowledgements. . . . . . . . . . . . . . . . . . . . . . . 15 8. Acknowledgements. . . . . . . . . . . . . . . . . . . . . . . 15
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
A. Discussion of Window-Limited Cases. . . . . . . . . . . . . . 15 A. Discussion of Window-Limited Cases. . . . . . . . . . . . . . 15
B. List of Changes . . . . . . . . . . . . . . . . . . . . . . . 16 B. Changes since RFC 4138. . . . . . . . . . . . . . . . . . . . 16
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 References . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Normative References . . . . . . . . . . . . . . . . . . . . . . 17 Normative References . . . . . . . . . . . . . . . . . . . . . . 17
Informative References . . . . . . . . . . . . . . . . . . . . . 17 Informative References . . . . . . . . . . . . . . . . . . . . . 17
AUTHORS' ADDRESSES . . . . . . . . . . . . . . . . . . . . . . . 19 AUTHORS' ADDRESSES . . . . . . . . . . . . . . . . . . . . . . . 19
Full Copyright Statement . . . . . . . . . . . . . . . . . . . . 21 Full Copyright Statement . . . . . . . . . . . . . . . . . . . . 21
Intellectual Property. . . . . . . . . . . . . . . . . . . . . . 21 Intellectual Property. . . . . . . . . . . . . . . . . . . . . . 21
1. Introduction 1. Introduction
The Transmission Control Protocol (TCP) [Pos81] has two methods for The Transmission Control Protocol (TCP) [Pos81] has two methods for
triggering retransmissions. First, the TCP sender relies on triggering retransmissions. First, the TCP sender relies on
incoming duplicate ACKs, which indicate that the receiver is missing incoming duplicate ACKs, which indicate that the receiver is missing
some of the data. After a required number of successive duplicate some of the data. After a required number of successive duplicate
ACKs have arrived at the sender, it retransmits the first ACKs have arrived at the sender, it retransmits the first
unacknowledged segment [APS99] and continues with a loss recovery unacknowledged segment [APS99] and continues with a loss recovery
algorithm such as NewReno [FHG04] or SACK-based loss recovery algorithm such as NewReno [FHG04] or SACK-based loss recovery
[BAFW03]. Second, the TCP sender maintains a retransmission timer [BAFW03]. Second, the TCP sender maintains a retransmission timer
which triggers retransmission of segments, if they have not been which triggers retransmission of segments, if they have not been
acknowledged before the retransmission timeout (RTO) expires. When acknowledged before the retransmission timeout (RTO) occurs. When
the retransmission timeout occurs, the TCP sender enters the RTO the retransmission timeout occurs, the TCP sender enters the RTO
recovery where the congestion window is initialized to one segment recovery where the congestion window is initialized to one segment
and unacknowledged segments are retransmitted using the slow-start and unacknowledged segments are retransmitted using the slow-start
algorithm. The retransmission timer is adjusted dynamically, based algorithm. The retransmission timer is adjusted dynamically, based
on the measured round-trip times [PA00]. on the measured round-trip times [PA00].
It has been pointed out that the retransmission timer can expire It has been pointed out that the retransmission timer can expire
spuriously and cause unnecessary retransmissions when no segments spuriously and cause unnecessary retransmissions when no segments
have been lost [LK00, GL02, LM03]. After a spurious retransmission have been lost [LK00, GL02, LM03]. After a spurious retransmission
timeout, the late acknowledgments of the original segments arrive at timeout, the late acknowledgments of the original segments arrive at
skipping to change at page 4, line 26 skipping to change at page 4, line 27
the current RTO value. Third, on a low-bandwidth link the arrival the current RTO value. Third, on a low-bandwidth link the arrival
of competing traffic (possibly with higher priority), or some other of competing traffic (possibly with higher priority), or some other
change in available bandwidth, can cause a sudden increase of the change in available bandwidth, can cause a sudden increase of the
round-trip time. This may trigger a spurious retransmission round-trip time. This may trigger a spurious retransmission
timeout. A persistently reliable link layer can also cause a sudden timeout. A persistently reliable link layer can also cause a sudden
delay when a data frame and several retransmissions of it are lost delay when a data frame and several retransmissions of it are lost
for some reason. This document does not distinguish between the for some reason. This document does not distinguish between the
different causes of such a delay spike. Rather, it discusses the different causes of such a delay spike. Rather, it discusses the
spurious retransmission timeouts caused by a delay spike in general. spurious retransmission timeouts caused by a delay spike in general.
This document describes the F-RTO detection algorithm. It is based This document describes the F-RTO detection algorithm for TCP. It is
on the detection mechanism of the "Forward RTO-Recovery" (F-RTO) based on the detection mechanism of the "Forward RTO-Recovery" (F-
algorithm [SKR03] that is used for detecting spurious retransmission RTO) algorithm [SKR03] that is used for detecting spurious
timeouts and thus avoids unnecessary retransmissions following the retransmission timeouts and thus avoids unnecessary retransmissions
retransmission timeout. When the timeout is not spurious, the F-RTO following the retransmission timeout. When the timeout is not
algorithm reverts back to the conventional RTO recovery algorithm, spurious, the F-RTO algorithm reverts back to the conventional RTO
and therefore has similar behavior and performance. In contrast to recovery algorithm, and therefore has similar behavior and
alternative algorithms proposed for detecting unnecessary performance. In contrast to alternative algorithms proposed for
retransmissions (Eifel [LK00], [LM03] and DSACK-based algorithms detecting unnecessary retransmissions (Eifel [LK00], [LM03] and
[BA04]), F-RTO does not require any TCP options for its operation, DSACK-based algorithms [BA04]), F-RTO does not require any TCP
and it can be implemented by modifying only the TCP sender. The options for its operation, and it can be implemented by modifying
Eifel algorithm uses TCP timestamps [BBJ92] for detecting a spurious only the TCP sender. The Eifel algorithm uses TCP timestamps
timeout upon arrival of the first acknowledgment after the [BBJ92] for detecting a spurious timeout upon arrival of the first
retransmission. The DSACK-based algorithms require that the TCP acknowledgment after the retransmission. The DSACK-based algorithms
Selective Acknowledgment Option [MMFR96], with the DSACK extension require that the TCP Selective Acknowledgment Option [MMFR96], with
[FMMP00], is in use. With DSACK, the TCP receiver can report if it the DSACK extension [FMMP00], is in use. With DSACK, the TCP
has received a duplicate segment, enabling the sender to detect receiver can report if it has received a duplicate segment, enabling
afterwards whether it has retransmitted segments unnecessarily. The the sender to detect afterwards whether it has retransmitted
F-RTO algorithm only attempts to detect and avoid unnecessary segments unnecessarily. The F-RTO algorithm only attempts to detect
retransmissions after an RTO. Eifel and DSACK can also be used for and avoid unnecessary retransmissions after an RTO. Eifel and DSACK
detecting unnecessary retransmissions caused by other events, such can also be used for detecting unnecessary retransmissions caused by
as packet reordering. other events, such as packet reordering.
When an RTO expires, the F-RTO sender retransmits the first When the retransmission timer expires, the F-RTO sender retransmits
unacknowledged segment as usual [APS99]. Deviating from the normal the first unacknowledged segment as usual [APS99]. Deviating from
operation after a timeout, it then tries to transmit new, previously the normal operation after a timeout, it then tries to transmit new,
unsent data for the first acknowledgment that arrives after the previously unsent data for the first acknowledgment that arrives
timeout, given that the acknowledgment advances the window. If the after the timeout, given that the acknowledgment advances the
second acknowledgment that arrives after the timeout advances the window. If the second acknowledgment that arrives after the timeout
window (i.e., acknowledges data that was not retransmitted), the F- advances the window (i.e., acknowledges data that was not
RTO sender declares the timeout spurious and exits the RTO recovery. retransmitted), the F-RTO sender declares the timeout spurious and
However, if either of these two acknowledgments is a duplicate ACK, exits the RTO recovery. However, if either of these two
there will not be sufficient evidence of a spurious timeout. acknowledgments is a duplicate ACK, there will not be sufficient
Therefore, the F-RTO sender retransmits the unacknowledged segments evidence of a spurious timeout. Therefore, the F-RTO sender
in slow start similarly to the traditional algorithm. retransmits the unacknowledged segments in slow start similarly to
the traditional algorithm. With a SACK-enhanced version of the F-RTO
algorithm, spurious timeouts may be detected even if duplicate ACKs
arrive after an RTO retransmission.
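
The decision flow in the paragraph above can be sketched as follows. This is a minimal illustration only, not the normative algorithm: the names Ack, Verdict, and classify_timeout are hypothetical, and each acknowledgment is reduced to one of two kinds, so the finer conditions in the numbered steps of Section 2.1 are deliberately left out.

```python
from enum import Enum


class Ack(Enum):
    """Hypothetical two-way classification of an incoming acknowledgment."""
    ADVANCES_WINDOW = "advances window"
    DUPLICATE = "duplicate ACK"


class Verdict(Enum):
    """Outcome of the detection, mirroring the SpuriousRecovery values."""
    SPUR_TO = "timeout declared spurious"
    CONVENTIONAL = "conventional RTO recovery"


def classify_timeout(first_ack: Ack, second_ack: Ack) -> Verdict:
    """Classify a timeout from the first two ACKs after the RTO retransmission.

    Simplification (assumption): "advances the window" on the second ACK is
    taken to mean it acknowledges data that was not retransmitted.
    """
    if first_ack is Ack.DUPLICATE:
        # Not enough evidence: retransmit in slow start as usual.
        return Verdict.CONVENTIONAL
    # First ACK advanced the window: the sender would transmit new,
    # previously unsent data here instead of retransmitting.
    if second_ack is Ack.DUPLICATE:
        return Verdict.CONVENTIONAL
    # Both ACKs advanced the window: declare the timeout spurious.
    return Verdict.SPUR_TO
```

For example, classify_timeout(Ack.ADVANCES_WINDOW, Ack.DUPLICATE) yields Verdict.CONVENTIONAL, because a duplicate ACK in either position rules out a spurious declaration in the basic algorithm.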
With a SACK-enhanced version of the F-RTO algorithm, spurious This document specifies the F-RTO algorithm for TCP only, replacing
timeouts may be detected even if duplicate ACKs arrive after an RTO the F-RTO functionality with TCP in RFC 4138 [SK05] and moving it
retransmission. Even though this document only specifies the F-RTO from Experimental to Standards Track status. The algorithm can also
algorithm for TCP, the algorithm can also be applied to the Stream be applied to the Stream Control Transmission Protocol (SCTP)
Control Transmission Protocol (SCTP) [Ste07] that has acknowledgment [Ste07] that has acknowledgment and packet retransmission concepts
and packet retransmission concepts similar to TCP. Considerations on similar to TCP. The considerations on applying F-RTO to SCTP are
applying F-RTO for SCTP are discussed in RFC 4138 [SK05]. discussed in RFC 4138, but the F-RTO support for SCTP remains with
Experimental status.
This document is organized as follows. Section 2 describes the This document is organized as follows. Section 2 describes the basic
basic F-RTO algorithm, and the SACK-enhanced F-RTO algorithm is F-RTO algorithm, and the SACK-enhanced F-RTO algorithm is given in
given in Section 3. Section 4 discusses the possible actions to be Section 3. Section 4 discusses the possible actions to be taken
taken after detecting a spurious RTO and Section 6 discusses the after detecting a spurious RTO and Section 6 discusses the security
security considerations. considerations.
1.1. Conventions and Terminology 1.1. Conventions and Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119 document are to be interpreted as described in BCP 14, RFC 2119
[RFC2119] and indicate requirement levels for protocols. [RFC2119] and indicate requirement levels for protocols.
2. Basic F-RTO Algorithm 2. Basic F-RTO Algorithm
A timeout is considered spurious if it would have been avoided had A timeout is considered spurious if it would have been avoided had
the sender waited longer for an acknowledgment to arrive [LM03]. F- the sender waited longer for an acknowledgment to arrive [LM03]. F-
RTO affects the TCP sender behavior only after a retransmission RTO affects the TCP sender behavior only after a retransmission
timeout. Otherwise, the TCP behavior remains the same. When the timeout. Otherwise, the TCP behavior remains the same. When the
RTO expires, the F-RTO algorithm monitors incoming acknowledgments retransmission timer expires, the F-RTO algorithm monitors incoming
and if the TCP sender gets an acknowledgment for a segment that was acknowledgments and if the TCP sender gets an acknowledgment for a
not retransmitted due to timeout, the F-RTO algorithm declares a segment that was not retransmitted due to the timeout, the F-RTO
timeout spurious. The actions taken in response to a spurious algorithm declares a timeout spurious. The actions taken in response
timeout are not specified in this document, but we discuss some to a spurious timeout are not specified in this document, but we
alternatives in Section 4. This section introduces the algorithm discuss some alternatives in Section 4. This section introduces the
and then discusses the different steps of the algorithm in more algorithm and then discusses the different steps of the algorithm in
detail. more detail.
Following the practice used with the Eifel Detection algorithm Following the practice used with the Eifel Detection algorithm
[LM03], we use the "SpuriousRecovery" variable to indicate whether [LM03], we use the "SpuriousRecovery" variable to indicate whether
the retransmission is declared spurious by the sender. This variable the retransmission is declared spurious by the sender. This variable
can be used as an input for a corresponding response algorithm. With can be used as an input for a corresponding response algorithm. With
F-RTO, the value of SpuriousRecovery can be either SPUR_TO F-RTO, the value of SpuriousRecovery can be either SPUR_TO
(indicating a spurious retransmission timeout) or FALSE (indicating (indicating a spurious retransmission timeout) or FALSE (indicating
that the timeout is not declared spurious), and the TCP sender that the timeout is not declared spurious and the TCP sender should
should follow the conventional RTO recovery algorithm. In addition, follow the conventional RTO recovery algorithm). In addition, we use
we use the "recover" variable specified in the NewReno algorithm the "recover" variable specified in the NewReno algorithm [FHG04].
[FHG04].
2.1. The Algorithm 2.1. The Algorithm
A TCP sender implementing the basic F-RTO algorithm MUST take the A TCP sender implementing the basic F-RTO algorithm MUST take the
following steps after the retransmission timer expires. If the following steps after the retransmission timer expires. If the
retransmission timer expires again during the execution of the F-RTO retransmission timer expires again during the execution of the F-RTO
algorithm, the TCP sender MUST re-start the algorithm processing algorithm, the TCP sender MUST re-start the algorithm processing
from step 1. If the sender implements some loss recovery algorithm from step 1. If the sender implements some loss recovery algorithm
other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT
be entered when earlier fast recovery is underway. be entered when earlier fast recovery is underway.
The F-RTO algorithm takes different actions based on whether an The F-RTO algorithm takes different actions based on whether an
incoming acknowledgement advances the cumulative acknowledgement incoming acknowledgement advances the cumulative acknowledgement
point for a received in-order segment, or whether it is a duplicate point for a received in-order segment, or whether it is a duplicate
acknowledgement to indicate an out-of-order segment. Duplicate acknowledgement to indicate an out-of-order segment. Duplicate
acknowledgement is defined in [APB08]. The F-RTO algorithm does not acknowledgement is defined in [APB08]. The F-RTO algorithm does not
specify actions for receiving a segment that does not acknowledge specify actions for receiving a segment that neither acknowledges
new data but is not a duplicate acknowledgement. The TCP sender new data nor is a duplicate acknowledgement. The TCP sender SHOULD
SHOULD ignore such segments and wait for a segment that either ignore such segments and wait for a segment that either acknowledges
acknowledges new data or is a duplicate acknowledgment. new data or is a duplicate acknowledgment.
1) When RTO expires, retransmit the first unacknowledged segment and 1) When the retransmission timer expires, retransmit the first
set SpuriousRecovery to FALSE. If the TCP sender is already in unacknowledged segment and set SpuriousRecovery to FALSE. If the
RTO recovery AND "recover" is larger than or equal to SND.UNA TCP sender is already in RTO recovery AND "recover" is larger
(the oldest unacknowledged sequence number [Pos81]), do not enter than or equal to SND.UNA (the oldest unacknowledged sequence
step 2 of this algorithm. Instead, store the highest sequence number [Pos81]), do not enter step 2 of this algorithm. Instead,
number transmitted so far in variable "recover" and continue with store the highest sequence number transmitted so far in variable
slow start retransmissions following the conventional RTO "recover" and continue with slow start retransmissions following
recovery algorithm. the conventional RTO recovery algorithm.
2) When the first acknowledgment after the RTO retransmission 2) When the first acknowledgment after the RTO retransmission
arrives at the TCP sender, store the highest sequence number arrives at the TCP sender, store the highest sequence number
transmitted so far in variable "recover". The TCP sender chooses transmitted so far in variable "recover". The TCP sender chooses
one of the following actions, depending on whether the ACK one of the following actions, depending on whether the ACK
advances the window or whether it is a duplicate ACK. advances the window or whether it is a duplicate ACK.
a) If the acknowledgment is a duplicate ACK OR the a) If the acknowledgment is a duplicate ACK OR the
Acknowledgement field covers "recover" but not more than Acknowledgement field covers "recover" but not more than
"recover" OR the acknowledgment does not acknowledge all of "recover" OR the acknowledgment does not acknowledge all of
skipping to change at page 8, line 17 skipping to change at page 8, line 19
The F-RTO sender takes cautious actions when it receives duplicate The F-RTO sender takes cautious actions when it receives duplicate
acknowledgments after a retransmission timeout. Because duplicate acknowledgments after a retransmission timeout. Because duplicate
ACKs may indicate that segments have been lost, reliably detecting a ACKs may indicate that segments have been lost, reliably detecting a
spurious timeout is difficult due to the lack of additional spurious timeout is difficult due to the lack of additional
information. Therefore, it is prudent to follow the conventional information. Therefore, it is prudent to follow the conventional
TCP recovery in those cases. TCP recovery in those cases.
The condition in step 1 prevents the execution of the F-RTO The condition in step 1 prevents the execution of the F-RTO
algorithm in case a previous RTO recovery is underway when the algorithm in case a previous RTO recovery is underway when the
retransmission timer expires, except in case the retransmission retransmission timer expires, except in case the retransmission
timer expires multiple times for the same segment. If RTO expires timer expires multiple times for the same segment. If the
during an earlier RTO-based loss recovery, acknowledgements for retransmission timer expires during an earlier RTO-based loss
retransmitted segments may falsely lead the TCP sender to declare recovery, acknowledgements for retransmitted segments may falsely
the timeout spurious. lead the TCP sender to declare the timeout spurious.
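
The step 1 condition discussed above can be written as a small guard. This is a sketch under stated assumptions: may_enter_step_2 and in_rto_recovery are hypothetical names, and sequence numbers are treated as plain integers, so 32-bit sequence number wraparound is ignored.

```python
def may_enter_step_2(in_rto_recovery: bool, recover: int, snd_una: int) -> bool:
    """Return True if the sender may proceed to step 2 of the algorithm.

    Step 1 says not to enter step 2 when the sender is already in RTO
    recovery AND "recover" is larger than or equal to SND.UNA.

    Assumption: plain integer comparison, no sequence-number wraparound.
    """
    return not (in_rto_recovery and recover >= snd_una)
```

When the guard returns False, the sender would store the highest sequence number transmitted so far in "recover" and continue with conventional slow start retransmissions, as step 1 describes.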
If the first acknowledgment after the RTO retransmission covers the If the first acknowledgment after the RTO retransmission covers the
"recover" point at algorithm step (2a), there is not enough evidence "recover" point at algorithm step (2a), there is not enough evidence
that a non-retransmitted segment has arrived at the receiver after that a non-retransmitted segment has arrived at the receiver after
the timeout. This is a common case when a fast retransmission is the timeout. This is a common case when a fast retransmission is
lost and has been retransmitted again after an RTO, while the rest lost and has been retransmitted again after an RTO, while the rest
of the unacknowledged segments were successfully delivered to the of the unacknowledged segments were successfully delivered to the
TCP receiver before the retransmission timeout. Therefore, the TCP receiver before the retransmission timeout. Therefore, the
timeout cannot be declared spurious in this case. timeout cannot be declared spurious in this case.
skipping to change at page 9, line 5 skipping to change at page 9, line 9
transfer from stalling. If no segments were sent, the pipe between transfer from stalling. If no segments were sent, the pipe between
sender and receiver might run out of segments, and no further sender and receiver might run out of segments, and no further
acknowledgments would arrive. Therefore, in the window-limited acknowledgments would arrive. Therefore, in the window-limited
case, the recommendation is to revert to the conventional RTO case, the recommendation is to revert to the conventional RTO
recovery with slow start retransmissions. Appendix A discusses some recovery with slow start retransmissions. Appendix A discusses some
alternative solutions for window-limited situations. alternative solutions for window-limited situations.
If the retransmission timeout is declared spurious, the TCP sender If the retransmission timeout is declared spurious, the TCP sender
sets the value of the "recover" variable to SND.UNA in order to sets the value of the "recover" variable to SND.UNA in order to
allow fast retransmit [FHG04]. The "recover" variable was proposed allow fast retransmit [FHG04]. The "recover" variable was proposed
for avoiding unnecessary, multiple fast retransmits when RTO expires for avoiding unnecessary, multiple fast retransmits when the
during fast recovery with NewReno TCP. Because the F-RTO sender retransmission timer expires during fast recovery with NewReno TCP.
retransmits only the segment that triggered the timeout, the problem Because the F-RTO sender retransmits only the segment that triggered
of unnecessary multiple fast retransmits [FHG04] cannot occur. the timeout, the problem of unnecessary multiple fast retransmits
Therefore, if three duplicate ACKs arrive at the sender after the [FHG04] cannot occur. Therefore, if three duplicate ACKs arrive at
timeout, they probably indicate a packet loss, and thus fast the sender after the timeout, they probably indicate a packet loss,
retransmit should be used to allow efficient recovery. If there are and thus fast retransmit should be used to allow efficient recovery.
not enough duplicate ACKs arriving at the sender after a packet If there are not enough duplicate ACKs arriving at the sender after
loss, the retransmission timer expires again and the sender enters a packet loss, the retransmission timer expires again and the sender
step 1 of this algorithm. enters step 1 of this algorithm.
When the timeout is declared spurious, the TCP sender cannot detect When the timeout is declared spurious, the TCP sender cannot detect
whether the unnecessary RTO retransmission was lost. In principle, whether the unnecessary RTO retransmission was lost. In principle,
the loss of the RTO retransmission should be taken as a congestion the loss of the RTO retransmission should be taken as a congestion
signal. Thus, there is a small possibility that the F-RTO sender signal. Thus, there is a small possibility that the F-RTO sender
will violate the congestion control rules, if it chooses to fully will violate the congestion control rules, if it chooses to fully
revert congestion control parameters after detecting a spurious revert congestion control parameters after detecting a spurious
timeout. The Eifel detection algorithm has a similar property, timeout. The Eifel detection algorithm has a similar property,
while the DSACK option can be used to detect whether the while the DSACK option can be used to detect whether the
retransmitted segment was successfully delivered to the receiver. retransmitted segment was successfully delivered to the receiver.
skipping to change at page 10, line 15 skipping to change at page 10, line 16
3. SACK-Enhanced Version of the F-RTO Algorithm 3. SACK-Enhanced Version of the F-RTO Algorithm
This section describes an alternative version of the F-RTO algorithm This section describes an alternative version of the F-RTO algorithm
that uses the TCP Selective Acknowledgment Option [MMFR96]. By that uses the TCP Selective Acknowledgment Option [MMFR96]. By
using the SACK option, the TCP sender detects spurious timeouts in using the SACK option, the TCP sender detects spurious timeouts in
most of the cases when packet reordering or packet duplication is most of the cases when packet reordering or packet duplication is
present. If the SACK blocks acknowledge new data that was not present. If the SACK blocks acknowledge new data that was not
transmitted after the RTO retransmission, the sender may declare the transmitted after the RTO retransmission, the sender may declare the
timeout spurious, even when duplicate ACKs follow the RTO. timeout spurious, even when duplicate ACKs follow the RTO.
3.1. The Algorithm
Given that the TCP Selective Acknowledgment Option [MMFR96] is Given that the TCP Selective Acknowledgment Option [MMFR96] is
enabled for a TCP connection, a TCP sender MAY implement the SACK- enabled for a TCP connection, a TCP sender MAY apply the SACK-
enhanced F-RTO algorithm. If the sender applies the SACK-enhanced enhanced F-RTO algorithm. If the sender applies the SACK-enhanced
F-RTO algorithm, it MUST follow the steps below. This algorithm F-RTO algorithm, it MUST follow the steps below. This algorithm
SHOULD NOT be applied if the TCP sender is already in loss recovery SHOULD NOT be applied if the TCP sender is already in loss recovery
when retransmission timeout occurs. when a retransmission timeout occurs.
The steps of the SACK-enhanced version of the F-RTO algorithm are as The steps of the SACK-enhanced version of the F-RTO algorithm are as
follows. If the retransmission timer expires again during the follows. If the retransmission timer expires again during the
execution of the SACK-enhanced F-RTO algorithm, the TCP sender MUST execution of the SACK-enhanced F-RTO algorithm, the TCP sender MUST
re-start the algorithm processing from step 1. re-start the algorithm processing from step 1.
1) When the RTO expires, retransmit the first unacknowledged segment 1) When the retransmission timer expires, retransmit the first
and set SpuriousRecovery to FALSE. Following the recommendation unacknowledged segment and set SpuriousRecovery to FALSE.
in SACK specification [MMFR96], reset the SACK scoreboard. If Following the recommendation in the SACK specification [MMFR96],
"RecoveryPoint" is larger than or equal to SND.UNA, do not enter reset the SACK scoreboard. If "RecoveryPoint" is larger than or
step 2 of this algorithm. Instead, set variable "RecoveryPoint" equal to SND.UNA, do not enter step 2 of this algorithm. Instead,
to indicate the highest sequence number transmitted so far and set variable "RecoveryPoint" to indicate the highest sequence
continue with slow start retransmissions following the number transmitted so far and continue with slow start
conventional RTO recovery algorithm. retransmissions following the conventional RTO recovery
algorithm.
2) Wait until the acknowledgment of the data retransmitted due to 2) Wait until the acknowledgment of the data retransmitted due to
the timeout arrives at the sender. If duplicate ACKs arrive the timeout arrives at the sender. If duplicate ACKs arrive
before the cumulative acknowledgment for retransmitted data, before the cumulative acknowledgment for retransmitted data,
adjust the scoreboard according to the incoming SACK information. adjust the scoreboard according to the incoming SACK information.
Stay in step 2 and wait for the next new acknowledgment. If RTO Stay in step 2 and wait for the next new acknowledgment. If the
expires again, go to step 1 of the algorithm. When a new retransmission timeout expires again, go to step 1 of the
acknowledgment arrives, set variable "RecoveryPoint" to indicate algorithm. When a new acknowledgment arrives, set variable
the highest sequence number transmitted so far. "RecoveryPoint" to indicate the highest sequence number
transmitted so far.
a) If the Cumulative Acknowledgement field covers "RecoveryPoint" a) If the Cumulative Acknowledgement field covers "RecoveryPoint"
but not more than "RecoveryPoint", revert to the conventional but not more than "RecoveryPoint", revert to the conventional
RTO recovery and set the congestion window to no more than 2 * RTO recovery and set the congestion window to no more than 2 *
MSS, like a regular TCP would do. Do not enter step 3 of this MSS, like a regular TCP would do. Do not enter step 3 of this
algorithm. algorithm.
b) Else, if the Cumulative Acknowledgement field does not cover b) Else, if the Cumulative Acknowledgement field does not cover
"RecoveryPoint" but is larger than SND.UNA, transmit up to two "RecoveryPoint" but is larger than SND.UNA, transmit up to two
new (previously unsent) segments and proceed to step 3. If new (previously unsent) segments and proceed to step 3. If
skipping to change at page 11, line 44 skipping to change at page 12, line 6
retransmission timeout can be declared spurious, because the retransmission timeout can be declared spurious, because the
segment acknowledged with this ACK was transmitted before the segment acknowledged with this ACK was transmitted before the
timeout. timeout.
If there are unacknowledged holes between the received SACK blocks, If there are unacknowledged holes between the received SACK blocks,
those segments are retransmitted similarly to the conventional SACK those segments are retransmitted similarly to the conventional SACK
recovery algorithm [BAFW03]. If the algorithm exits with recovery algorithm [BAFW03]. If the algorithm exits with
SpuriousRecovery set to SPUR_TO, "RecoveryPoint" is set to SND.UNA, SpuriousRecovery set to SPUR_TO, "RecoveryPoint" is set to SND.UNA,
thus allowing fast recovery on incoming duplicate acknowledgments. thus allowing fast recovery on incoming duplicate acknowledgments.
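The branching in step 2 can be sketched as follows. This is an illustrative Python sketch, not part of either draft revision. It covers only the parts of step 2 visible in this excerpt (duplicate ACKs, branch (a), and the opening of branch (b)). The names Sender, snd_max, Step2Action, and on_ack_in_step_2 are hypothetical, sequence numbers are treated as plain integers without wraparound, and "covers RecoveryPoint but not more" is read as the cumulative ACK being equal to RecoveryPoint.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Step2Action(Enum):
    STAY_IN_STEP_2 = auto()             # duplicate ACK: keep waiting
    CONVENTIONAL_RECOVERY = auto()      # branch (a)
    SEND_NEW_AND_GO_TO_STEP_3 = auto()  # branch (b)


@dataclass
class Sender:
    snd_una: int         # SND.UNA, oldest unacknowledged sequence number
    snd_max: int         # hypothetical name: highest sequence number sent so far
    recovery_point: int  # "RecoveryPoint" from the draft text
    mss: int
    cwnd: int


def on_ack_in_step_2(s: Sender, cum_ack: int) -> Step2Action:
    """Classify an ACK that arrives while the sender waits in step 2."""
    if cum_ack <= s.snd_una:
        # Duplicate ACK: the SACK scoreboard would be updated here.
        return Step2Action.STAY_IN_STEP_2

    # A new acknowledgment arrived: record the highest sequence sent so far.
    s.recovery_point = s.snd_max

    if cum_ack == s.recovery_point:
        # Branch (a): everything outstanding was acknowledged at once.
        s.cwnd = min(s.cwnd, 2 * s.mss)
        s.snd_una = cum_ack
        return Step2Action.CONVENTIONAL_RECOVERY

    if cum_ack < s.recovery_point:
        # Branch (b): the caller transmits up to two previously unsent
        # segments and continues with step 3.
        s.snd_una = cum_ack
        return Step2Action.SEND_NEW_AND_GO_TO_STEP_3

    # The excerpt does not describe ACKs beyond the highest sequence sent.
    raise ValueError("ACK beyond highest sequence number sent")
```

The function only classifies the ACK and updates local state. Transmission of new segments, scoreboard handling, and step 3 are left to the caller because the excerpt does not show them in full.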
3.2. Discussion
The SACK enhanced algorithm works on the same principle as the basic The SACK enhanced algorithm works on the same principle as the basic
algorithm, but by utilizing the additional information from the SACK algorithm, but by utilizing the additional information from the SACK
option. When a genuine retransmission timeout occurs during a steady option. When a genuine retransmission timeout occurs during a steady
state of a connection, it can be assumed that there are no segments state of a connection, it can be assumed that there are no segments
left in the pipe. Otherwise, the acknowledgments triggered by these left in the pipe. Otherwise, the acknowledgments triggered by these
segments would have triggered the SACK loss recovery or transmission segments would have triggered the SACK loss recovery or transmission
of new segments. Therefore, if the F-RTO sender receives of new segments. Therefore, if the F-RTO sender receives
acknowledgements for segments transmitted before the retransmission acknowledgements for segments transmitted before the retransmission
timeout in response to the two new segments sent at the algorithm timeout in response to the two new segments sent at the algorithm
step 2, the normal operation of TCP has been just delayed, and the step 2, the normal operation of TCP has been just delayed, and the
retransmission timeout is considered spurious. Note that this retransmission timeout is considered spurious. Note that this
reasoning works only when the TCP sender is not in loss recovery at reasoning works only when the TCP sender is not in loss recovery at
the time the retransmission timeout occurs. The condition in step 1 the time the retransmission timeout occurs. The condition in step 1
checking that "RecoveryPoint" is larger than SND.UNA prevents the checking that "RecoveryPoint" is larger than or equal to SND.UNA
execution of the F-RTO algorithm in case a previous loss recovery, prevents the execution of the F-RTO algorithm in case a previous
either RTO recovery or SACK loss recovery, is underway when the loss recovery, either RTO recovery or SACK loss recovery, is
retransmission timer expires. It, however, allows the execution of underway when the retransmission timer expires. It, however, allows
the F-RTO algorithm, if the retransmission timer expires multiple the execution of the F-RTO algorithm, if the retransmission timer
times for the same segment. expires multiple times for the same segment.
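The effect of the step 1 condition can be illustrated with a short sketch. This is a hedged Python illustration, not text from either draft revision. RtoState, snd_max, and on_rto_expiry are hypothetical names, sequence numbers are plain integers without wraparound, and it is assumed that "RecoveryPoint" is below SND.UNA whenever no loss recovery is underway. Because step 2 updates "RecoveryPoint" only when a new acknowledgment arrives, repeated timeouts for the same segment leave it unchanged and F-RTO is attempted again.

```python
from dataclasses import dataclass


@dataclass
class RtoState:
    snd_una: int         # SND.UNA
    snd_max: int         # hypothetical name: highest sequence number sent so far
    recovery_point: int  # "RecoveryPoint" from the draft text
    spurious_recovery: bool = False


def on_rto_expiry(st: RtoState) -> bool:
    """Apply the step 1 check; return True if the sender enters step 2."""
    st.spurious_recovery = False
    # Retransmitting the first unacknowledged segment and resetting the
    # SACK scoreboard would happen here.
    if st.recovery_point >= st.snd_una:
        # An earlier loss recovery is underway: fall back to the
        # conventional RTO recovery and remember the highest sequence sent.
        st.recovery_point = st.snd_max
        return False
    return True


# Sender not in loss recovery: F-RTO runs, also on a repeated timeout
# for the same segment (no new ACK arrived in between).
idle = RtoState(snd_una=1000, snd_max=5000, recovery_point=999)
first = on_rto_expiry(idle)
second = on_rto_expiry(idle)

# Sender already in loss recovery: F-RTO is skipped.
recovering = RtoState(snd_una=1000, snd_max=5000, recovery_point=3000)
skipped = on_rto_expiry(recovering)
```

In this sketch, first and second are both True, while skipped is False and recovering.recovery_point has been advanced to snd_max.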
4. Taking Actions after Detecting Spurious RTO 4. Taking Actions after Detecting Spurious RTO
Upon a retransmission timeout, a conventional TCP sender assumes Upon a retransmission timeout, a conventional TCP sender assumes
that outstanding segments are lost and starts retransmitting the that outstanding segments are lost and starts retransmitting the
unacknowledged segments. When the retransmission timeout is unacknowledged segments. When the retransmission timeout is
detected to be spurious, the TCP sender should not continue detected to be spurious, the TCP sender should not continue
retransmitting based on the timeout. For example, if the sender was retransmitting based on the timeout. For example, if the sender was
in congestion avoidance phase transmitting new, previously unsent in congestion avoidance phase transmitting new, previously unsent
segments, it should continue transmitting previously unsent segments segments, it should continue transmitting previously unsent segments
skipping to change at page 12, line 37 skipping to change at page 13, line 5
There are currently two alternatives specified for a spurious There are currently two alternatives specified for a spurious
timeout response algorithm, the Eifel Response Algorithm [LG04], and timeout response algorithm, the Eifel Response Algorithm [LG04], and
an algorithm for adapting the retransmission timeout after a an algorithm for adapting the retransmission timeout after a
spurious RTO [BBA06]. If no specific response algorithm is spurious RTO [BBA06]. If no specific response algorithm is
implemented, the TCP SHOULD respond to spurious timeout implemented, the TCP SHOULD respond to spurious timeout
conservatively, applying the TCP congestion control specification conservatively, applying the TCP congestion control specification
[APS99]. Different response algorithms for spurious retransmission [APS99]. Different response algorithms for spurious retransmission
timeouts have been analyzed in some research papers [GL03, Sar03] timeouts have been analyzed in some research papers [GL03, Sar03]
and IETF documents [SL03]. and IETF documents [SL03].
5. Evaluation of RFC 4138 and Differences to this Document 5. Evaluation of RFC 4138
F-RTO was first specified in an Experimental RFC 4138 that has been F-RTO was first specified in an Experimental RFC 4138 that has been
implemented in a number of operating systems since it was published. implemented in a number of operating systems since it was published.
Gained experience has been documented in a separate document Gained experience has been documented in a separate document
[KYHS07], and can be summarized as follows. [KYHS07], and can be summarized as follows.
If the TCP sender employs F-RTO, it is able to detect spurious RTOs If the TCP sender employs F-RTO, it is able to detect spurious RTOs
and avoid the unnecessary retransmission of the whole window of and avoid the unnecessary retransmission of the whole window of
data. Because F-RTO avoids the unnecessary retransmissions after a data. Because F-RTO avoids the unnecessary retransmissions after a
spurious RTO, it is able to adhere to the packet conservation spurious RTO, it is able to adhere to the packet conservation
skipping to change at page 14, line 31 skipping to change at page 14, line 36
useful for the receiver to maliciously gain a larger congestion useful for the receiver to maliciously gain a larger congestion
window. window.
A common case for a retransmission timeout is that a fast A common case for a retransmission timeout is that a fast
retransmission of a segment is lost. If all other segments have retransmission of a segment is lost. If all other segments have
been received, the RTO retransmission causes the whole window to be been received, the RTO retransmission causes the whole window to be
acknowledged at once. This case is recognized in F-RTO algorithm acknowledged at once. This case is recognized in F-RTO algorithm
branch (2a). However, if the receiver only acknowledges one segment branch (2a). However, if the receiver only acknowledges one segment
after receiving the RTO retransmission, and then the rest of the after receiving the RTO retransmission, and then the rest of the
segments, it could cause the timeout to be declared spurious when it segments, it could cause the timeout to be declared spurious when it
is not. Therefore, it is suggested that, when an RTO expires during is not. Therefore, it is suggested that, when an RTO occurs during
the fast recovery phase, the sender would not fully revert the the fast recovery phase, the sender would not fully revert the
congestion window even if the timeout was declared spurious. congestion window even if the timeout was declared spurious.
Instead, the sender would reduce the congestion window to 1. Instead, the sender would reduce the congestion window to 1.
If there is more than one segment missing at the time of a If there is more than one segment missing at the time of a
retransmission timeout, the receiver does not benefit from retransmission timeout, the receiver does not benefit from
misleading the sender to declare a spurious timeout because the misleading the sender to declare a spurious timeout because the
sender would have to go through another recovery period to sender would have to go through another recovery period to
retransmit the missing segments, usually after an RTO has elapsed. retransmit the missing segments, usually after an RTO has elapsed.
skipping to change at page 16, line 23 skipping to change at page 16, line 30
encouraged. encouraged.
- In receiver-limited cases, send one octet of new data, regardless - In receiver-limited cases, send one octet of new data, regardless
of the advertised window limit, and continue with step 3 of the F- of the advertised window limit, and continue with step 3 of the F-
RTO algorithm. It is possible that the receiver will have free RTO algorithm. It is possible that the receiver will have free
buffer space to receive the data by the time the segment has buffer space to receive the data by the time the segment has
propagated through the network, in which case no harm is done. If propagated through the network, in which case no harm is done. If
the receiver is not capable of receiving the segment, it rejects the receiver is not capable of receiving the segment, it rejects
the segment and sends a duplicate ACK. the segment and sends a duplicate ACK.
B. List of Changes B. Changes since RFC 4138
Changes from RFC 4138 are summarized below, apart from minor editing Changes from RFC 4138 are summarized below, apart from minor editing
and language improvements. and language improvements.
* Modified the basic F-RTO algorithm and SACK-enhanced F-RTO * Modified the basic F-RTO algorithm and the SACK-enhanced F-RTO
algorithm to prevent the TCP sender from applying F-RTO algorithm if algorithm to prevent the TCP sender from applying the F-RTO
retransmission timer expires when an earlier RTO recovery is algorithm if the retransmission timer expires when an earlier RTO
underway, except when RTO expires multiple times for the same recovery is underway, except when the retransmission timer expires
segment. multiple times for the same segment.
* Clarified behavior on multiple timeouts. * Clarified behavior on multiple timeouts.
* Added a paragraph on acknowledgements that do not acknowledge new * Added a paragraph on acknowledgements that do not acknowledge new
data but are not duplicate acknowledgements data but are not duplicate acknowledgements.
* Clarified the SACK-algorithm a bit, and added one paragraph of * Clarified the SACK-algorithm a bit, and added one paragraph of
description of the basic idea of the algorithm. description of the basic idea of the algorithm.
* Removed SCTP considerations * Removed SCTP considerations.
* Removed earlier Appendix sections, except Appendix C from RFC * Removed earlier Appendix sections, except Appendix C from RFC
4138, which is now Appendix A 4138, which is now Appendix A.
* Clarified text about the possible response algorithms * Clarified text about the possible response algorithms.
* Added section that summarizes the evaluation of RFC 4138 * Added section that summarizes the evaluation of RFC 4138.
References References
Normative References Normative References
[APS99] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion [APS99] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
Control", RFC 2581, April 1999. Control", RFC 2581, April 1999.
[APB08] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [APB08] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", Internet-Draft "draft-ietf-tcpm- Control", Internet-Draft "draft-ietf-tcpm-
 End of changes. 36 change blocks. 
127 lines changed or deleted 143 lines changed or added
