Internet Engineering Task Force                                M. Allman
INTERNET-DRAFT                                                      ICSI
File: draft-ietf-tcpm-rto-consider-04.txt                  June 15, 2016
Intended Status: Best Current Practice
Expires: December 15, 2016

                  Retransmission Timeout Requirements

Status of this Memo

    This document may not be modified, and derivative works of it may
    not be created, except to format it for publication as an RFC or to
    translate it into languages other than English.

    This Internet-Draft is submitted in full conformance with the
    provisions of BCP 78 and BCP 79.  Internet-Drafts are working
    documents of the Internet Engineering Task Force (IETF), its areas,
    and its working groups. Note that other groups may also distribute
    working documents as Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time. It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at

    The list of Internet-Draft Shadow Directories can be accessed at

    This Internet-Draft will expire on December 15, 2016.

Copyright Notice

    Copyright (c) 2016 IETF Trust and the persons identified as the
    document authors. All rights reserved.

    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents
    ( in effect on the date of
    publication of this document. Please review these documents
    carefully, as they describe your rights and restrictions with
    respect to this document. Code Components extracted from this
    document must include Simplified BSD License text as described in
    Section 4.e of the Trust Legal Provisions and are provided without
    warranty as described in the Simplified BSD License.


Abstract

    Ensuring reliable communication often manifests in a timeout and
    retry mechanism.  Each implementation of a retransmission timeout
    mechanism represents a balance between correctness and timeliness
    and therefore no implementation suits all situations.  This document
    provides high-level requirements for retransmission timeout schemes
    appropriate for general use in the Internet.  Within the
    requirements, implementations have latitude to define particulars
    that best address each situation.


Terminology

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
    document are to be interpreted as described in BCP 14, RFC 2119
    [RFC2119].

1   Introduction

    Reliable transmission is a key property for many network protocols
    and applications.  Our protocols use various mechanisms to achieve
    reliable data transmission.  Often we use continuous or periodic
    reports from the recipient to inform the sender's notion of which
    pieces of data are missing and need to be retransmitted to ensure
    reliability.  Alternatively, information coding---e.g., FEC---can be
    used to achieve probabilistic reliability without retransmissions.
    However, despite our best intentions and most robust mechanisms, the
    only thing we can truly depend on is the passage of time and
    therefore our ultimate backstop to ensuring reliability is a timeout
    and re-try mechanism.  That is, the sender sets some expectation for
    how long to wait for confirmation of delivery for a given piece of
    data.  When this time period passes without delivery confirmation
    the sender assumes the data was lost in transit and therefore
    schedules a retransmission.  This process of ensuring reliability
    via time-based loss detection and resending lost data is commonly
    referred to as a "retransmission timeout (RTO)" mechanism.

    Various protocols have defined their own RTO mechanisms (e.g., TCP
    [RFC6298], SCTP [RFC4960], SIP [RFC3261]).  The specifics of
    retransmission timeouts often represent a particular tradeoff
    between correctness and responsiveness [AP99].  In other words we
    want to simultaneously:

      - wait long enough to ensure the detection of loss is correct and
        therefore a retransmission is in fact needed, and

      - bound the delay we impose on applications before repairing
        loss.

    Serving both of these goals is difficult as they pull in opposite
    directions.  I.e., towards either (a) withholding needed
    retransmissions too long to ensure the original transmission is
    lost or (b) not waiting long enough to help application
    responsiveness and hence sending unnecessary (often denoted
    "spurious") retransmissions.  We have found that even though the RTO
    procedure is standardized for some protocols (e.g., TCP [RFC6298]),
    implementations often add their own subtle imprint on the specifics
    of the process to tilt the tradeoff between correctness and
    responsiveness in some particular way.

    At this point we recognize that often these specific tweaks that
    deviate from standardized RTO mechanisms do not materially impact
    network safety.  Therefore, in this document we outline a set of
    high-level protocol-agnostic requirements for RTO mechanisms that
    provide for network safety.  The intent is to provide a safe
    foundation on which implementations have the flexibility to
    instantiate mechanisms that best realize their specific goals.

2   Scope

    The principles we outline in this document are protocol-agnostic and
    widely applicable.  We make the following scope statements about
    the application of the requirements discussed in Section 3:

    (S.1) The requirements in this document apply only to timer-based
          loss detection and retransmission.

          While there are a bevy of uses for timers in protocols---from
          rate-based pacing to connection failure detection to making
          congestion control decisions and beyond---these are outside
          the scope of this document.

    (S.2) The requirements in this document only apply to cases where
          loss detected via a timer is repaired by a retransmission of
          the original data.

          Other cases are certainly possible---e.g., replacing the lost
          data with an updated version---but fall outside the scope of
          this document.

    (S.3) The requirements in this document apply only to endpoint-to-
          endpoint unicast communication.  Reliable multicast (e.g.,
          [RFC5740]) protocols are explicitly outside the scope of this
          document.

          Protocols such as SCTP [RFC4960] and MP-TCP [RFC6182] that
          communicate in a unicast fashion with multiple specific
          endpoints can leverage the requirements in this document
          provided they track state and follow the requirements for each
          endpoint independently.  I.e., if host A communicates with
          hosts B and C, A must use independent RTOs for traffic sent to
          B and C.

    (S.4) There are cases where state is shared across connections or
          flows (e.g., [RFC2140], [RFC3124]).  The RTO is one piece of
          state that is often discussed as sharable.  These situations
          raise issues that the simple flow-oriented RTO mechanism
          discussed in this document does not consider (e.g., how long
          to preserve state between connections).  Therefore, while the
          general principles given in Section 3 are likely applicable,
          sharing RTOs across flows is outside the scope of this
          document.

    (S.5) The requirements in this document apply to reliable
          transmission, but do not assume that all data transmitted
          within a connection or flow is reliably sent.

          E.g., a protocol like DCCP [RFC4340] could leverage the
          requirements in this document for the initial reliable
          handshake even though the protocol reverts to unreliable
          transmission after the handshake.

          E.g., a protocol like SCTP [RFC4960] could leverage the
          requirements for data that is sent only "partially reliably".
          In this case, the protocol uses two phases for each message.
          In the first phase, the protocol attempts to ensure
          reliability and can leverage the requirements in this
          document.  At some point the value of the data is gone and the
          protocol transitions to the second phase where the data is
          treated as unreliably transmitted and therefore the protocol
          will no longer attempt to repair the loss---and hence there
          are no more retransmissions and the requirements in this
          document are moot.

    (S.6) The requirements for RTO mechanisms in this document can be
          applied regardless of whether the RTO mechanism is the sole
          loss repair strategy or works in concert with other
          mechanisms.

          E.g., for a simple protocol like UDP-based DNS [] a timeout
          and re-try mechanism is likely to act alone to ensure
          reliability.

          E.g., within a complex protocol like TCP or SCTP we have
          designed methods to detect and repair loss based on explicit
          endpoint state sharing [RFC2018,RFC4960,RFC6675].  These
          mechanisms are preferred over the RTO as they are often more
          timely and precise than the coarse-grained RTO.  In these
          cases, the RTO becomes a last resort when the more advanced
          mechanisms fail.
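
    The per-endpoint independence described in (S.3) can be pictured
    with a minimal sketch.  The Python below is illustrative only: the
    class name, method names and the use of a string to identify a peer
    are all hypothetical, and the 1 second starting value is taken from
    requirement (1) in Section 3.

```python
from collections import defaultdict

INITIAL_RTO = 1.0  # seconds; starting value from requirement (1) in Section 3


class PerPeerRto:
    """Hypothetical holder that keeps one RTO value per remote endpoint."""

    def __init__(self):
        # Each peer seen for the first time starts from the initial RTO.
        self._rto = defaultdict(lambda: INITIAL_RTO)

    def get(self, peer):
        """Return the current RTO (in seconds) for this peer."""
        return self._rto[peer]

    def update(self, peer, new_rto):
        """Record a new RTO for one peer without touching any other peer."""
        self._rto[peer] = new_rto


table = PerPeerRto()
table.update("B", 0.25)  # host A learned something about the path to B
```

    In this sketch an update for host B leaves the RTO used for host C
    at its initial value, which is the property (S.3) asks for.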

    Additionally, the following statements detail the relationship of
    the requirements in this document to other specifications and
    implementations:

    (R.1) RTO mechanisms that are currently standardized are not updated
          or obsoleted by this document.  Implementations are free to
          use these existing specifications as they do now.

          This holds even in cases where the existing specification
          differs from the requirements in this document (e.g.,
          [RFC3261] uses a smaller initial timeout than this document
          specifies).  Existing standard specifications enjoy their own
          consensus which this document does not change.

    (R.2) Future standardization efforts that specify RTO mechanisms
          SHOULD follow the requirements in this document.

          There may be reasons for future RTO mechanisms to deviate from
          the requirements in Section 3.  In these cases, we expect only
          that the standards process does so after reasonable
          deliberation and with good reason.

    (R.3) Alternatively, future RTO mechanism implementations may be
          made directly against the requirements in Section 3 without
          another protocol-specific specification.

    (R.4) There will no doubt be cases where applying the requirements
          in this document directly is not possible due to the structure
          or operation of a protocol.  For instance, a case where a
          timeout is used to detect loss, but the loss is not repaired
          with a direct retransmission of the original data.  In these
          situations, an alternate specification is required.  We
          encourage such future efforts to leverage the spirit of the
          requirements in this document to inform alternate
          specifications.

3   Requirements

    We now list the requirements that SHOULD apply when designing
    retransmission timeout (RTO) mechanisms.

    (1) In the absence of any knowledge about the latency of a path, the
        RTO MUST be conservatively set to no less than 1 second.

        This requirement ensures two important aspects of the RTO.
        First, when transmitting into an unknown network,
        retransmissions will not be sent before an ACK would reasonably
        be expected to arrive and hence possibly waste scarce network
        resources.  Second, as noted below, sometimes retransmissions
        can lead to ambiguities in assessing the latency of a network
        path.  Therefore, it is especially important for the first
        latency sample to be free of ambiguities such that there is a
        baseline for the remainder of the communication.

        The specific constant (1 second) comes from the analysis of
        Internet RTTs found in Appendix A of [RFC6298].

    (2) As we note above, loss detection happens when a sender does not
        receive delivery confirmation within some expected period of
        time.  We now specify four requirements that pertain to setting
        this expectation.

        Often measuring the time required for delivery confirmation is
        framed as assessing the round-trip time (RTT) of the network
        path as this is the minimum amount of time required to receive
        delivery confirmation and also often follows protocol behavior
        whereby acknowledgments are generated quickly after data
        arrives.  For instance, this is the case for the RTO used by TCP
        [RFC6298] and SCTP [RFC4960].  However, this is somewhat
        misleading as the expected latency is better framed as the
        "feedback time" (FT).  In other words, the expectation is not
        always simply a network property, but includes additional time
        before a sender should reasonably expect a response.

        For instance, consider a UDP-based DNS request from a client to
        a resolver.  When the request can be served from the resolver's
        cache the FT likely well approximates the network RTT between
        the client and resolver.  However, on a cache miss the resolver
        will have to request the needed information from authoritative
        DNS servers, which will non-trivially increase the FT compared
        to the RTT between the client and resolver.

        (a) In steady state the RTO MUST be set based on recent
            observations of both the FT and the variance of the FT.

            In other words, the RTO should be based on a reasonable
            amount of time that the sender should wait for delivery
            confirmation before retransmitting the given data.

        (b) FT observations MUST be taken regularly.

            Internet measurements show that taking only a single FT
            sample per TCP connection results in a relatively poorly
            performing RTO mechanism [AP99], hence the requirement that
            the FT be sampled continuously throughout the lifetime of a
            connection.

            TCP takes an FT sample roughly once per RTT, or if using the
            timestamp option [RFC7323] on each acknowledgment arrival.
            [AP99] shows that both these approaches result in roughly
            equivalent performance for the RTO estimator.

            Therefore, "regularly" SHOULD be defined as at least once
            per RTT or as frequently as data is exchanged in cases where
            that happens less frequently than once per RTT.  However, we
            also recognize that it may not always be practical to take
            an FT sample this often in all cases.  Hence, this
            definition of "regularly" is explicitly a "SHOULD" and not a
            "MUST".

        (c) FT observations MAY be taken from non-data exchanges.

            Some protocols use keepalives, heartbeats or other messages
            to exchange control information.  To the extent that the
            latency of these transactions mirrors data exchange, they
            can be leveraged to take FT samples within the RTO
            mechanism.  Such samples can help protocols keep their RTO
            accurate during lulls in data transmission.  However, given
            that these messages may not be subject to the same delays as
            data transmission, we do not take a general view on whether
            this is useful or not.

        (d) An RTO mechanism MUST NOT use ambiguous FT samples.

            Assume two copies of some segment X are transmitted at times
            t0 and t1 and then at time t2 the sender receives
            confirmation that X in fact arrived.  In some cases, it is
            not clear which copy of X triggered the confirmation and
            hence the actual FT is either t2-t1 or t2-t0, but which is a
            mystery.  Therefore, in this situation an implementation
            MUST use Karn's algorithm [KP87,RFC6298] and use neither
            version of the FT sample and hence not update the RTO.

            There are cases where two copies of some data are
            transmitted in a way whereby the sender can tell which is
            being acknowledged by an incoming ACK.  E.g., TCP's
            timestamp option [RFC7323] allows for segments to be
            uniquely identified and hence avoid the ambiguity.  In such
            cases there is no ambiguity and the resulting samples can
            update the RTO.

    (3) Each time the RTO detects a loss and a retransmission is
        scheduled, the value of the RTO MUST be exponentially backed off
        such that the next firing requires a longer interval.  The
        backoff SHOULD be removed after the successful repair of the
        lost data and subsequent transmission of non-retransmitted data.

        A maximum value MAY be placed on the RTO.  The maximum RTO MUST
        NOT be less than 60 seconds (a la [RFC6298]).

        This ensures network safety.

    (4) Retransmissions triggered by the RTO mechanism MUST be taken as
        indications of network congestion and the sending rate adapted
        using a standard mechanism (e.g., TCP collapses the congestion
        window to one segment [RFC5681]).

        This ensures network safety.

        An exception could be made to this rule if an IETF standardized
        mechanism is used to determine that a particular loss is due to
        a non-congestion event (e.g., packet corruption).  In such a
        case a congestion control action is not required.  Additionally,
        RTO-triggered congestion control actions may be reversed when a
        standard mechanism determines that the cause of the loss was not
        congestion after all (e.g., [RFC5682]).
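
    As an illustration only, the requirements in this section can be
    sketched as a small per-flow estimator.  The Python below is not
    part of any specification: the class and method names are
    hypothetical, and the smoothing constants (gains of 1/8 and 1/4 and
    a variance multiplier of 4) are borrowed from [RFC6298] as an
    assumption, since this document does not mandate particular values.

```python
class RtoEstimator:
    """Illustrative RTO state for one flow; a sketch, not a normative algorithm."""

    INITIAL_RTO = 1.0  # requirement (1): at least 1 second with no path knowledge
    MAX_RTO = 60.0     # requirement (3): an optional cap must be at least 60 seconds
    ALPHA = 1 / 8      # EWMA gain for the smoothed FT (assumed, from [RFC6298])
    BETA = 1 / 4       # EWMA gain for the FT variance (assumed, from [RFC6298])
    K = 4              # variance multiplier (assumed, from [RFC6298])
    LOSS_WINDOW = 1    # segments; the TCP-style response named in requirement (4)

    def __init__(self):
        self.sft = None                   # smoothed feedback time, seconds
        self.ftvar = None                 # feedback time variance estimate
        self.base_rto = self.INITIAL_RTO  # RTO before any backoff is applied
        self.backoff = 1                  # exponential backoff multiplier

    @property
    def rto(self):
        """Current timeout: base value times backoff, limited by the cap."""
        return min(self.base_rto * self.backoff, self.MAX_RTO)

    def on_ft_sample(self, ft, ambiguous=False):
        """Fold one FT observation into the estimate (requirements 2a, 2b, 2d)."""
        if ambiguous:
            # Requirement (2d): a sample that could belong to either copy
            # of retransmitted data is discarded and the RTO is not updated.
            return
        if self.sft is None:
            self.sft = ft
            self.ftvar = ft / 2
        else:
            self.ftvar = (1 - self.BETA) * self.ftvar + self.BETA * abs(self.sft - ft)
            self.sft = (1 - self.ALPHA) * self.sft + self.ALPHA * ft
        self.base_rto = self.sft + self.K * self.ftvar

    def on_timeout(self):
        """Handle an RTO-detected loss (requirements 3 and 4)."""
        self.backoff *= 2        # the next firing needs a longer interval
        return self.LOSS_WINDOW  # caller reduces its sending window to this

    def on_new_data_confirmed(self):
        """Remove the backoff once the loss is repaired and new data is confirmed."""
        self.backoff = 1


est = RtoEstimator()
est.on_ft_sample(0.1)                  # first unambiguous sample
est.on_ft_sample(5.0, ambiguous=True)  # ignored
```

    Note that this sketch applies no floor to the computed RTO once a
    sample is available; a specific protocol may well choose to add
    one.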

4   Discussion

    We note that research has shown the tension between the
    responsiveness and correctness of retransmission timeouts seems to
    be a fundamental tradeoff in the context of TCP [AP99].  That is,
    making the RTO more aggressive (e.g., via changing TCP's EWMA gains,
    lowering the minimum RTO, etc.) can reduce the time spent waiting on
    needed retransmissions.  However, at the same time, such
    aggressiveness leads to more needless retransmissions.  Therefore,
    being as aggressive as the requirements given in the previous
    section allow in any particular situation may not be the best course
    of action because an RTO expiration carries a requirement to invoke
    a congestion response and hence slow transmission down.
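
    A toy calculation can make this tradeoff concrete.  The sketch
    below is an assumption-laden illustration rather than a measurement:
    the function name is hypothetical, the feedback times are made-up
    values, and a fixed RTO is used purely to keep the arithmetic
    visible.

```python
def evaluate_fixed_rto(ft_samples, rto):
    """Return (spurious_timeouts, seconds_spent_waiting_on_real_loss).

    ft_samples holds one feedback time in seconds per transmission, or
    None when the transmission was lost and no confirmation ever arrives.
    """
    spurious = 0
    waiting = 0.0
    for ft in ft_samples:
        if ft is None:
            waiting += rto   # a needed retransmission is withheld for one RTO
        elif ft > rto:
            spurious += 1    # the timer fired although confirmation was coming
    return spurious, waiting


# Made-up feedback times; None marks a genuinely lost transmission.
samples = [0.2, 0.3, None, 0.9, 0.25, None, 1.4]

aggressive = evaluate_fixed_rto(samples, 0.5)    # (2, 1.0)
conservative = evaluate_fixed_rto(samples, 2.0)  # (0, 4.0)
```

    With these made-up samples the shorter timer repairs real loss
    sooner but fires twice when no loss occurred, while the longer
    timer never fires spuriously but waits four times as long.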

    While the tradeoff between responsiveness and correctness seems
    fundamental, the tradeoff can be made less relevant if the sender
    can detect and recover from spurious RTOs.  Several mechanisms have
    been proposed for this purpose, such as Eifel [RFC3522], F-RTO
    [RFC5682] and DSACK [RFC2883,RFC3708].  Using such mechanisms may
    allow a data originator to tip towards being more responsive without
    incurring (as much of) the attendant costs of needless retransmits.

    Also, note that in addition to the experiments discussed in [AP99],
    the Linux TCP implementation has been using various non-standard RTO
    mechanisms for many years seemingly without large scale problems
    (e.g., using different EWMA gains than specified in [RFC6298]).
    Further, a number of implementations use minimum RTOs that are less
    than the 1 second specified in [RFC6298].  While the implication of
    these deviations from the standard may be more spurious retransmits
    (per [AP99]), we are aware of no large scale problems caused by this
    change to the minimum RTO.

    Finally, we note that while allowing implementations to be more
    aggressive may in fact increase the number of needless
    retransmissions the above requirements fail safe in that they insist
    on exponential backoff of the RTO and a transmission rate reduction.
    Therefore, providing implementers more latitude than they have
    traditionally been given in IETF specifications of RTO mechanisms
    does not somehow open the flood gates to aggressive behavior.  Since
    there is a downside to being aggressive the incentives for proper
    behavior are retained in the mechanism.

5   Security Considerations

    This document does not alter the security properties of
    retransmission timeout mechanisms.  See [RFC6298] for a discussion
    of these within the context of TCP.


Acknowledgments

    This document benefits from years of discussions with Ethan Blanton,
    Sally Floyd, Jana Iyengar, Shawn Ostermann, Vern Paxson, and the
    members of the TCPM and TCP-IMPL working groups.  Ran Atkinson,
    Yuchung Cheng, David Black, Gorry Fairhurst, Jonathan Looney and
    Michael Scharf provided useful comments on a previous version of
    this draft.

Normative References

    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
        Requirement Levels", BCP 14, RFC 2119, March 1997.

Informative References

    [AP99] Allman, M., V. Paxson, "On Estimating End-to-End Network Path
        Properties", Proceedings of the ACM SIGCOMM Technical Symposium,
        September 1999.

    [KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time
        Estimates in Reliable Transport Protocols", SIGCOMM 87.

    [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
        Selective Acknowledgment Options", RFC 2018, October 1996.

    [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140,
        April 1997.

    [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
        Extension to the Selective Acknowledgement (SACK) Option for
        TCP", RFC 2883, July 2000.

    [RFC3124] Balakrishnan, H., S. Seshan, "The Congestion Manager", RFC
        3124, June 2001.

    [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
        A., Peterson, J., Sparks, R., Handley, M., and E. Schooler,
        "SIP: Session Initiation Protocol", RFC 3261, June 2002.

    [RFC3522] Ludwig, R., M. Meyer, "The Eifel Detection Algorithm for
        TCP", RFC 3522, April 2003.

    [RFC3708] Blanton, E., M. Allman, "Using TCP Duplicate Selective
        Acknowledgement (DSACKs) and Stream Control Transmission
        Protocol (SCTP) Duplicate Transmission Sequence Numbers (TSNs)
        to Detect Spurious Retransmissions", RFC 3708, February 2004.

    [RFC3940] Adamson, B., C. Bormann, M. Handley, J. Macker,
        "Negative-acknowledgment (NACK)-Oriented Reliable Multicast
        (NORM) Protocol", November 2004, RFC 3940.

    [RFC4340] Kohler, E., M. Handley, S. Floyd, "Datagram Congestion
        Control Protocol (DCCP)", March 2006, RFC 4340.

    [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC
        4960, September 2007.

    [RFC5681] Allman, M., V. Paxson, E. Blanton, "TCP Congestion
        Control", RFC 5681, September 2009.

    [RFC5682] Sarolahti, P., M. Kojo, K. Yamamoto, M. Hata, "Forward
        RTO-Recovery (F-RTO): An Algorithm for Detecting Spurious
        Retransmission Timeouts with TCP", RFC 5682, September 2009.

    [RFC5740] Adamson, B., C. Bormann, M. Handley, J. Macker,
        "NACK-Oriented Reliable Multicast (NORM) Transport Protocol",
        November 2009, RFC 5740.

    [RFC6182] Ford, A., C. Raiciu, M. Handley, S. Barre, J. Iyengar,
        "Architectural Guidelines for Multipath TCP Development", March
        2011, RFC 6182.

    [RFC6298] Paxson, V., M. Allman, H.K. Chu, M. Sargent, "Computing
        TCP's Retransmission Timer", June 2011, RFC 6298.

    [RFC6582] Henderson, T., S. Floyd, A. Gurtov, Y. Nishida, "The
        NewReno Modification to TCP's Fast Recovery Algorithm", April
        2012, RFC 6582.

    [RFC6675] Blanton, E., M. Allman, L. Wang, I. Jarvinen, M.  Kojo,
        Y. Nishida, "A Conservative Loss Recovery Algorithm Based on
        Selective Acknowledgment (SACK) for TCP", August 2012, RFC 6675.

    [RFC7323] Borman D., B. Braden, V. Jacobson, R. Scheffenegger, "TCP
        Extensions for High Performance", September 2014, RFC 7323.

Authors' Addresses

   Mark Allman
   International Computer Science Institute
   1947 Center St.  Suite 600
   Berkeley, CA  94704