draft-ietf-tcpm-2140bis-10.txt   draft-ietf-tcpm-2140bis-11.txt 
TCPM WG J. Touch TCPM WG J. Touch
Internet Draft Independent Internet Draft Independent
Intended status: Informational M. Welzl Intended status: Informational M. Welzl
Obsoletes: 2140 S. Islam Obsoletes: 2140 S. Islam
Expires: September 2021 University of Oslo Expires: October 2021 University of Oslo
March 16, 2021 April 12, 2021
TCP Control Block Interdependence TCP Control Block Interdependence
draft-ietf-tcpm-2140bis-10.txt draft-ietf-tcpm-2140bis-11.txt
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
This document may contain material from IETF Documents or IETF This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this 10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow material may not have granted the IETF Trust the right to allow
skipping to change at page 1, line 45 skipping to change at page 1, line 45
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress." reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
This Internet-Draft will expire on September 16, 2021. This Internet-Draft will expire on October 12, 2021.
Copyright Notice Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided Section 4.e of the Trust Legal Provisions and are provided
without warranty as described in the Simplified BSD License. without warranty as described in the Simplified BSD License.
Abstract Abstract
This memo provides guidance to TCP implementers that are intended to This memo provides guidance to TCP implementers that is intended to
help improve convergence to steady-state operation without affecting help improve connection convergence to steady-state operation
interoperability. It updates and replaces RFC 2140's description of without affecting interoperability. It updates and replaces RFC
interdependent TCP control blocks and the ways that part of TCP 2140's description of sharing TCP state, as typically represented in
state can be shared among similar concurrent or consecutive TCP Control Blocks, among similar concurrent or consecutive
connections. TCP state includes a combination of parameters, such as connections.
connection state, current round-trip time estimates, congestion
control information, and process information. Most of this state is
maintained on a per-connection basis in the TCP Control Block (TCB),
but implementations can (and do) share certain TCB information
across connections to the same host. Such sharing is intended to
improve overall transient transport performance, while maintaining
backward-compatibility with existing implementations. The sharing
described herein is limited to only the TCB initialization and so
has no effect on the long-term behavior of TCP after a connection
has been established.
Table of Contents Table of Contents
1. Introduction...................................................3 1. Introduction...................................................3
2. Conventions Used in This Document..............................4 2. Conventions Used in This Document..............................4
3. Terminology....................................................4 3. Terminology....................................................4
4. The TCP Control Block (TCB)....................................6 4. The TCP Control Block (TCB)....................................5
5. TCB Interdependence............................................7 5. TCB Interdependence............................................7
6. Temporal Sharing...............................................7 6. Temporal Sharing...............................................7
6.1. Initialization of the new TCB................................7 6.1. Initialization of a new TCB..................................7
6.2. Updates to the new TCB.......................................8 6.2. Updates to the TCB cache.....................................8
6.3. Discussion...................................................9 6.3. Discussion..................................................10
7. Ensemble Sharing..............................................11 7. Ensemble Sharing..............................................11
7.1. Initialization of a new TCB.................................11 7.1. Initialization of a new TCB.................................11
7.2. Updates to the new TCB......................................12 7.2. Updates to the TCB cache....................................12
7.3. Discussion..................................................13 7.3. Discussion..................................................13
8. Issues with TCB information sharing...........................14 8. Issues with TCB information sharing...........................14
8.1. Traversing the same network path............................15 8.1. Traversing the same network path............................15
8.2. State dependence............................................15 8.2. State dependence............................................15
8.3. Problems with IP sharing....................................16 8.3. Problems with sharing based on IP address...................16
9. Implications..................................................16 9. Implications..................................................16
9.1. Layering....................................................16 9.1. Layering....................................................17
9.2. Other possibilities.........................................17 9.2. Other possibilities.........................................17
10. Implementation Observations..................................17 10. Implementation Observations..................................18
11. Updates to RFC 2140..........................................18 11. Changes Compared to RFC 2140.................................19
12. Security Considerations......................................19 12. Security Considerations......................................19
13. IANA Considerations..........................................20 13. IANA Considerations..........................................20
14. References...................................................20 14. References...................................................20
14.1. Normative References....................................20 14.1. Normative References....................................20
14.2. Informative References..................................21 14.2. Informative References..................................21
15. Acknowledgments..............................................23 15. Acknowledgments..............................................24
16. Change log...................................................23 16. Change log...................................................24
Appendix A : TCB Sharing History.................................27 Appendix A : TCB Sharing History.................................28
Appendix B : TCP Option Sharing and Caching......................28 Appendix B : TCP Option Sharing and Caching......................29
Appendix C : Automating the Initial Window in TCP over Long Appendix C : Automating the Initial Window in TCP over Long
Timescales.......................................................30 Timescales.......................................................31
C.1. Introduction.............................................30 C.1. Introduction.............................................31
C.2. Design Considerations....................................30 C.2. Design Considerations....................................31
C.3. Proposed IW Algorithm....................................31 C.3. Proposed IW Algorithm....................................32
C.4. Discussion...............................................35 C.4. Discussion...............................................36
C.5. Observations.............................................36 C.5. Observations.............................................37
1. Introduction 1. Introduction
TCP is a connection-oriented reliable transport protocol layered TCP is a connection-oriented reliable transport protocol layered
over IP [RFC793]. Each TCP connection maintains state, usually in a over IP [RFC793]. Each TCP connection maintains state, usually in a
data structure called the TCP Control Block (TCB). The TCB contains data structure called the TCP Control Block (TCB). The TCB contains
information about the connection state, its associated local information about the connection state, its associated local
process, and feedback parameters about the connection's transmission process, and feedback parameters about the connection's transmission
properties. As originally specified and usually implemented, most properties. As originally specified and usually implemented, most
TCB information is maintained on a per-connection basis. Some TCB information is maintained on a per-connection basis. Some
implementations can (and now do) share certain TCB information implementations share certain TCB information across connections to
across connections to the same host [RFC2140]. Such sharing is the same host [RFC2140]. Such sharing is intended to lead to better
intended to lead to better overall transient performance, especially overall transient performance, especially for numerous short-lived
for numerous short-lived and simultaneous connections, as often used and simultaneous connections, as can be used in the World-Wide Web
in the World-Wide Web [Be94][Br02]. This sharing of state is and other applications [Be94][Br02]. This sharing of state is
intended to help TCP connections converge to long term behavior intended to help TCP connections converge to long term behavior
(assuming stable application load, i.e., so-called "steady-state") (assuming stable application load, i.e., so-called "steady-state")
more quickly without affecting TCP interoperability. more quickly without affecting TCP interoperability.
This document updates RFC 2140's discussion of TCB state sharing and This document updates RFC 2140's discussion of TCB state sharing and
provides a complete replacement for that document. This state provides a complete replacement for that document. This state
sharing affects only TCB initialization [RFC2140] and thus has no sharing affects only TCB initialization [RFC2140] and thus has no
effect on the long-term behavior of TCP after a connection has been effect on the long-term behavior of TCP after a connection has been
established nor on interoperability. Path information shared across established nor on interoperability. Path information shared across
SYN destination port numbers assumes that TCP segments having the SYN destination port numbers assumes that TCP segments having the
same host-pair experience the same path properties, i.e., that same host-pair experience the same path properties, i.e., that
traffic is not routed differently based on port numbers or other traffic is not routed differently based on port numbers or other
connection parameters. The observations about TCB sharing in this connection parameters (also addressed further in Section 8.1). The
document apply similarly to any protocol with congestion state, observations about TCB sharing in this document apply similarly to
including SCTP [RFC4960] and DCCP [RFC4340], as well as for any protocol with congestion state, including SCTP [RFC4960] and
individual subflows in Multipath TCP [RFC8684]. DCCP [RFC4340], as well as for individual subflows in Multipath TCP
[RFC8684].
2. Conventions Used in This Document 2. Conventions Used in This Document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in "OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
The core of this document describes behavior that is already The core of this document describes behavior that is already
skipping to change at page 5, line 23 skipping to change at page 5, line 15
uses transport packets to discover the PMTU [RFC4821] uses transport packets to discover the PMTU [RFC4821]
+PMTU - largest IP datagram that can traverse a path +PMTU - largest IP datagram that can traverse a path
[RFC1191][RFC8201] [RFC1191][RFC8201]
PMTUD - path-layer MTU discovery, a mechanism that relies on ICMP PMTUD - path-layer MTU discovery, a mechanism that relies on ICMP
error messages to discover the PMTU [RFC1191][RFC8201] error messages to discover the PMTU [RFC1191][RFC8201]
+RTT - round-trip time of a TCP packet exchange [RFC793] +RTT - round-trip time of a TCP packet exchange [RFC793]
+RTTVAR - variance of round-trip times of a TCP packet exchange +RTTVAR - variation of round-trip times of a TCP packet exchange
[RFC6298] [RFC6298]
+rwnd - TCP receive window size [RFC5681] +rwnd - TCP receive window size [RFC5681]
+sendcwnd - TCP send-side congestion window (cwnd) size [RFC5681] +sendcwnd - TCP send-side congestion window (cwnd) size [RFC5681]
+sendMSS - TCP maximum segment size, a value transmitted in a TCP +sendMSS - TCP maximum segment size, a value transmitted in a TCP
option that represents the largest TCP user data payload that can be option that represents the largest TCP user data payload that can be
received [RFC6691] received [RFC6691]
skipping to change at page 6, line 24 skipping to change at page 6, line 18
pointers to Internet Protocol (IP) PCB pointers to Internet Protocol (IP) PCB
Per-connection shared state Per-connection shared state
macro-state macro-state
connection state connection state
timers timers
flags flags
local and remote host numbers and ports local and remote host numbers and ports
TCP option state TCP option state
micro-state micro-state
send and receive window state (size*, current number) send and receive window state (size*, current number)
cong. window size (sendcwnd)* congestion window size (sendcwnd)*
cong. window size threshold (ssthresh)* congestion window size threshold (ssthresh)*
max window size seen* max window size seen*
sendMSS# sendMSS#
MMS_S# MMS_S#
MMS_R# MMS_R#
PMTU# PMTU#
round-trip time and variance# round-trip time and its variation#
The per-connection information is shown as split into macro-state The per-connection information is shown as split into macro-state
and micro-state, terminology borrowed from [Co91]. Macro-state and micro-state, terminology borrowed from [Co91]. Macro-state
describes the protocol for establishing the initial shared state describes the protocol for establishing the initial shared state
about the connection; we include the endpoint numbers and components about the connection; we include the endpoint numbers and components
(timers, flags) required upon commencement that are later used to (timers, flags) required upon commencement that are later used to
help maintain that state. Micro-state describes the protocol after a help maintain that state. Micro-state describes the protocol after a
connection has been established, to maintain the reliability and connection has been established, to maintain the reliability and
congestion control of the data transferred in the connection. congestion control of the data transferred in the connection.
skipping to change at page 7, line 6 skipping to change at page 6, line 48
class is clearly host-pair dependent (shown above as "#", e.g., class is clearly host-pair dependent (shown above as "#", e.g.,
sendMSS, MMS_R, MMS_S, PMTU, RTT), because these parameters are sendMSS, MMS_R, MMS_S, PMTU, RTT), because these parameters are
defined by the endpoint or endpoint pair (sendMSS, MMS_R, MMS_S, defined by the endpoint or endpoint pair (sendMSS, MMS_R, MMS_S,
RTT) or are already cached and shared on that basis (PMTU RTT) or are already cached and shared on that basis (PMTU
[RFC1191][RFC4821]). The other is host-pair dependent in its [RFC1191][RFC4821]). The other is host-pair dependent in its
aggregate (shown above as "*", e.g., congestion window information, aggregate (shown above as "*", e.g., congestion window information,
current window sizes, etc.) because they depend on the total current window sizes, etc.) because they depend on the total
capacity between the two endpoints. capacity between the two endpoints.
Not all of the TCB state is necessarily sharable. In particular, Not all of the TCB state is necessarily sharable. In particular,
some TCP options are negotiated only upon application layer request, some TCP options are negotiated only upon request by the application
so their use may not be correlated across connections. Other options layer, so their use may not be correlated across connections. Other
negotiate connection-specific parameters, which are similarly not options negotiate connection-specific parameters, which are
shareable. These are discussed further in Appendix B. similarly not shareable. These are discussed further in Appendix B.
Finally, we exclude rwnd from further discussion because its value Finally, we exclude rwnd from further discussion because its value
should depend on the send window size, so it is already addressed by should depend on the send window size, so it is already addressed by
send window sharing and is not independently affected by sharing. send window sharing and is not independently affected by sharing.
5. TCB Interdependence 5. TCB Interdependence
There are two cases of TCB interdependence. Temporal sharing occurs There are two cases of TCB interdependence. Temporal sharing occurs
when the TCB of an earlier (now CLOSED) connection to a host is used when the TCB of an earlier (now CLOSED) connection to a host is used
to initialize some parameters of a new connection to that same host, to initialize some parameters of a new connection to that same host,
i.e., in sequence. Ensemble sharing occurs when a currently active i.e., in sequence. Ensemble sharing occurs when a currently active
connection to a host is used to initialize another (concurrent) connection to a host is used to initialize another (concurrent)
connection to that host. connection to that host.
6. Temporal Sharing 6. Temporal Sharing
The TCB data cache is accessed in two ways: it is read to initialize The TCB data cache is accessed in two ways: it is read to initialize
new TCBs and written when more current per-host state is available. new TCBs and written when more current per-host state is available.
6.1. Initialization of the new TCB 6.1. Initialization of a new TCB
TCBs for new connections can be initialized using context from past TCBs for new connections can be initialized using cached context
connections as follows: from past connections as follows:
TEMPORAL SHARING - TCB Initialization TEMPORAL SHARING - TCB Initialization
Cached TCB New TCB Cached TCB New TCB
-------------------------------------- --------------------------------------
old_MMS_S old_MMS_S or not cached* old_MMS_S old_MMS_S or not cached*
old_MMS_R old_MMS_R or not cached* old_MMS_R old_MMS_R or not cached*
old_sendMSS old_sendMSS old_sendMSS old_sendMSS
skipping to change at page 8, line 43 skipping to change at page 8, line 43
options and sharing is provided in Appendix B. options and sharing is provided in Appendix B.
TEMPORAL SHARING - Option Info Initialization TEMPORAL SHARING - Option Info Initialization
Cached New Cached New
------------------------------------ ------------------------------------
old_TFO_cookie old_TFO_cookie old_TFO_cookie old_TFO_cookie
old_TFO_failure old_TFO_failure old_TFO_failure old_TFO_failure
6.2. Updates to the new TCB 6.2. Updates to the TCB cache
During the connection, the associated TCB can be updated based on During a connection, the TCB cache can be updated based on events of
particular events, as shown below: current connections and their TCBs as they progress over time, as
shown below:
TEMPORAL SHARING - Cache Updates TEMPORAL SHARING - Cache Updates
Cached TCB Current TCB when? New Cached TCB Cached TCB Current TCB when? New Cached TCB
---------------------------------------------------------- ----------------------------------------------------------
old_MMS_S curr_MMS_S OPEN curr_MMS_S old_MMS_S curr_MMS_S OPEN curr_MMS_S
old_MMS_R curr_MMS_R OPEN curr_MMS_R old_MMS_R curr_MMS_R OPEN curr_MMS_R
old_sendMSS curr_sendMSS MSSopt curr_sendMSS old_sendMSS curr_sendMSS MSSopt curr_sendMSS
skipping to change at page 9, line 30 skipping to change at page 9, line 30
old_RTTVAR curr_RTTVAR CLOSE merge(curr,old) old_RTTVAR curr_RTTVAR CLOSE merge(curr,old)
old_option curr_option ESTAB (depends on option) old_option curr_option ESTAB (depends on option)
old_ssthresh curr_ssthresh CLOSE merge(curr,old) old_ssthresh curr_ssthresh CLOSE merge(curr,old)
old_sendcwnd curr_sendcwnd CLOSE merge(curr,old) old_sendcwnd curr_sendcwnd CLOSE merge(curr,old)
+Note that PMTU is cached at the IP layer [RFC1191][RFC4821]. +Note that PMTU is cached at the IP layer [RFC1191][RFC4821].
Merge() is the function that combines the current and previous (old)
values and may vary for each parameter of the TCB cache. The
particular function is not specified in this document; examples
include windowed averages (mean of the past N values, for some N)
and exponential decay (new = (1-alpha)*old + alpha*curr, where alpha
is in the range [0..1]).
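As a minimal sketch of the two example merge() choices named above, the following Python shows an exponential-decay merge and a windowed average. The names merge_ewma and WindowedMean, and the default alpha of 0.25, are illustrative assumptions; the draft does not specify a particular function or weight.

```python
from collections import deque


def merge_ewma(old, curr, alpha=0.25):
    """Exponential decay: weight alpha on curr, (1 - alpha) on old.

    alpha=0.25 is an assumed default, not a value taken from the draft.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in the range [0, 1]")
    return (1.0 - alpha) * old + alpha * curr


class WindowedMean:
    """Windowed average: mean of the most recent n merged values."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be at least 1")
        # deque(maxlen=n) silently drops the oldest value once full.
        self._values = deque(maxlen=n)

    def merge(self, curr):
        self._values.append(curr)
        return sum(self._values) / len(self._values)
```

With alpha near 0 the cached value changes slowly; with alpha near 1 the most recent connection dominates. The windowed form forgets a value entirely once n newer values have arrived.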
The table below gives an overview of option-specific information The table below gives an overview of option-specific information
that can be similarly shared. The TFO cookie is maintained until the that can be similarly shared. The TFO cookie is maintained until the
client explicitly requests it be updated as a separate event. client explicitly requests it be updated as a separate event.
TEMPORAL SHARING - Option Info Updates TEMPORAL SHARING - Option Info Updates
Cached Current when? New Cached Cached Current when? New Cached
--------------------------------------------------------- ---------------------------------------------------------
old_TFO_cookie old_TFO_cookie ESTAB old_TFO_cookie old_TFO_cookie old_TFO_cookie ESTAB old_TFO_cookie
skipping to change at page 10, line 14 skipping to change at page 10, line 25
recent values from any connection. For sendMSS, the cache is recent values from any connection. For sendMSS, the cache is
consulted only at connection establishment and not otherwise consulted only at connection establishment and not otherwise
updated, which means that MSS options do not affect current updated, which means that MSS options do not affect current
connections. The default sendMSS is never saved; only reported MSS connections. The default sendMSS is never saved; only reported MSS
values update the cache, so an explicit override is required to values update the cache, so an explicit override is required to
reduce the sendMSS. Cached sendMSS affects only data sent in the SYN reduce the sendMSS. Cached sendMSS affects only data sent in the SYN
segment, i.e., during client connection initiation or during segment, i.e., during client connection initiation or during
simultaneous open; all other segment MSS are based on the value simultaneous open; all other segment MSS are based on the value
updated as included in the SYN. updated as included in the SYN.
RTT values are updated by formulae that merge the old and new RTT values are updated by formulae that merge the old and new
values. Dynamic RTT estimation requires a sequence of RTT values, as noted in Section 6.2. Dynamic RTT estimation requires a
measurements. As a result, the cached RTT (and its variance) is an sequence of RTT measurements. As a result, the cached RTT (and its
average of its previous value with the contents of the currently variation) is an average of its previous value with the contents of
active TCB for that host, when a TCB is closed. RTT values are the currently active TCB for that host, when a TCB is closed. RTT
updated only when a connection is closed. The method for merging old values are updated only when a connection is closed. The method for
and current values needs to attempt to reduce the transient effects merging old and current values needs to attempt to reduce the
of the new connections. transient effects of the new connections.
The updates for RTT, RTTVAR and ssthresh rely on existing The updates for RTT, RTTVAR and ssthresh rely on existing
information, i.e., old values. Should no such values exist, the information, i.e., old values. Should no such values exist, the
current values are cached instead. current values are cached instead.
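The fallback described above (cache the current values when no old values exist) could be sketched as follows. This is an illustration only: the per-host dictionary layout, the function name update_cache_on_close, and the exponential-decay merge with alpha=0.25 are assumptions, not part of the draft.

```python
def update_cache_on_close(cache, host, curr, alpha=0.25):
    """Update the per-host TCB cache when a connection closes.

    cache: dict mapping a host key to a dict of cached parameters
           (hypothetical layout).
    curr:  dict of the closing connection's current values.
    """
    entry = cache.setdefault(host, {})
    for name in ("RTT", "RTTVAR", "ssthresh"):
        if name not in curr:
            continue
        if name in entry:
            # Old value exists: merge it with the current value.
            entry[name] = (1.0 - alpha) * entry[name] + alpha * curr[name]
        else:
            # No old value: cache the current value as-is.
            entry[name] = curr[name]
    return entry
```

For example, the first close for a host stores its values unchanged, and later closes to the same host blend into that entry.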
TCP options are copied or merged depending on the details of each TCP options are copied or merged depending on the details of each
option, where "merge" is some function that combines the values of option. E.g., TFO state is updated when a connection is established
"curr" and "old". E.g., TFO state is updated when a connection is and read before establishing a new connection.
established and read before establishing a new connection.
Sections 8 and 9 discuss compatibility issues and implications of Sections 8 and 9 discuss compatibility issues and implications of
sharing the specific information listed above. Section 10 gives an sharing the specific information listed above. Section 10 gives an
overview of known implementations. overview of known implementations.
Most cached TCB values are updated when a connection closes. The Most cached TCB values are updated when a connection closes. The
exceptions are MMS_R and MMS_S, which are reported by IP [RFC1122], exceptions are MMS_R and MMS_S, which are reported by IP [RFC1122],
PMTU which is updated after Path MTU Discovery and also reported by PMTU which is updated after Path MTU Discovery and also reported by
IP [RFC1191][RFC4821][RFC8201], and sendMSS, which is updated if the IP [RFC1191][RFC4821][RFC8201], and sendMSS, which is updated if the
MSS option is received in the TCP SYN header. MSS option is received in the TCP SYN header.
skipping to change at page 11, line 29 skipping to change at page 11, line 39
Sharing cached TCB data across concurrent connections requires Sharing cached TCB data across concurrent connections requires
attention to the aggregate nature of some of the shared state. For attention to the aggregate nature of some of the shared state. For
example, although MSS and RTT values can be shared by copying, it example, although MSS and RTT values can be shared by copying, it
may not be appropriate to simply copy congestion window or ssthresh may not be appropriate to simply copy congestion window or ssthresh
information; instead, the new values can be a function (f) of the information; instead, the new values can be a function (f) of the
cumulative values and the number of connections (N). cumulative values and the number of connections (N).
7.1. Initialization of a new TCB 7.1. Initialization of a new TCB
TCBs for new connections can be initialized using context from TCBs for new connections can be initialized using cached context
concurrent connections as follows: from concurrent connections as follows:
ENSEMBLE SHARING - TCB Initialization ENSEMBLE SHARING - TCB Initialization
Cached TCB New TCB Cached TCB New TCB
------------------------------------------ ------------------------------------------
old_MMS_S old_MMS_S old_MMS_S old_MMS_S
old_MMS_R old_MMS_R old_MMS_R old_MMS_R
old_sendMSS old_sendMSS old_sendMSS old_sendMSS
skipping to change at page 12, line 29 skipping to change at page 12, line 29
old_RTTVAR old_RTTVAR old_RTTVAR old_RTTVAR
sum(old_ssthresh) f(sum(old_ssthresh), N) sum(old_ssthresh) f(sum(old_ssthresh), N)
sum(old_sendcwnd) f(sum(old_sendcwnd), N) sum(old_sendcwnd) f(sum(old_sendcwnd), N)
old_option (option specific) old_option (option specific)
+Note that PMTU is cached at the IP layer [RFC1191][RFC4821]. +Note that PMTU is cached at the IP layer [RFC1191][RFC4821].
In the table, the cached sum() is a total across all active
connections because these parameters act in aggregate; similarly f()
is a function that updates that sum based on the new connection's
values, represented as "N".
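The draft leaves f() unspecified. One possible choice, shown here purely as an assumption, is an equal share of the cached aggregate among the connections to that host. In this sketch the name equal_share is hypothetical, and n is taken to be the number of connections sharing the aggregate, counting the new one.

```python
def equal_share(aggregate, n):
    """One hypothetical f(sum, N): split the aggregate evenly.

    aggregate: cached sum of a parameter (e.g., sendcwnd) across
               the active connections to a host.
    n:         number of connections sharing it, including the new
               connection (an assumed interpretation of N).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return aggregate / n
```

Under this assumption, a cached sum(old_sendcwnd) of 30 segments shared by three connections would give the new TCB an initial value of 10 segments. Other policies, such as weighting by connection age or demand, fit the same f(sum, N) shape.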
The table below gives an overview of option-specific information The table below gives an overview of option-specific information
that can be similarly shared. Again, the TFO_cookie is updated upon that can be similarly shared. Again, the TFO_cookie is updated upon
explicit client request, which is a separate event. explicit client request, which is a separate event.
ENSEMBLE SHARING - Option Info Initialization ENSEMBLE SHARING - Option Info Initialization
Cached New Cached New
------------------------------------ ------------------------------------
old_TFO_cookie old_TFO_cookie old_TFO_cookie old_TFO_cookie
old_TFO_failure old_TFO_failure old_TFO_failure old_TFO_failure
7.2. Updates to the new TCB 7.2. Updates to the TCB cache
During the connection, the associated TCB can be updated based on During a connection, the TCB cache can be updated based on changes
changes to concurrent connections, as shown below: to concurrent connections and their TCBs, as shown below:
ENSEMBLE SHARING - Cache Updates ENSEMBLE SHARING - Cache Updates
Cached TCB Current TCB when? New Cached TCB Cached TCB Current TCB when? New Cached TCB
--------------------------------------------------------------- ---------------------------------------------------------------
old_MMS_S curr_MMS_S OPEN curr_MMS_S old_MMS_S curr_MMS_S OPEN curr_MMS_S
old_MMS_R curr_MMS_R OPEN curr_MMS_R old_MMS_R curr_MMS_R OPEN curr_MMS_R
old_sendMSS curr_sendMSS MSSopt curr_sendMSS old_sendMSS curr_sendMSS MSSopt curr_sendMSS
old_PMTU curr_PMTU PMTUD+ / curr_PMTU old_PMTU curr_PMTU PMTUD+ / curr_PMTU
PLPMTUD+ PLPMTUD+
old_RTT curr_RTT update rtt_update(old,curr) old_RTT curr_RTT update rtt_update(old, curr)
old_RTTVAR curr_RTTVAR update rtt_update(old,curr) old_RTTVAR curr_RTTVAR update rtt_update(old, curr)
old_ssthresh curr_ssthresh update adjust sum as appropriate old_ssthresh curr_ssthresh update adjust sum as appropriate
old_sendcwnd curr_sendcwnd update adjust sum as appropriate old_sendcwnd curr_sendcwnd update adjust sum as appropriate
old_option curr_option (depends) (option specific) old_option curr_option (depends) (option specific)
+Note that the PMTU is cached at the IP layer [RFC1191][RFC4821]. +Note that the PMTU is cached at the IP layer [RFC1191][RFC4821].
In the table, rtt_update() is the function used to combine old and
current values, e.g., as a windowed average or exponentially decayed
average.
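As an illustration only, the exponentially decayed variant of rtt_update() could be sketched as follows. The weight alpha is an assumption, not something this document specifies; the value 1/8 is borrowed from the smoothing gain commonly used for TCP's smoothed RTT. Treating an empty cache as "adopt the current value" is likewise an assumption.

```python
def rtt_update(old, curr, alpha=0.125):
    """Exponentially decayed average of a cached value and a current sample.

    alpha is an assumed smoothing weight (not specified by the document);
    larger values make the cache follow the current connection more quickly.
    """
    if old is None:
        # Assumption: with nothing cached yet, adopt the current value.
        return curr
    return (1.0 - alpha) * old + alpha * curr


# Cached RTT of 100 ms, current connection reports 200 ms.
new_cached_rtt = rtt_update(100.0, 200.0)  # 112.5
```

The same function can be applied to old_RTTVAR and curr_RTTVAR, as the table does. A windowed average would instead keep the last few samples and return their mean.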
The table below gives an overview of option-specific information The table below gives an overview of option-specific information
that can be similarly shared. that can be similarly shared.
ENSEMBLE SHARING - Option Info Updates ENSEMBLE SHARING - Option Info Updates
Cached Current when? New Cached Cached Current when? New Cached
---------------------------------------------------------- ----------------------------------------------------------
old_TFO_cookie old_TFO_cookie ESTAB old_TFO_cookie old_TFO_cookie old_TFO_cookie ESTAB old_TFO_cookie
old_TFO_failure old_TFO_failure ESTAB old_TFO_failure old_TFO_failure old_TFO_failure ESTAB old_TFO_failure
skipping to change at page 14, line 17 skipping to change at page 14, line 21
Congestion window size and ssthresh aggregation are more complicated Congestion window size and ssthresh aggregation are more complicated
in the concurrent case. When there is an ensemble of connections, we in the concurrent case. When there is an ensemble of connections, we
need to decide how that ensemble would have shared these variables, need to decide how that ensemble would have shared these variables,
in order to derive initial values for new TCBs. in order to derive initial values for new TCBs.
Sections 8 and 9 discuss compatibility issues and implications of Sections 8 and 9 discuss compatibility issues and implications of
sharing the specific information listed above. sharing the specific information listed above.
There are several ways to initialize the congestion window in a new There are several ways to initialize the congestion window in a new
TCB among an ensemble of current connections to a host. Current TCP TCB among an ensemble of current connections to a host. Current TCP
implementations initialize it to four segments as standard [rfc3390] implementations initialize it to four segments as standard [RFC3390]
and 10 segments experimentally [RFC6928]. These approaches assume and 10 segments experimentally [RFC6928]. These approaches assume
that new connections should behave as conservatively as possible. that new connections should behave as conservatively as possible.
The algorithm described in [Ba12] adjusts the initial cwnd depending The algorithm described in [Ba12] adjusts the initial cwnd depending
on the cwnd values of ongoing connections. It is also possible to on the cwnd values of ongoing connections. It is also possible to
use sharing mechanisms over long timescales to adapt TCP's initial use sharing mechanisms over long timescales to adapt TCP's initial
window automatically, as described further in Appendix C. window automatically, as described further in Appendix C.
8. Issues with TCB information sharing 8. Issues with TCB information sharing
Here, we discuss various types of problems that may arise with TCB Here, we discuss various types of problems that may arise with TCB
skipping to change at page 15, line 20 skipping to change at page 15, line 22
Multipath routing that relies on examining transport headers, such Multipath routing that relies on examining transport headers, such
as ECMP and LAG [RFC7424], may not result in repeatable path as ECMP and LAG [RFC7424], may not result in repeatable path
selection when TCP segments are encapsulated, encrypted, or altered selection when TCP segments are encapsulated, encrypted, or altered
- for example, in some Virtual Private Network (VPN) tunnels that - for example, in some Virtual Private Network (VPN) tunnels that
rely on proprietary encapsulation. Similarly, such approaches cannot rely on proprietary encapsulation. Similarly, such approaches cannot
operate deterministically when the TCP header is encrypted, e.g., operate deterministically when the TCP header is encrypted, e.g.,
when using IPsec ESP (although TCB interdependence among the entire when using IPsec ESP (although TCB interdependence among the entire
set sharing the same endpoint IP addresses should work without set sharing the same endpoint IP addresses should work without
problems when the TCP header is encrypted). Measures to increase the problems when the TCP header is encrypted). Measures to increase the
probability that connections use the same path could be applied: probability that connections use the same path could be applied:
e.g., the connections could be given the same IPv6 flow label. TCB e.g., the connections could be given the same IPv6 flow label
interdependence can also be extended to sets of host IP address [RFC6437]. TCB interdependence can also be extended to sets of host
pairs that share the same network path conditions, such as when a IP address pairs that share the same network path conditions, such
group of addresses is on the same LAN (see Section 9). as when a group of addresses is on the same LAN (see Section 9).
Traversing the same path is not important for host-specific Traversing the same path is not important for host-specific
information such as rwnd and TCP option state, such as TFOinfo, or information such as rwnd and TCP option state, such as TFOinfo, or
for information that is already cached per-host, such as path MTU. for information that is already cached per-host, such as path MTU.
When TCB information is shared across different SYN destination When TCB information is shared across different SYN destination
ports, path-related information can be incorrect; however, the ports, path-related information can be incorrect; however, the
impact of this error is potentially diminished if (as discussed impact of this error is potentially diminished if (as discussed
here) TCB sharing affects only the transient event of a connection here) TCB sharing affects only the transient event of a connection
start or if TCB information is shared only within connections to the start or if TCB information is shared only within connections to the
same SYN destination port. same SYN destination port.
skipping to change at page 16, line 5 skipping to change at page 16, line 7
8.2. State dependence 8.2. State dependence
There may be additional considerations to the way in which TCB There may be additional considerations to the way in which TCB
interdependence rebalances congestion feedback among the current interdependence rebalances congestion feedback among the current
connections, e.g., it may be appropriate to consider the impact of a connections, e.g., it may be appropriate to consider the impact of a
connection being in Fast Recovery [RFC5681] or some other similar connection being in Fast Recovery [RFC5681] or some other similar
unusual feedback state, e.g., as inhibiting or affecting the unusual feedback state, e.g., as inhibiting or affecting the
calculations described herein. calculations described herein.
8.3. Problems with IP sharing 8.3. Problems with sharing based on IP address
It can be wrong to share TCB information between TCP connections on It can be wrong to share TCB information between TCP connections on
the same host as identified by the IP address if an IP address is the same host as identified by the IP address if an IP address is
assigned to a new host (e.g., IP address spinning, as is used by assigned to a new host (e.g., IP address spinning, as is used by
ISPs to inhibit running servers). It can be wrong if Network Address ISPs to inhibit running servers). It can be wrong if Network Address
(and Port) Translation (NA(P)T) [RFC2663] or any other IP sharing (and Port) Translation (NA(P)T) [RFC2663] or any other IP sharing
mechanism is used. Such mechanisms are less likely to be used with mechanism is used. Such mechanisms are less likely to be used with
IPv6. Other methods to identify a host could also be considered to IPv6. Other methods to identify a host could also be considered to
make correct TCB sharing more likely. Moreover, some TCB information make correct TCB sharing more likely. Moreover, some TCB information
is about dominant path properties rather than the specific host. IP is about dominant path properties rather than the specific host. IP
skipping to change at page 16, line 28 skipping to change at page 16, line 30
9. Implications 9. Implications
There are several implications to incorporating TCB interdependence There are several implications to incorporating TCB interdependence
in TCP implementations. First, it may reduce the need for in TCP implementations. First, it may reduce the need for
application-layer multiplexing for performance enhancement application-layer multiplexing for performance enhancement
[RFC7231]. Protocols like HTTP/2 [RFC7540] avoid connection [RFC7231]. Protocols like HTTP/2 [RFC7540] avoid connection
reestablishment costs by serializing or multiplexing a set of per- reestablishment costs by serializing or multiplexing a set of per-
host connections across a single TCP connection. This avoids TCP's host connections across a single TCP connection. This avoids TCP's
per-connection OPEN handshake and also avoids recomputing the MSS, per-connection OPEN handshake and also avoids recomputing the MSS,
RTT, and congestion window values. By avoiding the so-called, "slow- RTT, and congestion window values. By avoiding the so-called "slow-
start restart," performance can be optimized [Hu01]. TCB start restart", performance can be optimized [Hu01]. TCB
interdependence can provide the "slow-start restart avoidance" of interdependence can provide the "slow-start restart avoidance" of
multiplexing, without requiring a multiplexing mechanism at the multiplexing, without requiring a multiplexing mechanism at the
application layer. application layer.
Like the initial version of this document [RFC2140], this update's Like the initial version of this document [RFC2140], this update's
approach to TCB interdependence focuses on sharing a set of TCBs by approach to TCB interdependence focuses on sharing a set of TCBs by
updating the TCB state to reduce the impact of transients when updating the TCB state to reduce the impact of transients when
connections begin or end. Other mechanisms have since been proposed connections begin, end, or otherwise significantly change state.
to continuously share information between all ongoing communication Other mechanisms have since been proposed to continuously share
(including connectionless protocols), updating the congestion state information between all ongoing communication (including
during any congestion-related event (e.g., timeout, loss connectionless protocols), updating the congestion state during any
confirmation, etc.) [RFC3124]. By dealing exclusively with congestion-related event (e.g., timeout, loss confirmation, etc.)
transients, TCB interdependence is more likely to exhibit the same [RFC3124]. By dealing exclusively with transients, the approach in
behavior as unmodified, independent TCP connections. this document is more likely to exhibit the same "steady-state" behavior
as unmodified, independent TCP connections.
9.1. Layering 9.1. Layering
TCB interdependence pushes some of the TCP implementation from the TCB interdependence pushes some of the TCP implementation from the
traditional transport layer (in the ISO model), to the network traditional transport layer (in the ISO model), to the network
layer. This acknowledges that some state is in fact per-host-pair or layer. This acknowledges that some state is in fact per-host-pair or
can be per-path as indicated solely by that host-pair. Transport can be per-path as indicated solely by that host-pair. Transport
protocols typically manage per-application-pair associations (per protocols typically manage per-application-pair associations (per
stream), and network protocols manage per-host-pair and path stream), and network protocols manage per-host-pair and path
associations (routing). Round-trip time, MSS, and congestion associations (routing). Round-trip time, MSS, and congestion
information could be more appropriately handled in a network-layer information could be more appropriately handled at the network
fashion, aggregated among concurrent connections, and shared across layer, aggregated among concurrent connections, and shared across
connection instances [RFC3124]. connection instances [RFC3124].
An earlier version of RTT sharing suggested implementing RTT state An earlier version of RTT sharing suggested implementing RTT state
at the IP layer, rather than at the TCP layer. Our observations at the IP layer, rather than at the TCP layer. Our observations
describe sharing state among TCP connections, which avoids some of describe sharing state among TCP connections, which avoids some of
the difficulties in an IP-layer solution. One such problem of an IP the difficulties in an IP-layer solution. One such problem of an IP
layer solution is determining the correspondence between packet layer solution is determining the correspondence between packet
exchanges using IP header information alone, where such exchanges using IP header information alone, where such
correspondence is needed to compute RTT. Because TCB sharing correspondence is needed to compute RTT. Because TCB sharing
computes RTTs inside the TCP layer using TCP header information, it computes RTTs inside the TCP layer using TCP header information, it
skipping to change at page 17, line 42 skipping to change at page 17, line 50
There may be other information that can be shared between concurrent There may be other information that can be shared between concurrent
connections. For example, knowing that another connection has just connections. For example, knowing that another connection has just
tried to expand its window size and failed, a connection may not tried to expand its window size and failed, a connection may not
attempt to do the same for some period. The idea is that existing attempt to do the same for some period. The idea is that existing
TCP implementations infer the behavior of all competing connections, TCP implementations infer the behavior of all competing connections,
including those within the same host or subnet. One possible including those within the same host or subnet. One possible
optimization is to make that implicit feedback explicit, via optimization is to make that implicit feedback explicit, via
extended information associated with the endpoint IP address and its extended information associated with the endpoint IP address and its
TCP implementation, rather than per-connection state in the TCB. TCP implementation, rather than per-connection state in the TCB.
This document focuses on sharing TCB information at connection
initialization. Subsequent to RFC 2140, there have been numerous
approaches that attempt to coordinate ongoing state across
concurrent connections, both within TCP and other congestion-
reactive protocols, which are summarized in [Is18]. These approaches
are more complex to implement and their comparison to steady-state
TCP equivalence can be more difficult to establish, sometimes
intentionally (i.e., they sometimes intend to provide a different
kind of "fairness" than emerges from TCP operation).
10. Implementation Observations 10. Implementation Observations
The observation that some TCB state is host-pair specific rather The observation that some TCB state is host-pair specific rather
than application-pair dependent is not new and is a common than application-pair dependent is not new and is a common
engineering decision in layered protocol implementations. Although engineering decision in layered protocol implementations. Although
now deprecated, T/TCP [RFC1644] was the first to propose using now deprecated, T/TCP [RFC1644] was the first to propose using
caches in order to maintain TCB states (see 0). caches in order to maintain TCB states (see Appendix A).
The table below describes the current implementation status for TCB The table below describes the current implementation status for TCB
temporal sharing in Windows as of December 2020, Linux kernel temporal sharing in Windows as of December 2020, Apple variants
(macOS, iOS, iPadOS, tvOS, watchOS) as of January 2021, Linux kernel
version 5.10.3, and FreeBSD 12. Ensemble sharing is not yet version 5.10.3, and FreeBSD 12. Ensemble sharing is not yet
implemented. implemented.
KNOWN IMPLEMENTATION STATUS KNOWN IMPLEMENTATION STATUS
TCB data Status TCB data Status
------------------------------------------------------------ ------------------------------------------------------------
old_MMS_S Not shared old_MMS_S Not shared
old_MMS_R Not shared old_MMS_R Not shared
skipping to change at page 18, line 35 skipping to change at page 19, line 6
old_TFOinfo Cached and shared in Apple, Linux, Windows old_TFOinfo Cached and shared in Apple, Linux, Windows
old_sendcwnd Not shared old_sendcwnd Not shared
old_ssthresh Cached and shared in Apple, FreeBSD*, Linux* old_ssthresh Cached and shared in Apple, FreeBSD*, Linux*
TFO failure Cached and shared in Apple TFO failure Cached and shared in Apple
In the table above, "Apple" refers to all Apple OSes, i.e., In the table above, "Apple" refers to all Apple OSes, i.e.,
desktop/laptop macOS, phone iOS, video player tvOS, pad ipadOS, and desktop/laptop macOS, phone iOS, pad iPadOS, video player tvOS, and
watch watchOS, which all share the same Internet protocol stack. watch watchOS, which all share the same Internet protocol stack.
*Note: In FreeBSD, new ssthresh is the mean of curr_ssthresh and *Note: In FreeBSD, new ssthresh is the mean of curr_ssthresh and
previous value if a previous value exists; in Linux, the calculation previous value if a previous value exists; in Linux, the calculation
depends on state and is max(curr_cwnd/2, old_ssthresh) in most depends on state and is max(curr_cwnd/2, old_ssthresh) in most
cases. cases.
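The two ssthresh rules in the note above can be restated as a short sketch. This only transcribes the arithmetic described in the note; it is not taken from either kernel, the function names are hypothetical, and the Linux variant covers only the "most cases" rule.

```python
def freebsd_style_ssthresh(old_ssthresh, curr_ssthresh):
    """Mean of the current and the previously cached value, if one exists."""
    if old_ssthresh is None:
        # No previous value cached: store the current one as-is.
        return curr_ssthresh
    return (old_ssthresh + curr_ssthresh) / 2


def linux_style_ssthresh(old_ssthresh, curr_cwnd):
    """The 'most cases' rule from the note: max(curr_cwnd/2, old_ssthresh)."""
    return max(curr_cwnd / 2, old_ssthresh)
```

The first rule averages, so the cache moves toward each new observation; the second never lowers the cached value below what it already holds.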
11. Updates to RFC 2140 11. Changes Compared to RFC 2140
This document updates the description of TCB sharing in RFC 2140 and This document updates the description of TCB sharing in RFC 2140 and
its associated impact on existing and new connection state, its associated impact on existing and new connection state,
providing a complete replacement for that document [RFC2140]. It providing a complete replacement for that document [RFC2140]. It
clarifies the previous description and terminology and extends the clarifies the previous description and terminology and extends the
mechanism to its impact on new protocols and mechanisms, including mechanism to its impact on new protocols and mechanisms, including
multipath TCP, fast open, PLPMTUD, NAT, and the TCP Authentication multipath TCP, fast open, PLPMTUD, NAT, and the TCP Authentication
Option. Option.
The detailed impact on TCB state addresses TCB parameters in greater The detailed impact on TCB state addresses TCB parameters in greater
detail, addressing MSS in both the send and receive direction, MSS detail, addressing MSS in both the send and receive direction, MSS
and send-MSS separately, adds path MTU and ssthresh, and addresses and sendMSS separately, adds path MTU and ssthresh, and addresses
the impact on TCP option state. the impact on TCP option state.
New sections have been added to address compatibility issues and New sections have been added to address compatibility issues and
implementation observations. The relation of this work to T/TCP has implementation observations. The relation of this work to T/TCP has
been moved to Appendix A on history, partly to reflect the deprecation of been moved to Appendix A on history, partly to reflect the deprecation of
that protocol. that protocol.
Appendix C has been added to discuss the potential to use temporal Appendix C has been added to discuss the potential to use temporal
sharing over long timescales to adapt TCP's initial window sharing over long timescales to adapt TCP's initial window
automatically, avoiding the need to periodically revise a single automatically, avoiding the need to periodically revise a single
skipping to change at page 21, line 24 skipping to change at page 21, line 42
[Al10] Allman, M., "Initial Congestion Window Specification", [Al10] Allman, M., "Initial Congestion Window Specification",
(work in progress), draft-allman-tcpm-bump-initcwnd-00, (work in progress), draft-allman-tcpm-bump-initcwnd-00,
Nov. 2010. Nov. 2010.
[Ba12] Barik, R., Welzl, M., Ferlin, S., Alay, O., "LISA: A [Ba12] Barik, R., Welzl, M., Ferlin, S., Alay, O., "LISA: A
Linked Slow-Start Algorithm for MPTCP", IEEE ICC, Kuala Linked Slow-Start Algorithm for MPTCP", IEEE ICC, Kuala
Lumpur, Malaysia, May 23-27 2016. Lumpur, Malaysia, May 23-27 2016.
[Ba20] Bagnulo, M., Briscoe, B., "ECN++: Adding Explicit [Ba20] Bagnulo, M., Briscoe, B., "ECN++: Adding Explicit
Congestion Notification (ECN) to TCP Control Packets", Congestion Notification (ECN) to TCP Control Packets",
draft-ietf-tcpm-generalized-ecn-06, Oct. 2020. draft-ietf-tcpm-generalized-ecn-07, Feb. 2021.
[Be94] Berners-Lee, T., et al., "The World-Wide Web," [Be94] Berners-Lee, T., et al., "The World-Wide Web,"
Communications of the ACM, V37, Aug. 1994, pp. 76-82. Communications of the ACM, V37, Aug. 1994, pp. 76-82.
[Br94] Braden, B., "T/TCP -- Transaction TCP: Source Changes for [Br94] Braden, B., "T/TCP -- Transaction TCP: Source Changes for
Sun OS 4.1.3," Release 1.0, USC/ISI, September 14, 1994. Sun OS 4.1.3," Release 1.0, USC/ISI, September 14, 1994.
[Br02] Brownlee, N., Claffy, K., "Understanding Internet Traffic [Br02] Brownlee, N., Claffy, K., "Understanding Internet Traffic
Streams: Dragonflies and Tortoises", IEEE Communications Streams: Dragonflies and Tortoises", IEEE Communications
Magazine p110-117, 2002. Magazine p110-117, 2002.
skipping to change at page 22, line 5 skipping to change at page 22, line 26
[FreeBSD] FreeBSD source code, Release 2.10, http://www.freebsd.org/ [FreeBSD] FreeBSD source code, Release 2.10, http://www.freebsd.org/
[Hu01] Hughes, A., Touch, J., Heidemann, J., "Issues in Slow- [Hu01] Hughes, A., Touch, J., Heidemann, J., "Issues in Slow-
Start Restart After Idle", draft-hughes-restart-00 Start Restart After Idle", draft-hughes-restart-00
(expired), Dec. 2001. (expired), Dec. 2001.
[Hu12] Hurtig, P., Brunstrom, A., "Enhanced metric caching for [Hu12] Hurtig, P., Brunstrom, A., "Enhanced metric caching for
short TCP flows," 2012 IEEE International Conference on short TCP flows," 2012 IEEE International Conference on
Communications (ICC), Ottawa, ON, 2012, pp. 1209-1213. Communications (ICC), Ottawa, ON, 2012, pp. 1209-1213.
[IANA] IANA TCP Parameters (options) registry,
https://www.iana.org/assignments/tcp-parameters
[Is18] Islam, S., Welzl, M., Hiorth, K., Hayes, D., Armitage, G.,
Gjessing, S., "ctrlTCP: Reducing Latency through Coupled,
Heterogeneous Multi-Flow TCP Congestion Control," Proc.
IEEE INFOCOM Global Internet Symposium (GI) workshop (GI
2018), Honolulu, HI, April 2018.
[Ja88] Jacobson, V., Karels, M., "Congestion Avoidance and [Ja88] Jacobson, V., Karels, M., "Congestion Avoidance and
Control", Proc. Sigcomm 1988. Control", Proc. Sigcomm 1988.
[RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions [RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions
Functional Specification," RFC-1644, July 1994. Functional Specification," RFC-1644, July 1994.
[RFC1379] Braden, R., "Transaction TCP -- Concepts," RFC-1379, [RFC1379] Braden, R., "Transaction TCP -- Concepts," RFC-1379,
September 1992. September 1992.
[RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast
skipping to change at page 22, line 43 skipping to change at page 23, line 27
[RFC4340] Kohler, E., Handley, M., Floyd, S., "Datagram Congestion [RFC4340] Kohler, E., Handley, M., Floyd, S., "Datagram Congestion
Control Protocol (DCCP)," RFC 4340, Mar. 2006. Control Protocol (DCCP)," RFC 4340, Mar. 2006.
[RFC4960] Stewart, R., (Ed.), "Stream Control Transmission [RFC4960] Stewart, R., (Ed.), "Stream Control Transmission
Protocol," RFC4960, Sept. 2007. Protocol," RFC4960, Sept. 2007.
[RFC5925] Touch, J., Mankin, A., Bonica, R., "The TCP Authentication [RFC5925] Touch, J., Mankin, A., Bonica, R., "The TCP Authentication
Option," RFC 5925, June 2010. Option," RFC 5925, June 2010.
[RFC6437] Amante, S., Carpenter, B., Jiang, S., Rajahalme, J., "IPv6
Flow Label Specification," RFC 6437, Nov. 2011.
[RFC6691] Borman, D., "TCP Options and Maximum Segment Size (MSS)," [RFC6691] Borman, D., "TCP Options and Maximum Segment Size (MSS),"
RFC 6691, July 2012. RFC 6691, July 2012.
[RFC6928] Chu, J., Dukkipati, N., Cheng, Y., Mathis, M., "Increasing [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., Mathis, M., "Increasing
TCP's Initial Window," RFC 6928, Apr. 2013. TCP's Initial Window," RFC 6928, Apr. 2013.
[RFC7231] Fielding, R., Reschke, J., Eds., "HTTP/1.1 Semantics and [RFC7231] Fielding, R., Reschke, J., Eds., "HTTP/1.1 Semantics and
Content," RFC-7231, June 2014. Content," RFC-7231, June 2014.
[RFC7323] Borman, D., Braden, B., Jacobson, V., Scheffenegger, R., [RFC7323] Borman, D., Braden, B., Jacobson, V., Scheffenegger, R.,
skipping to change at page 23, line 42 skipping to change at page 24, line 30
research project between the University of Oslo and Huawei research project between the University of Oslo and Huawei
Technologies Co., Ltd. and were partly supported by USC/ISI's Postel Technologies Co., Ltd. and were partly supported by USC/ISI's Postel
Center. Center.
This document was prepared using 2-Word-v2.0.template.dot. This document was prepared using 2-Word-v2.0.template.dot.
16. Change log 16. Change log
This section should be removed upon final publication as an RFC. This section should be removed upon final publication as an RFC.
ietf-11:
- Addressed gen-art review and IESG feedback
ietf-10: ietf-10:
- Addressed gen-art review request for clarifications - Addressed IETF last call feedback
ietf-09: ietf-09:
- Correction of typographic errors - Correction of typographic errors
ietf-08: ietf-08:
- Address TSV AD comments, add Apple OS implementation status - Address TSV AD comments, add Apple OS implementation status
ietf-07: ietf-07:
skipping to change at page 27, line 8 skipping to change at page 28, line 8
PO Box 1080 Blindern PO Box 1080 Blindern
Oslo N-0316 Oslo N-0316
Norway Norway
Phone: +47 22 84 08 37 Phone: +47 22 84 08 37
Email: safiquli@ifi.uio.no Email: safiquli@ifi.uio.no
Appendix A: TCB Sharing History Appendix A: TCB Sharing History
T/TCP proposed using caches to maintain TCB information across T/TCP proposed using caches to maintain TCB information across
instances (temporal sharing), e.g., smoothed RTT, RTT variance, instances (temporal sharing), e.g., smoothed RTT, RTT variation,
congestion avoidance threshold, and MSS [RFC1644]. These values were congestion avoidance threshold, and MSS [RFC1644]. These values were
in addition to connection counts used by T/TCP to accelerate data in addition to connection counts used by T/TCP to accelerate data
delivery prior to the full three-way handshake during an OPEN. The delivery prior to the full three-way handshake during an OPEN. The
goal was to aggregate TCB components where they reflect one goal was to aggregate TCB components where they reflect one
association - that of the host-pair, rather than artificially association - that of the host-pair, rather than artificially
separating those components by connection. separating those components by connection.
At least one T/TCP implementation saved the MSS and aggregated the At least one T/TCP implementation saved the MSS and aggregated the
RTT parameters across multiple connections but omitted caching the RTT parameters across multiple connections but omitted caching the
congestion window information [Br94], as originally specified in congestion window information [Br94], as originally specified in
skipping to change at page 28, line 8 skipping to change at page 29, line 8
the SunOS 4.1.3 T/TCP extensions [Br94] and the FreeBSD port of same the SunOS 4.1.3 T/TCP extensions [Br94] and the FreeBSD port of same
[FreeBSD]. As mentioned before, only the MSS and RTT parameters were [FreeBSD]. As mentioned before, only the MSS and RTT parameters were
cached, as originally specified in [RFC1379]. Later discussion of cached, as originally specified in [RFC1379]. Later discussion of
T/TCP suggested including congestion control parameters in this T/TCP suggested including congestion control parameters in this
cache; for example, [RFC1644] (Section 3.1) hints at initializing cache; for example, [RFC1644] (Section 3.1) hints at initializing
the congestion window to the old window size. the congestion window to the old window size.
Appendix B: TCP Option Sharing and Caching Appendix B: TCP Option Sharing and Caching
In addition to the options that can be cached and shared, this memo In addition to the options that can be cached and shared, this memo
also lists known options for which state is unsafe to be kept. This also lists known TCP options [IANA] for which state is unsafe to be
list is not intended to be authoritative or exhaustive. kept. This list is not intended to be authoritative or exhaustive.
Obsolete (unsafe to keep state): Obsolete (unsafe to keep state):
ECHO ECHO
ECHO REPLY ECHO REPLY
PO Conn permitted PO Conn permitted
PO service profile PO service profile
skipping to change at page 30, line 12 skipping to change at page 31, line 12
TFO cookie (if TFO succeeded in the past) TFO cookie (if TFO succeeded in the past)
Appendix C: Automating the Initial Window in TCP over Long Timescales Appendix C: Automating the Initial Window in TCP over Long Timescales
C.1. Introduction C.1. Introduction
Temporal sharing, as described earlier in this document, builds on Temporal sharing, as described earlier in this document, builds on
the assumption that multiple consecutive connections between the the assumption that multiple consecutive connections between the
same host pair are somewhat likely to be exposed to similar same host pair are somewhat likely to be exposed to similar
environment characteristics. The stored information can therefore environment characteristics. The stored information can become less
become invalid over time, and suitable precautions should be taken accurate over time and suitable precautions should take this ageing
(this is discussed further in section 8.1). However, there are also into consideration (this is discussed further in section 8.1).
cases where it can make sense to use much longer-term measurements However, there are also cases where it can make sense to track these
of TCP connections to gradually influence TCP parameters. This values over longer periods, observing properties of TCP connections
to gradually influence evolving trends in TCP parameters. This
appendix describes an example of such a case. appendix describes an example of such a case.
TCP's congestion control algorithm uses an initial window value TCP's congestion control algorithm uses an initial window value
(IW), both as a starting point for new connections and as an upper (IW), both as a starting point for new connections and as an upper
limit for restarting after an idle period [RFC5681][RFC7661]. This limit for restarting after an idle period [RFC5681][RFC7661]. This
value has evolved over time, originally one maximum segment size value has evolved over time, originally one maximum segment size
(MSS), and increased to the lesser of four MSS or 4,380 bytes (MSS), and increased to the lesser of four MSS or 4,380 bytes
[RFC3390][RFC5681]. For a typical Internet connection with a maximum [RFC3390][RFC5681]. For a typical Internet connection with a maximum
transmission unit (MTU) of 1500 bytes, this permits three segments transmission unit (MTU) of 1500 bytes, this permits three segments
of 1,460 bytes each. of 1,460 bytes each.
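The three-segment figure above can be checked with a short calculation. The sketch below assumes the full RFC 3390 bound, min(4*MSS, max(2*MSS, 4380 bytes)), of which "the lesser of four MSS or 4,380 bytes" is the common case. The function names are illustrative only and do not come from the draft.

```python
def initial_window(mss: int) -> int:
    """Upper bound on the initial window in bytes, using the RFC 3390 formula."""
    return min(4 * mss, max(2 * mss, 4380))


def full_segments(mss: int) -> int:
    """Number of full-sized segments that fit in the initial window."""
    return initial_window(mss) // mss


# MTU of 1500 bytes minus 20 bytes of IPv4 header and 20 bytes of TCP header
# (assuming no options) leaves an MSS of 1460 bytes.
mss = 1500 - 20 - 20
print(initial_window(mss), full_segments(mss))
```

For an MSS of 1460 bytes the 4,380-byte term is the binding one, which is where the three segments come from. For a small MSS such as 536 bytes the four-MSS term is the limit instead.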
skipping to change at page 31, line 28 skipping to change at page 32, line 29
o Increase the IW in the absence of sustained loss of IW segments, o Increase the IW in the absence of sustained loss of IW segments,
as determined over a number of different connections. as determined over a number of different connections.
o Operate conservatively, i.e., tend towards leaving the IW the o Operate conservatively, i.e., tend towards leaving the IW the
same in the absence of sufficient information, and give greater same in the absence of sufficient information, and give greater
consideration to IW segment loss than IW segment success. consideration to IW segment loss than IW segment success.
We expect that, without other context, a good IW algorithm will We expect that, without other context, a good IW algorithm will
converge to a single value, but this is not required. An endpoint converge to a single value, but this is not required. An endpoint
with additional context or information, or deployed in a constrained with additional context or information, or deployed in a constrained
environment, can always use a different value. In specific, environment, can always use a different value. In particular,
information from previous connections, or sets of connections with a information from previous connections, or sets of connections with a
similar path, can already be used as context for such decisions (as similar path, can already be used as context for such decisions (as
noted in the core of this document). noted in the core of this document).
However, if a given IW value persistently causes packet loss during However, if a given IW value persistently causes packet loss during
the initial burst of packets, it is clearly inappropriate and could the initial burst of packets, it is clearly inappropriate and could
be inducing unnecessary loss in other competing connections. This be inducing unnecessary loss in other competing connections. This
might happen for sites behind very slow boxes with small buffers, might happen for sites behind very slow boxes with small buffers,
which may or may not be the first hop. which may or may not be the first hop.
skipping to change at page 32, line 27 skipping to change at page 33, line 29
Internet (here we selected the current de-facto standard rather than Internet (here we selected the current de-facto standard rather than
the actual standard). Current proposals, including default current the actual standard). Current proposals, including default current
operation, are degenerate cases of the algorithm below for given operation, are degenerate cases of the algorithm below for given
parameters - notably MulDec = 1.0 and AddIncr = 0 MSS, thus parameters - notably MulDec = 1.0 and AddIncr = 0 MSS, thus
disabling the automatic part of the algorithm. disabling the automatic part of the algorithm.
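To illustrate why MulDec = 1.0 and AddIncr = 0 MSS disable the automatic behavior, the following is a minimal sketch of a periodic update. The update step itself is not reproduced in this excerpt, so the additive-increase / multiplicative-decrease form, the min_conns threshold, the 2 MSS floor, and all function and field names below are assumptions made for illustration, not the draft's algorithm.

```python
from dataclasses import dataclass


@dataclass
class IwState:
    iw: int             # current initial window, in bytes
    max_iw: int         # upper limit for the initial window, in bytes
    conncount: int = 0  # connections started since the last update
    losscount: int = 0  # connections that saw loss within the IW


def periodic_update(s: IwState, mss: int, add_incr_mss: int,
                    mul_dec: float, min_conns: int) -> int:
    """Hypothetical update rule: grow additively without loss, shrink on loss."""
    if s.conncount >= min_conns:          # too few samples: leave IW unchanged
        if s.losscount == 0:
            s.iw = min(s.iw + add_incr_mss * mss, s.max_iw)
        else:
            s.iw = int(s.iw * mul_dec)
        # Assumption: keep IW an integer multiple of 2 MSS, never below 2 MSS.
        s.iw = max(2 * mss, s.iw - s.iw % (2 * mss))
        s.conncount = 0
        s.losscount = 0
    return s.iw


mss = 1460
fixed = IwState(iw=10 * mss, max_iw=10 * mss, conncount=200, losscount=5)
print(periodic_update(fixed, mss, add_incr_mss=0, mul_dec=1.0, min_conns=100))
```

With MulDec = 1.0 the decrease branch multiplies by one, and with AddIncr = 0 MSS the increase branch adds nothing, so the IW stays at its boot value whatever the loss history is.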
The proposed algorithm is as follows: The proposed algorithm is as follows:
1. On boot: 1. On boot:
IW = MaxIW; # assume this is in bytes, and an even number of MSS IW = MaxIW; # assume this is in bytes, and indicates an integer
multiple of 2 MSS (an even number to support ACK compression)
2. Upon starting a new connection: 2. Upon starting a new connection:
CWND = IW; CWND = IW;
conncount++; conncount++;
IWnotchecked = 1; # true IWnotchecked = 1; # true
3. During a connection's SYN-ACK processing, if SYN-ACK includes ECN 3. During a connection's SYN-ACK processing, if SYN-ACK includes ECN
(as similarly addressed in Sec 5 of ECN++ for TCP [Ba20]), treat (as similarly addressed in Sec 5 of ECN++ for TCP [Ba20]), treat
as if the IW is too large: as if the IW is too large:
skipping to change at page 33, line 38 skipping to change at page 34, line 38
As presented, this algorithm can yield a false positive when the As presented, this algorithm can yield a false positive when the
sequence number wraps around, e.g., the code might increment sequence number wraps around, e.g., the code might increment
losscount in step 4 when no loss occurred or fail to increment losscount in step 4 when no loss occurred or fail to increment
losscount when a loss did occur. This can be avoided using either losscount when a loss did occur. This can be avoided using either
PAWS [RFC7323] context or internal extended sequence number PAWS [RFC7323] context or internal extended sequence number
representations (as in TCP-AO [RFC5925]). Alternately, false representations (as in TCP-AO [RFC5925]). Alternately, false
positives can be tolerated because they are expected to be positives can be tolerated because they are expected to be
infrequent and thus will not significantly impact the algorithm. infrequent and thus will not significantly impact the algorithm.
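The extended sequence number approach mentioned above can be sketched as follows. This is an illustrative example only: it assumes that each newly observed 32-bit sequence number lies within 2^31 of the previously tracked value, and the function names are hypothetical rather than taken from the draft or from TCP-AO.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32 bits wide on the wire


def extend_seq(prev_ext: int, seq32: int) -> int:
    """Return the extended sequence number closest to prev_ext that matches seq32."""
    delta = (seq32 - prev_ext) % SEQ_MOD   # forward distance, 0 .. 2^32 - 1
    if delta >= SEQ_MOD // 2:              # more than half the space: it is behind us
        delta -= SEQ_MOD
    return prev_ext + delta


def offset_from_isn(isn_ext: int, seq_ext: int) -> int:
    """Byte offset of a segment from the ISN, free of wraparound ambiguity."""
    return seq_ext - isn_ext


isn = 0xFFFFFF00                      # ISN chosen close to the wrap point
missing = extend_seq(isn, 0x00000100)  # a segment sent after the wire value wrapped
print(offset_from_isn(isn, missing))   # small positive offset despite the wrap
```

A plain comparison of the 32-bit values would see 0x00000100 as far below the ISN, whereas the extended values give the true offset of 512 bytes, which can then be compared against the IW without a false result.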
A number of additional constraints need to be imposed if this A number of additional constraints need to be imposed if this
mechanism is implemented to ensure that it defaults values that mechanism is implemented to ensure that it defaults to values that
comply with current Internet standards, is conservative in how it comply with current Internet standards, is conservative in how it
extends those values, and returns to those values in the absence of extends those values, and returns to those values in the absence of
positive feedback (i.e., success). To that end, we recommend the positive feedback (i.e., success). To that end, we recommend the
following list of example constraints: following list of example constraints:
>> The automatic IW algorithm MUST initialize MaxIW to a value no >> The automatic IW algorithm MUST initialize MaxIW to a value no
larger than the currently recommended Internet default, in the larger than the currently recommended Internet default, in the
absence of other context information. absence of other context information.
Thus, if there are too few connections to make a decision or if Thus, if there are too few connections to make a decision or if
skipping to change at page 35, line 44 skipping to change at page 36, line 44
False positives can occur during some kinds of segment reordering, False positives can occur during some kinds of segment reordering,
e.g., that might trigger spurious retransmissions even without a e.g., that might trigger spurious retransmissions even without a
true segment loss. These are not expected to be sufficiently common true segment loss. These are not expected to be sufficiently common
to dominate the algorithm and its conclusions. to dominate the algorithm and its conclusions.
This mechanism does require additional per-connection state, which This mechanism does require additional per-connection state, which
is currently common in some implementations, and is useful for other is currently common in some implementations, and is useful for other
reasons (e.g., the ISN is used in TCP-AO [RFC5925]). The mechanism reasons (e.g., the ISN is used in TCP-AO [RFC5925]). The mechanism
also benefits from persistent state kept across reboots, as would also benefits from persistent state kept across reboots, as would
other state sharing mechanisms (e.g., TCP Control Block Sharing other state sharing mechanisms (e.g., TCP Control Block Sharing per
[RFC2140]). The mechanism is inspired by RFC 2140's use of the main body of this document).
information across connections.
The receive window (rwnd) is not involved in this calculation. The The receive window (rwnd) is not involved in this calculation. The
size of rwnd is determined by receiver resources and provides space size of rwnd is determined by receiver resources and provides space
to accommodate segment reordering. It is not involved with to accommodate segment reordering. It is not involved with
congestion control, which is the focus of this document and its congestion control, which is the focus of this document and its
management of the IW. management of the IW.
C.5. Observations C.5. Observations
The IW may not converge to a single, global value. It also may not The IW may not converge to a single, global value. It also may not
 End of changes. 56 change blocks. 
118 lines changed or deleted 154 lines changed or added
