draft-ietf-tcpm-2140bis-05.txt   draft-ietf-tcpm-2140bis-06.txt 
TCPM WG J. Touch TCPM WG J. Touch
Internet Draft Independent Internet Draft Independent
Intended status: Informational M. Welzl Intended status: Informational M. Welzl
Obsoletes: 2140 S. Islam Obsoletes: 2140 S. Islam
Expires: October 2020 University of Oslo Expires: May 2021 University of Oslo
April 29, 2020 November 25, 2020
TCP Control Block Interdependence TCP Control Block Interdependence
draft-ietf-tcpm-2140bis-05.txt draft-ietf-tcpm-2140bis-06.txt
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
This document may contain material from IETF Documents or IETF This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this 10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow material may not have granted the IETF Trust the right to allow
skipping to change at page 1, line 45 skipping to change at page 1, line 45
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress." reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
This Internet-Draft will expire on October 29, 2020. This Internet-Draft will expire on May 25, 2021.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 21 skipping to change at page 3, line 21
9. Implications..................................................15 9. Implications..................................................15
9.1. Layering....................................................15 9.1. Layering....................................................15
9.2. Other possibilities.........................................16 9.2. Other possibilities.........................................16
10. Implementation Observations..................................16 10. Implementation Observations..................................16
11. Updates to RFC 2140..........................................17 11. Updates to RFC 2140..........................................17
12. Security Considerations......................................18 12. Security Considerations......................................18
13. IANA Considerations..........................................18 13. IANA Considerations..........................................18
14. References...................................................19 14. References...................................................19
14.1. Normative References....................................19 14.1. Normative References....................................19
14.2. Informative References..................................19 14.2. Informative References..................................19
15. Acknowledgments..............................................21 15. Acknowledgments..............................................22
16. Change log...................................................22 16. Change log...................................................22
Appendix A : TCB Sharing History.................................25 Appendix A : TCB Sharing History.................................25
Appendix B : TCP Option Sharing and Caching......................26 Appendix B : TCP Option Sharing and Caching......................26
Appendix C : Automating the Initial Window in TCP over Long Appendix C : Automating the Initial Window in TCP over Long
Timescales.......................................................28 Timescales.......................................................28
C.1. Introduction.............................................28 C.1. Introduction.............................................28
C.2. Design Considerations....................................28 C.2. Design Considerations....................................28
C.3. Proposed IW Algorithm....................................29 C.3. Proposed IW Algorithm....................................29
C.4. Discussion...............................................32 C.4. Discussion...............................................33
C.5. Observations.............................................33 C.5. Observations.............................................34
1. Introduction 1. Introduction
TCP is a connection-oriented reliable transport protocol layered TCP is a connection-oriented reliable transport protocol layered
over IP [RFC793]. Each TCP connection maintains state, usually in a over IP [RFC793]. Each TCP connection maintains state, usually in a
data structure called the TCP Control Block (TCB). The TCB contains data structure called the TCP Control Block (TCB). The TCB contains
information about the connection state, its associated local information about the connection state, its associated local
process, and feedback parameters about the connection's transmission process, and feedback parameters about the connection's transmission
properties. As originally specified and usually implemented, most properties. As originally specified and usually implemented, most
TCB information is maintained on a per-connection basis. Some TCB information is maintained on a per-connection basis. Some
skipping to change at page 9, line 13 skipping to change at page 9, line 13
old_TFO_failure old_TFO_failure ESTAB old_TFO_failure old_TFO_failure old_TFO_failure ESTAB old_TFO_failure
6.3. Discussion 6.3. Discussion
There is no particular benefit to caching MMS_S and MMS_R as these There is no particular benefit to caching MMS_S and MMS_R as these
are reported by the local IP stack. Caching sendMSS and PMTU is are reported by the local IP stack. Caching sendMSS and PMTU is
trivial; reported values are cached, and the most recent values are trivial; reported values are cached, and the most recent values are
used. The cache is updated when the MSS option is received in a SYN used. The cache is updated when the MSS option is received in a SYN
or after PMTUD (i.e., when an ICMPv4 Fragmentation Needed [RFC1191] or after PMTUD (i.e., when an ICMPv4 Fragmentation Needed [RFC1191]
or ICMPv6 Packet Too Big message is received [RFC8201] or the or ICMPv6 Packet Too Big message is received [RFC8201] or the
equivalent is inferred, e.g. as from PLPMTUD [RFC4821]), equivalent is inferred, e.g., as from PLPMTUD [RFC4821]),
respectively, so the cache always has the most recent values from respectively, so the cache always has the most recent values from
any connection. For sendMSS, the cache is consulted only at any connection. For sendMSS, the cache is consulted only at
connection establishment and not otherwise updated, which means that connection establishment and not otherwise updated, which means that
MSS options do not affect current connections. The default sendMSS MSS options do not affect current connections. The default sendMSS
is never saved; only reported MSS values update the cache, so an is never saved; only reported MSS values update the cache, so an
explicit override is required to reduce the sendMSS. explicit override is required to reduce the sendMSS.
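The caching rules above can be illustrated with a minimal sketch. It is not taken from any particular stack: the class and method names (TcbCache, on_syn_mss_option, on_pmtu_update, initial_send_mss) are hypothetical, and the per-host keying and the 536-byte IPv4 default are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption: IPv4 default send MSS used when the peer reported none.
DEFAULT_SEND_MSS = 536


@dataclass
class HostEntry:
    send_mss: Optional[int] = None  # last MSS option reported by the peer
    pmtu: Optional[int] = None      # last path MTU learned via PMTUD


class TcbCache:
    """Hypothetical per-host cache; only reported values are stored."""

    def __init__(self) -> None:
        self._hosts: Dict[str, HostEntry] = {}

    def on_syn_mss_option(self, host: str, mss: int) -> None:
        # Updated whenever an MSS option arrives in a SYN.
        self._hosts.setdefault(host, HostEntry()).send_mss = mss

    def on_pmtu_update(self, host: str, pmtu: int) -> None:
        # Updated after PMTUD reports or infers a new path MTU.
        self._hosts.setdefault(host, HostEntry()).pmtu = pmtu

    def cached_pmtu(self, host: str) -> Optional[int]:
        entry = self._hosts.get(host)
        return entry.pmtu if entry else None

    def initial_send_mss(self, host: str) -> int:
        # Consulted only at connection establishment. The default is
        # returned but never written back into the cache.
        entry = self._hosts.get(host)
        if entry is None or entry.send_mss is None:
            return DEFAULT_SEND_MSS
        return entry.send_mss
```

Because the default is never stored, a host with no reported MSS keeps returning the default until a real MSS option is observed.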
RTT values are updated by formulae that merge the old and new RTT values are updated by formulae that merge the old and new
values. Dynamic RTT estimation requires a sequence of RTT values. Dynamic RTT estimation requires a sequence of RTT
measurements. As a result, the cached RTT (and its variance) is an measurements. As a result, the cached RTT (and its variance) is an
skipping to change at page 10, line 4 skipping to change at page 10, line 4
Most cached TCB values are updated when a connection closes. The Most cached TCB values are updated when a connection closes. The
exceptions are MMS_R and MMS_S, which are reported by IP [RFC1122], exceptions are MMS_R and MMS_S, which are reported by IP [RFC1122],
PMTU which is updated after Path MTU Discovery PMTU which is updated after Path MTU Discovery
[RFC1191][RFC4821][RFC8201], and sendMSS, which is updated if the [RFC1191][RFC4821][RFC8201], and sendMSS, which is updated if the
MSS option is received in the TCP SYN header. MSS option is received in the TCP SYN header.
Sharing sendMSS information affects only data in the SYN of the next Sharing sendMSS information affects only data in the SYN of the next
connection, because sendMSS information is typically included in connection, because sendMSS information is typically included in
most TCP SYN segments. Caching PMTU can accelerate the efficiency of most TCP SYN segments. Caching PMTU can accelerate the efficiency of
PMTUD, but can also result in black-holing until corrected if in PMTUD but can also result in black-holing until corrected if in
error. Caching MMS_R and MMS_S may be of little direct value as they error. Caching MMS_R and MMS_S may be of little direct value as they
are reported by the local IP stack anyway. are reported by the local IP stack anyway.
The way in which other TCP option state can be shared depends on the The way in which other TCP option state can be shared depends on the
details of that option. E.g., TFO state includes the TCP Fast Open details of that option. E.g., TFO state includes the TCP Fast Open
Cookie [RFC7413] or, in case TFO fails, a negative TCP Fast Open Cookie [RFC7413] or, in case TFO fails, a negative TCP Fast Open
response. RFC 7413 states, "The client MUST cache negative responses response. RFC 7413 states, "The client MUST cache negative responses
from the server in order to avoid potential connection failures. from the server in order to avoid potential connection failures.
Negative responses include the server not acknowledging the data in Negative responses include the server not acknowledging the data in
the SYN, ICMP error messages, and (most importantly) no response the SYN, ICMP error messages, and (most importantly) no response
skipping to change at page 13, line 27 skipping to change at page 13, line 27
of the current windows is increased for any new connection. This can of the current windows is increased for any new connection. This can
have detrimental consequences where several connections share a have detrimental consequences where several connections share a
highly congested link. highly congested link.
There are several ways to initialize the congestion window in a new There are several ways to initialize the congestion window in a new
TCB among an ensemble of current connections to a host. Current TCP TCB among an ensemble of current connections to a host. Current TCP
implementations initialize it to four segments as standard [RFC3390] implementations initialize it to four segments as standard [RFC3390]
and 10 segments experimentally [RFC6928]. These approaches assume and 10 segments experimentally [RFC6928]. These approaches assume
that new connections should behave as conservatively as possible. that new connections should behave as conservatively as possible.
The algorithm described in [Ba12] adjusts the initial cwnd depending The algorithm described in [Ba12] adjusts the initial cwnd depending
on the cwnd values of ongoing connections. There have also been on the cwnd values of ongoing connections. It is also possible to
suggestions to use the kind of sharing mechanisms described in this use sharing mechanisms over long timescales to adapt TCP's initial
document over long timescales to adapt TCP's initial window window automatically, as described further in Appendix C.
automatically, as described further in Appendix C [To12].
8. Compatibility Issues 8. Compatibility Issues
Here, we discuss various types of problems that may arise with TCB Here, we discuss various types of problems that may arise with TCB
information sharing. information sharing.
For the congestion and current window information, the initial For the congestion and current window information, the initial
values computed by TCB interdependence may not be consistent with values computed by TCB interdependence may not be consistent with
the long-term aggregate behavior of a set of concurrent connections the long-term aggregate behavior of a set of concurrent connections
between the same endpoints. Under conventional TCP congestion between the same endpoints. Under conventional TCP congestion
skipping to change at page 18, line 12 skipping to change at page 18, line 12
and send-MSS separately, adds path MTU and ssthresh, and addresses and send-MSS separately, adds path MTU and ssthresh, and addresses
the impact on TCP option state. the impact on TCP option state.
New sections have been added to address compatibility issues and New sections have been added to address compatibility issues and
implementation observations. The relation of this work to T/TCP has implementation observations. The relation of this work to T/TCP has
been moved to Appendix A on history, partly to reflect the been moved to Appendix A on history, partly to reflect the
deprecation of that protocol. deprecation of that protocol.
Appendix C has been added to discuss the potential to use temporal Appendix C has been added to discuss the potential to use temporal
sharing over long timescales to adapt TCP's initial window sharing over long timescales to adapt TCP's initial window
automatically, largely imported from [To12]. automatically, avoiding the need to periodically revise a single
global constant value.
Finally, this document updates and significantly expands the Finally, this document updates and significantly expands the
referenced literature. referenced literature.
12. Security Considerations 12. Security Considerations
These presented implementation methods do not have additional These presented implementation methods do not have additional
ramifications for explicit attacks. They may be susceptible to ramifications for explicit attacks. They may be susceptible to
denial-of-service attacks if not otherwise secured. denial-of-service attacks if not otherwise secured.
skipping to change at page 18, line 36 skipping to change at page 18, line 37
Implications section). Some shared TCB parameters are used only to Implications section). Some shared TCB parameters are used only to
create new TCBs, others are shared among the TCBs of ongoing create new TCBs, others are shared among the TCBs of ongoing
connections. New connections can join the ongoing set, e.g., to connections. New connections can join the ongoing set, e.g., to
optimize send window size among a set of connections to the same optimize send window size among a set of connections to the same
host. host.
Attacks on parameters used only for initialization affect only the Attacks on parameters used only for initialization affect only the
transient performance of a TCP connection. For short connections, transient performance of a TCP connection. For short connections,
the performance ramification can approach that of a denial-of- the performance ramification can approach that of a denial-of-
service attack. E.g., if an application changes its TCB to have a service attack. E.g., if an application changes its TCB to have a
false and small window size, subsequent connections would experience false and small window size, subsequent connections will experience
performance degradation until their window grew appropriately. performance degradation until their window grows appropriately.
TCB sharing reuses and mixes information from past and current TCB sharing reuses and mixes information from past and current
connections. Although reusing information could create a potential connections. Although reusing information could create a potential
for fingerprinting to identify hosts, the mixing reduces that for fingerprinting to identify hosts, the mixing reduces that
potential. There has been no evidence of fingerprinting based on potential. There has been no evidence of fingerprinting based on
this technique and it is currently considered safe in that regard. this technique and it is currently considered safe in that regard.
13. IANA Considerations 13. IANA Considerations
skipping to change at page 19, line 50 skipping to change at page 20, line 5
14.2. Informative References 14.2. Informative References
[Al10] Allman, M., "Initial Congestion Window Specification", [Al10] Allman, M., "Initial Congestion Window Specification",
(work in progress), draft-allman-tcpm-bump-initcwnd-00, (work in progress), draft-allman-tcpm-bump-initcwnd-00,
Nov. 2010. Nov. 2010.
[Ba12] Barik, R., Welzl, M., Ferlin, S., Alay, O., "LISA: A [Ba12] Barik, R., Welzl, M., Ferlin, S., Alay, O., "LISA: A
Linked Slow-Start Algorithm for MPTCP", IEEE ICC, Kuala Linked Slow-Start Algorithm for MPTCP", IEEE ICC, Kuala
Lumpur, Malaysia, May 23-27 2016. Lumpur, Malaysia, May 23-27 2016.
[Ba20] Bagnulo, M., Briscoe, B., "ECN++: Adding Explicit
Congestion Notification (ECN) to TCP Control Packets",
draft-ietf-tcpm-generalized-ecn-06, Oct. 2020.
[Be94] Berners-Lee, T., et al., "The World-Wide Web," [Be94] Berners-Lee, T., et al., "The World-Wide Web,"
Communications of the ACM, V37, Aug. 1994, pp. 76-82. Communications of the ACM, V37, Aug. 1994, pp. 76-82.
[Br94] Braden, B., "T/TCP -- Transaction TCP: Source Changes for [Br94] Braden, B., "T/TCP -- Transaction TCP: Source Changes for
Sun OS 4.1.3", Release 1.0, USC/ISI, September 14, 1994. Sun OS 4.1.3", Release 1.0, USC/ISI, September 14, 1994.
[Br02] Brownlee, N. and K. Claffy, "Understanding Internet [Br02] Brownlee, N. and K. Claffy, "Understanding Internet
Traffic Streams: Dragonflies and Tortoises", IEEE Traffic Streams: Dragonflies and Tortoises", IEEE
Communications Magazine p110-117, 2002. Communications Magazine p110-117, 2002.
skipping to change at page 21, line 45 skipping to change at page 22, line 5
B., "Mechanisms for Optimizing Link Aggregation Group B., "Mechanisms for Optimizing Link Aggregation Group
(LAG) and Equal-Cost Multipath (ECMP) Component Link (LAG) and Equal-Cost Multipath (ECMP) Component Link
Utilization in Networks", RFC 7424, Jan. 2015 Utilization in Networks", RFC 7424, Jan. 2015
[RFC7540] Belshe, M., Peon, R., Thomson, M., "Hypertext Transfer [RFC7540] Belshe, M., Peon, R., Thomson, M., "Hypertext Transfer
Protocol Version 2 (HTTP/2)", RFC 7540, May 2015. Protocol Version 2 (HTTP/2)", RFC 7540, May 2015.
[RFC7661] Fairhurst, G., Sathiaseelan, A., Secchi, R., "Updating TCP [RFC7661] Fairhurst, G., Sathiaseelan, A., Secchi, R., "Updating TCP
to Support Rate-Limited Traffic", RFC 7661, Oct. 2015. to Support Rate-Limited Traffic", RFC 7661, Oct. 2015.
[To12] Touch, J., "Automating the Initial Window in TCP," draft-
touch-tcpm-automatic-iw-03 (expired), July 2012.
15. Acknowledgments 15. Acknowledgments
The authors would like to thank Praveen Balasubramanian for The authors would like to thank Praveen Balasubramanian for
information regarding TCB sharing in Windows, and Yuchung Cheng, information regarding TCB sharing in Windows, and Yuchung Cheng,
Lars Eggert, Ilpo Jarvinen and Michael Scharf for comments on Lars Eggert, Ilpo Jarvinen and Michael Scharf for comments on
earlier versions of the draft. Earlier revisions of this work earlier versions of the draft. Earlier revisions of this work
received funding from a collaborative research project between the received funding from a collaborative research project between the
University of Oslo and Huawei Technologies Co., Ltd. and were partly University of Oslo and Huawei Technologies Co., Ltd. and were partly
supported by USC/ISI's Postel Center. supported by USC/ISI's Postel Center.
This document was prepared using 2-Word-v2.0.template.dot. This document was prepared using 2-Word-v2.0.template.dot.
16. Change log 16. Change log
This section should be removed upon final publication as an RFC. This section should be removed upon final publication as an RFC.
ietf-06:
- Address WGLC comments
ietf-05:
- Correction of typographic errors, expansion of terminology
ietf-04: ietf-04:
- Fix internal cross-reference errors that appeared in ietf-02 - Fix internal cross-reference errors that appeared in ietf-02
- Updated tables to re-center; clarified text - Updated tables to re-center; clarified text
ietf-03: ietf-03:
- Correction of typographic errors, minor rewording in appendices - Correction of typographic errors, minor rewording in appendices
ietf-02: ietf-02:
skipping to change at page 23, line 25 skipping to change at page 23, line 37
- Stated that our OS implementation overview table only covers - Stated that our OS implementation overview table only covers
temporal sharing. temporal sharing.
- Correctly reflected sharing of old_RTT in Linux in the - Correctly reflected sharing of old_RTT in Linux in the
implementation overview table. implementation overview table.
- Marked entries that are considered safe to share with an - Marked entries that are considered safe to share with an
asterisk (suggestion was to split the table) asterisk (suggestion was to split the table)
- Discussed correct host identification: NATs may make IP - Discussed correct host identification: NATs may make IP
addresses the wrong input, could e.g. use HTTP cookie. addresses the wrong input, could e.g., use HTTP cookie.
- Included MMS_S and MMS_R from RFC1122; fixed the use of MSS and - Included MMS_S and MMS_R from RFC1122; fixed the use of MSS and
MTU MTU
- Added information about option sharing, listed options in - Added information about option sharing, listed options in
Appendix B Appendix B
Authors' Addresses Authors' Addresses
Joe Touch Joe Touch
skipping to change at page 28, line 7 skipping to change at page 28, line 7
MSS MSS
TFO negotiation failure (to avoid negotiation retries) TFO negotiation failure (to avoid negotiation retries)
Safe and necessary to keep state: Safe and necessary to keep state:
TFO cookie (if TFO succeeded in the past) TFO cookie (if TFO succeeded in the past)
Appendix C: Automating the Initial Window in TCP over Long Timescales Appendix C: Automating the Initial Window in TCP over Long Timescales
Note: this section is imported from [To12], updated only to refer to
itself as an appendix.
C.1. Introduction C.1. Introduction
Temporal sharing, as described earlier in this document, builds on
the assumption that multiple consecutive connections between the
same host pair are somewhat likely to be exposed to similar
environment characteristics. The stored information can therefore
become invalid over time, and suitable precautions should be taken
(this is discussed further in section 8.1). However, there are also
cases where it can make sense to use much longer-term measurements
of TCP connections to gradually influence TCP parameters. This
appendix describes an example of such a case.
TCP's congestion control algorithm uses an initial window value TCP's congestion control algorithm uses an initial window value
(IW), both as a starting point for new connections and after one RTO (IW), both as a starting point for new connections and as an upper
or more [RFC5681][RFC7661]. This value has evolved over time, limit for restarting after an idle period [RFC5681][RFC7661]. This
originally one maximum segment size (MSS), and increased to the value has evolved over time, originally one maximum segment size
lesser of four MSS or 4,380 bytes [RFC3390][RFC5681]. For typical (MSS), and increased to the lesser of four MSS or 4,380 bytes
Internet connections with an maximum transmission units (MTUs) of [RFC3390][RFC5681]. For a typical Internet connection with a maximum
1500 bytes, this permits three segments of 1,460 bytes each. transmission unit (MTU) of 1500 bytes, this permits three segments
of 1,460 bytes each.
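The arithmetic behind the three-segment figure can be checked with a short sketch. The formula is the RFC 3390 upper bound, min(4*MSS, max(2*MSS, 4380 bytes)); the function name and the 40 bytes of IPv4 and TCP headers without options are assumptions made for the example.

```python
def rfc3390_initial_window(mss: int) -> int:
    """Upper bound on the initial window, in bytes, per RFC 3390."""
    return min(4 * mss, max(2 * mss, 4380))


# Assumption: IPv4 with no IP or TCP options, so 20 + 20 header bytes.
mss = 1500 - 20 - 20
iw_bytes = rfc3390_initial_window(mss)
iw_segments = iw_bytes // mss  # whole segments that fit in the window
```

With a 1500-byte MTU this gives an MSS of 1,460 bytes, a window of 4,380 bytes, and three full segments, matching the text above.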
The IW value was originally implied in the original TCP congestion The IW value was originally implied in the original TCP congestion
control description, and documented as a standard in 1997 control description and documented as a standard in 1997
[RFC2001][Ja88]. The value was last updated in 1998 experimentally, [RFC2001][Ja88]. The value was updated in 1998 experimentally and
and moved to the standards track in 2002 [RFC2414][RFC3390]. There moved to the standards track in 2002 [RFC2414][RFC3390]. In 2013, it
have been recent proposals to update the IW based on further was experimentally increased to 10 [RFC6928].
increases in host and router capabilities and network capacity, some
focusing on specific values (e.g., IW=10), and others prescribing a
schedule for increases over time (e.g., IW=6 for 2011, increasing by
1-2 MSS per year).
This appendix discusses how TCP can objectively measure when an IW This appendix discusses how TCP can objectively measure when an IW
is too large, and that such feedback should be used over long is too large, and that such feedback should be used over long
timescales to adjust the IW automatically. The result should be timescales to adjust the IW automatically. The result should be
safer to deploy and might avoid the need to repeatedly revisit IW safer to deploy and might avoid the need to repeatedly revisit IW
size over time. over time.
Note that this mechanism attempts to make the IW more adaptive over Note that this mechanism attempts to make the IW more adaptive over
time. It can increase the IW beyond that which is currently time. It can increase the IW beyond that which is currently
recommended for widescale deployment, and so its use should be recommended for widescale deployment, and so its use should be
carefully monitored. carefully monitored.
C.2. Design Considerations C.2. Design Considerations
TCP's IW value has existed statically for over two decades, so any TCP's IW value has existed statically for over two decades, so any
solution to adjusting the IW dynamically should have similarly solution to adjusting the IW dynamically should have similarly
stable, non-invasive effects on the performance and complexity of stable, non-invasive effects on the performance and complexity of
TCP. In order to be fair, the IW should be similar for most machines TCP. In order to be fair, the IW should be similar for most machines
on the public Internet. Finally, a desirable goal is to develop a on the public Internet. Finally, a desirable goal is to develop a
self-correcting algorithm, so that IW values that cause network self-correcting algorithm, so that IW values that cause network
problems can be avoided. To that end, we propose the following list problems can be avoided. To that end, we propose the following
of design goals: design goals:
o Impart little to no impact to TCP in the absence of loss, i.e., o Impart little to no impact to TCP in the absence of loss, i.e.,
it should not increase the complexity of default packet it should not increase the complexity of default packet
processing in the normal case. processing in the normal case.
o Adapt to network feedback over long timescales, avoiding values o Adapt to network feedback over long timescales, avoiding values
that persistently cause network problems. that persistently cause network problems.
o Decrease the IW in the presence of sustained loss of IW segments, o Decrease the IW in the presence of sustained loss of IW segments,
as determined over a number of different connections. as determined over a number of different connections.
skipping to change at page 29, line 41 skipping to change at page 29, line 44
the initial burst of packets, it is clearly inappropriate and could the initial burst of packets, it is clearly inappropriate and could
be inducing unnecessary loss in other competing connections. This be inducing unnecessary loss in other competing connections. This
might happen for sites behind very slow boxes with small buffers, might happen for sites behind very slow boxes with small buffers,
which may or may not be the first hop. which may or may not be the first hop.
C.3. Proposed IW Algorithm C.3. Proposed IW Algorithm
Below is a simple description of the proposed IW algorithm. It Below is a simple description of the proposed IW algorithm. It
relies on the following parameters: relies on the following parameters:
o MinIW = 3 MSS or 4,380 bytes (as per RFC3390] o MinIW = 3 MSS or 4,380 bytes (as per [RFC3390])
o MaxIW = 10 o MaxIW = 10 MSS (as per [RFC6928])
o MulDecr = 0.5 o MulDecr = 0.5
o AddIncr = 2 MSS o AddIncr = 2 MSS
o Threshold = 0.05 o Threshold = 0.05
We assume that the minimum IW (MinIW) should be as currently We assume that the minimum IW (MinIW) should be as currently
specified [RFC3390]. The maximum IW can be set to a fixed value specified [RFC3390]. The maximum IW can be set to a fixed value (as
[RFC6928], or set based on a schedule if trusted time references are recommended in [RFC6928]) or set based on a schedule if trusted time
available [Al10]; here we prefer a fixed value. We also propose to references are available [Al10]; here we prefer a fixed value. We
use an AIMD algorithm, with increases and decreases as noted. also propose to use an AIMD algorithm, with increases and decreases
as noted.
Although these parameters are somewhat arbitrary, their initial Although these parameters are somewhat arbitrary, their initial
values are not important except that the algorithm is AIMD and the values are not important except that the algorithm is AIMD and the
MaxIW should not exceed that recommended for other systems on the MaxIW should not exceed that recommended for other systems on the
Internet. Current proposals, including default current operation, Internet. Current proposals, including default current operation,
are degenerate cases of the algorithm below for given parameters - are degenerate cases of the algorithm below for given parameters -
notably MulDecr = 1.0 and AddIncr = 0 MSS, thus disabling the notably MulDecr = 1.0 and AddIncr = 0 MSS, thus disabling the
automatic part of the algorithm. automatic part of the algorithm.
The proposed algorithm is as follows: The proposed algorithm is as follows:
1. On boot: 1. On boot:
IW = MaxIW; # assume this is in bytes, and an even number of MSS IW = MaxIW; # assume this is in bytes, and an even number of MSS
2. Upon starting a new connection 2. Upon starting a new connection:
CWND = IW; CWND = IW;
conncount++; conncount++;
IWnotchecked = 1; # true IWnotchecked = 1; # true
3. During a connection's SYN-ACK processing, if SYN-ACK includes 3. During a connection's SYN-ACK processing, if SYN-ACK includes ECN
ECN, treat as if the IW is too large (as similarly addressed in Sec 5 of ECN++ for TCP [Ba20]), treat
as if the IW is too large:
if (IWnotchecked && (synackecn == 1)) { if (IWnotchecked && (synackecn == 1)) {
losscount++; losscount++;
IWnotchecked = 0; # never check again IWnotchecked = 0; # never check again
} }
4. During a connection, if retransmission occurs, check the seqno of 4. During a connection, if retransmission occurs, check the seqno of
the outgoing packet (in bytes) to see if the resent segment fixes the outgoing packet (in bytes) to see if the resent segment fixes
an IW loss: an IW loss:
if (Retransmitting && IWnotchecked && ((seqno - ISN) < IW)) { if (Retransmitting && IWnotchecked && ((seqno - ISN) < IW)) {
losscount++; losscount++;
IWnotchecked = 0; # never do this entire "if" again IWnotchecked = 0; # never do this entire "if" again
} else { } else {
IWnotchecked = 0; # you're beyond the IW so stop checking IWnotchecked = 0; # you're beyond the IW so stop checking
} }
5. Once every 1000 conections, as a separate process (i.e., not as 5. Once every 1000 connections, as a separate process (i.e., not as
part of processing a given connection): part of processing a given connection):
if (conncount > 1000) { if (conncount > 1000) {
if (losscount/conncount > threshold) { if (losscount/conncount > threshold) {
# the number of connections with errors is too high # the number of connections with errors is too high
IW = IW * MulDecr; IW = IW * MulDecr;
} else { } else {
IW = IW + AddIncr; IW = IW + AddIncr;
} }
} }
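
The five steps above can be sketched in Python as follows. This is a
minimal illustration, not a definitive implementation. The class and
method names, the 1460-byte MSS, and the default parameter values
(MaxIW of 10 MSS, MulDecr of 0.5, AddIncr of 2 MSS, threshold of
0.05) are assumptions chosen for the example. Clamping the IW to the
range [1 MSS, MaxIW] and resetting the counters after each evaluation
are also assumptions that the pseudocode does not state.

```python
from dataclasses import dataclass

MSS = 1460  # assumed segment size in bytes


@dataclass
class AutoIW:
    """Host-wide state for the automatic IW sketch (names are illustrative)."""
    max_iw: int = 10 * MSS    # assumed ceiling, in bytes
    mul_decr: float = 0.5     # assumed multiplicative decrease factor
    add_incr: int = 2 * MSS   # assumed additive increase, in bytes
    threshold: float = 0.05   # assumed tolerable fraction of lossy connections
    interval: int = 1000      # connections per evaluation, as in step 5
    iw: int = 0
    conncount: int = 0
    losscount: int = 0

    def __post_init__(self):
        # Step 1: on boot, start from MaxIW.
        self.iw = self.max_iw

    def new_connection(self) -> int:
        # Step 2: count the connection and return its initial CWND.
        self.conncount += 1
        return self.iw

    def record_iw_loss(self) -> None:
        # Steps 3 and 4: the caller invokes this at most once per connection.
        self.losscount += 1

    def evaluate(self) -> None:
        # Step 5: periodic AIMD adjustment, run outside connection processing.
        if self.conncount <= self.interval:
            return
        if self.losscount / self.conncount > self.threshold:
            self.iw = int(self.iw * self.mul_decr)
        else:
            self.iw = self.iw + self.add_incr
        # Assumptions not in the pseudocode: keep IW within [1 MSS, MaxIW]
        # and restart the counters for the next interval.
        self.iw = max(MSS, min(self.iw, self.max_iw))
        self.conncount = 0
        self.losscount = 0
```

Constructing the object with mul_decr=1.0 and add_incr=0 reproduces
the degenerate fixed-IW case described earlier, because evaluate()
then leaves the IW unchanged.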
We recognize that this algorithm can yield a false positive when the As presented, this algorithm can yield a false positive when the
sequence number wraps around. This can be avoided using either PAWS sequence number wraps around, e.g., the code might increment
[RFC7323] context or 64-bit internal sequence numbers (as in TCP-AO losscount in step 4 when no loss occurred or fail to increment
[RFC5925]). Alternately, false positives can be allowed since they losscount when a loss did occur. This can be avoided using either
are expected to be infrequent and thus will not affect the overall PAWS [RFC7323] context or internal extended sequence number
statistics of the algorithm. representations (as in TCP-AO [RFC5925]). Alternately, false
positives can be tolerated because they are expected to be
infrequent and thus will not significantly impact the algorithm.
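
The wraparound case can be illustrated with a short Python sketch. It
assumes the check compares the byte offset of the resent segment from
the ISN against the IW. The function names and the "cycles" counter
(the number of complete 2^32-byte cycles sent since the ISN) are
hypothetical and stand in for whatever extended sequence number
representation an implementation keeps; this is not the TCP-AO or
PAWS mechanism itself.

```python
SEQ_SPACE = 1 << 32  # TCP sequence numbers are 32 bits wide


def offset_32(isn: int, seqno: int) -> int:
    """Byte offset from the ISN using only the 32-bit sequence number."""
    return (seqno - isn) % SEQ_SPACE


def offset_extended(isn: int, seqno: int, cycles: int) -> int:
    """Byte offset when the connection also tracks completed 2^32-byte cycles.

    'cycles' is a hypothetical per-connection counter of full passes
    through the sequence space since the ISN.
    """
    return cycles * SEQ_SPACE + offset_32(isn, seqno)
```

With only 32 bits, a segment sent one full cycle after the ISN has
the same offset as a segment inside the initial window, which is the
false positive described above. The extended offset distinguishes the
two cases.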
The following additional constraints are imposed: A number of additional constraints need to be imposed if this
mechanism is implemented to ensure that it defaults to values that
comply with current Internet standards, is conservative in how it
extends those values, and returns to those values in the absence of
positive feedback (i.e., success). To that end, we recommend the
following list of example constraints:
>> The automatic IW algorithm MUST initialize to MaxIW, in the >> The automatic IW algorithm MUST initialize MaxIW to a value no
larger than the currently recommended Internet default, in the
absence of other context information. absence of other context information.
If there are too few connections to make a decision or if there is Thus, if there are too few connections to make a decision or if
otherwise insufficient information to increase the IW, then the there is otherwise insufficient information to increase the IW, then
MaxIW defaults to the current recommended value. the MaxIW defaults to the current recommended value.
>> An implementation may allow the MaxIW to grow beyond the >> An implementation MAY allow the MaxIW to grow beyond the
currently recommended Internet default, but not more than 2 segments currently recommended Internet default, but not more than 2 segments
per calendar year. per calendar year.
If an endpoint has a persistent history of successfully transmitting Thus, if an endpoint has a persistent history of successfully
IW segments without loss, then it is allowed to probe the Internet transmitting IW segments without loss, then it is allowed to probe
to determine if larger IW values have similar success. This probing the Internet to determine if larger IW values have similar success.
is limited and requires a trusted time source, otherwise the MaxIW This probing is limited and requires a trusted time source,
remains constant. otherwise the MaxIW remains constant.
>> An implementation MUST adjust the IW based on loss statistics at >> An implementation MUST adjust the IW based on loss statistics at
least once every 1000 connections. least once every 1000 connections.
An endpoint needs to be sufficiently reactive to IW loss. An endpoint needs to be sufficiently reactive to IW loss.
>> An implementation MUST decrease the IW by at least one MSS when >> An implementation MUST decrease the IW by at least one MSS when
indicated during an evaluation interval. indicated during an evaluation interval.
An endpoint that detects loss needs to decrease its IW by at least An endpoint that detects loss needs to decrease its IW by at least
skipping to change at page 33, line 22 skipping to change at page 33, line 38
in addition to losses during the first IW of a connection. In this in addition to losses during the first IW of a connection. In this
case, the implementation MUST count each restart as a "connection" case, the implementation MUST count each restart as a "connection"
for the purposes of connection counts and periodic rechecking of the for the purposes of connection counts and periodic rechecking of the
IW value. IW value.
False positives can occur during some kinds of segment reordering, False positives can occur during some kinds of segment reordering,
e.g., that might trigger spurious retransmissions even without a e.g., that might trigger spurious retransmissions even without a
true segment loss. These are not expected to be sufficiently common true segment loss. These are not expected to be sufficiently common
to dominate the algorithm and its conclusions. to dominate the algorithm and its conclusions.
This mechanism does require additional per-connection state which is This mechanism does require additional per-connection state, which
currently common in some implementations, and is useful for other is currently common in some implementations, and is useful for other
reasons (e.g., the ISN is used in TCP-AO [RFC5925]). The mechanism reasons (e.g., the ISN is used in TCP-AO [RFC5925]). The mechanism
also benefits from persistent state kept across reboots, as would also benefits from persistent state kept across reboots, as would
other state sharing mechanisms (e.g., TCP Control Block Sharing other state sharing mechanisms (e.g., TCP Control Block Sharing
[RFC2140]). The mechanism is inspired by RFC 2140's use of [RFC2140]). The mechanism is inspired by RFC 2140's use of
information across connections. information across connections.
The receive window (RWIN) is not involved in this calculation. The The receive window (RWIN) is not involved in this calculation. The
size of RWIN is determined by receiver resources, and provides space size of RWIN is determined by receiver resources and provides space
to accommodate segment reordering. It is not involved with to accommodate segment reordering. It is not involved with
congestion control, which is the focus of this document and its congestion control, which is the focus of this document and its
management of the IW. management of the IW.
C.5. Observations C.5. Observations
The IW may not converge to a single, global value. It also may not The IW may not converge to a single, global value. It also may not
converge at all, but rather may oscillate by a few MSS as it converge at all, but rather may oscillate by a few MSS as it
repeatedly probes the Internet for larger IWs and fails. Both repeatedly probes the Internet for larger IWs and fails. Both
properties are consistent with TCP behavior during each individual properties are consistent with TCP behavior during each individual
 End of changes. 36 change blocks. 
70 lines changed or deleted 93 lines changed or added