draft-ietf-tsvwg-aqm-dualq-coupled-21.txt   draft-ietf-tsvwg-aqm-dualq-coupled-22.txt 
Transport Area working group (tsvwg) K. De Schepper Transport Area working group (tsvwg) K. De Schepper
Internet-Draft Nokia Bell Labs Internet-Draft Nokia Bell Labs
Intended status: Experimental B. Briscoe, Ed. Intended status: Experimental B. Briscoe, Ed.
Expires: 5 August 2022 Independent Expires: 5 September 2022 Independent
G. White G. White
CableLabs CableLabs
1 February 2022 4 March 2022
DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput
(L4S) (L4S)
draft-ietf-tsvwg-aqm-dualq-coupled-21 draft-ietf-tsvwg-aqm-dualq-coupled-22
Abstract Abstract
This specification defines a framework for coupling the Active Queue This specification defines a framework for coupling the Active Queue
Management (AQM) algorithms in two queues intended for flows with Management (AQM) algorithms in two queues intended for flows with
different responses to congestion. This provides a way for the different responses to congestion. This provides a way for the
Internet to transition from the scaling problems of standard TCP Internet to transition from the scaling problems of standard TCP
Reno-friendly ('Classic') congestion controls to the family of Reno-friendly ('Classic') congestion controls to the family of
'Scalable' congestion controls. These are designed for consistently 'Scalable' congestion controls. These are designed for consistently
very Low queuing Latency, very Low congestion Loss and Scaling of very Low queuing Latency, very Low congestion Loss and Scaling of
skipping to change at page 2, line 10 skipping to change at page 2, line 10
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 5 August 2022. This Internet-Draft will expire on 5 September 2022.
Copyright Notice Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License. provided without warranty as described in the Revised BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3 1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3
1.2. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7
1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9 1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9
2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11 2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11
2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11
2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12
2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 13
2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13 2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13
2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17 2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17
2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17 2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17
2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18
2.5.2. Management Requirements . . . . . . . . . . . . . . . 19 2.5.2. Management Requirements . . . . . . . . . . . . . . . 19
2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19 2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19
2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21 2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21
2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 22 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 22
2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22 2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22
3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22
4. Security Considerations . . . . . . . . . . . . . . . . . . . 22 4. Security Considerations . . . . . . . . . . . . . . . . . . . 22
4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 22 4.1. Low Delay without Requiring Per-Flow Processing . . . . . 22
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput 4.2. Handling Unresponsive Flows and Overload . . . . . . . . 23
or Delay? . . . . . . . . . . . . . . . . . . . . . . 23 4.2.1. Unresponsive Traffic without Overload . . . . . . . . 24
4.1.2. Congestion Signal Saturation: Introduce L4S Drop or 4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S
Delay? . . . . . . . . . . . . . . . . . . . . . . . 24 Throughput or Delay? . . . . . . . . . . . . . . . . 25
4.1.3. Protecting against Unresponsive ECN-Capable 4.2.3. L4S ECN Saturation: Introduce Drop or Delay? . . . . 26
Traffic . . . . . . . . . . . . . . . . . . . . . . . 25 4.2.3.1. Protecting against Overload by Unresponsive
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26 ECN-Capable Traffic . . . . . . . . . . . . . . . . 28
6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 28
7. References . . . . . . . . . . . . . . . . . . . . . . . . . 27 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 29
7.1. Normative References . . . . . . . . . . . . . . . . . . 27 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 29
7.2. Informative References . . . . . . . . . . . . . . . . . 27 7.1. Normative References . . . . . . . . . . . . . . . . . . 29
Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 33 7.2. Informative References . . . . . . . . . . . . . . . . . 30
A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 33 Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 35
A.2. Pass #2: Edge-Case Details . . . . . . . . . . . . . . . 44 A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 36
Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 48 A.2. Pass #2: Edge-Case Details . . . . . . . . . . . . . . . 46
B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 48 Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 51
B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 54 B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 51
Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 56 B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 57
C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 56 Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 59
C.2. Guidance on Controlling Throughput Equivalence . . . . . 57 C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 59
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 61 C.2. Guidance on Controlling Throughput Equivalence . . . . . 60
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 64
1. Introduction 1. Introduction
This document specifies a framework for DualQ Coupled AQMs, which is This document specifies a framework for DualQ Coupled AQMs, which is
the network part of the L4S architecture [I-D.ietf-tsvwg-l4s-arch]. the network part of the L4S architecture [I-D.ietf-tsvwg-l4s-arch].
L4S enables both very low queuing latency (sub-millisecond on L4S enables both very low queuing latency (sub-millisecond on
average) and high throughput at the same time, for ad hoc numbers of average) and high throughput at the same time, for ad hoc numbers of
capacity-seeking applications all sharing the same capacity. capacity-seeking applications all sharing the same capacity.
1.1. Outline of the Problem 1.1. Outline of the Problem
skipping to change at page 4, line 26 skipping to change at page 4, line 35
delay to vary or cause the link to be under-utilized. These AQMs delay to vary or cause the link to be under-utilized. These AQMs
are tuned to allow a typical capacity-seeking Reno-friendly flow are tuned to allow a typical capacity-seeking Reno-friendly flow
to induce an average queue that roughly doubles the base RTT, to induce an average queue that roughly doubles the base RTT,
adding 5-15 ms of queuing on average (cf. 500 microseconds with adding 5-15 ms of queuing on average (cf. 500 microseconds with
L4S for the same mix of long-running and web traffic). However, L4S for the same mix of long-running and web traffic). However,
for many applications low delay is not useful unless it is for many applications low delay is not useful unless it is
consistently low. With these AQMs, 99th percentile queuing delay consistently low. With these AQMs, 99th percentile queuing delay
is 20-30 ms (cf. 2 ms with the same traffic over L4S). is 20-30 ms (cf. 2 ms with the same traffic over L4S).
* Similarly, recent research into using e2e congestion control * Similarly, recent research into using e2e congestion control
without needing an AQM in the network without needing an AQM in the network (e.g. BBR
(e.g.BBR [I-D.cardwell-iccrg-bbr-congestion-control]) seems to [I-D.cardwell-iccrg-bbr-congestion-control]) seems to have hit a
have hit a similar lower limit to queuing delay of about 20ms on similar lower limit to queuing delay of about 20ms on average but
average but there are also regular 25ms delay spikes due to there are also regular 25ms delay spikes due to bandwidth probes
bandwidth probes and 60ms spikes due to flow-starts. and 60ms spikes due to flow-starts.
L4S learns from the experience of Data Center TCP [RFC8257], which L4S learns from the experience of Data Center TCP [RFC8257], which
shows the power of complementary changes both in the network and on shows the power of complementary changes both in the network and on
end-systems. DCTCP teaches us that two small but radical changes to end-systems. DCTCP teaches us that two small but radical changes to
congestion control are needed to cut the two major outstanding causes congestion control are needed to cut the two major outstanding causes
of queuing delay variability: of queuing delay variability:
1. Far smaller rate variations (sawteeth) than Reno-friendly 1. Far smaller rate variations (sawteeth) than Reno-friendly
congestion controls; congestion controls;
skipping to change at page 5, line 25 skipping to change at page 5, line 33
the queue is signalled immediately. the queue is signalled immediately.
Without ECN, either of these would lead to very high loss levels. Without ECN, either of these would lead to very high loss levels.
But, with ECN, the resulting high marking levels are just signals, But, with ECN, the resulting high marking levels are just signals,
not impairments. BBRv2 combines the best of both worlds - it works not impairments. BBRv2 combines the best of both worlds - it works
as a scalable congestion control when ECN is available, but also aims as a scalable congestion control when ECN is available, but also aims
to minimize delay when it isn't. to minimize delay when it isn't.
However, until now, Scalable congestion controls (like DCTCP) did not However, until now, Scalable congestion controls (like DCTCP) did not
co-exist well in a shared ECN-capable queue with existing ECN-capable co-exist well in a shared ECN-capable queue with existing ECN-capable
TCP Reno [RFC5681] or Cubic [RFC8312] congestion controls --- TCP Reno [RFC5681] or Cubic [RFC8312] congestion controls -- Scalable
Scalable controls are so aggressive that these 'Classic' algorithms controls are so aggressive that these 'Classic' algorithms would
would drive themselves to a small capacity share. Therefore, until drive themselves to a small capacity share. Therefore, until now,
now, L4S controls could only be deployed where a clean-slate L4S controls could only be deployed where a clean-slate environment
environment could be arranged, such as in private data centres (hence could be arranged, such as in private data centres (hence the name
the name DCTCP). DCTCP).
This document specifies a 'DualQ Coupled AQM' extension that solves This document specifies a 'DualQ Coupled AQM' extension that solves
the problem of coexistence between Scalable and Classic flows, the problem of coexistence between Scalable and Classic flows,
without having to inspect flow identifiers. It is not like flow- without having to inspect flow identifiers. It is not like flow-
queuing approaches [RFC8290] that classify packets by flow identifier queuing approaches [RFC8290] that classify packets by flow identifier
into separate queues in order to isolate sparse flows from the higher into separate queues in order to isolate sparse flows from the higher
latency in the queues assigned to heavier flows. If a flow needs latency in the queues assigned to heavier flows. If a flow needs
both low delay and high throughput, having a queue to itself does not both low delay and high throughput, having a queue to itself does not
isolate it from the harm it causes to itself. In contrast, DualQ isolate it from the harm it causes to itself. In contrast, DualQ
Coupled AQMs address the root cause of the latency problem --- they Coupled AQMs address the root cause of the latency problem -- they
are an enabler for the smooth low latency scalable behaviour of are an enabler for the smooth low latency scalable behaviour of
Scalable congestion controls, so that every packet in every flow can Scalable congestion controls, so that every packet in every flow can
potentially enjoy very low latency. There would then be no need to potentially enjoy very low latency. There would then be no need to
isolate each flow into a separate queue. isolate each flow into a separate queue.
1.2. Scope 1.2. Scope
L4S involves complementary changes in the network and on end-systems: L4S involves complementary changes in the network and on end-systems:
Network: A DualQ Coupled AQM (defined in the present document) or a Network: A DualQ Coupled AQM (defined in the present document) or a
modification to flow-queue AQMs (described in section 4.2.b of modification to flow-queue AQMs (described in section 4.2.b of the
[I-D.ietf-tsvwg-l4s-arch]); L4S architecture [I-D.ietf-tsvwg-l4s-arch]);
End-system: A Scalable congestion control (defined in section 4 of End-system: A Scalable congestion control (defined in section 4 of
[I-D.ietf-tsvwg-ecn-l4s-id]). the L4S ECN protocol [I-D.ietf-tsvwg-ecn-l4s-id]).
Packet identifier: The network and end-system parts of L4S can be Packet identifier: The network and end-system parts of L4S can be
deployed incrementally, because they both identify L4S packets deployed incrementally, because they both identify L4S packets
using the experimentally assigned explicit congestion notification using the experimentally assigned explicit congestion notification
(ECN) codepoints in the IP header: ECT(1) and CE [RFC8311] (ECN) codepoints in the IP header: ECT(1) and CE [RFC8311]
[I-D.ietf-tsvwg-ecn-l4s-id]. [I-D.ietf-tsvwg-ecn-l4s-id].
Data Center TCP (DCTCP [RFC8257]) is an example of a Scalable Data Center TCP (DCTCP [RFC8257]) is an example of a Scalable
congestion control for controlled environments that has been deployed congestion control for controlled environments that has been deployed
for some time in Linux, Windows and FreeBSD operating systems. for some time in Linux, Windows and FreeBSD operating systems.
During the progress of this document through the IETF a number of During the progress of this document through the IETF a number of
other Scalable congestion controls were implemented, e.g. TCP other Scalable congestion controls were implemented, e.g. TCP Prague
Prague [I-D.briscoe-iccrg-prague-congestion-control] [PragueLinux], [I-D.briscoe-iccrg-prague-congestion-control] [PragueLinux], BBRv2
BBRv2 [I-D.cardwell-iccrg-bbr-congestion-control], QUIC Prague and [BBRv2], [I-D.cardwell-iccrg-bbr-congestion-control], QUIC Prague and
the L4S variant of SCReAM for real-time media [RFC8298]. the L4S variant of SCReAM for real-time media [RFC8298].
The focus of this specification is to enable deployment of the The focus of this specification is to enable deployment of the
network part of the L4S service. Then, without any management network part of the L4S service. Then, without any management
intervention, applications can exploit this new network capability as intervention, applications can exploit this new network capability as
their operating systems migrate to Scalable congestion controls, their operating systems migrate to Scalable congestion controls,
which can then evolve _while_ their benefits are being enjoyed by which can then evolve _while_ their benefits are being enjoyed by
everyone on the Internet. everyone on the Internet.
The DualQ Coupled AQM framework can incorporate any AQM designed for The DualQ Coupled AQM framework can incorporate any AQM designed for
skipping to change at page 7, line 47 skipping to change at page 8, line 8
Classic service/queue: The Classic service is intended for all the Classic service/queue: The Classic service is intended for all the
congestion control behaviours that co-exist with Reno [RFC5681] congestion control behaviours that co-exist with Reno [RFC5681]
(e.g. Reno itself, Cubic [RFC8312], TFRC [RFC5348]). (e.g. Reno itself, Cubic [RFC8312], TFRC [RFC5348]).
Low-Latency, Low-Loss Scalable throughput (L4S) service/queue: The Low-Latency, Low-Loss Scalable throughput (L4S) service/queue: The
'L4S' service is intended for traffic from scalable congestion 'L4S' service is intended for traffic from scalable congestion
control algorithms, such as TCP Prague control algorithms, such as TCP Prague
[I-D.briscoe-iccrg-prague-congestion-control], which was derived [I-D.briscoe-iccrg-prague-congestion-control], which was derived
from Data Center TCP [RFC8257]. The L4S service is for more from Data Center TCP [RFC8257]. The L4S service is for more
general traffic than just TCP Prague--it allows the set of general traffic than just TCP Prague -- it allows the set of
congestion controls with similar scaling properties to Prague to congestion controls with similar scaling properties to Prague to
evolve, such as the examples listed earlier (Relentless, SCReAM, evolve, such as the examples listed earlier (Relentless, SCReAM,
etc.). etc.).
Classic Congestion Control: A congestion control behaviour that can Classic Congestion Control: A congestion control behaviour that can
co-exist with standard TCP Reno [RFC5681] without causing co-exist with standard TCP Reno [RFC5681] without causing
significantly negative impact on its flow rate [RFC5033]. With significantly negative impact on its flow rate [RFC5033]. With
Classic congestion controls, such as Reno or Cubic, because flow Classic congestion controls, such as Reno or Cubic, because flow
rate has scaled since TCP congestion control was first designed in rate has scaled since TCP congestion control was first designed in
1988, it now takes hundreds of round trips (and growing) to 1988, it now takes hundreds of round trips (and growing) to
recover after a congestion signal (whether a loss or an ECN mark) recover after a congestion signal (whether a loss or an ECN mark)
as shown in the examples in section 5.1 of as shown in the examples in section 5.1 of the L4S
[I-D.ietf-tsvwg-l4s-arch] and in [RFC3649]. Therefore control of architecture [I-D.ietf-tsvwg-l4s-arch] and in [RFC3649].
queuing and utilization becomes very slack, and the slightest Therefore control of queuing and utilization becomes very slack,
disturbances (e.g. from new flows starting) prevent a high rate and the slightest disturbances (e.g. from new flows starting)
from being attained. prevent a high rate from being attained.
Scalable Congestion Control: A congestion control where the average Scalable Congestion Control: A congestion control where the average
time from one congestion signal to the next (the recovery time) time from one congestion signal to the next (the recovery time)
remains invariant as the flow rate scales, all other factors being remains invariant as the flow rate scales, all other factors being
equal. This maintains the same degree of control over queueing equal. This maintains the same degree of control over queueing
and utilization whatever the flow rate, as well as ensuring that and utilization whatever the flow rate, as well as ensuring that
high throughput is robust to disturbances. For instance, DCTCP high throughput is robust to disturbances. For instance, DCTCP
averages 2 congestion signals per round-trip whatever the flow averages 2 congestion signals per round-trip whatever the flow
rate, as do other recently developed scalable congestion controls, rate, as do other recently developed scalable congestion controls,
e.g. Relentless TCP [Mathis09], TCP Prague e.g. Relentless TCP [Mathis09], TCP Prague
[I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux], [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux],
BBRv2 [I-D.cardwell-iccrg-bbr-congestion-control] and the L4S BBRv2 [BBRv2], [I-D.cardwell-iccrg-bbr-congestion-control] and the
variant of SCReAM for real-time media [SCReAM], [RFC8298]. For L4S variant of SCReAM for real-time media [SCReAM], [RFC8298].
the public Internet a Scalable transport has to comply with the For the public Internet a Scalable transport has to comply with
requirements in Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id] (aka. the the requirements in Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id]
'Prague L4S requirements'). (aka. the 'Prague L4S requirements').
C: Abbreviation for Classic, e.g. when used as a subscript. C: Abbreviation for Classic, e.g. when used as a subscript.
L: Abbreviation for L4S, e.g. when used as a subscript. L: Abbreviation for L4S, e.g. when used as a subscript.
The terms Classic or L4S can also qualify other nouns, such as The terms Classic or L4S can also qualify other nouns, such as
'codepoint', 'identifier', 'classification', 'packet', 'flow'. 'codepoint', 'identifier', 'classification', 'packet', 'flow'.
For example: an L4S packet means a packet with an L4S identifier For example: an L4S packet means a packet with an L4S identifier
sent from an L4S congestion control. sent from an L4S congestion control.
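The contrast between the Classic and Scalable definitions above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of either draft revision. It assumes the textbook Reno sawtooth (the window is halved on a congestion signal and then grows by one segment per round trip, so recovery takes about W/2 round trips), and it takes the figure of 2 congestion signals per round trip for DCTCP from the definition above. The function names and the 1500-byte segment size are illustrative assumptions.

```python
def reno_recovery_rtts(rate_bps: float, rtt_s: float, mss_bytes: int = 1500) -> float:
    """Approximate round trips a Reno flow needs to recover after one signal.

    Assumes the window W (in segments) that sustains rate_bps is halved on a
    congestion signal and then grows by one segment per round trip, so the
    flow needs about W/2 round trips to return to W.
    """
    window_segments = rate_bps * rtt_s / (8 * mss_bytes)
    return window_segments / 2


def scalable_recovery_rtts(signals_per_rtt: float = 2.0) -> float:
    """Average round trips between congestion signals for a Scalable control.

    With a constant number of signals per round trip (2 for DCTCP, per the
    definition above), the result does not depend on the flow rate.
    """
    return 1.0 / signals_per_rtt


if __name__ == "__main__":
    for rate in (10e6, 100e6, 1e9):
        print(f"{rate / 1e6:6.0f} Mb/s: Reno ~{reno_recovery_rtts(rate, 0.02):6.1f} RTTs, "
              f"Scalable {scalable_recovery_rtts():.1f} RTTs")
```

Under these assumptions the Reno figure grows linearly with flow rate, while the Scalable figure stays fixed, which is the invariance the Scalable definition describes.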
skipping to change at page 13, line 9 skipping to change at page 13, line 20
identifiers, because it can apply the appropriate marking or dropping identifiers, because it can apply the appropriate marking or dropping
probability to all flows of each type. A separate probability to all flows of each type. A separate
specification [I-D.ietf-tsvwg-ecn-l4s-id] requires the network to specification [I-D.ietf-tsvwg-ecn-l4s-id] requires the network to
treat the ECT(1) and CE codepoints of the ECN field as this treat the ECT(1) and CE codepoints of the ECN field as this
identifier. An additional process document has proved necessary to identifier. An additional process document has proved necessary to
make the ECT(1) codepoint available for experimentation [RFC8311]. make the ECT(1) codepoint available for experimentation [RFC8311].
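The classification rule described above can be sketched in a few lines. This is an illustrative sketch only, not text from either draft revision. It assumes the ECN field occupies the two least-significant bits of the IPv4 ToS or IPv6 Traffic Class octet with the codepoint values of RFC 3168, and it ignores any operator-specific overrides. The names Ecn, ecn_field and classify are hypothetical.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """ECN codepoints as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def ecn_field(traffic_class_octet: int) -> Ecn:
    """Extract the ECN field from the ToS / Traffic Class octet."""
    return Ecn(traffic_class_octet & 0b11)


def classify(traffic_class_octet: int) -> str:
    """Return 'L' for packets carrying the L4S identifier, else 'C'.

    ECT(1) and CE are treated as the L4S identifier; Not-ECT and ECT(0)
    go to the Classic queue. No flow identifiers are inspected.
    """
    if ecn_field(traffic_class_octet) in (Ecn.ECT1, Ecn.CE):
        return "L"
    return "C"
```

The point of the sketch is that the decision reads only the ECN field of each packet and never writes to it, so no per-flow state is needed.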
For policy reasons, an operator might choose to steer certain packets For policy reasons, an operator might choose to steer certain packets
(e.g. from certain flows or with certain addresses) out of the L (e.g. from certain flows or with certain addresses) out of the L
queue, even though they identify themselves as L4S by their ECN queue, even though they identify themselves as L4S by their ECN
codepoints. In such cases, [I-D.ietf-tsvwg-ecn-l4s-id] says that the codepoints. In such cases, the L4S ECN
device "MUST NOT alter the end-to-end L4S ECN identifier", so that it protocol [I-D.ietf-tsvwg-ecn-l4s-id] says that the device "MUST NOT
is preserved end-to-end. The aim is that each operator can choose alter the end-to-end L4S ECN identifier", so that it is preserved
how it treats L4S traffic locally, but an individual operator does end-to-end. The aim is that each operator can choose how it treats
not alter the identification of L4S packets, which would prevent L4S traffic locally, but an individual operator does not alter the
other operators downstream from making their own choices on how to identification of L4S packets, which would prevent other operators
treat L4S traffic. downstream from making their own choices on how to treat L4S traffic.
In addition, an operator could use other identifiers to classify In addition, an operator could use other identifiers to classify
certain additional packet types into the L queue that it deems will certain additional packet types into the L queue that it deems will
not risk harm to the L4S service. For instance, addresses of specific not risk harm to the L4S service. For instance, addresses of specific
applications or hosts; specific Diffserv codepoints such as EF applications or hosts; specific Diffserv codepoints such as EF
(Expedited Forwarding), Voice-Admit or the Non-Queue-Building (NQB) (Expedited Forwarding), Voice-Admit or the Non-Queue-Building (NQB)
per-hop behaviour; or certain protocols (e.g. ARP, DNS) (see per-hop behaviour; or certain protocols (e.g. ARP, DNS) (see
Section 5.4.1 of [I-D.ietf-tsvwg-ecn-l4s-id]). Note that the Section 5.4.1 of [I-D.ietf-tsvwg-ecn-l4s-id]). Note that the
mechanism only reads these identifiers. [I-D.ietf-tsvwg-ecn-l4s-id] mechanism only reads these identifiers. [I-D.ietf-tsvwg-ecn-l4s-id]
says it "MUST NOT alter these non-ECN identifiers". Thus, the L says it "MUST NOT alter these non-ECN identifiers". Thus, the L
skipping to change at page 15, line 45 skipping to change at page 15, line 45
C queue. This is because, as the C queue grows, the base AQM applies C queue. This is because, as the C queue grows, the base AQM applies
more congestion signals to L traffic (as well as C). As L flows more congestion signals to L traffic (as well as C). As L flows
reduce their rate in response, they use less than the scheduling reduce their rate in response, they use less than the scheduling
share for L traffic. So, because the scheduler is work-conserving, share for L traffic. So, because the scheduler is work-conserving,
it schedules any C traffic in the gaps. it schedules any C traffic in the gaps.
Giving priority to the L queue has the benefit of very low L queue Giving priority to the L queue has the benefit of very low L queue
delay, because the L queue is kept empty whenever L traffic is delay, because the L queue is kept empty whenever L traffic is
controlled by the coupling. Also there only has to be a coupling in controlled by the coupling. Also there only has to be a coupling in
one direction - from Classic to L4S. Priority has to be conditional one direction - from Classic to L4S. Priority has to be conditional
in some way to prevent the C queue being starved by excessive in some way to prevent the C queue being starved in the short-term
unresponsive L traffic (see Section 4.1) and to give C traffic a (see Section 4.2.2) to give C traffic a means to push in, as
means to push in, as explained next. With normal responsive L explained next. With normal responsive L traffic, the coupled ECN
traffic, the coupled ECN marking gives C traffic the ability to push marking gives C traffic the ability to push back against even strict
back against even strict priority, by congestion marking the L priority, by congestion marking the L traffic to make it yield some
traffic to make it yield some space. However, if there is just a space. However, if there is just a small finite set of C packets
small finite set of C packets (e.g. a DNS request or an initial (e.g. a DNS request or an initial window of data) some Classic AQMs
window of data) some Classic AQMs will not induce enough ECN marking will not induce enough ECN marking in the L queue, no matter how long
in the L queue, no matter how long the small set of C packets waits. the small set of C packets waits. Then, if the L queue happens to
Then, if the L queue happens to remain busy, the C traffic would remain busy, the C traffic would never get a scheduling opportunity
never get a scheduling opportunity from a strict priority scheduler. from a strict priority scheduler. Ideally the Classic AQM would be
Ideally the Classic AQM would be designed to increase the coupled designed to increase the coupled marking the longer that C packets
marking the longer that C packets have been waiting, but this is not have been waiting, but this is not always practical - hence the need
always practical - hence the need for L priority to be conditional. for L priority to be conditional. Giving a small weight or limited
Giving a small weight or limited waiting time for C traffic improves waiting time for C traffic improves response times for short Classic
response times for short Classic messages, such as DNS requests and messages, such as DNS requests, and improves Classic flow startup
improves Classic flow startup because immediate capacity is because immediate capacity is available.
available.
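One way to picture conditional priority with a small weight for C traffic is the toy scheduler below. It is a sketch under stated assumptions, not the scheduler specified by the draft: the class name, the l_burst parameter and its default of 15 are hypothetical, and it counts packets rather than bytes or waiting time. L packets are served first, but once l_burst L packets have been sent in a row while C traffic was waiting, one C packet is served.

```python
from collections import deque


class ConditionalPriorityScheduler:
    """Toy two-queue scheduler: L has priority, C gets a small guaranteed share."""

    def __init__(self, l_burst: int = 15):
        self.lq = deque()       # L4S (L) queue
        self.cq = deque()       # Classic (C) queue
        self.l_burst = l_burst  # max consecutive L packets while C is waiting
        self.l_run = 0          # L packets sent in a row while C was waiting

    def enqueue(self, packet, queue: str) -> None:
        (self.lq if queue == "L" else self.cq).append(packet)

    def dequeue(self):
        # No C traffic waiting: plain priority, and the run counter restarts.
        if not self.cq:
            self.l_run = 0
            return self.lq.popleft() if self.lq else None
        # C traffic is waiting: keep serving L until its burst allowance is used.
        if self.lq and self.l_run < self.l_burst:
            self.l_run += 1
            return self.lq.popleft()
        # Either L is empty or its allowance is spent: give C a turn.
        self.l_run = 0
        return self.cq.popleft()
```

With l_burst set to 3, a busy L queue and two waiting C packets are served in the order L, L, L, C, L, L, L, C. A short Classic message therefore gets a scheduling opportunity even if it never induces enough coupled marking to slow the L traffic.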
Example DualQ Coupled AQM algorithms called DualPI2 and Curvy RED are Example DualQ Coupled AQM algorithms called DualPI2 and Curvy RED are
given in Appendix A and Appendix B. Either example AQM can be used given in Appendix A and Appendix B. Either example AQM can be used
to couple packet marking and dropping across a dual Q. to couple packet marking and dropping across a dual Q.
DualPI2 uses a Proportional-Integral (PI) controller as the Base AQM. DualPI2 uses a Proportional-Integral (PI) controller as the Base AQM.
Indeed, this Base AQM with just the squared output and no L4S queue Indeed, this Base AQM with just the squared output and no L4S queue
can be used as a drop-in replacement for PIE [RFC8033], in which case can be used as a drop-in replacement for PIE [RFC8033], in which case
it is just called PI2 [PI2]. PI2 is a principled simplification of it is just called PI2 [PI2]. PI2 is a principled simplification of
PIE that is both more responsive and more stable in the face of PIE that is both more responsive and more stable in the face of
skipping to change at page 16, line 46 skipping to change at page 16, line 45
different drain rates. With AQMs in a dualQ structure this is different drain rates. With AQMs in a dualQ structure this is
particularly important because the drain rate of each queue can vary particularly important because the drain rate of each queue can vary
rapidly as flows for the two queues arrive and depart, even if the rapidly as flows for the two queues arrive and depart, even if the
combined link rate is constant. combined link rate is constant.
It would be possible to control the queues with other alternative It would be possible to control the queues with other alternative
AQMs, as long as the normative requirements (those expressed in AQMs, as long as the normative requirements (those expressed in
capitals) in Section 2.5 are observed. capitals) in Section 2.5 are observed.
The two queues could optionally be part of a larger queuing The two queues could optionally be part of a larger queuing
hierarchy, such as the initial example ideas hierarchy, such as the initial example ideas in
in [I-D.briscoe-tsvwg-l4s-diffserv]. [I-D.briscoe-tsvwg-l4s-diffserv].
2.5. Normative Requirements for a DualQ Coupled AQM 2.5. Normative Requirements for a DualQ Coupled AQM
The following requirements are intended to capture only the essential The following requirements are intended to capture only the essential
aspects of a DualQ Coupled AQM. They are intended to be independent aspects of a DualQ Coupled AQM. They are intended to be independent
of the particular AQMs used for each queue. of the particular AQMs used for each queue.
2.5.1. Functional Requirements 2.5.1. Functional Requirements
A Dual Queue Coupled AQM implementation MUST comply with the A Dual Queue Coupled AQM implementation MUST comply with the
skipping to change at page 17, line 29 skipping to change at page 17, line 29
technology underlying any L4S AQM. technology underlying any L4S AQM.
A Dual Queue Coupled AQM implementation MUST utilize two queues, each A Dual Queue Coupled AQM implementation MUST utilize two queues, each
with an AQM algorithm. with an AQM algorithm.
The AQM algorithm for the low latency (L) queue MUST be able to apply The AQM algorithm for the low latency (L) queue MUST be able to apply
ECN marking to ECN-capable packets. ECN marking to ECN-capable packets.
The scheduler draining the two queues MUST give L4S packets priority The scheduler draining the two queues MUST give L4S packets priority
over Classic, although priority MUST be bounded in order not to over Classic, although priority MUST be bounded in order not to
starve Classic traffic. The scheduler SHOULD be work-conserving, or starve Classic traffic (see Section 4.2.2). The scheduler SHOULD be
otherwise close to work-conserving. This is because Classic traffic work-conserving, or otherwise close to work-conserving. This is
needs to be able to efficiently fill any space left by L4S traffic because Classic traffic needs to be able to efficiently fill any
even though the scheduler would otherwise allocate it to L4S. space left by L4S traffic even though the scheduler would otherwise
allocate it to L4S.
[I-D.ietf-tsvwg-ecn-l4s-id] defines the meaning of an ECN marking on [I-D.ietf-tsvwg-ecn-l4s-id] defines the meaning of an ECN marking on
L4S traffic, relative to drop of Classic traffic. In order to ensure L4S traffic, relative to drop of Classic traffic. In order to ensure
coexistence of Classic and Scalable L4S traffic, it says, "The coexistence of Classic and Scalable L4S traffic, it says, "The
likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST be likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST be
roughly proportional to the square of the likelihood that it would roughly proportional to the square of the likelihood that it would
have marked it if it had been an L4S packet (p_L)." The term have marked it if it had been an L4S packet (p_L)." The term
'likelihood' is used to allow for marking and dropping to be either 'likelihood' is used to allow for marking and dropping to be either
probabilistic or deterministic. probabilistic or deterministic.
For the current specification, this translates into the following For the current specification, this translates into the following
requirement. A DualQ Coupled AQM MUST apply ECN marking to traffic requirement. A DualQ Coupled AQM MUST apply ECN marking to traffic
in the L queue that is no lower than that derived from the likelihood in the L queue that is no lower than that derived from the likelihood
of drop (or ECN marking) in the Classic queue using Eqn. (1). of drop (or ECN marking) in the Classic queue using Eqn. (1).
The constant of proportionality, k, in Eqn (1) determines the The constant of proportionality, k, in Eqn (1) determines the
relative flow rates of Classic and L4S flows when the AQM concerned relative flow rates of Classic and L4S flows when the AQM concerned
is the bottleneck (all other factors being equal). is the bottleneck (all other factors being equal). The L4S ECN
[I-D.ietf-tsvwg-ecn-l4s-id] says, "The constant of proportionality protocol [I-D.ietf-tsvwg-ecn-l4s-id] says, "The constant of
(k) does not have to be standardised for interoperability, but a proportionality (k) does not have to be standardised for
value of 2 is RECOMMENDED." interoperability, but a value of 2 is RECOMMENDED."
Assuming Scalable congestion controls for the Internet will be as Assuming Scalable congestion controls for the Internet will be as
aggressive as DCTCP, this will ensure their congestion window will be aggressive as DCTCP, this will ensure their congestion window will be
roughly the same as that of a standards track TCP Reno congestion roughly the same as that of a standards track TCP Reno congestion
control (Reno) [RFC5681] and other Reno-friendly controls, such as control (Reno) [RFC5681] and other Reno-friendly controls, such as
TCP Cubic in its Reno-compatibility mode. TCP Cubic in its Reno-compatibility mode.
The choice of k is a matter of operator policy, and operators MAY The choice of k is a matter of operator policy, and operators MAY
choose a different value using the guidelines in Appendix C.2. choose a different value using the guidelines in Appendix C.2.
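
The coupling described above can be sketched as follows. This is only an illustration: it assumes a single internal base probability p' from which the Classic drop probability is taken as p_C = p'^2 and the coupled L4S marking probability as p_CL = k*p', with the RECOMMENDED k = 2. The function name and the clamping of p_CL to 1.0 are illustrative assumptions, not part of the specification.

```python
def coupled_probabilities(p_base: float, k: float = 2.0) -> tuple[float, float]:
    """Return (p_C, p_CL) derived from a base probability p' in [0, 1].

    p_C  : Classic drop (or Classic ECN-marking) probability, assumed p'^2.
    p_CL : coupled L4S marking probability, assumed k * p', clamped to 1.
    """
    if not 0.0 <= p_base <= 1.0:
        raise ValueError("p_base must lie in [0, 1]")
    p_c = p_base ** 2              # squared output for Classic traffic
    p_cl = min(k * p_base, 1.0)    # linear, coupled output for L4S traffic
    return p_c, p_cl
```

Below saturation, the two outputs satisfy p_C = (p_CL / k)^2, which is one way to read the "roughly proportional to the square" requirement quoted above.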
skipping to change at page 19, line 7 skipping to change at page 19, line 7
* If a packet that does not carry an ECT(1) or CE codepoint is * If a packet that does not carry an ECT(1) or CE codepoint is
classified into the L queue: classified into the L queue:
- if the packet is ECT(0), the L AQM SHOULD apply CE-marking - if the packet is ECT(0), the L AQM SHOULD apply CE-marking
using a probability appropriate to Classic congestion control using a probability appropriate to Classic congestion control
and appropriate to the target delay in the L queue and appropriate to the target delay in the L queue
- if the packet is Not-ECT, the appropriate action depends on - if the packet is Not-ECT, the appropriate action depends on
whether some other function is protecting the L queue from whether some other function is protecting the L queue from
misbehaving flows (e.g. per-flow queue misbehaving flows (e.g. per-flow queue protection
protection [I-D.briscoe-docsis-q-protection] or latency [I-D.briscoe-docsis-q-protection] or latency policing):
policing):
o If separate queue protection is provided, the L AQM SHOULD o If separate queue protection is provided, the L AQM SHOULD
ignore the packet and forward it unchanged, meaning it ignore the packet and forward it unchanged, meaning it
should not calculate whether to apply congestion should not calculate whether to apply congestion
notification and it should neither drop nor CE-mark the notification and it should neither drop nor CE-mark the
packet (for instance, the operator might classify EF traffic packet (for instance, the operator might classify EF traffic
that is unresponsive to drop into the L queue, alongside that is unresponsive to drop into the L queue, alongside
responsive L4S-ECN traffic) responsive L4S-ECN traffic)
o if separate queue protection is not provided, the L AQM o if separate queue protection is not provided, the L AQM
skipping to change at page 19, line 37 skipping to change at page 19, line 36
- the C AQM SHOULD apply CE-marking using the coupled AQM - the C AQM SHOULD apply CE-marking using the coupled AQM
probability p_CL (= k*p'). probability p_CL (= k*p').
The above requirements are worded as "SHOULDs", because operator- The above requirements are worded as "SHOULDs", because operator-
specific classifiers are for flexibility, by definition. Therefore, specific classifiers are for flexibility, by definition. Therefore,
alternative actions might be appropriate in the operator's specific alternative actions might be appropriate in the operator's specific
circumstances. An example would be where the operator knows that circumstances. An example would be where the operator knows that
certain legacy traffic marked with one codepoint actually has a certain legacy traffic marked with one codepoint actually has a
congestion response associated with another codepoint. congestion response associated with another codepoint.
If the DualQ Coupled AQM has detected overload, it MUST begin using If the DualQ Coupled AQM has detected overload, it MUST introduce
Classic drop, and continue until the overload episode has subsided. Classic drop to both types of ECN-capable traffic until the overload
Introducing drop if ECN marking is persistently high is required by episode has subsided. Introducing drop if ECN marking is
Section 7 of [RFC3168] and Section 4.2.1 of [RFC7567]. persistently high is recommended by Section 7 of the ECN
specification [RFC3168] and Section 4.2.1 of the AQM
Recommendations [RFC7567].
2.5.2. Management Requirements 2.5.2. Management Requirements
2.5.2.1. Configuration 2.5.2.1. Configuration
By default, a DualQ Coupled AQM SHOULD NOT need any configuration for By default, a DualQ Coupled AQM SHOULD NOT need any configuration for
use at a bottleneck on the public Internet [RFC7567]. The following use at a bottleneck on the public Internet [RFC7567]. The following
parameters MAY be operator-configurable, e.g. to tune for non- parameters MAY be operator-configurable, e.g. to tune for non-
Internet settings: Internet settings:
skipping to change at page 21, line 5 skipping to change at page 21, line 5
* Coupling factor, k (see Appendix C.2); * Coupling factor, k (see Appendix C.2);
* A limit to the conditional priority of L4S. This is scheduler- * A limit to the conditional priority of L4S. This is scheduler-
dependent, but it SHOULD be expressed as a relation between the dependent, but it SHOULD be expressed as a relation between the
max delay of a C packet and an L packet. For example: max delay of a C packet and an L packet. For example:
- for a WRR scheduler a weight ratio between L and C of w:1 means - for a WRR scheduler a weight ratio between L and C of w:1 means
that the maximum delay to a C packet is w times that of an L that the maximum delay to a C packet is w times that of an L
packet. packet.
- for a time-shifted FIFO (TS-FIFO) scheduler (see Section 4.1.1) - for a time-shifted FIFO (TS-FIFO) scheduler (see Section 4.2.2)
a time-shift of tshift means that the maximum delay to a C a time-shift of tshift means that the maximum delay to a C
packet is tshift greater than that of an L packet. tshift could packet is tshift greater than that of an L packet. tshift could
be expressed as a multiple of the typical RTT rather than as an be expressed as a multiple of the typical RTT rather than as an
absolute delay. absolute delay.
* The maximum Classic ECN marking probability, p_Cmax, before * The maximum Classic ECN marking probability, p_Cmax, before
introducing drop. introducing drop.
2.5.2.2. Monitoring 2.5.2.2. Monitoring
skipping to change at page 22, line 41 skipping to change at page 22, line 41
The descriptions of specific DualQ Coupled AQM algorithms in the The descriptions of specific DualQ Coupled AQM algorithms in the
appendices cover scaling of their configuration parameters, e.g. with appendices cover scaling of their configuration parameters, e.g. with
respect to RTT and sampling frequency. respect to RTT and sampling frequency.
3. IANA Considerations (to be removed by RFC Editor) 3. IANA Considerations (to be removed by RFC Editor)
This specification contains no IANA considerations. This specification contains no IANA considerations.
4. Security Considerations 4. Security Considerations
4.1. Overload Handling 4.1. Low Delay without Requiring Per-Flow Processing
Where the interests of users or flows might conflict, it could be The L4S architecture [I-D.ietf-tsvwg-l4s-arch] compares the DualQ and
necessary to police traffic to isolate any harm to the performance of per-flow-queuing (FQ) approaches to L4S. The privacy considerations
individual flows. However it is hard to avoid unintended side- section in that document motivates the DualQ on the grounds that
effects with policing, and in a trusted environment policing is not users who want to encrypt application flow identifiers, e.g. in IPsec
necessary. Therefore per-flow policing or other encrypted VPN tunnels, don't have to sacrifice low delay
(e.g. [I-D.briscoe-docsis-q-protection]) needs to be separable from a ([RFC8404] encourages avoidance of such privacy compromises).
basic AQM, as an option under policy control.
However, a basic DualQ AQM does at least need to handle overload. A The security considerations section of the L4S architecture also
useful objective would be for the overload behaviour of the DualQ AQM includes subsections on policing of relative flow-rates (section 8.1)
to be at least no worse than a single queue AQM. However, a trade- and on policing of flows that cause excessive queuing delay (section
off needs to be made between complexity and the risk of either 8.2). It explains that the interests of users do not collide in the
traffic class harming the other. In each of the following three same way for delay as they do for bandwidth. For someone to get more
subsections, an overload issue specific to the DualQ is described, of the bandwidth of a shared link, someone else necessarily gets less
followed by proposed solution(s). (a 'zero-sum game'), whereas queuing delay can be reduced for
everyone, without any need for someone else to lose out. It also
explains that, on the current Internet, scheduling usually enforces
separation between 'sites' (e.g. households, businesses or mobile
users), but it is not common to need to schedule or police individual
application flows.
Under overload the higher priority L4S service will have to sacrifice By the above arguments, per-flow policing might not be necessary and
some aspect of its performance. Alternative solutions are provided in trusted environments it is certainly unlikely to be needed.
below that each relax a different factor: e.g. throughput, delay, Therefore, because it is hard to avoid complexity and unintended
drop. These choices need to be made either by the developer or by side-effects with per-flow policing, it needs to be separable from a
operator policy, rather than by the IETF. basic AQM, as an option, under policy control. On this basis, the
DualQ Coupled AQM provides low delay without prejudging the question
of per-flow policing.
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput or Delay? Nonetheless, the interests of users or flows might conflict, e.g. in
case of accident or malice. Then per-flow control could be
necessary. If flow-rate control is needed, it can be provided as a
modular addition to a DualQ. And similarly, if protection against
excessive queue delay is needed, a per-flow queue protection option
can be added to a DualQ (e.g. [I-D.briscoe-docsis-q-protection]).
4.2. Handling Unresponsive Flows and Overload
In the absence of any per-flow control, it is important that the
basic DualQ Coupled AQM gives unresponsive flows no more throughput
advantage than a single-queue AQM would, and that it at least handles
overload situations. Overload means that incoming load significantly
or persistently exceeds output capacity, but it is not intended to be
a precise term -- significant and persistent are matters of degree.
A trade-off needs to be made between complexity and the risk of
either traffic class harming the other. In overloaded conditions the
higher priority L4S service will have to sacrifice some aspect of its
performance. Depending on the degree of overload, alternative
solutions may relax a different factor: e.g. throughput, delay, drop.
These choices need to be made either by the developer or by operator
policy, rather than by the IETF. Subsequent subsections discuss
aspects relating to handling of different degrees of overload:
* Unresponsive flows (L and/or C) but not overloaded, i.e. the sum
of unresponsive load before adding any responsive traffic is below
capacity;
This case is handled by the regular Coupled DualQ (Section 2.1)
but not discussed there. So below, Section 4.2.1 explains the
design goal, and how it is achieved in practice;
* Unresponsive flows (L and/or C) causing persistent overload,
i.e. the sum of unresponsive load even before adding any
responsive traffic persistently exceeds capacity;
This case is not covered by the regular Coupled DualQ mechanism
(Section 2.1) but the last para in Section 2.5.1.1 sets out a
requirement to handle the case where ECN-capable traffic could
starve non-ECN-capable traffic. Section 4.2.3 below discusses
the general options and gives specific examples.
* Short-term overload that lies between the 'not overloaded' and
'persistently overloaded' cases.
For the period before overload is deemed persistent,
Section 4.2.2 discusses options for more immediate mechanisms
at the scheduler timescale. These prevent short-term
starvation of the C queue by making the priority of the L queue
conditional, as required in Section 2.5.1.
4.2.1. Unresponsive Traffic without Overload
When one or more L flows and/or C flows are unresponsive, but their
total load is within the link capacity so that they do not saturate
the coupled marking (below 100%), the goal of a DualQ AQM is to
behave no worse than a single-queue AQM.
Tests have shown that this is indeed the case with no additional
mechanism beyond the regular Coupled DualQ of Section 2.1 (see the
results of 'overload experiments' in [DCttH19]). Perhaps counter-
intuitively, whether the unresponsive flow classifies itself into the
L or the C queue, the DualQ system behaves as if it has subtracted
from the overall link capacity. Then, the coupling shares out the
remaining capacity between any competing responsive flows (in either
queue). See also Section 4.2.2, which discusses scheduler-specific
details.
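
The "subtracted from the overall link capacity" behaviour can be illustrated with simple arithmetic. The sketch below is a simplification under stated assumptions: all responsive flows are taken to share the remaining capacity equally (for instance, equal RTTs and a coupling factor chosen to equalise rates), and the function and parameter names are hypothetical.

```python
def responsive_share(link_rate: float, unresponsive_rate: float,
                     n_responsive: int) -> float:
    """Approximate rate of each responsive flow, in the units of link_rate.

    Assumes the unresponsive load is below capacity (no overload) and that
    the coupling splits the remainder equally among responsive flows,
    whichever queue the unresponsive traffic is classified into.
    """
    if unresponsive_rate >= link_rate:
        raise ValueError("overload: see the persistent-overload case instead")
    if n_responsive < 1:
        raise ValueError("need at least one responsive flow")
    return (link_rate - unresponsive_rate) / n_responsive
```

For example, under these assumptions a 100 Mb/s link carrying 40 Mb/s of unresponsive traffic would leave about 20 Mb/s for each of three responsive flows.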
4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S Throughput
or Delay?
Priority of L4S is required to be conditional (see Section 2.4 & Priority of L4S is required to be conditional (see Section 2.4 &
Section 2.5.1) to avoid total starvation of Classic by heavy L4S Section 2.5.1) to avoid short-term starvation of Classic. Otherwise,
traffic. This raises the question of whether to sacrifice L4S as explained in Section 2.4, even a lone responsive L4S flow could
throughput or L4S delay (or some other policy) to mitigate starvation temporarily block a small finite set of C packets (e.g. an initial
of Classic: window or DNS request). The blockage would only be brief, but it
could be longer for certain AQM implementations that can only
increase the congestion signal coupled from the C queue when C
packets are actually being dequeued. There is then the question of
whether to sacrifice L4S throughput or L4S delay (or some other
policy) to make the priority conditional:
Sacrifice L4S throughput: By using weighted round robin as the Sacrifice L4S throughput: By using weighted round robin as the
conditional priority scheduler, the L4S service can sacrifice some conditional priority scheduler, the L4S service can sacrifice some
throughput during overload. This can either be thought of as throughput during overload. This can either be thought of as
guaranteeing a minimum throughput service for Classic traffic, or guaranteeing a minimum throughput service for Classic traffic, or
as guaranteeing a maximum delay for a packet at the head of the as guaranteeing a maximum delay for a packet at the head of the
Classic queue. Classic queue.
Cautionary note: a WRR scheduler can only guarantee Classic
throughput if Classic sources are sending enough to use it --
congestion signals can undermine scheduling because they determine
how much responsive traffic of each class arrives for scheduling
in the first place. This is why scheduling is only relied on to
handle short-term starvation, until congestion signals build up
and the sources react. Even during long-term overload (discussed
more fully in Section 4.2.3), it's pragmatic to discard packets
from both queues, which again thins the traffic before it reaches
the scheduler. This is because a scheduler cannot be relied on to
handle long-term overload since the right scheduler weight cannot
be known for every scenario.
The scheduling weight of the Classic queue should be small The scheduling weight of the Classic queue should be small
(e.g. 1/16). Then, in most traffic scenarios the scheduler will (e.g. 1/16). In most traffic scenarios the scheduler will not
not interfere and it will not need to - the coupling mechanism and interfere and it will not need to, because the coupling mechanism
the end-systems will share out the capacity across both queues as and the end-systems will determine the share of capacity across
if it were a single pool. However, because the congestion both queues as if it were a single pool. However, if L4S traffic
coupling only applies in one direction (from C to L), if L4S is over-aggressive or unresponsive, the scheduler weight for
traffic is over-aggressive or unresponsive, the scheduler weight Classic traffic will at least be large enough to ensure it does
for Classic traffic will at least be large enough to ensure it not starve in the short-term.
does not starve.
In cases where the ratio of L4S to Classic flows (e.g. 19:1) is Although WRR scheduling is only expected to address short-term
greater than the ratio of their scheduler weights (e.g. 15:1), the overload, there are (somewhat rare) cases when WRR has an effect
L4S flows will get less than an equal share of the capacity, but on capacity shares over longer time-scales. But its effect is
only slightly. For instance, with the example numbers given, each minor, and it certainly does no harm. Specifically, in cases
L4S flow will get (15/16)/19 = 4.9% when ideally each would get where the ratio of L4S to Classic flows (e.g. 19:1) is greater
than the ratio of their scheduler weights (e.g. 15:1), the L4S
flows will get less than an equal share of the capacity, but only
slightly. For instance, with the example numbers given, each L4S
flow will get (15/16)/19 = 4.9% when ideally each would get
1/20=5%. In the rather specific case of an unresponsive flow 1/20=5%. In the rather specific case of an unresponsive flow
taking up just less than the capacity set aside for L4S taking up just less than the capacity set aside for L4S
(e.g. 14/16 in the above example), using WRR could significantly (e.g. 14/16 in the above example), using WRR could significantly
reduce the capacity left for any responsive L4S flows. reduce the capacity left for any responsive L4S flows.
The scheduling weight of the Classic queue should not be too The scheduling weight of the Classic queue should not be too
small, otherwise a C packet at the head of the queue could be small, otherwise a C packet at the head of the queue could be
excessively delayed by a continually busy L queue. For instance excessively delayed by a continually busy L queue. For instance
if the Classic weight is 1/16, the maximum that a Classic packet if the Classic weight is 1/16, the maximum that a Classic packet
at the head of the queue can be delayed by L traffic is the at the head of the queue can be delayed by L traffic is the
serialization delay of 15 MTU-sized packets. serialization delay of 15 MTU-sized packets.
Sacrifice L4S Delay: To control milder overload of responsive Sacrifice L4S Delay: The operator could choose to control overload
traffic, particularly when close to the maximum congestion signal, of the Classic queue by allowing some delay to 'leak' across to
the operator could choose to control overload of the Classic queue the L4S queue. The scheduler can be made to behave like a single
by allowing some delay to 'leak' across to the L4S queue. The First-In First-Out (FIFO) queue with different service times by
scheduler can be made to behave like a single First-In First-Out implementing a very simple conditional priority scheduler that
(FIFO) queue with different service times by implementing a very could be called a "time-shifted FIFO" (see the Modified Earliest
simple conditional priority scheduler that could be called a Deadline First (MEDF) scheduler [MEDF]). This scheduler adds
"time-shifted FIFO" (see the Modified Earliest Deadline First tshift to the queue delay of the next L4S packet, before comparing
(MEDF) scheduler of [MEDF]). This scheduler adds tshift to the it with the queue delay of the next Classic packet, then it
queue delay of the next L4S packet, before comparing it with the selects the packet with the greater adjusted queue delay.
queue delay of the next Classic packet, then it selects the packet
with the greater adjusted queue delay. Under regular conditions, Under regular conditions, this time-shifted FIFO scheduler behaves
this time-shifted FIFO scheduler behaves just like a strict just like a strict priority scheduler. But under moderate or high
priority scheduler. But under moderate or high overload it overload it prevents starvation of the Classic queue, because the
prevents starvation of the Classic queue, because the time-shift time-shift (tshift) defines the maximum extra queuing delay of
(tshift) defines the maximum extra queuing delay of Classic Classic packets relative to L4S. This would control milder
packets relative to L4S. overload of responsive traffic by introducing delay to defer
invoking the overload mechanisms in Section 4.2.3, particularly
when close to the maximum congestion signal.
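
The time-shifted FIFO selection rule described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name, the use of None for an empty queue, and the choice to favour L on a tie are all assumptions.

```python
from typing import Optional


def select_queue(l_delay: Optional[float], c_delay: Optional[float],
                 tshift: float) -> Optional[str]:
    """Pick which queue to serve next: "L", "C", or None if both are empty.

    l_delay / c_delay are the queue delays (seconds) of the head packets of
    the L and C queues, or None when that queue is empty.
    """
    if l_delay is None and c_delay is None:
        return None
    if c_delay is None:
        return "L"
    if l_delay is None:
        return "C"
    # Add tshift to the L packet's delay, then serve the packet with the
    # greater adjusted delay (ties go to L, by assumption).
    return "L" if l_delay + tshift >= c_delay else "C"
```

With tshift = 50 ms, an L packet that has waited 1 ms is served ahead of a C packet that has waited 30 ms, but not ahead of one that has waited 60 ms, which is how tshift bounds the extra delay of Classic packets.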
The example implementations in Appendix A and Appendix B could both The example implementations in Appendix A and Appendix B could both
be implemented with either policy. be implemented with either policy.
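
The WRR arithmetic used in the "Sacrifice L4S throughput" discussion above can be reproduced with a short sketch. It assumes integer scheduler weights for L and C (15:1 corresponds to a Classic weight of 1/16), a continually busy L queue, and MTU-sized packets; the function names, the 1500-byte MTU and the 100 Mb/s link rate in the usage note are illustrative assumptions.

```python
def l4s_per_flow_share(l_weight: int, c_weight: int, n_l4s_flows: int) -> float:
    """Fraction of link capacity per L4S flow when WRR limits the L queue."""
    l_fraction = l_weight / (l_weight + c_weight)
    return l_fraction / n_l4s_flows


def max_classic_head_delay(l_weight: int, c_weight: int,
                           mtu_bytes: int, link_bps: float) -> float:
    """Worst-case wait (seconds) of a C packet at the head of its queue.

    Assumes up to l_weight / c_weight MTU-sized L packets are serialized
    before the C packet gets its turn.
    """
    packets_ahead = l_weight / c_weight
    return packets_ahead * mtu_bytes * 8 / link_bps
```

For 19 L4S flows and weights of 15:1, l4s_per_flow_share(15, 1, 19) is about 0.049, matching the 4.9% figure above. On a 100 Mb/s link with 1500-byte packets, max_classic_head_delay(15, 1, 1500, 100e6) is about 1.8 ms.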
4.1.2. Congestion Signal Saturation: Introduce L4S Drop or Delay? 4.2.3. L4S ECN Saturation: Introduce Drop or Delay?
To keep the throughput of both L4S and Classic flows roughly equal This section concerns persistent overload caused by unresponsive L
over the full load range, a different control strategy needs to be and/or C flows. To keep the throughput of both L4S and Classic flows
defined above the point where one AQM first saturates to a roughly equal over the full load range, a different control strategy
probability of 100% leaving no room to push back the load any harder. needs to be defined above the point where the L4S AQM persistently
If k>1, L4S will saturate first, even though saturation could be saturates to an ECN marking probability of 100% leaving no room to
caused by unresponsive traffic in either queue. push back the load any harder. L4S ECN marking will saturate first
(assuming the coupling factor k>1), even though saturation could be
caused by the sum of unresponsive traffic in either or both queues
exceeding the link capacity.
The term 'unresponsive' includes cases where a flow becomes The term 'unresponsive' includes cases where a flow becomes
temporarily unresponsive, for instance, a real-time flow that takes a temporarily unresponsive, for instance, a real-time flow that takes a
while to adapt its rate in response to congestion, or a standard Reno while to adapt its rate in response to congestion, or a standard Reno
flow that is normally responsive, but above a certain congestion flow that is normally responsive, but above a certain congestion
level it will not be able to reduce its congestion window below the level it will not be able to reduce its congestion window below the
allowed minimum of 2 segments [RFC5681], effectively becoming allowed minimum of 2 segments [RFC5681], effectively becoming
unresponsive. (Note that L4S traffic ought to remain responsive unresponsive. (Note that L4S traffic ought to remain responsive
below a window of 2 segments (see [I-D.ietf-tsvwg-ecn-l4s-id]).) below a window of 2 segments (see the L4S
requirements [I-D.ietf-tsvwg-ecn-l4s-id]).)
Saturation raises the question of whether to relieve congestion by Saturation raises the question of whether to relieve congestion by
introducing some drop into the L4S queue or by allowing delay to grow introducing some drop into the L4S queue or by allowing delay to grow
in both queues (which could eventually lead to tail drop too): in both queues (which could eventually lead to drop due to buffer
exhaustion anyway):
Drop on Saturation: Saturation can be avoided by setting a maximum Drop on Saturation: Persistent saturation can be defined by a
threshold for L4S ECN marking (assuming k>1) before saturation maximum threshold for coupled L4S ECN marking (assuming k>1)
starts to make the flow rates of the different traffic types before saturation starts to make the flow rates of the different
diverge. Above that the drop probability of Classic traffic is traffic types diverge. Above that, the drop probability of
applied to all packets of all traffic types. Then experiments Classic traffic is applied to all packets of all traffic types.
have shown that queueing delay can be kept at the target in any Then experiments have shown that queueing delay can be kept at the
overload situation, including with unresponsive traffic, and no target in any overload situation, including with unresponsive
further measures are required [DualQ-Test]. traffic, and no further measures are required (Section 4.2.3.1).
Delay on Saturation: When L4S marking saturates, instead of Delay on Saturation: When L4S marking saturates, instead of
introducing L4S drop, the drop and marking probabilities of both introducing L4S drop, the drop and marking probabilities of both
queues could be capped. Beyond that, delay will grow either queues could be capped. Beyond that, delay will grow either
solely in the queue with unresponsive traffic (if WRR is used), or solely in the queue with unresponsive traffic (if WRR is used), or
in both queues (if time-shifted FIFO is used). In either case, in both queues (if time-shifted FIFO is used). In either case,
the higher delay ought to control temporary high congestion. If the higher delay ought to control temporary high congestion. If
the overload is more persistent, eventually the combined DualQ the overload is more persistent, eventually the combined DualQ
will overflow and tail drop will control congestion. will overflow and tail drop will control congestion.
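
The 'drop on saturation' policy can be sketched for a single L-queue packet as follows. This is a hedged illustration only: it considers just the coupled marking (ignoring any native L4S AQM), it assumes p_C = p'^2 and p_CL = k*p', the threshold name p_l_max is hypothetical, and marking the packets that survive drop during saturation is an assumption rather than something stated in the text above.

```python
import random
from typing import Callable


def l_queue_action(p_base: float, k: float = 2.0, p_l_max: float = 1.0,
                   rng: Callable[[], float] = random.random) -> str:
    """Return "forward", "mark" or "drop" for one ECN-capable L packet."""
    p_cl = k * p_base                  # coupled L4S marking probability
    if p_cl < p_l_max:
        # Normal operation: ECN marking only, no drop.
        return "mark" if rng() < p_cl else "forward"
    # Saturated: apply the Classic drop probability to this packet too.
    p_c = p_base ** 2
    if rng() < p_c:
        return "drop"
    return "mark"                      # assumed: survivors are still marked
```

Passing a deterministic rng (for example, lambda: 0.1) makes the decision repeatable, which is convenient when checking the threshold behaviour.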
skipping to change at page 25, line 38 skipping to change at page 28, line 5
saturation" policy. The DOCSIS specification of a DualQ Coupled saturation" policy. The DOCSIS specification of a DualQ Coupled
AQM [DOCSIS3.1] also implements the 'drop on saturation' policy with AQM [DOCSIS3.1] also implements the 'drop on saturation' policy with
a very shallow L buffer. However, the addition of DOCSIS per-flow a very shallow L buffer. However, the addition of DOCSIS per-flow
Queue Protection [I-D.briscoe-docsis-q-protection] turns this into Queue Protection [I-D.briscoe-docsis-q-protection] turns this into
'delay on saturation' by redirecting some packets of the flow(s) most 'delay on saturation' by redirecting some packets of the flow(s) most
responsible for L queue overload into the C queue, which has a higher responsible for L queue overload into the C queue, which has a higher
delay target. If overload continues, this again becomes 'drop on delay target. If overload continues, this again becomes 'drop on
saturation' as the level of drop in the C queue rises to maintain the saturation' as the level of drop in the C queue rises to maintain the
target delay of the C queue. target delay of the C queue.
4.1.3. Protecting against Unresponsive ECN-Capable Traffic 4.2.3.1. Protecting against Overload by Unresponsive ECN-Capable
Traffic
Unresponsive traffic has a greater advantage if it is also ECN- Without a specific overload mechanism, unresponsive traffic would
capable. The advantage is undetectable at normal low levels of drop/ have a greater advantage if it were also ECN-capable. The advantage
marking, but it becomes significant with the higher levels of drop/ is undetectable at normal low levels of marking. However, it would
marking typical during overload. This is an issue whether the ECN- become significant with the higher levels of marking typical during
capable traffic is L4S or Classic. overload, when it could evade a significant degree of drop. This is
an issue whether the ECN-capable traffic is L4S or Classic.
This raises the question of whether and when to introduce drop of This raises the question of whether and when to introduce drop of
ECN-capable traffic, as required by both Section 7 of [RFC3168] and ECN-capable traffic, as required by both Section 7 of the ECN
Section 4.2.1 of [RFC7567]. spec [RFC3168] and Section 4.2.1 of the AQM
recommendations [RFC7567].
Experiments with the DualPI2 AQM (Appendix A) have shown that As an example, experiments with the DualPI2 AQM (Appendix A) have
introducing 'drop on saturation' at 100% L4S marking addresses this shown that introducing 'drop on saturation' at 100% coupled L4S
problem with unresponsive ECN as well as addressing the saturation marking addresses this problem with unresponsive ECN as well as
problem. It leaves only a small range of congestion levels where addressing the saturation problem. At saturation, DualPI2 switches
unresponsive traffic gains any advantage from using the ECN into overload mode, where the base AQM is driven by the max delay of
capability (relative to being unresponsive without ECN), and the both queues and it introduces probabilistic drop to both queues
advantage is hardly detectable [DualQ-Test]. equally. It leaves only a small range of congestion levels just
below saturation where unresponsive traffic gains any advantage from
using the ECN capability (relative to being unresponsive without
ECN), and the advantage is hardly detectable (see [DualQ-Test] and
section IV-E of [DCttH19]). Also, overload with an unresponsive ECT(1)
flow gets no more bandwidth advantage than with ECT(0).
5. Acknowledgements 5. Acknowledgements
Thanks to Anil Agarwal, Sowmini Varadhan, Gabi Bracha, Nicolas Thanks to Anil Agarwal, Sowmini Varadhan, Gabi Bracha, Nicolas
Kuhn, Greg Skinner, Tom Henderson, David Pullen, Mirja Kuehlewind, Kuhn, Greg Skinner, Tom Henderson, David Pullen, Mirja Kuehlewind,
Gorry Fairhurst, Pete Heist and Ermin Sakic for detailed review Gorry Fairhurst, Pete Heist and Ermin Sakic for detailed review
comments particularly of the appendices and suggestions on how to comments particularly of the appendices and suggestions on how to
make the explanations clearer. Thanks also to Tom Henderson for make the explanations clearer. Thanks also to Tom Henderson for
insights on the choice of schedulers and queue delay measurement insights on the choice of schedulers and queue delay measurement
techniques. techniques.
The early contributions of Koen De Schepper, Bob Briscoe, Olga The early contributions of Koen De Schepper, Bob Briscoe, Olga
Bondarenko and Inton Tsang were part-funded by the European Community Bondarenko and Inton Tsang were part-funded by the European Community
under its Seventh Framework Programme through the Reducing Internet under its Seventh Framework Programme through the Reducing Internet
Transport Latency (RITE) project (ICT-317700). Bob Briscoe's Transport Latency (RITE) project (ICT-317700). Contributions of Koen
contribution was also part-funded by the Comcast Innovation Fund and De Schepper and Olivier Tilmans were also part-funded by the 5Growth
the Research Council of Norway through the TimeIn project. The views and DAEMON EU H2020 projects. Bob Briscoe's contribution was also
expressed here are solely those of the authors. part-funded by the Comcast Innovation Fund and the Research Council
of Norway through the TimeIn project. The views expressed here are
solely those of the authors.
6. Contributors 6. Contributors
The following contributed implementations and evaluations that The following contributed implementations and evaluations that
validated and helped to improve this specification: validated and helped to improve this specification:
Olga Albisser <olga@albisser.org> of Simula Research Lab, Norway Olga Albisser <olga@albisser.org> of Simula Research Lab, Norway
(Olga Bondarenko during early drafts) implemented the prototype (Olga Bondarenko during early drafts) implemented the prototype
DualPI2 AQM for Linux with Koen De Schepper and conducted DualPI2 AQM for Linux with Koen De Schepper and conducted
extensive evaluations as well as implementing the live performance extensive evaluations as well as implementing the live performance
skipping to change at page 27, line 13 skipping to change at page 29, line 38
implementations were tested. implementations were tested.
7. References 7. References
7.1. Normative References 7.1. Normative References
[I-D.ietf-tsvwg-ecn-l4s-id] [I-D.ietf-tsvwg-ecn-l4s-id]
Schepper, K. D. and B. Briscoe, "Explicit Congestion Schepper, K. D. and B. Briscoe, "Explicit Congestion
Notification (ECN) Protocol for Very Low Queuing Delay Notification (ECN) Protocol for Very Low Queuing Delay
(L4S)", Work in Progress, Internet-Draft, draft-ietf- (L4S)", Work in Progress, Internet-Draft, draft-ietf-
tsvwg-ecn-l4s-id-23, 24 December 2021, tsvwg-ecn-l4s-id-24, 1 February 2022,
<https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg- <https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-
ecn-l4s-id-23>. ecn-l4s-id-24>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001, RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>. <https://www.rfc-editor.org/info/rfc3168>.
skipping to change at page 28, line 5 skipping to change at page 30, line 30
Queue- based Active Queue Management Algorithms", Proc. Queue- based Active Queue Management Algorithms", Proc.
Int'l Soc. for Optical Engineering (SPIE) 4866:35--46 DOI: Int'l Soc. for Optical Engineering (SPIE) 4866:35--46 DOI:
10.1117/12.473021, 2002, 10.1117/12.473021, 2002,
<https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>. <https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>.
[ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An
Algorithm for Increasing the Robustness of RED's Active Algorithm for Increasing the Robustness of RED's Active
Queue Management", ACIRI Technical Report, August 2001, Queue Management", ACIRI Technical Report, August 2001,
<http://www.icir.org/floyd/red.html>. <http://www.icir.org/floyd/red.html>.
[BBRv2] Cardwell, N., "TCP BBR v2 Alpha/Preview Release", github
repository; Linux congestion control module,
<https://github.com/google/bbr/blob/v2alpha/README.md>.
[CCcensus19] [CCcensus19]
Mishra, A., Sun, X., Jain, A., Pande, S., Joshi, R., and Mishra, A., Sun, X., Jain, A., Pande, S., Joshi, R., and
B. Leong, "The Great Internet TCP Congestion Control B. Leong, "The Great Internet TCP Congestion Control
Census", Proc. ACM on Measurement and Analysis of Census", Proc. ACM on Measurement and Analysis of
Computing Systems 3(3), December 2019, Computing Systems 3(3), December 2019,
<https://doi.org/10.1145/3366693>. <https://doi.org/10.1145/3366693>.
[CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", [CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay",
ACM Queue 10(5), May 2012, ACM Queue 10(5), May 2012,
<http://queue.acm.org/issuedetail.cfm?issue=2208917>. <http://queue.acm.org/issuedetail.cfm?issue=2208917>.
skipping to change at page 29, line 39 skipping to change at page 32, line 20
Jacobson, "BBR Congestion Control", Work in Progress, Jacobson, "BBR Congestion Control", Work in Progress,
Internet-Draft, draft-cardwell-iccrg-bbr-congestion- Internet-Draft, draft-cardwell-iccrg-bbr-congestion-
control-01, 7 November 2021, control-01, 7 November 2021,
<https://datatracker.ietf.org/doc/html/draft-cardwell- <https://datatracker.ietf.org/doc/html/draft-cardwell-
iccrg-bbr-congestion-control-01>. iccrg-bbr-congestion-control-01>.
[I-D.ietf-tsvwg-l4s-arch] [I-D.ietf-tsvwg-l4s-arch]
Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White, Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White,
"Low Latency, Low Loss, Scalable Throughput (L4S) Internet "Low Latency, Low Loss, Scalable Throughput (L4S) Internet
Service: Architecture", Work in Progress, Internet-Draft, Service: Architecture", Work in Progress, Internet-Draft,
draft-ietf-tsvwg-l4s-arch-15, 24 December 2021, draft-ietf-tsvwg-l4s-arch-16, 1 February 2022,
<https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg- <https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-
l4s-arch-15>. l4s-arch-16>.
[L4Sdemo16] [L4Sdemo16]
Bondarenko, O., De Schepper, K., Tsang, I., and B. Bondarenko, O., De Schepper, K., Tsang, I., and B.
Briscoe, "Ultra-Low Delay for All: Live Experience, Live Briscoe, "Ultra-Low Delay for All: Live Experience, Live
Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016,
<http://dl.acm.org/citation.cfm?doid=2910017.2910633 <http://dl.acm.org/citation.cfm?doid=2910017.2910633
(videos of demos: (videos of demos:
https://riteproject.eu/dctth/#1511dispatchwg )>. https://riteproject.eu/dctth/#1511dispatchwg )>.
[L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Östberg, C., [L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Östberg, C.,
skipping to change at page 32, line 42 skipping to change at page 35, line 25
[RFC8298] Johansson, I. and Z. Sarker, "Self-Clocked Rate Adaptation [RFC8298] Johansson, I. and Z. Sarker, "Self-Clocked Rate Adaptation
for Multimedia", RFC 8298, DOI 10.17487/RFC8298, December for Multimedia", RFC 8298, DOI 10.17487/RFC8298, December
2017, <https://www.rfc-editor.org/info/rfc8298>. 2017, <https://www.rfc-editor.org/info/rfc8298>.
[RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
RFC 8312, DOI 10.17487/RFC8312, February 2018, RFC 8312, DOI 10.17487/RFC8312, February 2018,
<https://www.rfc-editor.org/info/rfc8312>. <https://www.rfc-editor.org/info/rfc8312>.
[RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of
Pervasive Encryption on Operators", RFC 8404,
DOI 10.17487/RFC8404, July 2018,
<https://www.rfc-editor.org/info/rfc8404>.
[SCReAM] Johansson, I., "SCReAM", github repository, [SCReAM] Johansson, I., "SCReAM", github repository,
<https://github.com/EricssonResearch/scream/blob/master/ <https://github.com/EricssonResearch/scream/blob/master/
README.md>. README.md>.
[SigQ-Dyn] Briscoe, B., "Rapid Signalling of Queue Dynamics", [SigQ-Dyn] Briscoe, B., "Rapid Signalling of Queue Dynamics",
Technical Report TR-BB-2017-001 arXiv:1904.07044 [cs.NI], Technical Report TR-BB-2017-001 arXiv:1904.07044 [cs.NI],
September 2017, <https://arxiv.org/abs/1904.07044>. September 2017, <https://arxiv.org/abs/1904.07044>.
Appendix A. Example DualQ Coupled PI2 Algorithm Appendix A. Example DualQ Coupled PI2 Algorithm
skipping to change at page 37, line 22 skipping to change at page 40, line 7
If limit is not exceeded, the packet is timestamped in line 4. This If limit is not exceeded, the packet is timestamped in line 4. This
assumes that queue delay is measured using the sojourn time technique assumes that queue delay is measured using the sojourn time technique
(see Note a for alternatives). (see Note a for alternatives).
At lines 5-9, the packet is classified and enqueued to the Classic or At lines 5-9, the packet is classified and enqueued to the Classic or
L4S queue dependent on the least significant bit of the ECN field in L4S queue dependent on the least significant bit of the ECN field in
the IP header (line 6). Packets with a codepoint having an LSB of 0 the IP header (line 6). Packets with a codepoint having an LSB of 0
(Not-ECT and ECT(0)) will be enqueued in the Classic queue. (Not-ECT and ECT(0)) will be enqueued in the Classic queue.
Otherwise, ECT(1) and CE packets will be enqueued in the L4S queue. Otherwise, ECT(1) and CE packets will be enqueued in the L4S queue.
Optional additional packet classification flexibility is omitted for Optional additional packet classification flexibility is omitted for
brevity (see [I-D.ietf-tsvwg-ecn-l4s-id]). brevity (see the L4S ECN protocol [I-D.ietf-tsvwg-ecn-l4s-id]).
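As an illustration only (not part of the draft's pseudocode), the classification rule described above can be sketched in Python. The function name classify() and the queue labels 'L' and 'C' are hypothetical; the two-bit codepoint values are those defined for the ECN field in [RFC3168].

```python
# Illustrative sketch: classify a packet by the least significant bit
# of its 2-bit ECN field. Names here are hypothetical, not from the draft.

# ECN codepoints as defined in RFC 3168
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def classify(ecn_field: int) -> str:
    """Return 'L' for the L4S queue or 'C' for the Classic queue."""
    if ecn_field & 0b01:      # LSB set: ECT(1) or CE
        return "L"
    return "C"                # LSB clear: Not-ECT or ECT(0)


for name, codepoint in [("Not-ECT", NOT_ECT), ("ECT(0)", ECT_0),
                        ("ECT(1)", ECT_1), ("CE", CE)]:
    print(name, "->", classify(codepoint))
```

The sketch deliberately omits the optional additional classifiers mentioned above, so it only covers the default ECN-based rule.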
The dequeue pseudocode (Figure 4) is repeatedly called whenever the The dequeue pseudocode (Figure 4) is repeatedly called whenever the
lower layer is ready to forward a packet. It schedules one packet lower layer is ready to forward a packet. It schedules one packet
for dequeuing (or zero if the queue is empty) then returns control to for dequeuing (or zero if the queue is empty) then returns control to
the caller, so that it does not block while that packet is being the caller, so that it does not block while that packet is being
forwarded. While making this dequeue decision, it also makes the forwarded. While making this dequeue decision, it also makes the
necessary AQM decisions on dropping or marking. The alternative of necessary AQM decisions on dropping or marking. The alternative of
applying the AQMs at enqueue would shift some processing from the applying the AQMs at enqueue would shift some processing from the
critical time when each packet is dequeued. However, it would also critical time when each packet is dequeued. However, it would also
add a whole queue of delay to the control signals, making the control add a whole queue of delay to the control signals, making the control
skipping to change at page 39, line 50 skipping to change at page 42, line 39
Integral (PI) controller that alters p' dependent on: a) the error Integral (PI) controller that alters p' dependent on: a) the error
between the current queuing delay (curq) and the target queuing between the current queuing delay (curq) and the target queuing
delay, 'target'; and b) the change in queuing delay since the last delay, 'target'; and b) the change in queuing delay since the last
sample. The name 'PI' represents the fact that the second factor sample. The name 'PI' represents the fact that the second factor
(how fast the queue is growing) is _P_roportional to load while the (how fast the queue is growing) is _P_roportional to load while the
first is the _I_ntegral of the load (so it removes any standing queue first is the _I_ntegral of the load (so it removes any standing queue
in excess of the target). in excess of the target).
The target parameter can be set based on local knowledge, but the aim The target parameter can be set based on local knowledge, but the aim
is for the default to be a good compromise for anywhere in the is for the default to be a good compromise for anywhere in the
intended deployment environment---the public Internet. According to intended deployment environment -- the public Internet. According to
[PI2param], the target queuing delay on line 9 of Figure 2 is related [PI2param], the target queuing delay on line 9 of Figure 2 is related
to the typical base RTT worldwide, RTT_typ, by two factors: target = to the typical base RTT worldwide, RTT_typ, by two factors: target =
RTT_typ * g * f. Below we summarize the rationale behind these RTT_typ * g * f. Below we summarize the rationale behind these
factors and introduce a further adjustment. The two factors ensure factors and introduce a further adjustment. The two factors ensure
that, in a large proportion of cases (say 90%), the sawtooth that, in a large proportion of cases (say 90%), the sawtooth
variations in RTT of a single flow will fit within the buffer without variations in RTT of a single flow will fit within the buffer without
underutilizing the link. Frankly, these factors are educated underutilizing the link. Frankly, these factors are educated
guesses, but with the emphasis closer to 'educated' than to 'guess' guesses, but with the emphasis closer to 'educated' than to 'guess'
(see [PI2param] for full background): (see [PI2param] for full background):
skipping to change at page 40, line 32 skipping to change at page 43, line 20
significant outlier and, on reflection, the experimental technique significant outlier and, on reflection, the experimental technique
seemed inappropriate to the CDN market in China. seemed inappropriate to the CDN market in China.
* g is taken as 0.38. The factor g is a geometry factor that * g is taken as 0.38. The factor g is a geometry factor that
characterizes the shape of the sawteeth of prevalent Classic characterizes the shape of the sawteeth of prevalent Classic
congestion controllers. The geometry factor is the fraction of congestion controllers. The geometry factor is the fraction of
the amplitude of the sawtooth variability in queue delay that lies the amplitude of the sawtooth variability in queue delay that lies
below the AQM's target. For instance, at low bit rate, the below the AQM's target. For instance, at low bit rate, the
geometry factor of standard Reno is 0.5, but at higher rates it geometry factor of standard Reno is 0.5, but at higher rates it
tends to just under 1. According to the census of congestion tends to just under 1. According to the census of congestion
controllers conducted by Mishra _et al_ in Jul-Oct 2019 controllers conducted by Mishra _et al_ in Jul-Oct
[CCcensus19], most Classic TCP traffic uses Cubic. And, according 2019 [CCcensus19], most Classic TCP traffic uses Cubic. And,
to the analysis in [PI2param], if running over a PI2 AQM, a large according to the analysis in [PI2param], if running over a PI2
proportion of this Cubic traffic would be in its Reno-Friendly AQM, a large proportion of this Cubic traffic would be in its
mode, which has a geometry factor of ~0.39 (all known Reno-Friendly mode, which has a geometry factor of ~0.39 (all
implementations). The rest of the Cubic traffic would be in true known implementations). The rest of the Cubic traffic would be in
Cubic mode, which has a geometry factor of ~0.36. Without true Cubic mode, which has a geometry factor of ~0.36. Without
modelling the sawtooth profiles from all the other less prevalent modelling the sawtooth profiles from all the other less prevalent
congestion controllers, we estimate a 7:3 weighted average of congestion controllers, we estimate a 7:3 weighted average of
these two, resulting in an average geometry factor of 0.38. these two, resulting in an average geometry factor of 0.38.
* f is taken as 2. The factor f is a safety factor that increases * f is taken as 2. The factor f is a safety factor that increases
the target queue to allow for the distribution of RTT_typ around the target queue to allow for the distribution of RTT_typ around
its mean. Otherwise the target queue would only avoid its mean. Otherwise the target queue would only avoid
underutilization for those users below the mean. It also provides underutilization for those users below the mean. It also provides
a safety margin for the proportion of paths in use that span a safety margin for the proportion of paths in use that span
beyond the distance between a user and their local CDN. Currently beyond the distance between a user and their local CDN. Currently
skipping to change at page 41, line 41 skipping to change at page 44, line 29
alpha depends on Tupdate (see line 13 of the initialization function alpha depends on Tupdate (see line 13 of the initialization function
in Figure 2). It is best to update p' as frequently as possible, but in Figure 2). It is best to update p' as frequently as possible, but
Tupdate will probably be constrained by hardware performance. As Tupdate will probably be constrained by hardware performance. As
shown in line 13, the update interval should be frequent enough to shown in line 13, the update interval should be frequent enough to
update at least once in the time taken for the target queue to drain update at least once in the time taken for the target queue to drain
('target') as long as it updates at least three times per maximum ('target') as long as it updates at least three times per maximum
RTT. Tupdate defaults to 16 ms in the reference Linux implementation RTT. Tupdate defaults to 16 ms in the reference Linux implementation
because it has to be rounded to a multiple of 4 ms. For link rates because it has to be rounded to a multiple of 4 ms. For link rates
from 4 to 200 Mb/s and a maximum RTT of 100ms, it has been verified from 4 to 200 Mb/s and a maximum RTT of 100ms, it has been verified
through extensive testing that Tupdate=16ms (as also recommended in through extensive testing that Tupdate=16ms (as also recommended in
[RFC8033]) is sufficient. the PIE spec [RFC8033]) is sufficient.
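The arithmetic behind the 16 ms default can be sketched as follows. This is only one reading of the sentence above: it assumes the rule amounts to taking the smaller of 'target' and one third of RTT_max, and that rounding is to the nearest multiple of 4 ms. The function name tupdate_ms() and its parameters are hypothetical, and the 15 ms target is the default Classic target quoted elsewhere in this appendix.

```python
# Illustrative sketch, assuming Tupdate = min(target, RTT_max / 3)
# followed by rounding to the nearest multiple of a timer granularity.

def tupdate_ms(target_ms: float, rtt_max_ms: float,
               granularity_ms: float = 4.0) -> float:
    """Return an update interval in ms, rounded to the timer granularity."""
    raw = min(target_ms, rtt_max_ms / 3.0)       # at least once per target,
                                                 # at least 3 times per RTT_max
    steps = max(1, round(raw / granularity_ms))  # never round down to zero
    return steps * granularity_ms


print(tupdate_ms(15.0, 100.0))  # 16.0
```

With a 15 ms target and a 100 ms maximum RTT, the unrounded value is 15 ms, which rounds to the 16 ms default quoted above.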
The choice of alpha and beta also determines the AQM's stable The choice of alpha and beta also determines the AQM's stable
operating range. The AQM ought to change p' as fast as possible in operating range. The AQM ought to change p' as fast as possible in
response to changes in load without over-compensating and therefore response to changes in load without over-compensating and therefore
causing oscillations in the queue. Therefore, the values of alpha causing oscillations in the queue. Therefore, the values of alpha
and beta also depend on the RTT of the expected worst-case flow and beta also depend on the RTT of the expected worst-case flow
(RTT_max). (RTT_max).
The maximum RTT of a PI controller (RTT_max in line 10 of Figure 2) The maximum RTT of a PI controller (RTT_max in line 10 of Figure 2)
is not an absolute maximum, but more instability (more queue is not an absolute maximum, but more instability (more queue
skipping to change at page 43, line 32 skipping to change at page 46, line 17
a. The drain rate of the queue can vary if it is scheduled relative a. The drain rate of the queue can vary if it is scheduled relative
to other queues, or to cater for fluctuations in a wireless to other queues, or to cater for fluctuations in a wireless
medium. To auto-adjust to changes in drain rate, the queue needs medium. To auto-adjust to changes in drain rate, the queue needs
to be measured in time, not bytes or packets [AQMmetrics], to be measured in time, not bytes or packets [AQMmetrics],
[CoDel]. Queuing delay could be measured directly by storing a [CoDel]. Queuing delay could be measured directly by storing a
per-packet time-stamp as each packet is enqueued, and subtracting per-packet time-stamp as each packet is enqueued, and subtracting
this from the system time when the packet is dequeued. If time- this from the system time when the packet is dequeued. If time-
stamping is not easy to introduce with certain hardware, queuing stamping is not easy to introduce with certain hardware, queuing
delay could be predicted indirectly by dividing the size of the delay could be predicted indirectly by dividing the size of the
queue by the predicted departure rate, which might be known queue by the predicted departure rate, which might be known
precisely for some link technologies (see for example [RFC8034]). precisely for some link technologies (see, for example, DOCSIS
PIE [RFC8034]).
b. Line 2 of the dualpi2_enqueue() function (Figure 3) assumes an b. Line 2 of the dualpi2_enqueue() function (Figure 3) assumes an
implementation where lq and cq share common buffer memory. An implementation where lq and cq share common buffer memory. An
alternative implementation could use separate buffers for each alternative implementation could use separate buffers for each
queue, in which case the arriving packet would have to be queue, in which case the arriving packet would have to be
classified first to determine which buffer to check for available classified first to determine which buffer to check for available
space. The choice is a trade-off; a shared buffer can use less space. The choice is a trade-off; a shared buffer can use less
memory whereas separate buffers isolate the L4S queue from tail- memory whereas separate buffers isolate the L4S queue from tail-
drop due to large bursts of Classic traffic (e.g. a Classic Reno drop due to large bursts of Classic traffic (e.g. a Classic Reno
TCP during slow-start over a long RTT). TCP during slow-start over a long RTT).
skipping to change at page 45, line 38 skipping to change at page 48, line 18
coexists with 'Classic' Reno congestion control. So it is correct coexists with 'Classic' Reno congestion control. So it is correct
that, when the L4S queue drops packets, it drops them proportional to that, when the L4S queue drops packets, it drops them proportional to
p'^2, as if they are Classic packets. p'^2, as if they are Classic packets.
The two queues each test for overload in lines 4b and 12b of the The two queues each test for overload in lines 4b and 12b of the
dequeue function (Figure 7). Lines 8c to 8g drop L4S packets with dequeue function (Figure 7). Lines 8c to 8g drop L4S packets with
probability p'^2. Lines 8h to 8i mark the remaining packets with probability p'^2. Lines 8h to 8i mark the remaining packets with
probability p_CL. Given p_Lmax = 1, all remaining packets will be probability p_CL. Given p_Lmax = 1, all remaining packets will be
marked because, to have reached the else block at line 8b, p_CL >= 1. marked because, to have reached the else block at line 8b, p_CL >= 1.
Lines 2c to 2d in the core PI algorithm (Figure 8) deal with overload Line 2a in the core PI algorithm (Figure 8) deals with overload of
of the L4S queue when there is no Classic traffic. This is the L4S queue when there is little or no Classic traffic. This is
necessary, because the core PI algorithm maintains the appropriate necessary, because the core PI algorithm maintains the appropriate
drop probability to regulate overload, but it depends on the length drop probability to regulate overload, but it depends on the length
of the Classic queue. If there is no Classic queue the naive PI of the Classic queue. If there is little or no Classic queue the
update function in Figure 6 would drop nothing, even if the L4S queue naive PI update function in Figure 6 would drop nothing, even if the
were overloaded - so tail drop would have to take over (lines 2 and 3 L4S queue were overloaded - so tail drop would have to take over
of Figure 3). (lines 2 and 3 of Figure 3).
Instead, the test at line 2a of the full PI update function in Instead, line 2a of the full PI update function in Figure 8 ensures
Figure 8 keeps delay on target using drop. If the test at line 2a of that the base PI AQM in line 3 is driven by whichever of the two
Figure 8 finds that the Classic queue is empty, line 2d measures the queue delays is greater, but line 3 still always uses the same
current queue delay using the L4S queue instead. While the L4S queue Classic target (default 15 ms). If L queue delay is greater just
is not overloaded, its delay will always be tiny compared to the because there is little or no Classic traffic, normally it will still
target Classic queue delay. So p_CL will be driven to zero, and the be well below the base AQM target. This is because L4S traffic is
L4S queue will naturally be governed solely by p'_L from the native also governed by the shallow threshold of its own native AQM (lines 5
L4S AQM (lines 5 and 6 of the dequeue algorithm in Figure 7). But, and 6 of the dequeue algorithm in Figure 7). So the base AQM will be
if unresponsive L4S source(s) cause overload, the DualQ transitions driven to zero and not contribute. However, if the L queue is
smoothly to L4S marking based on the PI algorithm. If overload overloaded by traffic that is unresponsive to its marking, the max()
increases further, it naturally transitions from marking to dropping in line 2a enables the L queue to smoothly take over driving the base
by the mechanism already described. AQM into overload mode even if there is little or no Classic traffic.
Then the base AQM will keep the L queue to the Classic target
(default 15 ms) by shedding L packets.
1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues 1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues
2: while ( lq.byt() + cq.byt() > 0 ) { 2: while ( lq.byt() + cq.byt() > 0 ) {
3: if ( scheduler() == lq ) { 3: if ( scheduler() == lq ) {
4a: lq.dequeue(pkt) % L4S scheduled 4a: lq.dequeue(pkt) % L4S scheduled
4b: if ( p_CL < p_Lmax ) { % Check for overload saturation 4b: if ( p_CL < p_Lmax ) { % Check for overload saturation
5a: if (lq.len()>Th_len) % >1 packet queued 5a: if (lq.len()>Th_len) % >1 packet queued
5b: p'_L = laqm(lq.time()) % Native LAQM 5b: p'_L = laqm(lq.time()) % Native LAQM
5c: else 5c: else
5d: p'_L = 0 % Suppress marking 1 pkt queue 5d: p'_L = 0 % Suppress marking 1 pkt queue
skipping to change at page 47, line 6 skipping to change at page 49, line 45
18: } 18: }
19: return(pkt) % return the packet and stop 19: return(pkt) % return the packet and stop
20: } 20: }
21: return(NULL) % no packet to dequeue 21: return(NULL) % no packet to dequeue
22: } 22: }
Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM
(Including Code for Edge-Cases) (Including Code for Edge-Cases)
1: dualpi2_update(lq, cq) { % Update p' every Tupdate 1: dualpi2_update(lq, cq) { % Update p' every Tupdate
2a: if ( cq.byt() > 0 ) 2a: curq = max(cq.time(), lq.time()) % use greatest queuing time
2b: curq = cq.time() %use queuing time of first-in Classic packet
2c: else % Classic queue empty
2d: curq = lq.time() % use queuing time of first-in L4S packet
3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) 3: p' = p' + alpha * (curq - target) + beta * (curq - prevq)
4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor 4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor
5: p_C = p'^2 % Classic prob = (base prob)^2 5: p_C = p'^2 % Classic prob = (base prob)^2
6: prevq = curq 6: prevq = curq
7: } 7: }
Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM
(Including Overload Code) (Including Overload Code)
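As an illustration only, the newer (right-hand) form of the update in Figure 8 can be sketched in Python. The class name, the field names, the gain values for alpha and beta, and the clamping of p' to [0, 1] are assumptions made for this sketch; only the max() of the two queuing times, the PI update, the coupling by k and the squaring come from the pseudocode above.

```python
from dataclasses import dataclass


@dataclass
class DualPI2State:
    """State carried between PI updates (all times in seconds)."""
    target: float = 0.015   # queue delay target, 15 ms as quoted in the text
    alpha: float = 0.16     # integral gain: illustrative assumption
    beta: float = 3.2       # proportional gain: illustrative assumption
    k: float = 2.0          # coupling factor
    p_base: float = 0.0     # p' in the pseudocode
    prevq: float = 0.0      # queuing time seen at the previous update
    p_CL: float = 0.0       # coupled L4S marking probability
    p_C: float = 0.0        # Classic drop/mark probability


def dualpi2_update(s: DualPI2State, cq_time: float, lq_time: float) -> DualPI2State:
    """One PI update, intended to be called every Tupdate."""
    # Line 2a (new version): use the greater of the two queuing times.
    curq = max(cq_time, lq_time)
    # Line 3: PI controller on the base probability p'.
    s.p_base += s.alpha * (curq - s.target) + s.beta * (curq - s.prevq)
    # Assumption: bound p' to a valid probability (not shown in Figure 8).
    s.p_base = min(max(s.p_base, 0.0), 1.0)
    # Lines 4 and 5: derive the coupled and Classic probabilities.
    s.p_CL = s.p_base * s.k
    s.p_C = s.p_base ** 2
    # Line 6: remember the queuing time for the next update.
    s.prevq = curq
    return s
```

Because curq is the maximum of the two queuing times, a standing L queue drives p' even while the Classic queue still holds traffic, which the older form of lines 2a to 2d would not do.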
The choice of scheduler technology is critical to overload protection The choice of scheduler technology is critical to overload protection
(see Section 4.1). (see Section 4.2.2).
* A well-understood weighted scheduler such as weighted round robin * A well-understood weighted scheduler such as weighted round robin
(WRR) is recommended. As long as the scheduler weight for Classic (WRR) is recommended. As long as the scheduler weight for Classic
is small (e.g. 1/16), its exact value is unimportant because it is small (e.g. 1/16), its exact value is unimportant because it
does not normally determine capacity shares. The weight is only does not normally determine capacity shares. The weight is only
important to prevent unresponsive L4S traffic starving Classic important to prevent unresponsive L4S traffic starving Classic
traffic. This is because capacity sharing between the queues is traffic in the short term (see Section 4.2.2). This is because
normally determined by the coupled congestion signal, which capacity sharing between the queues is normally determined by the
overrides the scheduler, by making L4S sources leave roughly equal coupled congestion signal, which overrides the scheduler, by
per-flow capacity available for Classic flows. making L4S sources leave roughly equal per-flow capacity available
for Classic flows.
* Alternatively, a time-shifted FIFO (TS-FIFO) could be used. It * Alternatively, a time-shifted FIFO (TS-FIFO) could be used. It
works by selecting the head packet that has waited the longest, works by selecting the head packet that has waited the longest,
biased against the Classic traffic by a time-shift of tshift. To biased against the Classic traffic by a time-shift of tshift. To
implement time-shifted FIFO, the scheduler() function in line 3 of implement time-shifted FIFO, the scheduler() function in line 3 of
the dequeue code would simply be implemented as the scheduler() the dequeue code would simply be implemented as the scheduler()
function at the bottom of Figure 10 in Appendix B. For the public function at the bottom of Figure 10 in Appendix B. For the public
Internet a good value for tshift is 50ms. For private networks Internet a good value for tshift is 50ms. For private networks
with smaller diameter, about 4*target would be reasonable. TS- with smaller diameter, about 4*target would be reasonable. TS-
FIFO is a very simple scheduler, but complexity might need to be FIFO is a very simple scheduler, but complexity might need to be
skipping to change at page 48, line 14 skipping to change at page 51, line 14
- Even if time-stamping is supported, the sojourn time of the - Even if time-stamping is supported, the sojourn time of the
head packet is always stale. For instance, if a burst arrives head packet is always stale. For instance, if a burst arrives
at an empty queue, the sojourn time only fully measures the at an empty queue, the sojourn time only fully measures the
burst's delay when its last packet is dequeued, even though the burst's delay when its last packet is dequeued, even though the
queue knew about the burst from the start - so it could have queue knew about the burst from the start - so it could have
signalled congestion earlier. To remedy this, each head packet signalled congestion earlier. To remedy this, each head packet
can be marked when it is dequeued based on the expected delay can be marked when it is dequeued based on the expected delay
of the tail packet behind it, as explained below, rather than of the tail packet behind it, as explained below, rather than
based on the head packet's own delay due to the packets in based on the head packet's own delay due to the packets in
front of it. [Heist21] identifies a specific scenario where front of it. [Heist21] identifies a specific scenario where
bursty traffic significantly hits utilization of the L queue. bursty traffic significantly hits utilization of the L queue.
If this effect proves to be more widely applicable, it is If this effect proves to be more widely applicable, it is
believed that using the delay behind the head would improve believed that using the delay behind the head would improve
performance. performance.
The delay behind the head can be implemented by dividing the The delay behind the head can be implemented by dividing the
backlog at dequeue by the link rate or equivalently multiplying backlog at dequeue by the link rate or equivalently multiplying
the backlog by the delay per unit of backlog. The the backlog by the delay per unit of backlog. The
implementation details will depend on whether the link rate is implementation details will depend on whether the link rate is
known; if it is not, a moving average of the delay per unit known; if it is not, a moving average of the delay per unit
backlog can be maintained. This delay consists of backlog can be maintained. This delay consists of
serialization as well as media acquisition for shared media. serialization as well as media acquisition for shared media.
So the details will depend strongly on the specific link So the details will depend strongly on the specific link
technology. This approach should be less sensitive to timing technology. This approach should be less sensitive to timing
errors and cost less in operations and memory than the errors and cost less in operations and memory than the
otherwise equivalent 'scaled sojourn time' metric, which is the otherwise equivalent 'scaled sojourn time' metric, which is the
sojourn time of a packet scaled by the ratio of the queue sizes sojourn time of a packet scaled by the ratio of the queue sizes
when the packet departed and arrived [SigQ-Dyn]. when the packet departed and arrived [SigQ-Dyn].
* A strict priority scheduler would be inappropriate, because it * A strict priority scheduler would be inappropriate as discussed in
would starve Classic if L4S was overloaded. Section 4.2.2.
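The 'delay behind the head' measure described in the TS-FIFO sub-bullet above could be sketched as follows. This is a minimal Python sketch, not part of the specification: the class and method names are hypothetical, and the use of an exponentially weighted moving average (with a gain of 1/8) for the 'moving average of the delay per unit backlog' is an assumption, since the text does not say which kind of average to use.

```python
class DelayBehindHead:
    """Estimate the queuing delay of the tail packet from the backlog at dequeue."""

    def __init__(self, link_rate_bps=None, gain=0.125):
        self.link_rate_bps = link_rate_bps  # None if the link rate is unknown
        self.gain = gain                    # EWMA gain: illustrative assumption
        self.delay_per_byte = None          # learned seconds per byte of backlog

    def observe_service(self, bytes_sent, elapsed_s):
        """Feed one measurement: serving bytes_sent bytes took elapsed_s seconds.

        On shared media elapsed_s would include media acquisition time
        as well as serialization.
        """
        if bytes_sent <= 0:
            return
        sample = elapsed_s / bytes_sent
        if self.delay_per_byte is None:
            self.delay_per_byte = sample
        else:
            self.delay_per_byte += self.gain * (sample - self.delay_per_byte)

    def estimate(self, backlog_bytes):
        """Return the expected delay (seconds) of the packet at the tail."""
        if self.link_rate_bps is not None:
            # Known link rate: divide the backlog by the rate.
            return backlog_bytes * 8 / self.link_rate_bps
        if self.delay_per_byte is None:
            return 0.0  # nothing learned yet
        # Unknown link rate: multiply by the averaged delay per unit backlog.
        return backlog_bytes * self.delay_per_byte
```

In a dequeue routine, the value returned by estimate() for the backlog left behind the head packet would take the place of the head packet's own sojourn time as the input to the AQM.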
Appendix B. Example DualQ Coupled Curvy RED Algorithm Appendix B. Example DualQ Coupled Curvy RED Algorithm
As another example of a DualQ Coupled AQM algorithm, the pseudocode As another example of a DualQ Coupled AQM algorithm, the pseudocode
below gives the Curvy RED based algorithm. Although the AQM was below gives the Curvy RED based algorithm. Although the AQM was
designed to be efficient in integer arithmetic, to aid understanding designed to be efficient in integer arithmetic, to aid understanding
it is first given using floating point arithmetic (Figure 10). Then, it is first given using floating point arithmetic (Figure 10). Then,
one possible optimization for integer arithmetic is given, also in one possible optimization for integer arithmetic is given, also in
pseudocode (Figure 11). To aid comparison, the line numbers are kept pseudocode (Figure 11). To aid comparison, the line numbers are kept
in step between the two by using letter suffixes where the longer in step between the two by using letter suffixes where the longer
skipping to change at page 56, line 38 skipping to change at page 59, line 38
flow rates depend not only on the congestion probability, but also on flow rates depend not only on the congestion probability, but also on
their end-to-end RTT (= base RTT + queue delay). The rates of their end-to-end RTT (= base RTT + queue delay). The rates of
Reno [RFC5681] flows competing over an AQM are roughly inversely Reno [RFC5681] flows competing over an AQM are roughly inversely
proportional to their RTTs. Cubic exhibits similar RTT-dependence proportional to their RTTs. Cubic exhibits similar RTT-dependence
when in Reno-compatibility mode, but it is less RTT-dependent when in Reno-compatibility mode, but it is less RTT-dependent
otherwise. otherwise.
Until the early experiments with the DualQ Coupled AQM, the Until the early experiments with the DualQ Coupled AQM, the
importance of the reasonably large Classic queue in mitigating RTT- importance of the reasonably large Classic queue in mitigating RTT-
dependence when the base RTT is low had not been appreciated. dependence when the base RTT is low had not been appreciated.
Appendix A.1.6 of [I-D.ietf-tsvwg-ecn-l4s-id] uses numerical examples Appendix A.1.6 of the L4S ECN protocol [I-D.ietf-tsvwg-ecn-l4s-id]
to explain why bloated buffers had concealed the RTT-dependence of uses numerical examples to explain why bloated buffers had concealed
Classic congestion controls before that time. Then it explains why, the RTT-dependence of Classic congestion controls before that time.
the more that queuing delays have reduced, the more that RTT- Then it explains why, the more that queuing delays have reduced, the
dependence has surfaced as a potential starvation problem for long more that RTT-dependence has surfaced as a potential starvation
RTT flows, when competing against very short RTT flows. problem for long RTT flows, when competing against very short RTT
flows.
Given that congestion control on end-systems is voluntary, there is Given that congestion control on end-systems is voluntary, there is
no reason why it has to be voluntarily RTT-dependent. The RTT- no reason why it has to be voluntarily RTT-dependent. The RTT-
dependence of existing Classic traffic cannot be 'undeployed'. dependence of existing Classic traffic cannot be 'undeployed'.
Therefore, [I-D.ietf-tsvwg-ecn-l4s-id] requires L4S congestion Therefore, [I-D.ietf-tsvwg-ecn-l4s-id] requires L4S congestion
controls to be significantly less RTT-dependent than the standard controls to be significantly less RTT-dependent than the standard
Reno congestion control [RFC5681], at least at low RTT. Then RTT- Reno congestion control [RFC5681], at least at low RTT. Then RTT-
dependence ought to be no worse than it is with appropriately sized dependence ought to be no worse than it is with appropriately sized
Classic buffers. Following this approach means there is no need for Classic buffers. Following this approach means there is no need for
network devices to address RTT-dependence, although there would be no network devices to address RTT-dependence, although there would be no
skipping to change at page 58, line 5 skipping to change at page 61, line 7
On the Classic side, we consider Reno as the most sensitive and On the Classic side, we consider Reno as the most sensitive and
therefore worst-case Classic congestion control. We will also therefore worst-case Classic congestion control. We will also
consider Cubic in its Reno-friendly mode ('CReno'), as the most consider Cubic in its Reno-friendly mode ('CReno'), as the most
prevalent congestion control, according to the references and prevalent congestion control, according to the references and
analysis in [PI2param]. In either case, the Classic packet rate in analysis in [PI2param]. In either case, the Classic packet rate in
steady state is given by the well-known square root formula for Reno steady state is given by the well-known square root formula for Reno
congestion control: congestion control:
r_C = 1.22 / (R_C * p_C^0.5) (5) r_C = 1.22 / (R_C * p_C^0.5) (5)
On the L4S side, we consider the Prague congestion control On the L4S side, we consider the Prague congestion
[I-D.briscoe-iccrg-prague-congestion-control] as the reference for control [I-D.briscoe-iccrg-prague-congestion-control] as the
steady-state dependence on congestion. Prague conforms to the same reference for steady-state dependence on congestion. Prague conforms
equation as DCTCP, but we do not use the equation derived in the to the same equation as DCTCP, but we do not use the equation derived
DCTCP paper, which is only appropriate for step marking. The coupled in the DCTCP paper, which is only appropriate for step marking. The
marking, p_CL, is the appropriate one when considering throughput coupled marking, p_CL, is the appropriate one when considering
equivalence with Classic flows. Unlike step marking, coupled throughput equivalence with Classic flows. Unlike step marking,
markings are inherently spaced out, so we use the formula for DCTCP coupled markings are inherently spaced out, so we use the formula for
packet rate with probabilistic marking derived in Appendix A of DCTCP packet rate with probabilistic marking derived in Appendix A of
[PI2]. We use the equation without RTT-independence enabled, which [PI2]. We use the equation without RTT-independence enabled, which
will be explained later. will be explained later.
r_L = 2 / (R_L * p_CL) (6) r_L = 2 / (R_L * p_CL) (6)
For packet rate equivalence, we equate the two packet rates and For packet rate equivalence, we equate the two packet rates and
rearrange into the same form as Equation (1), so the two can be rearrange into the same form as Equation (1), so the two can be
equated and simplified to produce a formula for a theoretical equated and simplified to produce a formula for a theoretical
coupling factor, which we shall call k*: coupling factor, which we shall call k*:
skipping to change at page 58, line 47 skipping to change at page 61, line 49
RTT-dependence is caused by window-based congestion control, so it RTT-dependence is caused by window-based congestion control, so it
ought to be reversed there, not in the network. Therefore, we use a ought to be reversed there, not in the network. Therefore, we use a
fixed coupling factor in the network, and reduce RTT-dependence in fixed coupling factor in the network, and reduce RTT-dependence in
L4S senders. We cannot expect Classic senders to all be updated to L4S senders. We cannot expect Classic senders to all be updated to
reduce their RTT-dependence. But solely addressing the problem in reduce their RTT-dependence. But solely addressing the problem in
L4S senders at least makes RTT-dependence no worse - not just between L4S senders at least makes RTT-dependence no worse - not just between
L4S senders, but also between L4S and Classic senders. L4S senders, but also between L4S and Classic senders.
Traditionally, throughput equivalence has been defined for flows Traditionally, throughput equivalence has been defined for flows
under comparable conditions, including with the same base RTT under comparable conditions, including with the same base
[RFC2914]. So if we assume the same base RTT, R_b, for comparable RTT [RFC2914]. So if we assume the same base RTT, R_b, for
flows, we can put both R_C and R_L in terms of R_b. comparable flows, we can put both R_C and R_L in terms of R_b.
We can approximate the L4S RTT to be hardly greater than the base We can approximate the L4S RTT to be hardly greater than the base
RTT, i.e. R_L ~= R_b. And we can replace R_C with (R_b + q_C), where RTT, i.e. R_L ~= R_b. And we can replace R_C with (R_b + q_C), where
the Classic queue, q_C, depends on the target queue delay that the the Classic queue, q_C, depends on the target queue delay that the
operator has configured for the Classic AQM. operator has configured for the Classic AQM.
Taking PI2 as an example Classic AQM, it seems that we could just Taking PI2 as an example Classic AQM, it seems that we could just
take R_C = R_b + target (recommended 15 ms by default in take R_C = R_b + target (recommended 15 ms by default in
Appendix A.1). However, target is roughly the queue depth reached by Appendix A.1). However, target is roughly the queue depth reached by
the tips of the sawteeth of a congestion control, not the average the tips of the sawteeth of a congestion control, not the average
skipping to change at page 60, line 19 skipping to change at page 63, line 19
Internet: Internet:
r_L / r_C = 2 (R_C * p_C^0.5) / 1.22 (R_L * p_CL) r_L / r_C = 2 (R_C * p_C^0.5) / 1.22 (R_L * p_CL)
= (R_C * p_CL) / (1.22 * R_L * p_CL) = (R_C * p_CL) / (1.22 * R_L * p_CL)
= R_C / (1.22 * R_L) (10) = R_C / (1.22 * R_L) (10)
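The cancellation in (10) can be checked numerically. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes the coupling p_CL = k * p' with p_C = p'^2 from Figure 8, together with the coupling factor k = 2 that the second line of the derivation implies.

```python
import math


def classic_rate(R_C, p_C):
    """Equation (5): steady-state Reno packet rate for RTT R_C (seconds)."""
    return 1.22 / (R_C * math.sqrt(p_C))


def l4s_rate(R_L, p_CL):
    """Equation (6): steady-state packet rate under probabilistic marking."""
    return 2.0 / (R_L * p_CL)


def rate_ratio(R_C, R_L, p_base, k=2.0):
    """r_L / r_C when both probabilities derive from the same base p'."""
    p_C = p_base ** 2    # Classic probability
    p_CL = k * p_base    # coupled L4S probability
    return l4s_rate(R_L, p_CL) / classic_rate(R_C, p_C)


if __name__ == "__main__":
    R_C, R_L = 0.030, 0.015  # example RTTs in seconds (arbitrary choice)
    for p_base in (0.01, 0.1, 0.3):
        # The ratio should not depend on p_base, only on the two RTTs.
        print(p_base, rate_ratio(R_C, R_L, p_base), R_C / (1.22 * R_L))
```

With k = 2, the ratio depends only on the two RTTs, whatever the level of congestion, which is what (10) states.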
As an example, we can then consider single competing CReno and Prague As an example, we can then consider single competing CReno and Prague
flows, by expressing both their RTTs in (10) in terms of their base flows, by expressing both their RTTs in (10) in terms of their base
RTTs, R_bC and R_bL. So R_C is replaced by equation (8) for CReno. RTTs, R_bC and R_bL. So R_C is replaced by equation (8) for CReno.
And R_L is replaced by the max() function below, which represents the And R_L is replaced by the max() function below, which represents the
effective RTT of the current Prague congestion control effective RTT of the current Prague congestion
[I-D.briscoe-iccrg-prague-congestion-control] in its (default) RTT- control [I-D.briscoe-iccrg-prague-congestion-control] in its
independent mode, because it sets a floor to the effective RTT that (default) RTT-independent mode, because it sets a floor to the
it uses for additive increase: effective RTT that it uses for additive increase:
~= 0.85 * (R_bC + target) / (1.22 * max(R_bL, R_typ)) ~= 0.85 * (R_bC + target) / (1.22 * max(R_bL, R_typ))
~= (R_bC + target) / (1.4 * max(R_bL, R_typ)) ~= (R_bC + target) / (1.4 * max(R_bL, R_typ))
It can be seen that, for base RTTs below target (15 ms), both the It can be seen that, for base RTTs below target (15 ms), both the
numerator and the denominator plateau, which has the desired effect numerator and the denominator plateau, which has the desired effect
of limiting RTT-dependence. of limiting RTT-dependence.
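To see the plateau, the final approximation above can be evaluated over a range of base RTTs. In the Python sketch below the function name is hypothetical, and the value passed for R_typ is an assumption chosen purely for illustration; the value of R_typ is not given in this part of the text.

```python
def prague_creno_ratio(R_bC, R_bL, *, R_typ, target=0.015):
    """Approximate r_L / r_C from the last line of the derivation.

    All arguments are in seconds. target defaults to the 15 ms quoted
    in the text; R_typ has no default because its value is assumed.
    """
    return (R_bC + target) / (1.4 * max(R_bL, R_typ))


if __name__ == "__main__":
    R_TYP = 0.015  # assumption for illustration only
    for base_ms in (1, 5, 15, 50, 100):
        base = base_ms / 1000.0
        # Same base RTT for the CReno flow and the Prague flow.
        ratio = prague_creno_ratio(base, base, R_typ=R_TYP)
        print(f"base RTT {base_ms:>3} ms: r_L/r_C ~= {ratio:.2f}")
```

The max() floor matters most when a Prague flow with a very short base RTT competes with a CReno flow with a long one, because it stops the denominator shrinking towards zero.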
At the start of the above derivations, an explanation was promised At the start of the above derivations, an explanation was promised
for why the L4S throughput equation in equation (6) did not need to for why the L4S throughput equation in equation (6) did not need to
 End of changes. 69 change blocks. 
259 lines changed or deleted 385 lines changed or added
