draft-ietf-tsvwg-aqm-dualq-coupled-17.txt   draft-ietf-tsvwg-aqm-dualq-coupled-18.txt 
Transport Area working group (tsvwg) K. De Schepper Transport Area working group (tsvwg) K. De Schepper
Internet-Draft Nokia Bell Labs Internet-Draft Nokia Bell Labs
Intended status: Experimental B. Briscoe, Ed. Intended status: Experimental B. Briscoe, Ed.
Expires: 7 April 2022 Independent Expires: 28 April 2022 Independent
G. White G. White
CableLabs CableLabs
4 October 2021 25 October 2021
DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput
(L4S) (L4S)
draft-ietf-tsvwg-aqm-dualq-coupled-17 draft-ietf-tsvwg-aqm-dualq-coupled-18
Abstract Abstract
The Low Latency Low Loss Scalable Throughput (L4S) architecture This specification defines a framework for coupling the Active Queue
allows data flows over the public Internet to achieve consistent low Management (AQM) algorithms in two queues intended for flows with
queuing latency, generally zero congestion loss and scaling of per- different responses to congestion. This provides a way for the
flow throughput without the scaling problems of standard TCP Reno- Internet to transition from the scaling problems of standard TCP
friendly congestion controls. To achieve this, L4S data flows have Reno-friendly ('Classic') congestion controls to the family of
to use one of the family of 'Scalable' congestion controls (TCP 'Scalable' congestion controls. These achieve consistently very Low
Prague and Data Center TCP are examples) and a form of Explicit queuing Latency, very Low congestion Loss and Scaling of per-flow
Congestion Notification (ECN) with modified behaviour. However, throughput (L4S) by using Explicit Congestion Notification (ECN) in a
until now, Scalable congestion controls did not co-exist with modified way. Until the Coupled DualQ, these L4S senders could only
existing Reno/Cubic traffic --- Scalable controls are so aggressive be deployed where a clean-slate environment could be arranged, such
that 'Classic' (e.g. Reno-friendly) algorithms sharing an ECN-capable as in private data centres. The coupling acts like a semi-permeable
queue would drive themselves to a small capacity share. Therefore, membrane: isolating the sub-millisecond average queuing delay and
until now, L4S controls could only be deployed where a clean-slate zero congestion loss of L4S from Classic latency and loss; but
environment could be arranged, such as in private data centres (hence pooling the capacity between any combination of Scalable and Classic
the name DCTCP). This specification defines `DualQ Coupled Active flows with roughly equivalent throughput per flow. The DualQ
Queue Management (AQM)', which enables Scalable congestion controls achieves this indirectly, without having to inspect transport layer
that comply with the Prague L4S requirements to co-exist safely with flow identifiers and without compromising the performance of the
Classic Internet traffic. Classic traffic. The solution has low complexity and requires no
Analytical study and implementation testing of the Coupled AQM have
shown that Scalable and Classic flows competing under similar
conditions run at roughly the same rate. It achieves this
indirectly, without having to inspect transport layer flow
identifiers. When tested in a residential broadband setting, DCTCP
also achieves sub-millisecond average queuing delay and zero
congestion loss under a wide range of mixes of DCTCP and `Classic'
broadband Internet traffic, without compromising the performance of
the Classic traffic. The solution has low complexity and requires no
configuration for the public Internet. configuration for the public Internet.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 7 April 2022. This Internet-Draft will expire on 28 April 2022.
Copyright Notice Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components and restrictions with respect to this document. Code Components
extracted from this document must include Simplified BSD License text extracted from this document must include Simplified BSD License text
as described in Section 4.e of the Trust Legal Provisions and are as described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Simplified BSD License. provided without warranty as described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 4 1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3
1.2. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 8 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7
1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9 1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9
2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11 2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11
2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11
2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12
2.3. Traffic Classification . . . . . . . . . . . . . . . . . 13 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12
2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13 2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13
2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17 2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17
2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17 2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17
2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18
2.5.2. Management Requirements . . . . . . . . . . . . . . . 19 2.5.2. Management Requirements . . . . . . . . . . . . . . . 19
2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19 2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19
2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21 2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21
2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 21 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 21
2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22 2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22
3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22
skipping to change at page 3, line 14 skipping to change at page 3, line 4
2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21 2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21
2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 21 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 21
2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22 2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22
3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22
4. Security Considerations . . . . . . . . . . . . . . . . . . . 22 4. Security Considerations . . . . . . . . . . . . . . . . . . . 22
4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 22 4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 22
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput 4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput
or Delay? . . . . . . . . . . . . . . . . . . . . . . 23 or Delay? . . . . . . . . . . . . . . . . . . . . . . 23
4.1.2. Congestion Signal Saturation: Introduce L4S Drop or 4.1.2. Congestion Signal Saturation: Introduce L4S Drop or
Delay? . . . . . . . . . . . . . . . . . . . . . . . 24 Delay? . . . . . . . . . . . . . . . . . . . . . . . 24
4.1.3. Protecting against Unresponsive ECN-Capable 4.1.3. Protecting against Unresponsive ECN-Capable
Traffic . . . . . . . . . . . . . . . . . . . . . . . 25 Traffic . . . . . . . . . . . . . . . . . . . . . . . 25
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26
6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26
7. References . . . . . . . . . . . . . . . . . . . . . . . . . 26 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 26
7.1. Normative References . . . . . . . . . . . . . . . . . . 26 7.1. Normative References . . . . . . . . . . . . . . . . . . 26
7.2. Informative References . . . . . . . . . . . . . . . . . 27 7.2. Informative References . . . . . . . . . . . . . . . . . 27
Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 32 Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 33
A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 33 A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 33
A.2. Pass #2: Overload Details . . . . . . . . . . . . . . . . 43 A.2. Pass #2: Overload Details . . . . . . . . . . . . . . . . 44
Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 47 Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 48
B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 47 B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 48
B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 53 B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 54
Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 55 Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 56
C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 55 C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 56
C.2. Guidance on Controlling Throughput Equivalence . . . . . 56 C.2. Guidance on Controlling Throughput Equivalence . . . . . 57
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 57 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 60
1. Introduction 1. Introduction
This document specifies a framework for DualQ Coupled AQMs, which is This document specifies a framework for DualQ Coupled AQMs, which is
the network part of the L4S architecture [I-D.ietf-tsvwg-l4s-arch]. the network part of the L4S architecture [I-D.ietf-tsvwg-l4s-arch].
L4S enables both very low queuing latency (sub-millisecond on L4S enables both very low queuing latency (sub-millisecond on
average) and high throughput at the same time, for ad hoc numbers of average) and high throughput at the same time, for ad hoc numbers of
capacity-seeking applications all sharing the same capacity. capacity-seeking applications all sharing the same capacity.
1.1. Outline of the Problem 1.1. Outline of the Problem
skipping to change at page 5, line 34 skipping to change at page 5, line 6
Without the former, a 'Classic' (e.g. Reno-friendly) flow's round Without the former, a 'Classic' (e.g. Reno-friendly) flow's round
trip time (RTT) varies between roughly 1 and 2 times the base RTT trip time (RTT) varies between roughly 1 and 2 times the base RTT
between the machines in question. Without the latter a 'Classic' between the machines in question. Without the latter a 'Classic'
flow's response to changing events is delayed by a worst-case flow's response to changing events is delayed by a worst-case
(transcontinental) RTT, which could be hundreds of times the actual (transcontinental) RTT, which could be hundreds of times the actual
smoothing delay needed for the RTT of typical traffic from localized smoothing delay needed for the RTT of typical traffic from localized
CDNs. CDNs.
These changes are the two main features of the family of so-called These changes are the two main features of the family of so-called
'Scalable' congestion controls (which includes DCTCP). Both these 'Scalable' congestion controls (which includes DCTCP, TCP Prague and
changes only reduce delay in combination with a complementary change SCReAM). Both these changes only reduce delay in combination with a
in the network and they are both only feasible with ECN, not drop, complementary change in the network and they are both only feasible
for the signalling: with ECN, not drop, for the signalling:
1. The smaller sawteeth allow an extremely shallow ECN packet- 1. The smaller sawteeth allow an extremely shallow ECN packet-
marking threshold in the queue. marking threshold in the queue.
2. And no smoothing in the network means that every fluctuation of 2. And no smoothing in the network means that every fluctuation of
the queue is signalled immediately. the queue is signalled immediately.
Without ECN, either of these would lead to very high loss levels. Without ECN, either of these would lead to very high loss levels.
But, with ECN, the resulting high marking levels are just signals, But, with ECN, the resulting high marking levels are just signals,
not impairments. not impairments. BBRv2 combines the best of both worlds - it works
as a scalable congestion control when ECN is available, but also aims
to minimize delay when it isn't.
However, until now, Scalable congestion controls (like DCTCP) did not However, until now, Scalable congestion controls (like DCTCP) did not
co-exist well in a shared ECN-capable queue with existing ECN-capable co-exist well in a shared ECN-capable queue with existing ECN-capable
TCP Reno [RFC5681] or Cubic [RFC8312] congestion controls --- TCP Reno [RFC5681] or Cubic [RFC8312] congestion controls ---
Scalable controls are so aggressive that these 'Classic' algorithms Scalable controls are so aggressive that these 'Classic' algorithms
would drive themselves to a small capacity share. Therefore, until would drive themselves to a small capacity share. Therefore, until
now, L4S controls could only be deployed where a clean-slate now, L4S controls could only be deployed where a clean-slate
environment could be arranged, such as in private data centres (hence environment could be arranged, such as in private data centres (hence
the name DCTCP). the name DCTCP).
This document specifies a `DualQ Coupled AQM' extension that solves This document specifies a `DualQ Coupled AQM' extension that solves
the problem of coexistence between Scalable and Classic flows, the problem of coexistence between Scalable and Classic flows,
without having to inspect flow identifiers. It is not like flow- without having to inspect flow identifiers. It is not like flow-
queuing approaches [RFC8290] that classify packets by flow identifier queuing approaches [RFC8290] that classify packets by flow identifier
into separate queues in order to isolate sparse flows from the higher into separate queues in order to isolate sparse flows from the higher
latency in the queues assigned to heavier flows. If a flow needs latency in the queues assigned to heavier flows. If a flow needs
both low delay and high throughput, having a queue to itself does not both low delay and high throughput, having a queue to itself does not
isolate it from the harm it causes to itself. In contrast, DualQ isolate it from the harm it causes to itself. In contrast, DualQ
Coupled AQMs addresses the root cause of the latency problem --- they Coupled AQMs address the root cause of the latency problem --- they
are an enabler for the smooth low latency scalable behaviour of are an enabler for the smooth low latency scalable behaviour of
Scalable congestion controls, so that every packet in every flow can Scalable congestion controls, so that every packet in every flow can
enjoy very low latency, then there is no need to isolate each flow potentially enjoy very low latency, then there would be no need to
into a separate queue. isolate each flow into a separate queue.
1.2. Scope 1.2. Scope
L4S involves complementary changes in the network and on end-systems: L4S involves complementary changes in the network and on end-systems:
Network: A DualQ Coupled AQM (defined in the present document) or a Network: A DualQ Coupled AQM (defined in the present document) or a
modification to flow-queue AQMs (described in section 4.2.b of modification to flow-queue AQMs (described in section 4.2.b of
[I-D.ietf-tsvwg-l4s-arch]); [I-D.ietf-tsvwg-l4s-arch]);
End-system: A Scalable congestion control (defined in section 4 of End-system: A Scalable congestion control (defined in section 4 of
skipping to change at page 7, line 30 skipping to change at page 7, line 5
For the public Internet, nearly all the benefit will typically be For the public Internet, nearly all the benefit will typically be
achieved by deploying the Coupled AQM into either end of the access achieved by deploying the Coupled AQM into either end of the access
link between a 'site' and the Internet, which is invariably the link between a 'site' and the Internet, which is invariably the
bottleneck (see section 6.4 of[I-D.ietf-tsvwg-l4s-arch] about bottleneck (see section 6.4 of[I-D.ietf-tsvwg-l4s-arch] about
deployment, which also defines the term 'site' to mean a home, an deployment, which also defines the term 'site' to mean a home, an
office, a campus or mobile user equipment). office, a campus or mobile user equipment).
Latency is not the only concern of L4S: Latency is not the only concern of L4S:
* The 'Low Loss" part of the name denotes that L4S generally * The "Low Loss" part of the name denotes that L4S generally
achieves zero congestion loss (which would otherwise cause achieves zero congestion loss (which would otherwise cause
retransmission delays), due to its use of ECN. retransmission delays), due to its use of ECN.
* The "Scalable throughput" part of the name denotes that the per- * The "Scalable throughput" part of the name denotes that the per-
flow throughput of Scalable congestion controls should scale flow throughput of Scalable congestion controls should scale
indefinitely, avoiding the imminent scaling problems with 'TCP- indefinitely, avoiding the imminent scaling problems with 'TCP-
Friendly' congestion control algorithms [RFC3649]. Friendly' congestion control algorithms [RFC3649].
The former is clearly in scope of this AQM document. However, the The former is clearly in scope of this AQM document. However, the
latter is an outcome of the end-system behaviour, and therefore latter is an outcome of the end-system behaviour, and therefore
outside the scope of this AQM document, even though the AQM is an outside the scope of this AQM document, even though the AQM is an
enabler. enabler.
The overall L4S architecture [I-D.ietf-tsvwg-l4s-arch] gives more The overall L4S architecture [I-D.ietf-tsvwg-l4s-arch] gives more
detail, including on wider deployment aspects such as backwards detail, including on wider deployment aspects such as backwards
compatibility of Scalable congestion controls in bottlenecks where a compatibility of Scalable congestion controls in bottlenecks where a
DualQ Coupled AQM has not been deployed. The supporting papers DualQ Coupled AQM has not been deployed. The supporting papers
[DualPI2Linux], [PI2] and [DCttH15] give the full rationale for the [DualPI2Linux], [PI2], [DCttH19] and [PI2param] give the full
AQM's design, both discursively and in more precise mathematical rationale for the AQM's design, both discursively and in more precise
form, as well as the results of performance evaluations. mathematical form, as well as the results of performance evaluations.
1.3. Terminology 1.3. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119] when, and document are to be interpreted as described in [RFC2119] when, and
only when, they appear in all capitals, as shown here. only when, they appear in all capitals, as shown here.
The DualQ Coupled AQM uses two queues for two services. Each of the The DualQ Coupled AQM uses two queues for two services. Each of the
following terms identifies both the service and the queue that following terms identifies both the service and the queue that
skipping to change at page 10, line 16 skipping to change at page 9, line 42
Thousands of tests have been conducted in a typical fixed residential Thousands of tests have been conducted in a typical fixed residential
broadband setting. Experiments used a range of base round trip broadband setting. Experiments used a range of base round trip
delays up to 100ms and link rates up to 200 Mb/s between the data delays up to 100ms and link rates up to 200 Mb/s between the data
centre and home network, with varying amounts of background traffic centre and home network, with varying amounts of background traffic
in both queues. For every L4S packet, the AQM kept the average in both queues. For every L4S packet, the AQM kept the average
queuing delay below 1ms (or 2 packets where serialization delay queuing delay below 1ms (or 2 packets where serialization delay
exceeded 1ms on slower links), with 99th percentile no worse than exceeded 1ms on slower links), with 99th percentile no worse than
2ms. No losses at all were introduced by the L4S AQM. Details of 2ms. No losses at all were introduced by the L4S AQM. Details of
the extensive experiments are available [DualPI2Linux], [PI2], the extensive experiments are available [DualPI2Linux], [PI2],
[DCttH15]. [DCttH19].
Subjective testing was also conducted by multiple people all In all these experiments, the host was connected to the home network
by fixed Ethernet, in order to quantify the queuing delay that can be
achieved by a user who cares about delay. It should be emphasized
that L4S support at the bottleneck link cannot 'undelay' bursts
introduced by another link on the path, for instance by legacy WiFi
equipment. However, if L4S support is added to the queue feeding the
_outgoing_ WAN link of a home gateway, it would be counterproductive
not to also reduce the burstiness of the _incoming_ WiFi. Also,
trials of WiFi equipment with an L4S DualQ Coupled AQM on the
_outgoing_ WiFi interface are in progress, and early results of an
L4S DualQ Coupled AQM in a 5G radio access network testbed with
emulated outdoor cell edge radio fading are given in [L4S_5G].
Subjective testing has also been conducted by multiple people all
simultaneously using very demanding high bandwidth low latency simultaneously using very demanding high bandwidth low latency
applications over a single shared access link [L4Sdemo16]. In one applications over a single shared access link [L4Sdemo16]. In one
application, each user could use finger gestures to pan or zoom their application, each user could use finger gestures to pan or zoom their
own high definition (HD) sub-window of a larger video scene generated own high definition (HD) sub-window of a larger video scene generated
on the fly in 'the cloud' from a football match. Another user on the fly in 'the cloud' from a football match. Another user
wearing VR goggles was remotely receiving a feed from a 360-degree wearing VR goggles was remotely receiving a feed from a 360-degree
camera in a racing car, again with the sub-window in their field of camera in a racing car, again with the sub-window in their field of
vision generated on the fly in 'the cloud' dependent on their head vision generated on the fly in 'the cloud' dependent on their head
movements. Even though other users were also downloading large movements. Even though other users were also downloading large
amounts of L4S and Classic data, playing a gaming benchmark and amounts of L4S and Classic data, playing a gaming benchmark and
skipping to change at page 10, line 43 skipping to change at page 10, line 34
including the downloads) achieved the same very low latency. With an including the downloads) achieved the same very low latency. With an
alternative AQM, the video noticeably lagged behind the finger alternative AQM, the video noticeably lagged behind the finger
gestures and head movements. gestures and head movements.
Unlike Diffserv Expedited Forwarding, the L4S queue does not have to Unlike Diffserv Expedited Forwarding, the L4S queue does not have to
be limited to a small proportion of the link capacity in order to be limited to a small proportion of the link capacity in order to
achieve low delay. The L4S queue can be filled with a heavy load of achieve low delay. The L4S queue can be filled with a heavy load of
capacity-seeking flows (TCP Prague etc.) and still achieve low delay. capacity-seeking flows (TCP Prague etc.) and still achieve low delay.
The L4S queue does not rely on the presence of other traffic in the The L4S queue does not rely on the presence of other traffic in the
Classic queue that can be 'overtaken'. It gives low latency to L4S Classic queue that can be 'overtaken'. It gives low latency to L4S
traffic whether or not there is Classic traffic, and the latency of traffic whether or not there is Classic traffic. The tail latency of
Classic traffic does not suffer when a proportion of the traffic is traffic served by the Classic AQM is sometimes a little better
L4S. sometimes a little worse, when a proportion of the traffic is L4S.
The two queues are only necessary because: The two queues are only necessary because:
* the large variations (sawteeth) of Classic flows need roughly a * the large variations (sawteeth) of Classic flows need roughly a
base RTT of queuing delay to ensure full utilization base RTT of queuing delay to ensure full utilization
* Scalable flows do not need a queue to keep utilization high, but * Scalable flows do not need a queue to keep utilization high, but
they cannot keep latency predictably low if they are mixed with they cannot keep latency predictably low if they are mixed with
Classic traffic, Classic traffic,
skipping to change at page 11, line 29 skipping to change at page 11, line 20
Classic (e.g. Reno, Cubic) flows and L4S flows (that satisfy the Classic (e.g. Reno, Cubic) flows and L4S flows (that satisfy the
Prague L4S requirements). Prague L4S requirements).
* The Dual Queue structure that provides latency separation for L4S * The Dual Queue structure that provides latency separation for L4S
flows to isolate them from the typically large Classic queue. flows to isolate them from the typically large Classic queue.
2.1. Coupled AQM 2.1. Coupled AQM
In the 1990s, the `TCP formula' was derived for the relationship In the 1990s, the `TCP formula' was derived for the relationship
between the steady-state congestion window, cwnd, and the drop between the steady-state congestion window, cwnd, and the drop
probability, p of standard Reno congestion control [RFC5681] . To a probability, p of standard Reno congestion control [RFC5681]. To a
first order approximation, the steady-state cwnd of Reno is inversely first order approximation, the steady-state cwnd of Reno is inversely
proportional to the square root of p. proportional to the square root of p.
The design focuses on Reno as the worst case, because if it does no The design focuses on Reno as the worst case, because if it does no
harm to Reno, it will not harm Cubic or any traffic designed to be harm to Reno, it will not harm Cubic or any traffic designed to be
friendly to Reno. TCP Cubic implements a Reno-compatibility mode, friendly to Reno. TCP Cubic implements a Reno-compatibility mode,
which is relevant for typical RTTs under 20ms as long as the which is relevant for typical RTTs under 20ms as long as the
throughput of a single flow is less than about 700Mb/s. In such throughput of a single flow is less than about 350Mb/s. In such
cases it can be assumed that Cubic traffic behaves similarly to Reno cases it can be assumed that Cubic traffic behaves similarly to Reno.
(but with a slightly different constant of proportionality). The The term 'Classic' will be used for the collection of Reno-friendly
term 'Classic' will be used for the collection of Reno-friendly
traffic including Cubic and potentially other experimental congestion traffic including Cubic and potentially other experimental congestion
controls intended not to significantly impact the flow rate of Reno. controls intended not to significantly impact the flow rate of Reno.
A supporting paper [PI2] includes the derivation of the equivalent A supporting paper [PI2] includes the derivation of the equivalent
rate equation for DCTCP, for which cwnd is inversely proportional to rate equation for DCTCP, for which cwnd is inversely proportional to
p (not the square root), where in this case p is the ECN marking p (not the square root), where in this case p is the ECN marking
probability. DCTCP is not the only congestion control that behaves probability. DCTCP is not the only congestion control that behaves
like this, so the term 'Scalable' will be used for all similar like this, so the term 'Scalable' will be used for all similar
congestion control behaviours (see examples in Section 1.2). The congestion control behaviours (see examples in Section 1.2). The
term 'L4S' is used for traffic driven by a Scalable congestion term 'L4S' is used for traffic driven by a Scalable congestion
skipping to change at page 13, line 31 skipping to change at page 13, line 19
device "MUST NOT alter the end-to-end L4S ECN identifier", so that it device "MUST NOT alter the end-to-end L4S ECN identifier", so that it
is preserved end-to-end. The aim is that each operator can choose is preserved end-to-end. The aim is that each operator can choose
how it treats L4S traffic locally, but an individual operator does how it treats L4S traffic locally, but an individual operator does
not alter the identification of L4S packets, which would prevent not alter the identification of L4S packets, which would prevent
other operators downstream from making their own choices on how to other operators downstream from making their own choices on how to
treat L4S traffic. treat L4S traffic.
In addition, an operator could use other identifiers to classify In addition, an operator could use other identifiers to classify
certain additional packet types into the L queue that it deems will certain additional packet types into the L queue that it deems will
not risk harm to the L4S service. For instance addresses of specific not risk harm to the L4S service. For instance addresses of specific
applications or hosts (see [I-D.ietf-tsvwg-ecn-l4s-id]), specific applications or hosts; specific Diffserv codepoints such as EF
Diffserv codepoints such as EF (Expedited Forwarding) and Voice-Admit (Expedited Forwarding), Voice-Admit or the Non-Queue-Building (NQB)
service classes (see [I-D.briscoe-tsvwg-l4s-diffserv]), the Non- per-hop behaviour; or certain protocols (e.g. ARP, DNS) (see
Queue-Building (NQB) per-hop behaviour [I-D.ietf-tsvwg-nqb] or Section 5.4.1 of [I-D.ietf-tsvwg-ecn-l4s-id]). Note that the
certain protocols (e.g. ARP, DNS). Note that the mechanism only mechanism only reads these identifiers. [I-D.ietf-tsvwg-ecn-l4s-id]
reads these identifiers. [I-D.ietf-tsvwg-ecn-l4s-id] says it "MUST says it "MUST NOT alter these non-ECN identifiers". Thus, the L
NOT alter these non-ECN identifiers". Thus, the L queue is not queue is not solely an L4S queue, it can be considered more generally
solely an L4S queue, it can be consider more generally as a low as a low latency queue.
latency queue.
2.4. Overall DualQ Coupled AQM Structure 2.4. Overall DualQ Coupled AQM Structure
Figure 1 shows the overall structure that any DualQ Coupled AQM is Figure 1 shows the overall structure that any DualQ Coupled AQM is
likely to have. This schematic is intended to aid understanding of likely to have. This schematic is intended to aid understanding of
the current designs of DualQ Coupled AQMs. However, it is not the current designs of DualQ Coupled AQMs. However, it is not
intended to preclude other innovative ways of satisfying the intended to preclude other innovative ways of satisfying the
normative requirements in Section 2.5 that minimally define a DualQ normative requirements in Section 2.5 that minimally define a DualQ
Coupled AQM. Coupled AQM. Also, the schematic only illustrates operation under
normally expected circumstances; behaviour under overload or with
operator-specific classifiers is deferred to Section 2.5.1.1.
The classifier on the left separates incoming traffic between the two The classifier on the left separates incoming traffic between the two
queues (L and C). Each queue has its own AQM that determines the queues (L and C). Each queue has its own AQM that determines the
likelihood of marking or dropping (p_L and p_C). It has been likelihood of marking or dropping (p_L and p_C). It has been
proved [PI2] that it is preferable to control load with a linear proved [PI2] that it is preferable to control load with a linear
controller, then square the output before applying it as a drop controller, then square the output before applying it as a drop
probability to Reno-friendly traffic (because Reno congestion control probability to Reno-friendly traffic (because Reno congestion control
decreases its load proportional to the square-root of the increase in decreases its load proportional to the square-root of the increase in
drop). So, the AQM for Classic traffic needs to be implemented in drop). So, the AQM for Classic traffic needs to be implemented in
two stages: i) a base stage that outputs an internal probability p' two stages: i) a base stage that outputs an internal probability p'
skipping to change at page 15, line 7 skipping to change at page 15, line 7
The constant of proportionality or coupling factor, k, in equation The constant of proportionality or coupling factor, k, in equation
(1) determines the ratio between the congestion probabilities (loss (1) determines the ratio between the congestion probabilities (loss
or marking) experienced by L4S and Classic traffic. Thus k or marking) experienced by L4S and Classic traffic. Thus k
indirectly determines the ratio between L4S and Classic flow rates, indirectly determines the ratio between L4S and Classic flow rates,
because flows (assuming they are responsive) adjust their rate in because flows (assuming they are responsive) adjust their rate in
response to congestion probability. Appendix C.2 gives guidance on response to congestion probability. Appendix C.2 gives guidance on
the choice of k and its effect on relative flow rates. the choice of k and its effect on relative flow rates.
_________ _________
| | ,------. | | ,------.
L4S queue | |===>| ECN | L4S (L) queue | |===>| ECN |
,'| _______|_| |marker|\ ,'| _______|_| |marker|\
<' | | `------'\\ <' | | `------'\\
//`' v ^ p_L \\ //`' v ^ p_L \\
// ,-------. | \\ // ,-------. | \\
// |Native |p'_L | \\,. // |Native |p'_L | \\,.
// | L4S |--->(MAX) < | ___ // | L4S |--->(MAX) < | ___
,----------.// | AQM | ^ p_CL `\|.'Cond-`. ,----------.// | AQM | ^ p_CL `\|.'Cond-`.
| IP-ECN |/ `-------' | / itional \ | IP-ECN |/ `-------' | / itional \
==>|Classifier| ,-------. (k*p') [ priority]==> ==>|Classifier| ,-------. (k*p') [ priority]==>
| |\ | Base | | \scheduler/ | |\ | Base | | \scheduler/
`----------'\\ | AQM |---->: ,'|`-.___.-' `----------'\\ | AQM |---->: ,'|`-.___.-'
\\ | |p' | <' | \\ | |p' | <' |
\\ `-------' (p'^2) //`' \\ `-------' (p'^2) //`'
\\ ^ | // \\ ^ | //
\\,. | v p_C // \\,. | v p_C //
< | _________ .------.// < | _________ .------.//
`\| | | | Drop |/ `\| | | | Drop |/
Classic |queue |===>|/mark | Classic (C) |queue |===>|/mark |
__|______| `------' __|______| `------'
Figure 1: DualQ Coupled AQM Schematic Figure 1: DualQ Coupled AQM Schematic
Legend: ===> traffic flow; ---> control dependency. Legend: ===> traffic flow; ---> control dependency.
After the AQMs have applied their dropping or marking, the scheduler After the AQMs have applied their dropping or marking, the scheduler
forwards their packets to the link. Even though the scheduler gives forwards their packets to the link. Even though the scheduler gives
priority to the L queue, it is not as strong as the coupling from the priority to the L queue, it is not as strong as the coupling from the
C queue. This is because, as the C queue grows, the base AQM applies C queue. This is because, as the C queue grows, the base AQM applies
skipping to change at page 17, line 5 skipping to change at page 16, line 49
As already explained, this ensures configuration can be invariant for As already explained, this ensures configuration can be invariant for
different drain rates. With AQMs in a dualQ structure this is different drain rates. With AQMs in a dualQ structure this is
particularly important because the drain rate of each queue can vary particularly important because the drain rate of each queue can vary
rapidly as flows for the two queues arrive and depart, even if the rapidly as flows for the two queues arrive and depart, even if the
combined link rate is constant. combined link rate is constant.
It would be possible to control the queues with other alternative It would be possible to control the queues with other alternative
AQMs, as long as the normative requirements (those expressed in AQMs, as long as the normative requirements (those expressed in
capitals) in Section 2.5 are observed. capitals) in Section 2.5 are observed.
The two queues could optionally be part of a larger queuing
hierarchy, such as the initial example ideas
in [I-D.briscoe-tsvwg-l4s-diffserv].
2.5. Normative Requirements for a DualQ Coupled AQM 2.5. Normative Requirements for a DualQ Coupled AQM
The following requirements are intended to capture only the essential The following requirements are intended to capture only the essential
aspects of a DualQ Coupled AQM. They are intended to be independent aspects of a DualQ Coupled AQM. They are intended to be independent
of the particular AQMs used for each queue. of the particular AQMs used for each queue.
2.5.1. Functional Requirements 2.5.1. Functional Requirements
A Dual Queue Coupled AQM implementation MUST comply with the A Dual Queue Coupled AQM implementation MUST comply with the
prerequisite L4S behaviours for any L4S network node (not just a prerequisite L4S behaviours for any L4S network node (not just a
DualQ) as specified in section 5 of [I-D.ietf-tsvwg-ecn-l4s-id]. DualQ) as specified in section 5 of [I-D.ietf-tsvwg-ecn-l4s-id].
These primarily concern classification and remarking as briefly These primarily concern classification and remarking as briefly
summarized in Section 2.3 earlier. But there is also a subsection summarized in Section 2.3 earlier. But there is also a subsection
(5.5) giving guidance on reducing the burstiness of the link (5.5) giving guidance on reducing the burstiness of the link
technology underlying any L4S AQM. technology underlying any L4S AQM.
A Dual Queue Coupled AQM implementation MUST utilize two queues, each A Dual Queue Coupled AQM implementation MUST utilize two queues, each
with an AQM algorithm. The two queues can be part of a larger with an AQM algorithm.
queuing hierarchy [I-D.briscoe-tsvwg-l4s-diffserv].
The AQM algorithm for the low latency (L) queue MUST be able to apply The AQM algorithm for the low latency (L) queue MUST be able to apply
ECN marking to ECN-capable packets. ECN marking to ECN-capable packets.
The scheduler draining the two queues MUST give L4S packets priority The scheduler draining the two queues MUST give L4S packets priority
over Classic, although priority MUST be bounded in order not to over Classic, although priority MUST be bounded in order not to
starve Classic traffic. The scheduler SHOULD be work-conserving. starve Classic traffic. The scheduler SHOULD be work-conserving, or
otherwise close to work-conserving, given Classic service will often
rely on borrowing from the L4S service.
[I-D.ietf-tsvwg-ecn-l4s-id] defines the meaning of an ECN marking on [I-D.ietf-tsvwg-ecn-l4s-id] defines the meaning of an ECN marking on
L4S traffic, relative to drop of Classic traffic. In order to ensure L4S traffic, relative to drop of Classic traffic. In order to ensure
coexistence of Classic and Scalable L4S traffic, it says, "The coexistence of Classic and Scalable L4S traffic, it says, "The
likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST be likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST be
roughly proportional to the square of the likelihood that it would roughly proportional to the square of the likelihood that it would
have marked it if it had been an L4S packet (p_L)." The term have marked it if it had been an L4S packet (p_L)." The term
'likelihood' is used to allow for marking and dropping to be either 'likelihood' is used to allow for marking and dropping to be either
probabilistic or deterministic. probabilistic or deterministic.
skipping to change at page 18, line 11 skipping to change at page 18, line 11
[I-D.ietf-tsvwg-ecn-l4s-id] says, "The constant of proportionality [I-D.ietf-tsvwg-ecn-l4s-id] says, "The constant of proportionality
(k) does not have to be standardised for interoperability, but a (k) does not have to be standardised for interoperability, but a
value of 2 is RECOMMENDED." value of 2 is RECOMMENDED."
Assuming Scalable congestion controls for the Internet will be as Assuming Scalable congestion controls for the Internet will be as
aggressive as DCTCP, this will ensure their congestion window will be aggressive as DCTCP, this will ensure their congestion window will be
roughly the same as that of a standards track TCP Reno congestion roughly the same as that of a standards track TCP Reno congestion
control (Reno) [RFC5681] and other Reno-friendly controls, such as control (Reno) [RFC5681] and other Reno-friendly controls, such as
TCP Cubic in its Reno-compatibility mode. TCP Cubic in its Reno-compatibility mode.
The choice of k is a matter of operator policy, and operators MAY The choice of k is a matter of operator policy, and operators MAY
choose a different value using Table 1 and the guidelines in choose a different value using the guidelines in Appendix C.2.
Appendix C.2.
If multiple customers or users share capacity at a bottleneck If multiple customers or users share capacity at a bottleneck
(e.g. in the Internet access link of a campus network), the (e.g. in the Internet access link of a campus network), the
operator's choice of k will determine capacity sharing between the operator's choice of k will determine capacity sharing between the
flows of different customers. However, on the public Internet, flows of different customers. However, on the public Internet,
access network operators typically isolate customers from each other access network operators typically isolate customers from each other
with some form of layer-2 multiplexing (OFDM(A) in DOCSIS3.1, CDMA in with some form of layer-2 multiplexing (OFDM(A) in DOCSIS3.1, CDMA in
3G, SC-FDMA in LTE) or L3 scheduling (WRR in DSL), rather than 3G, SC-FDMA in LTE) or L3 scheduling (WRR in DSL), rather than
relying on host congestion controls to share capacity between relying on host congestion controls to share capacity between
customers [RFC0970]. In such cases, the choice of k will solely customers [RFC0970]. In such cases, the choice of k will solely
skipping to change at page 21, line 38 skipping to change at page 21, line 38
statistics produced (mean, percentiles, etc.) will depend on statistics produced (mean, percentiles, etc.) will depend on
implementation constraints. To facilitate comparative evaluation implementation constraints. To facilitate comparative evaluation
of different implementations and approaches, an implementation of different implementations and approaches, an implementation
SHOULD allow mean and 99th percentile queue delay to be derived SHOULD allow mean and 99th percentile queue delay to be derived
(per queue per sample interval). A relatively simple way to do (per queue per sample interval). A relatively simple way to do
this would be to store a coarse-grained histogram of queue delay. this would be to store a coarse-grained histogram of queue delay.
This could be done with a small number of bins with configurable This could be done with a small number of bins with configurable
edges that represent contiguous ranges of queue delay. Then, over edges that represent contiguous ranges of queue delay. Then, over
a sample interval, each bin would accumulate a count of the number a sample interval, each bin would accumulate a count of the number
of packets that had fallen within each range. The maximum queue of packets that had fallen within each range. The maximum queue
delay per queue per interval MAY also be recorded. delay per queue per interval MAY also be recorded, to aid
diagnosis of faults and anomalous events.
2.5.2.3. Anomaly Detection 2.5.2.3. Anomaly Detection
An experimental DualQ Coupled AQM SHOULD asynchronously report the An experimental DualQ Coupled AQM SHOULD asynchronously report the
following data about anomalous conditions: following data about anomalous conditions:
* Start-time and duration of overload state. * Start-time and duration of overload state.
A hysteresis mechanism SHOULD be used to prevent flapping in and A hysteresis mechanism SHOULD be used to prevent flapping in and
out of overload causing an event storm. For instance, exit from out of overload causing an event storm. For instance, exit from
skipping to change at page 23, line 13 skipping to change at page 23, line 13
followed by proposed solution(s). followed by proposed solution(s).
Under overload the higher priority L4S service will have to sacrifice Under overload the higher priority L4S service will have to sacrifice
some aspect of its performance. Alternative solutions are provided some aspect of its performance. Alternative solutions are provided
below that each relax a different factor: e.g. throughput, delay, below that each relax a different factor: e.g. throughput, delay,
drop. These choices need to be made either by the developer or by drop. These choices need to be made either by the developer or by
operator policy, rather than by the IETF. operator policy, rather than by the IETF.
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput or Delay? 4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput or Delay?
Priority of L4S is required to be conditional to avoid total Priority of L4S is required to be conditional (see Section 2.5.1) to
starvation of Classic by heavy L4S traffic. This raises the question avoid total starvation of Classic by heavy L4S traffic. This raises
of whether to sacrifice L4S throughput or L4S delay (or some other the question of whether to sacrifice L4S throughput or L4S delay (or
policy) to mitigate starvation of Classic: some other policy) to mitigate starvation of Classic:
Sacrifice L4S throughput: By using weighted round robin as the Sacrifice L4S throughput: By using weighted round robin as the
conditional priority scheduler, the L4S service can sacrifice some conditional priority scheduler, the L4S service can sacrifice some
throughput during overload. This can either be thought of as throughput during overload. This can either be thought of as
guaranteeing a minimum throughput service for Classic traffic, or guaranteeing a minimum throughput service for Classic traffic, or
as guaranteeing a maximum delay for a packet at the head of the as guaranteeing a maximum delay for a packet at the head of the
Classic queue. Classic queue.
The scheduling weight of the Classic queue should be small The scheduling weight of the Classic queue should be small
(e.g. 1/16). Then, in most traffic scenarios the scheduler will (e.g. 1/16). Then, in most traffic scenarios the scheduler will
skipping to change at page 25, line 49 skipping to change at page 25, line 49
This raises the question of whether and when to switch off ECN This raises the question of whether and when to switch off ECN
marking and use solely drop instead, as required by both Section 7 of marking and use solely drop instead, as required by both Section 7 of
[RFC3168] and Section 4.2.1 of [RFC7567]. [RFC3168] and Section 4.2.1 of [RFC7567].
Experiments with the DualPI2 AQM (Appendix A) have shown that Experiments with the DualPI2 AQM (Appendix A) have shown that
introducing 'drop on saturation' at 100% L4S marking addresses this introducing 'drop on saturation' at 100% L4S marking addresses this
problem with unresponsive ECN as well as addressing the saturation problem with unresponsive ECN as well as addressing the saturation
problem. It leaves only a small range of congestion levels where problem. It leaves only a small range of congestion levels where
unresponsive traffic gains any advantage from using the ECN unresponsive traffic gains any advantage from using the ECN
capability, and the advantage is hardly detectable [DualQ-Test]. capability (relative to being unresponsive without ECN), and the
advantage is hardly detectable [DualQ-Test].
5. Acknowledgements 5. Acknowledgements
Thanks to Anil Agarwal, Sowmini Varadhan's, Gabi Bracha, Nicolas Thanks to Anil Agarwal, Sowmini Varadhan's, Gabi Bracha, Nicolas
Kuhn, Greg Skinner, Tom Henderson and David Pullen for detailed Kuhn, Greg Skinner, Tom Henderson, David Pullen, Mirja Kuehlewind,
review comments particularly of the appendices and suggestions on how Gorry Fairhurst, Pete Heist and Ermin Sakic for detailed review
to make the explanations clearer. Thanks also to Tom Henderson for comments particularly of the appendices and suggestions on how to
make the explanations clearer. Thanks also to Tom Henderson for
insights on the choice of schedulers and queue delay measurement insights on the choice of schedulers and queue delay measurement
techniques. techniques.
The early contributions of Koen De Schepper, Bob Briscoe, Olga The early contributions of Koen De Schepper, Bob Briscoe, Olga
Bondarenko and Inton Tsang were part-funded by the European Community Bondarenko and Inton Tsang were part-funded by the European Community
under its Seventh Framework Programme through the Reducing Internet under its Seventh Framework Programme through the Reducing Internet
Transport Latency (RITE) project (ICT-317700). Bob Briscoe's Transport Latency (RITE) project (ICT-317700). Bob Briscoe's
contribution was also part-funded by the Comcast Innovation Fund and contribution was also part-funded by the Comcast Innovation Fund and
the Research Council of Norway through the TimeIn project. The views the Research Council of Norway through the TimeIn project. The views
expressed here are solely those of the authors. expressed here are solely those of the authors.
skipping to change at page 28, line 26 skipping to change at page 28, line 26
[CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", [CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay",
ACM Queue 10(5), May 2012, ACM Queue 10(5), May 2012,
<http://queue.acm.org/issuedetail.cfm?issue=2208917>. <http://queue.acm.org/issuedetail.cfm?issue=2208917>.
[CRED_Insights] [CRED_Insights]
Briscoe, B., "Insights from Curvy RED (Random Early Briscoe, B., "Insights from Curvy RED (Random Early
Detection)", BT Technical Report TR-TUB8-2015-003 Detection)", BT Technical Report TR-TUB8-2015-003
arXiv:1904.07339 [cs.NI], July 2015, arXiv:1904.07339 [cs.NI], July 2015,
<https://arxiv.org/abs/1904.07339>. <https://arxiv.org/abs/1904.07339>.
[DCttH15] De Schepper, K., Bondarenko, O., Briscoe, B., and I. [DCttH19] De Schepper, K., Bondarenko, O., Tilmans, O., and B.
Tsang, "`Data Centre to the Home': Ultra-Low Latency for Briscoe, "`Data Centre to the Home': Ultra-Low Latency for
All", RITE project Technical Report , 2015, All", Updated RITE project Technical Report , July 2019,
<http://riteproject.eu/publications/>. <https://bobbriscoe.net/pubs.html#DCttH_TR>.
[DOCSIS3.1] [DOCSIS3.1]
CableLabs, "MAC and Upper Layer Protocols Interface CableLabs, "MAC and Upper Layer Protocols Interface
(MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable (MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable
Service Interface Specifications DOCSIS® 3.1 Version i17 Service Interface Specifications DOCSIS® 3.1 Version i17
or later, 21 January 2019, <https://specification- or later, 21 January 2019, <https://specification-
search.cablelabs.com/CM-SP-MULPIv3.1>. search.cablelabs.com/CM-SP-MULPIv3.1>.
[DualPI2Linux] [DualPI2Linux]
Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O., Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O.,
skipping to change at page 29, line 5 skipping to change at page 29, line 5
<https://www.netdevconf.org/0x13/session.html?talk- <https://www.netdevconf.org/0x13/session.html?talk-
DUALPI2-AQM>. DUALPI2-AQM>.
[DualQ-Test] [DualQ-Test]
Steen, H., "Destruction Testing: Ultra-Low Delay using Steen, H., "Destruction Testing: Ultra-Low Delay using
Dual Queue Coupled Active Queue Management", Masters Dual Queue Coupled Active Queue Management", Masters
Thesis, Dept of Informatics, Uni Oslo , May 2017, Thesis, Dept of Informatics, Uni Oslo , May 2017,
<https://www.duo.uio.no/bitstream/handle/10852/57424/ <https://www.duo.uio.no/bitstream/handle/10852/57424/
thesis-henrste.pdf?sequence=1>. thesis-henrste.pdf?sequence=1>.
[Heist21] Heist, P. and J. Morton, "L4S Tests", github README,
August 2021, <https://github.com/heistp/l4s-
tests/#underutilization-with-bursty-traffic>.
[I-D.briscoe-docsis-q-protection] [I-D.briscoe-docsis-q-protection]
Briscoe, B. and G. White, "Queue Protection to Preserve Briscoe, B. and G. White, "Queue Protection to Preserve
Low Latency", Work in Progress, Internet-Draft, draft- Low Latency", Work in Progress, Internet-Draft, draft-
briscoe-docsis-q-protection-00, 8 July 2019, briscoe-docsis-q-protection-00, 8 July 2019,
<https://datatracker.ietf.org/doc/html/draft-briscoe- <https://datatracker.ietf.org/doc/html/draft-briscoe-
docsis-q-protection-00>. docsis-q-protection-00>.
[I-D.briscoe-iccrg-prague-congestion-control] [I-D.briscoe-iccrg-prague-congestion-control]
Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague
Congestion Control", Work in Progress, Internet-Draft, Congestion Control", Work in Progress, Internet-Draft,
skipping to change at page 29, line 42 skipping to change at page 29, line 46
cardwell-iccrg-bbr-congestion-control-00>. cardwell-iccrg-bbr-congestion-control-00>.
[I-D.ietf-tsvwg-l4s-arch] [I-D.ietf-tsvwg-l4s-arch]
Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White, Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White,
"Low Latency, Low Loss, Scalable Throughput (L4S) Internet "Low Latency, Low Loss, Scalable Throughput (L4S) Internet
Service: Architecture", Work in Progress, Internet-Draft, Service: Architecture", Work in Progress, Internet-Draft,
draft-ietf-tsvwg-l4s-arch-10, 1 July 2021, draft-ietf-tsvwg-l4s-arch-10, 1 July 2021,
<https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg- <https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-
l4s-arch-10>. l4s-arch-10>.
[I-D.ietf-tsvwg-nqb]
White, G. and T. Fossati, "A Non-Queue-Building Per-Hop
Behavior (NQB PHB) for Differentiated Services", Work in
Progress, Internet-Draft, draft-ietf-tsvwg-nqb-07, 28 July
2021, <https://datatracker.ietf.org/doc/html/draft-ietf-
tsvwg-nqb-07>.
[L4Sdemo16] [L4Sdemo16]
Bondarenko, O., De Schepper, K., Tsang, I., and B. Bondarenko, O., De Schepper, K., Tsang, I., and B.
Briscoe, "Ultra-Low Delay for All: Live Experience, Live Briscoe, "Ultra-Low Delay for All: Live Experience, Live
Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016,
<http://dl.acm.org/citation.cfm?doid=2910017.2910633 <http://dl.acm.org/citation.cfm?doid=2910017.2910633
(videos of demos: (videos of demos:
https://riteproject.eu/dctth/#1511dispatchwg )>. https://riteproject.eu/dctth/#1511dispatchwg )>.
[L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Östberg, C.,
Johansson, I., Strand, J., Lédl, P., and D. Schnieders,
"Enabling time-critical applications over 5G with rate
adaptation", Ericsson - Deutsche Telekom White Paper BNEW-
21:025455 Uen, May 2021, <https://www.ericsson.com/en/
reports-and-papers/white-papers/enabling-time-critical-
applications-over-5g-with-rate-adaptation>.
[Labovitz10] [Labovitz10]
Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide,
J., and F. Jahanian, "Internet Inter-Domain Traffic", Proc J., and F. Jahanian, "Internet Inter-Domain Traffic", Proc
ACM SIGCOMM; ACM CCR 40(4):75--86, August 2010, ACM SIGCOMM; ACM CCR 40(4):75--86, August 2010,
<https://doi.org/10.1145/1851275.1851194>. <https://doi.org/10.1145/1851275.1851194>.
[LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency [LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency
DOCSIS: Technology Overview", CableLabs White Paper , DOCSIS: Technology Overview", CableLabs White Paper ,
February 2019, <https://cablela.bs/low-latency-docsis- February 2019, <https://cablela.bs/low-latency-docsis-
technology-overview-february-2019>. technology-overview-february-2019>.
skipping to change at page 31, line 13 skipping to change at page 31, line 17
<https://www.rfc-editor.org/info/rfc970>. <https://www.rfc-editor.org/info/rfc970>.
[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
S., Wroclawski, J., and L. Zhang, "Recommendations on S., Wroclawski, J., and L. Zhang, "Recommendations on
Queue Management and Congestion Avoidance in the Queue Management and Congestion Avoidance in the
Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998, Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998,
<https://www.rfc-editor.org/info/rfc2309>. <https://www.rfc-editor.org/info/rfc2309>.
[RFC2914] Floyd, S., "Congestion Control Principles", BCP 41,
RFC 2914, DOI 10.17487/RFC2914, September 2000,
<https://www.rfc-editor.org/info/rfc2914>.
[RFC3246] Davie, B., Charny, A., Bennet, J.C.R., Benson, K., Le [RFC3246] Davie, B., Charny, A., Bennet, J.C.R., Benson, K., Le
Boudec, J.Y., Courtney, W., Davari, S., Firoiu, V., and D. Boudec, J.Y., Courtney, W., Davari, S., Firoiu, V., and D.
Stiliadis, "An Expedited Forwarding PHB (Per-Hop Stiliadis, "An Expedited Forwarding PHB (Per-Hop
Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002,
<https://www.rfc-editor.org/info/rfc3246>. <https://www.rfc-editor.org/info/rfc3246>.
[RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", [RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows",
RFC 3649, DOI 10.17487/RFC3649, December 2003, RFC 3649, DOI 10.17487/RFC3649, December 2003,
<https://www.rfc-editor.org/info/rfc3649>. <https://www.rfc-editor.org/info/rfc3649>.
skipping to change at page 34, line 8 skipping to change at page 34, line 18
* The base AQM function that implements the PI algorithm * The base AQM function that implements the PI algorithm
dualpi2_update(lq, cq) (Figure 6) used to regularly update the dualpi2_update(lq, cq) (Figure 6) used to regularly update the
base probability (p'), which is squared for the Classic AQM as base probability (p'), which is squared for the Classic AQM as
well as being coupled across to the L4S queue. well as being coupled across to the L4S queue.
It also uses the following functions that are not shown in full here: It also uses the following functions that are not shown in full here:
* scheduler(), which selects between the head packets of the two * scheduler(), which selects between the head packets of the two
queues; the choice of scheduler technology is discussed later; queues; the choice of scheduler technology is discussed later;
* cq.len() or lq.len() returns the current length (aka. backlog) of * cq.byt() or lq.byt() returns the current length (aka. backlog) of
the relevant queue in bytes; the relevant queue in bytes;
* cq.len() or lq.len() returns the current length of the relevant
queue in packets;
* cq.time() or lq.time() returns the current queuing delay * cq.time() or lq.time() returns the current queuing delay
(aka. sojourn time or service time) of the relevant queue in units (aka. sojourn time or service time) of the relevant queue in units
of time (see Note a); of time (see Note a);
* mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; * mark(pkt) and drop(pkt) for ECN-marking and dropping a packet;
In experiments so far (building on experiments with PIE) on broadband In experiments so far (building on experiments with PIE) on broadband
access links ranging from 4 Mb/s to 200 Mb/s with base RTTs from 5 ms access links ranging from 4 Mb/s to 200 Mb/s with base RTTs from 5 ms
to 100 ms, DualPI2 achieves good results with the default parameters to 100 ms, DualPI2 achieves good results with the default parameters
in Figure 2. The parameters are categorised by whether they relate in Figure 2. The parameters are categorised by whether they relate
skipping to change at page 35, line 12 skipping to change at page 35, line 12
sensible values can be used in scenarios other than the regular sensible values can be used in scenarios other than the regular
public Internet. public Internet.
1: dualpi2_params_init(...) { % Set input parameter defaults 1: dualpi2_params_init(...) { % Set input parameter defaults
2: % DualQ Coupled framework parameters 2: % DualQ Coupled framework parameters
5: limit = MAX_LINK_RATE * 250 ms % Dual buffer size 5: limit = MAX_LINK_RATE * 250 ms % Dual buffer size
3: k = 2 % Coupling factor 3: k = 2 % Coupling factor
4: % NOT SHOWN % scheduler-dependent weight or equival't parameter 4: % NOT SHOWN % scheduler-dependent weight or equival't parameter
6: 6:
7: % PI2 Classic AQM parameters 7: % PI2 Classic AQM parameters
8: % Typical RTT, RTT_typ = 34 ms 8: target = 15 ms % Queue delay target
9: target = 15 ms % Queue delay target = RTT_typ * 0.22 * 2 9: RTT_max = 100 ms % Worst case RTT expected
10: RTT_max = 100 ms % Worst case RTT expected 10: % PI2 constants derived from above PI2 parameters
11: % PI2 constants derived from above PI2 parameters 11: p_Cmax = min(1/k^2, 1) % Max Classic drop/mark prob
12: p_Cmax = min(1/k^2, 1) % Max Classic drop/mark prob 12: Tupdate = min(target, RTT_max/3) % PI sampling interval
13: Tupdate = min(target, RTT_max/3) % PI sampling interval 13: alpha = 0.1 * Tupdate / RTT_max^2 % PI integral gain in Hz
14: alpha = 0.1 * Tupdate / RTT_max^2 % PI integral gain in Hz 14: beta = 0.3 / RTT_max % PI proportional gain in Hz
15: beta = 0.3 / RTT_max % PI proportional gain in Hz 15:
16: 16: % L4S ramp AQM parameters
17: % L4S ramp AQM parameters 17: minTh = 800 us % L4S min marking threshold in time units
18: minTh = 800 us % L4S min marking threshold in time units 18: range = 400 us % Range of L4S ramp in time units
19: range = 400 us % Range of L4S ramp in time units 19: Th_len = 1 pkt % Min L4S marking threshold in packets
20: Th_len = 2 * MTU % Min L4S marking threshold in bytes 20: % L4S constants
21: % L4S constants incl. those derived from other parameters 21: p_Lmax = 1 % Max L4S marking prob
22: p_Lmax = 1 % Max L4S marking prob 22: }
23: floor = Th_len / MIN_LINK_RATE
24: if (minTh < floor) {
25: % Shift ramp so minTh >= serialization time of 2 MTU
26: minTh = floor
27: }
28: maxTh = minTh+range % L4S max marking threshold in time units
29: }
Figure 2: Example Header Pseudocode for DualQ Coupled PI2 AQM Figure 2: Example Header Pseudocode for DualQ Coupled PI2 AQM
The overall goal of the code is to maintain the base probability (p', The overall goal of the code is to apply the marking and dropping
p-prime as in Section 2.4), which is an internal variable from which probabilities for L4S and Classic traffic (p_L and p_C). These are
the marking and dropping probabilities for L4S and Classic traffic derived from the underlying base probabilities p'_L and p' driven
(p_L and p_C) are derived, with p_L in turn being derived from p_CL. respectively by the traffic in the L and C queues. The marking
probability for the L queue (p_L) depends on both the base
probability in its own queue (p'_L) and a probability called p_CL,
which is coupled across from p' in the C queue (see Section 2.4 for
the derivation of the specific equations and dependencies).
The probabilities p_CL and p_C are derived in lines 4 and 5 of the The probabilities p_CL and p_C are derived in lines 4 and 5 of the
dualpi2_update() function (Figure 6) then used in the dualpi2_update() function (Figure 6) then used in the
dualpi2_dequeue() function where p_L is also derived from p_CL at dualpi2_dequeue() function where p_L is also derived from p_CL at
line 6 (Figure 4). The code walk-through below builds up to line 6 (Figure 4). The code walk-through below builds up to
explaining that part of the code eventually, but it starts from explaining that part of the code eventually, but it starts from
packet arrival. packet arrival.
1: dualpi2_enqueue(lq, cq, pkt) { % Test limit and classify lq or cq 1: dualpi2_enqueue(lq, cq, pkt) { % Test limit and classify lq or cq
2: if ( lq.len() + cq.len() + MTU > limit) 2: if ( lq.byt() + cq.byt() + MTU > limit)
3: drop(pkt) % drop packet if buffer is full 3: drop(pkt) % drop packet if buffer is full
4: timestamp(pkt) % attach arrival time to packet 4: timestamp(pkt) % attach arrival time to packet
5: % Packet classifier 5: % Packet classifier
6: if ( ecn(pkt) modulo 2 == 1 ) % ECN bits = ECT(1) or CE 6: if ( ecn(pkt) modulo 2 == 1 ) % ECN bits = ECT(1) or CE
7: lq.enqueue(pkt) 7: lq.enqueue(pkt)
8: else % ECN bits = not-ECT or ECT(0) 8: else % ECN bits = not-ECT or ECT(0)
9: cq.enqueue(pkt) 9: cq.enqueue(pkt)
10: } 10: }
Figure 3: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM Figure 3: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM
1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues 1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues
2: while ( lq.len() + cq.len() > 0 ) { 2: while ( lq.byt() + cq.byt() > 0 ) {
3: if ( scheduler() == lq ) { 3: if ( scheduler() == lq ) {
4: lq.dequeue(pkt) % Scheduler chooses lq 4: lq.dequeue(pkt) % Scheduler chooses lq
5: p'_L = laqm(lq.time()) % Native L4S AQM 5: p'_L = laqm(lq.time()) && (lq.len() > Th_len) % Native LAQM
6: p_L = max(p'_L, p_CL) % Combining function 6: p_L = max(p'_L, p_CL) % Combining function
7: if ( recur(lq, p_L) ) % Linear marking 7: if ( recur(lq, p_L) ) % Linear marking
8: mark(pkt) 8: mark(pkt)
9: } else { 9: } else {
10: cq.dequeue(pkt) % Scheduler chooses cq 10: cq.dequeue(pkt) % Scheduler chooses cq
11: if ( recur(cq, p_C) ) { % probability p_C = p'^2 11: if ( recur(cq, p_C) ) { % probability p_C = p'^2
12: if ( ecn(pkt) == 0 ) { % if ECN field = not-ECT 12: if ( ecn(pkt) == 0 ) { % if ECN field = not-ECT
13: drop(pkt) % squared drop 13: drop(pkt) % squared drop
14: continue % continue to the top of the while loop 14: continue % continue to the top of the while loop
15: } 15: }
skipping to change at page 38, line 9 skipping to change at page 38, line 9
Line 6 calculates p_L as the maximum of the coupled L4S Line 6 calculates p_L as the maximum of the coupled L4S
probability p_CL and the probability from the native L4S AQM p'_L. probability p_CL and the probability from the native L4S AQM p'_L.
This implements the max() function shown in Figure 1 to couple the This implements the max() function shown in Figure 1 to couple the
outputs of the two AQMs together. Of the two probabilities input outputs of the two AQMs together. Of the two probabilities input
to p_L in line 6: to p_L in line 6:
- p'_L is calculated per packet in line 5 by the laqm() function - p'_L is calculated per packet in line 5 by the laqm() function
(see Figure 5), (see Figure 5),
- Whereas p_CL is maintained by the dualpi2_update() function - Whereas p_CL is maintained by the dualpi2_update() function
which runs every Tupdate (Tupdate is set in line 13 of which runs every Tupdate (Tupdate is set in line 12 of
Figure 2). Figure 2).
* If a Classic packet is scheduled, lines 10 to 17 drop or mark the * If a Classic packet is scheduled, lines 10 to 17 drop or mark the
packet with probability p_C. packet with probability p_C.
The Native L4S AQM algorithm (Figure 5) is a ramp function, similar The Native L4S AQM algorithm (Figure 5) is a ramp function, similar
to the RED algorithm, but simplified as follows: to the RED algorithm, but simplified as follows:
* The extent of the ramp is defined in units of queuing delay, not * The extent of the ramp is defined in units of queuing delay, not
bytes, so that configuration remains invariant as the queue bytes, so that configuration remains invariant as the queue
skipping to change at page 38, line 36 skipping to change at page 38, line 36
* The ramp rises linearly directly from 0 to 1, not to an * The ramp rises linearly directly from 0 to 1, not to an
intermediate value of p'_L as RED would, because there is no need intermediate value of p'_L as RED would, because there is no need
to keep ECN marking probability low. to keep ECN marking probability low.
* Marking does not have to be randomized. Determinism is used * Marking does not have to be randomized. Determinism is used
instead of randomness; to reduce the delay necessary to smooth out instead of randomness; to reduce the delay necessary to smooth out
the noise of randomness from the signal. the noise of randomness from the signal.
The ramp function requires two configuration parameters, the minimum The ramp function requires two configuration parameters, the minimum
threshold (minTh) and the width of the ramp (range), both in units of threshold (minTh) and the width of the ramp (range), both in units of
queuing time), as shown in lines 18 & 19 of the initialization queuing time, as shown in lines 17 & 18 of the initialization
function in Figure 2. The ramp function can be configured as a step function in Figure 2. The ramp function can be configured as a step
(see Note c). (see Note c).
Although the DCTCP paper [Alizadeh-stability] recommends an ECN Although the DCTCP paper [Alizadeh-stability] recommends an ECN
marking threshold of 0.17*RTT_typ, it also shows that the threshold marking threshold of 0.17*RTT_typ, it also shows that the threshold
can be much shallower with hardly any worse under-utilization of the can be much shallower with hardly any worse under-utilization of the
link (because the amplitude of DCTCP's sawteeth is so small). Based link (because the amplitude of DCTCP's sawteeth is so small). Based
on extensive experiments, for the public Internet the default minimum on extensive experiments, for the public Internet the default minimum
ECN marking threshold in Figure 2 is considered a good compromise, ECN marking threshold (target) in Figure 2 is considered a good
even though it is significantly smaller fraction of RTT_typ. compromise, even though it is significantly smaller fraction of
RTT_typ.
A minimum marking threshold parameter (Th_len) in transmission units A minimum marking threshold parameter (Th_len, default 1 packet) is
(default 2 MTU) is also necessary to ensure that the ramp does not also necessary to ensure that the ramp does not trigger excessive
trigger excessive marking on slow links. The code in lines 24-27 of marking on slow links. Where an implementation knows the link rate,
the initialization function (Figure 2) converts 2 MTU into time units it can set up this minimum at the time it is configured. For
and shifts the ramp so that the min threshold is no shallower than instance, it would divide 1 MTU by the link rate to convert it into a
this floor. serialization time, then if the lower threshold of the Native L AQM
ramp was lower than this serialization time, it could increase the
thresholds to shift the bottom of the ramp to 2 MTU. This is the
approach used in DOCSIS [DOCSIS3.1], because the configured link rate
is dedicated to the DualQ.
In software implementations, as shown in the pseudocode, the link
rate might be shared with other queues. The second part of the
logical AND condition in Line 5 of Figure 4 caters for such cases.
Even if the outcome of the Native L4S AQM function, laqm(), is true,
it does not mark a packet unless the queue also exceeds 1 packet (but
see note later about the Linux implementation).
1: laqm(qdelay) { % Returns native L4S AQM probability 1: laqm(qdelay) { % Returns native L4S AQM probability
2: if (qdelay >= maxTh) 2: if (qdelay >= maxTh)
3: return 1 3: return 1
4: else if (qdelay > minTh) 4: else if (qdelay > minTh)
5: return (qdelay - minTh)/range % Divide could use a bit-shift 5: return (qdelay - minTh)/range % Divide could use a bit-shift
6: else 6: else
7: return 0 7: return 0
8: } 8: }
skipping to change at page 40, line 16 skipping to change at page 40, line 16
Integral (PI) controller that alters p' dependent on: a) the error Integral (PI) controller that alters p' dependent on: a) the error
between the current queuing delay (curq) and the target queuing between the current queuing delay (curq) and the target queuing
delay, 'target'; and b) the change in queuing delay since the last delay, 'target'; and b) the change in queuing delay since the last
sample. The name 'PI' represents the fact that the second factor sample. The name 'PI' represents the fact that the second factor
(how fast the queue is growing) is _P_roportional to load while the (how fast the queue is growing) is _P_roportional to load while the
first is the _I_ntegral of the load (so it removes any standing queue first is the _I_ntegral of the load (so it removes any standing queue
in excess of the target). in excess of the target).
The target parameter can be set based on local knowledge, but the aim The target parameter can be set based on local knowledge, but the aim
is for the default to be a good compromise for anywhere in the is for the default to be a good compromise for anywhere in the
intended deployment environment---the public Internet. The target intended deployment environment---the public Internet. According to
queuing delay is related to the typical base RTT, RTT_typ, by two [PI2param], the target queuing delay on line 9 of Figure 2 is related
factors, shown in the comment on line 9 of Figure 2 as target = to the typical base RTT worldwide, RTT_typ, by two factors: target =
RTT_typ * 0.22 * 2. These factors ensure that, in a large proportion RTT_typ * g * f. Below we summarize the rationale behind these
of cases (say 90%), the sawtooth variations in RTT will fit within factors and introduce a further adjustment. The two factors ensure
the buffer without underutilizing the link. Frankly, these factors that, in a large proportion of cases (say 90%), the sawtooth
are educated guesses, but with the emphasis closer to 'educated' than variations in RTT of a single flow will fit within the buffer without
to 'guess' (see [PI2param] for background investigations): underutilizing the link. Frankly, these factors are educated
guesses, but with the emphasis closer to 'educated' than to 'guess'
(see [PI2param] for full background):
* RTT_typ is taken as 34 ms. This is based on an average CDN * RTT_typ is taken as 25 ms. This is based on an average CDN
latency measured in each country weighted by the number of latency measured in each country weighted by the number of
Internet users in that country to produce an overall weighted Internet users in that country to produce an overall weighted
average for the Internet [PI2param]. average for the Internet [PI2param]. Countries were ranked by
number of Internet users, and once 90% of Internet users were
covered, smaller countries were excluded to avoid
unrepresentatively small sample sizes. Also, importantly, the
data for the average CDN latency in China (with the largest number
of Internet users) has been removed, because the CDN latency was a
significant outlier and, on reflection, the experimental technique
seemed inappropriate to the CDN market in China.
* The factor 0.22 is a geometry factor that characterizes the shape * g is taken as 0.38. The factor g is a geometry factor that
of the sawteeth of prevalent Classic congestion controllers. The characterizes the shape of the sawteeth of prevalent Classic
geometry factor is the difference between the minimum and the congestion controllers. The geometry factor is the fraction of
average queue delays of the sawteeth, relative to the base RTT. the amplitude of the sawtooth variability in queue delay that lies
For instance, the geometry factor of standard Reno is 0.5. below the AQM's target. For instance, at low bit rate, the
According to the census of congestion controllers conducted by geometry factor of standard Reno is 0.5, but at higher rates it
Mishra _et al_ in Jul-Oct 2019 [CCcensus19], most Classic TCP tends to just under 1. According to the census of congestion
traffic uses Cubic. And, according to the analysis in [PI2param], controllers conducted by Mishra _et al_ in Jul-Oct 2019
if running over a PI2 AQM, a large proportion of this Cubic [CCcensus19], most Classic TCP traffic uses Cubic. And, according
traffic would be in its Reno-Friendly mode, which has a geometry to the analysis in [PI2param], if running over a PI2 AQM, a large
factor of 0.21 (Linux implementation). The rest of the Cubic proportion of this Cubic traffic would be in its Reno-Friendly
traffic would be in true Cubic mode, which has a geometry factor mode, which has a geometry factor of ~0.39 (all known
of 0.32. Without modelling the sawtooth profiles from all the implementations). The rest of the Cubic traffic would be in true
other less prevalent congestion controllers, we estimate a 9:1 Cubic mode, which has a geometry factor of ~0.36. Without
weighted average of these two, resulting in an average geometry modelling the sawtooth profiles from all the other less prevalent
factor of 0.22. congestion controllers, we estimate a 7:3 weighted average of
these two, resulting in an average geometry factor of 0.38.
* The factor 2, is a safety factor that increases the target queue * f is taken as 2. The factor f is a safety factor that increases
to allow for the distribution of RTT_typ around its mean. the target queue to allow for the distribution of RTT_typ around
Otherwise the target queue would only avoid underutilization for its mean. Otherwise the target queue would only avoid
those users below the mean. It also provides a safety margin for underutilization for those users below the mean. It also provides
the proportion of paths in use that span beyond the distance a safety margin for the proportion of paths in use that span
between a user and their local CDN. Currently no data is beyond the distance between a user and their local CDN. Currently
available on the variance of queue delay around the mean in each no data is available on the variance of queue delay around the
region, so there is plenty of room for this guess to become more mean in each region, so there is plenty of room for this guess to
educated. become more educated.
* [PI2param] recommends target = RTT_typ * g * f = 25ms * 0.38 * 2 =
19 ms. However a further adjustment is warranted, because target
is moving year on year. The paper is based on data collected in
2019, and it mentions evidence from speedtest.net that suggests
RTT_typ reduced by 17% (fixed) or 12% (mobile) between 2020 and
2021. Therefore we recommend a default of target = 15 ms at the
time of writing (2021).
Operators can always use the data and discussion in [PI2param] to
configure a more appropriate target for their environment. For
instance, an operator might wish to question the assumptions called
out in that paper, such as the goal of no underutilization for a
large majority of single flow transfers (given many large transfers
use multiple flows to avoid the scaling limitations of Classic
flows).
The two 'gain factors' in line 3 of Figure 6, alpha and beta, The two 'gain factors' in line 3 of Figure 6, alpha and beta,
respectively weight how strongly each of the two elements (Integral respectively weight how strongly each of the two elements (Integral
and Proportional) alters p'. They are in units of 'per second of and Proportional) alters p'. They are in units of 'per second of
delay' or Hz, because they transform differences in queueing delay delay' or Hz, because they transform differences in queueing delay
into changes in probability (assuming probability has a value from 0 into changes in probability (assuming probability has a value from 0
to 1). to 1).
alpha and beta determine how much p' ought to change after each Alpha and beta determine how much p' ought to change after each
update interval (Tupdate). For smaller Tupdate, p' should change by update interval (Tupdate). For smaller Tupdate, p' should change by
the same amount per second, but in finer more frequent steps. So the same amount per second, but in finer more frequent steps. So
alpha depends on Tupdate (see line 13 of the initialization function alpha depends on Tupdate (see line 13 of the initialization function
in Figure 2). It is best to update p' as frequently as possible, but in Figure 2). It is best to update p' as frequently as possible, but
Tupdate will probably be constrained by hardware performance. As Tupdate will probably be constrained by hardware performance. As
shown in line 13, the update interval should be frequent enough to shown in line 13, the update interval should be frequent enough to
update at least once in the time taken for the target queue to drain update at least once in the time taken for the target queue to drain
('target') as long as it updates at least three times per maximum ('target') as long as it updates at least three times per maximum
RTT. Tupdate defaults to 16 ms in the reference Linux implementation RTT. Tupdate defaults to 16 ms in the reference Linux implementation
because it has to be rounded to a multiple of 4 ms. For link rates because it has to be rounded to a multiple of 4 ms. For link rates
skipping to change at page 42, line 34 skipping to change at page 43, line 12
for the L4S queue is k*alpha (with defaults alpha = 0.16 Hz and k=2, for the L4S queue is k*alpha (with defaults alpha = 0.16 Hz and k=2,
effective L4S alpha = 0.32 Hz). effective L4S alpha = 0.32 Hz).
Unlike in PIE [RFC8033], alpha and beta do not need to be tuned every Unlike in PIE [RFC8033], alpha and beta do not need to be tuned every
Tupdate dependent on p'. Instead, in PI2, alpha and beta are Tupdate dependent on p'. Instead, in PI2, alpha and beta are
independent of p' because the squaring applied to Classic traffic independent of p' because the squaring applied to Classic traffic
tunes them inherently. This is explained in [PI2], which also tunes them inherently. This is explained in [PI2], which also
explains why this more principled approach removes the need for most explains why this more principled approach removes the need for most
of the heuristics that had to be added to PIE. of the heuristics that had to be added to PIE.
Nonetheless, an implementer might wish to add selected heuristics to Nonetheless, an implementer might wish to add selected details to
either AQM. For instance the Linux reference DualPI2 implementation either AQM. For instance the Linux reference DualPI2 implementation
includes the following: includes the following (not shown in the pseudocode above):
* Prior to enqueuing an L4S packet, if the L queue contains <2 * The check that the queue exceeds Th_len before marking with the
packets, the packet is flagged to suppress any native L4S AQM native L4S AQM is actually at enqueue, not dequeue, otherwise it
marking at dequeue (which depends on sojourn time); would exempt the last packet of a burst from being marked. The
result of the check is conveyed from enqueue to the dequeue
function via a boolean in the packet metadata.
* Classic and coupled marking or dropping (i.e. based on p_C and * Classic and coupled marking or dropping (i.e. based on p_C and
p_CL from the PI controller) is only applied to a packet if the p_CL from the PI controller) is not applied to a packet if the
respective queue length in bytes is > 2 MTU (prior to enqueuing respective queue length in bytes is < 2 MTU (prior to enqueuing
the packet or after dequeuing it, depending on whether the AQM is the packet or dequeuing it, depending on whether the AQM is
configured to be applied at enqueue or dequeue); configured to be applied at enqueue or dequeue);
* In the WRR scheduler, the 'credit' indicating which queue should * In the WRR scheduler, the 'credit' indicating which queue should
transmit is only changed if there are packets in both queues transmit is only changed if there are packets in both queues
(i.e. if there is actual resource contention). This means that a (i.e. if there is actual resource contention). This means that a
properly paced L flow might never be delayed by the WRR. The WRR properly paced L flow might never be delayed by the WRR. The WRR
credit is reset in favour of the L queue when the link is idle. credit is reset in favour of the L queue when the link is idle.
An implementer might also wish to add other heuristics, e.g. burst An implementer might also wish to add other heuristics, e.g. burst
protection [RFC8033] or enhanced burst protection [RFC8034]. protection [RFC8033] or enhanced burst protection [RFC8034].
skipping to change at page 45, line 12 skipping to change at page 46, line 6
is not overloaded, its delay will always be tiny compared to the is not overloaded, its delay will always be tiny compared to the
target Classic queue delay. So p_CL will be driven to zero, and the target Classic queue delay. So p_CL will be driven to zero, and the
L4S queue will naturally be governed solely by p'_L from the native L4S queue will naturally be governed solely by p'_L from the native
L4S AQM (lines 5 and 6 of the dequeue algorithm in Figure 7). But, L4S AQM (lines 5 and 6 of the dequeue algorithm in Figure 7). But,
if unresponsive L4S source(s) cause overload, the DualQ transitions if unresponsive L4S source(s) cause overload, the DualQ transitions
smoothly to L4S marking based on the PI algorithm. If overload smoothly to L4S marking based on the PI algorithm. If overload
increases further, it naturally transitions from marking to dropping increases further, it naturally transitions from marking to dropping
by the switch-over mechanism already described. by the switch-over mechanism already described.
1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues 1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues
2: while ( lq.len() + cq.len() > 0 ) { 2: while ( lq.byt() + cq.byt() > 0 ) {
3: if ( scheduler() == lq ) { 3: if ( scheduler() == lq ) {
4a: lq.dequeue(pkt) % L4S scheduled 4a: lq.dequeue(pkt) % L4S scheduled
4b: if ( p_CL < p_Lmax ) { % Check for overload saturation 4b: if ( p_CL < p_Lmax ) { % Check for overload saturation
5: p'_L = laqm(lq.time()) % Native L4S AQM 5: p'_L = laqm(lq.time()) && (lq.len()>Th_len) % Native LAQM
6: p_L = max(p'_L, p_CL) % Combining function 6: p_L = max(p'_L, p_CL) % Combining function
7: if ( recur(lq, p_L) ) % Linear marking 7: if ( recur(lq, p_L) %Linear marking
8a: mark(pkt) 8a: mark(pkt)
8b: } else { % overload saturation 8b: } else { % overload saturation
8c: if ( recur(lq, p_C) ) { % probability p_C = p'^2 8c: if ( recur(lq, p_C) ) { % probability p_C = p'^2
8e: drop(pkt) % revert to Classic drop due to overload 8e: drop(pkt) % revert to Classic drop due to overload
8f: continue % continue to the top of the while loop 8f: continue % continue to the top of the while loop
8g: } 8g: }
8h: if ( recur(lq, p_CL) ) % probability p_CL = k * p' 8h: if ( recur(lq, p_CL) ) % probability p_CL = k * p'
8i: mark(pkt) % linear marking of remaining packets 8i: mark(pkt) % linear marking of remaining packets
8j: } 8j: }
9: } else { 9: } else {
skipping to change at page 46, line 6 skipping to change at page 46, line 42
18: } 18: }
19: return(pkt) % return the packet and stop 19: return(pkt) % return the packet and stop
20: } 20: }
21: return(NULL) % no packet to dequeue 21: return(NULL) % no packet to dequeue
22: } 22: }
Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM
(Including Overload Code) (Including Overload Code)
1: dualpi2_update(lq, cq) { % Update p' every Tupdate 1: dualpi2_update(lq, cq) { % Update p' every Tupdate
2a: if ( cq.len() > 0 ) 2a: if ( cq.byt() > 0 )
2b: curq = cq.time() %use queuing time of first-in Classic packet 2b: curq = cq.time() %use queuing time of first-in Classic packet
2c: else % Classic queue empty 2c: else % Classic queue empty
2d: curq = lq.time() % use queuing time of first-in L4S packet 2d: curq = lq.time() % use queuing time of first-in L4S packet
3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) 3: p' = p' + alpha * (curq - target) + beta * (curq - prevq)
4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor 4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor
5: p_C = p'^2 % Classic prob = (base prob)^2 5: p_C = p'^2 % Classic prob = (base prob)^2
6: prevq = curq 6: prevq = curq
7: } 7: }
Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM
(Including Overload Code) (Including Overload Code)
The choice of scheduler technology is critical to overload protection The choice of scheduler technology is critical to overload protection
(see Section 4.1). (see Section 4.1).
* A well-understood weighted scheduler such as weighted round robin * A well-understood weighted scheduler such as weighted round robin
(WRR) is recommended. As long as the scheduler weight for Classic (WRR) is recommended. As long as the scheduler weight for Classic
is small (e.g. 1/16), its exact value is unimportant because it is small (e.g. 1/16), its exact value is unimportant because it
does not normally determine capacity shares. The weight is only does not normally determine capacity shares. The weight is only
skipping to change at page 47, line 9 skipping to change at page 47, line 42
- TS-FIFO does not fully isolate latency in the L4S queue from - TS-FIFO does not fully isolate latency in the L4S queue from
uncontrolled bursts in the Classic queue; uncontrolled bursts in the Classic queue;
- TS-FIFO is only appropriate if time-stamping of packets is - TS-FIFO is only appropriate if time-stamping of packets is
feasible; feasible;
- Even if time-stamping is supported, the sojourn time of the - Even if time-stamping is supported, the sojourn time of the
head packet is always stale. For instance, if a burst arrives head packet is always stale. For instance, if a burst arrives
at an empty queue, the sojourn time will only measure the delay at an empty queue, the sojourn time will only measure the delay
of the burst once the burst is over, even though the queue knew of the burst once the burst is over, even though the queue knew
about it from the start. At the cost of more operations and about it from the start. To remedy this, each head packet can
more storage, a 'scaled sojourn time' metric of queue delay can be marked based on the delay it causes to packets backlogged
be used, which is the sojourn time of a packet scaled by the behind it, rather than based on its own delay due to the
ratio of the queue sizes when the packet departed and packets in front of it. [Heist21] identifies a specific
arrived [SigQ-Dyn]. scenario where bursty traffic significantly hits utilization of
the L queue. If this effect proves to be more widely
applicable, it is believed that using the delay behind the head
would improve performance. It can be implemented by
multiplying the backlog at dequeue by the serialization delay
per unit of backlog. The implementation details will depend on
whether the link rate is known; if it is not, a moving average
of the serialization delay can be maintained. This approach
should cost less in operations and memory than the proposed
'scaled sojourn time' metric, which is the sojourn time of a
packet scaled by the ratio of the queue sizes when the packet
departed and arrived [SigQ-Dyn].
* A strict priority scheduler would be inappropriate, because it * A strict priority scheduler would be inappropriate, because it
would starve Classic if L4S was overloaded. would starve Classic if L4S was overloaded.
Appendix B. Example DualQ Coupled Curvy RED Algorithm Appendix B. Example DualQ Coupled Curvy RED Algorithm
As another example of a DualQ Coupled AQM algorithm, the pseudocode As another example of a DualQ Coupled AQM algorithm, the pseudocode
below gives the Curvy RED based algorithm. Although the AQM was below gives the Curvy RED based algorithm. Although the AQM was
designed to be efficient in integer arithmetic, to aid understanding designed to be efficient in integer arithmetic, to aid understanding
it is first given using floating point arithmetic (Figure 10). Then, it is first given using floating point arithmetic (Figure 10). Then,
skipping to change at page 48, line 4 skipping to change at page 48, line 48
* The scheduling function scheduler(), which selects between the * The scheduling function scheduler(), which selects between the
head packets of the two queues. head packets of the two queues.
It also uses the following functions that are either shown elsewhere, It also uses the following functions that are either shown elsewhere,
or not shown in full here: or not shown in full here:
* The enqueue function, which is identical to that used for DualPI2, * The enqueue function, which is identical to that used for DualPI2,
dualpi2_enqueue(lq, cq, pkt) in Figure 3; dualpi2_enqueue(lq, cq, pkt) in Figure 3;
* mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; * mark(pkt) and drop(pkt) for ECN-marking and dropping a packet;
* cq.len() or lq.len() returns the current length (aka. backlog) of
* cq.byt() or lq.byt() returns the current length (aka. backlog) of
the relevant queue in bytes; the relevant queue in bytes;
* cq.time() or lq.time() returns the current queuing delay * cq.time() or lq.time() returns the current queuing delay
(aka. sojourn time or service time) of the relevant queue in units (aka. sojourn time or service time) of the relevant queue in units
of time (see Note a in Appendix A.1). of time (see Note a in Appendix A.1).
Because Curvy RED was evaluated before DualPI2, certain improvements Because Curvy RED was evaluated before DualPI2, certain improvements
introduced for DualPI2 were not evaluated for Curvy RED. In the introduced for DualPI2 were not evaluated for Curvy RED. In the
pseudocode below, the straightforward improvements have been added on pseudocode below, the straightforward improvements have been added on
the assumption they will provide similar benefits, but that has not the assumption they will provide similar benefits, but that has not
skipping to change at page 50, line 6 skipping to change at page 51, line 6
15: % L4S AQM parameters 15: % L4S AQM parameters
16: T = 1 ms % Queue delay threshold for native L4S AQM 16: T = 1 ms % Queue delay threshold for native L4S AQM
17: % Constants derived from above parameters 17: % Constants derived from above parameters
18: S_L = S_C - k' % L4S ramp scaling factor as a power of 2 18: S_L = S_C - k' % L4S ramp scaling factor as a power of 2
19: range_L = 2^S_L % Range of L4S ramp 19: range_L = 2^S_L % Range of L4S ramp
20: } 20: }
Figure 9: Example Header Pseudocode for DualQ Coupled Curvy RED AQM Figure 9: Example Header Pseudocode for DualQ Coupled Curvy RED AQM
1: cred_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues 1: cred_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues
2: while ( lq.len() + cq.len() > 0 ) { 2: while ( lq.byt() + cq.byt() > 0 ) {
3: if ( scheduler() == lq ) { 3: if ( scheduler() == lq ) {
4: lq.dequeue(pkt) % L4S scheduled 4: lq.dequeue(pkt) % L4S scheduled
5a: p_CL = (Q_C - minTh) / range_L 5a: p_CL = (Q_C - minTh) / range_L
5b: if ( ( lq.time() > T ) 5b: if ( ( lq.time() > T )
5c: OR ( p_CL > maxrand(U) ) ) 5c: OR ( p_CL > maxrand(U) ) )
6: mark(pkt) 6: mark(pkt)
7: } else { 7: } else {
8: cq.dequeue(pkt) % Classic scheduled 8: cq.dequeue(pkt) % Classic scheduled
9a: Q_C = gamma * cq.time() + (1-gamma) * Q_C % Classic Q EWMA 9a: Q_C = gamma * cq.time() + (1-gamma) * Q_C % Classic Q EWMA
10a: sqrt_p_C = (Q_C - minTh) / range_C 10a: sqrt_p_C = (Q_C - minTh) / range_C
skipping to change at page 52, line 40 skipping to change at page 53, line 40
Curvy RED constrains delay with a softened target that allows some Curvy RED constrains delay with a softened target that allows some
increase in delay as load increases. This is achieved by increasing increase in delay as load increases. This is achieved by increasing
drop probability on a convex curve relative to queue growth (the drop probability on a convex curve relative to queue growth (the
square curve in the Classic queue, if U=1). Like RED, the curve hugs square curve in the Classic queue, if U=1). Like RED, the curve hugs
the zero axis while the queue is shallow. Then, as load increases, the zero axis while the queue is shallow. Then, as load increases,
it introduces a growing barrier to higher delay. But, unlike RED, it it introduces a growing barrier to higher delay. But, unlike RED, it
requires only two parameters, not three. The disadvantage of Curvy requires only two parameters, not three. The disadvantage of Curvy
RED (compared to a PI controller for example) is that it is not RED (compared to a PI controller for example) is that it is not
adapted to a wide range of RTTs. Curvy RED can be used as is when adapted to a wide range of RTTs. Curvy RED can be used as is when
the RTT range to be supported is limited, otherwise an adaptation the RTT range to be supported is limited, otherwise an adaptation
mechanism is required. mechanism is needed.
From our limited experiments with Curvy RED so far, recommended From our limited experiments with Curvy RED so far, recommended
values of these parameters are: S_C = -1; g_C = 5; T = 5 * MTU at the values of these parameters are: S_C = -1; g_C = 5; T = 5 * MTU at the
link rate (about 1ms at 60Mb/s) for the range of base RTTs typical on link rate (about 1ms at 60Mb/s) for the range of base RTTs typical on
the public Internet. [CRED_Insights] explains why these parameters the public Internet. [CRED_Insights] explains why these parameters
are applicable whatever rate link this AQM implementation is deployed are applicable whatever rate link this AQM implementation is deployed
on and how the parameters would need to be adjusted for a scenario on and how the parameters would need to be adjusted for a scenario
with a different range of RTTs (e.g. a data centre). The setting of with a different range of RTTs (e.g. a data centre). The setting of
k depends on policy (see Section 2.5 and Appendix C.2 respectively k depends on policy (see Section 2.5 and Appendix C.2 respectively
for its recommended setting and guidance on alternatives). for its recommended setting and guidance on alternatives).
skipping to change at page 54, line 24 skipping to change at page 55, line 24
The maxrand(u) function in lines 16-21 simply generates u random The maxrand(u) function in lines 16-21 simply generates u random
numbers and returns the maximum. Typically, maxrand(u) could be run numbers and returns the maximum. Typically, maxrand(u) could be run
in parallel out of band. For instance, if U=1, the Classic queue in parallel out of band. For instance, if U=1, the Classic queue
would require the maximum of two random numbers. So, instead of would require the maximum of two random numbers. So, instead of
calling maxrand(2*U) in-band, the maximum of every pair of values calling maxrand(2*U) in-band, the maximum of every pair of values
from a pseudorandom number generator could be generated out-of-band, from a pseudorandom number generator could be generated out-of-band,
and held in a buffer ready for the Classic queue to consume. and held in a buffer ready for the Classic queue to consume.
1: cred_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues 1: cred_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues
2: while ( lq.len() + cq.len() > 0 ) { 2: while ( lq.byt() + cq.byt() > 0 ) {
3: if ( scheduler() == lq ) { 3: if ( scheduler() == lq ) {
4: lq.dequeue(pkt) % L4S scheduled 4: lq.dequeue(pkt) % L4S scheduled
5: if ((lq.time() > T) OR (Q_C >> (S_L-2) > maxrand(U))) 5: if ((lq.time() > T) OR (Q_C >> (S_L-2) > maxrand(U)))
6: mark(pkt) 6: mark(pkt)
7: } else { 7: } else {
8: cq.dequeue(pkt) % Classic scheduled 8: cq.dequeue(pkt) % Classic scheduled
9: Q_C += (qc.ns() - Q_C) >> g_C % Classic Q EWMA 9: Q_C += (qc.ns() - Q_C) >> g_C % Classic Q EWMA
10: if ( (Q_C >> (S_C-2) ) > maxrand(2*U) ) { 10: if ( (Q_C >> (S_C-2) ) > maxrand(2*U) ) {
11: if ( (ecn(pkt) == 0) { % ECN field = not-ECT 11: if ( (ecn(pkt) == 0) { % ECN field = not-ECT
12: drop(pkt) % Squared drop, redo loop 12: drop(pkt) % Squared drop, redo loop
skipping to change at page 55, line 30 skipping to change at page 56, line 30
integer code the division needed to weight the moving average can be integer code the division needed to weight the moving average can be
implemented by a right bit-shift (>> g_C). implemented by a right bit-shift (>> g_C).
Appendix C. Choice of Coupling Factor, k Appendix C. Choice of Coupling Factor, k
C.1. RTT-Dependence C.1. RTT-Dependence
Where Classic flows compete for the same capacity, their relative Where Classic flows compete for the same capacity, their relative
flow rates depend not only on the congestion probability, but also on flow rates depend not only on the congestion probability, but also on
their end-to-end RTT (= base RTT + queue delay). The rates of their end-to-end RTT (= base RTT + queue delay). The rates of
competing Reno [RFC5681] flows are roughly inversely proportional to Reno [RFC5681] flows competing over an AQM are roughly inversely
their RTTs. Cubic exhibits similar RTT-dependence when in Reno- proportional to their RTTs. Cubic exhibits similar RTT-dependence
compatibility mode, but is less RTT-dependent otherwise. when in Reno-compatibility mode, but it is less RTT-dependent
otherwise.
Until the early experiments with the DualQ Coupled AQM, the Until the early experiments with the DualQ Coupled AQM, the
importance of the reasonably large Classic queue in mitigating RTT- importance of the reasonably large Classic queue in mitigating RTT-
dependence had not been appreciated. Appendix A.1.6 of dependence when the base RTT is low had not been appreciated.
[I-D.ietf-tsvwg-ecn-l4s-id] uses numerical examples to explain why Appendix A.1.6 of [I-D.ietf-tsvwg-ecn-l4s-id] uses numerical examples
bloated buffers had concealed the RTT-dependence of Classic to explain why bloated buffers had concealed the RTT-dependence of
congestion controls before that time. Then it explains why, the more Classic congestion controls before that time. Then it explains why,
that queuing delays have reduced, the more that RTT-dependence has the more that queuing delays have reduced, the more that RTT-
surfaced as a potential starvation problem for long RTT flows. dependence has surfaced as a potential starvation problem for long
RTT flows, when competing against very short RTT flows.
Given that congestion control on end-systems is voluntary, there is Given that congestion control on end-systems is voluntary, there is
no reason why it has to be voluntarily RTT-dependent. Therefore no reason why it has to be voluntarily RTT-dependent. The RTT-
[I-D.ietf-tsvwg-ecn-l4s-id] requires L4S congestion controls to be dependence of existing Classic traffic cannot be 'undeployed'.
significantly less RTT-dependent than the standard Reno congestion Therefore, [I-D.ietf-tsvwg-ecn-l4s-id] requires L4S congestion
control [RFC5681]. Following this approach means there is no need controls to be significantly less RTT-dependent than the standard
for network devices to address RTT-dependence, although there would Reno congestion control [RFC5681], at least at low RTT. Then RTT-
be no harm if they did, which per-flow queuing inherently does. dependence ought to be no worse than it is with appropriately sized
Classic buffers. Following this approach means there is no need for
At the time of writing, the range of approaches to RTT-dependence in network devices to address RTT-dependence, although there would be no
L4S congestion controls has not settled. Therefore, the guidance on harm if they did, which per-flow queuing inherently does.
the choice of the coupling factor in Appendix C.2 is given against
DCTCP [RFC8257], which has well-understood RTT-dependence. The
guidance is given for various RTT ratios, so that it can be adapted
to future circumstances.
C.2. Guidance on Controlling Throughput Equivalence C.2. Guidance on Controlling Throughput Equivalence
+===============+======+=======+ The coupling factor, k, determines the balance between L4S and
| RTT_C / RTT_L | Reno | Cubic | Classic flow rates (see Section 2.5.2.1 and equation (1)).
+===============+======+=======+
| 1 | k'=1 | k'=0 |
+---------------+------+-------+
| 2 | k'=2 | k'=1 |
+---------------+------+-------+
| 3 | k'=2 | k'=2 |
+---------------+------+-------+
| 4 | k'=3 | k'=2 |
+---------------+------+-------+
| 5 | k'=3 | k'=3 |
+---------------+------+-------+
Table 1: Value of k' for For the public Internet, a coupling factor of k=2 is recommended, and
which DCTCP throughput is justified below. For scenarios other than the public Internet, a
roughly the same as Reno or good coupling factor can be derived by plugging the appropriate
Cubic, for some example RTT numbers into the same working.
ratios
In the above appendices that give example DualQ Coupled algorithms, To summarize the maths below, from equation (5) it can be seen that
to aid efficient implementation, a coupling factor that is an integer choosing k=1.64 will make L4S throughput roughly the same as Classic,
power of 2 is always used. k' is always used to denote the power. k' _if their actual end-to-end RTTs are the same_. However, even if the
is related to the coupling factor k in Equation (1) (Section 2.1) by base RTTs are the same, the actual RTTs are unlikely to be the same,
k=2^k'. because Classic traffic needs a fairly large queue to avoid under-
utilization and excess drop. Whereas L4S does not.
To determine the appropriate coupling factor policy, the operator Therefore, to determine the appropriate coupling factor policy, the
first has to judge whether it wants DCTCP flows to have roughly equal operator needs to decide at what base RTT it wants L4S and Classic
throughput with Reno or with Cubic (because, even in its Reno- flows to have roughly equal throughput, once the effect of the
compatibility mode, Cubic is about 1.4 times more aggressive than additional Classic queue on Classic throughput has been taken into
Reno). Then the operator needs to decide at what ratio of RTTs it account. With this approach, a network operator can determine a good
wants DCTCP and Classic flows to have roughly equal throughput. For coupling factor without knowing the precise L4S algorithm for
example choosing k'=0 (equivalent to k=1) will make DCTCP throughput reducing RTT-dependence - or even in the absence of any algorithm.
roughly the same as Cubic, _if their RTTs are the same_.
However, even if the base RTTs are the same, the actual RTTs are The following additional terminology will be used, with appropriate
unlikely to be the same, because Classic (Cubic or Reno) traffic subscripts:
needs roughly a typical base round trip of queue to avoid under-
utilization and excess drop. Whereas L4S (DCTCP) does not. The
operator might still choose this policy if it judges that DCTCP
throughput should be rewarded for keeping its own queue short.
On the other hand, the operator will choose one of the higher values r: Packet rate [pkt/s]
for k', if it wants to slow DCTCP down to roughly the same throughput
as Classic flows, to compensate for Classic flows slowing themselves
down by causing themselves extra queuing delay.
The values for k' in the table are derived from the formulae below, R: RTT [s/round]
which were developed in [DCttH15]:
2^k' = 1.64 (RTT_reno / RTT_dc) (5) p: ECN marking probability []
2^k' = 1.19 (RTT_cubic / RTT_dc ) (6)
For localized traffic from a particular ISP's data centre, using the On the Classic side, we consider Reno as the most sensitive and
measured RTTs, it was calculated that a value of k'=3 (equivalent to therefore worst case Classic congestion control, and we will also
k=8) would achieve throughput equivalence, and experiments verified consider Cubic in its Reno-friendly mode ('CReno'), as the most
the formula very closely. prevalent congestion control, according to the references and
analysis in [PI2param]. In either case, the Classic packet rate in
steady state is given by the well-known square root formula:
For a typical mix of RTTs from local data centres and across the r_C = 1.22 / (R_C * p_C^0.5)
general Internet, a value of k'=1 (equivalent to k=2) is recommended
as a good workable compromise. On the L4S side, we consider the Prague congestion control
[I-D.briscoe-iccrg-prague-congestion-control] as the reference for
steady-state dependence on congestion. Prague conforms to the same
equation as DCTCP, but we do not use the equation derived in the
DCTCP paper, which is only appropriate for step marking. The coupled
marking, p_CL, is the appropriate one when considering throughput
equivalence with Classic flows. Unlike step marking, coupled
markings are inherently spaced out, so we use the formula for DCTCP
packet rate with probabilistic marking derived in Appendix A of
[PI2]. We use the equation without RTT-independence enabled, which
will be explained later.
r_L = 2/ (R_L * p_CL)
For packet rate equivalence, we equate the two packet rates and
rearrange into the same form as Equation (1), so the two can be
equated and simplified to produce a formula for k:
r_c = r_L
=> p_C = (p_CL/1.64 * R_L/R_C)^2
p_C = ( p_CL / k )^2 (1)
k = 1.64 * (R_C / R_L) (5)
We now have the coupling factor in terms of two RTTs. Traditionally,
throughput equivalence is defined for flows under comparable
conditions, including with the same base RTT [RFC2914]. So if we
assume the same base RTT, R_b, for comparable flows, we can put both
R_C and R_L in terms of R_b.
We can approximate the L4S RTT to be hardly greater than the base
RTT, i.e. R_L ~= R_b. And next we replace R_C with (R_b + q_C),
where the Classic queue, q_C, depends on the target queue delay that
the operator has configured for the Classic AQM.
Taking PI2 as an example Classic AQM, it seems that we could just
take R_C = R_b + target (recommended 15 ms by default in
Appendix A.1). However, target is roughly the queue depth reached by
the tips of the sawteeth of a congestion control, not the average
[PI2param]. That is R_max = R_b + target.
The position of the average in relation to the max depends on the
amplitude of the sawteeth, so we will consider Reno [RFC5681] as the
most sensitive worst-case as well as Cubic [RFC8312] in its Reno-
friendly mode ('CReno') as the most prevalent congestion control
algorithm on the Internet according to [PI2param].
Both are AIMD, so we will generalize using b as the multiplicative
decrease factor (b_r = 0.5 for Reno, b_c = 0.7 for CReno). Then:
R_C = (R_max + b*R_max) / 2
= R_max * (1+b)/2
R_reno = 0.75 * (R_b + target); R_creno = 0.85 * (R_b + target).
Plugging all this into equation (5) we get coupling factor,
k_reno = 1.64*0.75*(R_b+target)/R_b
= 1.23*(1 + target/R_b); k_creno = 1.39 * (1 + target/R_b)
For instance, it is recommended that the operator chooses R_b = 25
ms, as a typical base RTT between Internet users and CDNs [PI2param].
Then:
k_reno = 1.23 * (1 + 15/25) k_creno = 1.39 * (1 + 15/25)
= 1.97 = 2.22
~= 2 ~= 2
The approximation is relevant to any of the above example DualQ
Coupled algorithms, which use a coupling factor that is an integer
power of 2 to aid efficient implementation.
An operator can make a policy choice to decide on a different base
RTT at which it wants throughput equivalence. Nonetheless, it makes
sense to choose what is believed to be the typical RTT most users
experience, because a Classic AQM's target queuing delay is also
derived from a typical RTT for the Internet. Therefore, below this
typical RTT, Classic AQMs become fairly RTT-independent. And L4S
flows are also required to become RTT-independent below a typical RTT
[I-D.ietf-tsvwg-ecn-l4s-id]. Therefore, throughput equivalence ought
to be no worse than with Classic AQMs and Classic congestion
controls.
As remarked earlier, the throughput equation used for Prague was with
RTT-independence disabled. This is because we only need the point on
this equation at the typical base RTT - where the operator chooses to
calculate the coupling factor. We do not need to know the full range
of the equation used for RTT-independence as long as it is roughly
the same at this one point. Then, there will at least be throughput
equivalence at that base RTT. And assuming Prague senders implement
RTT-independence over a range of RTTs, the throughput equivalence
will then extend over that range.
As a non-Internet example, for localized traffic from a particular
ISP's data centre, using the measured RTTs, it was calculated that a
value of k = 8 would achieve throughput equivalence, and experiments
verified the formula very closely.
But, for a typical mix of RTTs across the general Internet, a value
of k=2 is recommended as a good workable compromise.
Authors' Addresses Authors' Addresses
Koen De Schepper Koen De Schepper
Nokia Bell Labs Nokia Bell Labs
Antwerp Antwerp
Belgium Belgium
Email: koen.de_schepper@nokia.com Email: koen.de_schepper@nokia.com
URI: https://www.bell-labs.com/usr/koen.de_schepper URI: https://www.bell-labs.com/usr/koen.de_schepper
 End of changes. 82 change blocks. 
276 lines changed or deleted 429 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/