 1/draftietftsvwgaqmdualqcoupled18.txt 20211103 13:13:14.057579673 0700
+++ 2/draftietftsvwgaqmdualqcoupled19.txt 20211103 13:13:14.181582802 0700
@@ 1,61 +1,62 @@
Transport Area working group (tsvwg) K. De Schepper
InternetDraft Nokia Bell Labs
Intended status: Experimental B. Briscoe, Ed.
Expires: 28 April 2022 Independent
+Expires: 7 May 2022 Independent
G. White
CableLabs
 25 October 2021
+ 3 November 2021
DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput
(L4S)
 draftietftsvwgaqmdualqcoupled18
+ draftietftsvwgaqmdualqcoupled19
Abstract
This specification defines a framework for coupling the Active Queue
Management (AQM) algorithms in two queues intended for flows with
different responses to congestion. This provides a way for the
Internet to transition from the scaling problems of standard TCP
Renofriendly ('Classic') congestion controls to the family of
 'Scalable' congestion controls. These achieve consistently very Low
 queuing Latency, very Low congestion Loss and Scaling of perflow
 throughput (L4S) by using Explicit Congestion Notification (ECN) in a
 modified way. Until the Coupled DualQ, these L4S senders could only
 be deployed where a cleanslate environment could be arranged, such
 as in private data centres. The coupling acts like a semipermeable
 membrane: isolating the submillisecond average queuing delay and
 zero congestion loss of L4S from Classic latency and loss; but
 pooling the capacity between any combination of Scalable and Classic
 flows with roughly equivalent throughput per flow. The DualQ
 achieves this indirectly, without having to inspect transport layer
 flow identifiers and without compromising the performance of the
 Classic traffic. The solution has low complexity and requires no
 configuration for the public Internet.
+ 'Scalable' congestion controls. These are designed for consistently
+ very Low queuing Latency, very Low congestion Loss and Scaling of
+ perflow throughput (L4S) by using Explicit Congestion Notification
+ (ECN) in a modified way. Until the Coupled DualQ, these L4S senders
+ could only be deployed where a cleanslate environment could be
+ arranged, such as in private data centres. The coupling acts like a
+ semipermeable membrane: isolating the submillisecond average
+ queuing delay and zero congestion loss of L4S from Classic latency
+ and loss; but pooling the capacity between any combination of
+ Scalable and Classic flows with roughly equivalent throughput per
+ flow. The DualQ achieves this indirectly, without having to inspect
+ transport layer flow identifiers and without compromising the
+ performance of the Classic traffic, relative to a single queue. The
+ DualQ design has low complexity and requires no configuration for the
+ public Internet.
Status of This Memo
This InternetDraft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
InternetDrafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as InternetDrafts. The list of current Internet
Drafts is at https://datatracker.ietf.org/drafts/current/.
InternetDrafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use InternetDrafts as reference
material or to cite them other than as "work in progress."
 This InternetDraft will expire on 28 April 2022.
+ This InternetDraft will expire on 7 May 2022.
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
licenseinfo) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
@@ 67,55 +68,55 @@
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3
1.2. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7
1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9
2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11
2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11
2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12
 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12
+ 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 13
2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13
2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17
2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17
2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18
2.5.2. Management Requirements . . . . . . . . . . . . . . . 19
2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19
2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21
 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 21
+ 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 22
2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22
3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22
4. Security Considerations . . . . . . . . . . . . . . . . . . . 22
4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 22
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput
or Delay? . . . . . . . . . . . . . . . . . . . . . . 23
4.1.2. Congestion Signal Saturation: Introduce L4S Drop or
Delay? . . . . . . . . . . . . . . . . . . . . . . . 24
4.1.3. Protecting against Unresponsive ECNCapable
Traffic . . . . . . . . . . . . . . . . . . . . . . . 25
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26
6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26
 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 26
 7.1. Normative References . . . . . . . . . . . . . . . . . . 26
+ 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 27
+ 7.1. Normative References . . . . . . . . . . . . . . . . . . 27
7.2. Informative References . . . . . . . . . . . . . . . . . 27
Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 33
A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 33
A.2. Pass #2: Overload Details . . . . . . . . . . . . . . . . 44
Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 48
B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 48
B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 54
Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 56
C.1. RTTDependence . . . . . . . . . . . . . . . . . . . . . 56
C.2. Guidance on Controlling Throughput Equivalence . . . . . 57
 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 60
+ Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 61
1. Introduction
This document specifies a framework for DualQ Coupled AQMs, which is
the network part of the L4S architecture [ID.ietftsvwgl4sarch].
L4S enables both very low queuing latency (submillisecond on
average) and high throughput at the same time, for ad hoc numbers of
capacityseeking applications all sharing the same capacity.
1.1. Outline of the Problem
@@ 543,21 +544,22 @@
p_C = ( p_CL / k )^2 (1)
where k is the constant of proportionality, which is termed the
coupling factor.
2.2. Dual Queue
Classic traffic needs to build a large queue to prevent under
utilization. Therefore a separate queue is provided for L4S traffic,
and it is scheduled with priority over the Classic queue. Priority
 is conditional to prevent starvation of Classic traffic.
+ is conditional to prevent starvation of Classic traffic in certain
+ conditions (see Section 2.4).
Nonetheless, coupled marking ensures that giving priority to L4S
traffic still leaves the right amount of spare scheduling time for
Classic flows to each get equivalent throughput to DCTCP flows (all
other factors such as RTT being equal).
2.3. Traffic Classification
Both the Coupled AQM and DualQ mechanisms need an identifier to
distinguish L4S (L) and Classic (C) packets. Then the coupling
@@ 684,48 +686,57 @@
C queue. This is because, as the C queue grows, the base AQM applies
more congestion signals to L traffic (as well as C). As L flows
reduce their rate in response, they use less than the scheduling
share for L traffic. So, because the scheduler is work preserving,
it schedules any C traffic in the gaps.
Giving priority to the L queue has the benefit of very low L queue
delay, because the L queue is kept empty whenever L traffic is
controlled by the coupling. Also there only has to be a coupling in
one direction  from Classic to L4S. Priority has to be conditional
 in some way to prevent the C queue starving under overload conditions
 (see Section 4.1). With normal responsive traffic simple strict
 priority would work, but it would make new Classic traffic wait until
 its queue activated the coupling and L4S flows had in turn reduced
 their rate enough to drain the L queue so that Classic traffic could
 be scheduled. Giving a small weight or limited waiting time for C
 traffic improves response times for short Classic messages, such as
 DNS requests and improves Classic flow startup because immediate
 capacity is available.
+ in some way to prevent the C queue being starved by excessive
+ unresponsive L traffic (see Section 4.1) and to give C traffic a
+ means to push in, as explained next. With normal responsive L
+ traffic, the coupled ECN marking gives C traffic the ability to push
+ back against even strict priority, by congestion marking the L
+ traffic to make it yield some space. However, if there is just a
+ small finite set of C packets (e.g. a DNS request or an initial
+ window of data) some Classic AQMs will not induce enough ECN marking
+ in the L queue, no matter how long the small set of C packets waits.
+ Then, if the L queue happens to remain busy, the C traffic would
+ never get a scheduling opportunity from a strict priority scheduler.
+ Ideally the Classic AQM would be designed to increase the coupled
+ marking the longer that C packets have been waiting, but this is not
+ always practical  hence the need for L priority to be conditional.
+ Giving a small weight or limited waiting time for C traffic improves
+ response times for short Classic messages, such as DNS requests and
+ improves Classic flow startup because immediate capacity is
+ available.
Example DualQ Coupled AQM algorithms called DualPI2 and Curvy RED are
given in Appendix A and Appendix B. Either example AQM can be used
to couple packet marking and dropping across a dual Q.
DualPI2 uses a ProportionalIntegral (PI) controller as the Base AQM.
Indeed, this Base AQM with just the squared output and no L4S queue
can be used as a dropin replacement for PIE [RFC8033], in which case
it is just called PI2 [PI2]. PI2 is a principled simplification of
PIE that is both more responsive and more stable in the face of
dynamically varying load.
 Curvy RED is derived from RED [RFC2309], but its configuration
 parameters are insensitive to link rate and it requires less
 operations per packet. However, DualPI2 is more responsive and
 stable over a wider range of RTTs than Curvy RED. As a consequence,
 at the time of writing, DualPI2 has attracted more development and
 evaluation attention than Curvy RED, leaving the Curvy RED design
 incomplete and not so fully evaluated.
+ Curvy RED is derived from RED [RFC2309], except its configuration
+ parameters are delaybased to make them insensitive to link rate and
+ it requires less operations per packet. However, DualPI2 is more
+ responsive and stable over a wider range of RTTs than Curvy RED. As
+ a consequence, at the time of writing, DualPI2 has attracted more
+ development and evaluation attention than Curvy RED, leaving the
+ Curvy RED design not so fully evaluated.
Both AQMs regulate their queue in units of time rather than bytes.
As already explained, this ensures configuration can be invariant for
different drain rates. With AQMs in a dualQ structure this is
particularly important because the drain rate of each queue can vary
rapidly as flows for the two queues arrive and depart, even if the
combined link rate is constant.
It would be possible to control the queues with other alternative
AQMs, as long as the normative requirements (those expressed in
@@ 753,22 +764,23 @@
A Dual Queue Coupled AQM implementation MUST utilize two queues, each
with an AQM algorithm.
The AQM algorithm for the low latency (L) queue MUST be able to apply
ECN marking to ECNcapable packets.
The scheduler draining the two queues MUST give L4S packets priority
over Classic, although priority MUST be bounded in order not to
starve Classic traffic. The scheduler SHOULD be workconserving, or
 otherwise close to workconserving, given Classic service will often
 rely on borrowing from the L4S service.
+ otherwise close to workconserving. This is because Classic traffic
+ needs to be able to efficiently fill any space left by L4S traffic
+ even though the scheduler would otherwise allocate it to L4S.
[ID.ietftsvwgecnl4sid] defines the meaning of an ECN marking on
L4S traffic, relative to drop of Classic traffic. In order to ensure
coexistence of Classic and Scalable L4S traffic, it says, "The
likelihood that an AQM drops a NotECT Classic packet (p_C) MUST be
roughly proportional to the square of the likelihood that it would
have marked it if it had been an L4S packet (p_L)." The term
'likelihood' is used to allow for marking and dropping to be either
probabilistic or deterministic.
@@ 1015,24 +1028,25 @@
followed by proposed solution(s).
Under overload the higher priority L4S service will have to sacrifice
some aspect of its performance. Alternative solutions are provided
below that each relax a different factor: e.g. throughput, delay,
drop. These choices need to be made either by the developer or by
operator policy, rather than by the IETF.
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput or Delay?
 Priority of L4S is required to be conditional (see Section 2.5.1) to
 avoid total starvation of Classic by heavy L4S traffic. This raises
 the question of whether to sacrifice L4S throughput or L4S delay (or
 some other policy) to mitigate starvation of Classic:
+ Priority of L4S is required to be conditional (see Section 2.4 &
+ Section 2.5.1) to avoid total starvation of Classic by heavy L4S
+ traffic. This raises the question of whether to sacrifice L4S
+ throughput or L4S delay (or some other policy) to mitigate starvation
+ of Classic:
Sacrifice L4S throughput: By using weighted round robin as the
conditional priority scheduler, the L4S service can sacrifice some
throughput during overload. This can either be thought of as
guaranteeing a minimum throughput service for Classic traffic, or
as guaranteeing a maximum delay for a packet at the head of the
Classic queue.
The scheduling weight of the Classic queue should be small
(e.g. 1/16). Then, in most traffic scenarios the scheduler will
@@ 2153,38 +2166,46 @@
recommended over WRR):
 TSFIFO does not fully isolate latency in the L4S queue from
uncontrolled bursts in the Classic queue;
 TSFIFO is only appropriate if timestamping of packets is
feasible;
 Even if timestamping is supported, the sojourn time of the
head packet is always stale. For instance, if a burst arrives
 at an empty queue, the sojourn time will only measure the delay
 of the burst once the burst is over, even though the queue knew
 about it from the start. To remedy this, each head packet can
 be marked based on the delay it causes to packets backlogged
 behind it, rather than based on its own delay due to the
 packets in front of it. [Heist21] identifies a specific
 scenario where bursty traffic significantly hits utilization of
 the L queue. If this effect proves to be more widely
 applicable, it is believed that using the delay behind the head
 would improve performance. It can be implemented by
 multiplying the backlog at dequeue by the serialization delay
 per unit of backlog. The implementation details will depend on
 whether the link rate is known; if it is not, a moving average
 of the serialization delay can be maintained. This approach
 should cost less in operations and memory than the proposed
 'scaled sojourn time' metric, which is the sojourn time of a
 packet scaled by the ratio of the queue sizes when the packet
 departed and arrived [SigQDyn].
+ at an empty queue, the sojourn time only fully measures the
+ burst's delay when its last packet is dequeued, even though the
+ queue knew about the burst from the start  so it could have
+ signalled congestion earlier. To remedy this, each head packet
+ can be marked when it is dequeued based on the expected delay
+ of the tail packet behind it, as explained below, rather than
+ based on the head packet's own delay due to the packets in
+ front of it. [Heist21] identifies a specific scenario where
+ bursty traffic significantly hits utilization of the L queue.
+ If this effect proves to be more widely applicable, it is
+ believed that using the delay behind the head would improve
+ performance.
+
+ The delay behind the head can be implemented by dividing the
+ backlog at dequeue by the link rate or equivalently multiplying
+ the backlog by the delay per unit of backlog. The
+ implementation details will depend on whether the link rate is
+ known; if it is not, a moving average of the delay per unit
+ backlog can be maintained. This delay consists of
+ serialization as well as media acquisition for shared media.
+ So the details will depend strongly on the specific link
+ technology, This approach should be less sensitive to timing
+ errors and cost less in operations and memory than the
+ otherwise equivalent 'scaled sojourn time' metric, which is the
+ sojourn time of a packet scaled by the ratio of the queue sizes
+ when the packet departed and arrived [SigQDyn].
* A strict priority scheduler would be inappropriate, because it
would starve Classic if L4S was overloaded.
Appendix B. Example DualQ Coupled Curvy RED Algorithm
As another example of a DualQ Coupled AQM algorithm, the pseudocode
below gives the Curvy RED based algorithm. Although the AQM was
designed to be efficient in integer arithmetic, to aid understanding
it is first given using floating point arithmetic (Figure 10). Then,
@@ 2568,146 +2589,189 @@
C.2. Guidance on Controlling Throughput Equivalence
The coupling factor, k, determines the balance between L4S and
Classic flow rates (see Section 2.5.2.1 and equation (1)).
For the public Internet, a coupling factor of k=2 is recommended, and
justified below. For scenarios other than the public Internet, a
good coupling factor can be derived by plugging the appropriate
numbers into the same working.
 To summarize the maths below, from equation (5) it can be seen that
 choosing k=1.64 will make L4S throughput roughly the same as Classic,
 _if their actual endtoend RTTs are the same_. However, even if the
 base RTTs are the same, the actual RTTs are unlikely to be the same,
 because Classic traffic needs a fairly large queue to avoid under
 utilization and excess drop. Whereas L4S does not.
+ To summarize the maths below, from equation (7) it can be seen that
+ choosing k=1.64 would theoretically make L4S throughput roughly the
+ same as Classic, _if their actual endtoend RTTs were the same_.
+ However, even if the base RTTs are the same, the actual RTTs are
+ unlikely to be the same, because Classic traffic needs a fairly large
+ queue to avoid underutilization and excess drop. Whereas L4S does
+ not.
Therefore, to determine the appropriate coupling factor policy, the
operator needs to decide at what base RTT it wants L4S and Classic
flows to have roughly equal throughput, once the effect of the
additional Classic queue on Classic throughput has been taken into
account. With this approach, a network operator can determine a good
coupling factor without knowing the precise L4S algorithm for
reducing RTTdependence  or even in the absence of any algorithm.
The following additional terminology will be used, with appropriate
subscripts:
r: Packet rate [pkt/s]
R: RTT [s/round]
p: ECN marking probability []
On the Classic side, we consider Reno as the most sensitive and
 therefore worst case Classic congestion control, and we will also
+ therefore worstcase Classic congestion control. We will also
consider Cubic in its Renofriendly mode ('CReno'), as the most
prevalent congestion control, according to the references and
analysis in [PI2param]. In either case, the Classic packet rate in
 steady state is given by the wellknown square root formula:
+ steady state is given by the wellknown square root formula for Reno
+ congestion control:
 r_C = 1.22 / (R_C * p_C^0.5)
+ r_C = 1.22 / (R_C * p_C^0.5) (5)
On the L4S side, we consider the Prague congestion control
[ID.briscoeiccrgpraguecongestioncontrol] as the reference for
steadystate dependence on congestion. Prague conforms to the same
equation as DCTCP, but we do not use the equation derived in the
DCTCP paper, which is only appropriate for step marking. The coupled
marking, p_CL, is the appropriate one when considering throughput
equivalence with Classic flows. Unlike step marking, coupled
markings are inherently spaced out, so we use the formula for DCTCP
packet rate with probabilistic marking derived in Appendix A of
[PI2]. We use the equation without RTTindependence enabled, which
will be explained later.
 r_L = 2/ (R_L * p_CL)
+ r_L = 2 / (R_L * p_CL) (6)
For packet rate equivalence, we equate the two packet rates and
rearrange into the same form as Equation (1), so the two can be
 equated and simplified to produce a formula for k:
+ equated and simplified to produce a formula for a theoretical
+ coupling factor, which we shall call k*:
r_c = r_L
=> p_C = (p_CL/1.64 * R_L/R_C)^2
p_C = ( p_CL / k )^2 (1)
 k = 1.64 * (R_C / R_L) (5)
+ k* = 1.64 * (R_C / R_L) (7)
 We now have the coupling factor in terms of two RTTs. Traditionally,
 throughput equivalence is defined for flows under comparable
 conditions, including with the same base RTT [RFC2914]. So if we
 assume the same base RTT, R_b, for comparable flows, we can put both
 R_C and R_L in terms of R_b.
+ We say that this coupling factor is theoretical, because it is in
+ terms of two RTTs, which raises two practical questions: i) for
+ multiple flows with different RTTs, the RTT for each traffic class
+ would have to be derived from the RTTs of all the flows in that class
+ (actually the harmonic mean would be needed); ii) a network node
+ cannot easily know the RTT of any of the flows anyway.
+
+ RTTdependence is cuased by windowbased congestion control, so it
+ ought to be reversed there, not in the network, Therefore, we use a
+ fixed coupling factor in the network, and reduce RTTdependence in
+ L4S senders. We cannot expect Classic senders to all be updated to
+ reduce their RTTdependence. But solely addressing the problem in
+ L4S senders at least makes RTTdependence no worse  not just between
+ L4S senders, but also between L4S and Classic senders.
+
+ Traditionally, throughput equivalence has been defined for flows
+ under comparable conditions, including with the same base RTT
+ [RFC2914]. So if we assume the same base RTT, R_b, for comparable
+ flows, we can put both R_C and R_L in terms of R_b.
We can approximate the L4S RTT to be hardly greater than the base
 RTT, i.e. R_L ~= R_b. And next we replace R_C with (R_b + q_C),
 where the Classic queue, q_C, depends on the target queue delay that
 the operator has configured for the Classic AQM.
+ RTT, i.e. R_L ~= R_b. And we can replace R_C with (R_b + q_C), where
+ the Classic queue, q_C, depends on the target queue delay that the
+ operator has configured for the Classic AQM.
Taking PI2 as an example Classic AQM, it seems that we could just
take R_C = R_b + target (recommended 15 ms by default in
Appendix A.1). However, target is roughly the queue depth reached by
the tips of the sawteeth of a congestion control, not the average
[PI2param]. That is R_max = R_b + target.
The position of the average in relation to the max depends on the
 amplitude of the sawteeth, so we will consider Reno [RFC5681] as the
 most sensitive worstcase as well as Cubic [RFC8312] in its Reno
 friendly mode ('CReno') as the most prevalent congestion control
 algorithm on the Internet according to [PI2param].

 Both are AIMD, so we will generalize using b as the multiplicative
 decrease factor (b_r = 0.5 for Reno, b_c = 0.7 for CReno). Then:
+ amplitude and geometry of the sawteeth. We consider two examples:
+ Reno [RFC5681], as the most sensitive worstcase, and Cubic [RFC8312]
+ in its Renofriendly mode ('CReno') as the most prevalent congestion
+ control algorithm on the Internet according to the references in
+ [PI2param]. Both are AIMD, so we will generalize using b as the
+ multiplicative decrease factor (b_r = 0.5 for Reno, b_c = 0.7 for
+ CReno). Then:
R_C = (R_max + b*R_max) / 2
= R_max * (1+b)/2
R_reno = 0.75 * (R_b + target); R_creno = 0.85 * (R_b + target).
+ (8)
 Plugging all this into equation (5) we get coupling factor,
+ Plugging all this into equation (7) we get a fixed coupling factor
+ for each:
k_reno = 1.64*0.75*(R_b+target)/R_b
= 1.23*(1 + target/R_b); k_creno = 1.39 * (1 + target/R_b)
 For instance, it is recommended that the operator chooses R_b = 25
 ms, as a typical base RTT between Internet users and CDNs [PI2param].
 Then:
+ An operator can then choose the base RTT at which it wants throughput
+ to be equivalent. For instance, if we recommend that the operator
+ chooses R_b = 25 ms, as a typical base RTT between Internet users and
+ CDNs [PI2param], then these coupling factors become:
k_reno = 1.23 * (1 + 15/25) k_creno = 1.39 * (1 + 15/25)
= 1.97 = 2.22
 ~= 2 ~= 2
+ ~= 2 ~= 2 (9)
The approximation is relevant to any of the above example DualQ
Coupled algorithms, which use a coupling factor that is an integer
 power of 2 to aid efficient implementation.
+ power of 2 to aid efficient implementation. It also fits best to the
+ worst case (Reno).
 An operator can make a policy choice to decide on a different base
 RTT at which it wants throughput equivalence. Nonetheless, it makes
+ To check the outcome of this coupling factor, we can express the
+ ratio of L4S to Classic throughput by substituting from their rate
+ equations (5) and (6), then also substituting for p_C in terms of
+ p_CL, using equation (1) with k=2 as just determined for the
+ Internet:
+
+ r_L / r_C = 2 (R_C * p_C^0.5) / 1.22 (R_L * p_CL)
+ = (R_C * p_CL) / (1.22 * R_L * p_CL)
+ = R_C / (1.22 * R_L) (10)
+
+ As an example, we can then consider single competing CReno and Prague
+ flows, by expressing both their RTTs in (10) in terms of their base
+ RTTs, R_bC and R_bL. So R_C is replaced by equation (8) for CReno.
+ And R_L is replaced by the max() function below, which represents the
+ effective RTT of the current Prague congestion control
+ [ID.briscoeiccrgpraguecongestioncontrol] in its (default) RTT
+ independent mode, because it sets a floor to the effective RTT that
+ it uses for additive increase:
+
+ ~= 0.85 * (R_bC + target) / (1.22 * max(R_bL, R_typ))
+ ~= (R_bC + target) / (1.4 * max(R_bL, R_typ))
+
+ It can be seen that, for base RTTs below target (15 ms), both the
+ numerator and the denominator plateau, which has the desired effect
+ of limiting RTTdependence.
+
+ At the start of the above derivations, an explanation was promised
+ for why the L4S throughput equation in equation (6) did not need to
+ model RTTindependence. This is because we only use one point  at
+ the the typical base RTT where the operator chooses to calculate the
+ coupling factor. Then, throughput equivalence will at least hold at
+ that chosen point. Nonetheless, assuming Prague senders implement
+ RTTindependence over a range of RTTs below this, the throughput
+ equivalence will then extend over that range as well.
+
+ Congestion control designers can choose different ways to reduce RTT
+ dependence. And each operator can make a policy choice to decide on
+ a different base RTT, and therefore a different k, at which it wants
+ throughput equivalence. Nonetheless, for the Internet, it makes
sense to choose what is believed to be the typical RTT most users
experience, because a Classic AQM's target queuing delay is also
 derived from a typical RTT for the Internet. Therefore, below this
 typical RTT, Classic AQMs become fairly RTTindependent. And L4S
 flows are also required to become RTTindependent below a typical RTT
 [ID.ietftsvwgecnl4sid]. Therefore, throughput equivalence ought
 to be no worse than with Classic AQMs and Classic congestion
 controls.

 As remarked earlier, the throughput equation used for Prague was with
 RTTindependence disabled. This is because we only need the point on
 this equation at the typical base RTT  where the operator chooses to
 calculate the coupling factor. We do not need to know the full range
 of the equation used for RTTindependence as long as it is roughly
 the same at this one point. Then, there will at least be throughput
 equivalence at that base RTT. And assuming Prague senders implement
 RTTindependence over a range of RTTs, the throughput equivalence
 will then extend over that range.
+ derived from a typical RTT for the Internet.
As a nonInternet example, for localized traffic from a particular
ISP's data centre, using the measured RTTs, it was calculated that a
value of k = 8 would achieve throughput equivalence, and experiments
verified the formula very closely.
But, for a typical mix of RTTs across the general Internet, a value
of k=2 is recommended as a good workable compromise.
Authors' Addresses