draft-ietf-tsvwg-aqm-dualq-coupled-18.txt   draft-ietf-tsvwg-aqm-dualq-coupled-19.txt 
Transport Area working group (tsvwg) K. De Schepper Transport Area working group (tsvwg) K. De Schepper
Internet-Draft Nokia Bell Labs Internet-Draft Nokia Bell Labs
Intended status: Experimental B. Briscoe, Ed. Intended status: Experimental B. Briscoe, Ed.
Expires: 28 April 2022 Independent Expires: 7 May 2022 Independent
G. White G. White
CableLabs CableLabs
25 October 2021 3 November 2021
DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput
(L4S) (L4S)
draft-ietf-tsvwg-aqm-dualq-coupled-18 draft-ietf-tsvwg-aqm-dualq-coupled-19
Abstract Abstract
This specification defines a framework for coupling the Active Queue This specification defines a framework for coupling the Active Queue
Management (AQM) algorithms in two queues intended for flows with Management (AQM) algorithms in two queues intended for flows with
different responses to congestion. This provides a way for the different responses to congestion. This provides a way for the
Internet to transition from the scaling problems of standard TCP Internet to transition from the scaling problems of standard TCP
Reno-friendly ('Classic') congestion controls to the family of Reno-friendly ('Classic') congestion controls to the family of
'Scalable' congestion controls. These achieve consistently very Low 'Scalable' congestion controls. These are designed for consistently
queuing Latency, very Low congestion Loss and Scaling of per-flow very Low queuing Latency, very Low congestion Loss and Scaling of
throughput (L4S) by using Explicit Congestion Notification (ECN) in a per-flow throughput (L4S) by using Explicit Congestion Notification
modified way. Until the Coupled DualQ, these L4S senders could only (ECN) in a modified way. Until the Coupled DualQ, these L4S senders
be deployed where a clean-slate environment could be arranged, such could only be deployed where a clean-slate environment could be
as in private data centres. The coupling acts like a semi-permeable arranged, such as in private data centres. The coupling acts like a
membrane: isolating the sub-millisecond average queuing delay and semi-permeable membrane: isolating the sub-millisecond average
zero congestion loss of L4S from Classic latency and loss; but queuing delay and zero congestion loss of L4S from Classic latency
pooling the capacity between any combination of Scalable and Classic and loss; but pooling the capacity between any combination of
flows with roughly equivalent throughput per flow. The DualQ Scalable and Classic flows with roughly equivalent throughput per
achieves this indirectly, without having to inspect transport layer flow. The DualQ achieves this indirectly, without having to inspect
flow identifiers and without compromising the performance of the transport layer flow identifiers and without compromising the
Classic traffic. The solution has low complexity and requires no performance of the Classic traffic, relative to a single queue. The
configuration for the public Internet. DualQ design has low complexity and requires no configuration for the
public Internet.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 28 April 2022. This Internet-Draft will expire on 7 May 2022.
Copyright Notice Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 2, line 36 skipping to change at page 2, line 36
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3 1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3
1.2. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7
1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9 1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9
2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11 2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11
2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11
2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12
2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 13
2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13 2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13
2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17 2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17
2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17 2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17
2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18
2.5.2. Management Requirements . . . . . . . . . . . . . . . 19 2.5.2. Management Requirements . . . . . . . . . . . . . . . 19
2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19 2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19
2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21 2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21
2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 21 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 22
2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22 2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22
3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22
4. Security Considerations . . . . . . . . . . . . . . . . . . . 22 4. Security Considerations . . . . . . . . . . . . . . . . . . . 22
4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 22 4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 22
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput 4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput
or Delay? . . . . . . . . . . . . . . . . . . . . . . 23 or Delay? . . . . . . . . . . . . . . . . . . . . . . 23
4.1.2. Congestion Signal Saturation: Introduce L4S Drop or 4.1.2. Congestion Signal Saturation: Introduce L4S Drop or
Delay? . . . . . . . . . . . . . . . . . . . . . . . 24 Delay? . . . . . . . . . . . . . . . . . . . . . . . 24
4.1.3. Protecting against Unresponsive ECN-Capable 4.1.3. Protecting against Unresponsive ECN-Capable
Traffic . . . . . . . . . . . . . . . . . . . . . . . 25 Traffic . . . . . . . . . . . . . . . . . . . . . . . 25
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26
6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26
7. References . . . . . . . . . . . . . . . . . . . . . . . . . 26 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 27
7.1. Normative References . . . . . . . . . . . . . . . . . . 26 7.1. Normative References . . . . . . . . . . . . . . . . . . 27
7.2. Informative References . . . . . . . . . . . . . . . . . 27 7.2. Informative References . . . . . . . . . . . . . . . . . 27
Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 33 Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 33
A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 33 A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 33
A.2. Pass #2: Overload Details . . . . . . . . . . . . . . . . 44 A.2. Pass #2: Overload Details . . . . . . . . . . . . . . . . 44
Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 48 Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 48
B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 48 B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 48
B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 54 B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 54
Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 56 Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 56
C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 56 C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 56
C.2. Guidance on Controlling Throughput Equivalence . . . . . 57 C.2. Guidance on Controlling Throughput Equivalence . . . . . 57
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 60 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 61
1. Introduction 1. Introduction
This document specifies a framework for DualQ Coupled AQMs, which is This document specifies a framework for DualQ Coupled AQMs, which is
the network part of the L4S architecture [I-D.ietf-tsvwg-l4s-arch]. the network part of the L4S architecture [I-D.ietf-tsvwg-l4s-arch].
L4S enables both very low queuing latency (sub-millisecond on L4S enables both very low queuing latency (sub-millisecond on
average) and high throughput at the same time, for ad hoc numbers of average) and high throughput at the same time, for ad hoc numbers of
capacity-seeking applications all sharing the same capacity. capacity-seeking applications all sharing the same capacity.
1.1. Outline of the Problem 1.1. Outline of the Problem
skipping to change at page 12, line 35 skipping to change at page 12, line 35
p_C = ( p_CL / k )^2 (1) p_C = ( p_CL / k )^2 (1)
where k is the constant of proportionality, which is termed the where k is the constant of proportionality, which is termed the
coupling factor. coupling factor.
2.2. Dual Queue 2.2. Dual Queue
Classic traffic needs to build a large queue to prevent under- Classic traffic needs to build a large queue to prevent under-
utilization. Therefore a separate queue is provided for L4S traffic, utilization. Therefore a separate queue is provided for L4S traffic,
and it is scheduled with priority over the Classic queue. Priority and it is scheduled with priority over the Classic queue. Priority
is conditional to prevent starvation of Classic traffic. is conditional to prevent starvation of Classic traffic in certain
conditions (see Section 2.4).
Nonetheless, coupled marking ensures that giving priority to L4S Nonetheless, coupled marking ensures that giving priority to L4S
traffic still leaves the right amount of spare scheduling time for traffic still leaves the right amount of spare scheduling time for
Classic flows to each get equivalent throughput to DCTCP flows (all Classic flows to each get equivalent throughput to DCTCP flows (all
other factors such as RTT being equal). other factors such as RTT being equal).
2.3. Traffic Classification 2.3. Traffic Classification
Both the Coupled AQM and DualQ mechanisms need an identifier to Both the Coupled AQM and DualQ mechanisms need an identifier to
distinguish L4S (L) and Classic (C) packets. Then the coupling distinguish L4S (L) and Classic (C) packets. Then the coupling
skipping to change at page 16, line 9 skipping to change at page 15, line 45
C queue. This is because, as the C queue grows, the base AQM applies C queue. This is because, as the C queue grows, the base AQM applies
more congestion signals to L traffic (as well as C). As L flows more congestion signals to L traffic (as well as C). As L flows
reduce their rate in response, they use less than the scheduling reduce their rate in response, they use less than the scheduling
share for L traffic. So, because the scheduler is work preserving, share for L traffic. So, because the scheduler is work preserving,
it schedules any C traffic in the gaps. it schedules any C traffic in the gaps.
Giving priority to the L queue has the benefit of very low L queue Giving priority to the L queue has the benefit of very low L queue
delay, because the L queue is kept empty whenever L traffic is delay, because the L queue is kept empty whenever L traffic is
controlled by the coupling. Also there only has to be a coupling in controlled by the coupling. Also there only has to be a coupling in
one direction - from Classic to L4S. Priority has to be conditional one direction - from Classic to L4S. Priority has to be conditional
in some way to prevent the C queue starving under overload conditions in some way to prevent the C queue being starved by excessive
(see Section 4.1). With normal responsive traffic simple strict unresponsive L traffic (see Section 4.1) and to give C traffic a
priority would work, but it would make new Classic traffic wait until means to push in, as explained next. With normal responsive L
its queue activated the coupling and L4S flows had in turn reduced traffic, the coupled ECN marking gives C traffic the ability to push
their rate enough to drain the L queue so that Classic traffic could back against even strict priority, by congestion marking the L
be scheduled. Giving a small weight or limited waiting time for C traffic to make it yield some space. However, if there is just a
traffic improves response times for short Classic messages, such as small finite set of C packets (e.g. a DNS request or an initial
DNS requests and improves Classic flow startup because immediate window of data) some Classic AQMs will not induce enough ECN marking
capacity is available. in the L queue, no matter how long the small set of C packets waits.
Then, if the L queue happens to remain busy, the C traffic would
never get a scheduling opportunity from a strict priority scheduler.
Ideally the Classic AQM would be designed to increase the coupled
marking the longer that C packets have been waiting, but this is not
always practical - hence the need for L priority to be conditional.
Giving a small weight or limited waiting time for C traffic improves
response times for short Classic messages, such as DNS requests and
improves Classic flow startup because immediate capacity is
available.
Example DualQ Coupled AQM algorithms called DualPI2 and Curvy RED are Example DualQ Coupled AQM algorithms called DualPI2 and Curvy RED are
given in Appendix A and Appendix B. Either example AQM can be used given in Appendix A and Appendix B. Either example AQM can be used
to couple packet marking and dropping across a dual Q. to couple packet marking and dropping across a dual Q.
DualPI2 uses a Proportional-Integral (PI) controller as the Base AQM. DualPI2 uses a Proportional-Integral (PI) controller as the Base AQM.
Indeed, this Base AQM with just the squared output and no L4S queue Indeed, this Base AQM with just the squared output and no L4S queue
can be used as a drop-in replacement for PIE [RFC8033], in which case can be used as a drop-in replacement for PIE [RFC8033], in which case
it is just called PI2 [PI2]. PI2 is a principled simplification of it is just called PI2 [PI2]. PI2 is a principled simplification of
PIE that is both more responsive and more stable in the face of PIE that is both more responsive and more stable in the face of
dynamically varying load. dynamically varying load.
Curvy RED is derived from RED [RFC2309], but its configuration Curvy RED is derived from RED [RFC2309], except its configuration
parameters are insensitive to link rate and it requires less parameters are delay-based to make them insensitive to link rate and
operations per packet. However, DualPI2 is more responsive and it requires less operations per packet. However, DualPI2 is more
stable over a wider range of RTTs than Curvy RED. As a consequence, responsive and stable over a wider range of RTTs than Curvy RED. As
at the time of writing, DualPI2 has attracted more development and a consequence, at the time of writing, DualPI2 has attracted more
evaluation attention than Curvy RED, leaving the Curvy RED design development and evaluation attention than Curvy RED, leaving the
incomplete and not so fully evaluated. Curvy RED design not so fully evaluated.
Both AQMs regulate their queue in units of time rather than bytes. Both AQMs regulate their queue in units of time rather than bytes.
As already explained, this ensures configuration can be invariant for As already explained, this ensures configuration can be invariant for
different drain rates. With AQMs in a dualQ structure this is different drain rates. With AQMs in a dualQ structure this is
particularly important because the drain rate of each queue can vary particularly important because the drain rate of each queue can vary
rapidly as flows for the two queues arrive and depart, even if the rapidly as flows for the two queues arrive and depart, even if the
combined link rate is constant. combined link rate is constant.
It would be possible to control the queues with other alternative It would be possible to control the queues with other alternative
AQMs, as long as the normative requirements (those expressed in AQMs, as long as the normative requirements (those expressed in
skipping to change at page 17, line 30 skipping to change at page 17, line 30
A Dual Queue Coupled AQM implementation MUST utilize two queues, each A Dual Queue Coupled AQM implementation MUST utilize two queues, each
with an AQM algorithm. with an AQM algorithm.
The AQM algorithm for the low latency (L) queue MUST be able to apply The AQM algorithm for the low latency (L) queue MUST be able to apply
ECN marking to ECN-capable packets. ECN marking to ECN-capable packets.
The scheduler draining the two queues MUST give L4S packets priority The scheduler draining the two queues MUST give L4S packets priority
over Classic, although priority MUST be bounded in order not to over Classic, although priority MUST be bounded in order not to
starve Classic traffic. The scheduler SHOULD be work-conserving, or starve Classic traffic. The scheduler SHOULD be work-conserving, or
otherwise close to work-conserving, given Classic service will often otherwise close to work-conserving. This is because Classic traffic
rely on borrowing from the L4S service. needs to be able to efficiently fill any space left by L4S traffic
even though the scheduler would otherwise allocate it to L4S.
[I-D.ietf-tsvwg-ecn-l4s-id] defines the meaning of an ECN marking on [I-D.ietf-tsvwg-ecn-l4s-id] defines the meaning of an ECN marking on
L4S traffic, relative to drop of Classic traffic. In order to ensure L4S traffic, relative to drop of Classic traffic. In order to ensure
coexistence of Classic and Scalable L4S traffic, it says, "The coexistence of Classic and Scalable L4S traffic, it says, "The
likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST be likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST be
roughly proportional to the square of the likelihood that it would roughly proportional to the square of the likelihood that it would
have marked it if it had been an L4S packet (p_L)." The term have marked it if it had been an L4S packet (p_L)." The term
'likelihood' is used to allow for marking and dropping to be either 'likelihood' is used to allow for marking and dropping to be either
probabilistic or deterministic. probabilistic or deterministic.
skipping to change at page 23, line 13 skipping to change at page 23, line 21
followed by proposed solution(s). followed by proposed solution(s).
Under overload the higher priority L4S service will have to sacrifice Under overload the higher priority L4S service will have to sacrifice
some aspect of its performance. Alternative solutions are provided some aspect of its performance. Alternative solutions are provided
below that each relax a different factor: e.g. throughput, delay, below that each relax a different factor: e.g. throughput, delay,
drop. These choices need to be made either by the developer or by drop. These choices need to be made either by the developer or by
operator policy, rather than by the IETF. operator policy, rather than by the IETF.
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput or Delay? 4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput or Delay?
Priority of L4S is required to be conditional (see Section 2.5.1) to Priority of L4S is required to be conditional (see Section 2.4 &
avoid total starvation of Classic by heavy L4S traffic. This raises Section 2.5.1) to avoid total starvation of Classic by heavy L4S
the question of whether to sacrifice L4S throughput or L4S delay (or traffic. This raises the question of whether to sacrifice L4S
some other policy) to mitigate starvation of Classic: throughput or L4S delay (or some other policy) to mitigate starvation
of Classic:
Sacrifice L4S throughput: By using weighted round robin as the Sacrifice L4S throughput: By using weighted round robin as the
conditional priority scheduler, the L4S service can sacrifice some conditional priority scheduler, the L4S service can sacrifice some
throughput during overload. This can either be thought of as throughput during overload. This can either be thought of as
guaranteeing a minimum throughput service for Classic traffic, or guaranteeing a minimum throughput service for Classic traffic, or
as guaranteeing a maximum delay for a packet at the head of the as guaranteeing a maximum delay for a packet at the head of the
Classic queue. Classic queue.
The scheduling weight of the Classic queue should be small The scheduling weight of the Classic queue should be small
(e.g. 1/16). Then, in most traffic scenarios the scheduler will (e.g. 1/16). Then, in most traffic scenarios the scheduler will
skipping to change at page 47, line 40 skipping to change at page 48, line 7
recommended over WRR): recommended over WRR):
- TS-FIFO does not fully isolate latency in the L4S queue from - TS-FIFO does not fully isolate latency in the L4S queue from
uncontrolled bursts in the Classic queue; uncontrolled bursts in the Classic queue;
- TS-FIFO is only appropriate if time-stamping of packets is - TS-FIFO is only appropriate if time-stamping of packets is
feasible; feasible;
- Even if time-stamping is supported, the sojourn time of the - Even if time-stamping is supported, the sojourn time of the
head packet is always stale. For instance, if a burst arrives head packet is always stale. For instance, if a burst arrives
at an empty queue, the sojourn time will only measure the delay at an empty queue, the sojourn time only fully measures the
of the burst once the burst is over, even though the queue knew burst's delay when its last packet is dequeued, even though the
about it from the start. To remedy this, each head packet can queue knew about the burst from the start - so it could have
be marked based on the delay it causes to packets backlogged signalled congestion earlier. To remedy this, each head packet
behind it, rather than based on its own delay due to the can be marked when it is dequeued based on the expected delay
packets in front of it. [Heist21] identifies a specific of the tail packet behind it, as explained below, rather than
scenario where bursty traffic significantly hits utilization of based on the head packet's own delay due to the packets in
the L queue. If this effect proves to be more widely front of it. [Heist21] identifies a specific scenario where
applicable, it is believed that using the delay behind the head bursty traffic significantly hits utilization of the L queue.
would improve performance. It can be implemented by If this effect proves to be more widely applicable, it is
multiplying the backlog at dequeue by the serialization delay believed that using the delay behind the head would improve
per unit of backlog. The implementation details will depend on performance.
whether the link rate is known; if it is not, a moving average
of the serialization delay can be maintained. This approach The delay behind the head can be implemented by dividing the
should cost less in operations and memory than the proposed backlog at dequeue by the link rate or equivalently multiplying
'scaled sojourn time' metric, which is the sojourn time of a the backlog by the delay per unit of backlog. The
packet scaled by the ratio of the queue sizes when the packet implementation details will depend on whether the link rate is
departed and arrived [SigQ-Dyn]. known; if it is not, a moving average of the delay per unit
backlog can be maintained. This delay consists of
serialization as well as media acquisition for shared media.
So the details will depend strongly on the specific link
technology, This approach should be less sensitive to timing
errors and cost less in operations and memory than the
otherwise equivalent 'scaled sojourn time' metric, which is the
sojourn time of a packet scaled by the ratio of the queue sizes
when the packet departed and arrived [SigQ-Dyn].
* A strict priority scheduler would be inappropriate, because it * A strict priority scheduler would be inappropriate, because it
would starve Classic if L4S was overloaded. would starve Classic if L4S was overloaded.
Appendix B. Example DualQ Coupled Curvy RED Algorithm Appendix B. Example DualQ Coupled Curvy RED Algorithm
As another example of a DualQ Coupled AQM algorithm, the pseudocode As another example of a DualQ Coupled AQM algorithm, the pseudocode
below gives the Curvy RED based algorithm. Although the AQM was below gives the Curvy RED based algorithm. Although the AQM was
designed to be efficient in integer arithmetic, to aid understanding designed to be efficient in integer arithmetic, to aid understanding
it is first given using floating point arithmetic (Figure 10). Then, it is first given using floating point arithmetic (Figure 10). Then,
skipping to change at page 57, line 18 skipping to change at page 57, line 18
C.2. Guidance on Controlling Throughput Equivalence C.2. Guidance on Controlling Throughput Equivalence
The coupling factor, k, determines the balance between L4S and The coupling factor, k, determines the balance between L4S and
Classic flow rates (see Section 2.5.2.1 and equation (1)). Classic flow rates (see Section 2.5.2.1 and equation (1)).
For the public Internet, a coupling factor of k=2 is recommended, and For the public Internet, a coupling factor of k=2 is recommended, and
justified below. For scenarios other than the public Internet, a justified below. For scenarios other than the public Internet, a
good coupling factor can be derived by plugging the appropriate good coupling factor can be derived by plugging the appropriate
numbers into the same working. numbers into the same working.
To summarize the maths below, from equation (5) it can be seen that To summarize the maths below, from equation (7) it can be seen that
choosing k=1.64 will make L4S throughput roughly the same as Classic, choosing k=1.64 would theoretically make L4S throughput roughly the
_if their actual end-to-end RTTs are the same_. However, even if the same as Classic, _if their actual end-to-end RTTs were the same_.
base RTTs are the same, the actual RTTs are unlikely to be the same, However, even if the base RTTs are the same, the actual RTTs are
because Classic traffic needs a fairly large queue to avoid under- unlikely to be the same, because Classic traffic needs a fairly large
utilization and excess drop. Whereas L4S does not. queue to avoid under-utilization and excess drop. Whereas L4S does
not.
Therefore, to determine the appropriate coupling factor policy, the Therefore, to determine the appropriate coupling factor policy, the
operator needs to decide at what base RTT it wants L4S and Classic operator needs to decide at what base RTT it wants L4S and Classic
flows to have roughly equal throughput, once the effect of the flows to have roughly equal throughput, once the effect of the
additional Classic queue on Classic throughput has been taken into additional Classic queue on Classic throughput has been taken into
account. With this approach, a network operator can determine a good account. With this approach, a network operator can determine a good
coupling factor without knowing the precise L4S algorithm for coupling factor without knowing the precise L4S algorithm for
reducing RTT-dependence - or even in the absence of any algorithm. reducing RTT-dependence - or even in the absence of any algorithm.
The following additional terminology will be used, with appropriate The following additional terminology will be used, with appropriate
subscripts: subscripts:
r: Packet rate [pkt/s] r: Packet rate [pkt/s]
R: RTT [s/round] R: RTT [s/round]
p: ECN marking probability [] p: ECN marking probability []
On the Classic side, we consider Reno as the most sensitive and On the Classic side, we consider Reno as the most sensitive and
therefore worst case Classic congestion control, and we will also therefore worst-case Classic congestion control. We will also
consider Cubic in its Reno-friendly mode ('CReno'), as the most consider Cubic in its Reno-friendly mode ('CReno'), as the most
prevalent congestion control, according to the references and prevalent congestion control, according to the references and
analysis in [PI2param]. In either case, the Classic packet rate in analysis in [PI2param]. In either case, the Classic packet rate in
steady state is given by the well-known square root formula: steady state is given by the well-known square root formula for Reno
congestion control:
r_C = 1.22 / (R_C * p_C^0.5) r_C = 1.22 / (R_C * p_C^0.5) (5)
On the L4S side, we consider the Prague congestion control On the L4S side, we consider the Prague congestion control
[I-D.briscoe-iccrg-prague-congestion-control] as the reference for [I-D.briscoe-iccrg-prague-congestion-control] as the reference for
steady-state dependence on congestion. Prague conforms to the same steady-state dependence on congestion. Prague conforms to the same
equation as DCTCP, but we do not use the equation derived in the equation as DCTCP, but we do not use the equation derived in the
DCTCP paper, which is only appropriate for step marking. The coupled DCTCP paper, which is only appropriate for step marking. The coupled
marking, p_CL, is the appropriate one when considering throughput marking, p_CL, is the appropriate one when considering throughput
equivalence with Classic flows. Unlike step marking, coupled equivalence with Classic flows. Unlike step marking, coupled
markings are inherently spaced out, so we use the formula for DCTCP markings are inherently spaced out, so we use the formula for DCTCP
packet rate with probabilistic marking derived in Appendix A of packet rate with probabilistic marking derived in Appendix A of
[PI2]. We use the equation without RTT-independence enabled, which [PI2]. We use the equation without RTT-independence enabled, which
will be explained later. will be explained later.
r_L = 2/ (R_L * p_CL) r_L = 2 / (R_L * p_CL) (6)
For packet rate equivalence, we equate the two packet rates and For packet rate equivalence, we equate the two packet rates and
rearrange into the same form as Equation (1), so the two can be rearrange into the same form as Equation (1), so the two can be
equated and simplified to produce a formula for k: equated and simplified to produce a formula for a theoretical
coupling factor, which we shall call k*:
r_c = r_L r_c = r_L
=> p_C = (p_CL/1.64 * R_L/R_C)^2 => p_C = (p_CL/1.64 * R_L/R_C)^2
p_C = ( p_CL / k )^2 (1) p_C = ( p_CL / k )^2 (1)
k = 1.64 * (R_C / R_L) (5) k* = 1.64 * (R_C / R_L) (7)
We now have the coupling factor in terms of two RTTs. Traditionally, We say that this coupling factor is theoretical, because it is in
throughput equivalence is defined for flows under comparable terms of two RTTs, which raises two practical questions: i) for
conditions, including with the same base RTT [RFC2914]. So if we multiple flows with different RTTs, the RTT for each traffic class
assume the same base RTT, R_b, for comparable flows, we can put both would have to be derived from the RTTs of all the flows in that class
R_C and R_L in terms of R_b. (actually the harmonic mean would be needed); ii) a network node
cannot easily know the RTT of any of the flows anyway.
RTT-dependence is cuased by window-based congestion control, so it
ought to be reversed there, not in the network, Therefore, we use a
fixed coupling factor in the network, and reduce RTT-dependence in
L4S senders. We cannot expect Classic senders to all be updated to
reduce their RTT-dependence. But solely addressing the problem in
L4S senders at least makes RTT-dependence no worse - not just between
L4S senders, but also between L4S and Classic senders.
Traditionally, throughput equivalence has been defined for flows
under comparable conditions, including with the same base RTT
[RFC2914]. So if we assume the same base RTT, R_b, for comparable
flows, we can put both R_C and R_L in terms of R_b.
We can approximate the L4S RTT to be hardly greater than the base We can approximate the L4S RTT to be hardly greater than the base
RTT, i.e. R_L ~= R_b. And next we replace R_C with (R_b + q_C), RTT, i.e. R_L ~= R_b. And we can replace R_C with (R_b + q_C), where
where the Classic queue, q_C, depends on the target queue delay that the Classic queue, q_C, depends on the target queue delay that the
the operator has configured for the Classic AQM. operator has configured for the Classic AQM.
Taking PI2 as an example Classic AQM, it seems that we could just Taking PI2 as an example Classic AQM, it seems that we could just
take R_C = R_b + target (recommended 15 ms by default in take R_C = R_b + target (recommended 15 ms by default in
Appendix A.1). However, target is roughly the queue depth reached by Appendix A.1). However, target is roughly the queue depth reached by
the tips of the sawteeth of a congestion control, not the average the tips of the sawteeth of a congestion control, not the average
[PI2param]. That is R_max = R_b + target. [PI2param]. That is R_max = R_b + target.
The position of the average in relation to the max depends on the The position of the average in relation to the max depends on the
amplitude of the sawteeth, so we will consider Reno [RFC5681] as the amplitude and geometry of the sawteeth. We consider two examples:
most sensitive worst-case as well as Cubic [RFC8312] in its Reno- Reno [RFC5681], as the most sensitive worst-case, and Cubic [RFC8312]
friendly mode ('CReno') as the most prevalent congestion control in its Reno-friendly mode ('CReno') as the most prevalent congestion
algorithm on the Internet according to [PI2param]. control algorithm on the Internet according to the references in
[PI2param]. Both are AIMD, so we will generalize using b as the
Both are AIMD, so we will generalize using b as the multiplicative multiplicative decrease factor (b_r = 0.5 for Reno, b_c = 0.7 for
decrease factor (b_r = 0.5 for Reno, b_c = 0.7 for CReno). Then: CReno). Then:
R_C = (R_max + b*R_max) / 2 R_C = (R_max + b*R_max) / 2
= R_max * (1+b)/2 = R_max * (1+b)/2
R_reno = 0.75 * (R_b + target); R_creno = 0.85 * (R_b + target). R_reno = 0.75 * (R_b + target); R_creno = 0.85 * (R_b + target).
(8)
Plugging all this into equation (5) we get coupling factor, Plugging all this into equation (7) we get a fixed coupling factor
for each:
k_reno = 1.64*0.75*(R_b+target)/R_b k_reno = 1.64*0.75*(R_b+target)/R_b
= 1.23*(1 + target/R_b); k_creno = 1.39 * (1 + target/R_b) = 1.23*(1 + target/R_b); k_creno = 1.39 * (1 + target/R_b)
For instance, it is recommended that the operator chooses R_b = 25 An operator can then choose the base RTT at which it wants throughput
ms, as a typical base RTT between Internet users and CDNs [PI2param]. to be equivalent. For instance, if we recommend that the operator
Then: chooses R_b = 25 ms, as a typical base RTT between Internet users and
CDNs [PI2param], then these coupling factors become:
k_reno = 1.23 * (1 + 15/25) k_creno = 1.39 * (1 + 15/25) k_reno = 1.23 * (1 + 15/25) k_creno = 1.39 * (1 + 15/25)
= 1.97 = 2.22 = 1.97 = 2.22
~= 2 ~= 2 ~= 2 ~= 2 (9)
The approximation is relevant to any of the above example DualQ The approximation is relevant to any of the above example DualQ
Coupled algorithms, which use a coupling factor that is an integer Coupled algorithms, which use a coupling factor that is an integer
power of 2 to aid efficient implementation. power of 2 to aid efficient implementation. It also fits best to the
worst case (Reno).
An operator can make a policy choice to decide on a different base To check the outcome of this coupling factor, we can express the
RTT at which it wants throughput equivalence. Nonetheless, it makes ratio of L4S to Classic throughput by substituting from their rate
equations (5) and (6), then also substituting for p_C in terms of
p_CL, using equation (1) with k=2 as just determined for the
Internet:
r_L / r_C = 2 (R_C * p_C^0.5) / 1.22 (R_L * p_CL)
= (R_C * p_CL) / (1.22 * R_L * p_CL)
= R_C / (1.22 * R_L) (10)
As an example, we can then consider single competing CReno and Prague
flows, by expressing both their RTTs in (10) in terms of their base
RTTs, R_bC and R_bL. So R_C is replaced by equation (8) for CReno.
And R_L is replaced by the max() function below, which represents the
effective RTT of the current Prague congestion control
[I-D.briscoe-iccrg-prague-congestion-control] in its (default) RTT-
independent mode, because it sets a floor to the effective RTT that
it uses for additive increase:
~= 0.85 * (R_bC + target) / (1.22 * max(R_bL, R_typ))
~= (R_bC + target) / (1.4 * max(R_bL, R_typ))
It can be seen that, for base RTTs below target (15 ms), both the
numerator and the denominator plateau, which has the desired effect
of limiting RTT-dependence.
At the start of the above derivations, an explanation was promised
for why the L4S throughput equation in equation (6) did not need to
model RTT-independence. This is because we only use one point - at
the the typical base RTT where the operator chooses to calculate the
coupling factor. Then, throughput equivalence will at least hold at
that chosen point. Nonetheless, assuming Prague senders implement
RTT-independence over a range of RTTs below this, the throughput
equivalence will then extend over that range as well.
Congestion control designers can choose different ways to reduce RTT-
dependence. And each operator can make a policy choice to decide on
a different base RTT, and therefore a different k, at which it wants
throughput equivalence. Nonetheless, for the Internet, it makes
sense to choose what is believed to be the typical RTT most users sense to choose what is believed to be the typical RTT most users
experience, because a Classic AQM's target queuing delay is also experience, because a Classic AQM's target queuing delay is also
derived from a typical RTT for the Internet. Therefore, below this derived from a typical RTT for the Internet.
typical RTT, Classic AQMs become fairly RTT-independent. And L4S
flows are also required to become RTT-independent below a typical RTT
[I-D.ietf-tsvwg-ecn-l4s-id]. Therefore, throughput equivalence ought
to be no worse than with Classic AQMs and Classic congestion
controls.
As remarked earlier, the throughput equation used for Prague was with
RTT-independence disabled. This is because we only need the point on
this equation at the typical base RTT - where the operator chooses to
calculate the coupling factor. We do not need to know the full range
of the equation used for RTT-independence as long as it is roughly
the same at this one point. Then, there will at least be throughput
equivalence at that base RTT. And assuming Prague senders implement
RTT-independence over a range of RTTs, the throughput equivalence
will then extend over that range.
As a non-Internet example, for localized traffic from a particular As a non-Internet example, for localized traffic from a particular
ISP's data centre, using the measured RTTs, it was calculated that a ISP's data centre, using the measured RTTs, it was calculated that a
value of k = 8 would achieve throughput equivalence, and experiments value of k = 8 would achieve throughput equivalence, and experiments
verified the formula very closely. verified the formula very closely.
But, for a typical mix of RTTs across the general Internet, a value But, for a typical mix of RTTs across the general Internet, a value
of k=2 is recommended as a good workable compromise. of k=2 is recommended as a good workable compromise.
Authors' Addresses Authors' Addresses
 End of changes. 32 change blocks. 
115 lines changed or deleted 179 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/