draft-ietf-tsvwg-aqm-dualq-coupled-18.txt | draft-ietf-tsvwg-aqm-dualq-coupled-19.txt | |||
---|---|---|---|---|
Transport Area working group (tsvwg) K. De Schepper | Transport Area working group (tsvwg) K. De Schepper | |||
Internet-Draft Nokia Bell Labs | Internet-Draft Nokia Bell Labs | |||
Intended status: Experimental B. Briscoe, Ed. | Intended status: Experimental B. Briscoe, Ed. | |||
Expires: 28 April 2022 Independent | Expires: 7 May 2022 Independent | |||
G. White | G. White | |||
CableLabs | CableLabs | |||
25 October 2021 | 3 November 2021 | |||
DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | |||
(L4S) | (L4S) | |||
draft-ietf-tsvwg-aqm-dualq-coupled-18 | draft-ietf-tsvwg-aqm-dualq-coupled-19 | |||
Abstract | Abstract | |||
This specification defines a framework for coupling the Active Queue | This specification defines a framework for coupling the Active Queue | |||
Management (AQM) algorithms in two queues intended for flows with | Management (AQM) algorithms in two queues intended for flows with | |||
different responses to congestion. This provides a way for the | different responses to congestion. This provides a way for the | |||
Internet to transition from the scaling problems of standard TCP | Internet to transition from the scaling problems of standard TCP | |||
Reno-friendly ('Classic') congestion controls to the family of | Reno-friendly ('Classic') congestion controls to the family of | |||
'Scalable' congestion controls. These achieve consistently very Low | 'Scalable' congestion controls. These are designed for consistently | |||
queuing Latency, very Low congestion Loss and Scaling of per-flow | very Low queuing Latency, very Low congestion Loss and Scaling of | |||
throughput (L4S) by using Explicit Congestion Notification (ECN) in a | per-flow throughput (L4S) by using Explicit Congestion Notification | |||
modified way. Until the Coupled DualQ, these L4S senders could only | (ECN) in a modified way. Until the Coupled DualQ, these L4S senders | |||
be deployed where a clean-slate environment could be arranged, such | could only be deployed where a clean-slate environment could be | |||
as in private data centres. The coupling acts like a semi-permeable | arranged, such as in private data centres. The coupling acts like a | |||
membrane: isolating the sub-millisecond average queuing delay and | semi-permeable membrane: isolating the sub-millisecond average | |||
zero congestion loss of L4S from Classic latency and loss; but | queuing delay and zero congestion loss of L4S from Classic latency | |||
pooling the capacity between any combination of Scalable and Classic | and loss; but pooling the capacity between any combination of | |||
flows with roughly equivalent throughput per flow. The DualQ | Scalable and Classic flows with roughly equivalent throughput per | |||
achieves this indirectly, without having to inspect transport layer | flow. The DualQ achieves this indirectly, without having to inspect | |||
flow identifiers and without compromising the performance of the | transport layer flow identifiers and without compromising the | |||
Classic traffic. The solution has low complexity and requires no | performance of the Classic traffic, relative to a single queue. The | |||
configuration for the public Internet. | DualQ design has low complexity and requires no configuration for the | |||
public Internet. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on 28 April 2022. | This Internet-Draft will expire on 7 May 2022. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2021 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
skipping to change at page 2, line 36 ¶ | skipping to change at page 2, line 36 ¶ | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3 | 1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3 | |||
1.2. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 1.2. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 | 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9 | 1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11 | 2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11 | |||
2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11 | 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12 | 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12 | 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 13 | |||
2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13 | 2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13 | |||
2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17 | 2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17 | |||
2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17 | 2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17 | |||
2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18 | 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18 | |||
2.5.2. Management Requirements . . . . . . . . . . . . . . . 19 | 2.5.2. Management Requirements . . . . . . . . . . . . . . . 19 | |||
2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19 | 2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19 | |||
2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21 | 2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21 | |||
2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 21 | 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 22 | |||
2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22 | 2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22 | |||
3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 | 3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 | |||
4. Security Considerations . . . . . . . . . . . . . . . . . . . 22 | 4. Security Considerations . . . . . . . . . . . . . . . . . . . 22 | |||
4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 22 | 4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 22 | |||
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput | 4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput | |||
or Delay? . . . . . . . . . . . . . . . . . . . . . . 23 | or Delay? . . . . . . . . . . . . . . . . . . . . . . 23 | |||
4.1.2. Congestion Signal Saturation: Introduce L4S Drop or | 4.1.2. Congestion Signal Saturation: Introduce L4S Drop or | |||
Delay? . . . . . . . . . . . . . . . . . . . . . . . 24 | Delay? . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
4.1.3. Protecting against Unresponsive ECN-Capable | 4.1.3. Protecting against Unresponsive ECN-Capable | |||
Traffic . . . . . . . . . . . . . . . . . . . . . . . 25 | Traffic . . . . . . . . . . . . . . . . . . . . . . . 25 | |||
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26 | 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26 | |||
6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26 | 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26 | |||
7. References . . . . . . . . . . . . . . . . . . . . . . . . . 26 | 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 27 | |||
7.1. Normative References . . . . . . . . . . . . . . . . . . 26 | 7.1. Normative References . . . . . . . . . . . . . . . . . . 27 | |||
7.2. Informative References . . . . . . . . . . . . . . . . . 27 | 7.2. Informative References . . . . . . . . . . . . . . . . . 27 | |||
Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 33 | Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 33 | |||
A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 33 | A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 33 | |||
A.2. Pass #2: Overload Details . . . . . . . . . . . . . . . . 44 | A.2. Pass #2: Overload Details . . . . . . . . . . . . . . . . 44 | |||
Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 48 | Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 48 | |||
B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 48 | B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 48 | |||
B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 54 | B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 54 | |||
Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 56 | Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 56 | |||
C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 56 | C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 56 | |||
C.2. Guidance on Controlling Throughput Equivalence . . . . . 57 | C.2. Guidance on Controlling Throughput Equivalence . . . . . 57 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 60 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 61 | |||
1. Introduction | 1. Introduction | |||
This document specifies a framework for DualQ Coupled AQMs, which is | This document specifies a framework for DualQ Coupled AQMs, which is | |||
the network part of the L4S architecture [I-D.ietf-tsvwg-l4s-arch]. | the network part of the L4S architecture [I-D.ietf-tsvwg-l4s-arch]. | |||
L4S enables both very low queuing latency (sub-millisecond on | L4S enables both very low queuing latency (sub-millisecond on | |||
average) and high throughput at the same time, for ad hoc numbers of | average) and high throughput at the same time, for ad hoc numbers of | |||
capacity-seeking applications all sharing the same capacity. | capacity-seeking applications all sharing the same capacity. | |||
1.1. Outline of the Problem | 1.1. Outline of the Problem | |||
skipping to change at page 12, line 35 ¶ | skipping to change at page 12, line 35 ¶ | |||
p_C = ( p_CL / k )^2 (1) | p_C = ( p_CL / k )^2 (1) | |||
where k is the constant of proportionality, which is termed the | where k is the constant of proportionality, which is termed the | |||
coupling factor. | coupling factor. | |||
2.2. Dual Queue | 2.2. Dual Queue | |||
Classic traffic needs to build a large queue to prevent under- | Classic traffic needs to build a large queue to prevent under- | |||
utilization. Therefore a separate queue is provided for L4S traffic, | utilization. Therefore a separate queue is provided for L4S traffic, | |||
and it is scheduled with priority over the Classic queue. Priority | and it is scheduled with priority over the Classic queue. Priority | |||
is conditional to prevent starvation of Classic traffic. | is conditional to prevent starvation of Classic traffic in certain | |||
conditions (see Section 2.4). | ||||
Nonetheless, coupled marking ensures that giving priority to L4S | Nonetheless, coupled marking ensures that giving priority to L4S | |||
traffic still leaves the right amount of spare scheduling time for | traffic still leaves the right amount of spare scheduling time for | |||
Classic flows to each get equivalent throughput to DCTCP flows (all | Classic flows to each get equivalent throughput to DCTCP flows (all | |||
other factors such as RTT being equal). | other factors such as RTT being equal). | |||
2.3. Traffic Classification | 2.3. Traffic Classification | |||
Both the Coupled AQM and DualQ mechanisms need an identifier to | Both the Coupled AQM and DualQ mechanisms need an identifier to | |||
distinguish L4S (L) and Classic (C) packets. Then the coupling | distinguish L4S (L) and Classic (C) packets. Then the coupling | |||
skipping to change at page 16, line 9 ¶ | skipping to change at page 15, line 45 ¶ | |||
C queue. This is because, as the C queue grows, the base AQM applies | C queue. This is because, as the C queue grows, the base AQM applies | |||
more congestion signals to L traffic (as well as C). As L flows | more congestion signals to L traffic (as well as C). As L flows | |||
reduce their rate in response, they use less than the scheduling | reduce their rate in response, they use less than the scheduling | |||
share for L traffic. So, because the scheduler is work preserving, | share for L traffic. So, because the scheduler is work preserving, | |||
it schedules any C traffic in the gaps. | it schedules any C traffic in the gaps. | |||
Giving priority to the L queue has the benefit of very low L queue | Giving priority to the L queue has the benefit of very low L queue | |||
delay, because the L queue is kept empty whenever L traffic is | delay, because the L queue is kept empty whenever L traffic is | |||
controlled by the coupling. Also there only has to be a coupling in | controlled by the coupling. Also there only has to be a coupling in | |||
one direction - from Classic to L4S. Priority has to be conditional | one direction - from Classic to L4S. Priority has to be conditional | |||
in some way to prevent the C queue starving under overload conditions | in some way to prevent the C queue being starved by excessive | |||
(see Section 4.1). With normal responsive traffic simple strict | unresponsive L traffic (see Section 4.1) and to give C traffic a | |||
priority would work, but it would make new Classic traffic wait until | means to push in, as explained next. With normal responsive L | |||
its queue activated the coupling and L4S flows had in turn reduced | traffic, the coupled ECN marking gives C traffic the ability to push | |||
their rate enough to drain the L queue so that Classic traffic could | back against even strict priority, by congestion marking the L | |||
be scheduled. Giving a small weight or limited waiting time for C | traffic to make it yield some space. However, if there is just a | |||
traffic improves response times for short Classic messages, such as | small finite set of C packets (e.g. a DNS request or an initial | |||
DNS requests and improves Classic flow startup because immediate | window of data) some Classic AQMs will not induce enough ECN marking | |||
capacity is available. | in the L queue, no matter how long the small set of C packets waits. | |||
Then, if the L queue happens to remain busy, the C traffic would | ||||
never get a scheduling opportunity from a strict priority scheduler. | ||||
Ideally the Classic AQM would be designed to increase the coupled | ||||
marking the longer that C packets have been waiting, but this is not | ||||
always practical - hence the need for L priority to be conditional. | ||||
Giving a small weight or limited waiting time for C traffic improves | ||||
response times for short Classic messages, such as DNS requests and | ||||
improves Classic flow startup because immediate capacity is | ||||
available. | ||||
Example DualQ Coupled AQM algorithms called DualPI2 and Curvy RED are | Example DualQ Coupled AQM algorithms called DualPI2 and Curvy RED are | |||
given in Appendix A and Appendix B. Either example AQM can be used | given in Appendix A and Appendix B. Either example AQM can be used | |||
to couple packet marking and dropping across a dual Q. | to couple packet marking and dropping across a dual Q. | |||
DualPI2 uses a Proportional-Integral (PI) controller as the Base AQM. | DualPI2 uses a Proportional-Integral (PI) controller as the Base AQM. | |||
Indeed, this Base AQM with just the squared output and no L4S queue | Indeed, this Base AQM with just the squared output and no L4S queue | |||
can be used as a drop-in replacement for PIE [RFC8033], in which case | can be used as a drop-in replacement for PIE [RFC8033], in which case | |||
it is just called PI2 [PI2]. PI2 is a principled simplification of | it is just called PI2 [PI2]. PI2 is a principled simplification of | |||
PIE that is both more responsive and more stable in the face of | PIE that is both more responsive and more stable in the face of | |||
dynamically varying load. | dynamically varying load. | |||
Curvy RED is derived from RED [RFC2309], but its configuration | Curvy RED is derived from RED [RFC2309], except its configuration | |||
parameters are insensitive to link rate and it requires less | parameters are delay-based to make them insensitive to link rate and | |||
operations per packet. However, DualPI2 is more responsive and | it requires less operations per packet. However, DualPI2 is more | |||
stable over a wider range of RTTs than Curvy RED. As a consequence, | responsive and stable over a wider range of RTTs than Curvy RED. As | |||
at the time of writing, DualPI2 has attracted more development and | a consequence, at the time of writing, DualPI2 has attracted more | |||
evaluation attention than Curvy RED, leaving the Curvy RED design | development and evaluation attention than Curvy RED, leaving the | |||
incomplete and not so fully evaluated. | Curvy RED design not so fully evaluated. | |||
Both AQMs regulate their queue in units of time rather than bytes. | Both AQMs regulate their queue in units of time rather than bytes. | |||
As already explained, this ensures configuration can be invariant for | As already explained, this ensures configuration can be invariant for | |||
different drain rates. With AQMs in a dualQ structure this is | different drain rates. With AQMs in a dualQ structure this is | |||
particularly important because the drain rate of each queue can vary | particularly important because the drain rate of each queue can vary | |||
rapidly as flows for the two queues arrive and depart, even if the | rapidly as flows for the two queues arrive and depart, even if the | |||
combined link rate is constant. | combined link rate is constant. | |||
It would be possible to control the queues with other alternative | It would be possible to control the queues with other alternative | |||
AQMs, as long as the normative requirements (those expressed in | AQMs, as long as the normative requirements (those expressed in | |||
skipping to change at page 17, line 30 ¶ | skipping to change at page 17, line 30 ¶ | |||
A Dual Queue Coupled AQM implementation MUST utilize two queues, each | A Dual Queue Coupled AQM implementation MUST utilize two queues, each | |||
with an AQM algorithm. | with an AQM algorithm. | |||
The AQM algorithm for the low latency (L) queue MUST be able to apply | The AQM algorithm for the low latency (L) queue MUST be able to apply | |||
ECN marking to ECN-capable packets. | ECN marking to ECN-capable packets. | |||
The scheduler draining the two queues MUST give L4S packets priority | The scheduler draining the two queues MUST give L4S packets priority | |||
over Classic, although priority MUST be bounded in order not to | over Classic, although priority MUST be bounded in order not to | |||
starve Classic traffic. The scheduler SHOULD be work-conserving, or | starve Classic traffic. The scheduler SHOULD be work-conserving, or | |||
otherwise close to work-conserving, given Classic service will often | otherwise close to work-conserving. This is because Classic traffic | |||
rely on borrowing from the L4S service. | needs to be able to efficiently fill any space left by L4S traffic | |||
even though the scheduler would otherwise allocate it to L4S. | ||||
[I-D.ietf-tsvwg-ecn-l4s-id] defines the meaning of an ECN marking on | [I-D.ietf-tsvwg-ecn-l4s-id] defines the meaning of an ECN marking on | |||
L4S traffic, relative to drop of Classic traffic. In order to ensure | L4S traffic, relative to drop of Classic traffic. In order to ensure | |||
coexistence of Classic and Scalable L4S traffic, it says, "The | coexistence of Classic and Scalable L4S traffic, it says, "The | |||
likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST be | likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST be | |||
roughly proportional to the square of the likelihood that it would | roughly proportional to the square of the likelihood that it would | |||
have marked it if it had been an L4S packet (p_L)." The term | have marked it if it had been an L4S packet (p_L)." The term | |||
'likelihood' is used to allow for marking and dropping to be either | 'likelihood' is used to allow for marking and dropping to be either | |||
probabilistic or deterministic. | probabilistic or deterministic. | |||
skipping to change at page 23, line 13 ¶ | skipping to change at page 23, line 21 ¶ | |||
followed by proposed solution(s). | followed by proposed solution(s). | |||
Under overload the higher priority L4S service will have to sacrifice | Under overload the higher priority L4S service will have to sacrifice | |||
some aspect of its performance. Alternative solutions are provided | some aspect of its performance. Alternative solutions are provided | |||
below that each relax a different factor: e.g. throughput, delay, | below that each relax a different factor: e.g. throughput, delay, | |||
drop. These choices need to be made either by the developer or by | drop. These choices need to be made either by the developer or by | |||
operator policy, rather than by the IETF. | operator policy, rather than by the IETF. | |||
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput or Delay? | 4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput or Delay? | |||
Priority of L4S is required to be conditional (see Section 2.5.1) to | Priority of L4S is required to be conditional (see Section 2.4 & | |||
avoid total starvation of Classic by heavy L4S traffic. This raises | Section 2.5.1) to avoid total starvation of Classic by heavy L4S | |||
the question of whether to sacrifice L4S throughput or L4S delay (or | traffic. This raises the question of whether to sacrifice L4S | |||
some other policy) to mitigate starvation of Classic: | throughput or L4S delay (or some other policy) to mitigate starvation | |||
of Classic: | ||||
Sacrifice L4S throughput: By using weighted round robin as the | Sacrifice L4S throughput: By using weighted round robin as the | |||
conditional priority scheduler, the L4S service can sacrifice some | conditional priority scheduler, the L4S service can sacrifice some | |||
throughput during overload. This can either be thought of as | throughput during overload. This can either be thought of as | |||
guaranteeing a minimum throughput service for Classic traffic, or | guaranteeing a minimum throughput service for Classic traffic, or | |||
as guaranteeing a maximum delay for a packet at the head of the | as guaranteeing a maximum delay for a packet at the head of the | |||
Classic queue. | Classic queue. | |||
The scheduling weight of the Classic queue should be small | The scheduling weight of the Classic queue should be small | |||
(e.g. 1/16). Then, in most traffic scenarios the scheduler will | (e.g. 1/16). Then, in most traffic scenarios the scheduler will | |||
skipping to change at page 47, line 40 ¶ | skipping to change at page 48, line 7 ¶ | |||
recommended over WRR): | recommended over WRR): | |||
- TS-FIFO does not fully isolate latency in the L4S queue from | - TS-FIFO does not fully isolate latency in the L4S queue from | |||
uncontrolled bursts in the Classic queue; | uncontrolled bursts in the Classic queue; | |||
- TS-FIFO is only appropriate if time-stamping of packets is | - TS-FIFO is only appropriate if time-stamping of packets is | |||
feasible; | feasible; | |||
- Even if time-stamping is supported, the sojourn time of the | - Even if time-stamping is supported, the sojourn time of the | |||
head packet is always stale. For instance, if a burst arrives | head packet is always stale. For instance, if a burst arrives | |||
at an empty queue, the sojourn time will only measure the delay | at an empty queue, the sojourn time only fully measures the | |||
of the burst once the burst is over, even though the queue knew | burst's delay when its last packet is dequeued, even though the | |||
about it from the start. To remedy this, each head packet can | queue knew about the burst from the start - so it could have | |||
be marked based on the delay it causes to packets backlogged | signalled congestion earlier. To remedy this, each head packet | |||
behind it, rather than based on its own delay due to the | can be marked when it is dequeued based on the expected delay | |||
packets in front of it. [Heist21] identifies a specific | of the tail packet behind it, as explained below, rather than | |||
scenario where bursty traffic significantly hits utilization of | based on the head packet's own delay due to the packets in | |||
the L queue. If this effect proves to be more widely | front of it. [Heist21] identifies a specific scenario where | |||
applicable, it is believed that using the delay behind the head | bursty traffic significantly hits utilization of the L queue. | |||
would improve performance. It can be implemented by | If this effect proves to be more widely applicable, it is | |||
multiplying the backlog at dequeue by the serialization delay | believed that using the delay behind the head would improve | |||
per unit of backlog. The implementation details will depend on | performance. | |||
whether the link rate is known; if it is not, a moving average | ||||
of the serialization delay can be maintained. This approach | The delay behind the head can be implemented by dividing the | |||
should cost less in operations and memory than the proposed | backlog at dequeue by the link rate or equivalently multiplying | |||
'scaled sojourn time' metric, which is the sojourn time of a | the backlog by the delay per unit of backlog. The | |||
packet scaled by the ratio of the queue sizes when the packet | implementation details will depend on whether the link rate is | |||
departed and arrived [SigQ-Dyn]. | known; if it is not, a moving average of the delay per unit | |||
backlog can be maintained. This delay consists of | ||||
serialization as well as media acquisition for shared media. | ||||
So the details will depend strongly on the specific link | ||||
technology, This approach should be less sensitive to timing | ||||
errors and cost less in operations and memory than the | ||||
otherwise equivalent 'scaled sojourn time' metric, which is the | ||||
sojourn time of a packet scaled by the ratio of the queue sizes | ||||
when the packet departed and arrived [SigQ-Dyn]. | ||||
* A strict priority scheduler would be inappropriate, because it | * A strict priority scheduler would be inappropriate, because it | |||
would starve Classic if L4S was overloaded. | would starve Classic if L4S was overloaded. | |||
Appendix B. Example DualQ Coupled Curvy RED Algorithm | Appendix B. Example DualQ Coupled Curvy RED Algorithm | |||
As another example of a DualQ Coupled AQM algorithm, the pseudocode | As another example of a DualQ Coupled AQM algorithm, the pseudocode | |||
below gives the Curvy RED based algorithm. Although the AQM was | below gives the Curvy RED based algorithm. Although the AQM was | |||
designed to be efficient in integer arithmetic, to aid understanding | designed to be efficient in integer arithmetic, to aid understanding | |||
it is first given using floating point arithmetic (Figure 10). Then, | it is first given using floating point arithmetic (Figure 10). Then, | |||
skipping to change at page 57, line 18 ¶ | skipping to change at page 57, line 18 ¶ | |||
C.2. Guidance on Controlling Throughput Equivalence | C.2. Guidance on Controlling Throughput Equivalence | |||
The coupling factor, k, determines the balance between L4S and | The coupling factor, k, determines the balance between L4S and | |||
Classic flow rates (see Section 2.5.2.1 and equation (1)). | Classic flow rates (see Section 2.5.2.1 and equation (1)). | |||
For the public Internet, a coupling factor of k=2 is recommended, and | For the public Internet, a coupling factor of k=2 is recommended, and | |||
justified below. For scenarios other than the public Internet, a | justified below. For scenarios other than the public Internet, a | |||
good coupling factor can be derived by plugging the appropriate | good coupling factor can be derived by plugging the appropriate | |||
numbers into the same working. | numbers into the same working. | |||
To summarize the maths below, from equation (5) it can be seen that | To summarize the maths below, from equation (7) it can be seen that | |||
choosing k=1.64 will make L4S throughput roughly the same as Classic, | choosing k=1.64 would theoretically make L4S throughput roughly the | |||
_if their actual end-to-end RTTs are the same_. However, even if the | same as Classic, _if their actual end-to-end RTTs were the same_. | |||
base RTTs are the same, the actual RTTs are unlikely to be the same, | However, even if the base RTTs are the same, the actual RTTs are | |||
because Classic traffic needs a fairly large queue to avoid under- | unlikely to be the same, because Classic traffic needs a fairly large | |||
utilization and excess drop. Whereas L4S does not. | queue to avoid under-utilization and excess drop. Whereas L4S does | |||
not. | ||||
Therefore, to determine the appropriate coupling factor policy, the | Therefore, to determine the appropriate coupling factor policy, the | |||
operator needs to decide at what base RTT it wants L4S and Classic | operator needs to decide at what base RTT it wants L4S and Classic | |||
flows to have roughly equal throughput, once the effect of the | flows to have roughly equal throughput, once the effect of the | |||
additional Classic queue on Classic throughput has been taken into | additional Classic queue on Classic throughput has been taken into | |||
account. With this approach, a network operator can determine a good | account. With this approach, a network operator can determine a good | |||
coupling factor without knowing the precise L4S algorithm for | coupling factor without knowing the precise L4S algorithm for | |||
reducing RTT-dependence - or even in the absence of any algorithm. | reducing RTT-dependence - or even in the absence of any algorithm. | |||
The following additional terminology will be used, with appropriate | The following additional terminology will be used, with appropriate | |||
subscripts: | subscripts: | |||
r: Packet rate [pkt/s] | r: Packet rate [pkt/s] | |||
R: RTT [s/round] | R: RTT [s/round] | |||
p: ECN marking probability [] | p: ECN marking probability [] | |||
On the Classic side, we consider Reno as the most sensitive and | On the Classic side, we consider Reno as the most sensitive and | |||
therefore worst case Classic congestion control, and we will also | therefore worst-case Classic congestion control. We will also | |||
consider Cubic in its Reno-friendly mode ('CReno'), as the most | consider Cubic in its Reno-friendly mode ('CReno'), as the most | |||
prevalent congestion control, according to the references and | prevalent congestion control, according to the references and | |||
analysis in [PI2param]. In either case, the Classic packet rate in | analysis in [PI2param]. In either case, the Classic packet rate in | |||
steady state is given by the well-known square root formula: | steady state is given by the well-known square root formula for Reno | |||
congestion control: | ||||
r_C = 1.22 / (R_C * p_C^0.5) | r_C = 1.22 / (R_C * p_C^0.5) (5) | |||
On the L4S side, we consider the Prague congestion control | On the L4S side, we consider the Prague congestion control | |||
[I-D.briscoe-iccrg-prague-congestion-control] as the reference for | [I-D.briscoe-iccrg-prague-congestion-control] as the reference for | |||
steady-state dependence on congestion. Prague conforms to the same | steady-state dependence on congestion. Prague conforms to the same | |||
equation as DCTCP, but we do not use the equation derived in the | equation as DCTCP, but we do not use the equation derived in the | |||
DCTCP paper, which is only appropriate for step marking. The coupled | DCTCP paper, which is only appropriate for step marking. The coupled | |||
marking, p_CL, is the appropriate one when considering throughput | marking, p_CL, is the appropriate one when considering throughput | |||
equivalence with Classic flows. Unlike step marking, coupled | equivalence with Classic flows. Unlike step marking, coupled | |||
markings are inherently spaced out, so we use the formula for DCTCP | markings are inherently spaced out, so we use the formula for DCTCP | |||
packet rate with probabilistic marking derived in Appendix A of | packet rate with probabilistic marking derived in Appendix A of | |||
[PI2]. We use the equation without RTT-independence enabled, which | [PI2]. We use the equation without RTT-independence enabled, which | |||
will be explained later. | will be explained later. | |||
r_L = 2/ (R_L * p_CL) | r_L = 2 / (R_L * p_CL) (6) | |||
For packet rate equivalence, we equate the two packet rates and | For packet rate equivalence, we equate the two packet rates and | |||
rearrange into the same form as Equation (1), so the two can be | rearrange into the same form as Equation (1), so the two can be | |||
equated and simplified to produce a formula for k: | equated and simplified to produce a formula for a theoretical | |||
coupling factor, which we shall call k*: | ||||
r_c = r_L | r_c = r_L | |||
=> p_C = (p_CL/1.64 * R_L/R_C)^2 | => p_C = (p_CL/1.64 * R_L/R_C)^2 | |||
p_C = ( p_CL / k )^2 (1) | p_C = ( p_CL / k )^2 (1) | |||
k = 1.64 * (R_C / R_L) (5) | k* = 1.64 * (R_C / R_L) (7) | |||
We now have the coupling factor in terms of two RTTs. Traditionally, | We say that this coupling factor is theoretical, because it is in | |||
throughput equivalence is defined for flows under comparable | terms of two RTTs, which raises two practical questions: i) for | |||
conditions, including with the same base RTT [RFC2914]. So if we | multiple flows with different RTTs, the RTT for each traffic class | |||
assume the same base RTT, R_b, for comparable flows, we can put both | would have to be derived from the RTTs of all the flows in that class | |||
R_C and R_L in terms of R_b. | (actually the harmonic mean would be needed); ii) a network node | |||
cannot easily know the RTT of any of the flows anyway. | ||||
RTT-dependence is cuased by window-based congestion control, so it | ||||
ought to be reversed there, not in the network, Therefore, we use a | ||||
fixed coupling factor in the network, and reduce RTT-dependence in | ||||
L4S senders. We cannot expect Classic senders to all be updated to | ||||
reduce their RTT-dependence. But solely addressing the problem in | ||||
L4S senders at least makes RTT-dependence no worse - not just between | ||||
L4S senders, but also between L4S and Classic senders. | ||||
Traditionally, throughput equivalence has been defined for flows | ||||
under comparable conditions, including with the same base RTT | ||||
[RFC2914]. So if we assume the same base RTT, R_b, for comparable | ||||
flows, we can put both R_C and R_L in terms of R_b. | ||||
We can approximate the L4S RTT to be hardly greater than the base | We can approximate the L4S RTT to be hardly greater than the base | |||
RTT, i.e. R_L ~= R_b. And next we replace R_C with (R_b + q_C), | RTT, i.e. R_L ~= R_b. And we can replace R_C with (R_b + q_C), where | |||
where the Classic queue, q_C, depends on the target queue delay that | the Classic queue, q_C, depends on the target queue delay that the | |||
the operator has configured for the Classic AQM. | operator has configured for the Classic AQM. | |||
Taking PI2 as an example Classic AQM, it seems that we could just | Taking PI2 as an example Classic AQM, it seems that we could just | |||
take R_C = R_b + target (recommended 15 ms by default in | take R_C = R_b + target (recommended 15 ms by default in | |||
Appendix A.1). However, target is roughly the queue depth reached by | Appendix A.1). However, target is roughly the queue depth reached by | |||
the tips of the sawteeth of a congestion control, not the average | the tips of the sawteeth of a congestion control, not the average | |||
[PI2param]. That is R_max = R_b + target. | [PI2param]. That is R_max = R_b + target. | |||
The position of the average in relation to the max depends on the | The position of the average in relation to the max depends on the | |||
amplitude of the sawteeth, so we will consider Reno [RFC5681] as the | amplitude and geometry of the sawteeth. We consider two examples: | |||
most sensitive worst-case as well as Cubic [RFC8312] in its Reno- | Reno [RFC5681], as the most sensitive worst-case, and Cubic [RFC8312] | |||
friendly mode ('CReno') as the most prevalent congestion control | in its Reno-friendly mode ('CReno') as the most prevalent congestion | |||
algorithm on the Internet according to [PI2param]. | control algorithm on the Internet according to the references in | |||
[PI2param]. Both are AIMD, so we will generalize using b as the | ||||
Both are AIMD, so we will generalize using b as the multiplicative | multiplicative decrease factor (b_r = 0.5 for Reno, b_c = 0.7 for | |||
decrease factor (b_r = 0.5 for Reno, b_c = 0.7 for CReno). Then: | CReno). Then: | |||
R_C = (R_max + b*R_max) / 2 | R_C = (R_max + b*R_max) / 2 | |||
= R_max * (1+b)/2 | = R_max * (1+b)/2 | |||
R_reno = 0.75 * (R_b + target); R_creno = 0.85 * (R_b + target). | R_reno = 0.75 * (R_b + target); R_creno = 0.85 * (R_b + target). | |||
(8) | ||||
Plugging all this into equation (5) we get coupling factor, | Plugging all this into equation (7) we get a fixed coupling factor | |||
for each: | ||||
k_reno = 1.64*0.75*(R_b+target)/R_b | k_reno = 1.64*0.75*(R_b+target)/R_b | |||
= 1.23*(1 + target/R_b); k_creno = 1.39 * (1 + target/R_b) | = 1.23*(1 + target/R_b); k_creno = 1.39 * (1 + target/R_b) | |||
For instance, it is recommended that the operator chooses R_b = 25 | An operator can then choose the base RTT at which it wants throughput | |||
ms, as a typical base RTT between Internet users and CDNs [PI2param]. | to be equivalent. For instance, if we recommend that the operator | |||
Then: | chooses R_b = 25 ms, as a typical base RTT between Internet users and | |||
CDNs [PI2param], then these coupling factors become: | ||||
k_reno = 1.23 * (1 + 15/25) k_creno = 1.39 * (1 + 15/25) | k_reno = 1.23 * (1 + 15/25) k_creno = 1.39 * (1 + 15/25) | |||
= 1.97 = 2.22 | = 1.97 = 2.22 | |||
~= 2 ~= 2 | ~= 2 ~= 2 (9) | |||
The approximation is relevant to any of the above example DualQ | The approximation is relevant to any of the above example DualQ | |||
Coupled algorithms, which use a coupling factor that is an integer | Coupled algorithms, which use a coupling factor that is an integer | |||
power of 2 to aid efficient implementation. | power of 2 to aid efficient implementation. It also fits best to the | |||
worst case (Reno). | ||||
An operator can make a policy choice to decide on a different base | To check the outcome of this coupling factor, we can express the | |||
RTT at which it wants throughput equivalence. Nonetheless, it makes | ratio of L4S to Classic throughput by substituting from their rate | |||
equations (5) and (6), then also substituting for p_C in terms of | ||||
p_CL, using equation (1) with k=2 as just determined for the | ||||
Internet: | ||||
r_L / r_C = 2 (R_C * p_C^0.5) / 1.22 (R_L * p_CL) | ||||
= (R_C * p_CL) / (1.22 * R_L * p_CL) | ||||
= R_C / (1.22 * R_L) (10) | ||||
As an example, we can then consider single competing CReno and Prague | ||||
flows, by expressing both their RTTs in (10) in terms of their base | ||||
RTTs, R_bC and R_bL. So R_C is replaced by equation (8) for CReno. | ||||
And R_L is replaced by the max() function below, which represents the | ||||
effective RTT of the current Prague congestion control | ||||
[I-D.briscoe-iccrg-prague-congestion-control] in its (default) RTT- | ||||
independent mode, because it sets a floor to the effective RTT that | ||||
it uses for additive increase: | ||||
~= 0.85 * (R_bC + target) / (1.22 * max(R_bL, R_typ)) | ||||
~= (R_bC + target) / (1.4 * max(R_bL, R_typ)) | ||||
It can be seen that, for base RTTs below target (15 ms), both the | ||||
numerator and the denominator plateau, which has the desired effect | ||||
of limiting RTT-dependence. | ||||
At the start of the above derivations, an explanation was promised | ||||
for why the L4S throughput equation in equation (6) did not need to | ||||
model RTT-independence. This is because we only use one point - at | ||||
the the typical base RTT where the operator chooses to calculate the | ||||
coupling factor. Then, throughput equivalence will at least hold at | ||||
that chosen point. Nonetheless, assuming Prague senders implement | ||||
RTT-independence over a range of RTTs below this, the throughput | ||||
equivalence will then extend over that range as well. | ||||
Congestion control designers can choose different ways to reduce RTT- | ||||
dependence. And each operator can make a policy choice to decide on | ||||
a different base RTT, and therefore a different k, at which it wants | ||||
throughput equivalence. Nonetheless, for the Internet, it makes | ||||
sense to choose what is believed to be the typical RTT most users | sense to choose what is believed to be the typical RTT most users | |||
experience, because a Classic AQM's target queuing delay is also | experience, because a Classic AQM's target queuing delay is also | |||
derived from a typical RTT for the Internet. Therefore, below this | derived from a typical RTT for the Internet. | |||
typical RTT, Classic AQMs become fairly RTT-independent. And L4S | ||||
flows are also required to become RTT-independent below a typical RTT | ||||
[I-D.ietf-tsvwg-ecn-l4s-id]. Therefore, throughput equivalence ought | ||||
to be no worse than with Classic AQMs and Classic congestion | ||||
controls. | ||||
As remarked earlier, the throughput equation used for Prague was with | ||||
RTT-independence disabled. This is because we only need the point on | ||||
this equation at the typical base RTT - where the operator chooses to | ||||
calculate the coupling factor. We do not need to know the full range | ||||
of the equation used for RTT-independence as long as it is roughly | ||||
the same at this one point. Then, there will at least be throughput | ||||
equivalence at that base RTT. And assuming Prague senders implement | ||||
RTT-independence over a range of RTTs, the throughput equivalence | ||||
will then extend over that range. | ||||
As a non-Internet example, for localized traffic from a particular | As a non-Internet example, for localized traffic from a particular | |||
ISP's data centre, using the measured RTTs, it was calculated that a | ISP's data centre, using the measured RTTs, it was calculated that a | |||
value of k = 8 would achieve throughput equivalence, and experiments | value of k = 8 would achieve throughput equivalence, and experiments | |||
verified the formula very closely. | verified the formula very closely. | |||
But, for a typical mix of RTTs across the general Internet, a value | But, for a typical mix of RTTs across the general Internet, a value | |||
of k=2 is recommended as a good workable compromise. | of k=2 is recommended as a good workable compromise. | |||
Authors' Addresses | Authors' Addresses | |||
End of changes. 32 change blocks. | ||||
115 lines changed or deleted | 179 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |