draft-ietf-tsvwg-aqm-dualq-coupled-07.txt | draft-ietf-tsvwg-aqm-dualq-coupled-08.txt | |||
---|---|---|---|---|
Transport Area working group (tsvwg) K. De Schepper | Transport Area working group (tsvwg) K. De Schepper | |||
Internet-Draft Nokia Bell Labs | Internet-Draft Nokia Bell Labs | |||
Intended status: Experimental B. Briscoe, Ed. | Intended status: Experimental B. Briscoe, Ed. | |||
Expires: April 25, 2019 CableLabs | Expires: May 8, 2019 CableLabs | |||
O. Bondarenko | O. Bondarenko | |||
Simula Research Lab | Simula Research Lab | |||
I. Tsang | I. Tsang | |||
Nokia | Nokia | |||
October 22, 2018 | November 04, 2018 | |||
DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | |||
(L4S) | (L4S) | |||
draft-ietf-tsvwg-aqm-dualq-coupled-07 | draft-ietf-tsvwg-aqm-dualq-coupled-08 | |||
Abstract | Abstract | |||
Data Centre TCP (DCTCP) was designed to provide predictably low | The Low Latency Low Loss Scalable Throughput (L4S) architecture | |||
queuing latency, near-zero loss, and throughput scalability using | allows data flows over the public Internet to predictably achieve | |||
explicit congestion notification (ECN) and an extremely simple | ultra-low queuing latency, generally zero congestion loss and scaling | |||
marking behaviour on switches. However, DCTCP does not co-exist with | of per-flow throughput without the problems of traditional TCP. To | |||
existing TCP traffic---DCTCP is so aggressive that existing TCP | achieve this, L4S data flows use a 'scalable' congestion control | |||
algorithms approach starvation. So, until now, DCTCP could only be | similar to Data Centre TCP (DCTCP) and a form of Explicit Congestion | |||
deployed where a clean-slate environment could be arranged, such as | Notification (ECN) with modified behaviour. However, until now, | |||
in private data centres. This specification defines `DualQ Coupled | scalable congestion controls did not co-exist with existing TCP Reno/ | |||
Active Queue Management (AQM)' to allow scalable congestion controls | Cubic traffic---scalable controls are so aggressive that 'Classic' | |||
like DCTCP to safely co-exist with classic Internet traffic. The | TCP algorithms drive themselves to starvation. Therefore, until now, | |||
Coupled AQM ensures that a flow runs at about the same rate whether | L4S controls could only be deployed where a clean-slate environment | |||
it uses DCTCP or TCP Reno/Cubic, but without inspecting transport | could be arranged, such as in private data centres (hence the name | |||
layer flow identifiers. When tested in a residential broadband | DCTCP). This specification defines `DualQ Coupled Active Queue | |||
setting, DCTCP achieved sub-millisecond average queuing delay and | Management (AQM)', which enables these scalable congestion controls | |||
zero congestion loss under a wide range of mixes of DCTCP and | to safely co-exist with Classic Internet traffic. | |||
`Classic' broadband Internet traffic, without compromising the | ||||
performance of the Classic traffic. The solution also reduces | The Coupled AQM ensures that a flow runs at about the same rate | |||
network complexity and eliminates network configuration. | whether it uses DCTCP or TCP Reno/Cubic. It achieves this | |||
indirectly, without having to inspect transport layer flow | ||||
identifiers, When tested in a residential broadband setting, DCTCP | ||||
also achieves sub-millisecond average queuing delay and zero | ||||
congestion loss under a wide range of mixes of DCTCP and `Classic' | ||||
broadband Internet traffic, without compromising the performance of | ||||
the Classic traffic. The solution also reduces network complexity | ||||
and eliminates network configuration. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on April 25, 2019. | This Internet-Draft will expire on May 8, 2019. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2018 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 31 ¶ | skipping to change at page 2, line 39 ¶ | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1. Problem and Scope . . . . . . . . . . . . . . . . . . . . 3 | 1.1. Problem and Scope . . . . . . . . . . . . . . . . . . . . 3 | |||
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 | 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
1.3. Features . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.3. Features . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 7 | 2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 7 | |||
2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 7 | 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 8 | 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
2.3. Traffic Classification . . . . . . . . . . . . . . . . . 8 | 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 9 | |||
2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 9 | 2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 10 | |||
2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 11 | 2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 12 | |||
2.5.1. Functional Requirements . . . . . . . . . . . . . . . 11 | 2.5.1. Functional Requirements . . . . . . . . . . . . . . . 12 | |||
2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 13 | 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 13 | |||
2.5.2. Management Requirements . . . . . . . . . . . . . . . 14 | 2.5.2. Management Requirements . . . . . . . . . . . . . . . 15 | |||
3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 | 3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 | |||
4. Security Considerations . . . . . . . . . . . . . . . . . . . 15 | 4. Security Considerations . . . . . . . . . . . . . . . . . . . 16 | |||
4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 15 | 4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 16 | |||
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput | 4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput | |||
or Delay? . . . . . . . . . . . . . . . . . . . . . . 15 | or Delay? . . . . . . . . . . . . . . . . . . . . . . 17 | |||
4.1.2. Congestion Signal Saturation: Introduce L4S Drop or | 4.1.2. Congestion Signal Saturation: Introduce L4S Drop or | |||
Delay? . . . . . . . . . . . . . . . . . . . . . . . 16 | Delay? . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
4.1.3. Protecting against Unresponsive ECN-Capable Traffic . 17 | 4.1.3. Protecting against Unresponsive ECN-Capable Traffic . 19 | |||
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 18 | 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 19 | |||
6. References . . . . . . . . . . . . . . . . . . . . . . . . . 18 | 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
6.1. Normative References . . . . . . . . . . . . . . . . . . 18 | 6.1. Normative References . . . . . . . . . . . . . . . . . . 20 | |||
6.2. Informative References . . . . . . . . . . . . . . . . . 18 | 6.2. Informative References . . . . . . . . . . . . . . . . . 20 | |||
Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 21 | Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 23 | |||
A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 21 | A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 23 | |||
A.2. Pass #2: Overload Details . . . . . . . . . . . . . . . . 27 | A.2. Pass #2: Overload Details . . . . . . . . . . . . . . . . 30 | |||
Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 30 | Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 33 | |||
Appendix C. Guidance on Controlling Throughput Equivalence . . . 36 | Appendix C. Guidance on Controlling Throughput Equivalence . . . 39 | |||
Appendix D. Open Issues . . . . . . . . . . . . . . . . . . . . 37 | Appendix D. Open Issues . . . . . . . . . . . . . . . . . . . . 40 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 38 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41 | |||
1. Introduction | 1. Introduction | |||
1.1. Problem and Scope | 1.1. Problem and Scope | |||
Latency is becoming the critical performance factor for many (most?) | Latency is becoming the critical performance factor for many (most?) | |||
applications on the public Internet, e.g. interactive Web, Web | applications on the public Internet, e.g. interactive Web, Web | |||
services, voice, conversational video, interactive video, interactive | services, voice, conversational video, interactive video, interactive | |||
remote presence, instant messaging, online gaming, remote desktop, | remote presence, instant messaging, online gaming, remote desktop, | |||
cloud-based applications, and video-assisted remote control of | cloud-based applications, and video-assisted remote control of | |||
machinery and industrial processes. In the developed world, further | machinery and industrial processes. In the developed world, further | |||
increases in access network bit-rate offer diminishing returns, | increases in access network bit-rate offer diminishing returns, | |||
whereas latency is still a multi-faceted problem. In the last decade | whereas latency is still a multi-faceted problem. In the last decade | |||
or so, much has been done to reduce propagation time by placing | or so, much has been done to reduce propagation time by placing | |||
caches or servers closer to users. However, queuing remains a major | caches or servers closer to users. However, queuing remains a major | |||
component of latency. | intermittent component of latency. | |||
The Diffserv architecture provides Expedited Forwarding [RFC3246], so | The Diffserv architecture provides Expedited Forwarding [RFC3246], so | |||
that low latency traffic can jump the queue of other traffic. | that low latency traffic can jump the queue of other traffic. | |||
However, on access links dedicated to individual sites (homes, small | However, on access links dedicated to individual sites (homes, small | |||
enterprises or mobile devices), often all traffic at any one time | enterprises or mobile devices), often all traffic at any one time | |||
will be latency-sensitive and, if all the traffic on a link is marked | will be latency-sensitive and, if all the traffic on a link is marked | |||
as EF, Diffserv cannot reduce the delay of any of it. In contrast, | as EF, Diffserv cannot reduce the delay of any of it. In contrast, | |||
the Low Latency Low Loss Scalable throughput (L4S) approach removes | the Low Latency Low Loss Scalable throughput (L4S) approach removes | |||
the causes of any unnecessary queuing delay. | the causes of any unnecessary queuing delay. | |||
skipping to change at page 4, line 5 ¶ | skipping to change at page 4, line 11 ¶ | |||
Active queue management (AQM) was originally developed to solve this | Active queue management (AQM) was originally developed to solve this | |||
problem (and others). Unlike Diffserv, which gives low latency to | problem (and others). Unlike Diffserv, which gives low latency to | |||
some traffic at the expense of others, AQM controls latency for _all_ | some traffic at the expense of others, AQM controls latency for _all_ | |||
traffic in a class. In general, AQMs introduce an increasing level | traffic in a class. In general, AQMs introduce an increasing level | |||
of discard from the buffer the longer the queue persists above a | of discard from the buffer the longer the queue persists above a | |||
shallow threshold. This gives sufficient signals to capacity-seeking | shallow threshold. This gives sufficient signals to capacity-seeking | |||
(aka. greedy) flows to keep the buffer empty for its intended | (aka. greedy) flows to keep the buffer empty for its intended | |||
purpose: absorbing bursts. However, RED [RFC2309] and other | purpose: absorbing bursts. However, RED [RFC2309] and other | |||
algorithms from the 1990s were sensitive to their configuration and | algorithms from the 1990s were sensitive to their configuration and | |||
hard to set correctly. So, AQM was not widely deployed. | hard to set correctly. So, AQM was not widely deployed in the 1990s. | |||
More recent state-of-the-art AQMs, e.g. fq_CoDel [RFC8290], | More recent state-of-the-art AQMs, e.g. fq_CoDel [RFC8290], | |||
PIE [RFC8033], Adaptive RED [ARED01], are easier to configure, | PIE [RFC8033], Adaptive RED [ARED01], are easier to configure, | |||
because they define the queuing threshold in time not bytes, so it is | because they define the queuing threshold in time not bytes, so it is | |||
invariant for different link rates. However, no matter how good the | invariant for different link rates. However, no matter how good the | |||
AQM, the sawtoothing rate of TCP will either cause queuing delay to | AQM, the sawtoothing rate of TCP will either cause queuing delay to | |||
vary or cause the link to be under-utilized. Even with a perfectly | vary or cause the link to be under-utilized. Even with a perfectly | |||
tuned AQM, the additional queuing delay will be of the same order as | tuned AQM, the additional queuing delay will be of the same order as | |||
the underlying speed-of-light delay across the network. Flow-queuing | the underlying speed-of-light delay across the network. Flow-queuing | |||
can isolate one flow from another, but it cannot isolate a TCP flow | can isolate one flow from another, but it cannot isolate a TCP flow | |||
skipping to change at page 4, line 40 ¶ | skipping to change at page 4, line 46 ¶ | |||
The former causes a flow's round trip time (RTT) to vary from about 1 | The former causes a flow's round trip time (RTT) to vary from about 1 | |||
to 2 times the base RTT between the machines in question. The latter | to 2 times the base RTT between the machines in question. The latter | |||
delays the system's response to change by a worst-case | delays the system's response to change by a worst-case | |||
(transcontinental) RTT, which could be hundreds of times the actual | (transcontinental) RTT, which could be hundreds of times the actual | |||
RTT of typical traffic from localized CDNs. | RTT of typical traffic from localized CDNs. | |||
Latency is not our only concern: | Latency is not our only concern: | |||
3. It was known when TCP was first developed that it would not scale | 3. It was known when TCP was first developed that it would not scale | |||
to high bandwidth-delay products. | to high bandwidth-delay products [TCP-CA]. | |||
Given regular broadband bit-rates over WAN distances are | Given regular broadband bit-rates over WAN distances are | |||
already [RFC3649] beyond the scaling range of `classic' TCP Reno, | already [RFC3649] beyond the scaling range of `classic' TCP Reno, | |||
`less unscalable' Cubic [RFC8312] and | `less unscalable' Cubic [RFC8312] and | |||
Compound [I-D.sridharan-tcpm-ctcp] variants of TCP have been | Compound [I-D.sridharan-tcpm-ctcp] variants of TCP have been | |||
successfully deployed. However, these are now approaching their | successfully deployed. However, these are now approaching their | |||
scaling limits. Unfortunately, fully scalable TCPs such as DCTCP | scaling limits. Unfortunately, fully scalable TCPs such as DCTCP | |||
cause `classic' TCP to starve itself, which is why they have been | cause `classic' TCP to starve itself, which is why they have been | |||
confined to private data centres or research testbeds (until now). | confined to private data centres or research testbeds (until now). | |||
This document specifies a `DualQ Coupled AQM' extension that solves | This document specifies a `DualQ Coupled AQM' extension that solves | |||
the problem of coexistence between scalable and classic flows, | the problem of coexistence between scalable and classic flows, | |||
without having to inspect flow identifiers. The AQM is not like | without having to inspect flow identifiers. The AQM is not like | |||
flow-queuing approaches [RFC8290] that classify packets by flow | flow-queuing approaches [RFC8290] that classify packets by flow | |||
identifier into numerous separate queues in order to isolate sparse | identifier into numerous separate queues in order to isolate sparse | |||
flows from the higher latency in the queues assigned to heavier flow. | flows from the higher latency in the queues assigned to heavier | |||
In contrast, the AQM exploits the behaviour of scalable congestion | flows. In contrast, the AQM exploits the behaviour of scalable | |||
controls like DCTCP so that every packet in every flow sharing the | congestion controls like DCTCP so that every packet in every flow | |||
queue for DCTCP-like traffic can be served with very low latency. | sharing the queue for DCTCP-like traffic can be served with very low | |||
latency. | ||||
This AQM extension can be combined with any single queue AQM that | This AQM extension can be combined with any AQM designed for a single | |||
generates a statistical or deterministic mark/drop probability driven | queue that generates a statistical or deterministic mark/drop | |||
by the queue dynamics. In many cases it simplifies the basic control | probability driven by the queue dynamics. In many cases it | |||
algorithm, and requires little extra processing. Therefore it is | simplifies the basic control algorithm, and requires little extra | |||
believed the Coupled AQM would be applicable and easy to deploy in | processing. Therefore it is believed the Coupled AQM would be | |||
all types of buffers; buffers in cost-reduced mass-market residential | applicable and easy to deploy in all types of buffers; buffers in | |||
equipment; buffers in end-system stacks; buffers in carrier-scale | cost-reduced mass-market residential equipment; buffers in end-system | |||
equipment including remote access servers, routers, firewalls and | stacks; buffers in carrier-scale equipment including remote access | |||
Ethernet switches; buffers in network interface cards, buffers in | servers, routers, firewalls and Ethernet switches; buffers in network | |||
virtualized network appliances, hypervisors, and so on. | interface cards, buffers in virtualized network appliances, | |||
hypervisors, and so on. | ||||
The overall L4S architecture is described in | For the public Internet, nearly all the benefit will typically be | |||
[I-D.ietf-tsvwg-l4s-arch]. The supporting papers [PI2] and [DCttH15] | achieved by deploying the Coupled AQM into either end of the access | |||
give the full rationale for the AQM's design, both discursively and | link between a 'site' and the Internet, which is invariably the | |||
in more precise mathematical form. | bottleneck. Here, the term 'site' is used loosely to mean a home, an | |||
office, a campus or mobile user equipment. | ||||
The overall L4S architecture [I-D.ietf-tsvwg-l4s-arch] gives more | ||||
detail, including on wider deployment aspects such as coexistence in | ||||
bottlenecks where a DualQ Coupled AQM has not been deployed. The | ||||
supporting papers [PI2] and [DCttH15] give the full rationale for the | ||||
AQM's design, both discursively and in more precise mathematical | ||||
form. | ||||
1.2. Terminology | 1.2. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in [RFC2119]. In this | document are to be interpreted as described in [RFC2119] when, and | |||
document, these words will appear with that interpretation only when | only when, they appear in all capitals, as shown here. | |||
in ALL CAPS. Lower case uses of these words are not to be | ||||
interpreted as carrying RFC-2119 significance. | ||||
The DualQ Coupled AQM uses two queues for two services. Each of the | The DualQ Coupled AQM uses two queues for two services. Each of the | |||
following terms identifies both the service and the queue that | following terms identifies both the service and the queue that | |||
provides the service: | provides the service: | |||
Classic (denoted by subscript C): The `Classic' service is intended | Classic (denoted by subscript C): The `Classic' service is intended | |||
for all the behaviours that currently co-exist with TCP Reno (TCP | for all the behaviours that currently co-exist with TCP Reno (TCP | |||
Cubic, Compound, SCTP, etc). | Cubic, Compound, SCTP, etc). | |||
Low-Latency, Low-Loss and Scalable (L4S, denoted by subscript L): | Low-Latency, Low-Loss and Scalable (L4S, denoted by subscript L): | |||
The `L4S' service is intended for a set of congestion controls | The `L4S' service is intended for a set of congestion controls | |||
with scalable properties such as DCTCP (e.g. | with scalable properties (e.g. DCTCP [RFC8257], Relentless | |||
Relentless [Mathis09]). | TCP [Mathis09], the L4S variant of SCREAM for real-time | |||
media {ToDo: ref}). For the public Internet a scalable control | ||||
has to comply with the requirements in [I-D.ietf-tsvwg-ecn-l4s-id] | ||||
(aka. the 'TCP Prague requirements'). | ||||
Either service can cope with a proportion of unresponsive or less- | Either service can cope with a proportion of unresponsive or less- | |||
responsive traffic as well (e.g. DNS, VoIP, etc), just as a single | responsive traffic as well, as long (e.g. DNS, VoIP, game sync | |||
queue AQM can. The DualQ Coupled AQM behaviour is similar to a | datagrams, etc), just as a single queue AQM can if this traffic makes | |||
single FIFO queue with respect to unresponsive and overload traffic. | minimal contribution to queuing. The DualQ Coupled AQM behaviour | |||
below is defined to be similar to a single FIFO queue with respect to | ||||
unresponsive and overload traffic. | ||||
1.3. Features | 1.3. Features | |||
The AQM couples marking and/or dropping across the two queues such | The AQM couples marking and/or dropping across the two queues such | |||
that a flow will get roughly the same throughput whichever it uses. | that a flow will get roughly the same throughput whichever it uses. | |||
Therefore both queues can feed into the full capacity of a link and | Therefore both queues can feed into the full capacity of a link and | |||
no rates need to be configured for the queues. The L4S queue enables | no rates need to be configured for the queues. The L4S queue enables | |||
scalable congestion controls like DCTCP to give stunningly low and | scalable congestion controls like DCTCP to give stunningly low and | |||
predictably low latency, without compromising the performance of | predictably low latency, without compromising the performance of | |||
competing 'Classic' Internet traffic. Thousands of tests have been | competing 'Classic' Internet traffic. Thousands of tests have been | |||
conducted in a typical fixed residential broadband setting. Typical | conducted in a typical fixed residential broadband setting. Typical | |||
experiments used base round trip delays up to 100ms between the data | experiments used base round trip delays up to 100ms between the data | |||
centre and home network, and large amounts of background traffic in | centre and home network, and large amounts of background traffic in | |||
both queues. For every L4S packet, the AQM kept the average queuing | both queues. For every L4S packet, the AQM kept the average queuing | |||
delay below 1ms (or 2 packets if serialization delay is bigger for | delay below 1ms (or 2 packets if serialization delay is bigger for | |||
slow links), and no losses at all were introduced by the AQM. | slow links), and no losses at all were introduced by the AQM. | |||
Details of the extensive experiments will be made available [PI2] | Details of the extensive experiments are available [PI2] [DCttH15]. | |||
[DCttH15]. | ||||
Subjective testing was also conducted using a demanding panoramic | Subjective testing was also conducted by multiple people all | |||
interactive video application run over a stack with DCTCP enabled and | simultaneously using very demanding high bandwidth low latency | |||
deployed on the testbed. Each user could pan or zoom their own high | applications over a single shared access link [L4Sdemo16]. In one | |||
definition (HD) sub-window of a larger video scene from a football | application, each user could use finger gestures to pan or zoom their | |||
match. Even though the user was also downloading large amounts of | own high definition (HD) sub-window of a larger video scene generated | |||
L4S and Classic data, latency was so low that the picture appeared to | on the fly in 'the cloud' from a football match. Another user | |||
stick to their finger on the touchpad (all the L4S data achieved the | wearing VR goggles was remotely receiving a feed from a 360-degree | |||
same ultra-low latency). With an alternative AQM, the video | camera in a racing car, again with the sub-window in their field of | |||
noticeably lagged behind the finger gestures. | vision generated on the fly in 'the cloud' dependent on their head | |||
movements. Even though other users were also downloading large | ||||
amounts of L4S and Classic data, playing a gaming benchmark and | ||||
watchings videos over the same 40Mb/s downstream broadband link, | ||||
latency was so low that the football picture appeared to stick to the | ||||
user's finger on the touchpad and the experience fed from the remote | ||||
camera did not noticeably lag head movements. All the L4S data (even | ||||
including the downloads) achieved the same ultra-low latency. With | ||||
an alternative AQM, the video noticeably lagged behind the finger | ||||
gestures and head movements. | ||||
Unlike Diffserv Expedited Forwarding, the L4S queue does not have to | Unlike Diffserv Expedited Forwarding, the L4S queue does not have to | |||
be limited to a small proportion of the link capacity in order to | be limited to a small proportion of the link capacity in order to | |||
achieve low delay. The L4S queue can be filled with a heavy load of | achieve low delay. The L4S queue can be filled with a heavy load of | |||
capacity-seeking flows like DCTCP and still achieve low delay. The | capacity-seeking flows like DCTCP and still achieve low delay. The | |||
L4S queue does not rely on the presence of other traffic in the | L4S queue does not rely on the presence of other traffic in the | |||
Classic queue that can be 'overtaken'. It gives low latency to L4S | Classic queue that can be 'overtaken'. It gives low latency to L4S | |||
traffic whether or not there is Classic traffic, and the latency of | traffic whether or not there is Classic traffic, and the latency of | |||
Classic traffic does not suffer when a proportion of the traffic is | Classic traffic does not suffer when a proportion of the traffic is | |||
L4S. The two queues are only necessary because DCTCP-like flows | L4S. The two queues are only necessary because DCTCP-like flows | |||
skipping to change at page 7, line 17 ¶ | skipping to change at page 7, line 43 ¶ | |||
Then, without any management intervention, applications can exploit | Then, without any management intervention, applications can exploit | |||
it by migrating to scalable controls like DCTCP, which can then | it by migrating to scalable controls like DCTCP, which can then | |||
evolve _while_ their benefits are being enjoyed by everyone on the | evolve _while_ their benefits are being enjoyed by everyone on the | |||
Internet. | Internet. | |||
2. DualQ Coupled AQM | 2. DualQ Coupled AQM | |||
There are two main aspects to the approach: | There are two main aspects to the approach: | |||
o the Coupled AQM that addresses throughput equivalence between | o the Coupled AQM that addresses throughput equivalence between | |||
Classic (e.g. Reno, Cubic) flows and L4S (e.g. DCTCP) flows | Classic (e.g. Reno, Cubic) flows and L4S flows (that satisfy the | |||
TCP Prague requirements). | ||||
o the Dual Queue structure that provides latency separation for L4S | o the Dual Queue structure that provides latency separation for L4S | |||
flows to isolate them from the typically large Classic queue. | flows to isolate them from the typically large Classic queue. | |||
2.1. Coupled AQM | 2.1. Coupled AQM | |||
In the 1990s, the `TCP formula' was derived for the relationship | In the 1990s, the `TCP formula' was derived for the relationship | |||
between TCP's congestion window, cwnd, and its drop probability, p. | between TCP's congestion window, cwnd, and its drop probability, p. | |||
To a first order approximation, cwnd of TCP Reno is inversely | To a first order approximation, cwnd of TCP Reno is inversely | |||
proportional to the square root of p. | proportional to the square root of p. | |||
TCP Cubic implements a Reno-compatibility mode, which is the only | We focus on Reno as the worst case, because if we do not harm Reno, | |||
relevant mode for typical RTTs under 20ms as long as the throughput | we will not harm Cubic. Nonetheless, TCP Cubic implements a Reno- | |||
of a single flow is less than about 500Mb/s. Therefore it can be | compatibility mode, which is the only relevant mode for typical RTTs | |||
assumed that Cubic traffic behaves similarly to Reno (but with a | under 20ms as long as the throughput of a single flow is less than | |||
slightly different constant of proportionality), and the term | about 500Mb/s. Therefore it can be assumed that Cubic traffic | |||
'Classic' will be used for the collection of Reno-friendly traffic | behaves similarly to Reno (but with a slightly different constant of | |||
including Cubic in Reno mode. | proportionality). The term 'Classic' will be used for the collection | |||
of Reno-friendly traffic including Cubic in Reno mode. | ||||
The supporting paper [PI2] includes the derivation of the equivalent | The supporting paper [PI2] includes the derivation of the equivalent | |||
rate equation for DCTCP, for which cwnd is inversely proportional to | rate equation for DCTCP, for which cwnd is inversely proportional to | |||
p (not the square root), where in this case p is the ECN marking | p (not the square root), where in this case p is the ECN marking | |||
probability. DCTCP is not the only congestion control that behaves | probability. DCTCP is not the only congestion control that behaves | |||
like this, so the term 'L4S' traffic will be used for all similar | like this, so the term 'L4S' traffic will be used for all similar | |||
behaviour. | behaviour. | |||
In order to make a DCTCP flow run at roughly the same rate as a Reno | For safe co-existence, under stationary conditions, a DCTCP flow has | |||
TCP flow (all other factors being equal), the drop or marking | to run at roughly the same rate as a Reno TCP flow (all other factors | |||
probability for Classic traffic, p_C has to be distinct from the | being equal). So the drop or marking probability for Classic | |||
marking probability for L4S traffic, p_L (in contrast to RFC3168 | traffic, p_C has to be distinct from the marking probability for L4S | |||
which requires them to be the same). To remain stable, Classic | traffic, p_L. [RFC8311] updates the original ECN specification | |||
traffic needs p_C to change relatively slowly, whereas L4S traffic | [RFC3168] to allow these probabilities to be distinct, because RFC | |||
needs to be controlled rapidly by a probability p_L that track the | 3168 required them to be the same. | |||
instantaneous queue. It is necessary to make the Classic drop | ||||
probability p_C proportional to the square of a variable we shall | Also, to remain stable, Classic sources need the network to smooth | |||
call p_CL, which is an input to the instantaneous L4S marking | p_C so it changes relatively slowly. In contrast, L4S avoids | |||
probability p_L but changes as slowly as p_C. This makes the Reno | smoothing in the network, because it delays all signals for a worst- | |||
flow rate roughly equal the DCTCP flow rate, because it squares the | case RTT. So instead, L4S sources smooth the ECN marking probability | |||
square root of p_C in the Reno rate equation to make it proportional | themselves, so they expect the network to generate ECN marks with a | |||
to the smoothed value of p_L used in the DCTCP rate equation. | probability p_L that tracks the instantaneous unsmoothed queue. | |||
The Coupled AQM achieves safe coexistence by making the Classic drop | ||||
probability p_C proportional to the square of the coupled L4S | ||||
probability p_CL. p_CL is an input to the instantaneous L4S marking | ||||
probability p_L but it changes as slowly as p_C. This makes the Reno | ||||
flow rate roughly equal the DCTCP flow rate, because the squaring of | ||||
p_CL counterbalances the square root of p_C in the Classic 'TCP | ||||
formula'. | ||||
Stating this as a formula, the relation between Classic drop | Stating this as a formula, the relation between Classic drop | |||
probability, p_C, and the input variable p_CL to the L4S marking | probability, p_C, and the coupled L4S probability p_CL needs to take | |||
probability p_L needs to take the form: | the form: | |||
p_C = ( p_CL / k )^2 (1) | p_C = ( p_CL / k )^2 (1) | |||
where k is the constant of proportionality. | where k is the constant of proportionality, which we shall call the | |||
coupling factor. | ||||
2.2. Dual Queue | 2.2. Dual Queue | |||
Classic traffic typically builds a large queue to prevent under- | Classic traffic typically builds a large queue to prevent under- | |||
utilization. Therefore a separate queue is provided for L4S traffic, | utilization. Therefore a separate queue is provided for L4S traffic, | |||
and it is scheduled with priority over Classic. Priority is | and it is scheduled with priority over Classic. Priority is | |||
conditional to prevent starvation of Classic traffic. | conditional to prevent starvation of Classic traffic. | |||
Nonetheless, coupled marking ensures that giving priority to L4S | Nonetheless, coupled marking ensures that giving priority to L4S | |||
traffic still leaves the right amount of spare scheduling time for | traffic still leaves the right amount of spare scheduling time for | |||
Classic flows to each get equivalent throughput to DCTCP flows (all | Classic flows to each get equivalent throughput to DCTCP flows (all | |||
other factors such as RTT being equal). The algorithm achieves this | other factors such as RTT being equal). | |||
without having to inspect flow identifiers. | ||||
2.3. Traffic Classification | 2.3. Traffic Classification | |||
Both the Coupled AQM and DualQ mechanisms need an identifier to | Both the Coupled AQM and DualQ mechanisms need an identifier to | |||
distinguish L and C packets. A separate draft | distinguish L and C packets. Then the coupling algorithm can achieve | |||
[I-D.ietf-tsvwg-ecn-l4s-id] recommends using the ECT(1) codepoint of | coexistence without having to inspect flow identifiers, because it | |||
the ECN field as this identifier, having assessed various | can apply the appropriate marking or dropping probability to all | |||
alternatives. An additional process document has proved necessary to | flows of each type. A separate | |||
make the ECT(1) codepoint available for experimentation [RFC8311]. | specification [I-D.ietf-tsvwg-ecn-l4s-id] requires the sender to use | |||
the ECT(1) codepoint of the ECN field as this identifier, having | ||||
assessed various alternatives. An additional process document has | ||||
proved necessary to make the ECT(1) codepoint available for | ||||
experimentation [RFC8311]. | ||||
For policy reasons, an operator might choose to steer certain packets | For policy reasons, an operator might choose to steer certain packets | |||
(e.g. from certain flows or with certain addresses) out of the L | (e.g. from certain flows or with certain addresses) out of the L | |||
queue, even though they identify themselves as L4S by their ECN | queue, even though they identify themselves as L4S by their ECN | |||
codepoints. In such cases, the classifier MUST NOT alter the ECN | codepoints. In such cases, the device MUST NOT alter the ECN field, | |||
field, so that it is preserved end-to-end. The aim is that each | so that it is preserved end-to-end. The aim is that each operator | |||
operator can choose how it treats L4S traffic locally, but an | can choose how it treats L4S traffic locally, but an individual | |||
individual operator does not alter the identification of L4S packets, | operator does not alter the identification of L4S packets, which | |||
which would prevent other operators downstream from making their own | would prevent other operators downstream from making their own | |||
choices on how to treat L4S traffic. | choices on how to treat L4S traffic. | |||
In addition, other identifiers could be used to classify certain | In addition, other identifiers could be used to classify certain | |||
additional packet types into the L queue, that are deemed not to risk | additional packet types into the L queue, that are deemed not to risk | |||
harming the L4S service. For instance addresses of specific | harming the L4S service. For instance addresses of specific | |||
applications or hosts (see [I-D.ietf-tsvwg-ecn-l4s-id]), specific | applications or hosts (see [I-D.ietf-tsvwg-ecn-l4s-id]), specific | |||
Diffserv codepoints such as EF (Expedited Forwarding) and Voice-Admit | Diffserv codepoints such as EF (Expedited Forwarding) and Voice-Admit | |||
service classes (see [I-D.briscoe-tsvwg-l4s-diffserv]) or certain | service classes (see [I-D.briscoe-tsvwg-l4s-diffserv]) or certain | |||
protocols (e.g. ARP, DNS). | protocols (e.g. ARP, DNS). | |||
Note that the DualQ Coupled AQM only reads these classifiers, it MUST | Note that the mechanism only reads these classifiers, it MUST NOT re- | |||
NOT re-mark or alter these identifiers (except for marking the ECN | mark or alter these identifiers (except for marking the ECN field | |||
field with the CE codepoint - with increasing frequency to indicate | with the CE codepoint - with increasing frequency to indicate | |||
increasing congestion). | increasing congestion). | |||
2.4. Overall DualQ Coupled AQM Structure | 2.4. Overall DualQ Coupled AQM Structure | |||
Figure 1 shows the overall structure that any DualQ Coupled AQM is | Figure 1 shows the overall structure that any DualQ Coupled AQM is | |||
likely to have. This schematic is intended to aid understanding of | likely to have. This schematic is intended to aid understanding of | |||
the current designs of DualQ Coupled AQMs. However, it is not | the current designs of DualQ Coupled AQMs. However, it is not | |||
intended to preclude other innovative ways of satisfying the | intended to preclude other innovative ways of satisfying the | |||
normative requirements in Section 2.5 that minimally define a DualQ | normative requirements in Section 2.5 that minimally define a DualQ | |||
Coupled AQM. | Coupled AQM. | |||
The classifier on the left separates incoming traffic between the two | The classifier on the left separates incoming traffic between the two | |||
queues (L and C). Each queue has its own AQM that determines the | queues (L and C). Each queue has its own AQM that determines the | |||
likelihood of marking or dropping (p_L and p_C). It has been proved | likelihood of marking or dropping (p_L and p_C). It has been | |||
[PI2] that it is preferable to control TCP with a linear PI | proved [PI2] that it is preferable to control load with a linear | |||
controller, then square the output before applying it as a drop | controller, then square the output before applying it as a drop | |||
probability to TCP. So, the AQM for Classic traffic needs to be | probability to TCP (because TCP decreases its load proportional to | |||
implemented in two stages: i) a base stage that outputs an internal | the square-root of the increase in drop). So, the AQM for Classic | |||
probability p' (pronounced p-prime); and ii) a squaring stage that | traffic needs to be implemented in two stages: i) a base stage that | |||
outputs p_C, where | outputs an internal probability p' (pronounced p-prime); and ii) a | |||
squaring stage that outputs p_C, where | ||||
p_C = (p')^2. (2) | p_C = (p')^2. (2) | |||
Substituting for p_C in Eqn (1) gives: | Substituting for p_C in Eqn (1) gives: | |||
p' = p_CL / k | p' = p_CL / k | |||
So we get our slow-moving input to ECN marking in the L queue as: | So the slow-moving input to ECN marking in the L queue (the coupled | |||
L4S probability) is: | ||||
p_CL = k*p', (3) | p_CL = k*p', (3) | |||
where k is the constant coupling factor (see Appendix C). | where k is the constant coupling factor (see Appendix C). | |||
It can be seen that these two transformations of p' implement the | It can be seen that these two transformations of p' implement the | |||
required coupling given in equation (1) earlier. Substituting for p' | required coupling given in equation (1) earlier. | |||
from equation (3) into (2): | ||||
p_C = ( p_CL / k )^2. | ||||
The actual probability p_L that we apply to the L queue needs to | The actual probability p_L that we apply to the L queue needs to | |||
track the immediate L queue delay, as well as track p_CL under | track the immediate L queue delay, as well as track p_CL under | |||
stationary conditions. So we use a native AQM in the L queue that | stationary conditions. So we use a native AQM in the L queue that | |||
calculates a marking probability p'L as a function of the | calculates a probability p'_L as a function of the instantaneous L | |||
instantaneous L queue. And, given the L queue has conditional strict | queue. And, given the L queue has conditional strict priority over | |||
priority over the C queue, whenever the L queue grows, we should | the C queue, whenever the L queue grows, we should apply marking | |||
apply marking probability p'_L, but p_L should not fall below p_CL. | probability p'_L, but p_L should not fall below p_CL. This suggests: | |||
This suggests: | ||||
p_L = max(p'L, p_CL), | p_L = max(p'_L, p_CL), (4) | |||
which has also been found to work very well in practice. | which has also been found to work very well in practice. | |||
This allows p_L to be coupled to p_C by marking L4S traffic | ||||
proportionately to the intermediate output from the first stage. | ||||
Specifically, the output of the base AQM is coupled across to the L | ||||
queue in proportion to the output of the base AQM | ||||
_________ | _________ | |||
| | ,------. | | | ,------. | |||
L4S queue | |===>| ECN | | L4S queue | |===>| ECN | | |||
,'| _______|_| |marker|\ | ,'| _______|_| |marker|\ | |||
<' | | `------'\\ | <' | | `------'\\ | |||
//`' v ^ p_L \\ | //`' v ^ p_L \\ | |||
// ,-------. | \\ | // ,-------. | \\ | |||
// |Native |p'L | \\,. | // |Native |p'_L | \\,. | |||
// | L4S |-->(MAX) < | ___ | // | L4S |--->(MAX) < | ___ | |||
,----------.// | AQM | ^ p_CL `\|.'Cond-`. | ,----------.// | AQM | ^ p_CL `\|.'Cond-`. | |||
| IP-ECN |/ `-------' | / itional \ | | IP-ECN |/ `-------' | / itional \ | |||
==>|Classifier| ,-------. (k*p') [ priority]==> | ==>|Classifier| ,-------. (k*p') [ priority]==> | |||
| |\ | Base | | \scheduler/ | | |\ | Base | | \scheduler/ | |||
`----------'\\ | AQM |--->: ,'|`-.___.-' | `----------'\\ | AQM |---->: ,'|`-.___.-' | |||
\\ | |p' | <' | | \\ | |p' | <' | | |||
\\ `-------' (p'^2) //`' | \\ `-------' (p'^2) //`' | |||
\\ ^ | // | \\ ^ | // | |||
\\,. | v p_C // | \\,. | v p_C // | |||
< | _________ .------.// | < | _________ .------.// | |||
`\| | | | Drop |/ | `\| | | | Drop |/ | |||
Classic |queue |===>|/mark | | Classic |queue |===>|/mark | | |||
__|______| `------' | __|______| `------' | |||
Legend: ===> traffic flow; ---> control dependency. | Legend: ===> traffic flow; ---> control dependency. | |||
Figure 1: DualQ Coupled AQM Schematic | Figure 1: DualQ Coupled AQM Schematic | |||
skipping to change at page 11, line 50 ¶ | skipping to change at page 12, line 38 ¶ | |||
capitals) in Section 2.5 are observed. | capitals) in Section 2.5 are observed. | |||
2.5. Normative Requirements for a DualQ Coupled AQM | 2.5. Normative Requirements for a DualQ Coupled AQM | |||
The following requirements are intended to capture only the essential | The following requirements are intended to capture only the essential | |||
aspects of a DualQ Coupled AQM. They are intended to be independent | aspects of a DualQ Coupled AQM. They are intended to be independent | |||
of the particular AQMs used for each queue. | of the particular AQMs used for each queue. | |||
2.5.1. Functional Requirements | 2.5.1. Functional Requirements | |||
In the Dual Queue, L4S packets MUST be given priority over Classic, | A Dual Queue Coupled AQM implementation MUST utilize two queues, each | |||
although priority MUST be bounded in order not to starve Classic | with an AQM algorithm. The two queues can be part of a larger | |||
traffic. | queuing hierarchy [I-D.briscoe-tsvwg-l4s-diffserv]. | |||
The AQM algorithm for the low latency (L) queue MUST apply ECN | ||||
marking. | ||||
The scheduler draining the two queues MUST give L4S packets priority | ||||
over Classic, although priority MUST be bounded in order not to | ||||
starve Classic traffic. | ||||
Whatever identifier is used for L4S experiments, | ||||
[I-D.ietf-tsvwg-ecn-l4s-id] defines the meaning of an ECN marking on | [I-D.ietf-tsvwg-ecn-l4s-id] defines the meaning of an ECN marking on | |||
L4S traffic, relative to drop of Classic traffic. In order to | L4S traffic, relative to drop of Classic traffic. In order to | |||
prevent starvation of Classic traffic by scalable L4S traffic, it | prevent starvation of Classic traffic by scalable L4S traffic, it | |||
says, "The likelihood that an AQM drops a Not-ECT Classic packet | says, "The likelihood that an AQM drops a Not-ECT Classic packet | |||
(p_C) MUST be roughly proportional to the square of the likelihood | (p_C) MUST be roughly proportional to the square of the likelihood | |||
that it would have marked it if it had been an L4S packet (p_L)." In | that it would have marked it if it had been an L4S packet (p_L)." | |||
other words, in any DualQ Coupled AQM, the power to which p_L is | The term 'likelihood' is used to allow for marking and dropping to be | |||
raised in Eqn. (1) MUST be 2. The term 'likelihood' is used to allow | either probabilistic or deterministic. | |||
for marking and dropping to be either probabilistic or deterministic. | ||||
For the current specification, this translates into the following | ||||
requirement. A DualQ Coupled AQM MUST apply ECN marking to traffic | ||||
in the L queue that is no lower than that derived from the likelihood | ||||
of drop (or ECN marking) in the Classic queue using Eqn. (1). | ||||
The constant of proportionality, k, in Eqn (1) determines the | The constant of proportionality, k, in Eqn (1) determines the | |||
relative flow rates of Classic and L4S flows when the AQM concerned | relative flow rates of Classic and L4S flows when the AQM concerned | |||
is the bottleneck (all other factors being equal). | is the bottleneck (all other factors being equal). | |||
[I-D.ietf-tsvwg-ecn-l4s-id] says, "The constant of proportionality | [I-D.ietf-tsvwg-ecn-l4s-id] says, "The constant of proportionality | |||
(k) does not have to be standardised for interoperability, but a | (k) does not have to be standardised for interoperability, but a | |||
value of 2 is RECOMMENDED." | value of 2 is RECOMMENDED." | |||
Assuming scalable congestion controls for the Internet will be as | Assuming scalable congestion controls for the Internet will be as | |||
aggressive as DCTCP, this will ensure their congestion window will be | aggressive as DCTCP, this will ensure their congestion window will be | |||
roughly the same as that of a standards track TCP congestion control | roughly the same as that of a standards track TCP congestion control | |||
(Reno) [RFC5681] and other so-called TCP-friendly controls, such as | (Reno) [RFC5681] and other so-called TCP-friendly controls, such as | |||
TCP Cubic in its TCP-friendly mode. | TCP Cubic in its TCP-friendly mode. | |||
{ToDo: The requirements for scalable congestion controls on the | ||||
Internet (termed the TCP Prague requirements) | ||||
[I-D.ietf-tsvwg-ecn-l4s-id] are not necessarily final. If the | ||||
aggressiveness of DCTCP is not defined as the benchmark for scalable | ||||
controls on the Internet, the recommended value of k will also be | ||||
subject to change.} | ||||
The choice of k is a matter of operator policy, and operators MAY | The choice of k is a matter of operator policy, and operators MAY | |||
choose a different value using Table 1 and the guidelines in | choose a different value using Table 1 and the guidelines in | |||
Appendix C. | Appendix C. | |||
If multiple users share capacity at a bottleneck (e.g. in the | If multiple users share capacity at a bottleneck (e.g. in the | |||
Internet access link of a campus network), the operator's choice of k | Internet access link of a campus network), the operator's choice of k | |||
will determine capacity sharing between the flows of different users. | will determine capacity sharing between the flows of different users. | |||
However, on the public Internet, access network operators typically | However, on the public Internet, access network operators typically | |||
isolate customers from each other with some form of layer-2 | isolate customers from each other with some form of layer-2 | |||
multiplexing (TDM in DOCSIS, CDMA in 3G) or L3 scheduling (WRR in | multiplexing (OFDM(A) in DOCSIS3.1, CDMA in 3G, SC-FDMA in LTE) or L3 | |||
DSL), rather than relying on TCP to share capacity between customers | scheduling (WRR in DSL), rather than relying on TCP to share capacity | |||
[RFC0970]. In such cases, the choice of k will solely affect | between customers [RFC0970]. In such cases, the choice of k will | |||
relative flow rates within each customer's access capacity, not | solely affect relative flow rates within each customer's access | |||
between customers. Also, k will not affect relative flow rates at | capacity, not between customers. Also, k will not affect relative | |||
any times when all flows are Classic or all L4S, and it will not | flow rates at any times when all flows are Classic or all L4S, and it | |||
affect small flows. | will not affect the relative throughput of small flows. | |||
2.5.1.1. Requirements in Unexpected Cases | 2.5.1.1. Requirements in Unexpected Cases | |||
The flexibility to allow operator-specific classifiers (Section 2.3) | The flexibility to allow operator-specific classifiers (Section 2.3) | |||
leads to the need to specify what the AQM in each queue ought to do | leads to the need to specify what the AQM in each queue ought to do | |||
with packets that do not carry the ECN field expected for that queue. | with packets that do not carry the ECN field expected for that queue. | |||
It is recommended that the AQM in each queue inspects the ECN field | It is recommended that the AQM in each queue inspects the ECN field | |||
to determine what sort of congestion notification to signal, then | to determine what sort of congestion notification to signal, then | |||
decides whether to apply congestion notification to this particular | decides whether to apply congestion notification to this particular | |||
packet, as follows: | packet, as follows: | |||
skipping to change at page 14, line 18 ¶ | skipping to change at page 15, line 16 ¶ | |||
By default, a DualQ Coupled AQM SHOULD NOT need any configuration for | By default, a DualQ Coupled AQM SHOULD NOT need any configuration for | |||
use at a bottleneck on the public Internet [RFC7567]. The following | use at a bottleneck on the public Internet [RFC7567]. The following | |||
parameters MAY be operator-configurable, e.g. to tune for non- | parameters MAY be operator-configurable, e.g. to tune for non- | |||
Internet settings: | Internet settings: | |||
o Optional packet classifier(s) to use in addition to the ECN field | o Optional packet classifier(s) to use in addition to the ECN field | |||
(see Section 2.3); | (see Section 2.3); | |||
o Expected typical RTT (a parameter for typical or target queuing | o Expected typical RTT (a parameter for typical or target queuing | |||
delay in each queue might be configurable instead); | delay in each queue might be configurable instead; if so it MUST | |||
be expressed in units of time); | ||||
o Expected maximum RTT (a stability parameter that depends on | o Expected maximum RTT (a stability parameter that depends on | |||
maximum RTT might be configurable instead); | maximum RTT might be configurable instead); | |||
o Coupling factor, k; | o Coupling factor, k; | |||
o The limit to the conditional priority of L4S (scheduler-dependent, | o The limit to the conditional priority of L4S (scheduler-dependent, | |||
e.g. the scheduler weight for WRR, or the time-shift for time- | e.g. the scheduler weight for WRR, or the time-shift for time- | |||
shifted FIFO); | shifted FIFO); | |||
o The maximum Classic ECN marking probability, p_Cmax, before | o The maximum Classic ECN marking probability, p_Cmax, before | |||
switching over to drop. | switching over to drop. | |||
An experimental DualQ Coupled AQM SHOULD allow the operator to | An experimental DualQ Coupled AQM SHOULD allow the operator to | |||
monitor the following operational statistics: | monitor each of the following operational statistics on demand, per | |||
queue and per configurable sample interval, for performance | ||||
monitoring and perhaps also for accounting in some cases: | ||||
o Bits forwarded (total and per queue per sample interval), from | o Bits forwarded, from which utilization can be calculated; | |||
which utilization can be calculated | ||||
o Q delay (per queue over sample interval) {ToDo: max per interval, | o Total packets arriving, enqueued and dequeued to distinguish tail | |||
histogram with configurable edges (from which percentile(s) can be | discard from proactive AQM discard; | |||
derived), not incl. medium access delay} | ||||
o Total packets arriving, enqueued and dequeued (per queue per | o ECN packets marked, non-ECN packets dropped, ECN packets dropped, | |||
sample interval) | from which marking and dropping probabilities can be calculated; | |||
o ECN packets marked, non-ECN packets dropped, ECN packets dropped | o Queue delay (not including serialization delay of the head packet | |||
(per queue per sample interval), from which marking and dropping | or medium acquisition delay) - see further notes below. | |||
probabilities can be calculated | ||||
o Time and duration of each overload event. | Unlike the other statistics, queue delay cannot be captured in a | |||
simple accumulating counter. Therefore the type of queue delay | ||||
statistics produced (mean, percentiles, etc.) will depend on | ||||
implementation constraints. To facilitate comparative evaluation | ||||
of different implementations and approaches, an implementation | ||||
SHOULD allow mean and 99th percentile queue delay to be derived | ||||
(per queue per sample interval). A relatively simple way to do | ||||
this would be to store a coarse-grained histogram of queue delay. | ||||
This could be done with a small number of bins with configurable | ||||
edges that represent contiguous ranges of queue delay. Then, over | ||||
a sample interval, each bin would accumulate a count of the number | ||||
of packets that had fallen within each range. The maximum queue | ||||
delay per queue per interval MAY also be recorded. | ||||
The type of statistics produced for variables like Q delay (mean, | An experimental DualQ Coupled AQM SHOULD asynchronously report the | |||
percentiles, etc.) will depend on implementation constraints. | following data about anomalous conditions: | |||
o Start-time and duration of overload state. | ||||
A hysteresis mechanism SHOULD be used to prevent flapping in and | ||||
out of overload causing an event storm. For instance, exit from | ||||
overload state could trigger one report, but also latch a timer. | ||||
Then, during that time, if the AQM enters and exits overload state | ||||
any number of times, the duration in overload state is accumulated | ||||
but no new report is generated until the first time the AQM is out | ||||
of overload once the timer has expired. | ||||
[RFC5706] suggests that deployment, coexistence and scaling should | ||||
also be covered as management requirements. The raison d'etre of the | ||||
DualQ Couple AQM is to enable deployment and coexistence of scalable | ||||
congestion controls - as incremental replacements for today's TCP- | ||||
friendly controls that do not scale with bandwidth-delay product. | ||||
Therefore, these motivating issues are explained in the Introduction | ||||
and detailed in the L4S architecture [I-D.ietf-tsvwg-l4s-arch]. | ||||
Also, the descriptions of specific DualQ Coupled AQM algorithms in | ||||
the appendices cover scaling of their configuration parameters, e.g. | ||||
with respect to RTT and sampling frequency. | ||||
3. IANA Considerations | 3. IANA Considerations | |||
This specification contains no IANA considerations. | This specification contains no IANA considerations. | |||
4. Security Considerations | 4. Security Considerations | |||
4.1. Overload Handling | 4.1. Overload Handling | |||
Where the interests of users or flows might conflict, it could be | Where the interests of users or flows might conflict, it could be | |||
skipping to change at page 15, line 34 ¶ | skipping to change at page 17, line 16 ¶ | |||
useful objective would be for the overload behaviour of the DualQ AQM | useful objective would be for the overload behaviour of the DualQ AQM | |||
to be at least no worse than a single queue AQM. However, a trade- | to be at least no worse than a single queue AQM. However, a trade- | |||
off needs to be made between complexity and the risk of either | off needs to be made between complexity and the risk of either | |||
traffic class harming the other. In each of the following three | traffic class harming the other. In each of the following three | |||
subsections, an overload issue specific to the DualQ is described, | subsections, an overload issue specific to the DualQ is described, | |||
followed by proposed solution(s). | followed by proposed solution(s). | |||
Under overload the higher priority L4S service will have to sacrifice | Under overload the higher priority L4S service will have to sacrifice | |||
some aspect of its performance. Alternative solutions are provided | some aspect of its performance. Alternative solutions are provided | |||
below that each relax a different factor: e.g. throughput, delay, | below that each relax a different factor: e.g. throughput, delay, | |||
drop. Some of these choices might need to be determined by operator | drop. These choices need to be made either by the developer or by | |||
policy or by the developer, rather than by the IETF. {ToDo: Reach | operator policy, rather than by the IETF. | |||
consensus on which it is to be in each case.} | ||||
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput or Delay? | 4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput or Delay? | |||
Priority of L4S is required to be conditional to avoid total | Priority of L4S is required to be conditional to avoid total | |||
throughput starvation of Classic by heavy L4S traffic. This raises | throughput starvation of Classic by heavy L4S traffic. This raises | |||
the question of whether to sacrifice L4S throughput or L4S delay (or | the question of whether to sacrifice L4S throughput or L4S delay (or | |||
some other policy) to mitigate starvation of Classic: | some other policy) to mitigate starvation of Classic: | |||
Sacrifice L4S throughput: By using weighted round robin as the | Sacrifice L4S throughput: By using weighted round robin as the | |||
conditional priority scheduler, the L4S service can sacrifice some | conditional priority scheduler, the L4S service can sacrifice some | |||
skipping to change at page 16, line 45 ¶ | skipping to change at page 18, line 26 ¶ | |||
packets relative to L4S. | packets relative to L4S. | |||
The example implementation in Appendix A can implement either policy. | The example implementation in Appendix A can implement either policy. | |||
4.1.2. Congestion Signal Saturation: Introduce L4S Drop or Delay? | 4.1.2. Congestion Signal Saturation: Introduce L4S Drop or Delay? | |||
To keep the throughput of both L4S and Classic flows roughly equal | To keep the throughput of both L4S and Classic flows roughly equal | |||
over the full load range, a different control strategy needs to be | over the full load range, a different control strategy needs to be | |||
defined above the point where one AQM first saturates to a | defined above the point where one AQM first saturates to a | |||
probability of 100% leaving no room to push back the load any harder. | probability of 100% leaving no room to push back the load any harder. | |||
If k>1, L4S will saturate first, but saturation can be caused by | If k>1, L4S will saturate first, even though saturation could be | |||
unresponsive traffic in either queue. | caused by unresponsive traffic in either queue. | |||
The term 'unresponsive' includes cases where a flow becomes | The term 'unresponsive' includes cases where a flow becomes | |||
temporarily unresponsive, for instance, a real-time flow that takes a | temporarily unresponsive, for instance, a real-time flow that takes a | |||
while to adapt its rate in response to congestion, or a TCP-like flow | while to adapt its rate in response to congestion, or a TCP-like flow | |||
that is normally responsive, but above a certain congestion level it | that is normally responsive, but above a certain congestion level it | |||
will not be able to reduce its congestion window below the minimum of | will not be able to reduce its congestion window below the minimum of | |||
2 segments, effectively becoming unresponsive. (Note that L4S | 2 segments [RFC5681], effectively becoming unresponsive. (Note that | |||
traffic ought to remain responsive below a window of 2 segments (see | L4S traffic ought to remain responsive below a window of 2 segments | |||
[I-D.ietf-tsvwg-ecn-l4s-id]). | (see [I-D.ietf-tsvwg-ecn-l4s-id]). | |||
Saturation raises the question of whether to relieve congestion by | Saturation raises the question of whether to relieve congestion by | |||
introducing some drop into the L4S queue or by allowing delay to grow | introducing some drop into the L4S queue or by allowing delay to grow | |||
in both queues (which could eventually lead to tail drop too): | in both queues (which could eventually lead to tail drop too): | |||
Drop on Saturation: Saturation can be avoided by setting a maximum | Drop on Saturation: Saturation can be avoided by setting a maximum | |||
threshold for L4S ECN marking (assuming k>1) before saturation | threshold for L4S ECN marking (assuming k>1) before saturation | |||
starts to make the flow rates of the different traffic types | starts to make the flow rates of the different traffic types | |||
diverge. Above that the drop probability of Classic traffic is | diverge. Above that the drop probability of Classic traffic is | |||
applied to all packets of all traffic types. Then experiments | applied to all packets of all traffic types. Then experiments | |||
have shown that queueing delay can be kept at the target in any | have shown that queueing delay can be kept at the target in any | |||
overload situation, including with unresponsive traffic, and no | overload situation, including with unresponsive traffic, and no | |||
further measures are required. | further measures are required [DualQ-Test]. | |||
Delay on Saturation: When L4S marking saturates, instead of | Delay on Saturation: When L4S marking saturates, instead of | |||
switching to drop, the drop and marking probabilities could be | switching to drop, the drop and marking probabilities could be | |||
capped. Beyond that, delay will grow either solely in the queue | capped. Beyond that, delay will grow either solely in the queue | |||
with unresponsive traffic (if WRR is used), or in both queues (if | with unresponsive traffic (if WRR is used), or in both queues (if | |||
time-shifted FIFO is used). In either case, the higher delay | time-shifted FIFO is used). In either case, the higher delay | |||
ought to control temporary high congestion. If the overload is | ought to control temporary high congestion. If the overload is | |||
more persistent, eventually the combined DualQ will overflow and | more persistent, eventually the combined DualQ will overflow and | |||
tail drop will control congestion. | tail drop will control congestion. | |||
The example implementation in Appendix A applies only the "drop on | The example implementation in Appendix A solely applies the "drop on | |||
saturation" policy. | saturation" policy. | |||
4.1.3. Protecting against Unresponsive ECN-Capable Traffic | 4.1.3. Protecting against Unresponsive ECN-Capable Traffic | |||
Unresponsive traffic has a greater advantage if it is also ECN- | Unresponsive traffic has a greater advantage if it is also ECN- | |||
capable. The advantage is undetectable at normal low levels of drop/ | capable. The advantage is undetectable at normal low levels of drop/ | |||
marking, but it becomes significant with the higher levels of drop/ | marking, but it becomes significant with the higher levels of drop/ | |||
marking typical during overload. This is an issue whether the ECN- | marking typical during overload. This is an issue whether the ECN- | |||
capable traffic is L4S or Classic. | capable traffic is L4S or Classic. | |||
skipping to change at page 18, line 10 ¶ | skipping to change at page 19, line 41 ¶ | |||
problem with unresponsive ECN as well as addressing the saturation | problem with unresponsive ECN as well as addressing the saturation | |||
problem. It leaves only a small range of congestion levels where | problem. It leaves only a small range of congestion levels where | |||
unresponsive traffic gains any advantage from using the ECN | unresponsive traffic gains any advantage from using the ECN | |||
capability, and the advantage is hardly detectable [DualQ-Test]. | capability, and the advantage is hardly detectable [DualQ-Test]. | |||
5. Acknowledgements | 5. Acknowledgements | |||
Thanks to Anil Agarwal, Sowmini Varadhan's and Gabi Bracha for | Thanks to Anil Agarwal, Sowmini Varadhan's and Gabi Bracha for | |||
detailed review comments particularly of the appendices and | detailed review comments particularly of the appendices and | |||
suggestions on how to make our explanation clearer. Thanks also to | suggestions on how to make our explanation clearer. Thanks also to | |||
Greg White and Tom Henderson for insights on the choice of schedulers | Greg White for improving the normative requirements and both Greg and | |||
and queue delay measurement techniques. | Tom Henderson for insights on the choice of schedulers, queue delay | |||
measurement techniques. | ||||
The authors' contributions were originally part-funded by the | The authors' contributions were originally part-funded by the | |||
European Community under its Seventh Framework Programme through the | European Community under its Seventh Framework Programme through the | |||
Reducing Internet Transport Latency (RITE) project (ICT-317700). Bob | Reducing Internet Transport Latency (RITE) project (ICT-317700). Bob | |||
Briscoe's contribution was also part-funded by the Research Council | Briscoe's contribution was also part-funded by the Research Council | |||
of Norway through the TimeIn project. The views expressed here are | of Norway through the TimeIn project. The views expressed here are | |||
solely those of the authors. | solely those of the authors. | |||
6. References | 6. References | |||
skipping to change at page 19, line 34 ¶ | skipping to change at page 21, line 23 ¶ | |||
Low Loss, Scalable Throughput (L4S) Internet Service: | Low Loss, Scalable Throughput (L4S) Internet Service: | |||
Architecture", draft-ietf-tsvwg-l4s-arch-02 (work in | Architecture", draft-ietf-tsvwg-l4s-arch-02 (work in | |||
progress), March 2018. | progress), March 2018. | |||
[I-D.sridharan-tcpm-ctcp] | [I-D.sridharan-tcpm-ctcp] | |||
Sridharan, M., Tan, K., Bansal, D., and D. Thaler, | Sridharan, M., Tan, K., Bansal, D., and D. Thaler, | |||
"Compound TCP: A New TCP Congestion Control for High-Speed | "Compound TCP: A New TCP Congestion Control for High-Speed | |||
and Long Distance Networks", draft-sridharan-tcpm-ctcp-02 | and Long Distance Networks", draft-sridharan-tcpm-ctcp-02 | |||
(work in progress), November 2008. | (work in progress), November 2008. | |||
[L4Sdemo16] | ||||
Bondarenko, O., De Schepper, K., Tsang, I., and B. | ||||
Briscoe, "Ultra-Low Delay for All: Live Experience, Live | ||||
Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, | ||||
<http://dl.acm.org/citation.cfm?doid=2910017.2910633 | ||||
(videos of demos: https://riteproject.eu/ | ||||
dctth/#1511dispatchwg )>. | ||||
[Mathis09] | [Mathis09] | |||
Mathis, M., "Relentless Congestion Control", PFLDNeT'09 , | Mathis, M., "Relentless Congestion Control", PFLDNeT'09 , | |||
May 2009, <http://www.hpcc.jp/pfldnet2009/ | May 2009, <http://www.hpcc.jp/pfldnet2009/ | |||
Program_files/1569198525.pdf>. | Program_files/1569198525.pdf>. | |||
[MEDF] Menth, M., Schmid, M., Heiss, H., and T. Reim, "MEDF - a | [MEDF] Menth, M., Schmid, M., Heiss, H., and T. Reim, "MEDF - a | |||
simple scheduling algorithm for two real-time transport | simple scheduling algorithm for two real-time transport | |||
service classes with application in the UTRAN", Proc. IEEE | service classes with application in the UTRAN", Proc. IEEE | |||
Conference on Computer Communications (INFOCOM'03) Vol.2 | Conference on Computer Communications (INFOCOM'03) Vol.2 | |||
pp.1116-1122, March 2003. | pp.1116-1122, March 2003. | |||
skipping to change at page 20, line 36 ¶ | skipping to change at page 22, line 32 ¶ | |||
<https://www.rfc-editor.org/info/rfc3246>. | <https://www.rfc-editor.org/info/rfc3246>. | |||
[RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", | [RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", | |||
RFC 3649, DOI 10.17487/RFC3649, December 2003, | RFC 3649, DOI 10.17487/RFC3649, December 2003, | |||
<https://www.rfc-editor.org/info/rfc3649>. | <https://www.rfc-editor.org/info/rfc3649>. | |||
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | |||
<https://www.rfc-editor.org/info/rfc5681>. | <https://www.rfc-editor.org/info/rfc5681>. | |||
[RFC5706] Harrington, D., "Guidelines for Considering Operations and | ||||
Management of New Protocols and Protocol Extensions", | ||||
RFC 5706, DOI 10.17487/RFC5706, November 2009, | ||||
<https://www.rfc-editor.org/info/rfc5706>. | ||||
[RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF | [RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF | |||
Recommendations Regarding Active Queue Management", | Recommendations Regarding Active Queue Management", | |||
BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, | BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, | |||
<https://www.rfc-editor.org/info/rfc7567>. | <https://www.rfc-editor.org/info/rfc7567>. | |||
[RFC8033] Pan, R., Natarajan, P., Baker, F., and G. White, | [RFC8033] Pan, R., Natarajan, P., Baker, F., and G. White, | |||
"Proportional Integral Controller Enhanced (PIE): A | "Proportional Integral Controller Enhanced (PIE): A | |||
Lightweight Control Scheme to Address the Bufferbloat | Lightweight Control Scheme to Address the Bufferbloat | |||
Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017, | Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017, | |||
<https://www.rfc-editor.org/info/rfc8033>. | <https://www.rfc-editor.org/info/rfc8033>. | |||
skipping to change at page 21, line 26 ¶ | skipping to change at page 23, line 26 ¶ | |||
[RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | |||
Notification (ECN) Experimentation", RFC 8311, | Notification (ECN) Experimentation", RFC 8311, | |||
DOI 10.17487/RFC8311, January 2018, | DOI 10.17487/RFC8311, January 2018, | |||
<https://www.rfc-editor.org/info/rfc8311>. | <https://www.rfc-editor.org/info/rfc8311>. | |||
[RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and | [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and | |||
R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", | R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", | |||
RFC 8312, DOI 10.17487/RFC8312, February 2018, | RFC 8312, DOI 10.17487/RFC8312, February 2018, | |||
<https://www.rfc-editor.org/info/rfc8312>. | <https://www.rfc-editor.org/info/rfc8312>. | |||
[TCP-CA] Jacobson, V. and M. Karels, "Congestion Avoidance and | ||||
Control", Laurence Berkeley Labs Technical Report , | ||||
November 1988, <http://ee.lbl.gov/papers/congavoid.pdf>. | ||||
Appendix A. Example DualQ Coupled PI2 Algorithm | Appendix A. Example DualQ Coupled PI2 Algorithm | |||
As a first concrete example, the pseudocode below gives the DualPI2 | As a first concrete example, the pseudocode below gives the DualPI2 | |||
algorithm. DualPI2 follows the structure of the DualQ Coupled AQM | algorithm. DualPI2 follows the structure of the DualQ Coupled AQM | |||
framework in Figure 1. A simple step threshold (in units of queuing | framework in Figure 1. A simple step threshold (in units of queuing | |||
time) is used for the Native L4S AQM, but a ramp is also described as | time) is used for the Native L4S AQM, but a ramp is also described as | |||
an alternative. And the PI2 algorithm [PI2] is used for the Classic | an alternative. And the PI2 algorithm [PI2] is used for the Classic | |||
AQM. PI2 is an improved variant of the PIE AQM [RFC8033]. | AQM. PI2 is an improved variant of the PIE AQM [RFC8033]. | |||
We will introduce the pseudocode in two passes. The first pass | We will introduce the pseudocode in two passes. The first pass | |||
skipping to change at page 21, line 48 ¶ | skipping to change at page 23, line 52 ¶ | |||
between the two passes by using letter suffixes where the longer code | between the two passes by using letter suffixes where the longer code | |||
needs extra lines. | needs extra lines. | |||
A full open source implementation for Linux is available at: | A full open source implementation for Linux is available at: | |||
https://github.com/olgabo/dualpi2. | https://github.com/olgabo/dualpi2. | |||
A.1. Pass #1: Core Concepts | A.1. Pass #1: Core Concepts | |||
The pseudocode manipulates three main structures of variables: the | The pseudocode manipulates three main structures of variables: the | |||
packet (pkt), the L4S queue (lq) and the Classic queue (cq). The | packet (pkt), the L4S queue (lq) and the Classic queue (cq). The | |||
pseudocode consists of the following four functions: | pseudocode consists of the following five functions: | |||
o initialization code (Figure 2) that sets parameter defaults (the | o initialization code (Figure 2) that sets parameter defaults (the | |||
API for setting non-default values is omitted for brevity) | API for setting non-default values is omitted for brevity) | |||
o enqueue code (Figure 3) | o enqueue code (Figure 3) | |||
o dequeue code (Figure 4) | o dequeue code (Figure 4) | |||
o a ramp function (Figure 5) used to calculate the ECN-marking | ||||
probability for the L4S queue | ||||
o code to regularly update the base probability (p) used in the | o code to regularly update the base probability (p) used in the | |||
dequeue code (Figure 5). | dequeue code (Figure 6). | |||
It also uses the following functions that are not shown in full here: | It also uses the following functions that are not shown in full here: | |||
o scheduler(), which selects between the head packets of the two | o scheduler(), which selects between the head packets of the two | |||
queues; the choice of scheduler technology is discussed later; | queues; the choice of scheduler technology is discussed later; | |||
o cq.len() or lq.len() returns the current length (aka. backlog) of | o cq.len() or lq.len() returns the current length (aka. backlog) of | |||
the relevant queue in bytes; | the relevant queue in bytes; | |||
o cq.time() or lq.time() returns the current queuing delay (aka. | o cq.time() or lq.time() returns the current queuing delay (aka. | |||
skipping to change at page 23, line 21 ¶ | skipping to change at page 25, line 21 ¶ | |||
7: p_Cmax = 1/4 % Max Classic drop/mark prob | 7: p_Cmax = 1/4 % Max Classic drop/mark prob | |||
8: % Constants derived from PI2 AQM parameters | 8: % Constants derived from PI2 AQM parameters | |||
9: alpha_U = alpha *Tupdate % PI integral gain per update interval | 9: alpha_U = alpha *Tupdate % PI integral gain per update interval | |||
10: beta_U = beta * Tupdate % PI prop'nal gain per update interval | 10: beta_U = beta * Tupdate % PI prop'nal gain per update interval | |||
11: | 11: | |||
12: % DualQ Coupled framework parameters | 12: % DualQ Coupled framework parameters | |||
13: k = 2 % Coupling factor | 13: k = 2 % Coupling factor | |||
14: % scheduler weight or equival't parameter (scheduler-dependent) | 14: % scheduler weight or equival't parameter (scheduler-dependent) | |||
15: limit = MAX_LINK_RATE * 250 ms % Dual buffer size | 15: limit = MAX_LINK_RATE * 250 ms % Dual buffer size | |||
16: | 16: | |||
17: % L4S AQM parameters | 17: % L4S ramp AQM parameters | |||
18: T_time = 1 ms % L4S marking threshold in time | 18: minTh = 475 us % L4S min marking threshold in time units | |||
19: T_len = 2 * MTU % Min L4S marking threshold in bytes | 19: range = 525 us % Range of L4S ramp in time units | |||
20: % Constants derived from L4S AQM parameters | 20: Th_len = 2 * MTU % Min L4S marking threshold in bytes | |||
21: p_Lmax = min(k*sqrt(p_Cmax), 1) % Max L4S marking prob | 21: % Constants derived from L4S AQM parameters | |||
22: } | 22: p_Lmax = min(k*sqrt(p_Cmax), 1) % Max L4S marking prob | |||
23: floor = Th_len * 8 / MIN_LINK_RATE % MIN_LINK_RATE is in Mb/s | ||||
24: if (minTh < floor) { | ||||
25: % Adjust ramp to exceed serialization time of 2 MTU | ||||
26: range = max(range - (floor-minTh), 1) % 1us avoids /0 error | ||||
27: minTh = floor | ||||
28: } | ||||
29: maxTh = minTh+range % L4S min marking threshold in time units | ||||
30: } | ||||
Figure 2: Example Header Pseudocode for DualQ Coupled PI2 AQM | Figure 2: Example Header Pseudocode for DualQ Coupled PI2 AQM | |||
For brevity the pseudocode shows some parameters in units of | ||||
microseconds (us), but a real implementation would probably use | ||||
nanoseconds. | ||||
The overall goal of the code is to maintain the base probability (p), | The overall goal of the code is to maintain the base probability (p), | |||
which is an internal variable from which the marking and dropping | which is an internal variable from which the marking and dropping | |||
probabilities for L4S and Classic traffic (p_L and p_C) are derived. | probabilities for L4S and Classic traffic (p_L and p_C) are derived. | |||
The variable named p in the pseudocode and in this walk-through is | The variable named p in the pseudocode and in this walk-through is | |||
the same as p' (p-prime) in Section 2.4. The probabilities p_L and | the same as p' (p-prime) in Section 2.4. The probabilities p_L and | |||
p_C are derived in lines 3, 4 and 5 of the dualpi2_update() function | p_C are derived in lines 3, 4 and 5 of the dualpi2_update() function | |||
(Figure 5) then used in the dualpi2_dequeue() function (Figure 4). | (Figure 6) then used in the dualpi2_dequeue() function (Figure 4). | |||
The code walk-through below builds up to explaining that part of the | The code walk-through below builds up to explaining that part of the | |||
code eventually, but it starts from packet arrival. | code eventually, but it starts from packet arrival. | |||
1: dualpi2_enqueue(lq, cq, pkt) { % Test limit and classify lq or cq | 1: dualpi2_enqueue(lq, cq, pkt) { % Test limit and classify lq or cq | |||
2: if ( lq.len() + cq.len() > limit ) | 2: if ( lq.len() + cq.len() > limit ) | |||
3: drop(pkt) % drop packet if buffer is full | 3: drop(pkt) % drop packet if buffer is full | |||
4: else { % Packet classifier | 4: else { % Packet classifier | |||
5: if ( ecn(pkt) modulo 2 == 1 ) % ECN bits = ECT(1) or CE | 5: if ( ecn(pkt) modulo 2 == 1 ) % ECN bits = ECT(1) or CE | |||
6: lq.enqueue(pkt) | 6: lq.enqueue(pkt) | |||
7: else % ECN bits = not-ECT or ECT(0) | 7: else % ECN bits = not-ECT or ECT(0) | |||
8: cq.enqueue(pkt) | 8: cq.enqueue(pkt) | |||
9: } | 9: } | |||
10: } | 10: } | |||
Figure 3: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM | Figure 3: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM | |||
1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | 1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | |||
2: while ( lq.len() + cq.len() > 0 ) | 2: while ( lq.len() + cq.len() > 0 ) | |||
3: if ( scheduler() == lq ) { | 3: if ( scheduler() == lq ) { | |||
4: lq.dequeue(pkt) % Scheduler chooses lq | 4: lq.dequeue(pkt) % Scheduler chooses lq | |||
5: p'_L = laqm(lq.time()) % Native L4S AQM | ||||
{ToDo: Generalize 5-7 for any L AQM (see email to Tom 9-Aug-18)} | 6: p_L = max(p'_L, p_CL) % Combining function | |||
7: if ( p_L > rand() ) % Linear marking | ||||
5: if ( ((lq.time() > T_time) % step marking ... | ||||
6: AND (lq.len() > T_len)) | ||||
7: OR (p_CL > rand()) ) % ...or linear marking | ||||
8: mark(pkt) | 8: mark(pkt) | |||
9: } else { | 9: } else { | |||
10: cq.dequeue(pkt) % Scheduler chooses cq | 10: cq.dequeue(pkt) % Scheduler chooses cq | |||
11: if ( p_C > rand() ) { % probability p_C = p^2 | 11: if ( p_C > rand() ) { % probability p_C = p^2 | |||
12: if ( ecn(pkt) == 0 ) { % if ECN field = not-ECT | 12: if ( ecn(pkt) == 0 ) { % if ECN field = not-ECT | |||
13: drop(pkt) % squared drop | 13: drop(pkt) % squared drop | |||
14: continue % continue to the top of the while loop | 14: continue % continue to the top of the while loop | |||
15: } | 15: } | |||
16: mark(pkt) % squared mark | 16: mark(pkt) % squared mark | |||
17: } | 17: } | |||
skipping to change at page 25, line 28 ¶ | skipping to change at page 27, line 38 ¶ | |||
add a whole queue of delay to the control signals, making the control | add a whole queue of delay to the control signals, making the control | |||
loop very sloppy. | loop very sloppy. | |||
All the dequeue code is contained within a large while loop so that | All the dequeue code is contained within a large while loop so that | |||
if it decides to drop a packet, it will continue until it selects a | if it decides to drop a packet, it will continue until it selects a | |||
packet to schedule. Line 3 of the dequeue pseudocode is where the | packet to schedule. Line 3 of the dequeue pseudocode is where the | |||
scheduler chooses between the L4S queue (lq) and the Classic queue | scheduler chooses between the L4S queue (lq) and the Classic queue | |||
(cq). Detailed implementation of the scheduler is not shown (see | (cq). Detailed implementation of the scheduler is not shown (see | |||
discussion later). | discussion later). | |||
o If an L4S packet is scheduled, lines 5 to 8 mark the packet if | o If an L4S packet is scheduled, lines 7 and 8 ECN-mark the packet | |||
either the L4S threshold (T_time) is exceeded, or if a random | if a random marking decision is drawn according to p_L. Line 6 | |||
marking decision is drawn according to p_CL (maintained by the | calculates p_L as the maximum of the coupled L4S probability p_CL | |||
dualpi2_update() function discussed below). This logical 'OR' on | and the probability from the native L4S AQM p'_L. This implements | |||
a per-packet basis implements the max() function shown in Figure 1 | the max() function shown in Figure 1 to couple the outputs of the | |||
to couple the outputs of the two AQMs together. The L4S threshold | two AQMs together. Of the two probabilities input to p_L in line | |||
is usually in units of time (default T_time = 1 ms). However, on | 6: | |||
slow links the packet serialization time can approach the | ||||
threshold T_time, so line 6 sets a floor of T_len (=2 MTU) to the | * p'_L is calculated per packet in line 5 by the laqm() function | |||
threshold, otherwise marking is always too frequent on slow links. | (see Figure 5), | |||
* whereas p_CL is maintained by the dualpi2_update() function | ||||
which runs every Tupdate (default 16ms) (see Figure 2). | ||||
o If a Classic packet is scheduled, lines 10 to 17 drop or mark the | o If a Classic packet is scheduled, lines 10 to 17 drop or mark the | |||
packet based on the squared probability p_C. | packet based on the squared probability p_C. | |||
There is some concern that using a step function for the Native L4S | The Native L4S AQM algorithm (Figure 5) is a ramp function, similar | |||
AQM requires end-systems to smooth the signal for a lot longer - | to the RED algorithm, but simpler due to the following differences: | |||
until its fidelity is sufficient. The latency benefits of a ramp are | ||||
being investigated as a simple alternative to the step. This ramp | ||||
would be similar to the RED algorithm, with the following | ||||
differences: | ||||
o The min and max of the ramp are defined in units of queuing delay, | o The min and max of the ramp are defined in units of queuing delay, | |||
not bytes, so that configuration remains invariant as the queue | not bytes, so that configuration remains invariant as the queue | |||
departure rate varies. | departure rate varies. | |||
o It uses instantaneous queueing delay without smoothing (smoothing | o It uses instantaneous queueing delay to remove smoothing delay | |||
is done in the end-systems). | (L4S senders smooth incoming ECN feedback when necessary). | |||
o Determinism is being experimented with instead of randomness; to | ||||
reduce the delay necessary to smooth out the noise of randomness | ||||
from the signal. For each packet, the algorithm would accumulate | ||||
p'_L in a counter and mark the packet that took the counter over | ||||
1, then subtract 1 from the counter and continue. | ||||
o The ramp rises linearly directly from 0 to 1, not to a an | o The ramp rises linearly directly from 0 to 1, not to a an | |||
intermediate value of p'_L as RED would, because there is no need | intermediate value of p'_L as RED would, because there is no need | |||
to keep ECN marking probability low. | to keep ECN marking probability low. | |||
This ramp algorithm would require two configuration parameters (min | o Marking does not have to be randomized. Determinism is being | |||
and max threshold in units of queuing time), in contrast to the | experimented with instead of randomness; to reduce the delay | |||
single parameter of a step. | necessary to smooth out the noise of randomness from the signal. | |||
In this case, for each packet, the algorithm would accumulate p_L | ||||
in a counter and mark the packet that took the counter over 1, | ||||
then subtract 1 from the counter and continue. | ||||
This ramp function requires two configuration parameters, the minimum | ||||
threshold (minTh) and the width of the ramp (range), both in units of | ||||
queuing time), as shown in the parameter initialization code in | ||||
Figure 2. A minimum marking threshold parameter (Th_len) in | ||||
transmission units (default 2 MTU) is also necessary to ensure that | ||||
the ramp does not trigger excessive marking on slow links. The code | ||||
in lines 23-28 of Figure 2 converts 2 MTU into time units and adjusts | ||||
the ramp thresholds to be no shallower than this floor. | ||||
An operator can effectively turn the ramp into a step function, as | ||||
used by DCTCP, by setting the range to its minimum value (e.g. 1 ns). | ||||
Then the condition for the ramp calculation will hardly ever arise. | ||||
There is some concern that using the step function of DCTCP for the | ||||
Native L4S AQM requires end-systems to smooth the signal for an | ||||
unnecessarily large number of round trips to ensure sufficient | ||||
fidelity. A ramp seems to be no worse than a step in initial | ||||
experiments with existing DCTCP. Therefore, it is recommended that a | ||||
ramp is configured in place of a step, which will allow congestion | ||||
control algorithms to investigate faster smoothing algorithms. | ||||
1: laqm(qdelay) { % Returns native L4S AQM probability | ||||
2: if (qdelay >= maxTh) | ||||
3: return 1 | ||||
4: else if (qdelay > minTh) | ||||
5: return (qdelay - minTh)/range % Divide would use a bit-shift | ||||
6: else | ||||
7: return 0 | ||||
8: } | ||||
Figure 5: Example Pseudocode for the Native L4S AQM | ||||
1: dualpi2_update(lq, cq, target) { % Update p every Tupdate | 1: dualpi2_update(lq, cq, target) { % Update p every Tupdate | |||
2: curq = cq.time() % use queuing time of first-in Classic packet | 2: curq = cq.time() % use queuing time of first-in Classic packet | |||
3: p = p + alpha_U * (curq - target) + beta_U * (curq - prevq) | 3: p = p + alpha_U * (curq - target) + beta_U * (curq - prevq) | |||
4: p_CL = p * k % Coupled L4S prob = base prob * coupling factor | 4: p_CL = p * k % Coupled L4S prob = base prob * coupling factor | |||
5: p_C = p^2 % Classic prob = (base prob)^2 | 5: p_C = p^2 % Classic prob = (base prob)^2 | |||
6: prevq = curq | 6: prevq = curq | |||
7: } | 7: } | |||
Figure 5: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | Figure 6: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | |||
The base probability (p) is kept up to date by the core PI algorithm | p_CL depends on the base probability (p), which is kept up to date by | |||
in Figure 5, which is executed every Tupdate. | the core PI algorithm in Figure 6 executed every Tupdate. | |||
Note that p solely depends on the queuing time in the Classic queue. | Note that p solely depends on the queuing time in the Classic queue. | |||
In line 2, the current queuing delay (curq) is evaluated from how | In line 2, the current queuing delay (curq) is evaluated from how | |||
long the head packet was in the Classic queue (cq). The function | long the head packet was in the Classic queue (cq). The function | |||
cq.time() (not shown) subtracts the time stamped at enqueue from the | cq.time() (not shown) subtracts the time stamped at enqueue from the | |||
current time and implicitly takes the current queuing delay as 0 if | current time and implicitly takes the current queuing delay as 0 if | |||
the queue is empty. | the queue is empty. | |||
The algorithm centres on line 3, which is a classical Proportional- | The algorithm centres on line 3, which is a classical Proportional- | |||
Integral (PI) controller that alters p dependent on: a) the error | Integral (PI) controller that alters p dependent on: a) the error | |||
skipping to change at page 27, line 22 ¶ | skipping to change at page 30, line 15 ¶ | |||
without over-compensating and therefore causing oscillations in the | without over-compensating and therefore causing oscillations in the | |||
queue. | queue. | |||
alpha and beta determine how much p ought to change if it was updated | alpha and beta determine how much p ought to change if it was updated | |||
every second. It is best to update p as frequently as possible, but | every second. It is best to update p as frequently as possible, but | |||
the update interval (Tupdate) will probably be constrained by | the update interval (Tupdate) will probably be constrained by | |||
hardware performance. For link rates from 4 - 200 Mb/s, we found | hardware performance. For link rates from 4 - 200 Mb/s, we found | |||
Tupdate=16ms (as recommended in [RFC8033]) is sufficient. However | Tupdate=16ms (as recommended in [RFC8033]) is sufficient. However | |||
small the chosen value of Tupdate, p should change by the same amount | small the chosen value of Tupdate, p should change by the same amount | |||
per second, but in finer more frequent steps. So the gain factors | per second, but in finer more frequent steps. So the gain factors | |||
used for updating p in Figure 5 need to be scaled by (Tupdate/1s), | used for updating p in Figure 6 need to be scaled by (Tupdate/1s), | |||
which is done in lines 9 and 10 of Figure 2). The suffix '_U' | which is done in lines 9 and 10 of Figure 2). The suffix '_U' | |||
represents 'per update time' (Tupdate). | represents 'per update time' (Tupdate). | |||
In corner cases, p can overflow the range [0,1] so the resulting | In corner cases, p can overflow the range [0,1] so the resulting | |||
value of p has to be bounded (omitted from the pseudocode). Then, as | value of p has to be bounded (omitted from the pseudocode). Then, as | |||
already explained, the coupled and Classic probabilities are derived | already explained, the coupled and Classic probabilities are derived | |||
from the new p in lines 4 and 5 as p_CL = k*p and p_C = p^2. | from the new p in lines 4 and 5 as p_CL = k*p and p_C = p^2. | |||
Because the coupled L4S marking probability (p_CL) is factored up by | Because the coupled L4S marking probability (p_CL) is factored up by | |||
k, the dynamic gain parameters alpha and beta are also inherently | k, the dynamic gain parameters alpha and beta are also inherently | |||
skipping to change at page 27, line 50 ¶ | skipping to change at page 30, line 43 ¶ | |||
are independent of p because the squaring applied to Classic traffic | are independent of p because the squaring applied to Classic traffic | |||
tunes them inherently. This is explained in [PI2], which also | tunes them inherently. This is explained in [PI2], which also | |||
explains why this more principled approach removes the need for most | explains why this more principled approach removes the need for most | |||
of the heuristics that had to be added to PIE. | of the heuristics that had to be added to PIE. | |||
{ToDo: Scaling beta with Tupdate and scaling both alpha & beta with | {ToDo: Scaling beta with Tupdate and scaling both alpha & beta with | |||
RTT} | RTT} | |||
A.2. Pass #2: Overload Details | A.2. Pass #2: Overload Details | |||
Figure 6 repeats the dequeue function of Figure 4, but with overload | Figure 7 repeats the dequeue function of Figure 4, but with overload | |||
details added. Similarly Figure 7 repeats the core PI algorithm of | details added. Similarly Figure 8 repeats the core PI algorithm of | |||
Figure 5 with overload details added. The initialization and enqueue | Figure 6 with overload details added. The initialization, enqueue | |||
functions are unchanged. | and L4S AQM functions are unchanged. | |||
In line 7 of the initialization function (Figure 2), the default | In line 7 of the initialization function (Figure 2), the default | |||
maximum Classic drop probability p_Cmax = 1/4 or 25%. This is the | maximum Classic drop probability p_Cmax = 1/4 or 25%. This is the | |||
point at which it is deemed that the Classic queue has become | point at which it is deemed that the Classic queue has become | |||
persistently overloaded, so it switches to using solely drop, even | persistently overloaded, so it switches to using solely drop, even | |||
for ECN-capable packets. This protects the queue against any | for ECN-capable packets. This protects the queue against any | |||
unresponsive traffic that falsely claims that it is responsive to ECN | unresponsive traffic that falsely claims that it is responsive to ECN | |||
marking, as required by [RFC3168] and [RFC7567]. | marking, as required by [RFC3168] and [RFC7567]. | |||
Line 21 of the initialization function translates this into a maximum | Line 22 of the initialization function translates this into a maximum | |||
L4S marking probability (p_Lmax) by rearranging Equation (1). With a | L4S marking probability (p_Lmax) by rearranging Equation (1). With a | |||
coupling factor of k=2 (the default) or greater, this translates to a | coupling factor of k=2 (the default) or greater, this translates to a | |||
maximum L4S marking probability of 1 (or 100%). This is intended to | maximum L4S marking probability of 1 (or 100%). This is intended to | |||
ensure that the L4S queue starts to introduce dropping once marking | ensure that the L4S queue starts to introduce dropping once marking | |||
saturates and can rise no further. The 'TCP Prague' requirements | saturates and can rise no further. The 'TCP Prague' requirements | |||
[I-D.ietf-tsvwg-ecn-l4s-id] state that, when an L4S congestion | [I-D.ietf-tsvwg-ecn-l4s-id] state that, when an L4S congestion | |||
control detects a drop, it falls back to a response that coexists | control detects a drop, it falls back to a response that coexists | |||
with 'Classic' TCP. So it is correct that the L4S queue drops | with 'Classic' TCP. So it is correct that the L4S queue drops | |||
packets proportional to p^2, as if they are Classic packets. | packets proportional to p^2, as if they are Classic packets. | |||
Both these switch-overs are triggered by the tests for overload | Both these switch-overs are triggered by the tests for overload | |||
introduced in lines 4b and 12b of the dequeue function (Figure 6). | introduced in lines 4b and 12b of the dequeue function (Figure 7). | |||
Lines 8c to 8g drop L4S packets with probability p^2. Lines 8h to 8i | Lines 8c to 8g drop L4S packets with probability p^2. Lines 8h to 8i | |||
mark the remaining packets with probability p_CL. If p_Lmax = 1, | mark the remaining packets with probability p_CL. If p_Lmax = 1, | |||
which is the suggested default configuration, all remaining packets | which is the suggested default configuration, all remaining packets | |||
will be marked because, to have reached the else block at line 8b, | will be marked because, to have reached the else block at line 8b, | |||
p_CL >= 1. | p_CL >= 1. | |||
Lines 2c to 2d in the core PI algorithm (Figure 7) deal with overload | Lines 2c to 2d in the core PI algorithm (Figure 8) deal with overload | |||
of the L4S queue when there is no Classic traffic. This is | of the L4S queue when there is no Classic traffic. This is | |||
necessary, because the core PI algorithm maintains the appropriate | necessary, because the core PI algorithm maintains the appropriate | |||
drop probability to regulate overload, but it depends on the length | drop probability to regulate overload, but it depends on the length | |||
of the Classic queue. If there is no Classic queue the naive | of the Classic queue. If there is no Classic queue the naive | |||
algorithm in Figure 5 drops nothing, even if the L4S queue is | algorithm in Figure 6 drops nothing, even if the L4S queue is | |||
overloaded - so tail drop would have to take over (lines 3 and 4 of | overloaded - so tail drop would have to take over (lines 3 and 4 of | |||
Figure 3). | Figure 3). | |||
If the test at line 2a finds that the Classic queue is empty, line 2d | If the test at line 2a finds that the Classic queue is empty, line 2d | |||
measures the current queue delay using the L4S queue instead. While | measures the current queue delay using the L4S queue instead. While | |||
the L4S queue is not overloaded, its delay will always be tiny | the L4S queue is not overloaded, its delay will always be tiny | |||
compared to the target Classic queue delay. So p_L will be driven to | compared to the target Classic queue delay. So p_L will be driven to | |||
zero, and the L4S queue will naturally be governed solely by | zero, and the L4S queue will naturally be governed solely by | |||
threshold marking (lines 5 and 6 of the dequeue algorithm in | threshold marking (lines 5 and 6 of the dequeue algorithm in | |||
Figure 6). But, if unresponsive L4S source(s) cause overload, the | Figure 7). But, if unresponsive L4S source(s) cause overload, the | |||
DualQ transitions smoothly to L4S marking based on the PI algorithm. | DualQ transitions smoothly to L4S marking based on the PI algorithm. | |||
And as overload increases, it naturally transitions from marking to | And as overload increases, it naturally transitions from marking to | |||
dropping by the switch-over mechanism already described. | dropping by the switch-over mechanism already described. | |||
1: dualpi2_dequeue(lq, cq) { % Couples L4S & Classic queues, lq & cq | 1: dualpi2_dequeue(lq, cq) { % Couples L4S & Classic queues, lq & cq | |||
2: while ( lq.len() + cq.len() > 0 ) | 2: while ( lq.len() + cq.len() > 0 ) | |||
3: if ( scheduler() == lq ) { | 3: if ( scheduler() == lq ) { | |||
4a: lq.dequeue(pkt) | 4a: lq.dequeue(pkt) | |||
4b: if ( p_CL < p_Lmax ) { % Check for overload saturation | 4b: if ( p_CL < p_Lmax ) { % Check for overload saturation | |||
5: if ( ((lq.time() > T_time) % step marking ... | 5: p'_L = laqm(lq.time()) % Native L4S AQM | |||
6: AND (lq.len > T_len)) | 6: p_L = max(p'_L, p_CL) % Combining function | |||
7: OR (p_CL > rand()) ) % ...or linear marking | 7: if ( p_L > rand() ) % Linear marking | |||
8a: mark(pkt) | 8a: mark(pkt) | |||
8b: } else { % overload saturation | 8b: } else { % overload saturation | |||
8c: if ( p_C > rand() ) { % probability p_C = p^2 | 8c: if ( p_C > rand() ) { % probability p_C = p^2 | |||
8e: drop(pkt) % revert to Classic drop due to overload | 8e: drop(pkt) % revert to Classic drop due to overload | |||
8f: continue % continue to the top of the while loop | 8f: continue % continue to the top of the while loop | |||
8g: } | 8g: } | |||
8h: if ( p_CL > rand() ) % probability p_CL = k * p | 8h: if ( p_CL > rand() ) % probability p_CL = k * p | |||
8i: mark(pkt) % linear marking of remaining packets | 8i: mark(pkt) % linear marking of remaining packets | |||
8j: } | 8j: } | |||
9: } else { | 9: } else { | |||
10: cq.dequeue(pkt) | 10: cq.dequeue(pkt) | |||
skipping to change at page 29, line 41 ¶ | skipping to change at page 32, line 38 ¶ | |||
14: continue % continue to the top of the while loop | 14: continue % continue to the top of the while loop | |||
15: } | 15: } | |||
16: mark(pkt) % squared mark | 16: mark(pkt) % squared mark | |||
17: } | 17: } | |||
18: } | 18: } | |||
19: return(pkt) % return the packet and stop | 19: return(pkt) % return the packet and stop | |||
20: } | 20: } | |||
21: return(NULL) % no packet to dequeue | 21: return(NULL) % no packet to dequeue | |||
22: } | 22: } | |||
Figure 6: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | |||
(Including Integer Arithmetic and Overload Code) | (Including Integer Arithmetic and Overload Code) | |||
1: dualpi2_update(lq, cq, target) { % Update p every Tupdate | 1: dualpi2_update(lq, cq, target) { % Update p every Tupdate | |||
2a: if ( cq.len() > 0 ) | 2a: if ( cq.len() > 0 ) | |||
2b: curq = cq.time() %use queuing time of first-in Classic packet | 2b: curq = cq.time() %use queuing time of first-in Classic packet | |||
2c: else % Classic queue empty | 2c: else % Classic queue empty | |||
2d: curq = lq.time() % use queuing time of first-in L4S packet | 2d: curq = lq.time() % use queuing time of first-in L4S packet | |||
3: p = p + alpha_U * (curq - target) + beta_U * (curq - prevq) | 3: p = p + alpha_U * (curq - target) + beta_U * (curq - prevq) | |||
4: p_CL = p * k % Coupled L4S prob = base prob * coupling factor | 4: p_CL = p * k % Coupled L4S prob = base prob * coupling factor | |||
5: p_C = p^2 % Classic prob = (base prob)^2 | 5: p_C = p^2 % Classic prob = (base prob)^2 | |||
6: prevq = curq | 6: prevq = curq | |||
7: } | 7: } | |||
Figure 7: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | |||
(Including Overload Code) | (Including Overload Code) | |||
The choice of scheduler technology is critical to overload protection | The choice of scheduler technology is critical to overload protection | |||
(see Section 4.1). | (see Section 4.1). | |||
o A well-understood weighted scheduler such as weighted round robin | o A well-understood weighted scheduler such as weighted round robin | |||
(WRR) is recommended. The scheduler weight for Classic should be | (WRR) is recommended. The scheduler weight for Classic should be | |||
low, e.g. 1/16. | low, e.g. 1/16. | |||
o Alternatively, a time-shifted FIFO could be used. This is a very | o Alternatively, a time-shifted FIFO could be used. This is a very | |||
skipping to change at page 31, line 33 ¶ | skipping to change at page 34, line 33 ¶ | |||
14: return(NULL) % no packet to dequeue | 14: return(NULL) % no packet to dequeue | |||
15: } | 15: } | |||
16: maxrand(u) { % return the max of u random numbers | 16: maxrand(u) { % return the max of u random numbers | |||
17: maxr=0 | 17: maxr=0 | |||
18: while (u-- > 0) | 18: while (u-- > 0) | |||
19: maxr = max(maxr, rand()) % 0 <= rand() < 1 | 19: maxr = max(maxr, rand()) % 0 <= rand() < 1 | |||
20: return(maxr) | 20: return(maxr) | |||
21: } | 21: } | |||
Figure 8: Example Dequeue Pseudocode for DualQ Coupled Curvy RED AQM | Figure 9: Example Dequeue Pseudocode for DualQ Coupled Curvy RED AQM | |||
Packet classification code is not shown, as it is no different from | Packet classification code is not shown, as it is no different from | |||
Figure 3. Potential classification schemes are discussed in | Figure 3. Potential classification schemes are discussed in | |||
Section 2.3. The Curvy RED algorithm has not been maintained to the | Section 2.3. The Curvy RED algorithm has not been maintained to the | |||
same degree as the DualPI2 algorithm. Some ideas used in DualPI2 | same degree as the DualPI2 algorithm. Some ideas used in DualPI2 | |||
would need to be translated into Curvy RED, such as i) the | would need to be translated into Curvy RED, such as i) the | |||
conditional priority scheduler instead of strict priority ii) the | conditional priority scheduler instead of strict priority ii) the | |||
time-based L4S threshold; iii) turning off ECN as overload | time-based L4S threshold; iii) turning off ECN as overload | |||
protection; iv) Classic ECN support. These are not shown in the | protection; iv) Classic ECN support. These are not shown in the | |||
Curvy RED pseudocode, but would need to be implemented for | Curvy RED pseudocode, but would need to be implemented for | |||
skipping to change at page 32, line 42 ¶ | skipping to change at page 35, line 42 ¶ | |||
Specifically, in line 3a the marking probability p_L is set to the | Specifically, in line 3a the marking probability p_L is set to the | |||
Classic queueing time qc.sec() in seconds divided by the L4S | Classic queueing time qc.sec() in seconds divided by the L4S | |||
scaling parameter 2^S_L, which represents the queuing time (in | scaling parameter 2^S_L, which represents the queuing time (in | |||
seconds) at which marking probability would hit 100%. Then in line | seconds) at which marking probability would hit 100%. Then in line | |||
3d (if U=1) the result is compared with a uniformly distributed | 3d (if U=1) the result is compared with a uniformly distributed | |||
random number between 0 and 1, which ensures that marking | random number between 0 and 1, which ensures that marking | |||
probability will linearly increase with queueing time. The | probability will linearly increase with queueing time. The | |||
scaling parameter is expressed as a power of 2 so that division | scaling parameter is expressed as a power of 2 so that division | |||
can be implemented as a right bit-shift (>>) in line 3 of the | can be implemented as a right bit-shift (>>) in line 3 of the | |||
integer variant of the pseudocode (Figure 9). | integer variant of the pseudocode (Figure 10). | |||
Classic: If the test at line 7 determines that there is at least one | Classic: If the test at line 7 determines that there is at least one | |||
Classic packet to dequeue, the test at line 9b determines whether | Classic packet to dequeue, the test at line 9b determines whether | |||
to drop it. But before that, line 8b updates Q_C, which is an | to drop it. But before that, line 8b updates Q_C, which is an | |||
exponentially weighted moving average (Note 5) of the queuing time | exponentially weighted moving average (Note 5) of the queuing time | |||
in the Classic queue, where pkt.sec() is the instantaneous | in the Classic queue, where pkt.sec() is the instantaneous | |||
queueing time of the current Classic packet and alpha is the EWMA | queueing time of the current Classic packet and alpha is the EWMA | |||
constant for the classic queue. In line 8a, alpha is represented | constant for the classic queue. In line 8a, alpha is represented | |||
as an integer power of 2, so that in line 8 of the integer code | as an integer power of 2, so that in line 8 of the integer code | |||
the division needed to weight the moving average can be | the division needed to weight the moving average can be | |||
skipping to change at page 35, line 31 ¶ | skipping to change at page 38, line 31 ¶ | |||
7: while ( cq.dequeue(pkt) ) { | 7: while ( cq.dequeue(pkt) ) { | |||
8: Q_C += (pkt.ns() - Q_C) >> f_C % Classic Q EWMA | 8: Q_C += (pkt.ns() - Q_C) >> f_C % Classic Q EWMA | |||
9: if ( (Q_C >> (S_C-2) ) > maxrand(2*U) ) | 9: if ( (Q_C >> (S_C-2) ) > maxrand(2*U) ) | |||
10: drop(pkt) % Squared drop, redo loop | 10: drop(pkt) % Squared drop, redo loop | |||
11: else | 11: else | |||
12: return(pkt) % return the packet and stop here | 12: return(pkt) % return the packet and stop here | |||
13: } | 13: } | |||
14: return(NULL) % no packet to dequeue | 14: return(NULL) % no packet to dequeue | |||
15: } | 15: } | |||
Figure 9: Optimised Example Dequeue Pseudocode for Coupled DualQ AQM | Figure 10: Optimised Example Dequeue Pseudocode for Coupled DualQ AQM | |||
using Integer Arithmetic | using Integer Arithmetic | |||
Notes: | Notes: | |||
1. The drain rate of the queue can vary if it is scheduled relative | 1. The drain rate of the queue can vary if it is scheduled relative | |||
to other queues, or to cater for fluctuations in a wireless | to other queues, or to cater for fluctuations in a wireless | |||
medium. To auto-adjust to changes in drain rate, the queue must | medium. To auto-adjust to changes in drain rate, the queue must | |||
be measured in time, not bytes or packets [CoDel]. In our Linux | be measured in time, not bytes or packets [CoDel]. In our Linux | |||
implementation, it was easiest to measure queuing time at | implementation, it was easiest to measure queuing time at | |||
dequeue. Queuing time can be estimated when a packet is enqueued | dequeue. Queuing time can be estimated when a packet is enqueued | |||
skipping to change at page 36, line 26 ¶ | skipping to change at page 39, line 26 ¶ | |||
adaptive smoothing methods could be valid and it might be | adaptive smoothing methods could be valid and it might be | |||
appropriate to decrease the EWMA faster than it increases. | appropriate to decrease the EWMA faster than it increases. | |||
6. In practice at line 10 the Classic queue would probably test for | 6. In practice at line 10 the Classic queue would probably test for | |||
ECN capability on the packet to determine whether to drop or mark | ECN capability on the packet to determine whether to drop or mark | |||
the packet. However, for brevity such detail is omitted. All | the packet. However, for brevity such detail is omitted. All | |||
packets classified into the L4S queue have to be ECN-capable, so | packets classified into the L4S queue have to be ECN-capable, so | |||
no dropping logic is necessary at line 3. Nonetheless, L4S | no dropping logic is necessary at line 3. Nonetheless, L4S | |||
packets could be dropped by overload code (see Section 4.1). | packets could be dropped by overload code (see Section 4.1). | |||
7. In the integer variant of the pseudocode (Figure 9) real numbers | 7. In the integer variant of the pseudocode (Figure 10) real numbers | |||
are all represented as integers scaled up by 2^32. In lines 3 & | are all represented as integers scaled up by 2^32. In lines 3 & | |||
9 the function maxrand() is arranged to return an integer in the | 9 the function maxrand() is arranged to return an integer in the | |||
range 0 <= maxrand() < 2^32. Queuing times are also scaled up by | range 0 <= maxrand() < 2^32. Queuing times are also scaled up by | |||
2^32, but in two stages: i) In lines 3 and 8 queuing times | 2^32, but in two stages: i) In lines 3 and 8 queuing times | |||
cq.ns() and pkt.ns() are returned in integer nanoseconds, making | cq.ns() and pkt.ns() are returned in integer nanoseconds, making | |||
the values about 2^30 times larger than when the units were | the values about 2^30 times larger than when the units were | |||
seconds, ii) then in lines 3 and 9 an adjustment of -2 to the | seconds, ii) then in lines 3 and 9 an adjustment of -2 to the | |||
right bit-shift multiplies the result by 2^2, to complete the | right bit-shift multiplies the result by 2^2, to complete the | |||
scaling by 2^32. | scaling by 2^32. | |||
skipping to change at page 37, line 46 ¶ | skipping to change at page 40, line 46 ¶ | |||
For a typical mix of RTTs from local data centres and across the | For a typical mix of RTTs from local data centres and across the | |||
general Internet, a value of k'=1 (equivalent to k=2) is recommended | general Internet, a value of k'=1 (equivalent to k=2) is recommended | |||
as a good workable compromise. | as a good workable compromise. | |||
Appendix D. Open Issues | Appendix D. Open Issues | |||
Most of the following open issues are also tagged '{ToDo}' at the | Most of the following open issues are also tagged '{ToDo}' at the | |||
appropriate point in the document: | appropriate point in the document: | |||
Operational guidance to monitor L4S experiment | ||||
PI2 appendix: scaling of alpha & beta, esp. dependence of beta_U | PI2 appendix: scaling of alpha & beta, esp. dependence of beta_U | |||
on Tupdate | on Tupdate | |||
Curvy RED appendix: complete the unfinished parts | Curvy RED appendix: complete the unfinished parts | |||
Authors' Addresses | Authors' Addresses | |||
Koen De Schepper | Koen De Schepper | |||
Nokia Bell Labs | Nokia Bell Labs | |||
Antwerp | Antwerp | |||
End of changes. 90 change blocks. | ||||
278 lines changed or deleted | 402 lines changed or added | |||
This html diff was produced by rfcdiff 1.47. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |