--- 1/draft-ietf-tsvwg-aqm-dualq-coupled-06.txt 2018-10-22 17:13:08.120433692 -0700
+++ 2/draft-ietf-tsvwg-aqm-dualq-coupled-07.txt 2018-10-22 17:13:08.200435632 -0700
@@ -1,24 +1,24 @@
Transport Area working group (tsvwg) K. De Schepper
Internet-Draft Nokia Bell Labs
Intended status: Experimental B. Briscoe, Ed.
-Expires: January 19, 2019 CableLabs
+Expires: April 25, 2019 CableLabs
O. Bondarenko
Simula Research Lab
I. Tsang
Nokia
- July 18, 2018
+ October 22, 2018
DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput
(L4S)
- draft-ietf-tsvwg-aqm-dualq-coupled-06
+ draft-ietf-tsvwg-aqm-dualq-coupled-07
Abstract
Data Centre TCP (DCTCP) was designed to provide predictably low
queuing latency, near-zero loss, and throughput scalability using
explicit congestion notification (ECN) and an extremely simple
marking behaviour on switches. However, DCTCP does not co-exist with
existing TCP traffic---DCTCP is so aggressive that existing TCP
algorithms approach starvation. So, until now, DCTCP could only be
deployed where a clean-slate environment could be arranged, such as
@@ -42,21 +42,21 @@
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
- This Internet-Draft will expire on January 19, 2019.
+ This Internet-Draft will expire on April 25, 2019.
Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
@@ -72,31 +72,31 @@
1.1. Problem and Scope . . . . . . . . . . . . . . . . . . . . 3
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5
1.3. Features . . . . . . . . . . . . . . . . . . . . . . . . 6
2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 7
2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 7
2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 8
2.3. Traffic Classification . . . . . . . . . . . . . . . . . 8
2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 9
2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 11
2.5.1. Functional Requirements . . . . . . . . . . . . . . . 11
- 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 12
- 2.5.2. Management Requirements . . . . . . . . . . . . . . . 13
- 3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
- 4. Security Considerations . . . . . . . . . . . . . . . . . . . 14
- 4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 14
+ 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 13
+ 2.5.2. Management Requirements . . . . . . . . . . . . . . . 14
+ 3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15
+ 4. Security Considerations . . . . . . . . . . . . . . . . . . . 15
+ 4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 15
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput
or Delay? . . . . . . . . . . . . . . . . . . . . . . 15
4.1.2. Congestion Signal Saturation: Introduce L4S Drop or
Delay? . . . . . . . . . . . . . . . . . . . . . . . 16
4.1.3. Protecting against Unresponsive ECN-Capable Traffic . 17
- 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 17
+ 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 18
6. References . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.1. Normative References . . . . . . . . . . . . . . . . . . 18
6.2. Informative References . . . . . . . . . . . . . . . . . 18
Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 21
A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 21
A.2. Pass #2: Overload Details . . . . . . . . . . . . . . . . 27
Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 30
Appendix C. Guidance on Controlling Throughput Equivalence . . . 36
Appendix D. Open Issues . . . . . . . . . . . . . . . . . . . . 37
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 38
@@ -321,32 +321,36 @@
rate equation for DCTCP, for which cwnd is inversely proportional to
p (not the square root), where in this case p is the ECN marking
probability. DCTCP is not the only congestion control that behaves
like this, so the term 'L4S' traffic will be used for all similar
behaviour.
In order to make a DCTCP flow run at roughly the same rate as a Reno
TCP flow (all other factors being equal), the drop or marking
probability for Classic traffic, p_C has to be distinct from the
marking probability for L4S traffic, p_L (in contrast to RFC3168
- which requires them to be the same). It is necessary to make the
- Classic drop probability p_C proportional to the square of the L4S
- marking probability p_L. This makes the Reno flow rate roughly equal
- the DCTCP flow rate, because it squares the square root of p_C in the
- Reno rate equation to make it proportional to the straight p_L in the
- DCTCP rate equation.
+ which requires them to be the same). To remain stable, Classic
+ traffic needs p_C to change relatively slowly, whereas L4S traffic
+ needs to be controlled rapidly by a probability p_L that track the
+ instantaneous queue. It is necessary to make the Classic drop
+ probability p_C proportional to the square of a variable we shall
+ call p_CL, which is an input to the instantaneous L4S marking
+ probability p_L but changes as slowly as p_C. This makes the Reno
+ flow rate roughly equal the DCTCP flow rate, because it squares the
+ square root of p_C in the Reno rate equation to make it proportional
+ to the smoothed value of p_L used in the DCTCP rate equation.
Stating this as a formula, the relation between Classic drop
- probability, p_C, and L4S marking probability, p_L needs to take the
- form:
+ probability, p_C, and the input variable p_CL to the L4S marking
+ probability p_L needs to take the form:
- p_C = ( p_L / k )^2 (1)
+ p_C = ( p_CL / k )^2 (1)
where k is the constant of proportionality.
2.2. Dual Queue
Classic traffic typically builds a large queue to prevent under-
utilization. Therefore a separate queue is provided for L4S traffic,
and it is scheduled with priority over Classic. Priority is
conditional to prevent starvation of Classic traffic.
@@ -392,52 +396,63 @@
Figure 1 shows the overall structure that any DualQ Coupled AQM is
likely to have. This schematic is intended to aid understanding of
the current designs of DualQ Coupled AQMs. However, it is not
intended to preclude other innovative ways of satisfying the
normative requirements in Section 2.5 that minimally define a DualQ
Coupled AQM.
The classifier on the left separates incoming traffic between the two
queues (L and C). Each queue has its own AQM that determines the
- likelihood of dropping or marking (p_L and p_C). Nonetheless, the
- AQM for Classic traffic is implemented in two stages: i) a base stage
- that outputs an internal probability p' (pronounced p-prime); and ii)
- a squaring stage that outputs p_C, where
+ likelihood of marking or dropping (p_L and p_C). It has been proved
+ [PI2] that it is preferable to control TCP with a linear PI
+ controller, then square the output before applying it as a drop
+ probability to TCP. So, the AQM for Classic traffic needs to be
+ implemented in two stages: i) a base stage that outputs an internal
+ probability p' (pronounced p-prime); and ii) a squaring stage that
+ outputs p_C, where
p_C = (p')^2. (2)
- This allows p_L to be coupled to p_C by marking L4S traffic
- proportionately to the intermediate output from the first stage.
- Specifically, the output of the base AQM is coupled across to the L
- queue in proportion to the output of the base AQM:
+ Substituting for p_C in Eqn (1) gives:
+
+ p' = p_CL / k
+
+ So we get our slow-moving input to ECN marking in the L queue as:
p_CL = k*p', (3)
- where k is the constant coupling factor (see Appendix C) and p_CL is
- the output from the coupling between the C queue and the L queue.
+ where k is the constant coupling factor (see Appendix C).
- It can be seen in the following that these two transformations of p'
- implement the required coupling given in equation (1) earlier.
- Substituting for p' from equation (3) into (2):
+ It can be seen that these two transformations of p' implement the
+ required coupling given in equation (1) earlier. Substituting for p'
+ from equation (3) into (2):
p_C = ( p_CL / k )^2.
- The actual L4S marking probability p_L is the maximum of the coupled
- output (p_CL) and the output of a native L4S AQM (p'L), shown as
- '(MAX)' in the schematic. While the output of the Native L4S AQM is
- high (p'L > p_CL) it will dominate the way L traffic is marked. When
- the native L4S AQM output is lower, the way L traffic is marked will
- be driven by the coupling, that is p_L = p_CL. So, whenever the
- coupling is needed, as required from equation (1):
+ The actual probability p_L that we apply to the L queue needs to
+ track the immediate L queue delay, as well as track p_CL under
+ stationary conditions. So we use a native AQM in the L queue that
+ calculates a marking probability p'L as a function of the
+ instantaneous L queue. And, given the L queue has conditional strict
+ priority over the C queue, whenever the L queue grows, we should
+ apply marking probability p'_L, but p_L should not fall below p_CL.
+ This suggests:
- p_C = ( p_L / k )^2.
+ p_L = max(p'L, p_CL),
+
+ which has also been found to work very well in practice.
+
+ This allows p_L to be coupled to p_C by marking L4S traffic
+ proportionately to the intermediate output from the first stage.
+ Specifically, the output of the base AQM is coupled across to the L
+ queue in proportion to the output of the base AQM
_________
| | ,------.
L4S queue | |===>| ECN |
,'| _______|_| |marker|\
<' | | `------'\\
//`' v ^ p_L \\
// ,-------. | \\
// |Native |p'L | \\,.
// | L4S |-->(MAX) < | ___
@@ -563,23 +577,23 @@
leads to the need to specify what the AQM in each queue ought to do
with packets that do not carry the ECN field expected for that queue.
It is recommended that the AQM in each queue inspects the ECN field
to determine what sort of congestion notification to signal, then
decides whether to apply congestion notification to this particular
packet, as follows:
o If a packet that does not carry an ECT(1) or CE codepoint is
classified into the L queue:
- * if the packet is ECT(0), the L AQM SHOULD apply drop using a
- drop probability appropriate to Classic congestion control and
- appropriate to the target delay in the L queue
+ * if the packet is ECT(0), the L AQM SHOULD apply CE-marking
+ using a probability appropriate to Classic congestion control
+ and appropriate to the target delay in the L queue
* if the packet is Not-ECT, the appropriate action depends on
whether some other function is protecting the L queue from
misbehaving flows (e.g. per-flow queue protection or latency
policing):
+ If separate queue protection is provided, the L AQM SHOULD
ignore the packet and forward it unchanged, meaning it
should not calculate whether to apply congestion
notification and it should neither drop nor CE-mark the
@@ -632,21 +646,23 @@
o The maximum Classic ECN marking probability, p_Cmax, before
switching over to drop.
An experimental DualQ Coupled AQM SHOULD allow the operator to
monitor the following operational statistics:
o Bits forwarded (total and per queue per sample interval), from
which utilization can be calculated
- o Q delay (per queue over sample interval)
+ o Q delay (per queue over sample interval) {ToDo: max per interval,
+ histogram with configurable edges (from which percentile(s) can be
+ derived), not incl. medium access delay}
o Total packets arriving, enqueued and dequeued (per queue per
sample interval)
o ECN packets marked, non-ECN packets dropped, ECN packets dropped
(per queue per sample interval), from which marking and dropping
probabilities can be calculated
o Time and duration of each overload event.
@@ -1024,33 +1040,33 @@
explained as it is encountered in the walk-through of the pseudocode
below.
1: dualpi2_params_init(...) { % Set input parameter defaults
2: % PI2 AQM parameters
3: target = 15 ms % PI AQM Classic queue delay target
4: Tupdate = 16 ms % PI Classic queue sampling interval
5: alpha = 10 Hz^2 % PI integral gain
6: beta = 100 Hz^2 % PI proportional gain
7: p_Cmax = 1/4 % Max Classic drop/mark prob
- 8: % Derived PI2 AQM variables
+ 8: % Constants derived from PI2 AQM parameters
9: alpha_U = alpha *Tupdate % PI integral gain per update interval
10: beta_U = beta * Tupdate % PI prop'nal gain per update interval
11:
12: % DualQ Coupled framework parameters
13: k = 2 % Coupling factor
14: % scheduler weight or equival't parameter (scheduler-dependent)
15: limit = MAX_LINK_RATE * 250 ms % Dual buffer size
16:
17: % L4S AQM parameters
18: T_time = 1 ms % L4S marking threshold in time
19: T_len = 2 * MTU % Min L4S marking threshold in bytes
- 20: % Derived L4S AQM variables
+ 20: % Constants derived from L4S AQM parameters
21: p_Lmax = min(k*sqrt(p_Cmax), 1) % Max L4S marking prob
22: }
Figure 2: Example Header Pseudocode for DualQ Coupled PI2 AQM
The overall goal of the code is to maintain the base probability (p),
which is an internal variable from which the marking and dropping
probabilities for L4S and Classic traffic (p_L and p_C) are derived.
The variable named p in the pseudocode and in this walk-through is
the same as p' (p-prime) in Section 2.4. The probabilities p_L and
@@ -1069,20 +1085,23 @@
8: cq.enqueue(pkt)
9: }
10: }
Figure 3: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM
1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues
2: while ( lq.len() + cq.len() > 0 )
3: if ( scheduler() == lq ) {
4: lq.dequeue(pkt) % Scheduler chooses lq
+
+ {ToDo: Generalize 5-7 for any L AQM (see email to Tom 9-Aug-18)}
+
5: if ( ((lq.time() > T_time) % step marking ...
6: AND (lq.len() > T_len))
7: OR (p_CL > rand()) ) % ...or linear marking
8: mark(pkt)
9: } else {
10: cq.dequeue(pkt) % Scheduler chooses cq
11: if ( p_C > rand() ) { % probability p_C = p^2
12: if ( ecn(pkt) == 0 ) { % if ECN field = not-ECT
13: drop(pkt) % squared drop
14: continue % continue to the top of the while loop
@@ -1093,29 +1112,42 @@
19: return(pkt) % return the packet and stop
20: }
21: return(NULL) % no packet to dequeue
22: }
Figure 4: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM
When packets arrive, first a common queue limit is checked as shown
in line 2 of the enqueuing pseudocode in Figure 3. Note that the
limit is deliberately tested before enqueue to avoid any bias against
- larger packets (so the actual buffer has to be one MTU larger than
- limit). If limit is not exceeded, the packet will be classified and
- enqueued to the Classic or L4S queue dependent on the least
- significant bit of the ECN field in the IP header (line 5). Packets
- with a codepoint having an LSB of 0 (Not-ECT and ECT(0)) will be
- enqueued in the Classic queue. Otherwise, ECT(1) and CE packets will
- be enqueued in the L4S queue. Optional additional packet
- classification flexibility is omitted for brevity (see
- [I-D.ietf-tsvwg-ecn-l4s-id]).
+ larger packets (so depending whether the implementation stores a
+ packet while testing whether to drop it from the tail, it might be
+ necessary for the actual buffer memory to be one MTU larger than
+ limit).
+
+ Line 2 assumes an implementation where lq and cq share common buffer
+ memory. An alternative implementation could use separate buffers for
+ each queue, in which case the arriving packet would have to be
+ classified first to determine which buffer to check for available
+ space. The choice is a trade off; a shared buffer can use less
+ memory whereas separate buffers isolate the L4S queue from tail-drop
+ due to large bursts of Classic traffic (e.g. a Classic TCP during
+ slow-start over a long RTT).
+
+ Returning to the shared buffer case, if limit is not exceeded, the
+ packet will be classified and enqueued to the Classic or L4S queue
+ dependent on the least significant bit of the ECN field in the IP
+ header (line 5). Packets with a codepoint having an LSB of 0 (Not-
+ ECT and ECT(0)) will be enqueued in the Classic queue. Otherwise,
+ ECT(1) and CE packets will be enqueued in the L4S queue. Optional
+ additional packet classification flexibility is omitted for brevity
+ (see [I-D.ietf-tsvwg-ecn-l4s-id]).
The dequeue pseudocode (Figure 4) is repeatedly called whenever the
lower layer is ready to forward a packet. It schedules one packet
for dequeuing (or zero if the queue is empty) then returns control to
the caller, so that it does not block while that packet is being
forwarded. While making this dequeue decision, it also makes the
necessary AQM decisions on dropping or marking. The alternative of
applying the AQMs at enqueue would shift some processing from the
critical time when each packet is dequeued. However, it would also
add a whole queue of delay to the control signals, making the control
@@ -1266,21 +1299,24 @@
ensure that the L4S queue starts to introduce dropping once marking
saturates and can rise no further. The 'TCP Prague' requirements
[I-D.ietf-tsvwg-ecn-l4s-id] state that, when an L4S congestion
control detects a drop, it falls back to a response that coexists
with 'Classic' TCP. So it is correct that the L4S queue drops
packets proportional to p^2, as if they are Classic packets.
Both these switch-overs are triggered by the tests for overload
introduced in lines 4b and 12b of the dequeue function (Figure 6).
Lines 8c to 8g drop L4S packets with probability p^2. Lines 8h to 8i
- mark the remaining packets with probability p_CL.
+ mark the remaining packets with probability p_CL. If p_Lmax = 1,
+ which is the suggested default configuration, all remaining packets
+ will be marked because, to have reached the else block at line 8b,
+ p_CL >= 1.
Lines 2c to 2d in the core PI algorithm (Figure 7) deal with overload
of the L4S queue when there is no Classic traffic. This is
necessary, because the core PI algorithm maintains the appropriate
drop probability to regulate overload, but it depends on the length
of the Classic queue. If there is no Classic queue the naive
algorithm in Figure 5 drops nothing, even if the L4S queue is
overloaded - so tail drop would have to take over (lines 3 and 4 of
Figure 3).