draft-ietf-tsvwg-aqm-dualq-coupled-11.txt | draft-ietf-tsvwg-aqm-dualq-coupled-12.txt | |||
---|---|---|---|---|
Transport Area working group (tsvwg) K. De Schepper | Transport Area working group (tsvwg) K. De Schepper | |||
Internet-Draft Nokia Bell Labs | Internet-Draft Nokia Bell Labs | |||
Intended status: Experimental B. Briscoe, Ed. | Intended status: Experimental B. Briscoe, Ed. | |||
Expires: September 10, 2020 Independent | Expires: January 28, 2021 Independent | |||
G. White | G. White | |||
CableLabs | CableLabs | |||
March 9, 2020 | July 27, 2020 | |||
DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | |||
(L4S) | (L4S) | |||
draft-ietf-tsvwg-aqm-dualq-coupled-11 | draft-ietf-tsvwg-aqm-dualq-coupled-12 | |||
Abstract | Abstract | |||
The Low Latency Low Loss Scalable Throughput (L4S) architecture | The Low Latency Low Loss Scalable Throughput (L4S) architecture | |||
allows data flows over the public Internet to achieve consistent low | allows data flows over the public Internet to achieve consistent low | |||
queuing latency, generally zero congestion loss and scaling of per- | queuing latency, generally zero congestion loss and scaling of per- | |||
flow throughput without the scaling problems of standard TCP Reno- | flow throughput without the scaling problems of standard TCP Reno- | |||
friendly congestion controls. To achieve this, L4S data flows have | friendly congestion controls. To achieve this, L4S data flows have | |||
to use one of the family of 'Scalable' congestion controls (TCP | to use one of the family of 'Scalable' congestion controls (TCP | |||
Prague and Data Center TCP are examples) and a form of Explicit | Prague and Data Center TCP are examples) and a form of Explicit | |||
skipping to change at page 2, line 20 ¶ | skipping to change at page 2, line 20 ¶ | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on September 10, 2020. | This Internet-Draft will expire on January 28, 2021. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2020 IETF Trust and the persons identified as the | Copyright (c) 2020 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 25, line 32 ¶ | skipping to change at page 25, line 32 ¶ | |||
implementations were tested. | implementations were tested. | |||
7. References | 7. References | |||
7.1. Normative References | 7.1. Normative References | |||
[I-D.ietf-tsvwg-ecn-l4s-id] | [I-D.ietf-tsvwg-ecn-l4s-id] | |||
Schepper, K. and B. Briscoe, "Identifying Modified | Schepper, K. and B. Briscoe, "Identifying Modified | |||
Explicit Congestion Notification (ECN) Semantics for | Explicit Congestion Notification (ECN) Semantics for | |||
Ultra-Low Queuing Delay (L4S)", draft-ietf-tsvwg-ecn-l4s- | Ultra-Low Queuing Delay (L4S)", draft-ietf-tsvwg-ecn-l4s- | |||
id-09 (work in progress), February 2020. | id-10 (work in progress), March 2020. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
RFC 3168, DOI 10.17487/RFC3168, September 2001, | RFC 3168, DOI 10.17487/RFC3168, September 2001, | |||
<https://www.rfc-editor.org/info/rfc3168>. | <https://www.rfc-editor.org/info/rfc3168>. | |||
skipping to change at page 27, line 31 ¶ | skipping to change at page 27, line 31 ¶ | |||
[I-D.briscoe-tsvwg-l4s-diffserv] | [I-D.briscoe-tsvwg-l4s-diffserv] | |||
Briscoe, B., "Interactions between Low Latency, Low Loss, | Briscoe, B., "Interactions between Low Latency, Low Loss, | |||
Scalable Throughput (L4S) and Differentiated Services", | Scalable Throughput (L4S) and Differentiated Services", | |||
draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress), | draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress), | |||
November 2018. | November 2018. | |||
[I-D.ietf-tsvwg-l4s-arch] | [I-D.ietf-tsvwg-l4s-arch] | |||
Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low | Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low | |||
Latency, Low Loss, Scalable Throughput (L4S) Internet | Latency, Low Loss, Scalable Throughput (L4S) Internet | |||
Service: Architecture", draft-ietf-tsvwg-l4s-arch-05 (work | Service: Architecture", draft-ietf-tsvwg-l4s-arch-06 (work | |||
in progress), February 2020. | in progress), March 2020. | |||
[I-D.ietf-tsvwg-nqb] | [I-D.ietf-tsvwg-nqb] | |||
White, G. and T. Fossati, "A Non-Queue-Building Per-Hop | White, G. and T. Fossati, "A Non-Queue-Building Per-Hop | |||
Behavior (NQB PHB) for Differentiated Services", draft- | Behavior (NQB PHB) for Differentiated Services", draft- | |||
ietf-tsvwg-nqb-00 (work in progress), November 2019. | ietf-tsvwg-nqb-01 (work in progress), March 2020. | |||
[L4Sdemo16] | [L4Sdemo16] | |||
Bondarenko, O., De Schepper, K., Tsang, I., and B. | Bondarenko, O., De Schepper, K., Tsang, I., and B. | |||
Briscoe, "Ultra-Low Delay for All: Live Experience, Live | Briscoe, "Ultra-Low Delay for All: Live Experience, Live | |||
Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, | Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, | |||
<http://dl.acm.org/citation.cfm?doid=2910017.2910633 | <http://dl.acm.org/citation.cfm?doid=2910017.2910633 | |||
(videos of demos: | (videos of demos: | |||
https://riteproject.eu/dctth/#1511dispatchwg )>. | https://riteproject.eu/dctth/#1511dispatchwg )>. | |||
[LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency | [LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency | |||
skipping to change at page 46, line 9 ¶ | skipping to change at page 46, line 9 ¶ | |||
18: S_L = S_C - k' % L4S ramp scaling factor as a power of 2 | 18: S_L = S_C - k' % L4S ramp scaling factor as a power of 2 | |||
19: range_L = 2^S_L % Range of L4S ramp | 19: range_L = 2^S_L % Range of L4S ramp | |||
20: } | 20: } | |||
Figure 9: Example Header Pseudocode for DualQ Coupled Curvy RED AQM | Figure 9: Example Header Pseudocode for DualQ Coupled Curvy RED AQM | |||
1: cred_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | 1: cred_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | |||
2: while ( lq.len() + cq.len() > 0 ) { | 2: while ( lq.len() + cq.len() > 0 ) { | |||
3: if ( scheduler() == lq ) { | 3: if ( scheduler() == lq ) { | |||
4: lq.dequeue(pkt) % L4S scheduled | 4: lq.dequeue(pkt) % L4S scheduled | |||
5a: p_CL = (cq.time() - minTh) / range_L | 5a: p_CL = (Q_C - minTh) / range_L | |||
5b: if ( ( lq.time() > T ) | 5b: if ( ( lq.time() > T ) | |||
5c: OR ( p_CL > maxrand(U) ) ) | 5c: OR ( p_CL > maxrand(U) ) ) | |||
6: mark(pkt) | 6: mark(pkt) | |||
7: } else { | 7: } else { | |||
8: cq.dequeue(pkt) % Classic scheduled | 8: cq.dequeue(pkt) % Classic scheduled | |||
9a: Q_C = gamma * qc.time() + (1-gamma) * Q_C % Classic Q EWMA | 9a: Q_C = gamma * cq.time() + (1-gamma) * Q_C % Classic Q EWMA | |||
10a: sqrt_p_C = (Q_C - minTh) / range_C | 10a: sqrt_p_C = (Q_C - minTh) / range_C | |||
10b: if ( sqrt_p_C > maxrand(2*U) ) { | 10b: if ( sqrt_p_C > maxrand(2*U) ) { | |||
11: if ( (ecn(pkt) == 0) { % ECN field = not-ECT | 11: if ( (ecn(pkt) == 0) { % ECN field = not-ECT | |||
12: drop(pkt) % Squared drop, redo loop | 12: drop(pkt) % Squared drop, redo loop | |||
13: continue % continue to the top of the while loop | 13: continue % continue to the top of the while loop | |||
14: } | 14: } | |||
15: mark(pkt) | 15: mark(pkt) | |||
16: } | 16: } | |||
17: } | 17: } | |||
18: return(pkt) % return the packet and stop here | 18: return(pkt) % return the packet and stop here | |||
skipping to change at page 47, line 27 ¶ | skipping to change at page 47, line 27 ¶ | |||
Within each queue, the decision whether to forward, drop or mark is | Within each queue, the decision whether to forward, drop or mark is | |||
taken as follows (to simplify the explanation, it is assumed that | taken as follows (to simplify the explanation, it is assumed that | |||
U=1): | U=1): | |||
L4S: If the test at line 3 determines there is an L4S packet to | L4S: If the test at line 3 determines there is an L4S packet to | |||
dequeue, the tests at lines 5b and 5c determine whether to mark | dequeue, the tests at lines 5b and 5c determine whether to mark | |||
it. The first is a simple test of whether the L4S queue delay | it. The first is a simple test of whether the L4S queue delay | |||
(lq.time()) is greater than a step threshold T (Note 3). The | (lq.time()) is greater than a step threshold T (Note 3). The | |||
second test is similar to the random ECN marking in RED, but with | second test is similar to the random ECN marking in RED, but with | |||
the following differences: ii) marking depends on queuing time, | the following differences: i) marking depends on queuing time, not | |||
not bytes, in order to scale for any link rate without being | bytes, in order to scale for any link rate without being | |||
reconfigured; ii) marking of the L4S queue does not depend on | reconfigured; ii) marking of the L4S queue depends on a logical OR | |||
itself, it depends on the queuing time of the _other_ (Classic) | of two tests; one against its own queuing time and one against the | |||
queue, where cq.time() is the queuing time of the packet at the | queuing time of the _other_ (Classic) queue; iii) the tests are | |||
head of the Classic queue (zero if empty); iii) marking depends on | against the instantaneous queuing time of the L4S queue, but a | |||
the instantaneous queuing time (of the other Classic queue), not a | smoothed average of the other (Classic) queue; iv) the queue is | |||
smoothed average; iv) the queue is compared with the maximum of U | compared with the maximum of U random numbers (but if U=1, this is | |||
random numbers (but if U=1, this is the same as the single random | the same as the single random number used in RED). | |||
number used in RED). | ||||
Specifically, in line 5a the coupled marking probability p_CL is | Specifically, in line 5a the coupled marking probability p_CL is | |||
set to the excess of the Classic queueing delay qc.time() above | set to the amount by which the averaged Classic queueing delay Q_C | |||
the minimum queuing delay threshold (minTh) all divided by the L4S | exceeds the minimum queuing delay threshold (minTh) all divided by | |||
scaling parameter range_L. range_L represents the queuing delay | the L4S scaling parameter range_L. range_L represents the queuing | |||
(in seconds) added to minTh at which marking probability would hit | delay (in seconds) added to minTh at which marking probability | |||
100%. Then in line 5c (if U=1) the result is compared with a | would hit 100%. Then in line 5c (if U=1) the result is compared | |||
uniformly distributed random number between 0 and 1, which ensures | with a uniformly distributed random number between 0 and 1, which | |||
that marking probability will linearly increase with queueing | ensures that, over range_L, marking probability will linearly | |||
time. | increase with queueing time. | |||
Classic: If the scheduler at line 3 chooses to dequeue a Classic | Classic: If the scheduler at line 3 chooses to dequeue a Classic | |||
packet and jumps to line 7, the test at line 10b determines | packet and jumps to line 7, the test at line 10b determines | |||
whether to drop or mark it. But before that, line 9a updates Q_C, | whether to drop or mark it. But before that, line 9a updates Q_C, | |||
which is an exponentially weighted moving average (Note 4) of the | which is an exponentially weighted moving average (Note 4) of the | |||
queuing time in the Classic queue, where qc.time() is the current | queuing time of the Classic queue, where cq.time() is the current | |||
instantaneous queueing time of the Classic queue and gamma is the | instantaneous queueing time of the packet at the head of the | |||
EWMA constant (default 1/32, see line 12 of the initialization | Classic queue (zero if empty) and gamma is the EWMA constant | |||
function). | (default 1/32, see line 12 of the initialization function). | |||
Lines 10a and 10b implement the Classic AQM. In line 10a the | Lines 10a and 10b implement the Classic AQM. In line 10a the | |||
averaged queuing time Q_C is divided by the Classic scaling | averaged queuing time Q_C is divided by the Classic scaling | |||
parameter range_C, in the same way that queuing time was scaled | parameter range_C, in the same way that queuing time was scaled | |||
for L4S marking. This scaled queuing time will be squared to | for L4S marking. This scaled queuing time will be squared to | |||
compute Classic drop probability so, before it is squared, it is | compute Classic drop probability so, before it is squared, it is | |||
effectively the square root of the drop probability, hence it is | effectively the square root of the drop probability, hence it is | |||
given the variable name sqrt_p_C. The squaring is done by | given the variable name sqrt_p_C. The squaring is done by | |||
comparing it with the maximum out of two random numbers (assuming | comparing it with the maximum out of two random numbers (assuming | |||
U=1). Comparing it with the maximum out of two is the same as the | U=1). Comparing it with the maximum out of two is the same as the | |||
skipping to change at page 48, line 38 ¶ | skipping to change at page 48, line 37 ¶ | |||
rather than queuing becomes the dominant cause of delay for short | rather than queuing becomes the dominant cause of delay for short | |||
flows, due to timeouts and tail losses. | flows, due to timeouts and tail losses. | |||
Curvy RED constrains delay with a softened target that allows some | Curvy RED constrains delay with a softened target that allows some | |||
increase in delay as load increases. This is achieved by increasing | increase in delay as load increases. This is achieved by increasing | |||
drop probability on a convex curve relative to queue growth (the | drop probability on a convex curve relative to queue growth (the | |||
square curve in the Classic queue, if U=1). Like RED, the curve hugs | square curve in the Classic queue, if U=1). Like RED, the curve hugs | |||
the zero axis while the queue is shallow. Then, as load increases, | the zero axis while the queue is shallow. Then, as load increases, | |||
it introduces a growing barrier to higher delay. But, unlike RED, it | it introduces a growing barrier to higher delay. But, unlike RED, it | |||
requires only two parameters, not three. The disadvantage of Curvy | requires only two parameters, not three. The disadvantage of Curvy | |||
RED is that it is not adapted to a wide range of RTTs. Curvy RED can | RED (compared to a PI controller for example) is that it is not | |||
be used as is when the RTT range to be supported is limited, | adapted to a wide range of RTTs. Curvy RED can be used as is when | |||
otherwise an adaptation mechanism is required. | the RTT range to be supported is limited, otherwise an adaptation | |||
mechanism is required. | ||||
From our limited experiments with Curvy RED so far, recommended | From our limited experiments with Curvy RED so far, recommended | |||
values of these parameters are: S_C = -1; g_C = 5; T = 5 * MTU at the | values of these parameters are: S_C = -1; g_C = 5; T = 5 * MTU at the | |||
link rate (about 1ms at 60Mb/s) for the range of base RTTs typical on | link rate (about 1ms at 60Mb/s) for the range of base RTTs typical on | |||
the public Internet. [CRED_Insights] explains why these parameters | the public Internet. [CRED_Insights] explains why these parameters | |||
are applicable whatever rate link this AQM implementation is deployed | are applicable whatever rate link this AQM implementation is deployed | |||
on and how the parameters would need to be adjusted for a scenario | on and how the parameters would need to be adjusted for a scenario | |||
with a different range of RTTs (e.g. a data centre). The setting of | with a different range of RTTs (e.g. a data centre). The setting of | |||
k depends on policy (see Section 2.5 and Appendix C.2 respectively | k depends on policy (see Section 2.5 and Appendix C.2 respectively | |||
for its recommended setting and guidance on alternatives). | for its recommended setting and guidance on alternatives). | |||
skipping to change at page 50, line 19 ¶ | skipping to change at page 50, line 19 ¶ | |||
in parallel out of band. For instance, if U=1, the Classic queue | in parallel out of band. For instance, if U=1, the Classic queue | |||
would require the maximum of two random numbers. So, instead of | would require the maximum of two random numbers. So, instead of | |||
calling maxrand(2*U) in-band, the maximum of every pair of values | calling maxrand(2*U) in-band, the maximum of every pair of values | |||
from a pseudorandom number generator could be generated out-of-band, | from a pseudorandom number generator could be generated out-of-band, | |||
and held in a buffer ready for the Classic queue to consume. | and held in a buffer ready for the Classic queue to consume. | |||
1: cred_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | 1: cred_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | |||
2: while ( lq.len() + cq.len() > 0 ) { | 2: while ( lq.len() + cq.len() > 0 ) { | |||
3: if ( scheduler() == lq ) { | 3: if ( scheduler() == lq ) { | |||
4: lq.dequeue(pkt) % L4S scheduled | 4: lq.dequeue(pkt) % L4S scheduled | |||
5: if ((lq.time() > T) OR (cq.ns() >> (S_L-2) > maxrand(U))) | 5: if ((lq.time() > T) OR (Q_C >> (S_L-2) > maxrand(U))) | |||
6: mark(pkt) | 6: mark(pkt) | |||
7: } else { | 7: } else { | |||
8: cq.dequeue(pkt) % Classic scheduled | 8: cq.dequeue(pkt) % Classic scheduled | |||
9: Q_C += (cq.ns() - Q_C) >> g_C % Classic Q EWMA | 9: Q_C += (qc.ns() - Q_C) >> g_C % Classic Q EWMA | |||
10: if ( (Q_C >> (S_C-2) ) > maxrand(2*U) ) { | 10: if ( (Q_C >> (S_C-2) ) > maxrand(2*U) ) { | |||
11: if ( (ecn(pkt) == 0) { % ECN field = not-ECT | 11: if ( (ecn(pkt) == 0) { % ECN field = not-ECT | |||
12: drop(pkt) % Squared drop, redo loop | 12: drop(pkt) % Squared drop, redo loop | |||
13: continue % continue to the top of the while loop | 13: continue % continue to the top of the while loop | |||
14: } | 14: } | |||
15: mark(pkt) | 15: mark(pkt) | |||
16: } | 16: } | |||
17: } | 17: } | |||
18: return(pkt) % return the packet and stop here | 18: return(pkt) % return the packet and stop here | |||
19: } | 19: } | |||
skipping to change at page 50, line 51 ¶ | skipping to change at page 50, line 51 ¶ | |||
that division can be implemented as a right bit-shift (>>) in lines 5 | that division can be implemented as a right bit-shift (>>) in lines 5 | |||
and 10 of the integer variant of the pseudocode (Figure 11). | and 10 of the integer variant of the pseudocode (Figure 11). | |||
For the integer variant of the pseudocode, an integer version of the | For the integer variant of the pseudocode, an integer version of the | |||
rand() function used at line 25 of the maxrand(function) in Figure 10 | rand() function used at line 25 of the maxrand(function) in Figure 10 | |||
would be arranged to return an integer in the range 0 <= maxrand() < | would be arranged to return an integer in the range 0 <= maxrand() < | |||
2^32 (not shown). This would scale up all the floating point | 2^32 (not shown). This would scale up all the floating point | |||
probabilities in the range [0,1] by 2^32. | probabilities in the range [0,1] by 2^32. | |||
Queuing delays are also scaled up by 2^32, but in two stages: i) In | Queuing delays are also scaled up by 2^32, but in two stages: i) In | |||
lines 5 and 10 queuing times cq.ns() and pkt.ns() are returned in | line 9 queuing time qc.ns() is returned in integer nanoseconds, | |||
integer nanoseconds, making the values about 2^30 times larger than | making the value about 2^30 times larger than when the units were | |||
when the units were seconds, ii) then in lines 3 and 9 an adjustment | seconds, ii) then in lines 5 and 10 an adjustment of -2 to the right | |||
of -2 to the right bit-shift multiplies the result by 2^2, to | bit-shift multiplies the result by 2^2, to complete the scaling by | |||
complete the scaling by 2^32. | 2^32. | |||
In line 8 of the initialization function, the EWMA constant gamma is | In line 8 of the initialization function, the EWMA constant gamma is | |||
represented as an integer power of 2, g_C, so that in line 9 of the | represented as an integer power of 2, g_C, so that in line 9 of the | |||
integer code the division needed to weight the moving average can be | integer code the division needed to weight the moving average can be | |||
implemented by a right bit-shift (>> g_C). | implemented by a right bit-shift (>> g_C). | |||
Appendix C. Choice of Coupling Factor, k | Appendix C. Choice of Coupling Factor, k | |||
C.1. RTT-Dependence | C.1. RTT-Dependence | |||
End of changes. 16 change blocks. | ||||
42 lines changed or deleted | 42 lines changed or added | |||
This html diff was produced by rfcdiff 1.47. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |