draft-ietf-tsvwg-aqm-dualq-coupled-20.txt | draft-ietf-tsvwg-aqm-dualq-coupled-21.txt | |||
---|---|---|---|---|
Transport Area working group (tsvwg) K. De Schepper | Transport Area working group (tsvwg) K. De Schepper | |||
Internet-Draft Nokia Bell Labs | Internet-Draft Nokia Bell Labs | |||
Intended status: Experimental B. Briscoe, Ed. | Intended status: Experimental B. Briscoe, Ed. | |||
Expires: 27 June 2022 Independent | Expires: 5 August 2022 Independent | |||
G. White | G. White | |||
CableLabs | CableLabs | |||
24 December 2021 | 1 February 2022 | |||
DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | |||
(L4S) | (L4S) | |||
draft-ietf-tsvwg-aqm-dualq-coupled-20 | draft-ietf-tsvwg-aqm-dualq-coupled-21 | |||
Abstract | Abstract | |||
This specification defines a framework for coupling the Active Queue | This specification defines a framework for coupling the Active Queue | |||
Management (AQM) algorithms in two queues intended for flows with | Management (AQM) algorithms in two queues intended for flows with | |||
different responses to congestion. This provides a way for the | different responses to congestion. This provides a way for the | |||
Internet to transition from the scaling problems of standard TCP | Internet to transition from the scaling problems of standard TCP | |||
Reno-friendly ('Classic') congestion controls to the family of | Reno-friendly ('Classic') congestion controls to the family of | |||
'Scalable' congestion controls. These are designed for consistently | 'Scalable' congestion controls. These are designed for consistently | |||
very Low queuing Latency, very Low congestion Loss and Scaling of | very Low queuing Latency, very Low congestion Loss and Scaling of | |||
skipping to change at page 2, line 10 ¶ | skipping to change at page 2, line 10 ¶ | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on 27 June 2022. | This Internet-Draft will expire on 5 August 2022. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
and restrictions with respect to this document. Code Components | and restrictions with respect to this document. Code Components | |||
extracted from this document must include Revised BSD License text as | extracted from this document must include Revised BSD License text as | |||
described in Section 4.e of the Trust Legal Provisions and are | described in Section 4.e of the Trust Legal Provisions and are | |||
provided without warranty as described in the Revised BSD License. | provided without warranty as described in the Revised BSD License. | |||
skipping to change at page 3, line 14 ¶ | skipping to change at page 3, line 14 ¶ | |||
4.1.3. Protecting against Unresponsive ECN-Capable | 4.1.3. Protecting against Unresponsive ECN-Capable | |||
Traffic . . . . . . . . . . . . . . . . . . . . . . . 25 | Traffic . . . . . . . . . . . . . . . . . . . . . . . 25 | |||
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26 | 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26 | |||
6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26 | 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26 | |||
7. References . . . . . . . . . . . . . . . . . . . . . . . . . 27 | 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 27 | |||
7.1. Normative References . . . . . . . . . . . . . . . . . . 27 | 7.1. Normative References . . . . . . . . . . . . . . . . . . 27 | |||
7.2. Informative References . . . . . . . . . . . . . . . . . 27 | 7.2. Informative References . . . . . . . . . . . . . . . . . 27 | |||
Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 33 | Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 33 | |||
A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 33 | A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 33 | |||
A.2. Pass #2: Overload Details . . . . . . . . . . . . . . . . 44 | A.2. Pass #2: Edge-Case Details . . . . . . . . . . . . . . . 44 | |||
Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 48 | Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 48 | |||
B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 48 | B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 48 | |||
B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 54 | B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 54 | |||
Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 56 | Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 56 | |||
C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 56 | C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 56 | |||
C.2. Guidance on Controlling Throughput Equivalence . . . . . 57 | C.2. Guidance on Controlling Throughput Equivalence . . . . . 57 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 61 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 61 | |||
1. Introduction | 1. Introduction | |||
skipping to change at page 4, line 26 ¶ | skipping to change at page 4, line 26 ¶ | |||
delay to vary or cause the link to be under-utilized. These AQMs | delay to vary or cause the link to be under-utilized. These AQMs | |||
are tuned to allow a typical capacity-seeking Reno-friendly flow | are tuned to allow a typical capacity-seeking Reno-friendly flow | |||
to induce an average queue that roughly doubles the base RTT, | to induce an average queue that roughly doubles the base RTT, | |||
adding 5-15 ms of queuing on average (cf. 500 microseconds with | adding 5-15 ms of queuing on average (cf. 500 microseconds with | |||
L4S for the same mix of long-running and web traffic). However, | L4S for the same mix of long-running and web traffic). However, | |||
for many applications low delay is not useful unless it is | for many applications low delay is not useful unless it is | |||
consistently low. With these AQMs, 99th percentile queuing delay | consistently low. With these AQMs, 99th percentile queuing delay | |||
is 20-30 ms (cf. 2 ms with the same traffic over L4S). | is 20-30 ms (cf. 2 ms with the same traffic over L4S). | |||
* Similarly, recent research into using e2e congestion control | * Similarly, recent research into using e2e congestion control | |||
without needing an AQM in the network (e.g.BBR [BBRv1], | without needing an AQM in the network | |||
[I-D.cardwell-iccrg-bbr-congestion-control]) seems to have hit a | (e.g.BBR [I-D.cardwell-iccrg-bbr-congestion-control]) seems to | |||
similar lower limit to queuing delay of about 20ms on average (and | have hit a similar lower limit to queuing delay of about 20ms on | |||
any additional BBRv1 flow adds another 20ms of queuing) but there | average but there are also regular 25ms delay spikes due to | |||
are also regular 25ms delay spikes due to bandwidth probes and | bandwidth probes and 60ms spikes due to flow-starts. | |||
60ms spikes due to flow-starts. | ||||
L4S learns from the experience of Data Center TCP [RFC8257], which | L4S learns from the experience of Data Center TCP [RFC8257], which | |||
shows the power of complementary changes both in the network and on | shows the power of complementary changes both in the network and on | |||
end-systems. DCTCP teaches us that two small but radical changes to | end-systems. DCTCP teaches us that two small but radical changes to | |||
congestion control are needed to cut the two major outstanding causes | congestion control are needed to cut the two major outstanding causes | |||
of queuing delay variability: | of queuing delay variability: | |||
1. Far smaller rate variations (sawteeth) than Reno-friendly | 1. Far smaller rate variations (sawteeth) than Reno-friendly | |||
congestion controls; | congestion controls; | |||
skipping to change at page 6, line 20 ¶ | skipping to change at page 6, line 20 ¶ | |||
using the experimentally assigned explicit congestion notification | using the experimentally assigned explicit congestion notification | |||
(ECN) codepoints in the IP header: ECT(1) and CE [RFC8311] | (ECN) codepoints in the IP header: ECT(1) and CE [RFC8311] | |||
[I-D.ietf-tsvwg-ecn-l4s-id]. | [I-D.ietf-tsvwg-ecn-l4s-id]. | |||
Data Center TCP (DCTCP [RFC8257]) is an example of a Scalable | Data Center TCP (DCTCP [RFC8257]) is an example of a Scalable | |||
congestion control for controlled environments that has been deployed | congestion control for controlled environments that has been deployed | |||
for some time in Linux, Windows and FreeBSD operating systems. | for some time in Linux, Windows and FreeBSD operating systems. | |||
During the progress of this document through the IETF a number of | During the progress of this document through the IETF a number of | |||
other Scalable congestion controls were implemented, e.g. TCP | other Scalable congestion controls were implemented, e.g. TCP | |||
Prague [I-D.briscoe-iccrg-prague-congestion-control] [PragueLinux], | Prague [I-D.briscoe-iccrg-prague-congestion-control] [PragueLinux], | |||
BBRv2 [BBRv2], QUIC Prague and the L4S variant of SCREAM for real- | BBRv2 [I-D.cardwell-iccrg-bbr-congestion-control], QUIC Prague and | |||
time media [RFC8298]. | the L4S variant of SCREAM for real-time media [RFC8298]. | |||
The focus of this specification is to enable deployment of the | The focus of this specification is to enable deployment of the | |||
network part of the L4S service. Then, without any management | network part of the L4S service. Then, without any management | |||
intervention, applications can exploit this new network capability as | intervention, applications can exploit this new network capability as | |||
their operating systems migrate to Scalable congestion controls, | their operating systems migrate to Scalable congestion controls, | |||
which can then evolve _while_ their benefits are being enjoyed by | which can then evolve _while_ their benefits are being enjoyed by | |||
everyone on the Internet. | everyone on the Internet. | |||
The DualQ Coupled AQM framework can incorporate any AQM designed for | The DualQ Coupled AQM framework can incorporate any AQM designed for | |||
a single queue that generates a statistical or deterministic mark/ | a single queue that generates a statistical or deterministic mark/ | |||
skipping to change at page 8, line 26 ¶ | skipping to change at page 8, line 26 ¶ | |||
Scalable Congestion Control: A congestion control where the average | Scalable Congestion Control: A congestion control where the average | |||
time from one congestion signal to the next (the recovery time) | time from one congestion signal to the next (the recovery time) | |||
remains invariant as the flow rate scales, all other factors being | remains invariant as the flow rate scales, all other factors being | |||
equal. This maintains the same degree of control over queueing | equal. This maintains the same degree of control over queueing | |||
and utilization whatever the flow rate, as well as ensuring that | and utilization whatever the flow rate, as well as ensuring that | |||
high throughput is robust to disturbances. For instance, DCTCP | high throughput is robust to disturbances. For instance, DCTCP | |||
averages 2 congestion signals per round-trip whatever the flow | averages 2 congestion signals per round-trip whatever the flow | |||
rate, as do other recently developed scalable congestion controls, | rate, as do other recently developed scalable congestion controls, | |||
e.g. Relentless TCP [Mathis09], TCP Prague | e.g. Relentless TCP [Mathis09], TCP Prague | |||
[I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux], | [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux], | |||
BBRv2 [BBRv2] and the L4S variant of SCREAM for real-time | BBRv2 [I-D.cardwell-iccrg-bbr-congestion-control] and the L4S | |||
media [SCReAM], [RFC8298]). For the public Internet a Scalable | variant of SCREAM for real-time media [SCReAM], [RFC8298]). For | |||
transport has to comply with the requirements in Section 4 of | the public Internet a Scalable transport has to comply with the | |||
[I-D.ietf-tsvwg-ecn-l4s-id] (aka. the 'Prague L4S requirements'). | requirements in Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id] (aka. the | |||
'Prague L4S requirements'). | ||||
C: Abbreviation for Classic, e.g. when used as a subscript. | C: Abbreviation for Classic, e.g. when used as a subscript. | |||
L: Abbreviation for L4S, e.g. when used as a subscript. | L: Abbreviation for L4S, e.g. when used as a subscript. | |||
The terms Classic or L4S can also qualify other nouns, such as | The terms Classic or L4S can also qualify other nouns, such as | |||
'codepoint', 'identifier', 'classification', 'packet', 'flow'. | 'codepoint', 'identifier', 'classification', 'packet', 'flow'. | |||
For example: an L4S packet means a packet with an L4S identifier | For example: an L4S packet means a packet with an L4S identifier | |||
sent from an L4S congestion control. | sent from an L4S congestion control. | |||
skipping to change at page 21, line 12 ¶ | skipping to change at page 21, line 12 ¶ | |||
that the maximum delay to a C packet is w times that of an L | that the maximum delay to a C packet is w times that of an L | |||
packet. | packet. | |||
- for a time-shifted FIFO (TS-FIFO) scheduler (see Section 4.1.1) | - for a time-shifted FIFO (TS-FIFO) scheduler (see Section 4.1.1) | |||
a time-shift of tshift means that the maximum delay to a C | a time-shift of tshift means that the maximum delay to a C | |||
packet is tshift greater than that of an L packet. tshift could | packet is tshift greater than that of an L packet. tshift could | |||
be expressed as a multiple of the typical RTT rather than as an | be expressed as a multiple of the typical RTT rather than as an | |||
absolute delay. | absolute delay. | |||
* The maximum Classic ECN marking probability, p_Cmax, before | * The maximum Classic ECN marking probability, p_Cmax, before | |||
switching over to drop. | introducing drop. | |||
2.5.2.2. Monitoring | 2.5.2.2. Monitoring | |||
An experimental DualQ Coupled AQM SHOULD allow the operator to | An experimental DualQ Coupled AQM SHOULD allow the operator to | |||
monitor each of the following operational statistics on demand, per | monitor each of the following operational statistics on demand, per | |||
queue and per configurable sample interval, for performance | queue and per configurable sample interval, for performance | |||
monitoring and perhaps also for accounting in some cases: | monitoring and perhaps also for accounting in some cases: | |||
* Bits forwarded, from which utilization can be calculated; | * Bits forwarded, from which utilization can be calculated; | |||
skipping to change at page 25, line 19 ¶ | skipping to change at page 25, line 19 ¶ | |||
Drop on Saturation: Saturation can be avoided by setting a maximum | Drop on Saturation: Saturation can be avoided by setting a maximum | |||
threshold for L4S ECN marking (assuming k>1) before saturation | threshold for L4S ECN marking (assuming k>1) before saturation | |||
starts to make the flow rates of the different traffic types | starts to make the flow rates of the different traffic types | |||
diverge. Above that the drop probability of Classic traffic is | diverge. Above that the drop probability of Classic traffic is | |||
applied to all packets of all traffic types. Then experiments | applied to all packets of all traffic types. Then experiments | |||
have shown that queueing delay can be kept at the target in any | have shown that queueing delay can be kept at the target in any | |||
overload situation, including with unresponsive traffic, and no | overload situation, including with unresponsive traffic, and no | |||
further measures are required [DualQ-Test]. | further measures are required [DualQ-Test]. | |||
Delay on Saturation: When L4S marking saturates, instead of | Delay on Saturation: When L4S marking saturates, instead of | |||
switching to drop, the drop and marking probabilities could be | introducing L4S drop, the drop and marking probabilities of both | |||
capped. Beyond that, delay will grow either solely in the queue | queues could be capped. Beyond that, delay will grow either | |||
with unresponsive traffic (if WRR is used), or in both queues (if | solely in the queue with unresponsive traffic (if WRR is used), or | |||
time-shifted FIFO is used). In either case, the higher delay | in both queues (if time-shifted FIFO is used). In either case, | |||
ought to control temporary high congestion. If the overload is | the higher delay ought to control temporary high congestion. If | |||
more persistent, eventually the combined DualQ will overflow and | the overload is more persistent, eventually the combined DualQ | |||
tail drop will control congestion. | will overflow and tail drop will control congestion. | |||
The example implementation in Appendix A solely applies the "drop on | The example implementation in Appendix A solely applies the "drop on | |||
saturation" policy. The DOCSIS specification of a DualQ Coupled | saturation" policy. The DOCSIS specification of a DualQ Coupled | |||
AQM [DOCSIS3.1] also implements the 'drop on saturation' policy with | AQM [DOCSIS3.1] also implements the 'drop on saturation' policy with | |||
a very shallow L buffer. However, the addition of DOCSIS per-flow | a very shallow L buffer. However, the addition of DOCSIS per-flow | |||
Queue Protection [I-D.briscoe-docsis-q-protection] turns this into | Queue Protection [I-D.briscoe-docsis-q-protection] turns this into | |||
'delay on saturation' by redirecting some packets of the flow(s) most | 'delay on saturation' by redirecting some packets of the flow(s) most | |||
responsible for L queue overload into the C queue, which has a higher | responsible for L queue overload into the C queue, which has a higher | |||
delay target. If overload continues, this again becomes 'drop on | delay target. If overload continues, this again becomes 'drop on | |||
saturation' as the level of drop in the C queue rises to maintain the | saturation' as the level of drop in the C queue rises to maintain the | |||
skipping to change at page 27, line 13 ¶ | skipping to change at page 27, line 13 ¶ | |||
implementations were tested. | implementations were tested. | |||
7. References | 7. References | |||
7.1. Normative References | 7.1. Normative References | |||
[I-D.ietf-tsvwg-ecn-l4s-id] | [I-D.ietf-tsvwg-ecn-l4s-id] | |||
Schepper, K. D. and B. Briscoe, "Explicit Congestion | Schepper, K. D. and B. Briscoe, "Explicit Congestion | |||
Notification (ECN) Protocol for Very Low Queuing Delay | Notification (ECN) Protocol for Very Low Queuing Delay | |||
(L4S)", Work in Progress, Internet-Draft, draft-ietf- | (L4S)", Work in Progress, Internet-Draft, draft-ietf- | |||
tsvwg-ecn-l4s-id-22, 8 November 2021, | tsvwg-ecn-l4s-id-23, 24 December 2021, | |||
<https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg- | <https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg- | |||
ecn-l4s-id-22>. | ecn-l4s-id-23>. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
RFC 3168, DOI 10.17487/RFC3168, September 2001, | RFC 3168, DOI 10.17487/RFC3168, September 2001, | |||
<https://www.rfc-editor.org/info/rfc3168>. | <https://www.rfc-editor.org/info/rfc3168>. | |||
skipping to change at page 28, line 5 ¶ | skipping to change at page 28, line 5 ¶ | |||
Queue- based Active Queue Management Algorithms", Proc. | Queue- based Active Queue Management Algorithms", Proc. | |||
Int'l Soc. for Optical Engineering (SPIE) 4866:35--46 DOI: | Int'l Soc. for Optical Engineering (SPIE) 4866:35--46 DOI: | |||
10.1117/12.473021, 2002, | 10.1117/12.473021, 2002, | |||
<https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>. | <https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>. | |||
[ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An | [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An | |||
Algorithm for Increasing the Robustness of RED's Active | Algorithm for Increasing the Robustness of RED's Active | |||
Queue Management", ACIRI Technical Report , August 2001, | Queue Management", ACIRI Technical Report , August 2001, | |||
<http://www.icir.org/floyd/red.html>. | <http://www.icir.org/floyd/red.html>. | |||
[BBRv1] Cardwell, N., Cheng, Y., Hassas Yeganeh, S., and V. | ||||
Jacobson, "BBR Congestion Control", Internet Draft draft- | ||||
cardwell-iccrg-bbr-congestion-control-00, July 2017, | ||||
<https://tools.ietf.org/html/draft-cardwell-iccrg-bbr- | ||||
congestion-control-00>. | ||||
[BBRv2] Cardwell, N., "BRTCP BBR v2 Alpha/Preview Release", github | ||||
repository; Linux congestion control module, | ||||
<https://github.com/google/bbr/blob/v2alpha/README.md>. | ||||
[CCcensus19] | [CCcensus19] | |||
Mishra, A., Sun, X., Jain, A., Pande, S., Joshi, R., and | Mishra, A., Sun, X., Jain, A., Pande, S., Joshi, R., and | |||
B. Leong, "The Great Internet TCP Congestion Control | B. Leong, "The Great Internet TCP Congestion Control | |||
Census", Proc. ACM on Measurement and Analysis of | Census", Proc. ACM on Measurement and Analysis of | |||
Computing Systems 3(3), December 2019, | Computing Systems 3(3), December 2019, | |||
<https://doi.org/10.1145/3366693>. | <https://doi.org/10.1145/3366693>. | |||
[CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", | [CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", | |||
ACM Queue 10(5), May 2012, | ACM Queue 10(5), May 2012, | |||
<http://queue.acm.org/issuedetail.cfm?issue=2208917>. | <http://queue.acm.org/issuedetail.cfm?issue=2208917>. | |||
skipping to change at page 29, line 19 ¶ | skipping to change at page 29, line 8 ¶ | |||
<https://www.duo.uio.no/bitstream/handle/10852/57424/ | <https://www.duo.uio.no/bitstream/handle/10852/57424/ | |||
thesis-henrste.pdf?sequence=1>. | thesis-henrste.pdf?sequence=1>. | |||
[Heist21] Heist, P. and J. Morton, "L4S Tests", github README, | [Heist21] Heist, P. and J. Morton, "L4S Tests", github README, | |||
August 2021, <https://github.com/heistp/l4s- | August 2021, <https://github.com/heistp/l4s- | |||
tests/#underutilization-with-bursty-traffic>. | tests/#underutilization-with-bursty-traffic>. | |||
[I-D.briscoe-docsis-q-protection] | [I-D.briscoe-docsis-q-protection] | |||
Briscoe, B. and G. White, "The DOCSIS(r) Queue Protection | Briscoe, B. and G. White, "The DOCSIS(r) Queue Protection | |||
Algorithm to Preserve Low Latency", Work in Progress, | Algorithm to Preserve Low Latency", Work in Progress, | |||
Internet-Draft, draft-briscoe-docsis-q-protection-01, 17 | Internet-Draft, draft-briscoe-docsis-q-protection-02, 31 | |||
December 2021, <https://datatracker.ietf.org/doc/html/ | January 2022, <https://datatracker.ietf.org/doc/html/ | |||
draft-briscoe-docsis-q-protection-01>. | draft-briscoe-docsis-q-protection-02>. | |||
[I-D.briscoe-iccrg-prague-congestion-control] | [I-D.briscoe-iccrg-prague-congestion-control] | |||
Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague | Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague | |||
Congestion Control", Work in Progress, Internet-Draft, | Congestion Control", Work in Progress, Internet-Draft, | |||
draft-briscoe-iccrg-prague-congestion-control-00, 9 March | draft-briscoe-iccrg-prague-congestion-control-00, 9 March | |||
2021, <https://datatracker.ietf.org/doc/html/draft- | 2021, <https://datatracker.ietf.org/doc/html/draft- | |||
briscoe-iccrg-prague-congestion-control-00>. | briscoe-iccrg-prague-congestion-control-00>. | |||
[I-D.briscoe-tsvwg-l4s-diffserv] | [I-D.briscoe-tsvwg-l4s-diffserv] | |||
Briscoe, B., "Interactions between Low Latency, Low Loss, | Briscoe, B., "Interactions between Low Latency, Low Loss, | |||
skipping to change at page 29, line 50 ¶ | skipping to change at page 29, line 39 ¶ | |||
Jacobson, "BBR Congestion Control", Work in Progress, | Jacobson, "BBR Congestion Control", Work in Progress, | |||
Internet-Draft, draft-cardwell-iccrg-bbr-congestion- | Internet-Draft, draft-cardwell-iccrg-bbr-congestion- | |||
control-01, 7 November 2021, | control-01, 7 November 2021, | |||
<https://datatracker.ietf.org/doc/html/draft-cardwell- | <https://datatracker.ietf.org/doc/html/draft-cardwell- | |||
iccrg-bbr-congestion-control-01>. | iccrg-bbr-congestion-control-01>. | |||
[I-D.ietf-tsvwg-l4s-arch] | [I-D.ietf-tsvwg-l4s-arch] | |||
Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White, | Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White, | |||
"Low Latency, Low Loss, Scalable Throughput (L4S) Internet | "Low Latency, Low Loss, Scalable Throughput (L4S) Internet | |||
Service: Architecture", Work in Progress, Internet-Draft, | Service: Architecture", Work in Progress, Internet-Draft, | |||
draft-ietf-tsvwg-l4s-arch-14, 8 November 2021, | draft-ietf-tsvwg-l4s-arch-15, 24 December 2021, | |||
<https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg- | <https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg- | |||
l4s-arch-14>. | l4s-arch-15>. | |||
[L4Sdemo16] | [L4Sdemo16] | |||
Bondarenko, O., De Schepper, K., Tsang, I., and B. | Bondarenko, O., De Schepper, K., Tsang, I., and B. | |||
Briscoe, "Ultra-Low Delay for All: Live Experience, Live | Briscoe, "Ultra-Low Delay for All: Live Experience, Live | |||
Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, | Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, | |||
<http://dl.acm.org/citation.cfm?doid=2910017.2910633 | <http://dl.acm.org/citation.cfm?doid=2910017.2910633 | |||
(videos of demos: | (videos of demos: | |||
https://riteproject.eu/dctth/#1511dispatchwg )>. | https://riteproject.eu/dctth/#1511dispatchwg )>. | |||
[L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Östberg, C., | [L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Östberg, C., | |||
skipping to change at page 33, line 20 ¶ | skipping to change at page 33, line 16 ¶ | |||
As a first concrete example, the pseudocode below gives the DualPI2 | As a first concrete example, the pseudocode below gives the DualPI2 | |||
algorithm. DualPI2 follows the structure of the DualQ Coupled AQM | algorithm. DualPI2 follows the structure of the DualQ Coupled AQM | |||
framework in Figure 1. A simple ramp function (configured in units | framework in Figure 1. A simple ramp function (configured in units | |||
of queuing time) with unsmoothed ECN marking is used for the Native | of queuing time) with unsmoothed ECN marking is used for the Native | |||
L4S AQM. The ramp can also be configured as a step function. The | L4S AQM. The ramp can also be configured as a step function. The | |||
PI2 algorithm [PI2] is used for the Classic AQM. PI2 is an improved | PI2 algorithm [PI2] is used for the Classic AQM. PI2 is an improved | |||
variant of the PIE AQM [RFC8033]. | variant of the PIE AQM [RFC8033]. | |||
The pseudocode will be introduced in two passes. The first pass | The pseudocode will be introduced in two passes. The first pass | |||
explains the core concepts, deferring handling of overload to the | explains the core concepts, deferring handling of edge-cases like | |||
second pass. To aid comparison, line numbers are kept in step | overload to the second pass. To aid comparison, line numbers are | |||
between the two passes by using letter suffixes where the longer code | kept in step between the two passes by using letter suffixes where | |||
needs extra lines. | the longer code needs extra lines. | |||
All variables are assumed to be floating point in their basic units | All variables are assumed to be floating point in their basic units | |||
(size in bytes, time in seconds, rates in bytes/second, alpha and | (size in bytes, time in seconds, rates in bytes/second, alpha and | |||
beta in Hz, and probabilities from 0 to 1. Constants expressed in k | beta in Hz, and probabilities from 0 to 1. Constants expressed in k | |||
(kilo), M (mega), G (giga), u (micro), m (milli) , %, ... are assumed | (kilo), M (mega), G (giga), u (micro), m (milli) , %, ... are assumed | |||
to be converted to their appropriate multiple or fraction to | to be converted to their appropriate multiple or fraction to | |||
represent the basic units. A real implementation that wants to use | represent the basic units. A real implementation that wants to use | |||
integer values needs to handle appropriate scaling factors and allow | integer values needs to handle appropriate scaling factors and allow | |||
accordingly appropriate resolution of its integer types (including | accordingly appropriate resolution of its integer types (including | |||
temporary internal values during calculations). | temporary internal values during calculations). | |||
skipping to change at page 36, line 22 ¶ | skipping to change at page 36, line 22 ¶ | |||
8: else % ECN bits = not-ECT or ECT(0) | 8: else % ECN bits = not-ECT or ECT(0) | |||
9: cq.enqueue(pkt) | 9: cq.enqueue(pkt) | |||
10: } | 10: } | |||
Figure 3: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM | Figure 3: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM | |||
1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | 1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | |||
2: while ( lq.byt() + cq.byt() > 0 ) { | 2: while ( lq.byt() + cq.byt() > 0 ) { | |||
3: if ( scheduler() == lq ) { | 3: if ( scheduler() == lq ) { | |||
4: lq.dequeue(pkt) % Scheduler chooses lq | 4: lq.dequeue(pkt) % Scheduler chooses lq | |||
5: p'_L = laqm(lq.time()) && (lq.len() > Th_len) % Native LAQM | 5: p'_L = laqm(lq.time()) % Native LAQM | |||
6: p_L = max(p'_L, p_CL) % Combining function | 6: p_L = max(p'_L, p_CL) % Combining function | |||
7: if ( recur(lq, p_L) ) % Linear marking | 7: if ( recur(lq, p_L) ) % Linear marking | |||
8: mark(pkt) | 8: mark(pkt) | |||
9: } else { | 9: } else { | |||
10: cq.dequeue(pkt) % Scheduler chooses cq | 10: cq.dequeue(pkt) % Scheduler chooses cq | |||
11: if ( recur(cq, p_C) ) { % probability p_C = p'^2 | 11: if ( recur(cq, p_C) ) { % probability p_C = p'^2 | |||
12: if ( ecn(pkt) == 0 ) { % if ECN field = not-ECT | 12: if ( ecn(pkt) == 0 ) { % if ECN field = not-ECT | |||
13: drop(pkt) % squared drop | 13: drop(pkt) % squared drop | |||
14: continue % continue to the top of the while loop | 14: continue % continue to the top of the while loop | |||
15: } | 15: } | |||
skipping to change at page 38, line 49 ¶ | skipping to change at page 39, line 5 ¶ | |||
Although the DCTCP paper [Alizadeh-stability] recommends an ECN | Although the DCTCP paper [Alizadeh-stability] recommends an ECN | |||
marking threshold of 0.17*RTT_typ, it also shows that the threshold | marking threshold of 0.17*RTT_typ, it also shows that the threshold | |||
can be much shallower with hardly any worse under-utilization of the | can be much shallower with hardly any worse under-utilization of the | |||
link (because the amplitude of DCTCP's sawteeth is so small). Based | link (because the amplitude of DCTCP's sawteeth is so small). Based | |||
on extensive experiments, for the public Internet the default minimum | on extensive experiments, for the public Internet the default minimum | |||
ECN marking threshold (target) in Figure 2 is considered a good | ECN marking threshold (target) in Figure 2 is considered a good | |||
compromise, even though it is significantly smaller fraction of | compromise, even though it is significantly smaller fraction of | |||
RTT_typ. | RTT_typ. | |||
A minimum marking threshold parameter (Th_len, default 1 packet) is | ||||
also necessary to ensure that the ramp does not trigger excessive | ||||
marking on slow links. Where an implementation knows the link rate, | ||||
it can set up this minimum at the time it is configured. For | ||||
instance, it would divide 1 MTU by the link rate to convert it into a | ||||
serialization time, then if the lower threshold of the Native L AQM | ||||
ramp was lower than this serialization time, it could increase the | ||||
thresholds to shift the bottom of the ramp to 2 MTU. This is the | ||||
approach used in DOCSIS [DOCSIS3.1], because the configured link rate | ||||
is dedicated to the DualQ. | ||||
In software implementations, as shown in the pseudocode, the link | ||||
rate might be shared with other queues. The second part of the | ||||
logical AND condition in Line 5 of Figure 4 caters for such cases. | ||||
Even if the outcome of the Native L4S AQM function, laqm(), is true, | ||||
it does not mark a packet unless the queue also exceeds 1 packet (but | ||||
see note later about the Linux implementation). | ||||
1: laqm(qdelay) { % Returns native L4S AQM probability | 1: laqm(qdelay) { % Returns native L4S AQM probability | |||
2: if (qdelay >= maxTh) | 2: if (qdelay >= maxTh) | |||
3: return 1 | 3: return 1 | |||
4: else if (qdelay > minTh) | 4: else if (qdelay > minTh) | |||
5: return (qdelay - minTh)/range % Divide could use a bit-shift | 5: return (qdelay - minTh)/range % Divide could use a bit-shift | |||
6: else | 6: else | |||
7: return 0 | 7: return 0 | |||
8: } | 8: } | |||
Figure 5: Example Pseudocode for the Native L4S AQM | Figure 5: Example Pseudocode for the Native L4S AQM | |||
skipping to change at page 43, line 16 ¶ | skipping to change at page 43, line 5 ¶ | |||
Tupdate dependent on p'. Instead, in PI2, alpha and beta are | Tupdate dependent on p'. Instead, in PI2, alpha and beta are | |||
independent of p' because the squaring applied to Classic traffic | independent of p' because the squaring applied to Classic traffic | |||
tunes them inherently. This is explained in [PI2], which also | tunes them inherently. This is explained in [PI2], which also | |||
explains why this more principled approach removes the need for most | explains why this more principled approach removes the need for most | |||
of the heuristics that had to be added to PIE. | of the heuristics that had to be added to PIE. | |||
Nonetheless, an implementer might wish to add selected details to | Nonetheless, an implementer might wish to add selected details to | |||
either AQM. For instance the Linux reference DualPI2 implementation | either AQM. For instance the Linux reference DualPI2 implementation | |||
includes the following (not shown in the pseudocode above): | includes the following (not shown in the pseudocode above): | |||
* The check that the queue exceeds Th_len before marking with the | ||||
native L4S AQM is actually at enqueue, not dequeue, otherwise it | ||||
would exempt the last packet of a burst from being marked. The | ||||
result of the check is conveyed from enqueue to the dequeue | ||||
function via a boolean in the packet metadata. | ||||
* Classic and coupled marking or dropping (i.e. based on p_C and | * Classic and coupled marking or dropping (i.e. based on p_C and | |||
p_CL from the PI controller) is not applied to a packet if the | p_CL from the PI controller) is not applied to a packet if the | |||
respective queue length in bytes is < 2 MTU (prior to enqueuing | respective queue length in bytes is < 2 MTU (prior to enqueuing | |||
the packet or dequeuing it, depending on whether the AQM is | the packet or dequeuing it, depending on whether the AQM is | |||
configured to be applied at enqueue or dequeue); | configured to be applied at enqueue or dequeue); | |||
* In the WRR scheduler, the 'credit' indicating which queue should | * In the WRR scheduler, the 'credit' indicating which queue should | |||
transmit is only changed if there are packets in both queues | transmit is only changed if there are packets in both queues | |||
(i.e. if there is actual resource contention). This means that a | (i.e. if there is actual resource contention). This means that a | |||
properly paced L flow might never be delayed by the WRR. The WRR | properly paced L flow might never be delayed by the WRR. The WRR | |||
skipping to change at page 44, line 30 ¶ | skipping to change at page 44, line 11 ¶ | |||
that a ramp is configured in place of a step, which will allow | that a ramp is configured in place of a step, which will allow | |||
congestion control algorithms to investigate faster smoothing | congestion control algorithms to investigate faster smoothing | |||
algorithms. | algorithms. | |||
A ramp is more general that a step, because an operator can | A ramp is more general that a step, because an operator can | |||
effectively turn the ramp into a step function, as used by DCTCP, | effectively turn the ramp into a step function, as used by DCTCP, | |||
by setting the range to zero. There will not be a divide by zero | by setting the range to zero. There will not be a divide by zero | |||
problem at line 5 of Figure 5 because, if minTh is equal to | problem at line 5 of Figure 5 because, if minTh is equal to | |||
maxTh, the condition for this ramp calculation cannot arise. | maxTh, the condition for this ramp calculation cannot arise. | |||
A.2. Pass #2: Overload Details | A.2. Pass #2: Edge-Case Details | |||
Figure 7 repeats the dequeue function of Figure 4, but with overload | This section takes a second pass through the pseudocode adding | |||
details added. Similarly Figure 8 repeats the core PI algorithm of | details of two edge-cases: low link rate and overload. Figure 7 | |||
Figure 6 with overload details added. The initialization, enqueue, | repeats the dequeue function of Figure 4, but with details of both | |||
L4S AQM and recur functions are unchanged. | edge-cases added. Similarly Figure 8 repeats the core PI algorithm | |||
of Figure 6, but with overload details added. The initialization, | ||||
enqueue, L4S AQM and recur functions are unchanged. | ||||
The link rate can be so low that it takes a single packet queue | ||||
longer to serialize than the threshold delay at which ECN marking | ||||
starts to be applied in the L queue. Therefore, a minimum marking | ||||
threshold parameter in units of packets rather than time is necessary | ||||
(Th_len, default 1 packet in line 19 of Figure 2) to ensure that the | ||||
ramp does not trigger excessive marking on slow links. Where an | ||||
implementation knows the link rate, it can set up this minimum at the | ||||
time it is configured. For instance, it would divide 1 MTU by the | ||||
link rate to convert it into a serialization time, then if the lower | ||||
threshold of the Native L AQM ramp was lower than this serialization | ||||
time, it could increase the thresholds to shift the bottom of the | ||||
ramp to 2 MTU. This is the approach used in DOCSIS [DOCSIS3.1], | ||||
because the configured link rate is dedicated to the DualQ. | ||||
The pseudocode given here applies where the link rate is unknown, | ||||
which is more common for software implementations that might be | ||||
deployed in scenarios where the link is shared with other queues. In | ||||
lines 5a to 5d in Figure 7 the native L4S marking probability, p'_L, | ||||
is zeroed if the queue is only 1 packet (in the default | ||||
configuration). | ||||
Linux implementation note: | ||||
* In Linux, the check that the queue exceeds Th_len before marking | ||||
with the native L4S AQM is actually at enqueue, not dequeue, | ||||
otherwise it would exempt the last packet of a burst from being | ||||
marked. The result of the check is conveyed from enqueue to the | ||||
dequeue function via a boolean in the packet metadata. | ||||
Persistent overload is deemed to have occurred when Classic drop/ | ||||
marking probability reaches p_Cmax. Above this point, the Classic | ||||
drop probability is applied to both L and C queues, irrespective of | ||||
whether any packet is ECN-capable. ECT packets that are not dropped | ||||
can still be ECN-marked. | ||||
In line 10 of the initialization function (Figure 2), the maximum | In line 10 of the initialization function (Figure 2), the maximum | |||
Classic drop probability p_Cmax = min(1/k^2, 1) or 1/4 for the | Classic drop probability p_Cmax = min(1/k^2, 1) or 1/4 for the | |||
default coupling factor k=2. p_Cmax is the point at which it is | default coupling factor k=2. In practice, 25% has been found to be a | |||
deemed that the Classic queue has become persistently overloaded, so | good threshold to preserve fairness between ECN capable and non ECN | |||
it switches to using drop, even for ECN-capable packets. ECT packets | capable traffic. This protects the queues against both temporary | |||
that are not dropped can still be ECN-marked. | overload from responsive flows and more persistent overload from any | |||
unresponsive traffic that falsely claims to be responsive to ECN. | ||||
In practice, 25% has been found to be a good threshold to preserve | ||||
fairness between ECN capable and non ECN capable traffic. This | ||||
protects the queues against both temporary overload from responsive | ||||
flows and more persistent overload from any unresponsive traffic that | ||||
falsely claims to be responsive to ECN. | ||||
When the Classic ECN marking probability reaches the p_Cmax threshold | When the Classic ECN marking probability reaches the p_Cmax threshold | |||
(1/k^2), the marking probability coupled to the L4S queue, p_CL will | (1/k^2), the marking probability coupled to the L4S queue, p_CL will | |||
always be 100% for any k (by equation (1) in Section 2). So, for | always be 100% for any k (by equation (1) in Section 2). So, for | |||
readability, the constant p_Lmax is defined as 1 in line 22 of the | readability, the constant p_Lmax is defined as 1 in line 22 of the | |||
initialization function (Figure 2). This is intended to ensure that | initialization function (Figure 2). This is intended to ensure that | |||
the L4S queue starts to introduce dropping once ECN-marking saturates | the L4S queue starts to introduce dropping once ECN-marking saturates | |||
at 100% and can rise no further. The 'Prague L4S' | at 100% and can rise no further. The 'Prague L4S' | |||
requirements [I-D.ietf-tsvwg-ecn-l4s-id] state that, when an L4S | requirements [I-D.ietf-tsvwg-ecn-l4s-id] state that, when an L4S | |||
congestion control detects a drop, it falls back to a response that | congestion control detects a drop, it falls back to a response that | |||
coexists with 'Classic' Reno congestion control. So it is correct | coexists with 'Classic' Reno congestion control. So it is correct | |||
that, when the L4S queue drops packets, it drops them proportional to | that, when the L4S queue drops packets, it drops them proportional to | |||
p'^2, as if they are Classic packets. | p'^2, as if they are Classic packets. | |||
Both these switch-overs are triggered by the tests for overload | The two queues each test for overload in lines 4b and 12b of the | |||
introduced in lines 4b and 12b of the dequeue function (Figure 7). | dequeue function (Figure 7). Lines 8c to 8g drop L4S packets with | |||
Lines 8c to 8g drop L4S packets with probability p'^2. Lines 8h to | probability p'^2. Lines 8h to 8i mark the remaining packets with | |||
8i mark the remaining packets with probability p_CL. Given p_Lmax = | probability p_CL. Given p_Lmax = 1, all remaining packets will be | |||
1, all remaining packets will be marked because, to have reached the | marked because, to have reached the else block at line 8b, p_CL >= 1. | |||
else block at line 8b, p_CL >= 1. | ||||
Lines 2c to 2d in the core PI algorithm (Figure 8) deal with overload | Lines 2c to 2d in the core PI algorithm (Figure 8) deal with overload | |||
of the L4S queue when there is no Classic traffic. This is | of the L4S queue when there is no Classic traffic. This is | |||
necessary, because the core PI algorithm maintains the appropriate | necessary, because the core PI algorithm maintains the appropriate | |||
drop probability to regulate overload, but it depends on the length | drop probability to regulate overload, but it depends on the length | |||
of the Classic queue. If there is no Classic queue the naive PI | of the Classic queue. If there is no Classic queue the naive PI | |||
update function in Figure 6 would drop nothing, even if the L4S queue | update function in Figure 6 would drop nothing, even if the L4S queue | |||
were overloaded - so tail drop would have to take over (lines 2 and 3 | were overloaded - so tail drop would have to take over (lines 2 and 3 | |||
of Figure 3). | of Figure 3). | |||
skipping to change at page 45, line 41 ¶ | skipping to change at page 46, line 9 ¶ | |||
Figure 8 keeps delay on target using drop. If the test at line 2a of | Figure 8 keeps delay on target using drop. If the test at line 2a of | |||
Figure 8 finds that the Classic queue is empty, line 2d measures the | Figure 8 finds that the Classic queue is empty, line 2d measures the | |||
current queue delay using the L4S queue instead. While the L4S queue | current queue delay using the L4S queue instead. While the L4S queue | |||
is not overloaded, its delay will always be tiny compared to the | is not overloaded, its delay will always be tiny compared to the | |||
target Classic queue delay. So p_CL will be driven to zero, and the | target Classic queue delay. So p_CL will be driven to zero, and the | |||
L4S queue will naturally be governed solely by p'_L from the native | L4S queue will naturally be governed solely by p'_L from the native | |||
L4S AQM (lines 5 and 6 of the dequeue algorithm in Figure 7). But, | L4S AQM (lines 5 and 6 of the dequeue algorithm in Figure 7). But, | |||
if unresponsive L4S source(s) cause overload, the DualQ transitions | if unresponsive L4S source(s) cause overload, the DualQ transitions | |||
smoothly to L4S marking based on the PI algorithm. If overload | smoothly to L4S marking based on the PI algorithm. If overload | |||
increases further, it naturally transitions from marking to dropping | increases further, it naturally transitions from marking to dropping | |||
by the switch-over mechanism already described. | by the mechanism already described. | |||
1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | 1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | |||
2: while ( lq.byt() + cq.byt() > 0 ) { | 2: while ( lq.byt() + cq.byt() > 0 ) { | |||
3: if ( scheduler() == lq ) { | 3: if ( scheduler() == lq ) { | |||
4a: lq.dequeue(pkt) % L4S scheduled | 4a: lq.dequeue(pkt) % L4S scheduled | |||
4b: if ( p_CL < p_Lmax ) { % Check for overload saturation | 4b: if ( p_CL < p_Lmax ) { % Check for overload saturation | |||
5: p'_L = laqm(lq.time()) && (lq.len()>Th_len) % Native LAQM | 5a: if (lq.len()>Th_len) % >1 packet queued | |||
5b: p'_L = laqm(lq.time()) % Native LAQM | ||||
5c: else | ||||
5d: p'_L = 0 % Suppress marking 1 pkt queue | ||||
6: p_L = max(p'_L, p_CL) % Combining function | 6: p_L = max(p'_L, p_CL) % Combining function | |||
7: if ( recur(lq, p_L) %Linear marking | 7: if ( recur(lq, p_L) %Linear marking | |||
8a: mark(pkt) | 8a: mark(pkt) | |||
8b: } else { % overload saturation | 8b: } else { % overload saturation | |||
8c: if ( recur(lq, p_C) ) { % probability p_C = p'^2 | 8c: if ( recur(lq, p_C) ) { % probability p_C = p'^2 | |||
8e: drop(pkt) % revert to Classic drop due to overload | 8e: drop(pkt) % revert to Classic drop due to overload | |||
8f: continue % continue to the top of the while loop | 8f: continue % continue to the top of the while loop | |||
8g: } | 8g: } | |||
8h: if ( recur(lq, p_CL) ) % probability p_CL = k * p' | 8h: if ( recur(lq, p_CL) ) % probability p_CL = k * p' | |||
8i: mark(pkt) % linear marking of remaining packets | 8i: mark(pkt) % linear marking of remaining packets | |||
skipping to change at page 46, line 39 ¶ | skipping to change at page 46, line 48 ¶ | |||
15: } | 15: } | |||
16: mark(pkt) % squared mark | 16: mark(pkt) % squared mark | |||
17: } | 17: } | |||
18: } | 18: } | |||
19: return(pkt) % return the packet and stop | 19: return(pkt) % return the packet and stop | |||
20: } | 20: } | |||
21: return(NULL) % no packet to dequeue | 21: return(NULL) % no packet to dequeue | |||
22: } | 22: } | |||
Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | |||
(Including Overload Code) | (Including Code for Edge-Cases) | |||
1: dualpi2_update(lq, cq) { % Update p' every Tupdate | 1: dualpi2_update(lq, cq) { % Update p' every Tupdate | |||
2a: if ( cq.byt() > 0 ) | 2a: if ( cq.byt() > 0 ) | |||
2b: curq = cq.time() %use queuing time of first-in Classic packet | 2b: curq = cq.time() %use queuing time of first-in Classic packet | |||
2c: else % Classic queue empty | 2c: else % Classic queue empty | |||
2d: curq = lq.time() % use queuing time of first-in L4S packet | 2d: curq = lq.time() % use queuing time of first-in L4S packet | |||
3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | 3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | |||
4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor | 4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor | |||
5: p_C = p'^2 % Classic prob = (base prob)^2 | 5: p_C = p'^2 % Classic prob = (base prob)^2 | |||
6: prevq = curq | 6: prevq = curq | |||
skipping to change at page 58, line 38 ¶ | skipping to change at page 58, line 38 ¶ | |||
k* = 1.64 * (R_C / R_L) (7) | k* = 1.64 * (R_C / R_L) (7) | |||
We say that this coupling factor is theoretical, because it is in | We say that this coupling factor is theoretical, because it is in | |||
terms of two RTTs, which raises two practical questions: i) for | terms of two RTTs, which raises two practical questions: i) for | |||
multiple flows with different RTTs, the RTT for each traffic class | multiple flows with different RTTs, the RTT for each traffic class | |||
would have to be derived from the RTTs of all the flows in that class | would have to be derived from the RTTs of all the flows in that class | |||
(actually the harmonic mean would be needed); ii) a network node | (actually the harmonic mean would be needed); ii) a network node | |||
cannot easily know the RTT of any of the flows anyway. | cannot easily know the RTT of any of the flows anyway. | |||
RTT-dependence is cuased by window-based congestion control, so it | RTT-dependence is caused by window-based congestion control, so it | |||
ought to be reversed there, not in the network, Therefore, we use a | ought to be reversed there, not in the network. Therefore, we use a | |||
fixed coupling factor in the network, and reduce RTT-dependence in | fixed coupling factor in the network, and reduce RTT-dependence in | |||
L4S senders. We cannot expect Classic senders to all be updated to | L4S senders. We cannot expect Classic senders to all be updated to | |||
reduce their RTT-dependence. But solely addressing the problem in | reduce their RTT-dependence. But solely addressing the problem in | |||
L4S senders at least makes RTT-dependence no worse - not just between | L4S senders at least makes RTT-dependence no worse - not just between | |||
L4S senders, but also between L4S and Classic senders. | L4S senders, but also between L4S and Classic senders. | |||
Traditionally, throughput equivalence has been defined for flows | Traditionally, throughput equivalence has been defined for flows | |||
under comparable conditions, including with the same base RTT | under comparable conditions, including with the same base RTT | |||
[RFC2914]. So if we assume the same base RTT, R_b, for comparable | [RFC2914]. So if we assume the same base RTT, R_b, for comparable | |||
flows, we can put both R_C and R_L in terms of R_b. | flows, we can put both R_C and R_L in terms of R_b. | |||
End of changes. 29 change blocks. | ||||
98 lines changed or deleted | 98 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |