draft-ietf-tsvwg-byte-pkt-congest-12.txt | rfc7141.txt | |||
---|---|---|---|---|
Transport Area Working Group B. Briscoe | Internet Engineering Task Force (IETF) B. Briscoe | |||
Internet-Draft BT | Request for Comments: 7141 BT | |||
Updates: 2309 (if approved) J. Manner | BCP: 41 J. Manner | |||
Intended status: BCP Aalto University | Updates: 2309, 2914 Aalto University | |||
Expires: May 11, 2014 November 07, 2013 | Category: Best Current Practice February 2014 | |||
ISSN: 2070-1721 | ||||
Byte and Packet Congestion Notification | Byte and Packet Congestion Notification | |||
draft-ietf-tsvwg-byte-pkt-congest-12 | ||||
Abstract | Abstract | |||
This document provides recommendations of best current practice for | This document provides recommendations of best current practice for | |||
dropping or marking packets using any active queue management (AQM) | dropping or marking packets using any active queue management (AQM) | |||
algorithm, including random early detection (RED), BLUE, pre- | algorithm, including Random Early Detection (RED), BLUE, Pre- | |||
congestion notification (PCN) and newer schemes such as CoDel | Congestion Notification (PCN), and newer schemes such as CoDel | |||
(Controlled Delay) and PIE (Proportional Integral controller | (Controlled Delay) and PIE (Proportional Integral controller | |||
Enhanced). We give three strong recommendations: (1) packet size | Enhanced). We give three strong recommendations: (1) packet size | |||
should be taken into account when transports detect and respond to | should be taken into account when transports detect and respond to | |||
congestion indications, (2) packet size should not be taken into | congestion indications, (2) packet size should not be taken into | |||
account when network equipment creates congestion signals (marking, | account when network equipment creates congestion signals (marking, | |||
dropping), and therefore (3) in the specific case of RED, the byte- | dropping), and therefore (3) in the specific case of RED, the byte- | |||
mode packet drop variant that drops fewer small packets should not be | mode packet drop variant that drops fewer small packets should not be | |||
used. This memo updates RFC 2309 to deprecate deliberate | used. This memo updates RFC 2309 to deprecate deliberate | |||
preferential treatment of small packets in AQM algorithms. | preferential treatment of small packets in AQM algorithms. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This memo documents an Internet Best Current Practice. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at http://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
BCPs is available in Section 2 of RFC 5741. | ||||
This Internet-Draft will expire on May 11, 2014. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
http://www.rfc-editor.org/info/rfc7141. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2013 IETF Trust and the persons identified as the | Copyright (c) 2014 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction | 1. Introduction | |||
1.1. Terminology and Scoping | 1.1. Terminology and Scoping | |||
1.2. Example Comparing Packet-Mode Drop and Byte-Mode Drop | 1.2. Example Comparing Packet-Mode Drop and Byte-Mode Drop | |||
2. Recommendations | 2. Recommendations | |||
2.1. Recommendation on Queue Measurement | 2.1. Recommendation on Queue Measurement | |||
2.2. Recommendation on Encoding Congestion Notification | 2.2. Recommendation on Encoding Congestion Notification | |||
2.3. Recommendation on Responding to Congestion | 2.3. Recommendation on Responding to Congestion | |||
2.4. Recommendation on Handling Congestion Indications when | 2.4. Recommendation on Handling Congestion Indications When | |||
Splitting or Merging Packets | Splitting or Merging Packets | |||
3. Motivating Arguments | 3. Motivating Arguments | |||
3.1. Avoiding Perverse Incentives to (Ab)use Smaller Packets | 3.1. Avoiding Perverse Incentives to (Ab)use Smaller Packets | |||
3.2. Small != Control | 3.2. Small != Control | |||
3.3. Transport-Independent Network | 3.3. Transport-Independent Network | |||
3.4. Partial Deployment of AQM | 3.4. Partial Deployment of AQM | |||
3.5. Implementation Efficiency | 3.5. Implementation Efficiency | |||
4. A Survey and Critique of Past Advice | 4. A Survey and Critique of Past Advice | |||
4.1. Congestion Measurement Advice | 4.1. Congestion Measurement Advice | |||
4.1.1. Fixed Size Packet Buffers | 4.1.1. Fixed-Size Packet Buffers | |||
4.1.2. Congestion Measurement without a Queue | 4.1.2. Congestion Measurement without a Queue | |||
4.2. Congestion Notification Advice | 4.2. Congestion Notification Advice | |||
4.2.1. Network Bias when Encoding | 4.2.1. Network Bias When Encoding | |||
4.2.2. Transport Bias when Decoding | 4.2.2. Transport Bias When Decoding | |||
4.2.3. Making Transports Robust against Control Packet | 4.2.3. Making Transports Robust against Control Packet | |||
Losses | Losses | |||
4.2.4. Congestion Notification: Summary of Conflicting | 4.2.4. Congestion Notification: Summary of Conflicting | |||
Advice | Advice | |||
5. Outstanding Issues and Next Steps | 5. Outstanding Issues and Next Steps | |||
5.1. Bit-congestible Network | 5.1. Bit-congestible Network | |||
5.2. Bit- & Packet-congestible Network | 5.2. Bit- and Packet-Congestible Network | |||
6. Security Considerations | 6. Security Considerations | |||
7. IANA Considerations | 7. Conclusions | |||
8. Conclusions | 8. Acknowledgements | |||
9. Acknowledgements | 9. References | |||
10. Comments Solicited | 9.1. Normative References | |||
11. References | 9.2. Informative References | |||
11.1. Normative References | Appendix A. Survey of RED Implementation Status | |||
11.2. Informative References | Appendix B. Sufficiency of Packet-Mode Drop | |||
Appendix A. Survey of RED Implementation Status | B.1. Packet-Size (In)Dependence in Transports | |||
Appendix B. Sufficiency of Packet-Mode Drop | B.2. Bit-Congestible and Packet-Congestible Indications | |||
B.1. Packet-Size (In)Dependence in Transports | Appendix C. Byte-Mode Drop Complicates Policing Congestion | |||
B.2. Bit-Congestible and Packet-Congestible Indications | Response | |||
Appendix C. Byte-mode Drop Complicates Policing Congestion | ||||
Response | ||||
Appendix D. Changes from Previous Versions | ||||
1. Introduction | 1. Introduction | |||
This document provides recommendations of best current practice for | This document provides recommendations of best current practice for | |||
how we should correctly scale congestion control functions with | how we should correctly scale congestion control functions with | |||
respect to packet size for the long term. It also recognises that | respect to packet size for the long term. It also recognises that | |||
expediency may be necessary to deal with existing widely deployed | expediency may be necessary to deal with existing widely deployed | |||
protocols that don't live up to the long term goal. | protocols that don't live up to the long-term goal. | |||
When signalling congestion, the problem of how (and whether) to take | When signalling congestion, the problem of how (and whether) to take | |||
packet sizes into account has exercised the minds of researchers and | packet sizes into account has exercised the minds of researchers and | |||
practitioners for as long as active queue management (AQM) has been | practitioners for as long as active queue management (AQM) has been | |||
discussed. Indeed, one reason AQM was originally introduced was to | discussed. Indeed, one reason AQM was originally introduced was to | |||
reduce the lock-out effects that small packets can have on large | reduce the lock-out effects that small packets can have on large | |||
packets in drop-tail queues. This memo aims to state the principles | packets in tail-drop queues. This memo aims to state the principles | |||
we should be using and to outline how these principles will affect | we should be using and to outline how these principles will affect | |||
future protocol design, taking into account the existing deployments | future protocol design, taking into account pre-existing deployments. | |||
we have already. | ||||
The question of whether to take into account packet size arises at | The question of whether to take into account packet size arises at | |||
three stages in the congestion notification process: | three stages in the congestion notification process: | |||
Measuring congestion: When a congested resource measures locally how | Measuring congestion: When a congested resource measures locally how | |||
congested it is, should it measure its queue length in time, bytes | congested it is, should it measure its queue length in time, | |||
or packets? | bytes, or packets? | |||
Encoding congestion notification into the wire protocol: When a | Encoding congestion notification into the wire protocol: When a | |||
congested network resource signals its level of congestion, should | congested network resource signals its level of congestion, should | |||
it drop / mark each packet dependent on the size of the particular | the probability that it drops/marks each packet depend on the size | |||
packet in question? | of the particular packet in question? | |||
Decoding congestion notification from the wire protocol: When a | Decoding congestion notification from the wire protocol: When a | |||
transport interprets the notification in order to decide how much | transport interprets the notification in order to decide how much | |||
to respond to congestion, should it take into account the size of | to respond to congestion, should it take into account the size of | |||
each missing or marked packet? | each missing or marked packet? | |||
Consensus has emerged over the years concerning the first stage, | Consensus has emerged over the years concerning the first stage, | |||
which Section 2.1 records in the RFC Series. In summary: If possible | which Section 2.1 records in the RFC Series. In summary: If | |||
it is best to measure congestion by time in the queue, but otherwise | possible, it is best to measure congestion by time in the queue; | |||
the choice between bytes and packets solely depends on whether the | otherwise, the choice between bytes and packets solely depends on | |||
resource is congested by bytes or packets. | whether the resource is congested by bytes or packets. | |||
The controversy is mainly around the last two stages: whether to | The controversy is mainly around the last two stages: whether to | |||
allow for the size of the specific packet notifying congestion i) | allow for the size of the specific packet notifying congestion i) | |||
when the network encodes or ii) when the transport decodes the | when the network encodes or ii) when the transport decodes the | |||
congestion notification. | congestion notification. | |||
Currently, the RFC series is silent on this matter other than a paper | Currently, the RFC series is silent on this matter other than a paper | |||
trail of advice referenced from [RFC2309], which conditionally | trail of advice referenced from [RFC2309], which conditionally | |||
recommends byte-mode (packet-size dependent) drop [pktByteEmail]. | recommends byte-mode (packet-size dependent) drop [pktByteEmail]. | |||
Reducing drop of small packets certainly has some tempting | ||||
advantages: i) it drops less control packets, which tend to be small | Reducing the number of small packets dropped certainly has some | |||
and ii) it makes TCP's bit-rate less dependent on packet size. | tempting advantages: i) it drops fewer control packets, which tend to | |||
However, there are ways of addressing these issues at the transport | be small and ii) it makes TCP's bit rate less dependent on packet | |||
layer, rather than reverse engineering network forwarding to fix the | size. However, there are ways of addressing these issues at the | |||
problems. | transport layer, rather than reverse engineering network forwarding | |||
to fix the problems. | ||||
This memo updates [RFC2309] to deprecate deliberate preferential | This memo updates [RFC2309] to deprecate deliberate preferential | |||
treatment of packets in AQM algorithms solely because of their size. | treatment of packets in AQM algorithms solely because of their size. | |||
It recommends that (1) packet size should be taken into account when | It recommends that (1) packet size should be taken into account when | |||
transports detect and respond to congestion indications, (2) not when | transports detect and respond to congestion indications, (2) not when | |||
network equipment creates them. This memo also adds to the | network equipment creates them. This memo also adds to the | |||
congestion control principles enumerated in BCP 41 [RFC2914]. | congestion control principles enumerated in BCP 41 [RFC2914]. | |||
In the particular case of Random early Detection (RED), this means | In the particular case of Random Early Detection (RED), this means | |||
that the byte-mode packet drop variant should not be used to drop | that the byte-mode packet drop variant should not be used to drop | |||
fewer small packets, because that creates a perverse incentive for | fewer small packets, because that creates a perverse incentive for | |||
transports to use tiny segments, consequently also opening up a DoS | transports to use tiny segments, consequently also opening up a DoS | |||
vulnerability. Fortunately all the RED implementers who responded to | vulnerability. Fortunately, all the RED implementers who responded | |||
our admittedly limited survey (Section 4.2.4) have not followed the | to our admittedly limited survey (Section 4.2.4) have not followed | |||
earlier advice to use byte-mode drop, so the position this memo | the earlier advice to use byte-mode drop, so the position this memo | |||
argues for seems to already exist in implementations. | argues for seems to already exist in implementations. | |||
However, at the transport layer, TCP congestion control is a widely | However, at the transport layer, TCP congestion control is a widely | |||
deployed protocol that doesn't scale with packet size (i.e. its | deployed protocol that doesn't scale with packet size (i.e., its | |||
reduction in rate does not take into account the size of a lost | reduction in rate does not take into account the size of a lost | |||
packet). To date this hasn't been a significant problem because most | packet). To date, this hasn't been a significant problem because | |||
TCP implementations have been used with similar packet sizes. But, | most TCP implementations have been used with similar packet sizes. | |||
as we design new congestion control mechanisms, this memo recommends | But, as we design new congestion control mechanisms, this memo | |||
that we should build in scaling with packet size rather than assuming | recommends that we build in scaling with packet size rather than | |||
we should follow TCP's example. | assuming that we should follow TCP's example. | |||
This memo continues as follows. First it discusses terminology and | This memo continues as follows. First, it discusses terminology and | |||
scoping. Section 2 gives the concrete formal recommendations, | scoping. Section 2 gives concrete formal recommendations, followed | |||
followed by motivating arguments in Section 3. We then critically | by motivating arguments in Section 3. We then critically survey the | |||
survey the advice given previously in the RFC series and the research | advice given previously in the RFC Series and the research literature | |||
literature (Section 4), referring to an assessment of whether or not | (Section 4), referring to an assessment of whether or not this advice | |||
this advice has been followed in production networks (Appendix A). | has been followed in production networks (Appendix A). To wrap up, | |||
To wrap up, outstanding issues are discussed that will need | outstanding issues are discussed that will need resolution both to | |||
resolution both to inform future protocol designs and to handle | inform future protocol designs and to handle legacy AQM deployments | |||
legacy (Section 5). Then security issues are collected together in | (Section 5). Then security issues are collected together in | |||
Section 6 before conclusions are drawn in Section 8. The interested | Section 6 before conclusions are drawn in Section 7. The interested | |||
reader can find discussion of more detailed issues on the theme of | reader can find discussion of more detailed issues on the theme of | |||
byte vs. packet in the appendices. | byte vs. packet in the appendices. | |||
This memo intentionally includes a non-negligible amount of material | This memo intentionally includes a non-negligible amount of material | |||
on the subject. For the busy reader Section 2 summarises the | on the subject. For the busy reader, Section 2 summarises the | |||
recommendations for the Internet community. | recommendations for the Internet community. | |||
1.1. Terminology and Scoping | 1.1. Terminology and Scoping | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
This memo applies to the design of all AQM algorithms, for example, | This memo applies to the design of all AQM algorithms, for example, | |||
Random Early Detection (RED) [RFC2309], BLUE [BLUE02], Pre-Congestion | Random Early Detection (RED) [RFC2309], BLUE [BLUE02], Pre-Congestion | |||
Notification (PCN) [RFC5670], Controlled Delay (CoDel) | Notification (PCN) [RFC5670], Controlled Delay (CoDel) [CoDel], and | |||
[I-D.nichols-tsvwg-codel] and the Proportional Integral controller | the Proportional Integral controller Enhanced (PIE) [PIE]. | |||
Enhanced (PIE) [I-D.pan-tsvwg-pie]. Throughout, RED is used as a | Throughout, RED is used as a concrete example because it is a widely | |||
concrete example because it is a widely known and deployed AQM | known and deployed AQM algorithm. There is no intention to imply | |||
algorithm. There is no intention to imply that the advice is any | that the advice is any less applicable to the other algorithms, nor | |||
less applicable to the other algorithms, nor that RED is preferred. | that RED is preferred. | |||
Congestion Notification: Congestion notification is a changing | Congestion Notification: Congestion notification is a changing | |||
signal that aims to communicate the probability that the network | signal that aims to communicate the probability that the network | |||
resource(s) will not be able to forward the level of traffic load | resource(s) will not be able to forward the level of traffic load | |||
offered (or that there is an impending risk that they will not be | offered (or that there is an impending risk that they will not be | |||
able to). | able to). | |||
The `impending risk' qualifier is added, because AQM systems set a | The 'impending risk' qualifier is added, because AQM systems set a | |||
virtual limit smaller than the actual limit to the resource, then | virtual limit smaller than the actual limit to the resource, then | |||
notify when this virtual limit is exceeded in order to avoid | notify the transport when this virtual limit is exceeded in order | |||
uncontrolled congestion of the actual capacity. | to avoid uncontrolled congestion of the actual capacity. | |||
Congestion notification communicates a real number bounded by the | Congestion notification communicates a real number bounded by the | |||
range [ 0 , 1 ]. This ties in with the most well-understood | range [ 0 , 1 ]. This ties in with the most well-understood | |||
measure of congestion notification: drop probability. | measure of congestion notification: drop probability. | |||
Explicit and Implicit Notification: The byte vs. packet dilemma | Explicit and Implicit Notification: The byte vs. packet dilemma | |||
concerns congestion notification irrespective of whether it is | concerns congestion notification irrespective of whether it is | |||
signalled implicitly by drop or using Explicit Congestion | signalled implicitly by drop or explicitly using ECN [RFC3168] or | |||
Notification (ECN [RFC3168] or PCN [RFC5670]). Throughout this | PCN [RFC5670]. Throughout this document, unless clear from the | |||
document, unless clear from the context, the term marking will be | context, the term 'marking' will be used to mean notifying | |||
used to mean notifying congestion explicitly, while congestion | congestion explicitly, while 'congestion notification' will be | |||
notification will be used to mean notifying congestion either | used to mean notifying congestion either implicitly by drop or | |||
implicitly by drop or explicitly by marking. | explicitly by marking. | |||
Bit-congestible vs. Packet-congestible: If the load on a resource | Bit-congestible vs. Packet-congestible: If the load on a resource | |||
depends on the rate at which packets arrive, it is called packet- | depends on the rate at which packets arrive, it is called 'packet- | |||
congestible. If the load depends on the rate at which bits arrive | congestible'. If the load depends on the rate at which bits | |||
it is called bit-congestible. | arrive, it is called 'bit-congestible'. | |||
Examples of packet-congestible resources are route look-up engines | Examples of packet-congestible resources are route look-up engines | |||
and firewalls, because load depends on how many packet headers | and firewalls, because load depends on how many packet headers | |||
they have to process. Examples of bit-congestible resources are | they have to process. Examples of bit-congestible resources are | |||
transmission links, radio power and most buffer memory, because | transmission links, radio power, and most buffer memory, because | |||
the load depends on how many bits they have to transmit or store. | the load depends on how many bits they have to transmit or store. | |||
Some machine architectures use fixed size packet buffers, so | Some machine architectures use fixed-size packet buffers, so | |||
buffer memory in these cases is packet-congestible (see | buffer memory in these cases is packet-congestible (see | |||
Section 4.1.1). | Section 4.1.1). | |||
The path through a machine will typically encounter both packet- | The path through a machine will typically encounter both packet- | |||
congestible and bit-congestible resources. However, currently, a | congestible and bit-congestible resources. However, currently, a | |||
design goal of network processing equipment such as routers and | design goal of network processing equipment such as routers and | |||
firewalls is to size the packet-processing engine(s) relative to | firewalls is to size the packet-processing engine(s) relative to | |||
the lines in order to keep packet processing uncongested even | the lines in order to keep packet processing uncongested, even | |||
under worst case packet rates with runs of minimum size packets. | under worst-case packet rates with runs of minimum-size packets. | |||
Therefore, packet-congestion is currently rare [RFC6077; S.3.3], | Therefore, packet congestion is currently rare (see Section 3.3 of | |||
but there is no guarantee that it will not become more common in | [RFC6077]), but there is no guarantee that it will not become more | |||
future. | common in the future. | |||
Note that information is generally processed or transmitted with a | Note that information is generally processed or transmitted with a | |||
minimum granularity greater than a bit (e.g. octets). The | minimum granularity greater than a bit (e.g., octets). The | |||
appropriate granularity for the resource in question should be | appropriate granularity for the resource in question should be | |||
used, but for the sake of brevity we will talk in terms of bytes | used, but for the sake of brevity we will talk in terms of bytes | |||
in this memo. | in this memo. | |||
Coarser Granularity: Resources may be congestible at higher levels | Coarser Granularity: Resources may be congestible at higher levels | |||
of granularity than bits or packets, for instance stateful | of granularity than bits or packets, for instance stateful | |||
firewalls are flow-congestible and call-servers are session- | firewalls are flow-congestible and call-servers are session- | |||
congestible. This memo focuses on congestion of connectionless | congestible. This memo focuses on congestion of connectionless | |||
resources, but the same principles may be applicable for | resources, but the same principles may be applicable for | |||
congestion notification protocols controlling per-flow and per- | congestion notification protocols controlling per-flow and per- | |||
session processing or state. | session processing or state. | |||
RED Terminology: In RED whether to use packets or bytes when | RED Terminology: In RED, whether to use packets or bytes when | |||
measuring queues is called respectively "packet-mode queue | measuring queues is called, respectively, 'packet-mode queue | |||
measurement" or "byte-mode queue measurement". And whether the | measurement' or 'byte-mode queue measurement'. And whether the | |||
probability of dropping a particular packet is independent or | probability of dropping a particular packet is independent or | |||
dependent on its size is called respectively "packet-mode drop" or | dependent on its size is called, respectively, 'packet-mode drop' | |||
"byte-mode drop". The terms byte-mode and packet-mode should not | or 'byte-mode drop'. The terms 'byte-mode' and 'packet-mode' | |||
be used without specifying whether they apply to queue measurement | should not be used without specifying whether they apply to queue | |||
or to drop. | measurement or to drop. | |||
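As a rough illustration of the two drop modes just named, the Python sketch below returns the per-packet drop probability in each mode. It is not taken from any RED implementation. The byte-mode scaling b = p*s/M is the one used in Table 1 of this document; the function name, the parameter names, and the default MTU of 12,000 bits are assumptions made for the example.

```python
def drop_probability(p, size_bits, mode, mtu_bits=12_000):
    """Per-packet drop probability for a queue whose base probability is p.

    mode="packet": the probability is independent of the packet's size.
    mode="byte":   the probability is scaled by size relative to the MTU,
                   b = p * s / M, so smaller packets are dropped less often.
    All names here are hypothetical and chosen only for this sketch.
    """
    if mode == "packet":
        return p
    if mode == "byte":
        return p * size_bits / mtu_bits
    raise ValueError(f"unknown drop mode: {mode!r}")
```

Note that queue measurement mode is a separate choice and is deliberately absent from this function: it only concerns how the drop decision treats an individual packet.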
1.2. Example Comparing Packet-Mode Drop and Byte-Mode Drop | 1.2. Example Comparing Packet-Mode Drop and Byte-Mode Drop | |||
Taking RED as a well-known example algorithm, a central question | Taking RED as a well-known example algorithm, a central question | |||
addressed by this document is whether to recommend RED's packet-mode | addressed by this document is whether to recommend RED's packet-mode | |||
drop variant and to deprecate byte-mode drop. Table 1 compares how | drop variant and to deprecate byte-mode drop. Table 1 compares how | |||
packet-mode and byte-mode drop affect two flows of different size | packet-mode and byte-mode drop affect two flows of different size | |||
packets. For each it gives the expected number of packets and of | packets. For each it gives the expected number of packets and of | |||
bits dropped in one second. Each example flow runs at the same bit- | bits dropped in one second. Each example flow runs at the same bit | |||
rate of 48Mb/s, but one is broken up into small 60 byte packets and | rate of 48 Mbps, but one is broken up into small 60 byte packets and | |||
the other into large 1500 byte packets. | the other into large 1,500 byte packets. | |||
To keep up the same bit-rate, in one second there are about 25 times | To keep up the same bit rate, in one second there are about 25 times | |||
more small packets because they are 25 times smaller. As can be seen | more small packets because they are 25 times smaller. As can be seen | |||
from the table, the packet rate is 100,000 small packets versus 4,000 | from the table, the packet rate is 100,000 small packets versus 4,000 | |||
large packets per second (pps). | large packets per second (pps). | |||
Parameter             Formula          Small packets  Large packets | Parameter             Formula          Small packets  Large packets | |||
--------------------  --------------   -------------  ------------- | --------------------  ---------------  -------------  ------------- | |||
Packet size           s/8              60B            1,500B | Packet size           s/8              60 B           1,500 B | |||
Packet size           s                480b           12,000b | Packet size           s                480 b          12,000 b | |||
Bit-rate              x                48Mbps         48Mbps | Bit rate              x                48 Mbps        48 Mbps | |||
Packet-rate           u = x/s          100kpps        4kpps | Packet rate           u = x/s          100 kpps       4 kpps | |||
Packet-mode Drop | Packet-mode Drop | |||
Pkt loss probability  p                0.1%           0.1% | Pkt-loss probability  p                0.1%           0.1% | |||
Pkt loss-rate         p*u              100pps         4pps | Pkt-loss rate         p*u              100 pps        4 pps | |||
Bit loss-rate         p*u*s            48kbps         48kbps | Bit-loss rate         p*u*s            48 kbps        48 kbps | |||
Byte-mode Drop        MTU, M=12,000b | Byte-mode Drop        MTU, M=12,000 b | |||
Pkt loss probability  b = p*s/M        0.004%         0.1% | Pkt-loss probability  b = p*s/M        0.004%         0.1% | |||
Pkt loss-rate         b*u              4pps           4pps | Pkt-loss rate         b*u              4 pps          4 pps | |||
Bit loss-rate         b*u*s            1.92kbps       48kbps | Bit-loss rate         b*u*s            1.92 kbps      48 kbps | |||
Table 1: Example Comparing Packet-mode and Byte-mode Drop | Table 1: Example Comparing Packet-Mode and Byte-Mode Drop | |||
For packet-mode drop, we illustrate the effect of a drop probability | For packet-mode drop, we illustrate the effect of a drop probability | |||
of 0.1%, which the algorithm applies to all packets irrespective of | of 0.1%, which the algorithm applies to all packets irrespective of | |||
size. Because there are 25 times more small packets in one second, | size. Because there are 25 times more small packets in one second, | |||
it naturally drops 25 times more small packets, that is 100 small | it naturally drops 25 times more small packets, that is, 100 small | |||
packets but only 4 large packets. But if we count how many bits it | packets but only 4 large packets. But if we count how many bits it | |||
drops, there are 48,000 bits in 100 small packets and 48,000 bits in | drops, there are 48,000 bits in 100 small packets and 48,000 bits in | |||
4 large packets--the same number of bits of small packets as large. | 4 large packets -- the same number of bits of small packets as large. | |||
The packet-mode drop algorithm drops any bit with the same | The packet-mode drop algorithm drops any bit with the same | |||
probability whether the bit is in a small or a large packet. | probability whether the bit is in a small or a large packet. | |||
For byte-mode drop, again we use an example drop probability of 0.1%, | For byte-mode drop, again we use an example drop probability of 0.1%, | |||
but only for maximum size packets (assuming the link maximum | but only for maximum size packets (assuming the link maximum | |||
transmission unit (MTU) is 1,500B or 12,000b). The byte-mode | transmission unit (MTU) is 1,500 B or 12,000 b). The byte-mode | |||
algorithm reduces the drop probability of smaller packets | algorithm reduces the drop probability of smaller packets | |||
proportional to their size, making the probability that it drops a | proportional to their size, making the probability that it drops a | |||
small packet 25 times smaller at 0.004%. But there are 25 times more | small packet 25 times smaller at 0.004%. But there are 25 times more | |||
small packets, so dropping them with 25 times lower probability | small packets, so dropping them with 25 times lower probability | |||
results in dropping the same number of packets: 4 drops in both | results in dropping the same number of packets: 4 drops in both | |||
cases. The 4 small dropped packets contain 25 times less bits than | cases. The 4 small dropped packets contain 25 times less bits than | |||
the 4 large dropped packets: 1,920 compared to 48,000. | the 4 large dropped packets: 1,920 compared to 48,000. | |||
The byte-mode drop algorithm drops any bit with a probability | The byte-mode drop algorithm drops any bit with a probability | |||
proportionate to the size of the packet it is in. | proportionate to the size of the packet it is in. | |||
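The arithmetic behind Table 1 can be checked with a short script. This is only an illustrative sketch: the function name loss_rates and its argument names are not from the document, but the formulas (u = x/s, b = p*s/M) and the input values are those of the table.

```python
def loss_rates(bit_rate, size_bits, p, mtu_bits=None):
    """Return (packet-loss probability, packets lost per second, bits lost per second).

    With mtu_bits=None the drop probability is p for every packet
    (packet-mode drop). With mtu_bits set, the probability is scaled by
    packet size relative to the MTU, b = p*s/M (byte-mode drop).
    """
    u = bit_rate / size_bits                          # packet rate, u = x/s
    prob = p if mtu_bits is None else p * size_bits / mtu_bits
    pkt_loss = prob * u                               # packets dropped per second
    bit_loss = pkt_loss * size_bits                   # bits dropped per second
    return prob, pkt_loss, bit_loss


X = 48_000_000   # bit rate x of each flow, 48 Mbps
P = 0.001        # example drop probability, 0.1%
MTU = 12_000     # M in bits (1,500 bytes)

for name, s in (("small", 480), ("large", 12_000)):
    print(name, "packet-mode:", loss_rates(X, s, P))
    print(name, "byte-mode:  ", loss_rates(X, s, P, MTU))
```

Under packet-mode drop the bit-loss rate comes out the same for both flows, while under byte-mode drop the packet-loss rate is the same and the bit-loss rate of the small-packet flow is 25 times lower.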
2. Recommendations | 2. Recommendations | |||
This section gives recommendations related to network equipment in | This section gives recommendations related to network equipment in | |||
Sections 2.1 and 2.2, and in Sections 2.3 and 2.4 we discuss the | Sections 2.1 and 2.2, and we discuss the implications on transport | |||
implications on the transport protocols. | protocols in Sections 2.3 and 2.4. | |||
2.1. Recommendation on Queue Measurement | 2.1. Recommendation on Queue Measurement | |||
Ideally, an AQM would measure the service time of the queue to | Ideally, an AQM would measure the service time of the queue to | |||
measure congestion of a resource. However service time can only be | measure congestion of a resource. However service time can only be | |||
measured as packets leave the queue, where it is not always expedient | measured as packets leave the queue, where it is not always expedient | |||
to implement a full AQM algorithm. To predict the service time as | to implement a full AQM algorithm. To predict the service time as | |||
packets join the queue, an AQM algorithm needs to measure the length | packets join the queue, an AQM algorithm needs to measure the length | |||
of the queue. | of the queue. | |||
In this case, if the resource is bit-congestible, the AQM | In this case, if the resource is bit-congestible, the AQM | |||
implementation SHOULD measure the length of the queue in bytes and, | implementation SHOULD measure the length of the queue in bytes and, | |||
if the resource is packet-congestible, the implementation SHOULD | if the resource is packet-congestible, the implementation SHOULD | |||
measure the length of the queue in packets. Subject to the | measure the length of the queue in packets. Subject to the | |||
exceptions below, no other choice makes sense, because the number of | exceptions below, no other choice makes sense, because the number of | |||
packets waiting in the queue isn't relevant if the resource gets | packets waiting in the queue isn't relevant if the resource gets | |||
congested by bytes and vice versa. For example, the length of the | congested by bytes and vice versa. For example, the length of the | |||
queue into a transmission line would be measured in bytes, while the | queue into a transmission line would be measured in bytes, while the | |||
length of the queue into a firewall would be measured in packets. | length of the queue into a firewall would be measured in packets. | |||
To avoid the pathological effects of drop tail, the AQM can then | To avoid the pathological effects of tail drop, the AQM can then | |||
transform this service time or queue length into the probability of | transform this service time or queue length into the probability of | |||
dropping or marking a packet (e.g. RED's piecewise linear function | dropping or marking a packet (e.g., RED's piecewise linear function | |||
between thresholds). | between thresholds). | |||
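As a rough sketch of the two steps just described, the following first measures the queue in the unit that matches the resource and then applies a piecewise linear function between two thresholds, in the style of classic RED. The function names and the threshold values in the usage lines are hypothetical, and the averaging of the queue length that RED normally performs is omitted for brevity.

```python
def queue_length(packet_sizes_bytes, bit_congestible):
    """Measure the queue in bytes for a bit-congestible resource,
    or in packets for a packet-congestible resource."""
    if bit_congestible:
        return sum(packet_sizes_bytes)
    return len(packet_sizes_bytes)


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Piecewise linear mapping from queue length to drop/mark probability.

    avg_queue, min_th and max_th must all be in the same unit
    (bytes or packets). The result does not depend on the size of
    the packet being considered.
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical queue into a transmission line (bit-congestible).
queue = [1500, 60, 60, 1500, 1500]
length = queue_length(queue, bit_congestible=True)
print(length, red_drop_probability(length, min_th=3_000, max_th=9_000, max_p=0.1))
```

The unit of measurement is chosen by a flag here only to show both cases side by side; as the advice below notes, an implementation would normally fix it according to the nature of the resource rather than expose it as configuration.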
What this advice means for RED as a specific example: | What this advice means for RED as a specific example: | |||
1. A RED implementation SHOULD use byte mode queue measurement for | 1. A RED implementation SHOULD use byte-mode queue measurement for | |||
measuring the congestion of bit-congestible resources and packet | measuring the congestion of bit-congestible resources and packet- | |||
mode queue measurement for packet-congestible resources. | mode queue measurement for packet-congestible resources. | |||
2. An implementation SHOULD NOT make it possible to configure the | 2. An implementation SHOULD NOT make it possible to configure the | |||
way a queue measures itself, because whether a queue is bit- | way a queue measures itself, because whether a queue is bit- | |||
congestible or packet-congestible is an inherent property of the | congestible or packet-congestible is an inherent property of the | |||
queue. | queue. | |||
Exceptions to these recommendations might be necessary, for instance | Exceptions to these recommendations might be necessary, for instance | |||
where a packet-congestible resource has to be configured as a proxy | where a packet-congestible resource has to be configured as a proxy | |||
bottleneck for a bit-congestible resource in an adjacent box that | bottleneck for a bit-congestible resource in an adjacent box that | |||
does not support AQM. | does not support AQM. | |||
The recommended approach in less straightforward scenarios, such as | The recommended approach in less straightforward scenarios, such as | |||
fixed size packet buffers, resources without a queue and buffers | fixed-size packet buffers, resources without a queue, and buffers | |||
comprising a mix of packet and bit-congestible resources, is | comprising a mix of packet and bit-congestible resources, is | |||
discussed in Section 4.1. For instance, Section 4.1.1 explains that | discussed in Section 4.1. For instance, Section 4.1.1 explains that | |||
the queue into a line should be measured in bytes even if the queue | the queue into a line should be measured in bytes even if the queue | |||
consists of fixed-size packet-buffers, because the root-cause of any | consists of fixed-size packet buffers, because the root cause of any | |||
congestion is bytes arriving too fast for the line--packets filling | congestion is bytes arriving too fast for the line -- packets filling | |||
buffers are merely a symptom of the underlying congestion of the | buffers are merely a symptom of the underlying congestion of the | |||
line. | line. | |||
2.2. Recommendation on Encoding Congestion Notification | 2.2. Recommendation on Encoding Congestion Notification | |||
When encoding congestion notification (e.g. by drop, ECN or PCN), the | When encoding congestion notification (e.g., by drop, ECN, or PCN), | |||
probability that network equipment drops or marks a particular packet | the probability that network equipment drops or marks a particular | |||
to notify congestion SHOULD NOT depend on the size of the packet in | packet to notify congestion SHOULD NOT depend on the size of the | |||
question. As the example in Section 1.2 illustrates, to drop any bit | packet in question. As the example in Section 1.2 illustrates, to | |||
with probability 0.1% it is only necessary to drop every packet with | drop any bit with probability 0.1%, it is only necessary to drop | |||
probability 0.1% without regard to the size of each packet. | every packet with probability 0.1% without regard to the size of each | |||
packet. | ||||
This approach ensures the network layer offers sufficient congestion | This approach ensures the network layer offers sufficient congestion | |||
information for all known and future transport protocols and also | information for all known and future transport protocols and also | |||
ensures no perverse incentives are created that would encourage | ensures no perverse incentives are created that would encourage | |||
transports to use inappropriately small packet sizes. | transports to use inappropriately small packet sizes. | |||
What this advice means for RED as a specific example: | What this advice means for RED as a specific example: | |||
1. The RED AQM algorithm SHOULD NOT use byte-mode drop, i.e. it | 1. The RED AQM algorithm SHOULD NOT use byte-mode drop, i.e., it | |||
ought to use packet-mode drop. Byte-mode drop is more complex, | ought to use packet-mode drop. Byte-mode drop is more complex, | |||
it creates the perverse incentive to fragment segments into tiny | it creates the perverse incentive to fragment segments into tiny | |||
pieces and it is vulnerable to floods of small packets. | pieces and it is vulnerable to floods of small packets. | |||
2. If a vendor has implemented byte-mode drop, and an operator has | 2. If a vendor has implemented byte-mode drop, and an operator has | |||
turned it on, it is RECOMMENDED to switch it to packet-mode drop, | turned it on, it is RECOMMENDED that the operator use packet-mode | |||
after establishing if there are any implications on the relative | drop instead, after establishing if there are any implications on | |||
performance of applications using different packet sizes. The | the relative performance of applications using different packet | |||
unlikely possibility of some application-specific legacy use of | sizes. The unlikely possibility of some application-specific | |||
byte-mode drop is the only reason that all the above | legacy use of byte-mode drop is the only reason that all the | |||
recommendations on encoding congestion notification are not | above recommendations on encoding congestion notification are not | |||
phrased more strongly. | phrased more strongly. | |||
RED as a whole SHOULD NOT be switched off. Without RED, a drop | RED as a whole SHOULD NOT be switched off. Without RED, a tail- | |||
tail queue biases against large packets and is vulnerable to | drop queue biases against large packets and is vulnerable to | |||
floods of small packets. | floods of small packets. | |||
Note well that RED's byte-mode queue drop is completely orthogonal to | Note well that RED's byte-mode queue drop is completely orthogonal to | |||
byte-mode queue measurement and should not be confused with it. If a | byte-mode queue measurement and should not be confused with it. If a | |||
RED implementation has a byte-mode but does not specify what sort of | RED implementation has a byte-mode but does not specify what sort of | |||
byte-mode, it is most probably byte-mode queue measurement, which is | byte-mode, it is most probably byte-mode queue measurement, which is | |||
fine. However, if in doubt, the vendor should be consulted. | fine. However, if in doubt, the vendor should be consulted. | |||
A survey (Appendix A) showed that there appears to be little, if any, | A survey (Appendix A) showed that there appears to be little, if any, | |||
installed base of the byte-mode drop variant of RED. This suggests | installed base of the byte-mode drop variant of RED. This suggests | |||
skipping to change at page 11, line 19 | skipping to change at page 11, line 28 | |||
deployment impact. | deployment impact. | |||
2.3. Recommendation on Responding to Congestion | 2.3. Recommendation on Responding to Congestion | |||
When a transport detects that a packet has been lost or congestion | When a transport detects that a packet has been lost or congestion | |||
marked, it SHOULD consider the strength of the congestion indication | marked, it SHOULD consider the strength of the congestion indication | |||
as proportionate to the size in octets (bytes) of the missing or | as proportionate to the size in octets (bytes) of the missing or | |||
marked packet. | marked packet. | |||
In other words, when a packet indicates congestion (by being lost or | In other words, when a packet indicates congestion (by being lost or | |||
marked) it can be considered conceptually as if there is a congestion | marked), it can be considered conceptually as if there is a | |||
indication on every octet of the packet, not just one indication per | congestion indication on every octet of the packet, not just one | |||
packet. | indication per packet. | |||
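One way to picture this interpretation is to tally congestion per octet instead of per packet. The sketch below is illustrative only: the function name and the (size, congested) input format are assumptions, and it presumes the transport knows or can estimate the size of a lost packet. It shows how the indication could be interpreted, not how a transport ought to react.

```python
def congestion_fractions(packets):
    """packets: iterable of (size_bytes, congested) pairs, where congested
    is True if the packet was lost or congestion marked.

    Returns (fraction of packets congested, fraction of octets congested).
    """
    total_pkts = total_bytes = hit_pkts = hit_bytes = 0
    for size, congested in packets:
        total_pkts += 1
        total_bytes += size
        if congested:
            hit_pkts += 1
            hit_bytes += size
    if total_pkts == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hit_pkts / total_pkts, hit_bytes / total_bytes


# One marked 1,500-byte packet among three unmarked 60-byte packets.
per_packet, per_octet = congestion_fractions(
    [(1500, True), (60, False), (60, False), (60, False)]
)
print(per_packet, per_octet)
```

In this example a quarter of the packets carry an indication, but they account for 1,500 of the 1,680 octets sent, so the per-octet view sees a much stronger signal.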
To be clear, the above recommendation solely describes how a | To be clear, the above recommendation solely describes how a | |||
transport should interpret the meaning of a congestion indication, as | transport should interpret the meaning of a congestion indication, as | |||
a long term goal. It makes no recommendation on whether a transport | a long term goal. It makes no recommendation on whether a transport | |||
should act differently based on this interpretation. It merely aids | should act differently based on this interpretation. It merely aids | |||
interoperablity between transports, if they choose to make their | interoperability between transports, if they choose to make their | |||
actions depend on the strength of congestion indications. | actions depend on the strength of congestion indications. | |||
This definition will be useful as the IETF transport area continues | This definition will be useful as the IETF transport area continues | |||
its programme of; | its programme of: | |||
o updating host-based congestion control protocols to take account | o updating host-based congestion control protocols to take packet | |||
of packet size | size into account, and | |||
o making transports less sensitive to losing control packets like | o making transports less sensitive to losing control packets like | |||
SYNs and pure ACKs. | SYNs and pure ACKs. | |||
What this advice means for the case of TCP: | What this advice means for the case of TCP: | |||
1. If two TCP flows with different packet sizes are required to run | 1. If two TCP flows with different packet sizes are required to run | |||
at equal bit rates under the same path conditions, this SHOULD be | at equal bit rates under the same path conditions, this SHOULD be | |||
done by altering TCP (Section 4.2.2), not network equipment (the | done by altering TCP (Section 4.2.2), not network equipment (the | |||
latter affects other transports besides TCP). | latter affects other transports besides TCP). | |||
2. If it is desired to improve TCP performance by reducing the | 2. If it is desired to improve TCP performance by reducing the | |||
chance that a SYN or a pure ACK will be dropped, this SHOULD be | chance that a SYN or a pure ACK will be dropped, this SHOULD be | |||
done by modifying TCP (Section 4.2.3), not network equipment. | done by modifying TCP (Section 4.2.3), not network equipment. | |||
To be clear, we are not recommending at all that TCPs under | To be clear, we are not recommending at all that TCPs under | |||
equivalent conditions should aim for equal bit-rates. We are merely | equivalent conditions should aim for equal bit rates. We are merely | |||
saying that anyone trying to do such a thing should modify their TCP | saying that anyone trying to do such a thing should modify their TCP | |||
algorithm, not the network. | algorithm, not the network. | |||
These recommendations are phrased as 'SHOULD' rather than 'MUST', | These recommendations are phrased as 'SHOULD' rather than 'MUST', | |||
because there may be cases where expediency dictates that | because there may be cases where expediency dictates that | |||
compatibility with pre-existing versions of a transport protocol make | compatibility with pre-existing versions of a transport protocol make | |||
the recommendations impractical. | the recommendations impractical. | |||
2.4. Recommendation on Handling Congestion Indications when Splitting | 2.4. Recommendation on Handling Congestion Indications When Splitting | |||
or Merging Packets | or Merging Packets | |||
Packets carrying congestion indications may be split or merged in | Packets carrying congestion indications may be split or merged in | |||
some circumstances (e.g. at a RTP/RTCP transcoder or during IP | some circumstances (e.g., at an RTP / RTP Control Protocol (RTCP) | |||
fragment reassembly). Splitting and merging only make sense in the | transcoder or during IP fragment reassembly). Splitting and merging | |||
context of ECN, not loss. | only make sense in the context of ECN, not loss. | |||
The general rule to follow is that the number of octets in packets | The general rule to follow is that the number of octets in packets | |||
with congestion indications SHOULD be equivalent before and after | with congestion indications SHOULD be equivalent before and after | |||
merging or splitting. This is based on the principle used above; | merging or splitting. This is based on the principle used above; | |||
that an indication of congestion on a packet can be considered as an | that an indication of congestion on a packet can be considered as an | |||
indication of congestion on each octet of the packet. | indication of congestion on each octet of the packet. | |||
The above rule is not phrased with the word "MUST" to allow the | The above rule is not phrased with the word 'MUST' to allow the | |||
following exception. There are cases where pre-existing protocols | following exception. There are cases in which pre-existing protocols | |||
were not designed to conserve congestion marked octets (e.g. IP | were not designed to conserve congestion-marked octets (e.g., IP | |||
fragment reassembly [RFC3168] or loss statistics in RTCP receiver | fragment reassembly [RFC3168] or loss statistics in RTCP receiver | |||
reports [RFC3550] before ECN was added [RFC6679]). When any such | reports [RFC3550] before ECN was added [RFC6679]). When any such | |||
protocol is updated, it SHOULD comply with the above rule to conserve | protocol is updated, it SHOULD comply with the above rule to conserve | |||
marked octets. However, the rule may be relaxed if it would | marked octets. However, the rule may be relaxed if it would | |||
otherwise become too complex to interoperate with pre-existing | otherwise become too complex to interoperate with pre-existing | |||
implementations of the protocol. | implementations of the protocol. | |||
One can think of a splitting or merging process as if all the | One can think of a splitting or merging process as if all the | |||
incoming congestion-marked octets increment a counter and all the | incoming congestion-marked octets increment a counter and all the | |||
outgoing marked octets decrement the same counter. In order to | outgoing marked octets decrement the same counter. In order to | |||
ensure that congestion indications remain timely, even the smallest | ensure that congestion indications remain timely, even the smallest | |||
positive remainder in the conceptual counter should trigger the next | positive remainder in the conceptual counter should trigger the next | |||
outgoing packet to be marked (causing the counter to go negative). | outgoing packet to be marked (causing the counter to go negative). | |||
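The conceptual counter can be sketched as follows. The class and method names are hypothetical; only the counting rule itself (increment on incoming marked octets, mark the next outgoing packet whenever the remainder is positive, then decrement by that packet's size) is taken from the paragraph above.

```python
class MarkConserver:
    """Conceptual counter for a node that splits or merges ECN-marked packets."""

    def __init__(self):
        # Marked octets received minus marked octets sent.
        self.balance = 0

    def packet_in(self, size, marked):
        """Account for an incoming packet of `size` octets."""
        if marked:
            self.balance += size

    def packet_out(self, size):
        """Return True if the outgoing packet of `size` octets should be marked.

        Any positive remainder triggers a mark, which may drive the
        counter negative.
        """
        if self.balance > 0:
            self.balance -= size
            return True
        return False


# Splitting: one marked 1,500-octet packet becomes three 500-octet packets.
split = MarkConserver()
split.packet_in(1500, marked=True)
print([split.packet_out(500) for _ in range(3)], split.balance)

# Merging: one marked 500-octet packet ends up in a 1,500-octet packet.
merge = MarkConserver()
merge.packet_in(500, marked=True)
print(merge.packet_out(1500), merge.balance)
```

In the merging case the counter goes negative after the mark, so later marked arrivals have to repay the excess before another outgoing packet is marked.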
3. Motivating Arguments | 3. Motivating Arguments | |||
This section is informative. It justifies the recommendations given | This section is informative. It justifies the recommendations made | |||
in the previous section. | in the previous section. | |||
3.1. Avoiding Perverse Incentives to (Ab)use Smaller Packets | 3.1. Avoiding Perverse Incentives to (Ab)use Smaller Packets | |||
Increasingly, it is being recognised that a protocol design must take | Increasingly, it is being recognised that a protocol design must take | |||
care not to cause unintended consequences by giving the parties in | care not to cause unintended consequences by giving the parties in | |||
the protocol exchange perverse incentives [Evol_cc][RFC3426]. Given | the protocol exchange perverse incentives [Evol_cc] [RFC3426]. Given | |||
there are many good reasons why larger path maximum transmission | there are many good reasons why larger path maximum transmission | |||
units (PMTUs) would help solve a number of scaling issues, we do not | units (PMTUs) would help solve a number of scaling issues, we do not | |||
want to create any bias against large packets that is greater than | want to create any bias against large packets that is greater than | |||
their true cost. | their true cost. | |||
Imagine a scenario where the same bit rate of packets will contribute | Imagine a scenario where the same bit rate of packets will contribute | |||
the same to bit-congestion of a link irrespective of whether it is | the same to bit congestion of a link irrespective of whether it is | |||
sent as fewer larger packets or more smaller packets. A protocol | sent as fewer larger packets or more smaller packets. A protocol | |||
design that caused larger packets to be more likely to be dropped | design that caused larger packets to be more likely to be dropped | |||
than smaller ones would be dangerous in both the following cases: | than smaller ones would be dangerous in both of the following cases: | |||
Malicious transports: A queue that gives an advantage to small | Malicious transports: A queue that gives an advantage to small | |||
packets can be used to amplify the force of a flooding attack. By | packets can be used to amplify the force of a flooding attack. By | |||
sending a flood of small packets, the attacker can get the queue | sending a flood of small packets, the attacker can get the queue | |||
to discard more traffic in large packets, allowing more attack | to discard more large-packet traffic, allowing more attack traffic | |||
traffic to get through to cause further damage. Such a queue | to get through to cause further damage. Such a queue allows | |||
allows attack traffic to have a disproportionately large effect on | attack traffic to have a disproportionately large effect on | |||
regular traffic without the attacker having to do much work. | regular traffic without the attacker having to do much work. | |||
Non-malicious transports: Even if an application designer is not | Non-malicious transports: Even if an application designer is not | |||
actually malicious, if over time it is noticed that small packets | actually malicious, if over time it is noticed that small packets | |||
tend to go faster, designers will act in their own interest and | tend to go faster, designers will act in their own interest and | |||
use smaller packets. Queues that give advantage to small packets | use smaller packets. Queues that give advantage to small packets | |||
create an evolutionary pressure for applications or transports to | create an evolutionary pressure for applications or transports to | |||
send at the same bit-rate but break their data stream down into | send at the same bit rate but break their data stream down into | |||
tiny segments to reduce their drop rate. Encouraging a high | tiny segments to reduce their drop rate. Encouraging a high | |||
volume of tiny packets might in turn unnecessarily overload a | volume of tiny packets might in turn unnecessarily overload a | |||
completely unrelated part of the system, perhaps more limited by | completely unrelated part of the system, perhaps more limited by | |||
header-processing than bandwidth. | header processing than bandwidth. | |||
Imagine two unresponsive flows arrive at a bit-congestible | Imagine that two unresponsive flows arrive at a bit-congestible | |||
transmission link each with the same bit rate, say 1Mbps, but one | transmission link each with the same bit rate, say 1 Mbps, but one | |||
consists of 1500B and the other 60B packets, which are 25x smaller. | consists of 1,500 B and the other 60 B packets, which are 25x | |||
Consider a scenario where gentle RED [gentle_RED] is used, along with | smaller. Consider a scenario where gentle RED [gentle_RED] is used, | |||
the variant of RED we advise against, i.e. where the RED algorithm is | along with the variant of RED we advise against, i.e., where the RED | |||
configured to adjust the drop probability of packets in proportion to | algorithm is configured to adjust the drop probability of packets in | |||
each packet's size (byte mode packet drop). In this case, RED aims | proportion to each packet's size (byte-mode packet drop). In this | |||
to drop 25x more of the larger packets than the smaller ones. Thus, | case, RED aims to drop 25x more of the larger packets than the | |||
for example if RED drops 25% of the larger packets, it will aim to | smaller ones. Thus, for example, if RED drops 25% of the larger | |||
drop 1% of the smaller packets (but in practice it may drop more as | packets, it will aim to drop 1% of the smaller packets (but, in | |||
congestion increases [RFC4828; Appx B.4]). Even though both flows | practice, it may drop more as congestion increases; see Appendix B.4 | |||
arrive with the same bit rate, the bit rate the RED queue aims to | of [RFC4828]). Even though both flows arrive with the same bit rate, | |||
pass to the line will be 750kbps for the flow of larger packets but | the bit rate the RED queue aims to pass to the line will be 750 kbps | |||
990kbps for the smaller packets (because of rate variations it will | for the flow of larger packets but 990 kbps for the smaller packets | |||
actually be a little less than this target). | (because of rate variations, it will actually be a little less than | |||
this target). | ||||
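The target rates in this example follow from simple arithmetic, sketched below. The function names are illustrative; the inputs (1 Mbps per flow, 1,500-byte and 60-byte packets, 25% drop for the larger packets) are those of the example, and the effects of rate variations mentioned above are ignored.

```python
def byte_mode_drop_prob(p_max_size, size, max_size):
    """Byte-mode drop: scale the drop probability by relative packet size."""
    return p_max_size * size / max_size


def passed_bit_rate(arrival_bps, drop_prob):
    """Bit rate the queue aims to pass for an unresponsive flow."""
    return arrival_bps * (1 - drop_prob)


ARRIVAL = 1_000_000   # each flow arrives at 1 Mbps
P_LARGE = 0.25        # RED drops 25% of the 1,500-byte packets

p_small = byte_mode_drop_prob(P_LARGE, 60, 1500)
print("drop probability for 60-byte packets:", p_small)
print("large-packet flow passes:", passed_bit_rate(ARRIVAL, P_LARGE), "b/s")
print("small-packet flow passes:", passed_bit_rate(ARRIVAL, p_small), "b/s")
```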
Note that, although the byte-mode drop variant of RED amplifies small | Note that, although the byte-mode drop variant of RED amplifies | |||
packet attacks, drop-tail queues amplify small packet attacks even | small-packet attacks, tail-drop queues amplify small-packet attacks | |||
more (see Security Considerations in Section 6). Wherever possible | even more (see Security Considerations in Section 6). Wherever | |||
neither should be used. | possible, neither should be used. | |||
3.2. Small != Control | 3.2. Small != Control | |||
Dropping fewer control packets considerably improves performance. It | Dropping fewer control packets considerably improves performance. It | |||
is tempting to drop small packets with lower probability in order to | is tempting to drop small packets with lower probability in order to | |||
improve performance, because many control packets tend to be smaller | improve performance, because many control packets tend to be smaller | |||
(TCP SYNs & ACKs, DNS queries & responses, SIP messages, HTTP GETs, | (TCP SYNs and ACKs, DNS queries and responses, SIP messages, HTTP | |||
etc). However, we must not give control packets preference purely by | GETs, etc). However, we must not give control packets preference | |||
virtue of their smallness, otherwise it is too easy for any data | purely by virtue of their smallness, otherwise it is too easy for any | |||
source to get the same preferential treatment simply by sending data | data source to get the same preferential treatment simply by sending | |||
in smaller packets. Again we should not create perverse incentives | data in smaller packets. Again, we should not create perverse | |||
to favour small packets rather than to favour control packets, which | incentives to favour small packets rather than to favour control | |||
is what we intend. | packets, which is what we intend. | |||
Just because many control packets are small does not mean all small | Just because many control packets are small does not mean all small | |||
packets are control packets. | packets are control packets. | |||
So, rather than fix these problems in the network, we argue that the | So, rather than fix these problems in the network, we argue that the | |||
transport should be made more robust against losses of control | transport should be made more robust against losses of control | |||
packets (see 'Making Transports Robust against Control Packet Losses' | packets (see Section 4.2.3). | |||
in Section 4.2.3). | ||||
3.3. Transport-Independent Network | 3.3. Transport-Independent Network | |||
TCP congestion control ensures that flows competing for the same | TCP congestion control ensures that flows competing for the same | |||
resource each maintain the same number of segments in flight, | resource each maintain the same number of segments in flight, | |||
irrespective of segment size. So under similar conditions, flows | irrespective of segment size. So under similar conditions, flows | |||
with different segment sizes will get different bit-rates. | with different segment sizes will get different bit rates. | |||
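The effect described above can be illustrated with a little arithmetic. The following is a minimal sketch, assuming an idealised window-limited flow that sends one window per round trip; the function name, the window of 10 segments, and the 100 ms round-trip time are illustrative assumptions, not taken from any specification.

```python
def bit_rate_bps(segments_in_flight: int, segment_bytes: int, rtt_ms: int) -> float:
    """Rough bit rate of a window-limited flow: one window sent per round trip."""
    return segments_in_flight * segment_bytes * 8 * 1000 / rtt_ms


# Two flows holding the same number of segments in flight over the same path,
# differing only in segment size (hypothetical values).
small_flow = bit_rate_bps(segments_in_flight=10, segment_bytes=500, rtt_ms=100)
large_flow = bit_rate_bps(segments_in_flight=10, segment_bytes=1500, rtt_ms=100)

# The bit rates differ by the ratio of the segment sizes.
ratio = large_flow / small_flow
```

Under these assumptions, the flow with 1500-byte segments gets three times the bit rate of the flow with 500-byte segments, even though both keep the same number of segments in flight.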
To counter this effect it seems tempting not to follow our | To counter this effect, it seems tempting not to follow our | |||
recommendation, and instead for the network to bias congestion | recommendation, and instead for the network to bias congestion | |||
notification by packet size in order to equalise the bit-rates of | notification by packet size in order to equalise the bit rates of | |||
flows with different packet sizes. However, in order to do this, the | flows with different packet sizes. However, in order to do this, the | |||
queuing algorithm has to make assumptions about the transport, which | queuing algorithm has to make assumptions about the transport, which | |||
become embedded in the network. Specifically: | become embedded in the network. Specifically: | |||
o The queuing algorithm has to assume how aggressively the transport | o The queuing algorithm has to assume how aggressively the transport | |||
will respond to congestion (see Section 4.2.4). If the network | will respond to congestion (see Section 4.2.4). If the network | |||
assumes the transport responds as aggressively as TCP NewReno, it | assumes the transport responds as aggressively as TCP NewReno, it | |||
will be wrong for Compound TCP and differently wrong for Cubic | will be wrong for Compound TCP and differently wrong for Cubic | |||
TCP, etc. To achieve equal bit-rates, each transport then has to | TCP, etc. To achieve equal bit rates, each transport then has to | |||
guess what assumption the network made, and work out how to | guess what assumption the network made, and work out how to | |||
replace this assumed aggressiveness with its own aggressiveness. | replace this assumed aggressiveness with its own aggressiveness. | |||
o Also, if the network biases congestion notification by packet size | o Also, if the network biases congestion notification by packet | |||
it has to assume a baseline packet size--all proposed algorithms | size, it has to assume a baseline packet size -- all proposed | |||
use the local MTU (for example see the byte-mode loss probability | algorithms use the local MTU (for example, see the byte-mode loss | |||
formula in Table 1). Then if the non-Reno transports mentioned | probability formula in Table 1). Then if the non-Reno transports | |||
above are trying to reverse engineer what the network assumed, | mentioned above are trying to reverse engineer what the network | |||
they also have to guess the MTU of the congested link. | assumed, they also have to guess the MTU of the congested link. | |||
Even though reducing the drop probability of small packets (e.g. | Even though reducing the drop probability of small packets (e.g., | |||
RED's byte-mode drop) helps ensure TCP flows with different packet | RED's byte-mode drop) helps ensure TCP flows with different packet | |||
sizes will achieve similar bit rates, we argue this correction should | sizes will achieve similar bit rates, we argue that this correction | |||
be made to any future transport protocols based on TCP, not to the | should be made to any future transport protocols based on TCP, not to | |||
network in order to fix one transport, no matter how predominant it | the network in order to fix one transport, no matter how predominant | |||
is. Effectively, favouring small packets is reverse engineering of | it is. Effectively, favouring small packets is reverse engineering | |||
network equipment around one particular transport protocol (TCP), | of network equipment around one particular transport protocol (TCP), | |||
contrary to the excellent advice in [RFC3426], which asks designers | contrary to the excellent advice in [RFC3426], which asks designers | |||
to question "Why are you proposing a solution at this layer of the | to question "Why are you proposing a solution at this layer of the | |||
protocol stack, rather than at another layer?" | protocol stack, rather than at another layer?" | |||
In contrast, if the network never takes account of packet size, the | In contrast, if the network never takes packet size into account, the | |||
transport can be certain it will never need to guess any assumptions | transport can be certain it will never need to guess any assumptions | |||
the network has made. And the network passes two pieces of | that the network has made. And the network passes two pieces of | |||
information to the transport that are sufficient in all cases: i) | information to the transport that are sufficient in all cases: i) | |||
congestion notification on the packet and ii) the size of the packet. | congestion notification on the packet and ii) the size of the packet. | |||
Both are available for the transport to combine (by taking account of | Both are available for the transport to combine (by taking packet | |||
packet size when responding to congestion) or not. Appendix B checks | size into account when responding to congestion) or not. Appendix B | |||
that these two pieces of information are sufficient for all relevant | checks that these two pieces of information are sufficient for all | |||
scenarios. | relevant scenarios. | |||
When the network does not take account of packet size, it allows | When the network does not take packet size into account, it allows | |||
transport protocols to choose whether to take account of packet size | transport protocols to choose whether or not to take packet size into | |||
or not. However, if the network were to bias congestion notification | account. However, if the network were to bias congestion | |||
by packet size, transport protocols would have no choice; those that | notification by packet size, transport protocols would have no | |||
did not take account of packet size themselves would unwittingly | choice; those that did not take into account packet size themselves | |||
become dependent on packet size, and those that already took account | would unwittingly become dependent on packet size, and those that | |||
of packet size would end up taking account of it twice. | already took packet size into account would end up taking it into | |||
account twice. | ||||
3.4. Partial Deployment of AQM | 3.4. Partial Deployment of AQM | |||
In overview, the argument in this section runs as follows: | In overview, the argument in this section runs as follows: | |||
o Because the network does not and cannot always drop packets in | o Because the network does not and cannot always drop packets in | |||
proportion to their size, it shouldn't be given the task of making | proportion to their size, it shouldn't be given the task of making | |||
drop signals depend on packet size at all. | drop signals depend on packet size at all. | |||
o Transports on the other hand don't always want to make their rate | o Transports on the other hand don't always want to make their rate | |||
response proportional to the size of dropped packets, but if they | response proportional to the size of dropped packets, but if they | |||
want to, they always can. | want to, they always can. | |||
The argument is similar to the end-to-end argument that says "Don't | The argument is similar to the end-to-end argument that says "Don't | |||
do X in the network if end-systems can do X by themselves, and they | do X in the network if end systems can do X by themselves, and they | |||
want to be able to choose whether to do X anyway." Actually the | want to be able to choose whether to do X anyway". Actually the | |||
following argument is stronger; in addition it says "Don't give the | following argument is stronger; in addition it says "Don't give the | |||
network task X that could be done by the end-systems, if X is not | network task X that could be done by the end systems, if X is not | |||
deployed on all network nodes, and end-systems won't be able to tell | deployed on all network nodes, and end systems won't be able to tell | |||
whether their network is doing X, or whether they need to do X | whether their network is doing X, or whether they need to do X | |||
themselves." In this case, the X in question is "making the response | themselves." In this case, the X in question is "making the response | |||
to congestion depend on packet size". | to congestion depend on packet size". | |||
We will now re-run this argument taking each step in more depth. The | We will now re-run this argument reviewing each step in more depth. | |||
argument applies solely to drop, not to ECN marking. | The argument applies solely to drop, not to ECN marking. | |||
A queue drops packets for either of two reasons: a) to signal to host | A queue drops packets for either of two reasons: a) to signal to host | |||
congestion controls that they should reduce the load and b) because | congestion controls that they should reduce the load and b) because | |||
there is no buffer left to store the packets. Active queue | there is no buffer left to store the packets. Active queue | |||
management tries to use drops as a signal for hosts to slow down | management tries to use drops as a signal for hosts to slow down | |||
(case a) so that drop due to buffer exhaustion (case b) should not be | (case a) so that drops due to buffer exhaustion (case b) should not | |||
necessary. | be necessary. | |||
AQM is not universally deployed in every queue in the Internet; many | AQM is not universally deployed in every queue in the Internet; many | |||
cheap Ethernet bridges, software firewalls, NATs on consumer devices, | cheap Ethernet bridges, software firewalls, NATs on consumer devices, | |||
etc., implement simple tail-drop buffers. Even if AQM were universal, | etc., implement simple tail-drop buffers. Even if AQM were universal, | |||
it has to be able to cope with buffer exhaustion (by switching to a | it has to be able to cope with buffer exhaustion (by switching to a | |||
behaviour like tail-drop), in order to cope with unresponsive or | behaviour like tail drop), in order to cope with unresponsive or | |||
excessive transports. For these reasons networks will sometimes be | excessive transports. For these reasons networks will sometimes be | |||
dropping packets as a last resort (case b) rather than under AQM | dropping packets as a last resort (case b) rather than under AQM | |||
control (case a). | control (case a). | |||
When buffers are exhausted (case b), they don't naturally drop | When buffers are exhausted (case b), they don't naturally drop | |||
packets in proportion to their size. The network can only reduce the | packets in proportion to their size. The network can only reduce the | |||
probability of dropping smaller packets if it has enough space to | probability of dropping smaller packets if it has enough space to | |||
store them somewhere while it waits for a larger packet that it can | store them somewhere while it waits for a larger packet that it can | |||
drop. If the buffer is exhausted, it does not have this choice. | drop. If the buffer is exhausted, it does not have this choice. | |||
Admittedly tail-drop does naturally drop somewhat fewer small | Admittedly tail drop does naturally drop somewhat fewer small | |||
packets, but exactly how few depends more on the mix of sizes than | packets, but exactly how few depends more on the mix of sizes than | |||
the size of the packet in question. Nonetheless, in general, if we | the size of the packet in question. Nonetheless, in general, if we | |||
wanted networks to do size-dependent drop, we would need universal | wanted networks to do size-dependent drop, we would need universal | |||
deployment of (packet-size dependent) AQM code, which is currently | deployment of (packet-size dependent) AQM code, which is currently | |||
unrealistic. | unrealistic. | |||
A host transport cannot know whether any particular drop was a | A host transport cannot know whether any particular drop was a | |||
deliberate signal from an AQM or a sign of a queue shedding packets | deliberate signal from an AQM or a sign of a queue shedding packets | |||
due to buffer exhaustion. Therefore, because the network cannot | due to buffer exhaustion. Therefore, because the network cannot | |||
universally do size-dependent drop, it should not do it at all. | universally do size-dependent drop, it should not do it at all. | |||
Whereas universality is desirable in the network, diversity is | Whereas universality is desirable in the network, diversity is | |||
desirable between different transport layer protocols - some, like | desirable between different transport-layer protocols -- some, like | |||
NewReno TCP [RFC5681], may not choose to make their rate response | standards track TCP congestion control [RFC5681], may not choose to | |||
proportionate to the size of each dropped packet, while others will | make their rate response proportionate to the size of each dropped | |||
(e.g. TFRC-SP [RFC4828]). | packet, while others will (e.g., TCP-Friendly Rate Control for Small | |||
Packets (TFRC-SP) [RFC4828]). | ||||
3.5. Implementation Efficiency | 3.5. Implementation Efficiency | |||
Biasing against large packets typically requires an extra multiply | Biasing against large packets typically requires an extra multiply | |||
and divide in the network (see the example byte-mode drop formula in | and divide in the network (see the example byte-mode drop formula in | |||
Table 1). Allowing for packet size at the transport rather than in | Table 1). Taking packet size into account at the transport rather | |||
the network ensures that neither the network nor the transport needs | than in the network ensures that neither the network nor the | |||
to do a multiply operation--multiplication by packet size is | transport needs to do a multiply operation -- multiplication by | |||
effectively achieved as a repeated add when the transport adds to its | packet size is effectively achieved as a repeated add when the | |||
count of marked bytes as each congestion event is fed to it. Also | transport adds to its count of marked bytes as each congestion event | |||
the work to do the biasing is spread over many hosts, rather than | is fed to it. Also, the work to do the biasing is spread over many | |||
concentrated in just the congested network element. These aren't | hosts, rather than concentrated in just the congested network | |||
principled reasons in themselves, but they are a happy consequence of | element. These aren't principled reasons in themselves, but they are | |||
the other principled reasons. | a happy consequence of the other principled reasons. | |||
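The contrast in per-packet work can be sketched as follows. This is a minimal illustration, assuming that byte-mode drop scales a base probability by packet size relative to the MTU (the general shape implied by the reference to Table 1); the function and class names are hypothetical.

```python
def byte_mode_drop_probability(base_probability: float, packet_bytes: int, mtu_bytes: int) -> float:
    """Network-side bias (assumed form): one multiply and one divide per packet."""
    return base_probability * packet_bytes / mtu_bytes


class MarkedByteCounter:
    """Transport-side accounting: only additions, one per congestion event."""

    def __init__(self) -> None:
        self.marked_bytes = 0
        self.total_bytes = 0

    def on_packet(self, packet_bytes: int, congestion_marked: bool) -> None:
        # Adding the packet size to a running count weights each event by
        # size without any explicit multiplication.
        self.total_bytes += packet_bytes
        if congestion_marked:
            self.marked_bytes += packet_bytes


counter = MarkedByteCounter()
for size, marked in [(1500, True), (60, False), (60, True), (1500, False)]:
    counter.on_packet(size, marked)
```

Here the transport ends up with a count of marked bytes (1560 out of 3120 in this made-up trace) using additions alone, and each host does this work only for its own flows.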
4. A Survey and Critique of Past Advice | 4. A Survey and Critique of Past Advice | |||
This section is informative, not normative. | This section is informative, not normative. | |||
The original 1993 paper on RED [RED93] proposed two options for the | The original 1993 paper on RED [RED93] proposed two options for the | |||
RED active queue management algorithm: packet mode and byte mode. | RED active queue management algorithm: packet mode and byte mode. | |||
Packet mode measured the queue length in packets and dropped (or | Packet mode measured the queue length in packets and dropped (or | |||
marked) individual packets with a probability independent of their | marked) individual packets with a probability independent of their | |||
size. Byte mode measured the queue length in bytes and marked an | size. Byte mode measured the queue length in bytes and marked an | |||
skipping to change at page 17, line 43 | skipping to change at page 18, line 11 | |||
further work, it was stated that no recommendation had been made on | further work, it was stated that no recommendation had been made on | |||
whether the queue size should be measured in bytes or packets, but | whether the queue size should be measured in bytes or packets, but | |||
noted that the difference could be significant. | noted that the difference could be significant. | |||
When RED was recommended for general deployment in 1998 [RFC2309], | When RED was recommended for general deployment in 1998 [RFC2309], | |||
the two modes were mentioned, implying the choice between them was a | the two modes were mentioned, implying the choice between them was a | |||
question of performance, referring to a 1997 email [pktByteEmail] for | question of performance, referring to a 1997 email [pktByteEmail] for | |||
advice on tuning. A later addendum to this email introduced the | advice on tuning. A later addendum to this email introduced the | |||
insight that there are in fact two orthogonal choices: | insight that there are in fact two orthogonal choices: | |||
o whether to measure queue length in bytes or packets (Section 4.1) | o whether to measure queue length in bytes or packets (Section 4.1), | |||
and | ||||
o whether the drop probability of an individual packet should depend | o whether the drop probability of an individual packet should depend | |||
on its own size (Section 4.2). | on its own size (Section 4.2). | |||
The rest of this section is structured accordingly. | The rest of this section is structured accordingly. | |||
4.1. Congestion Measurement Advice | 4.1. Congestion Measurement Advice | |||
The choice of which metric to use to measure queue length was left | The choice of which metric to use to measure queue length was left | |||
open in RFC2309. It is now well understood that queues for bit- | open in RFC 2309. It is now well understood that queues for bit- | |||
congestible resources should be measured in bytes, and queues for | congestible resources should be measured in bytes, and queues for | |||
packet-congestible resources should be measured in packets | packet-congestible resources should be measured in packets | |||
[pktByteEmail]. | [pktByteEmail]. | |||
Congestion in some legacy bit-congestible buffers is only measured in | Congestion in some legacy bit-congestible buffers is only measured in | |||
packets not bytes. In such cases, the operator has to set the | packets not bytes. In such cases, the operator has to take into | |||
thresholds mindful of a typical mix of packets sizes. Any AQM | account a typical mix of packet sizes when setting the thresholds. | |||
algorithm on such a buffer will be oversensitive to high proportions | Any AQM algorithm on such a buffer will be oversensitive to high | |||
of small packets, e.g. a DoS attack, and under-sensitive to high | proportions of small packets, e.g., a DoS attack, and under-sensitive | |||
proportions of large packets. However, there is no need to make | to high proportions of large packets. However, there is no need to | |||
allowances for the possibility of such legacy in future protocol | make allowances for the possibility of such a legacy in future | |||
design. This is safe because any under-sensitivity during unusual | protocol design. This is safe because any under-sensitivity during | |||
traffic mixes cannot lead to congestion collapse given the buffer | unusual traffic mixes cannot lead to congestion collapse given that | |||
will eventually revert to tail drop, discarding proportionately more | the buffer will eventually revert to tail drop, which discards | |||
large packets. | proportionately more large packets. | |||
4.1.1. Fixed Size Packet Buffers | 4.1.1. Fixed-Size Packet Buffers | |||
The question of whether to measure queues in bytes or packets seems | The question of whether to measure queues in bytes or packets seems | |||
to be well understood. However, measuring congestion is confusing | to be well understood. However, measuring congestion is confusing | |||
when the resource is bit congestible but the queue into the resource | when the resource is bit-congestible but the queue into the resource | |||
is packet congestible. This section outlines the approach to take. | is packet-congestible. This section outlines the approach to take. | |||
Some, mostly older, queuing hardware allocates fixed sized buffers in | Some, mostly older, queuing hardware allocates fixed-size buffers in | |||
which to store each packet in the queue. This hardware forwards to | which to store each packet in the queue. This hardware forwards | |||
the line in one of two ways: | packets to the line in one of two ways: | |||
o With some hardware, any fixed sized buffers not completely filled | o With some hardware, any fixed-size buffers not completely filled | |||
by a packet are padded when transmitted to the wire. This case, | by a packet are padded when transmitted to the wire. This case | |||
should clearly be treated as packet-congestible, because both | should clearly be treated as packet-congestible, because both | |||
queuing and transmission are in fixed MTU-sized units. Therefore | queuing and transmission are in fixed MTU-size units. Therefore, | |||
the queue length in packets is a good model of congestion of the | the queue length in packets is a good model of congestion of the | |||
link. | link. | |||
o More commonly, hardware with fixed size packet buffers transmits | o More commonly, hardware with fixed-size packet buffers transmits | |||
packets to line without padding. This implies a hybrid forwarding | packets to the line without padding. This implies a hybrid | |||
system with transmission congestion dependent on the size of | forwarding system with transmission congestion dependent on the | |||
packets but queue congestion dependent on the number of packets, | size of packets but queue congestion dependent on the number of | |||
irrespective of their size. | packets, irrespective of their size. | |||
Nonetheless, there would be no queue at all unless the line had | Nonetheless, there would be no queue at all unless the line had | |||
become congested--the root-cause of any congestion is too many | become congested -- the root cause of any congestion is too many | |||
bytes arriving for the line. Therefore, the AQM should measure | bytes arriving for the line. Therefore, the AQM should measure | |||
the queue length as the sum of all the packet sizes in bytes that | the queue length as the sum of all the packet sizes in bytes that | |||
are queued up waiting to be serviced by the line, irrespective of | are queued up waiting to be serviced by the line, irrespective of | |||
whether each packet is held in a fixed size buffer. | whether each packet is held in a fixed-size buffer. | |||
In the (unlikely) first case where use of padding means the queue | In the (unlikely) first case where use of padding means the queue | |||
should be measured in packets, further confusion is likely because | should be measured in packets, further confusion is likely because | |||
the fixed buffers are rarely all one size. Typically pools of | the fixed buffers are rarely all one size. Typically, pools of | |||
different sized buffers are provided (Cisco uses the term 'buffer | different-sized buffers are provided (Cisco uses the term 'buffer | |||
carving' for the process of dividing up memory into these pools | carving' for the process of dividing up memory into these pools | |||
[IOSArch]). Usually, if the pool of small buffers is exhausted, | [IOSArch]). Usually, if the pool of small buffers is exhausted, | |||
arriving small packets can borrow space in the pool of large buffers, | arriving small packets can borrow space in the pool of large buffers, | |||
but not vice versa. However, there is no need to consider all this | but not vice versa. However, there is no need to consider all this | |||
complexity, because the root-cause of any congestion is still line | complexity, because the root cause of any congestion is still line | |||
overload--buffer consumption is only the symptom. Therefore, the | overload -- buffer consumption is only the symptom. Therefore, the | |||
length of the queue should be measured as the sum of the bytes in the | length of the queue should be measured as the sum of the bytes in the | |||
queue that will be transmitted to line, including any padding. In | queue that will be transmitted to the line, including any padding. | |||
the (unusual) case of transmission with padding this means the sum of | In the (unusual) case of transmission with padding, this means the | |||
the sizes of the small buffers queued plus the sum of the sizes of | sum of the sizes of the small buffers queued plus the sum of the | |||
the large buffers queued. | sizes of the large buffers queued. | |||
We will return to borrowing of fixed sized buffers when we discuss | We will return to borrowing of fixed-size buffers when we discuss | |||
biasing the drop/marking probability of a specific packet because of | biasing the drop/marking probability of a specific packet because of | |||
its size in Section 4.2.1. But here we can repeat the simple rule | its size in Section 4.2.1. But here, we can repeat the simple rule | |||
for how to measure the length of queues of fixed buffers: no matter | for how to measure the length of queues of fixed buffers: no matter | |||
how complicated the buffering scheme is, ultimately a transmission | how complicated the buffering scheme is, ultimately a transmission | |||
line is nearly always bit-congestible so the number of bytes queued | line is nearly always bit-congestible so the number of bytes queued | |||
up waiting for the line measures how congested the line is, and it is | up waiting for the line measures how congested the line is, and it is | |||
rarely important to measure how congested the buffering system is. | rarely important to measure how congested the buffering system is. | |||
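The measurement rule for fixed-size buffers can be sketched as follows. This is a minimal illustration, not a description of any real forwarding hardware; the QueuedPacket type, its field names, and the buffer sizes are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class QueuedPacket:
    packet_bytes: int  # size of the packet itself
    buffer_bytes: int  # size of the fixed buffer that holds it


def queue_length_bytes(queue: Iterable[QueuedPacket], pads_on_transmit: bool) -> int:
    """Sum of the bytes that will actually be sent to the line."""
    if pads_on_transmit:
        # Unusual case: each buffer is padded, so the whole buffer is sent.
        return sum(p.buffer_bytes for p in queue)
    # Common case: only the packet's own bytes are sent.
    return sum(p.packet_bytes for p in queue)


# Hypothetical queue drawing on pools of 256-byte and 2048-byte buffers.
queue = [QueuedPacket(60, 256), QueuedPacket(1500, 2048), QueuedPacket(200, 256)]
unpadded_length = queue_length_bytes(queue, pads_on_transmit=False)
padded_length = queue_length_bytes(queue, pads_on_transmit=True)
```

In both cases the measure is the number of bytes waiting for the line, not the number of buffers consumed; only the per-packet byte count changes when padding is transmitted.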
4.1.2. Congestion Measurement without a Queue | 4.1.2. Congestion Measurement without a Queue | |||
AQM algorithms are nearly always described assuming there is a queue | AQM algorithms are nearly always described assuming there is a queue | |||
for a congested resource and the algorithm can use the queue length | for a congested resource and the algorithm can use the queue length | |||
to determine the probability that it will drop or mark each packet. | to determine the probability that it will drop or mark each packet. | |||
But not all congested resources lead to queues. For instance, power | But not all congested resources lead to queues. For instance, power- | |||
limited resources are usually bit-congestible if energy is primarily | limited resources are usually bit-congestible if energy is primarily | |||
required for transmission rather than header processing, but it is | required for transmission rather than header processing, but it is | |||
rare for a link protocol to build a queue as it approaches maximum | rare for a link protocol to build a queue as it approaches maximum | |||
power. | power. | |||
Nonetheless, AQM algorithms do not require a queue in order to work. | Nonetheless, AQM algorithms do not require a queue in order to work. | |||
For instance spectrum congestion can be modelled by signal quality | For instance, spectrum congestion can be modelled by signal quality | |||
using target bit-energy-to-noise-density ratio. And, to model radio | using the target bit-energy-to-noise-density ratio. And, to model | |||
power exhaustion, transmission power levels can be measured and | radio power exhaustion, transmission-power levels can be measured and | |||
compared to the maximum power available. [ECNFixedWireless] proposes | compared to the maximum power available. [ECNFixedWireless] proposes | |||
a practical and theoretically sound way to combine congestion | a practical and theoretically sound way to combine congestion | |||
notification for different bit-congestible resources at different | notification for different bit-congestible resources at different | |||
layers along an end to end path, whether wireless or wired, and | layers along an end-to-end path, whether wireless or wired, and | |||
whether with or without queues. | whether with or without queues. | |||
In wireless protocols that use request to send / clear to send (RTS / | In wireless protocols that use request to send / clear to send | |||
CTS) control, such as some variants of IEEE 802.11, it is reasonable | (RTS / CTS) control, such as some variants of IEEE 802.11, it is | |||
to base an AQM on the time spent waiting for transmission | reasonable to base an AQM on the time spent waiting for transmission | |||
opportunities (TXOPs) even though wireless spectrum is usually | opportunities (TXOPs) even though the wireless spectrum is usually | |||
regarded as congested by bits (for a given coding scheme). This is | regarded as congested by bits (for a given coding scheme). This is | |||
because requests for TXOPs queue up as the spectrum gets congested by | because requests for TXOPs queue up as the spectrum gets congested by | |||
all the bits being transferred. So the time that TXOPs are queued | all the bits being transferred. So the time that TXOPs are queued | |||
directly reflects bit congestion of the spectrum. | directly reflects bit congestion of the spectrum. | |||
4.2. Congestion Notification Advice | 4.2. Congestion Notification Advice | |||
4.2.1. Network Bias when Encoding | 4.2.1. Network Bias When Encoding | |||
4.2.1.1. Advice on Packet Size Bias in RED | 4.2.1.1. Advice on Packet-Size Bias in RED | |||
The previously mentioned email [pktByteEmail] referred to by | The previously mentioned email [pktByteEmail] referred to by | |||
[RFC2309] advised that most scarce resources in the Internet were | [RFC2309] advised that most scarce resources in the Internet were | |||
bit-congestible, which is still believed to be true (Section 1.1). | bit-congestible, which is still believed to be true (Section 1.1). | |||
But it went on to offer advice that is updated by this memo. It said | But it went on to offer advice that is updated by this memo. It said | |||
that drop probability should depend on the size of the packet being | that drop probability should depend on the size of the packet being | |||
considered for drop if the resource is bit-congestible, but not if it | considered for drop if the resource is bit-congestible, but not if it | |||
is packet-congestible. The argument continued that if packet drops | is packet-congestible. The argument continued that if packet drops | |||
were inflated by packet size (byte-mode dropping), "a flow's fraction | were inflated by packet size (byte-mode dropping), "a flow's fraction | |||
of the packet drops is then a good indication of that flow's fraction | of the packet drops is then a good indication of that flow's fraction | |||
of the link bandwidth in bits per second". This was consistent with | of the link bandwidth in bits per second". This was consistent with | |||
a referenced policing mechanism being worked on at the time for | a referenced policing mechanism being worked on at the time for | |||
detecting unusually high bandwidth flows, eventually published in | detecting unusually high bandwidth flows, eventually published in | |||
1999 [pBox]. However, the problem could and should have been solved | 1999 [pBox]. However, the problem could and should have been solved | |||
by making the policing mechanism count the volume of bytes randomly | by making the policing mechanism count the volume of bytes randomly | |||
dropped, not the number of packets. | dropped, not the number of packets. | |||
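The alternative suggested above, counting dropped bytes rather than dropped packets, can be sketched as follows. This is a minimal illustration under the assumption that drops are random and independent of packet size; the function name and the example trace are hypothetical and do not reproduce the mechanism in [pBox].

```python
from collections import defaultdict


def drop_shares(drops):
    """Return each flow's share of drops, by packet count and by byte volume.

    drops: iterable of (flow_id, packet_bytes) for each dropped packet.
    """
    packets = defaultdict(int)
    volume = defaultdict(int)
    for flow_id, packet_bytes in drops:
        packets[flow_id] += 1
        volume[flow_id] += packet_bytes
    total_packets = sum(packets.values())
    total_bytes = sum(volume.values())
    packet_share = {f: n / total_packets for f, n in packets.items()}
    byte_share = {f: b / total_bytes for f, b in volume.items()}
    return packet_share, byte_share


# Flows A and B send at the same bit rate; B uses packets one third the size,
# so it sends (and, with size-independent drop, loses) three times as many.
drops = [("A", 1500)] * 2 + [("B", 500)] * 6
packet_share, byte_share = drop_shares(drops)
```

In this made-up trace, flow B accounts for three quarters of the dropped packets but only half of the dropped bytes, which matches its half share of the link bandwidth. The policer obtains the indication it needs without the queue biasing drop by packet size.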
A few months before RFC2309 was published, an addendum was added to | A few months before RFC 2309 was published, an addendum was added to | |||
the above archived email referenced from the RFC, in which the final | the above archived email referenced from the RFC, in which the final | |||
paragraph seemed to partially retract what had previously been said. | paragraph seemed to partially retract what had previously been said. | |||
It clarified that the question of whether the probability of | It clarified that the question of whether the probability of | |||
dropping/marking a packet should depend on its size was not related | dropping/marking a packet should depend on its size was not related | |||
to whether the resource itself was bit congestible, but a completely | to whether the resource itself was bit-congestible, but a completely | |||
orthogonal question. However the only example given had the queue | orthogonal question. However, the only example given had the queue | |||
measured in packets but packet drop depended on the size of the | measured in packets but packet drop depended on the size of the | |||
packet in question. No example was given the other way round. | packet in question. No example was given the other way round. | |||
In 2000, Cnodder et al [REDbyte] pointed out that there was an error | In 2000, Cnodder et al. [REDbyte] pointed out that there was an error | |||
in the part of the original 1993 RED algorithm that aimed to | in the part of the original 1993 RED algorithm that aimed to | |||
distribute drops uniformly, because it didn't correctly take into | distribute drops uniformly, because it didn't correctly take into | |||
account the adjustment for packet size. They recommended an | account the adjustment for packet size. They recommended an | |||
algorithm called RED_4 to fix this. But they also recommended a | algorithm called RED_4 to fix this. But they also recommended a | |||
further change, RED_5, to adjust drop rate dependent on the square of | further change, RED_5, to adjust the drop rate dependent on the | |||
relative packet size. This was indeed consistent with one implied | square of the relative packet size. This was indeed consistent with | |||
motivation behind RED's byte mode drop--that we should reverse | one implied motivation behind RED's byte-mode drop -- that we should | |||
engineer the network to improve the performance of dominant end-to- | reverse engineer the network to improve the performance of dominant | |||
end congestion control mechanisms. This memo makes different | end-to-end congestion control mechanisms. This memo makes | |||
recommendations in Section 2. | different recommendations in Section 2. | |||
By 2003, a further change had been made to the adjustment for packet | By 2003, a further change had been made to the adjustment for packet | |||
size, this time in the RED algorithm of the ns2 simulator. Instead | size, this time in the RED algorithm of the ns2 simulator. Instead | |||
of taking each packet's size relative to a `maximum packet size' it | of taking each packet's size relative to a 'maximum packet size', it | |||
was taken relative to a `mean packet size', intended to be a static | was taken relative to a 'mean packet size', intended to be a static | |||
value representative of the `typical' packet size on the link. We | value representative of the 'typical' packet size on the link. We | |||
have not been able to find a justification in the literature for this | have not been able to find a justification in the literature for this | |||
change, however Eddy and Allman conducted experiments [REDbias] that | change; however, Eddy and Allman conducted experiments [REDbias] that | |||
assessed how sensitive RED was to this parameter, amongst other | assessed how sensitive RED was to this parameter, amongst other | |||
things. However, this changed algorithm can often lead to drop | things. This changed algorithm can often lead to drop probabilities | |||
probabilities of greater than 1 (which gives a hint that there is | of greater than 1 (which gives a hint that there is probably a | |||
probably a mistake in the theory somewhere). | mistake in the theory somewhere). | |||
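As a rough illustration of how probabilities above 1 can arise, the
following Python sketch scales a base drop probability by packet size
relative to a reference size. The helper name, the linear scaling
form, and the numbers are assumptions chosen for illustration; this
is not the ns2 code.

```python
def byte_mode_drop_prob(base_prob, pkt_size, ref_size):
    """Scale a base drop probability linearly by pkt_size / ref_size.

    Hypothetical helper: the linear form is an assumed simplification
    of byte-mode drop, not taken from any simulator source.
    """
    return base_prob * pkt_size / ref_size


# Reference = maximum packet size: the size ratio never exceeds 1,
# so the scaled value stays a valid probability.
p_max_ref = byte_mode_drop_prob(0.5, pkt_size=1500, ref_size=1500)

# Reference = a 'mean' packet size smaller than some packets on the
# link: the size ratio can exceed 1, and so can the scaled value.
p_mean_ref = byte_mode_drop_prob(0.5, pkt_size=1500, ref_size=500)
```

With a maximum-size reference the ratio is bounded by 1, whereas a
mean-size reference gives any larger-than-mean packet a ratio above 1,
so a sufficiently high base probability pushes the result past 1.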
On 10-Nov-2004, this variant of byte-mode packet drop was made the | On 10-Nov-2004, this variant of byte-mode packet drop was made the | |||
default in the ns2 simulator. It seems unlikely that byte-mode drop | default in the ns2 simulator. It seems unlikely that byte-mode drop | |||
has ever been implemented in production networks (Appendix A), | has ever been implemented in production networks (Appendix A); | |||
therefore any conclusions based on ns2 simulations that use RED | therefore, any conclusions based on ns2 simulations that use RED | |||
without disabling byte-mode drop are unlikely to carry over to | without disabling byte-mode drop are unlikely to carry over to | |||
RED in production networks. | RED in production networks. | |||
4.2.1.2. Packet Size Bias Regardless of AQM | 4.2.1.2. Packet-Size Bias Regardless of AQM | |||
The byte-mode drop variant of RED (or a similar variant of other AQM | The byte-mode drop variant of RED (or a similar variant of other AQM | |||
algorithms) is not the only possible bias towards small packets in | algorithms) is not the only possible bias towards small packets in | |||
queueing systems. We have already mentioned that tail-drop queues | queuing systems. We have already mentioned that tail-drop queues | |||
naturally tend to lock-out large packets once they are full. | naturally tend to lock out large packets once they are full. | |||
But also queues with fixed sized buffers reduce the probability that | But also, queues with fixed-size buffers reduce the probability that | |||
small packets will be dropped if (and only if) they allow small | small packets will be dropped if (and only if) they allow small | |||
packets to borrow buffers from the pools for larger packets (see | packets to borrow buffers from the pools for larger packets (see | |||
Section 4.1.1). Borrowing effectively makes the maximum queue size | Section 4.1.1). Borrowing effectively makes the maximum queue size | |||
for small packets greater than that for large packets, because more | for small packets greater than that for large packets, because more | |||
buffers can be used by small packets while fewer will fit large | buffers can be used by small packets while fewer will fit large | |||
packets. Incidentally, the bias towards small packets from buffer | packets. Incidentally, the bias towards small packets from buffer | |||
borrowing is nothing like as large as that of RED's byte-mode drop. | borrowing is nothing like as large as that of RED's byte-mode drop. | |||
Nonetheless, fixed-buffer memory with tail drop is still prone to | Nonetheless, fixed-buffer memory with tail drop is still prone to | |||
lock-out large packets, purely because of the tail-drop aspect. So, | lock out large packets, purely because of the tail-drop aspect. So, | |||
fixed size packet-buffers should be augmented with a good AQM | fixed-size packet buffers should be augmented with a good AQM | |||
algorithm and packet-mode drop. If an AQM is too complicated to | algorithm and packet-mode drop. If an AQM is too complicated to | |||
implement with multiple fixed buffer pools, the minimum necessary to | implement with multiple fixed buffer pools, the minimum necessary to | |||
prevent large packet lock-out is to ensure smaller packets never use | prevent large-packet lockout is to ensure that smaller packets never | |||
the last available buffer in any of the pools for larger packets. | use the last available buffer in any of the pools for larger packets. | |||
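The minimum rule in the last sentence can be sketched as follows.
This is only a sketch under assumed names: the class BufferPools, its
allocate() method, and the pool sizes are hypothetical and do not
describe any particular equipment.

```python
class BufferPools:
    """Fixed-size buffer pools with restricted borrowing (illustrative)."""

    def __init__(self, pools):
        # pools maps buffer size in bytes -> number of free buffers
        self.free = dict(pools)

    def allocate(self, pkt_size):
        """Return the buffer size used, or None if the packet is dropped."""
        fitting = sorted(size for size in self.free if size >= pkt_size)
        for index, size in enumerate(fitting):
            # The smallest fitting pool is the packet's own pool and may
            # be used down to its last buffer. Any larger pool is being
            # borrowed from, so its last free buffer is held in reserve.
            reserve = 0 if index == 0 else 1
            if self.free[size] > reserve:
                self.free[size] -= 1
                return size
        return None


pools = BufferPools({128: 1, 1500: 2})
first_small = pools.allocate(64)    # takes the only 128-byte buffer
second_small = pools.allocate(64)   # borrows one 1500-byte buffer
third_small = pools.allocate(64)    # refused: only the reserve is left
large = pools.allocate(1000)        # the reserved buffer is still there
```

The reserve means a burst of small packets can exhaust its own pool
and most of the larger pool, but a large packet arriving afterwards
still finds one buffer.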
4.2.2. Transport Bias when Decoding | 4.2.2. Transport Bias When Decoding | |||
The above proposals to alter the network equipment to bias towards | The above proposals to alter the network equipment to bias towards | |||
smaller packets have largely carried on outside the IETF process. | smaller packets have largely carried on outside the IETF process. | |||
Whereas, within the IETF, there are many different proposals to alter | Whereas, within the IETF, there are many different proposals to alter | |||
transport protocols to achieve the same goals, i.e. either to make | transport protocols to achieve the same goals, i.e., either to make | |||
the flow bit-rate take account of packet size, or to protect control | the flow bit rate take into account packet size, or to protect | |||
packets from loss. This memo argues that altering transport | control packets from loss. This memo argues that altering transport | |||
protocols is the more principled approach. | protocols is the more principled approach. | |||
A recently approved experimental RFC adapts its transport layer | A recently approved experimental RFC adapts its transport-layer | |||
protocol to take account of packet sizes relative to typical TCP | protocol to take into account packet sizes relative to typical TCP | |||
packet sizes. This proposes a new small-packet variant of TCP- | packet sizes. This proposes a new small-packet variant of TCP- | |||
friendly rate control [RFC5348] called TFRC-SP [RFC4828]. | friendly rate control (TFRC [RFC5348]), which is called TFRC-SP | |||
Essentially, it proposes a rate equation that inflates the flow rate | [RFC4828]. Essentially, it proposes a rate equation that inflates | |||
by the ratio of a typical TCP segment size (1500B including TCP | the flow rate by the ratio of a typical TCP segment size (1,500 B | |||
header) over the actual segment size [PktSizeEquCC]. (There are also | including TCP header) over the actual segment size [PktSizeEquCC]. | |||
other important differences of detail relative to TFRC, such as using | (There are also other important differences of detail relative to | |||
virtual packets [CCvarPktSize] to avoid responding to multiple losses | TFRC, such as using virtual packets [CCvarPktSize] to avoid | |||
per round trip and using a minimum inter-packet interval.) | responding to multiple losses per round trip and using a minimum | |||
inter-packet interval.) | ||||
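The effect of the inflation factor can be sketched in Python. The
sketch assumes the simplified square-root law (packets per round trip
proportional to 1/sqrt(p)) with the constant omitted, rather than the
full TFRC throughput equation; the function names are hypothetical.

```python
import math

TYPICAL_TCP_SEGMENT = 1500  # bytes, the figure quoted in the text


def tcp_like_bits_per_rtt(seg_size, p):
    """Assumed square-root law: packets per RTT ~ 1/sqrt(p)."""
    return 8 * seg_size / math.sqrt(p)


def small_packet_bits_per_rtt(seg_size, p):
    """Inflate the TCP-like rate by typical size over actual size."""
    return tcp_like_bits_per_rtt(seg_size, p) * TYPICAL_TCP_SEGMENT / seg_size


# Same loss probability, two different segment sizes.
plain_small = tcp_like_bits_per_rtt(100, 0.01)
plain_large = tcp_like_bits_per_rtt(1500, 0.01)
inflated_small = small_packet_bits_per_rtt(100, 0.01)
inflated_large = small_packet_bits_per_rtt(1500, 0.01)
```

Without inflation, the 100 B flow gets one fifteenth of the bit rate
of the 1500 B flow at the same loss probability; with inflation, the
two bit rates coincide.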
Section 4.5.1 of this TFRC-SP spec discusses the implications of | Section 4.5.1 of the TFRC-SP specification discusses the implications | |||
operating in an environment where queues have been configured to drop | of operating in an environment where queues have been configured to | |||
smaller packets with proportionately lower probability than larger | drop smaller packets with proportionately lower probability than | |||
ones. But it only discusses TCP operating in such an environment, | larger ones. But it only discusses TCP operating in such an | |||
only mentioning TFRC-SP briefly when discussing how to define | environment, only mentioning TFRC-SP briefly when discussing how to | |||
fairness with TCP. And it only discusses the byte-mode dropping | define fairness with TCP. And it only discusses the byte-mode | |||
version of RED as it was before Cnodder et al pointed out it didn't | dropping version of RED as it was before Cnodder et al. pointed out | |||
sufficiently bias towards small packets to make TCP independent of | that it didn't sufficiently bias towards small packets to make TCP | |||
packet size. | independent of packet size. | |||
So the TFRC-SP spec doesn't address the issue of which of the network | So the TFRC-SP specification doesn't address the issue of whether the | |||
or the transport _should_ handle fairness between different packet | network or the transport _should_ handle fairness between different | |||
sizes. In its Appendix B.4 it discusses the possibility of both | packet sizes. In Appendix B.4 of RFC 4828, it discusses the | |||
TFRC-SP and some network buffers duplicating each other's attempts to | possibility of both TFRC-SP and some network buffers duplicating each | |||
deliberately bias towards small packets. But the discussion is not | other's attempts to deliberately bias towards small packets. But the | |||
conclusive, instead reporting simulations of many of the | discussion is not conclusive, instead reporting simulations of many | |||
possibilities in order to assess performance but not recommending any | of the possibilities in order to assess performance but not | |||
particular course of action. | recommending any particular course of action. | |||
The paper originally proposing TFRC with virtual packets (VP-TFRC) | The paper originally proposing TFRC with virtual packets (VP-TFRC) | |||
[CCvarPktSize] proposed that there should perhaps be two variants to | [CCvarPktSize] proposed that there should perhaps be two variants to | |||
cater for the different variants of RED. However, as the TFRC-SP | cater for the different variants of RED. However, as the TFRC-SP | |||
authors point out, there is no way for a transport to know whether | authors point out, there is no way for a transport to know whether | |||
some queues on its path have deployed RED with byte-mode packet drop | some queues on its path have deployed RED with byte-mode packet drop | |||
(except if an exhaustive survey found that no-one has deployed it!-- | (except if an exhaustive survey found that no one has deployed it! -- | |||
see Appendix A). Incidentally, VP-TFRC also proposed that byte-mode | see Appendix A). Incidentally, VP-TFRC also proposed that byte-mode | |||
RED dropping should really square the packet-size compensation-factor | RED dropping should really square the packet-size compensation factor | |||
(like that of Cnodder's RED_5, but apparently unaware of it). | (like that of Cnodder's RED_5, but apparently unaware of it). | |||
Pre-congestion notification [RFC5670] is an IETF technology to use a | Pre-congestion notification [RFC5670] is an IETF technology to use a | |||
virtual queue for AQM marking for packets within one Diffserv class | virtual queue for AQM marking for packets within one Diffserv class | |||
in order to give early warning prior to any real queuing. The PCN | in order to give early warning prior to any real queuing. The PCN- | |||
marking algorithms have been designed not to take account of packet | marking algorithms have been designed not to take into account packet | |||
size when forwarding through queues. Instead the general principle | size when forwarding through queues. Instead, the general principle | |||
has been to take account of the sizes of marked packets when | has been to take the sizes of marked packets into account when | |||
monitoring the fraction of marking at the edge of the network, as | monitoring the fraction of marking at the edge of the network, as | |||
recommended here. | recommended here. | |||
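The difference between counting marked packets and counting marked
bytes at the edge can be sketched as below. The function names and
the sample of packets are hypothetical; the sketch is not taken from
the PCN specifications.

```python
def marked_fraction_by_packets(packets):
    """packets: list of (size_in_bytes, is_marked) tuples."""
    if not packets:
        return 0.0
    marked = sum(1 for _, is_marked in packets if is_marked)
    return marked / len(packets)


def marked_fraction_by_bytes(packets):
    """Weight each packet by its size when measuring the marked fraction."""
    total = sum(size for size, _ in packets)
    if total == 0:
        return 0.0
    marked = sum(size for size, is_marked in packets if is_marked)
    return marked / total


# One large marked packet among three small unmarked ones.
sample = [(1500, True), (100, False), (100, False), (100, False)]
by_packets = marked_fraction_by_packets(sample)
by_bytes = marked_fraction_by_bytes(sample)
```

In this sample a quarter of the packets are marked, but they carry
five sixths of the bytes, so only the byte-weighted measure reflects
how much of the traffic volume experienced marking.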
4.2.3. Making Transports Robust against Control Packet Losses | 4.2.3. Making Transports Robust against Control Packet Losses | |||
Recently, two RFCs have defined changes to TCP that make it more | Recently, two RFCs have defined changes to TCP that make it more | |||
robust against losing small control packets [RFC5562] [RFC5690]. In | robust against losing small control packets [RFC5562] [RFC5690]. In | |||
both cases they note that the case for these two TCP changes would be | both cases, they note that the case for these two TCP changes would | |||
weaker if RED were biased against dropping small packets. We argue | be weaker if RED were biased against dropping small packets. We | |||
here that these two proposals are a safer and more principled way to | argue here that these two proposals are a safer and more principled | |||
achieve TCP performance improvements than reverse engineering RED to | way to achieve TCP performance improvements than reverse engineering | |||
benefit TCP. | RED to benefit TCP. | |||
Although there are no known proposals, it would also be possible and | Although there are no known proposals, it would also be possible and | |||
perfectly valid to make control packets robust against drop by | perfectly valid to make control packets robust against drop by | |||
requesting a scheduling class with lower drop probability, by re- | requesting a scheduling class with lower drop probability, which | |||
marking to a Diffserv code point [RFC2474] within the same behaviour | would be achieved by re-marking to a Diffserv code point [RFC2474] | |||
aggregate. | within the same behaviour aggregate. | |||
Although not brought to the IETF, a simple proposal from Wischik | Although not brought to the IETF, a simple proposal from Wischik | |||
[DupTCP] suggests that the first three packets of every TCP flow | [DupTCP] suggests that the first three packets of every TCP flow | |||
should be routinely duplicated after a short delay. It shows that | should be routinely duplicated after a short delay. It shows that | |||
this would greatly improve the chances of short flows completing | this would greatly improve the chances of short flows completing | |||
quickly, but it would hardly increase traffic levels on the Internet, | quickly, but it would hardly increase traffic levels on the Internet, | |||
because Internet bytes have always been concentrated in the large | because Internet bytes have always been concentrated in the large | |||
flows. It further shows that the performance of many typical | flows. It further shows that the performance of many typical | |||
applications depends on completion of long serial chains of short | applications depends on completion of long serial chains of short | |||
messages. It argues that, given most of the value people get from | messages. It argues that, given most of the value people get from | |||
the Internet is concentrated within short flows, this simple | the Internet is concentrated within short flows, this simple | |||
expedient would greatly increase the value of the best efforts | expedient would greatly increase the value of the best-effort | |||
Internet at minimal cost. A similar but more extensive approach has | Internet at minimal cost. A similar but more extensive approach has | |||
been evaluated on Google servers [GentleAggro]. | been evaluated on Google servers [GentleAggro]. | |||
The proposals discussed in this sub-section are experimental | The proposals discussed in this sub-section are experimental | |||
approaches that are not yet in wide operational use, but they are | approaches that are not yet in wide operational use, but they are | |||
existence proofs that transports can make themselves robust against | existence proofs that transports can make themselves robust against | |||
loss of control packets. The examples are all TCP-based, but | loss of control packets. The examples are all TCP-based, but | |||
applications over non-TCP transports could mitigate loss of control | applications over non-TCP transports could mitigate loss of control | |||
packets by making similar use of Diffserv, data duplication, FEC etc. | packets by making similar use of Diffserv, data duplication, FEC, | |||
etc. | ||||
4.2.4. Congestion Notification: Summary of Conflicting Advice | 4.2.4. Congestion Notification: Summary of Conflicting Advice | |||
+-----------+----------------+-----------------+--------------------+ | +-----------+-----------------+-----------------+-------------------+ | |||
| transport | RED_1 (packet | RED_4 (linear | RED_5 (square byte | | | transport | RED_1 (packet- | RED_4 (linear | RED_5 (square | | |||
| cc | mode drop) | byte mode drop) | mode drop) | | | cc | mode drop) | byte-mode drop) | byte-mode drop) | | |||
+-----------+----------------+-----------------+--------------------+ | +-----------+-----------------+-----------------+-------------------+ | |||
| TCP or | s/sqrt(p) | sqrt(s/p) | 1/sqrt(p) | | | TCP or | s/sqrt(p) | sqrt(s/p) | 1/sqrt(p) | | |||
| TFRC | | | | | | TFRC | | | | | |||
| TFRC-SP | 1/sqrt(p) | 1/sqrt(sp) | 1/(s.sqrt(p)) | | | TFRC-SP | 1/sqrt(p) | 1/sqrt(s*p) | 1/(s*sqrt(p)) | | |||
+-----------+----------------+-----------------+--------------------+ | +-----------+-----------------+-----------------+-------------------+ | |||
Table 2: Dependence of flow bit-rate per RTT on packet size, s, and | Table 2: Dependence of flow bit rate per RTT on packet size, s, and | |||
drop probability, p, when network and/or transport bias towards small | drop probability, p, when there is network and/or transport bias | |||
packets to varying degrees | towards small packets to varying degrees | |||
Table 2 aims to summarise the potential effects of all the advice | Table 2 aims to summarise the potential effects of all the advice | |||
from different sources. Each column shows a different possible AQM | from different sources. Each column shows a different possible AQM | |||
behaviour in different queues in the network, using the terminology | behaviour in different queues in the network, using the terminology | |||
of Cnodder et al outlined earlier (RED_1 is basic RED with packet- | of Cnodder et al. outlined earlier (RED_1 is basic RED with packet- | |||
mode drop). Each row shows a different transport behaviour: TCP | mode drop). Each row shows a different transport behaviour: TCP | |||
[RFC5681] and TFRC [RFC5348] on the top row with TFRC-SP [RFC4828] | [RFC5681] and TFRC [RFC5348] on the top row with TFRC-SP [RFC4828] | |||
below. Each cell shows how the bits per round trip of a flow depends | below. Each cell shows how the bits per round trip of a flow depends | |||
on packet size, s, and drop probability, p. In order to declutter | on packet size, s, and drop probability, p. In order to declutter | |||
the formulae to focus on packet-size dependence they are all given | the formulae to focus on packet-size dependence, they are all given | |||
per round trip, which removes any RTT term. | per round trip, which removes any RTT term. | |||
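The six cells of Table 2 can be reproduced numerically with a short
Python sketch. It assumes the simplified square-root law (packets per
round trip proportional to 1/sqrt(p), constant omitted) and treats s
as packet size relative to the maximum, so 0 < s <= 1. The function
names are hypothetical.

```python
import math

# Power of relative packet size s applied to the drop probability.
AQM_SIZE_EXPONENT = {"RED_1": 0, "RED_4": 1, "RED_5": 2}


def seen_drop_prob(p, s, aqm):
    """Drop probability experienced by packets of relative size s."""
    return p * s ** AQM_SIZE_EXPONENT[aqm]


def bits_per_rtt(p, s, aqm, transport):
    """Relative bit rate per round trip for one cell of the table."""
    pkts_per_rtt = 1 / math.sqrt(seen_drop_prob(p, s, aqm))
    if transport == "TCP":
        return s * pkts_per_rtt
    if transport == "TFRC-SP":
        # The 1/s inflation by the transport cancels the packet size s.
        return pkts_per_rtt
    raise ValueError("unknown transport: " + transport)


p, s = 0.01, 0.25
table = {
    (transport, aqm): bits_per_rtt(p, s, aqm, transport)
    for transport in ("TCP", "TFRC-SP")
    for aqm in ("RED_1", "RED_4", "RED_5")
}
```

Only the top-right and bottom-left entries evaluate to 1/sqrt(p),
i.e., they are the only two cells that do not depend on s.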
Let us assume that the goal is for the bit-rate of a flow to be | Let us assume that the goal is for the bit rate of a flow to be | |||
independent of packet size. Suppressing all inessential details, the | independent of packet size. Suppressing all inessential details, the | |||
table shows that this should either be achievable by not altering the | table shows that this should either be achievable by not altering the | |||
TCP transport in a RED_5 network, or using the small packet TFRC-SP | TCP transport in a RED_5 network, or using the small packet TFRC-SP | |||
transport (or similar) in a network without any byte-mode dropping | transport (or similar) in a network without any byte-mode dropping | |||
RED (top right and bottom left). Top left is the `do nothing' | RED (top right and bottom left). Top left is the 'do nothing' | |||
scenario, while bottom right is the `do-both' scenario in which bit- | scenario, while bottom right is the 'do both' scenario in which the | |||
rate would become far too biased towards small packets. Of course, | bit rate would become far too biased towards small packets. Of | |||
if any form of byte-mode dropping RED has been deployed on a subset | course, if any form of byte-mode dropping RED has been deployed on a | |||
of queues that congest, each path through the network will present a | subset of queues that congest, each path through the network will | |||
different hybrid scenario to its transport. | present a different hybrid scenario to its transport. | |||
Whatever, we can see that the linear byte-mode drop column in the | Whatever the case, we can see that the linear byte-mode drop column | |||
middle would considerably complicate the Internet. It's a half-way | in the middle would considerably complicate the Internet. Even if | |||
house that doesn't bias enough towards small packets even if one | one believes the network should be doing the biasing, linear byte- | |||
believes the network should be doing the biasing. Section 2 | mode drop is a half-way house that doesn't bias enough towards small | |||
recommends that _all_ bias in network equipment towards small packets | packets. Section 2 recommends that _all_ bias in network equipment | |||
should be turned off--if indeed any equipment vendors have | towards small packets should be turned off -- if indeed any equipment | |||
implemented it--leaving packet-size bias solely as the preserve of | vendors have implemented it -- leaving packet-size bias solely as the | |||
the transport layer (solely the leftmost, packet-mode drop column). | preserve of the transport layer (solely the leftmost, packet-mode | |||
drop column). | ||||
In practice it seems that no deliberate bias towards small packets | In practice, it seems that no deliberate bias towards small packets | |||
has been implemented for production networks. Of the 19% of vendors | has been implemented for production networks. Of the 19% of vendors | |||
who responded to a survey of 84 equipment vendors, none had | who responded to a survey of 84 equipment vendors, none had | |||
implemented byte-mode drop in RED (see Appendix A for details). | implemented byte-mode drop in RED (see Appendix A for details). | |||
5. Outstanding Issues and Next Steps | 5. Outstanding Issues and Next Steps | |||
5.1. Bit-congestible Network | 5.1. Bit-congestible Network | |||
For a connectionless network with nearly all resources being bit- | For a connectionless network with nearly all resources being bit- | |||
congestible the recommended position is clear--that the network | congestible, the recommended position is clear -- the network should | |||
should not make allowance for packet sizes and the transport should. | not make allowance for packet sizes and the transport should. This | |||
This leaves two outstanding issues: | leaves two outstanding issues: | |||
o How to handle any legacy of AQM with byte-mode drop already | o The question of how to handle any legacy AQM deployments using | |||
deployed; | byte-mode drop; | |||
o The need to start a programme to update transport congestion | o The need to start a programme to update transport congestion | |||
control protocol standards to take account of packet size. | control protocol standards to take packet size into account. | |||
A survey of equipment vendors (Section 4.2.4) found no evidence that | A survey of equipment vendors (Section 4.2.4) found no evidence that | |||
byte-mode packet drop had been implemented, so deployment will be | byte-mode packet drop had been implemented, so deployment will be | |||
sparse at best. A migration strategy is not really needed to remove | sparse at best. A migration strategy is not really needed to remove | |||
an algorithm that may not even be deployed. | an algorithm that may not even be deployed. | |||
A programme of experimental updates to take account of packet size in | A programme of experimental updates to take packet size into account | |||
transport congestion control protocols has already started with | in transport congestion control protocols has already started with | |||
TFRC-SP [RFC4828]. | TFRC-SP [RFC4828]. | |||
5.2. Bit- & Packet-congestible Network | 5.2. Bit- and Packet-Congestible Network | |||
The position is much less clear-cut if the Internet becomes populated | The position is much less clear-cut if the Internet becomes populated | |||
by a more even mix of both packet-congestible and bit-congestible | by a more even mix of both packet-congestible and bit-congestible | |||
resources (see Appendix B.2). This problem is not pressing, because | resources (see Appendix B.2). This problem is not pressing, because | |||
most Internet resources are designed to be bit-congestible before | most Internet resources are designed to be bit-congestible before | |||
packet processing starts to congest (see Section 1.1). | packet processing starts to congest (see Section 1.1). | |||
The IRTF Internet congestion control research group (ICCRG) has set | The IRTF's Internet Congestion Control Research Group (ICCRG) has set | |||
itself the task of reaching consensus on generic forwarding | itself the task of reaching consensus on generic forwarding | |||
mechanisms that are necessary and sufficient to support the | mechanisms that are necessary and sufficient to support the | |||
Internet's future congestion control requirements (the first | Internet's future congestion control requirements (the first | |||
challenge in [RFC6077]). The research question of whether packet | challenge in [RFC6077]). The research question of whether packet | |||
congestion might become common and what to do if it does may in the | congestion might become common and what to do if it does may in the | |||
future be explored in the IRTF (the "Challenge 3: Packet Size" in | future be explored in the IRTF (the "Challenge 3: Packet Size" in | |||
[RFC6077]). | [RFC6077]). | |||
Note that sometimes it seems that resources might be congested by | Note that sometimes it seems that resources might be congested by | |||
neither bits nor packets, e.g. where the queue for access to a | neither bits nor packets, e.g., where the queue for access to a | |||
wireless medium is in units of transmission opportunities. However, | wireless medium is in units of transmission opportunities. However, | |||
the root cause of congestion of the underlying spectrum is overload | the root cause of congestion of the underlying spectrum is overload | |||
of bits (see Section 4.1.2). | of bits (see Section 4.1.2). | |||
6. Security Considerations | 6. Security Considerations | |||
This memo recommends that queues do not bias drop probability due to | This memo recommends that queues do not bias drop probability due to | |||
packet size. For instance dropping small packets less often than | packet size. For instance, dropping small packets less often than | |||
large creates a perverse incentive for transports to break down their | large ones creates a perverse incentive for transports to break down | |||
flows into tiny segments. One of the benefits of implementing AQM | their flows into tiny segments. One of the benefits of implementing | |||
was meant to be to remove this perverse incentive that drop-tail | AQM was meant to be to remove this perverse incentive that tail-drop | |||
queues gave to small packets. | queues gave to small packets. | |||
In practice, transports cannot all be trusted to respond to | In practice, transports cannot all be trusted to respond to | |||
congestion. So another reason for recommending that queues do not | congestion. So another reason for recommending that queues not bias | |||
bias drop probability towards small packets is to avoid the | drop probability towards small packets is to avoid the vulnerability | |||
vulnerability to small packet DDoS attacks that would otherwise | to small-packet DDoS attacks that would otherwise result. One of the | |||
result. One of the benefits of implementing AQM was meant to be to | benefits of implementing AQM was meant to be to remove tail drop's | |||
remove drop-tail's DoS vulnerability to small packets, so we | DoS vulnerability to small packets, so we shouldn't add it back | |||
shouldn't add it back again. | again. | |||
If most queues implemented AQM with byte-mode drop, the resulting | If most queues implemented AQM with byte-mode drop, the resulting | |||
network would amplify the potency of a small packet DDoS attack. At | network would amplify the potency of a small-packet DDoS attack. At | |||
the first queue the stream of packets would push aside a greater | the first queue, the stream of packets would push aside a greater | |||
proportion of large packets, so more of the small packets would | proportion of large packets, so more of the small packets would | |||
survive to attack the next queue. Thus a flood of small packets | survive to attack the next queue. Thus a flood of small packets | |||
would continue on towards the destination, pushing regular traffic | would continue on towards the destination, pushing regular traffic | |||
with large packets out of the way in one queue after the next, but | with large packets out of the way in one queue after the next, but | |||
suffering much less drop itself. | suffering much less drop itself. | |||
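The amplification can be sketched with a deliberately crude model. We
assume that every queue on the path applies the same base drop
probability, that byte-mode drop scales it linearly by packet size
over maximum packet size, and that drops at successive queues are
independent. The function name and all numbers are hypothetical.

```python
def survival_fraction(base_drop, pkt_size, max_size, hops, byte_mode):
    """Fraction of packets of a given size surviving a chain of queues."""
    if byte_mode:
        drop = base_drop * pkt_size / max_size
    else:
        drop = base_drop
    return (1 - drop) ** hops


BASE_DROP, MAX_SIZE, HOPS = 0.3, 1500, 5

# Byte-mode drop: small attack packets versus large regular packets.
small_byte_mode = survival_fraction(BASE_DROP, 60, MAX_SIZE, HOPS, True)
large_byte_mode = survival_fraction(BASE_DROP, 1500, MAX_SIZE, HOPS, True)

# Packet-mode drop: size makes no difference.
small_pkt_mode = survival_fraction(BASE_DROP, 60, MAX_SIZE, HOPS, False)
large_pkt_mode = survival_fraction(BASE_DROP, 1500, MAX_SIZE, HOPS, False)
```

Under these assumptions most of the small packets survive all five
byte-mode queues while most of the large packets do not, whereas
packet-mode drop treats both sizes identically.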
Appendix C explains why the ability of networks to police the | Appendix C explains why the ability of networks to police the | |||
response of _any_ transport to congestion depends on bit-congestible | response of _any_ transport to congestion depends on bit-congestible | |||
network resources only doing packet-mode not byte-mode drop. In | network resources only doing packet-mode drop, not byte-mode drop. | |||
summary, it says that making drop probability depend on the size of | In summary, it says that making drop probability depend on the size | |||
the packets that bits happen to be divided into simply encourages the | of the packets that bits happen to be divided into simply encourages | |||
bits to be divided into smaller packets. Byte-mode drop would | the bits to be divided into smaller packets. Byte-mode drop would | |||
therefore irreversibly complicate any attempt to fix the Internet's | therefore irreversibly complicate any attempt to fix the Internet's | |||
incentive structures. | incentive structures. | |||
7. IANA Considerations | 7. Conclusions | |||
This document has no actions for IANA. | ||||
8. Conclusions | ||||
This memo identifies the three distinct stages of the congestion | This memo identifies the three distinct stages of the congestion | |||
notification process where implementations need to decide whether to | notification process where implementations need to decide whether to | |||
take packet size into account. The recommendations provided in | take packet size into account. The recommendations provided in | |||
Section 2 of this memo are different in each case: | Section 2 of this memo are different in each case: | |||
o When network equipment measures the length of a queue, if it is | o When network equipment measures the length of a queue, if it is | |||
not feasible to use time it is recommended to count in bytes if | not feasible to use time; it is recommended to count in bytes if | |||
the network resource is congested by bytes, or to count in packets | the network resource is congested by bytes, or to count in packets | |||
if it is congested by packets. | if it is congested by packets. | |||
o When network equipment decides whether to drop (or mark) a packet, | o When network equipment decides whether to drop (or mark) a packet, | |||
it is recommended that the size of the particular packet should | it is recommended that the size of the particular packet should | |||
not be taken into account | not be taken into account. | |||
o However, when a transport algorithm responds to a dropped or | o However, when a transport algorithm responds to a dropped or | |||
marked packet, the size of the rate reduction should be | marked packet, the size of the rate reduction should be | |||
proportionate to the size of the packet. | proportionate to the size of the packet. | |||
In summary, the answers are 'it depends', 'no' and 'yes' respectively | In summary, the answers are 'it depends', 'no', and 'yes', | |||
respectively. | ||||
For the specific case of RED, this means that byte-mode queue | For the specific case of RED, this means that byte-mode queue | |||
measurement will often be appropriate but the use of byte-mode drop | measurement will often be appropriate, but the use of byte-mode drop | |||
is very strongly discouraged. | is very strongly discouraged. | |||
At the transport layer the IETF should continue updating congestion | At the transport layer, the IETF should continue updating congestion | |||
control protocols to take account of the size of each packet that | control protocols to take into account the size of each packet that | |||
indicates congestion. Also the IETF should continue to make | indicates congestion. Also, the IETF should continue to make | |||
protocols less sensitive to losing control packets like SYNs, pure | protocols less sensitive to losing control packets like SYNs, pure | |||
ACKs and DNS exchanges. Although many control packets happen to be | ACKs, and DNS exchanges. Although many control packets happen to be | |||
small, the alternative of network equipment favouring all small | small, the alternative of network equipment favouring all small | |||
packets would be dangerous. That would create perverse incentives to | packets would be dangerous. That would create perverse incentives to | |||
split data transfers into smaller packets. | split data transfers into smaller packets. | |||
The memo develops these recommendations from principled arguments | The memo develops these recommendations from principled arguments | |||
concerning scaling, layering, incentives, inherent efficiency, | concerning scaling, layering, incentives, inherent efficiency, | |||
security and policeability. But it also addresses practical issues | security, and 'policeability'. It also addresses practical issues | |||
such as specific buffer architectures and incremental deployment. | such as specific buffer architectures and incremental deployment. | |||
Indeed a limited survey of RED implementations is discussed, which | Indeed, a limited survey of RED implementations is discussed, which | |||
shows there appears to be little, if any, installed base of RED's | shows there appears to be little, if any, installed base of RED's | |||
byte-mode drop. Therefore it can be deprecated with little, if any, | byte-mode drop. Therefore, it can be deprecated with little, if any, | |||
incremental deployment complications. | incremental deployment complications. | |||
The recommendations have been developed on the well-founded basis | The recommendations have been developed on the well-founded basis | |||
that most Internet resources are bit-congestible not packet- | that most Internet resources are bit-congestible, not packet- | |||
congestible. We need to know the likelihood that this assumption | congestible. We need to know the likelihood that this assumption | |||
will prevail longer term and, if it might not, what protocol changes | will prevail in the longer term and, if it might not, what protocol | |||
will be needed to cater for a mix of the two. The IRTF Internet | changes will be needed to cater for a mix of the two. The IRTF | |||
Congestion Control Research Group (ICCRG) is currently working on | Internet Congestion Control Research Group (ICCRG) is currently | |||
these problems [RFC6077]. | working on these problems [RFC6077]. | |||
9. Acknowledgements | 8. Acknowledgements | |||
Thank you to Sally Floyd, who gave extensive and useful review | Thank you to Sally Floyd, who gave extensive and useful review | |||
comments. Also thanks for the reviews from Philip Eardley, David | comments. Also thanks for the reviews from Philip Eardley, David | |||
Black, Fred Baker, David Taht, Toby Moncaster, Arnaud Jacquet and | Black, Fred Baker, David Taht, Toby Moncaster, Arnaud Jacquet, and | |||
Mirja Kuehlewind as well as helpful explanations of different | Mirja Kuehlewind, as well as helpful explanations of different | |||
hardware approaches from Larry Dunn and Fred Baker. We are grateful | hardware approaches from Larry Dunn and Fred Baker. We are grateful | |||
to Bruce Davie and his colleagues for providing a timely and | to Bruce Davie and his colleagues for providing a timely and | |||
efficient survey of RED implementation in Cisco's product range. | efficient survey of RED implementation in Cisco's product range. | |||
Also grateful thanks to Toby Moncaster, Will Dormann, John Regnault, | Also, grateful thanks to Toby Moncaster, Will Dormann, John Regnault, | |||
Simon Carter and Stefaan De Cnodder who further helped survey the | Simon Carter, and Stefaan De Cnodder who further helped survey the | |||
current status of RED implementation and deployment and, finally, | current status of RED implementation and deployment, and, finally, | |||
thanks to the anonymous individuals who responded. | thanks to the anonymous individuals who responded. | |||
Bob Briscoe and Jukka Manner were partly funded by Trilogy, a | Bob Briscoe and Jukka Manner were partly funded by Trilogy and | |||
research project (ICT- 216372) supported by the European Community | Trilogy 2, research projects (ICT-216372, ICT-317756) supported by | |||
under its Seventh Framework Programme. The views expressed here are | the European Community under its Seventh Framework Programme. The | |||
those of the authors only. | views expressed here are those of the authors only. | |||
10. Comments Solicited | ||||
Comments and questions are encouraged and very welcome. They can be | ||||
addressed to the IETF Transport Area working group mailing list | ||||
<tsvwg@ietf.org>, and/or to the authors. | ||||
11. References | 9. References | |||
11.1. Normative References | 9.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Indicate Requirement Levels", BCP 14, | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
RFC 2119, March 1997. | ||||
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, | [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | |||
"The Addition of Explicit Congestion | S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., | |||
Notification (ECN) to IP", RFC 3168, | Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | |||
September 2001. | S., Wroclawski, J., and L. Zhang, "Recommendations on | |||
Queue Management and Congestion Avoidance in the | ||||
Internet", RFC 2309, April 1998. | ||||
11.2. Informative References | [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, RFC | |||
2914, September 2000. | ||||
[BLUE02] Feng, W-c., Shin, K., Kandlur, D., and D. | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
Saha, "The BLUE active queue management | of Explicit Congestion Notification (ECN) to IP", RFC | |||
algorithms", IEEE/ACM Transactions on | 3168, September 2001. | |||
Networking 10(4) 513--528, August 2002, <h | ||||
ttp://dx.doi.org/10.1109/ | ||||
TNET.2002.801399>. | ||||
[CCvarPktSize] Widmer, J., Boutremans, C., and J-Y. Le | 9.2. Informative References | |||
Boudec, "Congestion Control for Flows with | ||||
Variable Packet Size", ACM CCR 34(2) 137-- | ||||
151, 2004, | ||||
<http://doi.acm.org/10.1145/ | ||||
997150.997162>. | ||||
[CHOKe_Var_Pkt] Psounis, K., Pan, R., and B. Prabhaker, | [BLUE02] Feng, W-c., Shin, K., Kandlur, D., and D. Saha, "The BLUE | |||
"Approximate Fair Dropping for Variable | active queue management algorithms", IEEE/ACM Transactions | |||
Length Packets", IEEE Micro 21(1):48--56, | on Networking 10(4) 513-528, August 2002, | |||
January-February 2001, <http:// | <http://dx.doi.org/10.1109/TNET.2002.801399>. | |||
www.stanford.edu/~balaji/papers/ | ||||
01approximatefair.pdf}>. | ||||
[DRQ] Shin, M., Chong, S., and I. Rhee, "Dual- | [CCvarPktSize] | |||
Resource TCP/AQM for Processing- | Widmer, J., Boutremans, C., and J-Y. Le Boudec, "End-to- | |||
Constrained Networks", IEEE/ACM | end congestion control for TCP-friendly flows with | |||
Transactions on Networking Vol 16, issue | variable packet size", ACM CCR 34(2) 137-151, April 2004, | |||
2, April 2008, <http://dx.doi.org/10.1109/ | <http://doi.acm.org/10.1145/997150.997162>. | |||
TNET.2007.900415>. | ||||
[DupTCP] Wischik, D., "Short messages", | [CHOKe_Var_Pkt] | |||
Philosphical Transactions of the Royal | Psounis, K., Pan, R., and B. Prabhaker, "Approximate Fair | |||
Society A 366(1872):1941-1953, June 2008, | Dropping for Variable-Length Packets", IEEE Micro | |||
<http://rsta.royalsocietypublishing.org/ | 21(1):48-56, January-February 2001, | |||
content/366/1872/1941.full.pdf+html>. | <http://ieeexplore.ieee.org/xpl/ | |||
articleDetails.jsp?arnumber=903061>. | ||||
[ECNFixedWireless] Siris, V., "Resource Control for Elastic | [CoDel] Nichols, K. and V. Jacobson, "Controlled Delay Active | |||
Traffic in CDMA Networks", Proc. ACM | Queue Management", Work in Progress, February 2013. | |||
MOBICOM'02 , September 2002, <http:// | ||||
www.ics.forth.gr/netlab/publications/ | ||||
resource_control_elastic_cdma.html>. | ||||
[Evol_cc] Gibbens, R. and F. Kelly, "Resource | [DRQ] Shin, M., Chong, S., and I. Rhee, "Dual-Resource TCP/AQM | |||
pricing and the evolution of congestion | for Processing-Constrained Networks", IEEE/ACM | |||
control", Automatica 35(12)1969--1985, | Transactions on Networking Vol 16, issue 2, April 2008, | |||
December 1999, <http:// | <http://dx.doi.org/10.1109/TNET.2007.900415>. | |||
www.statslab.cam.ac.uk/~frank/evol.html>. | ||||
[GentleAggro] Flach, T., Dukkipati, N., Terzis, A., | [DupTCP] Wischik, D., "Short messages", Philosophical Transactions | |||
Raghavan, B., Cardwell, N., Cheng, Y., | of the Royal Society A 366(1872):1941-1953, June 2008, | |||
Jain, A., Hao, S., Katz-Bassett, E., and | <http://rsta.royalsocietypublishing.org/content/366/1872/ | |||
R. Govindan, "Reducing Web Latency: the | 1941.full.pdf+html>. | |||
Virtue of Gentle Aggression", ACM SIGCOMM | ||||
CCR 43(4)159--170, August 2013, <http:// | ||||
doi.acm.org/10.1145/2486001.2486014>. | ||||
[I-D.nichols-tsvwg-codel] Nichols, K. and V. Jacobson, "Controlled | [ECNFixedWireless] | |||
Delay Active Queue Management", | Siris, V., "Resource Control for Elastic Traffic in CDMA | |||
draft-nichols-tsvwg-codel-01 (work in | Networks", Proc. ACM MOBICOM'02 , September 2002, | |||
progress), February 2013. | <http://www.ics.forth.gr/netlab/publications/ | |||
resource_control_elastic_cdma.html>. | ||||
[I-D.pan-tsvwg-pie] Pan, R., Natarajan, P., Piglione, C., and | [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the | |||
M. Prabhu, "PIE: A Lightweight Control | evolution of congestion control", Automatica | |||
Scheme To Address the Bufferbloat | 35(12)1969-1985, December 1999, | |||
Problem", draft-pan-tsvwg-pie-00 (work in | <http://www.sciencedirect.com/science/article/pii/ | |||
progress), December 2012. | S0005109899001351>. | |||
[IOSArch] Bollapragada, V., White, R., and C. | [GentleAggro] | |||
Murphy, "Inside Cisco IOS Software | Flach, T., Dukkipati, N., Terzis, A., Raghavan, B., | |||
Architecture", Cisco Press: CCIE | Cardwell, N., Cheng, Y., Jain, A., Hao, S., Katz-Bassett, | |||
Professional Development ISBN13: 978-1- | E., and R. Govindan, "Reducing web latency: the virtue of | |||
57870-181-0, July 2000. | gentle aggression", ACM SIGCOMM CCR 43(4)159-170, August | |||
2013, <http://doi.acm.org/10.1145/2486001.2486014>. | ||||
[PktSizeEquCC] Vasallo, P., "Variable Packet Size | [IOSArch] Bollapragada, V., White, R., and C. Murphy, "Inside Cisco | |||
Equation-Based Congestion Control", ICSI | IOS Software Architecture", Cisco Press: CCIE Professional | |||
Technical Report tr-00-008, 2000, <http:// | Development ISBN13: 978-1-57870-181-0, July 2000. | |||
http.icsi.berkeley.edu/ftp/global/pub/ | ||||
techreports/2000/tr-00-008.pdf>. | ||||
[RED93] Floyd, S. and V. Jacobson, "Random Early | [PIE] Pan, R., Natarajan, P., Piglione, C., Prabhu, M., | |||
Detection (RED) gateways for Congestion | Subramanian, V., Baker, F., and B. Steeg, "PIE: A | |||
Avoidance", IEEE/ACM Transactions on | Lightweight Control Scheme To Address the Bufferbloat | |||
Networking 1(4) 397--413, August 1993, <ht | Problem", Work in Progress, February 2014. | |||
tp://www.icir.org/floyd/papers/red/ | ||||
red.html>. | ||||
[REDbias] Eddy, W. and M. Allman, "A Comparison of | [PktSizeEquCC] | |||
RED's Byte and Packet Modes", Computer | Vasallo, P., "Variable Packet Size Equation-Based | |||
Networks 42(3) 261--280, June 2003, <http: | Congestion Control", ICSI Technical Report tr-00-008, | |||
//www.ir.bbn.com/documents/articles/ | 2000, <http://http.icsi.berkeley.edu/ftp/global/pub/ | |||
redbias.ps>. | techreports/2000/tr-00-008.pdf>. | |||
[REDbyte] De Cnodder, S., Elloumi, O., and K. | [RED93] Floyd, S. and V. Jacobson, "Random Early Detection (RED) | |||
Pauwels, "RED behavior with different | gateways for Congestion Avoidance", IEEE/ACM Transactions | |||
packet sizes", Proc. 5th IEEE Symposium on | on Networking 1(4) 397--413, August 1993, | |||
Computers and Communications (ISCC) 793-- | <http://ieeexplore.ieee.org/xpls/ | |||
799, July 2000, <http://www.icir.org/ | abs_all.jsp?arnumber=251892>. | |||
floyd/red/Elloumi99.pdf>. | ||||
[RFC2309] Braden, B., Clark, D., Crowcroft, J., | [REDbias] Eddy, W. and M. Allman, "A Comparison of RED's Byte and | |||
Davie, B., Deering, S., Estrin, D., Floyd, | Packet Modes", Computer Networks 42(3) 261--280, June | |||
S., Jacobson, V., Minshall, G., Partridge, | 2003, | |||
C., Peterson, L., Ramakrishnan, K., | <http://www.ir.bbn.com/documents/articles/redbias.ps>. | |||
Shenker, S., Wroclawski, J., and L. Zhang, | ||||
"Recommendations on Queue Management and | ||||
Congestion Avoidance in the Internet", | ||||
RFC 2309, April 1998. | ||||
[RFC2474] Nichols, K., Blake, S., Baker, F., and D. | [REDbyte] De Cnodder, S., Elloumi, O., and K. Pauwels, "Effect of | |||
Black, "Definition of the Differentiated | different packet sizes on RED performance", Proc. 5th IEEE | |||
Services Field (DS Field) in the IPv4 and | Symposium on Computers and Communications (ISCC) 793-799, | |||
IPv6 Headers", RFC 2474, December 1998. | July 2000, <http://ieeexplore.ieee.org/xpls/ | |||
abs_all.jsp?arnumber=860741>. | ||||
[RFC2914] Floyd, S., "Congestion Control | [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, | |||
Principles", BCP 41, RFC 2914, | "Definition of the Differentiated Services Field (DS | |||
September 2000. | Field) in the IPv4 and IPv6 Headers", RFC 2474, December | |||
1998. | ||||
[RFC3426] Floyd, S., "General Architectural and | [RFC3426] Floyd, S., "General Architectural and Policy | |||
Policy Considerations", RFC 3426, | Considerations", RFC 3426, November 2002. | |||
November 2002. | ||||
[RFC3550] Schulzrinne, H., Casner, S., Frederick, | [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. | |||
R., and V. Jacobson, "RTP: A Transport | Jacobson, "RTP: A Transport Protocol for Real-Time | |||
Protocol for Real-Time Applications", | Applications", STD 64, RFC 3550, July 2003. | |||
STD 64, RFC 3550, July 2003. | ||||
[RFC3714] Floyd, S. and J. Kempf, "IAB Concerns | [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion | |||
Regarding Congestion Control for Voice | Control for Voice Traffic in the Internet", RFC 3714, | |||
Traffic in the Internet", RFC 3714, | March 2004. | |||
March 2004. | ||||
[RFC4828] Floyd, S. and E. Kohler, "TCP Friendly | [RFC4828] Floyd, S. and E. Kohler, "TCP Friendly Rate Control | |||
Rate Control (TFRC): The Small-Packet (SP) | (TFRC): The Small-Packet (SP) Variant", RFC 4828, April | |||
Variant", RFC 4828, April 2007. | 2007. | |||
[RFC5348] Floyd, S., Handley, M., Padhye, J., and J. | [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP | |||
Widmer, "TCP Friendly Rate Control (TFRC): | Friendly Rate Control (TFRC): Protocol Specification", RFC | |||
Protocol Specification", RFC 5348, | 5348, September 2008. | |||
September 2008. | ||||
[RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and | [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. | |||
K. Ramakrishnan, "Adding Explicit | Ramakrishnan, "Adding Explicit Congestion Notification | |||
Congestion Notification (ECN) Capability | (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, June | |||
to TCP's SYN/ACK Packets", RFC 5562, | 2009. | |||
June 2009. | ||||
[RFC5670] Eardley, P., "Metering and Marking | [RFC5670] Eardley, P., "Metering and Marking Behaviour of PCN- | |||
Behaviour of PCN-Nodes", RFC 5670, | Nodes", RFC 5670, November 2009. | |||
November 2009. | ||||
[RFC5681] Allman, M., Paxson, V., and E. Blanton, | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
"TCP Congestion Control", RFC 5681, | Control", RFC 5681, September 2009. | |||
September 2009. | ||||
[RFC5690] Floyd, S., Arcia, A., Ros, D., and J. | [RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding | |||
Iyengar, "Adding Acknowledgement | Acknowledgement Congestion Control to TCP", RFC 5690, | |||
Congestion Control to TCP", RFC 5690, | February 2010. | |||
February 2010. | ||||
[RFC6077] Papadimitriou, D., Welzl, M., Scharf, M., | [RFC6077] Papadimitriou, D., Welzl, M., Scharf, M., and B. Briscoe, | |||
and B. Briscoe, "Open Research Issues in | "Open Research Issues in Internet Congestion Control", RFC | |||
Internet Congestion Control", RFC 6077, | 6077, February 2011. | |||
February 2011. | ||||
[RFC6679] Westerlund, M., Johansson, I., Perkins, | [RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., | |||
C., O'Hanlon, P., and K. Carlberg, | and K. Carlberg, "Explicit Congestion Notification (ECN) | |||
"Explicit Congestion Notification (ECN) | for RTP over UDP", RFC 6679, August 2012. | |||
for RTP over UDP", RFC 6679, August 2012. | ||||
[RFC6789] Briscoe, B., Woundy, R., and A. Cooper, | [RFC6789] Briscoe, B., Woundy, R., and A. Cooper, "Congestion | |||
"Congestion Exposure (ConEx) Concepts and | Exposure (ConEx) Concepts and Use Cases", RFC 6789, | |||
Use Cases", RFC 6789, December 2012. | December 2012. | |||
[Rate_fair_Dis] Briscoe, B., "Flow Rate Fairness: | [Rate_fair_Dis] | |||
Dismantling a Religion", ACM | Briscoe, B., "Flow Rate Fairness: Dismantling a Religion", | |||
CCR 37(2)63--74, April 2007, <http:// | ACM CCR 37(2)63-74, April 2007, | |||
portal.acm.org/citation.cfm?id=1232926>. | <http://portal.acm.org/citation.cfm?id=1232926>. | |||
[gentle_RED] Floyd, S., "Recommendation on using the | [gentle_RED] | |||
"gentle_" variant of RED", Web page , | Floyd, S., "Recommendation on using the "gentle_" variant | |||
March 2000, <http://www.icir.org/floyd/ | of RED", Web page , March 2000, | |||
red/gentle.html>. | <http://www.icir.org/floyd/red/gentle.html>. | |||
[pBox] Floyd, S. and K. Fall, "Promoting the Use | [pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End | |||
of End-to-End Congestion Control in the | Congestion Control", IEEE/ACM Transactions on Networking | |||
Internet", IEEE/ACM Transactions on | 7(4) 458--472, August 1999, <http://ieeexplore.ieee.org/ | |||
Networking 7(4) 458--472, August 1999, <ht | xpls/abs_all.jsp?arnumber=793002>. | |||
tp://www.aciri.org/floyd/ | ||||
end2end-paper.html>. | ||||
[pktByteEmail] Floyd, S., "RED: Discussions of Byte and | [pktByteEmail] | |||
Packet Modes", email , March 1997, <http:/ | Floyd, S., "RED: Discussions of Byte and Packet Modes", | |||
/www-nrg.ee.lbl.gov/floyd/ | email, March 1997, | |||
REDaveraging.txt>. | <http://ee.lbl.gov/floyd/REDaveraging.txt>. | |||
Appendix A. Survey of RED Implementation Status | Appendix A. Survey of RED Implementation Status | |||
This Appendix is informative, not normative. | This Appendix is informative, not normative. | |||
In May 2007 a survey was conducted of 84 vendors to assess how widely | In May 2007 a survey was conducted of 84 vendors to assess how widely | |||
drop probability based on packet size has been implemented in RED | drop probability based on packet size has been implemented in RED | |||
(Table 3). About 19% of those surveyed replied, giving a sample size | (Table 3). About 19% of those surveyed replied, giving a sample size | |||
of 16. Although in most cases we do not have permission to identify | of 16. Although in most cases we do not have permission to identify | |||
the respondents, we can say that those that have responded include | the respondents, we can say that those that have responded include | |||
skipping to change at page 33, line 18 | skipping to change at page 33, line 26 | |||
and Alcatel-Lucent. The others range across the large network | and Alcatel-Lucent. The others range across the large network | |||
equipment vendors at L3 & L2, firewall vendors, wireless equipment | equipment vendors at L3 & L2, firewall vendors, wireless equipment | |||
vendors, as well as large software businesses with a small selection | vendors, as well as large software businesses with a small selection | |||
of networking products. All those who responded confirmed that they | of networking products. All those who responded confirmed that they | |||
have not implemented the variant of RED with drop dependent on packet | have not implemented the variant of RED with drop dependent on packet | |||
size (2 were fairly sure they had not but needed to check more | size (2 were fairly sure they had not but needed to check more | |||
thoroughly). At the time the survey was conducted, Linux did not | thoroughly). At the time the survey was conducted, Linux did not | |||
implement RED with packet-size bias of drop, although we have not | implement RED with packet-size bias of drop, although we have not | |||
investigated a wider range of open source code. | investigated a wider range of open source code. | |||
+-------------------------------+----------------+-----------------+ | +-------------------------------+----------------+--------------+ | |||
| Response | No. of vendors | %age of vendors | | | Response | No. of vendors | % of vendors | | |||
+-------------------------------+----------------+-----------------+ | +-------------------------------+----------------+--------------+ | |||
| Not implemented | 14 | 17% | | | Not implemented | 14 | 17% | | |||
| Not implemented (probably) | 2 | 2% | | | Not implemented (probably) | 2 | 2% | | |||
| Implemented | 0 | 0% | | | Implemented | 0 | 0% | | |||
| No response | 68 | 81% | | | No response | 68 | 81% | | |||
| Total companies/orgs surveyed | 84 | 100% | | | Total companies/orgs surveyed | 84 | 100% | | |||
+-------------------------------+----------------+-----------------+ | +-------------------------------+----------------+--------------+ | |||
Table 3: Vendor Survey on byte-mode drop variant of RED (lower drop | Table 3: Vendor Survey on byte-mode drop variant of RED (lower drop | |||
probability for small packets) | probability for small packets) | |||
Where reasons have been given, the extra complexity of packet bias | Where reasons were given for why the byte-mode drop variant had not | |||
code has been most prevalent, though one vendor had a more principled | been implemented, the extra complexity of packet-bias code was most | |||
reason for avoiding it--similar to the argument of this document. | prevalent, though one vendor had a more principled reason for | |||
avoiding it -- similar to the argument of this document. | ||||
Our survey was of vendor implementations, so we cannot be certain | Our survey was of vendor implementations, so we cannot be certain | |||
about operator deployment. But we believe many queues in the | about operator deployment. But we believe many queues in the | |||
Internet are still tail-drop. The company of one of the co-authors | Internet are still tail drop. The company of one of the co-authors | |||
(BT) has widely deployed RED, but many tail-drop queues are bound to | (BT) has widely deployed RED; however, many tail-drop queues are | |||
still exist, particularly in access network equipment and on | bound to still exist, particularly in access network equipment and on | |||
middleboxes like firewalls, where RED is not always available. | middleboxes like firewalls, where RED is not always available. | |||
Routers using a memory architecture based on fixed size buffers with | Routers using a memory architecture based on fixed-size buffers with | |||
borrowing may also still be prevalent in the Internet. As explained | borrowing may also still be prevalent in the Internet. As explained | |||
in Section 4.2.1, these also provide a marginal (but legitimate) bias | in Section 4.2.1, these also provide a marginal (but legitimate) bias | |||
towards small packets. So even though RED byte-mode drop is not | towards small packets. So even though RED byte-mode drop is not | |||
prevalent, it is likely there is still some bias towards small | prevalent, it is likely there is still some bias towards small | |||
packets in the Internet due to tail drop and fixed buffer borrowing. | packets in the Internet due to tail-drop and fixed-buffer borrowing. | |||
Appendix B. Sufficiency of Packet-Mode Drop | Appendix B. Sufficiency of Packet-Mode Drop | |||
This Appendix is informative, not normative. | This Appendix is informative, not normative. | |||
Here we check that packet-mode drop (or marking) in the network gives | Here we check that packet-mode drop (or marking) in the network gives | |||
sufficiently generic information for the transport layer to use. We | sufficiently generic information for the transport layer to use. We | |||
check against a 2x2 matrix of four scenarios that may occur now or in | check against a 2x2 matrix of four scenarios that may occur now or in | |||
the future (Table 4). The horizontal and vertical dimensions have | the future (Table 4). Checking the two scenarios in each of the | |||
been chosen because each tests extremes of sensitivity to packet size | horizontal and vertical dimensions tests the extremes of sensitivity | |||
in the transport and in the network respectively. | to packet size in the transport and in the network respectively. | |||
Note that this section does not consider byte-mode drop at all. | Note that this section does not consider byte-mode drop at all. | |||
Having deprecated byte-mode drop, the goal here is to check that | Having deprecated byte-mode drop, the goal here is to check that | |||
packet-mode drop will be sufficient in all cases. | packet-mode drop will be sufficient in all cases. | |||
+-------------------------------+-----------------+-----------------+ | +-------------------------------+-----------------+-----------------+ | |||
| Transport | a) Independent | b) Dependent on | | | Transport -> | a) Independent | b) Dependent on | | |||
| | of packet size | packet size of | | | ----------------------------- | of packet size | packet size of | | |||
| Network | of congestion | congestion | | | Network | of congestion | congestion | | |||
| | notifications | notifications | | | | notifications | notifications | | |||
+-------------------------------+-----------------+-----------------+ | +-------------------------------+-----------------+-----------------+ | |||
| 1) Predominantly | Scenario a1) | Scenario b1) | | | 1) Predominantly bit- | Scenario a1) | Scenario b1) | | |||
| bit-congestible network | | | | | congestible network | | | | |||
| 2) Mix of bit-congestible and | Scenario a2) | Scenario b2) | | | 2) Mix of bit-congestible and | Scenario a2) | Scenario b2) | | |||
| pkt-congestible network | | | | | pkt-congestible network | | | | |||
+-------------------------------+-----------------+-----------------+ | +-------------------------------+-----------------+-----------------+ | |||
Table 4: Four Possible Congestion Scenarios | Table 4: Four Possible Congestion Scenarios | |||
Appendix B.1 focuses on the horizontal dimension of Table 4, checking | Appendix B.1 focuses on the horizontal dimension of Table 4, checking | |||
that packet-mode drop (or marking) gives sufficient information, | that packet-mode drop (or marking) gives sufficient information, | |||
whether or not the transport uses it--scenarios b) and a) | whether or not the transport uses it -- scenarios b) and a) | |||
respectively. | respectively. | |||
Appendix B.2 focuses on the vertical dimension of Table 4, checking | Appendix B.2 focuses on the vertical dimension of Table 4, checking | |||
that packet-mode drop gives sufficient information to the transport | that packet-mode drop gives sufficient information to the transport | |||
whether resources in the network are bit-congestible or packet- | whether resources in the network are bit-congestible or packet- | |||
congestible (these terms are defined in Section 1.1). | congestible (these terms are defined in Section 1.1). | |||
Notation: To be concrete, we will compare two flows with different | Notation: To be concrete, we will compare two flows with different | |||
packet sizes, s_1 and s_2. As an example, we will take s_1 = 60B | packet sizes, s_1 and s_2. As an example, we will take | |||
= 480b and s_2 = 1500B = 12,000b. | s_1 = 60 B = 480 b and s_2 = 1,500 B = 12,000 b. | |||
A flow's bit rate, x [bps], is related to its packet rate, u | A flow's bit rate, x [bps], is related to its packet rate, u | |||
[pps], by | [pps], by | |||
x(t) = s.u(t). | x(t) = s*u(t). | |||
In the bit-congestible case, path congestion will be denoted by | In the bit-congestible case, path congestion will be denoted by | |||
p_b, and in the packet-congestible case by p_p. When either case | p_b, and in the packet-congestible case by p_p. When either case | |||
is implied, the letter p alone will denote path congestion. | is implied, the letter p alone will denote path congestion. | |||
B.1. Packet-Size (In)Dependence in Transports | B.1. Packet-Size (In)Dependence in Transports | |||
In all cases we consider a packet-mode drop queue that indicates | In all cases, we consider a packet-mode drop queue that indicates | |||
congestion by dropping (or marking) packets with probability p | congestion by dropping (or marking) packets with probability p | |||
irrespective of packet size. We use an example value of loss | irrespective of packet size. We use an example value of loss | |||
(marking) probability, p=0.1%. | (marking) probability, p=0.1%. | |||
A transport like RFC5681 TCP treats a congestion notification on any | A transport like TCP as specified in RFC 5681 treats a congestion | |||
packet whatever its size as one event. However, a network with just | notification on any packet whatever its size as one event. However, | |||
the packet-mode drop algorithm does give more information if the | a network with just the packet-mode drop algorithm gives more | |||
transport chooses to use it. We will use Table 5 to illustrate this. | information if the transport chooses to use it. We will use Table 5 | |||
to illustrate this. | ||||
We will set aside the last column until later. The columns labelled | We will set aside the last column until later. The columns labelled | |||
"Flow 1" and "Flow 2" compare two flows consisting of 60B and 1500B | 'Flow 1' and 'Flow 2' compare two flows consisting of 60 B and | |||
packets respectively. The body of the table considers two separate | 1,500 B packets respectively. The body of the table considers two | |||
cases, one where the flows have equal bit-rate and the other with | separate cases, one where the flows have an equal bit rate and the | |||
equal packet-rates. In both cases, the two flows fill a 96Mbps link. | other with equal packet rates. In both cases, the two flows fill a | |||
Therefore, in the equal bit-rate case they each have half the bit- | 96 Mbps link. Therefore, in the equal bit rate case, they each have | |||
rate (48Mbps). Whereas, with equal packet-rates, flow 1 uses 25 | half the bit rate (48 Mbps). Whereas, with equal packet rates, Flow 1 | |||
times smaller packets so it gets 25 times less bit-rate--it only gets | uses 25 times smaller packets so it gets 25 times less bit rate -- it | |||
1/(1+25) of the link capacity (96Mbps/26 = 4Mbps after rounding). In | only gets 1/(1+25) of the link capacity (96 Mbps / 26 = 4 Mbps after | |||
contrast, flow 2 gets 25 times more bit-rate (92Mbps) in the equal | rounding). In contrast, Flow 2 gets 25 times more bit rate (92 Mbps) | |||
packet rate case because its packets are 25 times larger. The packet | in the equal packet rate case because its packets are 25 times | |||
rate shown for each flow could easily be derived once the bit-rate | larger. The packet rate shown for each flow could easily be derived | |||
was known by dividing bit-rate by packet size, as shown in the column | once the bit rate was known by dividing the bit rate by packet size, | |||
labelled "Formula". | as shown in the column labelled 'Formula'. | |||
Parameter Formula Flow 1 Flow 2 Combined | Parameter Formula Flow 1 Flow 2 Combined | |||
----------------------- ----------- ------- ------- -------- | ----------------------- ----------- -------- -------- -------- | |||
Packet size s/8 60B 1,500B (Mix) | Packet size s/8 60 B 1,500 B (Mix) | |||
Packet size s 480b 12,000b (Mix) | Packet size s 480 b 12,000 b (Mix) | |||
Pkt loss probability p 0.1% 0.1% 0.1% | Pkt loss probability p 0.1% 0.1% 0.1% | |||
EQUAL BIT-RATE CASE | EQUAL BIT RATE CASE | |||
Bit-rate x 48Mbps 48Mbps 96Mbps | Bit rate x 48 Mbps 48 Mbps 96 Mbps | |||
Packet-rate u = x/s 100kpps 4kpps 104kpps | Packet rate u = x/s 100 kpps 4 kpps 104 kpps | |||
Absolute pkt-loss-rate p*u 100pps 4pps 104pps | Absolute pkt-loss rate p*u 100 pps 4 pps 104 pps | |||
Absolute bit-loss-rate p*u*s 48kbps 48kbps 96kbps | Absolute bit-loss rate p*u*s 48 kbps 48 kbps 96 kbps | |||
Ratio of lost/sent pkts p*u/u 0.1% 0.1% 0.1% | Ratio of lost/sent pkts p*u/u 0.1% 0.1% 0.1% | |||
Ratio of lost/sent bits p*u*s/(u*s) 0.1% 0.1% 0.1% | Ratio of lost/sent bits p*u*s/(u*s) 0.1% 0.1% 0.1% | |||
EQUAL PACKET-RATE CASE | EQUAL PACKET RATE CASE | |||
Bit-rate x 4Mbps 92Mbps 96Mbps | Bit rate x 4 Mbps 92 Mbps 96 Mbps | |||
Packet-rate u = x/s 8kpps 8kpps 15kpps | Packet rate u = x/s 8 kpps 8 kpps 15 kpps | |||
Absolute pkt-loss-rate p*u 8pps 8pps 15pps | Absolute pkt-loss rate p*u 8 pps 8 pps 15 pps | |||
Absolute bit-loss-rate p*u*s 4kbps 92kbps 96kbps | Absolute bit-loss rate p*u*s 4 kbps 92 kbps 96 kbps | |||
Ratio of lost/sent pkts p*u/u 0.1% 0.1% 0.1% | Ratio of lost/sent pkts p*u/u 0.1% 0.1% 0.1% | |||
Ratio of lost/sent bits p*u*s/(u*s) 0.1% 0.1% 0.1% | Ratio of lost/sent bits p*u*s/(u*s) 0.1% 0.1% 0.1% | |||
Table 5: Absolute Loss Rates and Loss Ratios for Flows of Small and | Table 5: Absolute Loss Rates and Loss Ratios for Flows of Small and | |||
Large Packets and Both Combined | Large Packets and Both Combined | |||
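The figures in Table 5 can be reproduced with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the original analysis; the names flow_rows, LINK_BPS, equal_bit and equal_pkt are assumptions introduced here. It applies the expressions in the 'Formula' column to both cases and prints unrounded results rounded to the nearest unit.

```python
LINK_BPS = 96e6            # link capacity filled by the two flows [bps]
S1, S2 = 480, 12_000       # packet sizes s_1 and s_2 [bits]
P = 0.001                  # loss (marking) probability, p = 0.1%


def flow_rows(x, s, p=P):
    """Return Table 5 rows for one flow with bit rate x [bps] and packet size s [b]."""
    u = x / s                        # packet rate, u = x/s [pps]
    return {
        "bit_rate": x,
        "pkt_rate": u,
        "pkt_loss_rate": p * u,      # absolute packet-loss rate, p*u [pps]
        "bit_loss_rate": p * u * s,  # absolute bit-loss rate, p*u*s [bps]
    }


# Equal bit rate case: each flow gets half the link.
equal_bit = [flow_rows(LINK_BPS / 2, s) for s in (S1, S2)]

# Equal packet rate case: both flows send u_common packets per second,
# where u_common*S1 + u_common*S2 = LINK_BPS.
u_common = LINK_BPS / (S1 + S2)
equal_pkt = [flow_rows(u_common * s, s) for s in (S1, S2)]

for name, case in (("equal bit rate", equal_bit), ("equal packet rate", equal_pkt)):
    for i, row in enumerate(case, start=1):
        print(name, "flow", i, {k: round(v) for k, v in row.items()})
```

Before rounding, the equal packet rate case gives about 3.69 Mbps and 92.31 Mbps at about 7.69 kpps per flow, which is why the combined packet rate in the table rounds to 15 kpps rather than 16 kpps.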
So far we have merely set up the scenarios. We now consider | So far, we have merely set up the scenarios. We now consider | |||
congestion notification in the scenario. Two TCP flows with the same | congestion notification in the scenario. Two TCP flows with the same | |||
round trip time aim to equalise their packet-loss-rates over time. | round-trip time aim to equalise their packet-loss rates over time; | |||
That is the number of packets lost in a second, which is the packets | that is, the number of packets lost in a second, which is the packets | |||
per second (u) multiplied by the probability that each one is dropped | per second (u) multiplied by the probability that each one is dropped | |||
(p). Thus TCP converges on the "Equal packet-rate" case, where both | (p). Thus, TCP converges on the case labelled 'Equal packet rate' in | |||
flows aim for the same "Absolute packet-loss-rate" (both 8pps in the | the table, where both flows aim for the same absolute packet-loss | |||
table). | rate (both 8 pps in the table). | |||
Packet-mode drop actually gives flows sufficient information to | Packet-mode drop actually gives flows sufficient information to | |||
measure their loss-rate in bits per second, if they choose, not just | measure their loss rate in bits per second, if they choose, not just | |||
packets per second. Each flow can count the size of a lost or marked | packets per second. Each flow can count the size of a lost or marked | |||
packet and scale its rate-response in proportion (as TFRC-SP does). | packet and scale its rate response in proportion (as TFRC-SP does). | |||
The result is shown in the row entitled "Absolute bit-loss-rate", | The result is shown in the row entitled 'Absolute bit-loss rate', | |||
where the bits lost in a second is the packets per second (u) | where the bits lost in a second is the packets per second (u) | |||
multiplied by the probability of losing a packet (p) multiplied by | multiplied by the probability of losing a packet (p) multiplied by | |||
the packet size (s). Such an algorithm would try to remove any | the packet size (s). Such an algorithm would try to remove any | |||
imbalance in bit-loss-rate such as the wide disparity in the "Equal | imbalance in the bit-loss rate such as the wide disparity in the case | |||
packet-rate" case (4kbps vs. 92kbps). Instead, a packet-size- | labelled 'Equal packet rate' (4 kbps vs. 92 kbps). Instead, a | |||
dependent algorithm would aim for equal bit-loss-rates, which would | packet-size-dependent algorithm would aim for equal bit-loss rates, | |||
drive both flows towards the "Equal bit-rate" case, by driving them | which would drive both flows towards the case labelled 'Equal bit | |||
to equal bit-loss-rates (both 48kbps in this example). | rate', by driving them to equal bit-loss rates (both 48 kbps in this | |||
example). | ||||
The explanation so far has assumed that each flow consists of packets | The explanation so far has assumed that each flow consists of packets | |||
of only one constant size. Nonetheless, it extends naturally to | of only one constant size. Nonetheless, it extends naturally to | |||
flows with mixed packet sizes. In the right-most column of Table 5 a | flows with mixed packet sizes. In the right-most column of Table 5, | |||
flow of mixed size packets is created simply by considering flow 1 | a flow of mixed-size packets is created simply by considering Flow 1 | |||
and flow 2 as a single aggregated flow. There is no need for a flow | and Flow 2 as a single aggregated flow. There is no need for a flow | |||
to maintain an average packet size. It is only necessary for the | to maintain an average packet size. It is only necessary for the | |||
transport to scale its response to each congestion indication by the | transport to scale its response to each congestion indication by the | |||
size of each individual lost (or marked) packet. Taking for example | size of each individual lost (or marked) packet. Taking, for | |||
the "Equal packet-rate" case, in one second about 8 small packets and | example, the case labelled 'Equal packet rate', in one second about 8 | |||
8 large packets are lost (making closer to 15 than 16 losses per | small packets and 8 large packets are lost (making closer to 15 than | |||
second due to rounding). If the transport multiplies each loss by | 16 losses per second due to rounding). If the transport multiplies | |||
its size, in one second it responds to about 8*480b and 8*12,000b lost | each loss by its size, in one second it responds to about 8*480 and | |||
bits, adding up to about 96,000 lost bits in a second. This double checks | 8*12,000 lost bits, adding up to about 96,000 lost bits in a second. This | |||
correctly, being the same as 0.1% of the total bit-rate of 96Mbps. | double checks correctly, being the same as 0.1% of the total bit rate | |||
For completeness, the formula for absolute bit-loss-rate is | of 96 Mbps. For completeness, the formula for absolute bit-loss rate | |||
p*(u_1*s_1 + u_2*s_2). | is p*(u_1*s_1 + u_2*s_2). | |||
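As a minimal sketch of this per-loss scaling, a transport can simply sum the sizes of the packets that were lost or marked. The sketch is illustrative only; the function name bit_loss_rate and the 13-second measurement window are assumptions, the window being chosen so that the counts are whole numbers. With the unrounded rates of the equal packet rate case, each flow loses p*u = 0.001 * 96,000,000 / 12,480 packets per second (about 7.69), which is exactly 100 packets in 13 seconds.

```python
def bit_loss_rate(lost_sizes_bits, interval_s):
    """Absolute bit-loss rate [bps]: sum of lost packet sizes over the interval.

    No average packet size is maintained; each loss is weighted by its own size.
    """
    return sum(lost_sizes_bits) / interval_s


# 100 lost packets of 480 b and 100 of 12,000 b observed over 13 seconds.
lost = [480] * 100 + [12_000] * 100
rate = bit_loss_rate(lost, 13)
print(rate)  # 96000.0, i.e. 0.1% of 96 Mbps
```

The same function works unchanged for any mix of packet sizes, because it never needs to know which flow or which size class a loss belonged to.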
Incidentally, a transport will always measure the loss probability | Incidentally, a transport will always measure the loss probability | |||
the same irrespective of whether it measures in packets or in bytes. | the same, irrespective of whether it measures in packets or in bytes. | |||
In other words, the ratio of lost to sent packets will be the same as | In other words, the ratio of lost packets to sent packets will be the | |||
the ratio of lost to sent bytes. (This is why TCP's bit rate is | same as the ratio of lost bytes to sent bytes. (This is why TCP's | |||
still proportional to packet size even when byte-counting is used, as | bit rate is still proportional to packet size, even when byte | |||
recommended for TCP in [RFC5681], mainly for orthogonal security | counting is used, as recommended for TCP in [RFC5681], mainly for | |||
reasons.) This is intuitively obvious by comparing two example | orthogonal security reasons.) This is intuitively obvious by | |||
flows: one with 60B packets, the other with 1500B packets. If both | comparing two example flows: one with 60 B packets, the other with | |||
flows pass through a queue with drop probability 0.1%, each flow will | 1,500 B packets. If both flows pass through a queue with drop | |||
lose 1 in 1,000 packets. In the stream of 60B packets the ratio of | probability 0.1%, each flow will lose 1 in 1,000 packets. In the | |||
bytes lost to sent will be 60B in every 60,000B; and in the stream of | stream of 60 B packets, the ratio of lost bytes to sent bytes will be | |||
1500B packets, the loss ratio will be 1,500B out of 1,500,000B. When | 60 B in every 60,000 B; and in the stream of 1,500 B packets, the | |||
the transport responds to the ratio of lost to sent packets, it will | loss ratio will be 1,500 B out of 1,500,000 B. When the transport | |||
measure the same ratio whether it measures in packets or bytes: 0.1% | responds to the ratio of lost to sent packets, it will measure the | |||
in both cases. The fact that this ratio is the same whether measured | same ratio whether it measures in packets or bytes: 0.1% in both | |||
in packets or bytes can be seen in Table 5, where the ratio of lost | cases. The fact that this ratio is the same whether measured in | |||
to sent packets and the ratio of lost to sent bytes are always 0.1% in | packets or bytes can be seen in Table 5, where the ratio of lost | |||
all cases (recall that the scenario was set up with p=0.1%). | packets to sent packets and the ratio of lost bytes to sent bytes are | |||
always 0.1% in all cases (recall that the scenario was set up with | ||||
p=0.1%). | ||||
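The equality of the two ratios can be checked directly. This short Python sketch is illustrative only and the function name loss_ratios is an assumption; it assumes a flow of constant packet size, as in the example above.

```python
def loss_ratios(pkt_size_bytes, sent_pkts, lost_pkts):
    """Return (ratio of lost/sent packets, ratio of lost/sent bytes)."""
    pkt_ratio = lost_pkts / sent_pkts
    byte_ratio = (lost_pkts * pkt_size_bytes) / (sent_pkts * pkt_size_bytes)
    return pkt_ratio, byte_ratio


# One packet lost in every 1,000 sent, for 60 B and 1,500 B packets.
for size in (60, 1500):
    print(size, loss_ratios(size, sent_pkts=1000, lost_pkts=1))
```

The packet size cancels out of the byte ratio, so both flows report 0.1% whichever unit is used.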
This discussion of how the ratio can be measured in packets or bytes | This discussion of how the ratio can be measured in packets or bytes | |||
is only raised here to highlight that it is irrelevant to this memo! | is only raised here to highlight that it is irrelevant to this memo! | |||
Whether a transport depends on packet size or not depends on how this | Whether or not a transport depends on packet size depends on how this | |||
ratio is used within the congestion control algorithm. | ratio is used within the congestion control algorithm. | |||
So far we have shown that packet-mode drop passes sufficient | So far, we have shown that packet-mode drop passes sufficient | |||
information to the transport layer so that the transport can take | information to the transport layer so that the transport can take bit | |||
account of bit-congestion, by using the sizes of the packets that | congestion into account, by using the sizes of the packets that | |||
indicate congestion. We have also shown that the transport can | indicate congestion. We have also shown that the transport can | |||
choose not to take packet size into account if it wishes. We will | choose not to take packet size into account if it wishes. We will | |||
now consider whether the transport can know which to do. | now consider whether the transport can know which to do. | |||
B.2. Bit-Congestible and Packet-Congestible Indications | B.2. Bit-Congestible and Packet-Congestible Indications | |||
As a thought experiment, imagine an idealised congestion notification | As a thought experiment, imagine an idealised congestion notification | |||
protocol that supports both bit-congestible and packet-congestible | protocol that supports both bit-congestible and packet-congestible | |||
resources. It would require at least two ECN flags, one for each of | resources. It would require at least two ECN flags, one for each of | |||
bit-congestible and packet-congestible resources. | the bit-congestible and packet-congestible resources. | |||
1. A packet-congestible resource trying to code congestion level p_p | 1. A packet-congestible resource trying to code congestion level p_p | |||
into a packet stream should mark the idealised `packet | into a packet stream should mark the idealised 'packet | |||
congestion' field in each packet with probability p_p | congestion' field in each packet with probability p_p | |||
irrespective of the packet's size. The transport should then | irrespective of the packet's size. The transport should then | |||
take a packet with the packet congestion field marked to mean | take a packet with the packet congestion field marked to mean | |||
just one mark, irrespective of the packet size. | just one mark, irrespective of the packet size. | |||
2. A bit-congestible resource trying to code time-varying byte- | 2. A bit-congestible resource trying to code time-varying byte- | |||
congestion level p_b into a packet stream should mark the `byte | congestion level p_b into a packet stream should mark the 'byte | |||
congestion' field in each packet with probability p_b, again | congestion' field in each packet with probability p_b, again | |||
irrespective of the packet's size. Unlike before, the transport | irrespective of the packet's size. Unlike before, the transport | |||
should take a packet with the byte congestion field marked to | should take a packet with the byte congestion field marked to | |||
count as a mark on each byte in the packet. | count as a mark on each byte in the packet. | |||
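The two interpretations in the list above can be sketched as follows. This is purely a rendering of the thought experiment and not a proposal for implementation: the Packet class, its pkt_congestion and byte_congestion fields, and the function congestion_counts are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    size_bytes: int
    pkt_congestion: bool = False    # hypothetical 'packet congestion' field
    byte_congestion: bool = False   # hypothetical 'byte congestion' field


def congestion_counts(packets):
    """Return (packet marks, byte marks) as the idealised transport would count them."""
    # A packet-congestion mark counts once, irrespective of packet size.
    pkt_marks = sum(1 for p in packets if p.pkt_congestion)
    # A byte-congestion mark counts as a mark on every byte in the packet.
    byte_marks = sum(p.size_bytes for p in packets if p.byte_congestion)
    return pkt_marks, byte_marks


received = [
    Packet(60, pkt_congestion=True),
    Packet(1500, byte_congestion=True),
    Packet(1500),
]
print(congestion_counts(received))  # (1, 1500)
```

In both cases the resource marks with a probability that is independent of packet size; only the transport's interpretation of the mark differs.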
This hides a fundamental problem--much more fundamental than whether | This hides a fundamental problem -- much more fundamental than | |||
we can magically create header space for yet another ECN flag, or | whether we can magically create header space for yet another ECN | |||
whether it would work while being deployed incrementally. | flag, or whether it would work while being deployed incrementally. | |||
Distinguishing drop from delivery naturally provides just one | Distinguishing drop from delivery naturally provides just one | |||
implicit bit of congestion indication information--the packet is | implicit bit of congestion indication information -- the packet is | |||
either dropped or not. It is hard to drop a packet in two ways that | either dropped or not. It is hard to drop a packet in two ways that | |||
are distinguishable remotely. This is a similar problem to that of | are distinguishable remotely. This is a similar problem to that of | |||
distinguishing wireless transmission losses from congestive losses. | distinguishing wireless transmission losses from congestive losses. | |||
This problem would not be solved even if ECN were universally | This problem would not be solved, even if ECN were universally | |||
deployed. A congestion notification protocol must survive a | deployed. A congestion notification protocol must survive a | |||
transition from low levels of congestion to high. Marking two states | transition from low levels of congestion to high. Marking two states | |||
is feasible with explicit marking, but much harder if packets are | is feasible with explicit marking, but it is much harder if packets | |||
dropped. Also, it will not always be cost-effective to implement AQM | are dropped. Also, it will not always be cost-effective to implement | |||
at every low level resource, so drop will often have to suffice. | AQM at every low-level resource, so drop will often have to suffice. | |||
We are not saying two ECN fields will be needed (and we are not | We are not saying two ECN fields will be needed (and we are not | |||
saying that somehow a resource should be able to drop a packet in one | saying that somehow a resource should be able to drop a packet in one | |||
of two different ways so that the transport can distinguish which | of two different ways so that the transport can distinguish which | |||
sort of drop it was!). These two congestion notification channels | sort of drop it was!). These two congestion notification channels | |||
are a conceptual device to illustrate a dilemma we could face in the | are a conceptual device to illustrate a dilemma we could face in the | |||
future. Section 3 gives four good reasons why it would be a bad idea | future. Section 3 gives four good reasons why it would be a bad idea | |||
to allow for packet size by biasing drop probability in favour of | to allow for packet size by biasing drop probability in favour of | |||
small packets within the network. The impracticality of our thought | small packets within the network. The impracticality of our thought | |||
experiment shows that it will be hard to give transports a practical | experiment shows that it will be hard to give transports a practical | |||
way to know whether to take account of the size of congestion | way to know whether or not to take into account the size of | |||
indication packets or not. | congestion indication packets. | |||
Fortunately, this dilemma is not pressing because by design most | Fortunately, this dilemma is not pressing because by design most | |||
equipment becomes bit-congested before its packet-processing becomes | equipment becomes bit-congested before its packet processing becomes | |||
congested (as already outlined in Section 1.1). Therefore transports | congested (as already outlined in Section 1.1). Therefore, | |||
can be designed on the relatively sound assumption that a congestion | transports can be designed on the relatively sound assumption that a | |||
indication will usually imply bit-congestion. | congestion indication will usually imply bit congestion. | |||
Nonetheless, although the above idealised protocol isn't intended for | Nonetheless, although the above idealised protocol isn't intended for | |||
implementation, we do want to emphasise that research is needed to | implementation, we do want to emphasise that research is needed to | |||
predict whether there are good reasons to believe that packet | predict whether there are good reasons to believe that packet | |||
congestion might become more common, and if so, to find a way to | congestion might become more common, and if so, to find a way to | |||
somehow distinguish between bit and packet congestion [RFC3714]. | somehow distinguish between bit and packet congestion [RFC3714]. | |||
Recently, the dual resource queue (DRQ) proposal [DRQ] has been made | Recently, the dual resource queue (DRQ) proposal [DRQ] has been made | |||
on the premise that, as network processors become more cost | on the premise that, as network processors become more cost- | |||
effective, per packet operations will become more complex | effective, per-packet operations will become more complex | |||
(irrespective of whether more function in the network is desirable). | (irrespective of whether more function in the network is desirable). | |||
Consequently, the premise is that CPU congestion will become more | Consequently, the premise is that CPU congestion will become more | |||
common. DRQ is a proposed modification to the RED algorithm that | common. DRQ is a proposed modification to the RED algorithm that | |||
folds both bit congestion and packet congestion into one signal | folds both bit congestion and packet congestion into one signal | |||
(either loss or ECN). | (either loss or ECN). | |||
Finally, we note one further complication. Strictly, packet- | Finally, we note one further complication. Strictly, packet- | |||
congestible resources are often cycle-congestible. For instance, for | congestible resources are often cycle-congestible. For instance, for | |||
routing look-ups load depends on the complexity of each look-up and | routing lookups, load depends on the complexity of each lookup and | |||
whether the pattern of arrivals is amenable to caching or not. This | whether or not the pattern of arrivals is amenable to caching. This | |||
also reminds us that any solution must not require a forwarding | also reminds us that any solution must not require a forwarding | |||
engine to use excessive processor cycles in order to decide how to | engine to use excessive processor cycles in order to decide how to | |||
say it has no spare processor cycles. | say it has no spare processor cycles. | |||
Appendix C. Byte-mode Drop Complicates Policing Congestion Response | Appendix C. Byte-Mode Drop Complicates Policing Congestion Response | |||
This section is informative, not normative. | This section is informative, not normative. | |||
There are two main classes of approach to policing congestion | There are two main classes of approach to policing congestion | |||
response: i) policing at each bottleneck link or ii) policing at the | response: (i) policing at each bottleneck link or (ii) policing at | |||
edges of networks. Packet-mode drop in RED is compatible with | the edges of networks. Packet-mode drop in RED is compatible with | |||
either, while byte-mode drop precludes edge policing. | either, while byte-mode drop precludes edge policing. | |||
The simplicity of an edge policer relies on one dropped or marked | The simplicity of an edge policer relies on one dropped or marked | |||
packet being equivalent to another of the same size without having to | packet being equivalent to another of the same size without having to | |||
know which link the drop or mark occurred at. However, the byte-mode | know which link the drop or mark occurred at. However, the byte-mode | |||
drop algorithm has to depend on the local MTU of the line--it needs | drop algorithm has to depend on the local MTU of the line -- it needs | |||
to use some concept of a 'normal' packet size. Therefore, one | to use some concept of a 'normal' packet size. Therefore, one | |||
dropped or marked packet from a byte-mode drop algorithm is not | dropped or marked packet from a byte-mode drop algorithm is not | |||
necessarily equivalent to another from a different link. A policing | necessarily equivalent to another from a different link. A policing | |||
function local to the link can know the local MTU where the | function local to the link can know the local MTU where the | |||
congestion occurred. However, a policer at the edge of the network | congestion occurred. However, a policer at the edge of the network | |||
cannot, at least not without a lot of complexity. | cannot, at least not without a lot of complexity. | |||
The early research proposals for type (i) policing at a bottleneck | The early research proposals for type (i) policing at a bottleneck | |||
link [pBox] used byte-mode drop, then detected flows that contributed | link [pBox] used byte-mode drop, then detected flows that contributed | |||
disproportionately to the number of packets dropped. However, with | disproportionately to the number of packets dropped. However, with | |||
no extra complexity, later proposals used packet mode drop and looked | no extra complexity, later proposals used packet-mode drop and looked | |||
for flows that contributed a disproportionate amount of dropped bytes | for flows that contributed a disproportionate amount of dropped bytes | |||
[CHOKe_Var_Pkt]. | [CHOKe_Var_Pkt]. | |||
Work is progressing on the congestion exposure protocol (ConEx | Work is progressing on the Congestion Exposure (ConEx) protocol | |||
[RFC6789]), which enables a type (ii) edge policer located at a | [RFC6789], which enables a type (ii) edge policer located at a user's | |||
user's attachment point. The idea is to be able to take an | attachment point. The idea is to be able to take an integrated view | |||
integrated view of the effect of all a user's traffic on any link in | of the effect of all a user's traffic on any link in the | |||
the internetwork. However, byte-mode drop would effectively preclude | internetwork. However, byte-mode drop would effectively preclude | |||
such edge policing because of the MTU issue above. | such edge policing because of the MTU issue above. | |||
Indeed, making drop probability depend on the size of the packets | Indeed, making drop probability depend on the size of the packets | |||
that bits happen to be divided into would simply encourage the bits | that bits happen to be divided into would simply encourage the bits | |||
to be divided into smaller packets in order to confuse policing. In | to be divided into smaller packets in order to confuse policing. In | |||
contrast, as long as a dropped/marked packet is taken to mean that | contrast, as long as a dropped/marked packet is taken to mean that | |||
all the bytes in the packet are dropped/marked, a policer can remain | all the bytes in the packet are dropped/marked, a policer can remain | |||
robust against bits being re-divided into different size packets or | robust against sequences of bits being re-divided into different size | |||
across different size flows [Rate_fair_Dis]. | packets or across different size flows [Rate_fair_Dis]. | |||
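The robustness argument can be illustrated with expected values. The following Python sketch is an assumption-laden illustration, not a description of any deployed policer: the function name expected_marks is hypothetical, and it assumes packet-mode marking with the same probability p for every packet. It compares the same 60,000 bytes sent as large packets and re-divided into small ones.

```python
def expected_marks(pkt_sizes_bytes, p):
    """Expected (marked packets, marked bytes) under packet-mode marking.

    Every packet is marked with the same probability p whatever its size,
    and a marked packet counts as a mark on all of its bytes.
    """
    return p * len(pkt_sizes_bytes), p * sum(pkt_sizes_bytes)


P = 0.001
large = [1500] * 40     # 60,000 bytes sent as 40 large packets
small = [60] * 1000     # the same 60,000 bytes re-divided into 1,000 small packets

print(expected_marks(large, P))
print(expected_marks(small, P))
```

The expected number of marked packets differs by a factor of 25 between the two cases, but the expected number of marked bytes is identical, so a policer that counts marked bytes is unaffected by the re-division.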
Appendix D. Changes from Previous Versions | ||||
To be removed by the RFC Editor on publication. | ||||
Full incremental diffs between each version are available at | ||||
<http://tools.ietf.org/wg/tsvwg/draft-ietf-tsvwg-byte-pkt-congest/> | ||||
(courtesy of the rfcdiff tool): | ||||
From -11 to -12: Following the second pass through the IESG: | ||||
* Section 2.1 [Barry Leiba]: | ||||
+ s/No other choice makes sense,/Subject to the exceptions | ||||
below, no other choice makes sense,/ | ||||
+ s/Exceptions to these recommendations MAY be necessary | ||||
/Exceptions to these recommendations may be necessary / | ||||
* Sections 3.2 and 4.2.3 [Joel Jaeggli]: | ||||
+ Added comment to section 4.2.3 that the examples given are | ||||
not in widespread production use, but they give evidence | ||||
that it is possible to follow the advice given. | ||||
+ Section 4.2.3: | ||||
- OLD: Although there are no known proposals, it would also | ||||
be possible and perfectly valid to make control packets | ||||
robust against drop by explicitly requesting a lower drop | ||||
probability using their Diffserv code point [RFC2474] to | ||||
request a scheduling class with lower drop. | ||||
NEW: Although there are no known proposals, it would also | ||||
be possible and perfectly valid to make control packets | ||||
robust against drop by requesting a scheduling class with | ||||
lower drop probability, by re-marking to a Diffserv code | ||||
point [RFC2474] within the same behaviour aggregate. | ||||
- appended "Similarly applications, over non-TCP transports | ||||
could make any packets that are effectively control | ||||
packets more robust by using Diffserv, data duplication, | ||||
FEC etc." | ||||
+ Updated Wischik ref and added "Reducing Web Latency: the | ||||
Virtue of Gentle Aggression" ref. | ||||
* Expanded more abbreviations (CoDel, PIE, MTU). | ||||
* Section 1. Intro [Stephen Farrell]: | ||||
+ In the places where the doc describes the dichotomy between | ||||
'long-term goal' and 'expediency' the words long term goal | ||||
and expedient have been introduced, to more explicitly refer | ||||
back to this introductory para (S.2.1 & S.2.3). | ||||
+ Added explanation of what scaling with packet size means. | ||||
* Conclusions [Benoit Claise]: | ||||
+ OLD: For the specific case of RED, this means that byte-mode | ||||
queue measurement will often be appropriate although byte- | ||||
mode drop is strongly deprecated. | ||||
NEW: For the specific case of RED, this means that byte-mode | ||||
queue measurement will often be appropriate but the use of | ||||
byte-mode drop is very strongly discouraged. | ||||
From -10 to -11: Following a further WGLC: | ||||
* Abstract: clarified that advice applies to all AQMs including | ||||
newer ones | ||||
* Abstract & Intro: changed 'read' to 'detect', because you don't | ||||
read losses, you detect them. | ||||
* S.1. Introduction: Disambiguated summary of advice on queue | ||||
measurement. | ||||
* Clarified that the doc deprecates any preference based solely | ||||
on packet size, it's not only against preferring smaller | ||||
packets. | ||||
* S.4.1.2. Congestion Measurement without a Queue: Explained | ||||
that a queue of TXOPs represents a queue into spectrum | ||||
congested by too many bits. | ||||
* S.5.2: Bit- & Packet-congestible Network: Referred to | ||||
explanation in S.4.1.2 to make the point that TXOPs are not a | ||||
primary unit of workload like bits and packets are, even though | ||||
you get queues of TXOPs. | ||||
* 6. Security: Disambiguated 'bias towards'. | ||||
* 8. Conclusions: Made consistent with recommendation to use | ||||
time if possible for queue measurement. | ||||
From -09 to -10: Following IESG review: | ||||
* Updates 2309: Left header unchanged reflecting eventual IESG | ||||
consensus [Sean Turner, Pete Resnick]. | ||||
* S.1 Intro: This memo adds to the congestion control principles | ||||
enumerated in BCP 41 [Pete Resnick] | ||||
* Abstract, S.1, S.1.1, s.1.2 Intro, Scoping and Example: Made | ||||
applicability to all AQMs clearer listing some more example | ||||
AQMs and explained that we always use RED for examples, but | ||||
this doesn't mean it's not applicable to other AQMs. [A number | ||||
of reviewers have described the draft as "about RED"] | ||||
* S.1 & S.2.1 Queue measurement: Explained that the choice | ||||
between measuring the queue in packets or bytes is only | ||||
relevant if measuring it in time units is infeasible [So as not | ||||
to imply that we haven't noticed the advances made by PDPC & | ||||
CoDel] | ||||
* S.1.1. Terminology: Better explained why hybrid systems | ||||
congested by both packets and bytes are often designed to be | ||||
treated as bit-congestible [Richard Barnes]. | ||||
* S.2.1. Queue measurement advice: Added examples. Added a | ||||
counter-example to justify SHOULDs rather than MUSTs. Pointed | ||||
to S.4.1 for a list of more complicated scenarios. [Benson | ||||
Schliesser, OpsDir] | ||||
* S.2.2. Recommendation on Encoding Congestion Notification: | ||||
Removed SHOULD treat packets equally, leaving only SHOULD NOT | ||||
drop dependent on packet size, to avoid it sounding like we're | ||||
saying QoS is not allowed. Pointed to possible app-specific | ||||
legacy use of byte-mode as a counter-example that prevents us | ||||
saying MUST NOT. [Pete Resnick] | ||||
* S.2.3. Recommendation on Responding to Congestion: capitalised | ||||
the two SHOULDs in recommendations for TCP, and gave possible | ||||
counter-examples. [noticed while dealing with Pete Resnick's | ||||
point] | ||||
* S.2.4. Splitting & Merging: RTCP -> RTP/RTCP [Pete McCann, | ||||
Gen-ART] | ||||
* S.3.2 Small != Control: many control packets are small -> | ||||
...tend to be small [Stephen Farrell] | ||||
* S.3.1 Perverse incentives: Changed transport designers to app | ||||
developers [Stephen Farrell] | ||||
* S.4.1.1. Fixed Size Packet Buffers: Nearly completely re- | ||||
written to simplify and to reverse the advice when the | ||||
underlying resource is bit-congestible, irrespective of whether | ||||
the buffer consists of fixed-size packet buffers. [Richard | ||||
Barnes & Benson Schliesser] | ||||
* S.4.2.1.2. Packet Size Bias Regardless of AQM: Largely re- | ||||
written to reflect the earlier change in advice about fixed- | ||||
size packet buffers, and to primarily focus on getting rid of | ||||
tail-drop, not various nuances of tail-drop. [Richard Barnes & | ||||
Benson Schliesser] | ||||
* Editorial corrections [Tim Bray, AppsDir, Pete McCann, Gen-ART | ||||
and others] | ||||
* Updated refs (two I-Ds have become RFCs). [Pete McCann] | ||||
From -08 to -09: Following WG last call: | ||||
* S.2.1: Made RED-related queue measurement recommendations | ||||
clearer | ||||
* S.2.3: Added to "Recommendation on Responding to Congestion" to | ||||
make it clear that we are definitely not saying transports have | ||||
to equalise bit-rates, just how to do it and not do it, if you | ||||
want to. | ||||
* S.3: Clarified motivation sections S.3.3 "Transport-Independent | ||||
Network" and S.3.5 "Implementation Efficiency" | ||||
* S.3.4: Completely changed motivating argument from "Scaling | ||||
Congestion Control with Packet Size" to "Partial Deployment of | ||||
AQM". | ||||
From -07 to -08: | ||||
* Altered abstract to say it provides best current practice and | ||||
highlight that it updates RFC2309 | ||||
* Added null IANA section | ||||
* Updated refs | ||||
From -06 to -07: | ||||
* A mix-up with the corollaries and their naming in 2.1 to 2.3 | ||||
fixed. | ||||
From -05 to -06: | ||||
* Primarily editorial fixes. | ||||
From -04 to -05: | ||||
* Changed from Informational to BCP and highlighted non-normative | ||||
sections and appendices | ||||
* Removed language about consensus | ||||
* Added "Example Comparing Packet-Mode Drop and Byte-Mode Drop" | ||||
* Arranged "Motivating Arguments" into a more logical order and | ||||
completely rewrote "Transport-Independent Network" & "Scaling | ||||
Congestion Control with Packet Size" arguments. Removed "Why | ||||
Now?" | ||||
* Clarified applicability of certain recommendations | ||||
* Shifted vendor survey to an Appendix | ||||
* Cut down "Outstanding Issues and Next Steps" | ||||
* Re-drafted the start of the conclusions to highlight the three | ||||
distinct areas of concern | ||||
* Completely re-wrote appendices | ||||
* Editorial corrections throughout. | ||||
From -03 to -04: | ||||
* Reordered Sections 2 and 3, and some clarifications here and | ||||
there based on feedback from Colin Perkins and Mirja | ||||
Kuehlewind. | ||||
From -02 to -03: | ||||
* Structural changes: | ||||
+ Split off text at end of "Scaling Congestion Control with | ||||
Packet Size" into new section "Transport-Independent | ||||
Network" | ||||
+ Shifted "Recommendations" straight after "Motivating | ||||
Arguments" and added "Conclusions" at end to reinforce | ||||
Recommendations | ||||
+ Added more internal structure to Recommendations, so that | ||||
recommendations specific to RED or to TCP are just | ||||
corollaries of a more general recommendation, rather than | ||||
being listed as a separate recommendation. | ||||
+ Renamed "State of the Art" as "Critical Survey of Existing | ||||
Advice" and retitled a number of subsections with more | ||||
descriptive titles. | ||||
+ Split end of "Congestion Coding: Summary of Status" into a | ||||
new subsection called "RED Implementation Status". | ||||
+ Removed text that had been in the Appendix "Congestion | ||||
Notification Definition: Further Justification". | ||||
* Reordered the intro text a little. | ||||
* Made it clearer when advice being reported is deprecated and | ||||
when it is not. | ||||
* Described AQM as in network equipment, rather than saying "at | ||||
the network layer" (to side-step controversy over whether | ||||
functions like AQM are in the transport layer but in network | ||||
equipment). | ||||
* Minor improvements to clarity throughout | ||||
From -01 to -02: | ||||
* Restructured the whole document for (hopefully) easier reading | ||||
and clarity. The concrete recommendation, in RFC2119 language, | ||||
is now in Section 8. | ||||
From -00 to -01: | ||||
* Minor clarifications throughout and updated references | ||||
From briscoe-byte-pkt-mark-02 to ietf-byte-pkt-congest-00: | ||||
* Added note on relationship to existing RFCs | ||||
* Posed the question of whether packet-congestion could become | ||||
common and deferred it to the IRTF ICCRG. Added ref to the | ||||
dual-resource queue (DRQ) proposal. | ||||
* Changed PCN references from the PCN charter & architecture to | ||||
the PCN marking behaviour draft most likely to imminently | ||||
become the standards track WG item. | ||||
From -01 to -02: | ||||
* Abstract reorganised to align with clearer separation of issues | ||||
in the memo. | ||||
* Introduction reorganised with motivating arguments removed to | ||||
new Section 3. | ||||
* Clarified avoiding lock-out of large packets is not the main or | ||||
only motivation for RED. | ||||
* Mentioned choice of drop or marking explicitly throughout, | ||||
rather than trying to coin a word to mean either. | ||||
* Generalised the discussion throughout to any packet forwarding | ||||
function on any network equipment, not just routers. | ||||
* Clarified the last point about why this is a good time to sort | ||||
out this issue: because it will be hard / impossible to design | ||||
new transports unless we decide whether the network or the | ||||
transport is allowing for packet size. | ||||
* Added statement explaining the horizon of the memo is long | ||||
term, but with short term expediency in mind. | ||||
* Added material on scaling congestion control with packet size | ||||
(Section 3.4). | ||||
* Separated out issue of normalising TCP's bit rate from issue of | ||||
preference to control packets (Section 3.2). | ||||
* Divided up Congestion Measurement section for clarity, | ||||
including new material on fixed size packet buffers and buffer | ||||
carving (Section 4.1.1 & Section 4.2.1) and on congestion | ||||
measurement in wireless link technologies without queues | ||||
(Section 4.1.2). | ||||
* Added section on 'Making Transports Robust against Control | ||||
Packet Losses' (Section 4.2.3) with existing & new material | ||||
included. | ||||
* Added tabulated results of vendor survey on byte-mode drop | ||||
variant of RED (Table 3). | ||||
From -00 to -01: | ||||
* Clarified applicability to drop as well as ECN. | ||||
* Highlighted DoS vulnerability. | ||||
* Emphasised that drop-tail suffers from similar problems to | ||||
byte-mode drop, so only byte-mode drop should be turned off, | ||||
not RED itself. | ||||
* Clarified the original apparent motivations for recommending | ||||
byte-mode drop included protecting SYNs and pure ACKs more than | ||||
equalising the bit rates of TCPs with different segment sizes. | ||||
Removed some conjectured motivations. | ||||
* Added support for updates to TCP in progress (ackcc & ecn-syn- | ||||
ack). | ||||
* Updated survey results with newly arrived data. | ||||
* Pulled all recommendations together into the conclusions. | ||||
* Moved some detailed points into two additional appendices and a | ||||
note. | ||||
* Considerable clarifications throughout. | ||||
* Updated references | ||||
Authors' Addresses | Authors' Addresses | |||
Bob Briscoe | Bob Briscoe | |||
BT | BT | |||
B54/77, Adastral Park | B54/77, Adastral Park | |||
Martlesham Heath | Martlesham Heath | |||
Ipswich IP5 3RE | Ipswich IP5 3RE | |||
UK | UK | |||
End of changes. 259 change blocks. | ||||
1177 lines changed or deleted | 780 lines changed or added | |||