draft-ietf-tsvwg-byte-pkt-congest-00.txt | draft-ietf-tsvwg-byte-pkt-congest-01.txt | |||
---|---|---|---|---|
Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
Internet-Draft BT & UCL | Internet-Draft BT | |||
Intended status: Informational August 07, 2008 | Updates: 2309 (if approved) October 23, 2009 | |||
Expires: February 8, 2009 | Intended status: Informational | |||
Expires: April 26, 2010 | ||||
Byte and Packet Congestion Notification | Byte and Packet Congestion Notification | |||
draft-ietf-tsvwg-byte-pkt-congest-00 | draft-ietf-tsvwg-byte-pkt-congest-01 | |||
Status of this Memo | Status of this Memo | |||
By submitting this Internet-Draft, each author represents that any | This Internet-Draft is submitted to IETF in full conformance with the | |||
applicable patent or other IPR claims of which he or she is aware | provisions of BCP 78 and BCP 79. | |||
have been or will be disclosed, and any of which he or she becomes | ||||
aware will be disclosed, in accordance with Section 6 of BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
Drafts. | Drafts. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
This Internet-Draft will expire on February 8, 2009. | This Internet-Draft will expire on April 26, 2010. | |||
Copyright Notice | ||||
Copyright (c) 2009 IETF Trust and the persons identified as the | ||||
document authors. All rights reserved. | ||||
This document is subject to BCP 78 and the IETF Trust's Legal | ||||
Provisions Relating to IETF Documents in effect on the date of | ||||
publication of this document (http://trustee.ietf.org/license-info). | ||||
Please review these documents carefully, as they describe your rights | ||||
and restrictions with respect to this document. | ||||
Abstract | Abstract | |||
This memo concerns dropping or marking packets using active queue | This memo concerns dropping or marking packets using active queue | |||
management (AQM) such as random early detection (RED) or pre- | management (AQM) such as random early detection (RED) or pre- | |||
congestion notification (PCN). The primary conclusion is that packet | congestion notification (PCN). The primary conclusion is that packet | |||
size should be taken into account when transports read congestion | size should be taken into account when transports read congestion | |||
indications, not when network equipment writes them. Reducing drop | indications, not when network equipment writes them. Reducing drop | |||
of small packets has some tempting advantages: i) it drops fewer | of small packets has some tempting advantages: i) it drops fewer | |||
control packets, which tend to be small and ii) it makes TCP's bit- | control packets, which tend to be small and ii) it makes TCP's bit- | |||
rate less dependent on packet size. However, there are ways of | rate less dependent on packet size. However, there are ways of | |||
addressing these issues at the transport layer, rather than reverse | addressing these issues at the transport layer, rather than reverse | |||
engineering network forwarding to fix specific transport problems. | engineering network forwarding to fix specific transport problems. | |||
Network layer algorithms like the byte-mode packet drop variant of | Network layer algorithms like the byte-mode packet drop variant of | |||
RED should not be used to drop fewer small packets, because that | RED should not be used to drop fewer small packets, because that | |||
creates a perverse incentive for transports to use tiny segments, | creates a perverse incentive for transports to use tiny segments, | |||
consequently also opening up a DoS vulnerability. | consequently also opening up a DoS vulnerability. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
2. Motivating Arguments . . . . . . . . . . . . . . . . . . . . . 8 | 1.1. Requirements Notation . . . . . . . . . . . . . . . . . . 9 | |||
2.1. Scaling Congestion Control with Packet Size . . . . . . . 8 | 2. Motivating Arguments . . . . . . . . . . . . . . . . . . . . . 9 | |||
2.1. Scaling Congestion Control with Packet Size . . . . . . . 9 | ||||
2.2. Avoiding Perverse Incentives to (ab)use Smaller Packets . 10 | 2.2. Avoiding Perverse Incentives to (ab)use Smaller Packets . 10 | |||
2.3. Small != Control . . . . . . . . . . . . . . . . . . . . . 11 | 2.3. Small != Control . . . . . . . . . . . . . . . . . . . . . 12 | |||
3. Working Definition of Congestion Notification . . . . . . . . 11 | 2.4. Implementation Efficiency . . . . . . . . . . . . . . . . 12 | |||
4. Congestion Measurement . . . . . . . . . . . . . . . . . . . . 12 | 3. Working Definition of Congestion Notification . . . . . . . . 12 | |||
4.1. Congestion Measurement by Queue Length . . . . . . . . . . 12 | 4. Congestion Measurement . . . . . . . . . . . . . . . . . . . . 13 | |||
4.1.1. Fixed Size Packet Buffers . . . . . . . . . . . . . . 12 | 4.1. Congestion Measurement by Queue Length . . . . . . . . . . 13 | |||
4.2. Congestion Measurement without a Queue . . . . . . . . . . 13 | 4.1.1. Fixed Size Packet Buffers . . . . . . . . . . . . . . 13 | |||
5. Idealised Wire Protocol Coding . . . . . . . . . . . . . . . . 14 | 4.2. Congestion Measurement without a Queue . . . . . . . . . . 14 | |||
6. The State of the Art . . . . . . . . . . . . . . . . . . . . . 15 | 5. Idealised Wire Protocol Coding . . . . . . . . . . . . . . . . 15 | |||
6.1. Congestion Measurement: Status . . . . . . . . . . . . . . 16 | 6. The State of the Art . . . . . . . . . . . . . . . . . . . . . 17 | |||
6.2. Congestion Coding: Status . . . . . . . . . . . . . . . . 17 | 6.1. Congestion Measurement: Status . . . . . . . . . . . . . . 17 | |||
6.2.1. Network Bias when Encoding . . . . . . . . . . . . . . 17 | 6.2. Congestion Coding: Status . . . . . . . . . . . . . . . . 18 | |||
6.2.2. Transport Bias when Decoding . . . . . . . . . . . . . 19 | 6.2.1. Network Bias when Encoding . . . . . . . . . . . . . . 18 | |||
6.2.2. Transport Bias when Decoding . . . . . . . . . . . . . 20 | ||||
6.2.3. Making Transports Robust against Control Packet | 6.2.3. Making Transports Robust against Control Packet | |||
Losses . . . . . . . . . . . . . . . . . . . . . . . . 20 | Losses . . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
6.2.4. Congestion Coding: Summary of Status . . . . . . . . . 21 | 6.2.4. Congestion Coding: Summary of Status . . . . . . . . . 22 | |||
7. Outstanding Issues and Next Steps . . . . . . . . . . . . . . 23 | 7. Outstanding Issues and Next Steps . . . . . . . . . . . . . . 24 | |||
7.1. Bit-congestible World . . . . . . . . . . . . . . . . . . 23 | 7.1. Bit-congestible World . . . . . . . . . . . . . . . . . . 24 | |||
7.2. Bit- & Packet-congestible World . . . . . . . . . . . . . 23 | 7.2. Bit- & Packet-congestible World . . . . . . . . . . . . . 24 | |||
8. Security Considerations . . . . . . . . . . . . . . . . . . . 24 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 25 | |||
9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 25 | 9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 26 | |||
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 27 | 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
11. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 27 | 11. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 28 | |||
12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 27 | 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
12.1. Normative References . . . . . . . . . . . . . . . . . . . 27 | 12.1. Normative References . . . . . . . . . . . . . . . . . . . 28 | |||
12.2. Informative References . . . . . . . . . . . . . . . . . . 27 | 12.2. Informative References . . . . . . . . . . . . . . . . . . 29 | |||
Editorial Comments . . . . . . . . . . . . . . . . . . . . . . . . | Editorial Comments . . . . . . . . . . . . . . . . . . . . . . . . | |||
Appendix A. Example Scenarios . . . . . . . . . . . . . . . . . . 31 | Appendix A. Example Scenarios . . . . . . . . . . . . . . . . . . 32 | |||
A.1. Notation . . . . . . . . . . . . . . . . . . . . . . . . . 31 | A.1. Notation . . . . . . . . . . . . . . . . . . . . . . . . . 32 | |||
A.2. Bit-congestible resource, equal bit rates (Ai) . . . . . . 31 | A.2. Bit-congestible resource, equal bit rates (Ai) . . . . . . 32 | |||
A.3. Bit-congestible resource, equal packet rates (Bi) . . . . 32 | A.3. Bit-congestible resource, equal packet rates (Bi) . . . . 33 | |||
A.4. Pkt-congestible resource, equal bit rates (Aii) . . . . . 33 | A.4. Pkt-congestible resource, equal bit rates (Aii) . . . . . 34 | |||
A.5. Pkt-congestible resource, equal packet rates (Bii) . . . . 34 | A.5. Pkt-congestible resource, equal packet rates (Bii) . . . . 35 | |||
Appendix B. Congestion Notification Definition: Further | Appendix B. Congestion Notification Definition: Further | |||
Justification . . . . . . . . . . . . . . . . . . . . 34 | Justification . . . . . . . . . . . . . . . . . . . . 35 | |||
Appendix C. Byte-mode Drop Complicates Policing Congestion | Appendix C. Byte-mode Drop Complicates Policing Congestion | |||
Response . . . . . . . . . . . . . . . . . . . . . . 35 | Response . . . . . . . . . . . . . . . . . . . . . . 36 | |||
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 36 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 37 | |||
Intellectual Property and Copyright Statements . . . . . . . . . . 37 | ||||
Relationship to existing RFCs | ||||
To be removed by the RFC Editor on publication (with appropriate | ||||
changes to the 'Updates:' header and the RFC Index as appropriate). | ||||
This memo intends to update RFC2309, which stated an interim view but | ||||
noted that further research was needed on this topic. | ||||
Changes from Previous Versions | Changes from Previous Versions | |||
To be removed by the RFC Editor on publication. | To be removed by the RFC Editor on publication. | |||
Full incremental diffs between each version are available at | Full incremental diffs between each version are available at | |||
<http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#byte-pkt-congest> | <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#byte-pkt-congest> | |||
or | or | |||
<http://tools.ietf.org/wg/tsvwg/draft-ietf-tsvwg-byte-pkt-congest/> | <http://tools.ietf.org/wg/tsvwg/draft-ietf-tsvwg-byte-pkt-congest/> | |||
(courtesy of the rfcdiff tool): | (courtesy of the rfcdiff tool): | |||
From briscoe-byte-pkt-mark-02 to ietf-byte-pkt-congest-00 (this | From -00 to -01 (this version): | |||
version): | ||||
Added note on relationship to existing RFCs | * Minor clarifications throughout and updated references | |||
Posed the question of whether packet-congestion could become | From briscoe-byte-pkt-mark-02 to ietf-byte-pkt-congest-00: | |||
* Added note on relationship to existing RFCs | ||||
* Posed the question of whether packet-congestion could become | ||||
common and deferred it to the IRTF ICCRG. Added ref to the | common and deferred it to the IRTF ICCRG. Added ref to the | |||
dual-resource queue (DRQ) proposal. | dual-resource queue (DRQ) proposal. | |||
Changed PCN references from the PCN charter & architecture to | * Changed PCN references from the PCN charter & architecture to | |||
the PCN marking behaviour draft most likely to imminently | the PCN marking behaviour draft most likely to imminently | |||
become the standards track WG item. | become the standards track WG item. | |||
From -01 to -02: | From -01 to -02: | |||
Abstract reorganised to align with clearer separation of issue | * Abstract reorganised to align with clearer separation of issue | |||
in the memo. | in the memo. | |||
Introduction reorganised with motivating arguments removed to | * Introduction reorganised with motivating arguments removed to | |||
new Section 2. | new Section 2. | |||
Clarified avoiding lock-out of large packets is not the main or | * Clarified avoiding lock-out of large packets is not the main or | |||
only motivation for RED. | only motivation for RED. | |||
Mentioned choice of drop or marking explicitly throughout, | * Mentioned choice of drop or marking explicitly throughout, | |||
rather than trying to coin a word to mean either. | rather than trying to coin a word to mean either. | |||
Generalised the discussion throughout to any packet forwarding | * Generalised the discussion throughout to any packet forwarding | |||
function on any network equipment, not just routers. | function on any network equipment, not just routers. | |||
Clarified the last point about why this is a good time to sort | * Clarified the last point about why this is a good time to sort | |||
out this issue: because it will be hard / impossible to design | out this issue: because it will be hard / impossible to design | |||
new transports unless we decide whether the network or the | new transports unless we decide whether the network or the | |||
transport is allowing for packet size. | transport is allowing for packet size. | |||
Added statement explaining the horizon of the memo is long | * Added statement explaining the horizon of the memo is long | |||
term, but with short term expediency in mind. | term, but with short term expediency in mind. | |||
Added material on scaling congestion control with packet size | * Added material on scaling congestion control with packet size | |||
(Section 2.1). | (Section 2.1). | |||
Separated out issue of normalising TCP's bit rate from issue of | * Separated out issue of normalising TCP's bit rate from issue of | |||
preference to control packets (Section 2.3). | preference to control packets (Section 2.3). | |||
Divided up Congestion Measurement section for clarity, | * Divided up Congestion Measurement section for clarity, | |||
including new material on fixed size packet buffers and buffer | including new material on fixed size packet buffers and buffer | |||
carving (Section 4.1.1 & Section 6.2.1) and on congestion | carving (Section 4.1.1 & Section 6.2.1) and on congestion | |||
measurement in wireless link technologies without queues | measurement in wireless link technologies without queues | |||
(Section 4.2). | (Section 4.2). | |||
Added section on 'Making Transports Robust against Control | * Added section on 'Making Transports Robust against Control | |||
Packet Losses' (Section 6.2.3) with existing & new material | Packet Losses' (Section 6.2.3) with existing & new material | |||
included. | included. | |||
Added tabulated results of vendor survey on byte-mode drop | * Added tabulated results of vendor survey on byte-mode drop | |||
variant of RED (Table 2). | variant of RED (Table 2). | |||
From -00 to -01: | From -00 to -01: | |||
Clarified applicability to drop as well as ECN. | * Clarified applicability to drop as well as ECN. | |||
Highlighted DoS vulnerability. | * Highlighted DoS vulnerability. | |||
Emphasised that drop-tail suffers from similar problems to | * Emphasised that drop-tail suffers from similar problems to | |||
byte-mode drop, so only byte-mode drop should be turned off, | byte-mode drop, so only byte-mode drop should be turned off, | |||
not RED itself. | not RED itself. | |||
Clarified the original apparent motivations for recommending | * Clarified the original apparent motivations for recommending | |||
byte-mode drop included protecting SYNs and pure ACKs more than | byte-mode drop included protecting SYNs and pure ACKs more than | |||
equalising the bit rates of TCPs with different segment sizes. | equalising the bit rates of TCPs with different segment sizes. | |||
Removed some conjectured motivations. | Removed some conjectured motivations. | |||
Added support for updates to TCP in progress (ackcc & ecn-syn- | * Added support for updates to TCP in progress (ackcc & ecn-syn- | |||
ack). | ack). | |||
Updated survey results with newly arrived data. | * Updated survey results with newly arrived data. | |||
Pulled all recommendations together into the conclusions. | * Pulled all recommendations together into the conclusions. | |||
Moved some detailed points into two additional appendices and a | * Moved some detailed points into two additional appendices and a | |||
note. | note. | |||
Considerable clarifications throughout. | * Considerable clarifications throughout. | |||
Updated references | ||||
Requirements notation | ||||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | * Updated references | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | ||||
document are to be interpreted as described in [RFC2119]. | ||||
1. Introduction | 1. Introduction | |||
When notifying congestion, the problem of how (and whether) to take | When notifying congestion, the problem of how (and whether) to take | |||
packet sizes into account has exercised the minds of researchers and | packet sizes into account has exercised the minds of researchers and | |||
practitioners for as long as active queue management (AQM) has been | practitioners for as long as active queue management (AQM) has been | |||
discussed. Indeed, one reason AQM was originally introduced was to | discussed. Indeed, one reason AQM was originally introduced was to | |||
reduce the lock-out effects that small packets can have on large | reduce the lock-out effects that small packets can have on large | |||
packets in drop-tail queues. This memo aims to state the principles | packets in drop-tail queues. This memo aims to state the principles | |||
we should be using and to come to conclusions on what these | we should be using and to come to conclusions on what these | |||
principles will mean for future protocol design, taking into account | principles will mean for future protocol design, taking into account | |||
the deployments we have already. | the deployments we have already. | |||
Note that the byte vs. packet dilemma concerns congestion | Note that the byte vs. packet dilemma concerns congestion | |||
notification irrespective of whether it is signalled implicitly by | notification irrespective of whether it is signalled implicitly by | |||
drop or using explicit congestion notification (ECN [RFC3168] or PCN | drop or using explicit congestion notification (ECN [RFC3168] or PCN | |||
[I-D.eardley-pcn-marking-behaviour]). Throughout this document, | [I-D.ietf-pcn-marking-behaviour]). Throughout this document, unless | |||
unless clear from the context, the term marking will be used to mean | clear from the context, the term marking will be used to mean | |||
notifying congestion explicitly, while congestion notification will | notifying congestion explicitly, while congestion notification will | |||
be used to mean notifying congestion either implicitly by drop or | be used to mean notifying congestion either implicitly by drop or | |||
explicitly by marking. | explicitly by marking. | |||
If the load on a resource depends on the rate at which packets | If the load on a resource depends on the rate at which packets | |||
arrive, it is called packet-congestible. If the load depends on the | arrive, it is called packet-congestible. If the load depends on the | |||
rate at which bits arrive it is called bit-congestible. | rate at which bits arrive it is called bit-congestible. | |||
Examples of packet-congestible resources are route look-up engines | Examples of packet-congestible resources are route look-up engines | |||
and firewalls, because load depends on how many packet headers they | and firewalls, because load depends on how many packet headers they | |||
have to process. Examples of bit-congestible resources are | have to process. Examples of bit-congestible resources are | |||
transmission links, and most buffer memory, because the load depends | transmission links, radio power and most buffer memory, because the | |||
on how many bits they have to transmit or store. Some machine | load depends on how many bits they have to transmit or store. Some | |||
architectures use fixed size packet buffers, so buffer memory in | machine architectures use fixed size packet buffers, so buffer memory | |||
these cases is packet-congestible (see Section 4.1.1). | in these cases is packet-congestible (see Section 4.1.1). | |||
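
The distinction between bit-congestible and packet-congestible resources can be illustrated with a minimal sketch. The function name, its parameters and the example packet sizes are assumptions made purely for illustration; they do not appear in either version of the draft.

```python
from typing import Sequence


def offered_load(packet_sizes_bytes: Sequence[int], interval_s: float,
                 congestible_by: str) -> float:
    """Load offered to a resource over one measurement interval.

    congestible_by is a hypothetical switch:
      "bit"    -> load depends on how many bits arrive (links, most buffers)
      "packet" -> load depends on how many packets arrive (look-up engines)
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if congestible_by == "bit":
        return 8 * sum(packet_sizes_bytes) / interval_s
    if congestible_by == "packet":
        return len(packet_sizes_bytes) / interval_s
    raise ValueError("congestible_by must be 'bit' or 'packet'")


# The same 15000 bytes sent as a few large or many small packets.
large = [1500] * 10
small = [60] * 250

bit_load_large = offered_load(large, 1.0, "bit")
bit_load_small = offered_load(small, 1.0, "bit")
pkt_load_large = offered_load(large, 1.0, "packet")
pkt_load_small = offered_load(small, 1.0, "packet")
```

In this sketch a bit-congestible resource sees the same load in both cases, while a packet-congestible resource sees 25 times more load from the small packets.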
Note that information is generally processed or transmitted with a | Note that information is generally processed or transmitted with a | |||
minimum granularity greater than a bit (e.g. octets). The | minimum granularity greater than a bit (e.g. octets). The | |||
appropriate granularity for the resource in question SHOULD be used, | appropriate granularity for the resource in question SHOULD be used, | |||
but for the sake of brevity we will talk in terms of bytes in this | but for the sake of brevity we will talk in terms of bytes in this | |||
memo. | memo. | |||
Resources may be congestible at higher levels of granularity than | Resources may be congestible at higher levels of granularity than | |||
packets, for instance stateful firewalls are flow-congestible and | packets, for instance stateful firewalls are flow-congestible and | |||
call-servers are session-congestible. This memo focuses on | call-servers are session-congestible. This memo focuses on | |||
congestion of connectionless resources, but the same principles may | congestion of connectionless resources, but the same principles may | |||
be applied for congestion notification protocols controlling per-flow | be applicable for congestion notification protocols controlling per- | |||
and per-session processing or state. | flow and per-session processing or state. | |||
The byte vs. packet dilemma arises at three stages in the congestion | The byte vs. packet dilemma arises at three stages in the congestion | |||
notification process: | notification process: | |||
Measuring congestion When the congested resource decides locally how | Measuring congestion When the congested resource decides locally how | |||
to measure how congested it is. (Should the queue be measured in | to measure how congested it is. (Should the queue be measured in | |||
bytes or packets?); | bytes or packets?); | |||
Coding congestion notification into the wire protocol: When the | Coding congestion notification into the wire protocol: When the | |||
congested resource decides how to notify the level of congestion. | congested resource decides how to notify the level of congestion. | |||
skipping to change at page 7, line 8 | skipping to change at page 7, line 40 | |||
decodes congestion notification. In RED, the variant that reduces | decodes congestion notification. In RED, the variant that reduces | |||
drop probability for packets based on their size in bytes is called | drop probability for packets based on their size in bytes is called | |||
byte-mode drop, while the variant that doesn't is called packet mode | byte-mode drop, while the variant that doesn't is called packet mode | |||
drop. Whether queues are measured in bytes or packets is an | drop. Whether queues are measured in bytes or packets is an | |||
orthogonal choice, termed byte-mode queue measurement or packet-mode | orthogonal choice, termed byte-mode queue measurement or packet-mode | |||
queue measurement. | queue measurement. | |||
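
The two orthogonal choices named above can be sketched as follows. The linear scaling by packet size is one common formulation of byte-mode drop and is assumed here rather than taken from this memo; the function names, the 1500 byte maximum and the example values are likewise hypothetical.

```python
from typing import Sequence


def red_drop_probability(base_prob: float, packet_size: int,
                         byte_mode_drop: bool,
                         max_packet_size: int = 1500) -> float:
    """Per-packet drop probability under the two RED drop variants.

    base_prob is the probability RED has already derived from the
    averaged queue. Assumption: byte-mode drop scales it linearly by
    packet_size / max_packet_size; packet-mode drop leaves it unchanged.
    """
    if not 0.0 <= base_prob <= 1.0:
        raise ValueError("base_prob must lie in [0, 1]")
    if packet_size <= 0 or max_packet_size <= 0:
        raise ValueError("sizes must be positive")
    if byte_mode_drop:
        scale = min(packet_size, max_packet_size) / max_packet_size
        return base_prob * scale
    return base_prob


def queue_length(packet_sizes_bytes: Sequence[int],
                 byte_mode_measurement: bool) -> int:
    """Queue length in bytes or in packets, independent of the drop mode."""
    if byte_mode_measurement:
        return sum(packet_sizes_bytes)
    return len(packet_sizes_bytes)


queue = [1500, 60, 60]
p_small_byte_mode = red_drop_probability(0.1, 60, byte_mode_drop=True)
p_small_pkt_mode = red_drop_probability(0.1, 60, byte_mode_drop=False)
```

Under these assumptions a 60 byte packet is 25 times less likely to be dropped in byte-mode drop than in packet-mode drop, whichever way the queue itself is measured.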
Currently, the RFC series is silent on this matter other than a paper | Currently, the RFC series is silent on this matter other than a paper | |||
trail of advice referenced from [RFC2309], which conditionally | trail of advice referenced from [RFC2309], which conditionally | |||
recommends byte-mode (packet-size dependent) drop [pktByteEmail]. | recommends byte-mode (packet-size dependent) drop [pktByteEmail]. | |||
However, none of the implementers who responded to our survey have | However, none of the implementers who responded to our survey | |||
followed this advice. The primary purpose of this memo is to build a | (Section 6.2.4) have followed this advice. The primary purpose | |||
definitive consensus against deliberate preferential treatment for | of this memo is to build a definitive consensus against deliberate | |||
small packets in AQM algorithms and to record this advice within the | preferential treatment for small packets in AQM algorithms and to | |||
RFC series. | record this advice within the RFC series. | |||
Now is a good time to discuss whether fairness between different | Now is a good time to discuss whether fairness between different | |||
sized packets would best be implemented in the network layer, or at | sized packets would best be implemented in the network layer, or at | |||
the transport, for a number of reasons: | the transport, for a number of reasons: | |||
1. The packet vs. byte issue requires speedy resolution because the | 1. The packet vs. byte issue requires speedy resolution because the | |||
IETF pre-congestion notification (PCN) working group is about to | IETF pre-congestion notification (PCN) working group is about to | |||
standardise the external behaviour of a PCN congestion | standardise the external behaviour of a PCN congestion | |||
notification (AQM) algorithm [I-D.eardley-pcn-marking-behaviour]; | notification (AQM) algorithm [I-D.ietf-pcn-marking-behaviour]; | |||
2. [RFC2309] says RED may either take account of packet size or not | 2. [RFC2309] says RED may either take account of packet size or not | |||
when dropping, but gives no recommendation between the two, | when dropping, but gives no recommendation between the two, | |||
referring instead to advice on the performance implications in an | referring instead to advice on the performance implications in an | |||
email [pktByteEmail], which recommends byte-mode drop. Further, | email [pktByteEmail], which recommends byte-mode drop. Further, | |||
just before RFC2309 was issued, an addendum was added to the | just before RFC2309 was issued, an addendum was added to the | |||
archived email that revisited the issue of packet vs. byte-mode | archived email that revisited the issue of packet vs. byte-mode | |||
drop in its last paragraph, making the recommendation less clear-cut; | drop in its last paragraph, making the recommendation less clear-cut; | |||
3. Without the present memo, the only advice in the RFC series on | 3. Without the present memo, the only advice in the RFC series on | |||
skipping to change at page 8, line 40 | skipping to change at page 9, line 25 | |||
congestion notification in Section 3 then determining the correct way | congestion notification in Section 3 then determining the correct way | |||
to measure congestion (Section 4) and to design an idealised | to measure congestion (Section 4) and to design an idealised | |||
congestion notification protocol (Section 5). It then surveys the | congestion notification protocol (Section 5). It then surveys the | |||
advice given previously in the RFC series, the research literature | advice given previously in the RFC series, the research literature | |||
and the deployed legacy (Section 6) before listing outstanding issues | and the deployed legacy (Section 6) before listing outstanding issues | |||
(Section 7) that will need resolution both to achieve the ideal | (Section 7) that will need resolution both to achieve the ideal | |||
protocol and to handle legacy. After discussing security | protocol and to handle legacy. After discussing security | |||
considerations (Section 8) strong recommendations for the way forward | considerations (Section 8) strong recommendations for the way forward | |||
are given in the conclusions (Section 9). | are given in the conclusions (Section 9). | |||
1.1. Requirements Notation | ||||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | ||||
document are to be interpreted as described in [RFC2119]. | ||||
2. Motivating Arguments | 2. Motivating Arguments | |||
2.1. Scaling Congestion Control with Packet Size | 2.1. Scaling Congestion Control with Packet Size | |||
There are two ways of interpreting a dropped or marked packet. It | There are two ways of interpreting a dropped or marked packet. It | |||
can either be considered as a single loss event or as loss/marking of | can either be considered as a single loss event or as loss/marking of | |||
the bytes in the packet. Here we try to design a test to see which | the bytes in the packet. Here we try to design a test to see which | |||
approach scales with packet size. | approach scales with packet size. | |||
Imagine a bit-congestible link shared by many flows, so that each | Given bit-congestible is the more common case, consider a bit- | |||
busy period tends to cause packets to be lost from different flows. | congestible link shared by many flows, so that each busy period tends | |||
to cause packets to be lost from different flows. The test compares | ||||
The test compares two identical scenarios with the same applications, | two identical scenarios with the same applications, the same numbers | |||
the same numbers of sources and the same load. But the sources break | of sources and the same load. But the sources break the load into | |||
the load into large packets in one scenario and small packets in the | large packets in one scenario and small packets in the other. Of | |||
other. Of course, because the load is the same, there will be | course, because the load is the same, there will be proportionately | |||
proportionately more packets in the small packet case. | more packets in the small packet case. | |||
The test of whether a congestion control scales with packet size is | The test of whether a congestion control scales with packet size is | |||
that it should respond in the same way to the same congestion | that it should respond in the same way to the same congestion | |||
excursion, irrespective of the size of the packets that the bytes | excursion, irrespective of the size of the packets that the bytes | |||
causing congestion happen to be broken down into. | causing congestion happen to be broken down into. | |||
A bit-congestible queue suffering a congestion excursion has to drop | A bit-congestible queue suffering a congestion excursion has to drop | |||
or mark the same excess bytes whether they are in a few large packets | or mark the same excess bytes whether they are in a few large packets | |||
or many small packets. So for the same congestion excursion, the | or many small packets. So for the same congestion excursion, the | |||
same number of bytes has to be shed to get the load back to its | same number of bytes has to be shed to get the load back to its | |||
operating point. But, of course, for smaller packets more packets | operating point. But, of course, for smaller packets more packets | |||
will have to be discarded to shed the same bytes. | will have to be discarded to shed the same bytes. | |||
If all the transports interpret each drop/mark as a single loss event | If all the transports interpret each drop/mark as a single loss event | |||
irrespective of the size of the packet dropped, they will respond | irrespective of the size of the packet dropped, those with smaller | |||
more to the same congestion excursion, failing our test. On the | packets will respond more to the same congestion excursion, failing | |||
other hand, if they respond proportionately less when smaller packets | our test. On the other hand, if they respond proportionately less | |||
are dropped/marked, overall they will be able to respond the same to | when smaller packets are dropped/marked, overall they will be able to | |||
the same congestion excursion. | respond the same to the same congestion excursion. | |||
Therefore, for a congestion control to scale with packet size it | Therefore, for a congestion control to scale with packet size it | |||
should respond to dropped or marked bytes (as TFRC-SP [RFC4828] | should respond to dropped or marked bytes (as TFRC-SP [RFC4828] | |||
effectively does), not just to dropped or marked packets irrespective | effectively does), not just to dropped or marked packets irrespective | |||
of packet size (as TCP does). | of packet size (as TCP does). | |||
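The scaling test can be checked with a few lines of arithmetic. The
following is only a minimal sketch: the 30,000 byte excursion and the
function names are illustrative assumptions, not taken from any
specification. It compares a transport that counts dropped packets
with one that counts dropped bytes, for the same excess load broken
into large or small packets.

```python
def packets_to_shed(excess_bytes: int, packet_size: int) -> int:
    """Whole packets a queue must drop to shed at least excess_bytes."""
    return -(-excess_bytes // packet_size)  # ceiling division


def response_per_packet(drops: int) -> int:
    """Transport treating each dropped packet as one unit of congestion."""
    return drops


def response_per_byte(drops: int, packet_size: int) -> int:
    """Transport weighting each dropped packet by its size in bytes."""
    return drops * packet_size


EXCESS = 30_000  # bytes above the operating point (assumed value)

for size in (1500, 60):
    n = packets_to_shed(EXCESS, size)
    print(f"size={size}B drops={n} "
          f"per-packet response={response_per_packet(n)} "
          f"per-byte response={response_per_byte(n, size)}")
```

Under these assumptions the per-packet response differs by the ratio
of the packet sizes, while the per-byte response is the same in both
scenarios, which is what the test requires.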
The email [pktByteEmail] referred to by RFC2309 says the question of | The email [pktByteEmail] referred to by RFC2309 says the question of | |||
whether a packet's own size should affect its drop probability | whether a packet's own size should affect its drop probability | |||
"depends on the dominant end-to-end congestion control mechanisms". | "depends on the dominant end-to-end congestion control mechanisms". | |||
But we argue the network layer should not be optimised for whatever | But we argue the network layer should not be optimised for whatever | |||
skipping to change at page 10, line 23 | skipping to change at page 11, line 15 | |||
as fewer larger packets or more smaller packets. A protocol design | as fewer larger packets or more smaller packets. A protocol design | |||
that caused larger packets to be more likely to be dropped than | that caused larger packets to be more likely to be dropped than | |||
smaller ones would be dangerous in this case: | smaller ones would be dangerous in this case: | |||
Malicious transports: A queue that gives an advantage to small | Malicious transports: A queue that gives an advantage to small | |||
packets can be used to amplify the force of a flooding attack. By | packets can be used to amplify the force of a flooding attack. By | |||
sending a flood of small packets, the attacker can get the queue | sending a flood of small packets, the attacker can get the queue | |||
to discard more traffic in large packets, allowing more attack | to discard more traffic in large packets, allowing more attack | |||
traffic to get through to cause further damage. Such a queue | traffic to get through to cause further damage. Such a queue | |||
allows attack traffic to have a disproportionately large effect on | allows attack traffic to have a disproportionately large effect on | |||
regular traffic without the attacker having to do much work. The | regular traffic without the attacker having to do much work. | |||
byte-mode drop variant of RED amplifies small packet attacks. | ||||
Drop-tail queues amplify small packet attacks even more than RED | Note that, although the byte-mode drop variant of RED amplifies | |||
byte-mode drop (see Security Considerations in | small packet attacks, drop-tail queues amplify small packet | |||
Section 8). Wherever possible neither should be used. | attacks even more (see Security Considerations in Section 8). | |||
Wherever possible neither should be used. | ||||
Normal transports: Even if a transport is not malicious, if it finds | Normal transports: Even if a transport is not malicious, if it finds | |||
small packets go faster, it will tend to act in its own interest | small packets go faster, it will tend to act in its own interest | |||
and use them. Queues that give advantage to small packets create | and use them. Queues that give advantage to small packets create | |||
an evolutionary pressure for transports to send at the same bit- | an evolutionary pressure for transports to send at the same bit- | |||
rate but break their data stream down into tiny segments to reduce | rate but break their data stream down into tiny segments to reduce | |||
their drop rate. Encouraging a high volume of tiny packets might | their drop rate. Encouraging a high volume of tiny packets might | |||
in turn unnecessarily overload a completely unrelated part of the | in turn unnecessarily overload a completely unrelated part of the | |||
system, perhaps more limited by header-processing than bandwidth. | system, perhaps more limited by header-processing than bandwidth. | |||
Imagine two flows arrive at a bit-congestible transmission link each | Imagine two unresponsive flows arrive at a bit-congestible | |||
with the same bit rate, say 1Mbps, but one consists of 1500B and the | transmission link each with the same bit rate, say 1Mbps, but one | |||
other 60B packets, which are 25x smaller. Consider a scenario where | consists of 1500B and the other 60B packets, which are 25x smaller. | |||
gentle RED [gentle_RED] is used, along with the variant of RED we | Consider a scenario where gentle RED [gentle_RED] is used, along with | |||
advise against, i.e. where the RED algorithm is configured to adjust | the variant of RED we advise against, i.e. where the RED algorithm is | |||
the drop probability of packets in proportion to each packet's size | configured to adjust the drop probability of packets in proportion to | |||
(byte mode packet drop). In this case, if RED drops 25% of the | each packet's size (byte mode packet drop). In this case, if RED | |||
larger packets, it will aim to drop 1% of the smaller packets (but in | drops 25% of the larger packets, it will aim to drop 1% of the | |||
practice it may drop more as congestion increases | smaller packets (but in practice it may drop more as congestion | |||
[RFC4828](S.B.4)[Note_Variation]). Even though both flows arrive | increases [RFC4828](S.B.4)[Note_Variation]). Even though both flows | |||
with the same bit rate, the bit rate the RED queue aims to pass to | arrive with the same bit rate, the bit rate the RED queue aims to | |||
the line will be 750kbps for the flow of larger packets but 990kbps for the | pass to the line will be 750kbps for the flow of larger packets but 990kbps | |||
smaller packets (but because of rate variation it will be less than | for the smaller packets (but because of rate variation it will be | |||
this target). It can be seen that this behaviour reopens the same | less than this target). | |||
denial of service vulnerability that drop tail queues offer to floods | ||||
of small packets, though not necessarily as strongly (see Section 8). | It can be seen that this behaviour reopens the same denial of service | |||
vulnerability that drop tail queues offer to floods of small packets, | ||||
though not necessarily as strongly (see Section 8). | ||||
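The figures in this example can be reproduced with the sketch below.
It assumes the simplest form of byte-mode packet drop, in which drop
probability is scaled linearly by packet size relative to a maximum
packet size. The function names are hypothetical, and the sketch
ignores the rate variation that the text notes would affect a real
queue.

```python
def byte_mode_drop_prob(p_at_max: float, size: int, max_size: int) -> float:
    """Drop probability scaled in proportion to packet size (assumed linear)."""
    return p_at_max * size / max_size


def target_passed_rate(arrival_bps: float, drop_prob: float) -> float:
    """Bit rate the queue aims to forward for a flow at this drop probability."""
    return arrival_bps * (1.0 - drop_prob)


ARRIVAL_BPS = 1_000_000   # both flows arrive at 1Mbps
MAX_SIZE = 1500           # bytes, size of the larger packets
P_LARGE = 0.25            # drop probability applied to 1500B packets

for size in (1500, 60):
    p = byte_mode_drop_prob(P_LARGE, size, MAX_SIZE)
    rate_kbps = target_passed_rate(ARRIVAL_BPS, p) / 1000
    print(f"{size}B packets: drop prob {p:.2%}, target {rate_kbps:.0f}kbps")
```

The small-packet flow is penalised 25 times less even though it
offers exactly the same bit rate to the link.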
2.3. Small != Control | 2.3. Small != Control | |||
It is tempting to drop small packets with lower probability to | It is tempting to drop small packets with lower probability to | |||
improve performance, because many control packets are small (TCP SYNs | improve performance, because many control packets are small (TCP SYNs | |||
& ACKs, DNS queries & responses, SIP messages, HTTP GETs, etc) and | & ACKs, DNS queries & responses, SIP messages, HTTP GETs, etc) and | |||
dropping fewer control packets considerably improves performance. | dropping fewer control packets considerably improves performance. | |||
However, we must not give control packets preference purely by virtue | However, we must not give control packets preference purely by virtue | |||
of their smallness, otherwise it is too easy for any data source to | of their smallness, otherwise it is too easy for any data source to | |||
get the same preferential treatment simply by sending data in smaller | get the same preferential treatment simply by sending data in smaller | |||
skipping to change at page 11, line 28 | skipping to change at page 12, line 26 | |||
intend. | intend. | |||
Just because many control packets are small does not mean all small | Just because many control packets are small does not mean all small | |||
packets are control packets. | packets are control packets. | |||
So again, rather than fix these problems in the network layer, we | So again, rather than fix these problems in the network layer, we | |||
argue that the transport should be made more robust against losses of | argue that the transport should be made more robust against losses of | |||
control packets (see 'Making Transports Robust against Control Packet | control packets (see 'Making Transports Robust against Control Packet | |||
Losses' in Section 6.2.3). | Losses' in Section 6.2.3). | |||
2.4. Implementation Efficiency | ||||
Allowing for packet size at the transport rather than in the network | ||||
ensures that neither the network nor the transport needs to do a | ||||
multiply operation--multiplication by packet size is effectively | ||||
achieved as a repeated add when the transport adds to its count of | ||||
marked bytes as each congestion event is fed to it. This isn't a | ||||
principled reason in itself, but it is a happy consequence of the | ||||
other principled reasons. | ||||
3. Working Definition of Congestion Notification | 3. Working Definition of Congestion Notification | |||
Rather than aim to achieve what many have tried and failed, this memo | Rather than aim to achieve what many have tried and failed, this memo | |||
will not try to define congestion. It will give a working definition | will not try to define congestion. It will give a working definition | |||
of what congestion notification should be taken to mean for this | of what congestion notification should be taken to mean for this | |||
document. Congestion notification is a changing signal that aims to | document. Congestion notification is a changing signal that aims to | |||
communicate the ratio E/L, where E is the instantaneous excess load | communicate the ratio E/L, where E is the instantaneous excess load | |||
offered to a resource that it cannot (or would not) serve and L is | offered to a resource that it cannot (or would not) serve and L is | |||
the instantaneous offered load. | the instantaneous offered load. | |||
The phrase `would not serve' is added, because AQM systems (e.g. | The phrase `would not serve' is added, because AQM systems (e.g. | |||
RED, PCN [I-D.eardley-pcn-marking-behaviour]) use a virtual capacity | RED, PCN [I-D.ietf-pcn-marking-behaviour]) use a virtual capacity | |||
smaller than actual capacity, then notify congestion of this virtual | smaller than actual capacity, then notify congestion of this virtual | |||
capacity in order to avoid congestion of the actual capacity. | capacity in order to avoid congestion of the actual capacity. | |||
Note that the denominator is offered load, not capacity. Therefore | Note that the denominator is offered load, not capacity. Therefore | |||
congestion notification is a real number bounded by the range [0,1]. | congestion notification is a real number bounded by the range [0,1]. | |||
This ties in with the most well-understood form of congestion | This ties in with the most well-understood measure of congestion | |||
notification: drop rate. It also means that congestion has a natural | notification: drop fraction (often loosely called loss rate). It | |||
interpretation as a probability; the probability of offered traffic | also means that congestion has a natural interpretation as a | |||
not being served (or being marked as at risk of not being served). | probability; the probability of offered traffic not being served (or | |||
Appendix B describes a further incidental benefit that arises from | being marked as at risk of not being served). Appendix B describes a | |||
using load as the denominator of congestion notification. | further incidental benefit that arises from using load as the | |||
denominator of congestion notification. | ||||
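As a rough illustration of this working definition, the sketch below
computes E/L for a single instant. The function name and the choice
to treat an idle resource as uncongested are assumptions made for the
example. The capacity argument may be the virtual capacity used by an
AQM rather than the actual capacity.

```python
def congestion_level(offered_load: float, capacity: float) -> float:
    """Return E/L, where E is the load in excess of capacity and L the load.

    Both arguments must use the same unit (bits/s for a bit-congestible
    resource, packets/s for a packet-congestible one).
    """
    if offered_load <= 0:
        return 0.0  # assumption: no offered load means no congestion
    excess = max(0.0, offered_load - capacity)
    return excess / offered_load


# Because the denominator is load rather than capacity, the result
# approaches 1 as load grows but can never exceed it.
for load in (50.0, 100.0, 200.0, 1000.0):
    print(load, congestion_level(load, capacity=100.0))
```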
4. Congestion Measurement | 4. Congestion Measurement | |||
4.1. Congestion Measurement by Queue Length | 4.1. Congestion Measurement by Queue Length | |||
Queue length is usually the most correct and simplest way to measure | Queue length is usually the most correct and simplest way to measure | |||
congestion of a resource. To avoid the pathological effects of drop | congestion of a resource. To avoid the pathological effects of drop | |||
tail, an AQM function can then be used to transform queue length into | tail, an AQM function can then be used to transform queue length into | |||
the probability of dropping or marking a packet (e.g. RED's | the probability of dropping or marking a packet (e.g. RED's | |||
piecewise linear function between thresholds). If the resource is | piecewise linear function between thresholds). If the resource is | |||
skipping to change at page 14, line 42 | skipping to change at page 15, line 50 | |||
case bit rates with minimum packet sizes. Therefore, packet- | case bit rates with minimum packet sizes. Therefore, packet- | |||
congestion is currently rare, but there is no guarantee that it will | congestion is currently rare, but there is no guarantee that it will | |||
not become common with future technology trends. | not become common with future technology trends. | |||
The idealised wire protocol is given below. It accounts for packet | The idealised wire protocol is given below. It accounts for packet | |||
sizes at the transport layer, not in the network, and then only in | sizes at the transport layer, not in the network, and then only in | |||
the case of bit-congestible resources. This avoids the perverse | the case of bit-congestible resources. This avoids the perverse | |||
incentive to send smaller packets and the DoS vulnerability that | incentive to send smaller packets and the DoS vulnerability that | |||
would otherwise result if the network were to bias towards them (see | would otherwise result if the network were to bias towards them (see | |||
the motivating argument about avoiding perverse incentives in | the motivating argument about avoiding perverse incentives in | |||
Section 2.2). Incidentally, it also ensures neither the network nor | Section 2.2): | |||
the transport needs to do a multiply operation--multiplication by | ||||
packet size is effectively achieved as a repeated add when the | ||||
transport adds to its count of marked bytes as each congestion event | ||||
is fed to it: | ||||
o A packet-congestible resource trying to code congestion level p_p | 1. A packet-congestible resource trying to code congestion level p_p | |||
into a packet stream should mark the idealised `packet congestion' | into a packet stream should mark the idealised `packet | |||
field in each packet with probability p_p irrespective of the | congestion' field in each packet with probability p_p | |||
packet's size. The transport should then take a packet with the | irrespective of the packet's size. The transport should then | |||
packet congestion field marked to mean just one mark, irrespective | take a packet with the packet congestion field marked to mean | |||
of the packet size. | just one mark, irrespective of the packet size. | |||
o A bit-congestible resource trying to code time-varying byte- | 2. A bit-congestible resource trying to code time-varying byte- | |||
congestion level p_b into a packet stream should mark the `byte | congestion level p_b into a packet stream should mark the `byte | |||
congestion' field in each packet with probability p_b, again | congestion' field in each packet with probability p_b, again | |||
irrespective of the packet's size. Unlike before, the transport | irrespective of the packet's size. Unlike before, the transport | |||
should take a packet with the byte congestion field marked to | should take a packet with the byte congestion field marked to | |||
count as a mark on each byte in the packet. | count as a mark on each byte in the packet. | |||
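The transport side of these two rules can be sketched as follows.
This is an illustration only: the Packet type, its field names and
count_marks are hypothetical, standing in for the two idealised
fields, and no real header carries them in this form. Note that the
byte count is accumulated by addition as each marked packet arrives,
so no multiplication by packet size is needed.

```python
from typing import Iterable, NamedTuple, Tuple


class Packet(NamedTuple):
    size: int                 # bytes
    packet_congestion: bool   # idealised `packet congestion' field
    byte_congestion: bool     # idealised `byte congestion' field


def count_marks(packets: Iterable[Packet]) -> Tuple[int, int]:
    """Return (marked packets, marked bytes) as the transport sees them."""
    packet_marks = 0
    byte_marks = 0
    for pkt in packets:
        if pkt.packet_congestion:
            packet_marks += 1         # one mark, whatever the size
        if pkt.byte_congestion:
            byte_marks += pkt.size    # a mark on every byte in the packet
    return packet_marks, byte_marks


received = [
    Packet(1500, packet_congestion=False, byte_congestion=True),
    Packet(60, packet_congestion=True, byte_congestion=True),
    Packet(60, packet_congestion=False, byte_congestion=False),
]
print(count_marks(received))
```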
The worked examples in Appendix A show that transports can extract | The worked examples in Appendix A show that transports can extract | |||
sufficient and correct congestion notification from these protocols | sufficient and correct congestion notification from these protocols | |||
for cases when two flows with different packet sizes have matching | for cases when two flows with different packet sizes have matching | |||
bit rates or matching packet rates. Examples are also given that mix | bit rates or matching packet rates. Examples are also given that mix | |||
these two flows into one to show that a flow with mixed packet sizes | these two flows into one to show that a flow with mixed packet sizes | |||
would still be able to extract sufficient and correct information. | would still be able to extract sufficient and correct information. | |||
Sufficient and correct congestion information means that there is | Sufficient and correct congestion information means that there is | |||
sufficient information for the two different types of transport | sufficient information for the two different types of transport | |||
requirements: | requirements: | |||
Ratio-based: Established transport congestion controls like TCP's | Ratio-based: Established transport congestion controls like TCP's | |||
[RFC2581] aim to achieve equal segment rates per RTT through the | [RFC5681] aim to achieve equal segment rates per RTT through the | |||
same bottleneck--TCP friendliness [RFC3448]. They work with the | same bottleneck--TCP friendliness [RFC3448]. They work with the | |||
ratio of dropped to delivered segments (or marked to unmarked | ratio of dropped to delivered segments (or marked to unmarked | |||
segments in the case of ECN). The example scenarios show that | segments in the case of ECN). The example scenarios show that | |||
these ratio-based transports are effectively the same whether | these ratio-based transports are effectively the same whether | |||
counting in bytes or packets, because the units cancel out. | counting in bytes or packets, because the units cancel out. | |||
(Incidentally, this is why TCP's bit rate is still proportional to | (Incidentally, this is why TCP's bit rate is still proportional to | |||
packet size even when byte-counting is used, as recommended for | packet size even when byte-counting is used, as recommended for | |||
TCP in [I-D.ietf-tcpm-rfc2581bis], mainly for orthogonal security | TCP in [RFC5681], mainly for orthogonal security reasons.) | |||
reasons.) | ||||
Absolute-target-based: Other congestion controls proposed in the | Absolute-target-based: Other congestion controls proposed in the | |||
research community aim to limit the volume of congestion caused to | research community aim to limit the volume of congestion caused to | |||
a constant weight parameter. [MulTCP][WindowPropFair] are | a constant weight parameter. [MulTCP][WindowPropFair] are | |||
examples of weighted proportionally fair transports designed for | examples of weighted proportionally fair transports designed for | |||
cost-fair environments [Rate_fair_Dis]. In this case, the | cost-fair environments [Rate_fair_Dis]. In this case, the | |||
transport requires a count (not a ratio) of dropped/marked bytes | transport requires a count (not a ratio) of dropped/marked bytes | |||
in the bit-congestible case and of dropped/marked packets in the | in the bit-congestible case and of dropped/marked packets in the | |||
packet congestible case. | packet congestible case. | |||
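The difference between the two requirements can be illustrated with
the sketch below, which assumes a flow whose packets are all the same
size. The function names and the example numbers are hypothetical.
For such a flow the marking ratio is identical in bytes and in
packets, because the packet size cancels, whereas an absolute count
of marked bytes still depends on packet size.

```python
from fractions import Fraction


def ratio_in_packets(marked: int, total: int) -> Fraction:
    """Fraction of packets marked or dropped."""
    return Fraction(marked, total)


def ratio_in_bytes(marked: int, total: int, size: int) -> Fraction:
    """Fraction of bytes marked or dropped, for uniform packet size."""
    return Fraction(marked * size, total * size)


def marked_byte_count(marked: int, size: int) -> int:
    """Absolute volume of congestion in bytes, for uniform packet size."""
    return marked * size


for size in (1500, 60):
    print(size,
          ratio_in_packets(3, 100),
          ratio_in_bytes(3, 100, size),
          marked_byte_count(3, size))
```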
skipping to change at page 20, line 7 | skipping to change at page 21, line 11 | |||
The paper originally proposing TFRC with virtual packets (VP-TFRC) | The paper originally proposing TFRC with virtual packets (VP-TFRC) | |||
[CCvarPktSize] proposed that there should perhaps be two variants to | [CCvarPktSize] proposed that there should perhaps be two variants to | |||
cater for the different variants of RED. However, as the TFRC-SP | cater for the different variants of RED. However, as the TFRC-SP | |||
authors point out, there is no way for a transport to know whether | authors point out, there is no way for a transport to know whether | |||
some queues on its path have deployed RED with byte-mode packet drop | some queues on its path have deployed RED with byte-mode packet drop | |||
(except if an exhaustive survey found that no-one has deployed it!-- | (except if an exhaustive survey found that no-one has deployed it!-- | |||
see Section 6.2.4). Incidentally, VP-TFRC also proposed that byte- | see Section 6.2.4). Incidentally, VP-TFRC also proposed that byte- | |||
mode RED dropping should really square the packet size compensation | mode RED dropping should really square the packet size compensation | |||
factor (like that of RED_5, but apparently unaware of it). | factor (like that of RED_5, but apparently unaware of it). | |||
Pre-congestion notification [I-D.eardley-pcn-marking-behaviour] is a | Pre-congestion notification [I-D.ietf-pcn-marking-behaviour] is a | |||
proposal to use a virtual queue for AQM marking for packets within | proposal to use a virtual queue for AQM marking for packets within | |||
one Diffserv class in order to give early warning prior to any real | one Diffserv class in order to give early warning prior to any real | |||
queuing. The proposed PCN marking algorithms have been designed not | queuing. The proposed PCN marking algorithms have been designed not | |||
to take account of packet size when forwarding through queues. | to take account of packet size when forwarding through queues. | |||
Instead the general principle has been to take account of the sizes | Instead the general principle has been to take account of the sizes | |||
of marked packets when monitoring the fraction of marking at the edge | of marked packets when monitoring the fraction of marking at the edge | |||
of the network. | of the network. | |||
6.2.3. Making Transports Robust against Control Packet Losses | 6.2.3. Making Transports Robust against Control Packet Losses | |||
skipping to change at page 20, line 32 | skipping to change at page 21, line 36 | |||
small packets. We argue here that these two proposals are a safer | small packets. We argue here that these two proposals are a safer | |||
and more principled way to achieve TCP performance improvements than | and more principled way to achieve TCP performance improvements than | |||
reverse engineering RED to benefit TCP. | reverse engineering RED to benefit TCP. | |||
Although no proposals exist as far as we know, it would also be | Although no proposals exist as far as we know, it would also be | |||
possible and perfectly valid to make control packets robust against | possible and perfectly valid to make control packets robust against | |||
drop by explicitly requesting a lower drop probability using their | drop by explicitly requesting a lower drop probability using their | |||
Diffserv code point [RFC2474] to request a scheduling class with | Diffserv code point [RFC2474] to request a scheduling class with | |||
lower drop. | lower drop. | |||
The re-ECN protocol proposal [Re-TCP] is designed so that transports | The re-ECN protocol proposal [I-D.briscoe-tsvwg-re-ecn-tcp] is | |||
can be made more robust against losing control packets. It gives | designed so that transports can be made more robust against losing | |||
queues an incentive to optionally give preference against drop to | control packets. It gives queues an incentive to optionally give | |||
packets with the 'feedback not established' codepoint in the proposed | preference against drop to packets with the 'feedback not | |||
'extended ECN' field. Senders have incentives to use this codepoint | established' codepoint in the proposed 'extended ECN' field. Senders | |||
sparingly, but they can use it on control packets to reduce their | have incentives to use this codepoint sparingly, but they can use it | |||
chance of being dropped. For instance, the proposed modification to | on control packets to reduce their chance of being dropped. For | |||
TCP for re-ECN uses this codepoint on the SYN and SYN-ACK. | instance, the proposed modification to TCP for re-ECN uses this | |||
codepoint on the SYN and SYN-ACK. | ||||
Although not brought to the IETF, a simple proposal from Wischik | Although not brought to the IETF, a simple proposal from Wischik | |||
[DupTCP] suggests that the first three packets of every TCP flow | [DupTCP] suggests that the first three packets of every TCP flow | |||
should be routinely duplicated after a short delay. It shows that | should be routinely duplicated after a short delay. It shows that | |||
this would greatly improve the chances of short flows completing | this would greatly improve the chances of short flows completing | |||
quickly, but it would hardly increase traffic levels on the Internet, | quickly, but it would hardly increase traffic levels on the Internet, | |||
because Internet bytes have always been concentrated in the large | because Internet bytes have always been concentrated in the large | |||
flows. It further shows that the performance of many typical | flows. It further shows that the performance of many typical | |||
applications depends on completion of long serial chains of short | applications depends on completion of long serial chains of short | |||
messages. It argues that, given most of the value people get from | messages. It argues that, given most of the value people get from | |||
skipping to change at page 21, line 25 | skipping to change at page 22, line 30 | |||
+-----------+----------------+-----------------+--------------------+ | +-----------+----------------+-----------------+--------------------+ | |||
Table 1: Dependence of flow bit-rate per RTT on packet size s and | Table 1: Dependence of flow bit-rate per RTT on packet size s and | |||
drop rate p when network and/or transport bias towards small packets | drop rate p when network and/or transport bias towards small packets | |||
to varying degrees | to varying degrees | |||
Table 1 aims to summarise the positions we may now be in. Each | Table 1 aims to summarise the positions we may now be in. Each | |||
column shows a different possible AQM behaviour in different queues | column shows a different possible AQM behaviour in different queues | |||
in the network, using the terminology of Cnodder et al outlined | in the network, using the terminology of Cnodder et al outlined | |||
earlier (RED_1 is basic RED with packet-mode drop). Each row shows a | earlier (RED_1 is basic RED with packet-mode drop). Each row shows a | |||
different transport behaviour: TCP [RFC2581] and TFRC [RFC3448] on | different transport behaviour: TCP [RFC5681] and TFRC [RFC3448] on | |||
the top row with TFRC-SP [RFC4828] below. Suppressing all | the top row with TFRC-SP [RFC4828] below. Suppressing all | |||
inessential details the table shows that independence from packet | inessential details the table shows that independence from packet | |||
size should either be achievable by not altering the TCP transport in | size should either be achievable by not altering the TCP transport in | |||
a RED_5 network, or using the small packet TFRC-SP transport in a | a RED_5 network, or using the small packet TFRC-SP transport in a | |||
network without any byte-mode dropping RED (top right and bottom | network without any byte-mode dropping RED (top right and bottom | |||
left). Top left is the `do nothing' scenario, while bottom right is | left). Top left is the `do nothing' scenario, while bottom right is | |||
the `do-both' scenario in which bit-rate would become far too biased | the `do-both' scenario in which bit-rate would become far too biased | |||
towards small packets. Of course, if any form of byte-mode dropping | towards small packets. Of course, if any form of byte-mode dropping | |||
RED has been deployed on a selection of congested queues, each path | RED has been deployed on a selection of congested queues, each path | |||
will present a different hybrid scenario to its transport. | will present a different hybrid scenario to its transport. | |||
skipping to change at page 23, line 32 | skipping to change at page 24, line 37 | |||
The sample of returns from our vendor survey (Section 6.2.4) suggests | The sample of returns from our vendor survey (Section 6.2.4) suggests | |||
that byte-mode packet drop seems not to be implemented at all let | that byte-mode packet drop seems not to be implemented at all let | |||
alone deployed, or if it is, it is likely to be very sparse. | alone deployed, or if it is, it is likely to be very sparse. | |||
Therefore, we do not really need a migration strategy from all but | Therefore, we do not really need a migration strategy from all but | |||
nothing to nothing. | nothing to nothing. | |||
A programme of standards updates to take account of packet size in | A programme of standards updates to take account of packet size in | |||
transport congestion control protocols has started with TFRC-SP | transport congestion control protocols has started with TFRC-SP | |||
[RFC4828], while weighted TCPs implemented in the research community | [RFC4828], while weighted TCPs implemented in the research community | |||
[WindowPropFair] could form the basis of a future change to TCP | [WindowPropFair] could form the basis of a future change to TCP | |||
congestion control [RFC2581] itself. | congestion control [RFC5681] itself. | |||
7.2. Bit- & Packet-congestible World | 7.2. Bit- & Packet-congestible World | |||
Nonetheless, a connectionless network with both bit-congestible and | Nonetheless, a connectionless network with both bit-congestible and | |||
packet-congestible resources is a different matter. If we believe we | packet-congestible resources is a different matter. If we believe we | |||
should allow for this possibility in the future, this space contains | should allow for this possibility in the future, this space contains | |||
a truly open research issue. | a truly open research issue. | |||
The idealised wire protocol coding described in Section 5 requires at | The idealised wire protocol coding described in Section 5 requires at | |||
least two flags for congestion of bit-congestible and packet- | least two flags for congestion of bit-congestible and packet- | |||
skipping to change at page 26, line 18 | skipping to change at page 27, line 23 | |||
dropped, because they are small. But we SHOULD NOT hack the network | dropped, because they are small. But we SHOULD NOT hack the network | |||
layer to improve or fix certain transport protocols. No matter how | layer to improve or fix certain transport protocols. No matter how | |||
predominant a transport protocol is (even if it's TCP), trying to | predominant a transport protocol is (even if it's TCP), trying to | |||
correct for its failings by biasing towards small packets in the | correct for its failings by biasing towards small packets in the | |||
network layer creates a perverse incentive to break down all flows | network layer creates a perverse incentive to break down all flows | |||
from all transports into tiny segments. | from all transports into tiny segments. | |||
So far, our survey of 84 vendors across the industry has drawn | So far, our survey of 84 vendors across the industry has drawn | |||
responses from about 19%, none of whom have implemented the byte mode | responses from about 19%, none of whom have implemented the byte mode | |||
packet drop variant of RED. Given there appears to be little, if | packet drop variant of RED. Given there appears to be little, if | |||
any, installed base recommending removal of byte-mode drop from RED | any, installed base it seems we can recommend removal of byte-mode | |||
is possibly only a paper exercise with few, if any, incremental | drop from RED with little, if any, incremental deployment impact. | |||
deployment issues. | ||||
If a vendor has implemented byte-mode drop, and an operator has | If a vendor has implemented byte-mode drop, and an operator has | |||
turned it on, it is strongly RECOMMENDED that it be turned | turned it on, it is strongly RECOMMENDED that it be turned | |||
off. Note that RED as a whole SHOULD NOT be turned off, as without | off. Note that RED as a whole SHOULD NOT be turned off, as without | |||
it, a drop tail queue also biases against large packets. But note | it, a drop tail queue also biases against large packets. But note | |||
also that turning off byte-mode may alter the relative performance of | also that turning off byte-mode may alter the relative performance of | |||
applications using different packet sizes, so it would be advisable | applications using different packet sizes, so it would be advisable | |||
to establish the implications before turning it off. | to establish the implications before turning it off. | |||
Instead, the IETF transport area should continue its programme of | Instead, the IETF transport area should continue its programme of | |||
skipping to change at page 27, line 8 | skipping to change at page 28, line 8 | |||
most, if not all, resources being primarily bit-congestible. A | most, if not all, resources being primarily bit-congestible. A | |||
secondary conclusion of this memo is that we may see more packet- | secondary conclusion of this memo is that we may see more packet- | |||
congestible resources in the future, so research may be needed to | congestible resources in the future, so research may be needed to | |||
extend the Internet's congestion notification (drop or ECN) so that | extend the Internet's congestion notification (drop or ECN) so that | |||
it can handle a mix of bit-congestible and packet-congestible | it can handle a mix of bit-congestible and packet-congestible | |||
resources. | resources. | |||
10. Acknowledgements | 10. Acknowledgements | |||
Thank you to Sally Floyd, who gave extensive and useful review | Thank you to Sally Floyd, who gave extensive and useful review | |||
comments. Also thanks for the reviews from Toby Moncaster and Arnaud | comments. Also thanks for the reviews from Philip Eardley, Toby | |||
Jacquet. I am grateful to Bruce Davie and his colleagues for | Moncaster and Arnaud Jacquet as well as helpful explanations of | |||
providing a timely and efficient survey of RED implementation in | different hardware approaches from Larry Dunn and Fred Baker. I am | |||
Cisco's product range. Also grateful thanks to Toby Moncaster, Will | grateful to Bruce Davie and his colleagues for providing a timely and | |||
Dormann, John Regnault, Simon Carter and Stefaan De Cnodder who | efficient survey of RED implementation in Cisco's product range. | |||
further helped survey the current status of RED implementation and | Also grateful thanks to Toby Moncaster, Will Dormann, John Regnault, | |||
deployment and, finally, thanks to the anonymous individuals who | Simon Carter and Stefaan De Cnodder who further helped survey the | |||
responded. | current status of RED implementation and deployment and, finally, | |||
thanks to the anonymous individuals who responded. | ||||
Bob Briscoe is partly funded by Trilogy, a research project (ICT- | ||||
216372) supported by the European Community under its Seventh | ||||
Framework Programme. The views expressed here are those of the | ||||
author only. | ||||
11. Comments Solicited | 11. Comments Solicited | |||
Comments and questions are encouraged and very welcome. They can be | Comments and questions are encouraged and very welcome. They can be | |||
addressed to the IETF Transport Area working group mailing list | addressed to the IETF Transport Area working group mailing list | |||
<tsvwg@ietf.org>, and/or to the authors. | <tsvwg@ietf.org>, and/or to the authors. | |||
12. References | 12. References | |||
12.1. Normative References | 12.1. Normative References | |||
skipping to change at page 28, line 28 | skipping to change at page 29, line 34 | |||
Siris, V., "Resource Control for Elastic Traffic in CDMA | Siris, V., "Resource Control for Elastic Traffic in CDMA | |||
Networks", Proc. ACM MOBICOM'02, September 2002, <http:// | Networks", Proc. ACM MOBICOM'02, September 2002, <http:// | |||
www.ics.forth.gr/netlab/publications/ | www.ics.forth.gr/netlab/publications/ | |||
resource_control_elastic_cdma.html>. | resource_control_elastic_cdma.html>. | |||
[Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the | [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the | |||
evolution of congestion control", Automatica 35(12)1969-- | evolution of congestion control", Automatica 35(12)1969-- | |||
1985, December 1999, | 1985, December 1999, | |||
<http://www.statslab.cam.ac.uk/~frank/evol.html>. | <http://www.statslab.cam.ac.uk/~frank/evol.html>. | |||
[I-D.eardley-pcn-marking-behaviour] | [I-D.briscoe-tsvwg-re-ecn-tcp] | |||
Eardley, P., "Marking behaviour of PCN-nodes", | Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, | |||
draft-eardley-pcn-marking-behaviour-01 (work in progress), | "Re-ECN: Adding Accountability for Causing Congestion to | |||
June 2008. | TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-07 (work in | |||
progress), March 2009. | ||||
[I-D.falk-xcp-spec] | ||||
Falk, A., "Specification for the Explicit Control Protocol | ||||
(XCP)", draft-falk-xcp-spec-03 (work in progress), | ||||
July 2007. | ||||
[I-D.floyd-tcpm-ackcc] | [I-D.floyd-tcpm-ackcc] | |||
Floyd, S. and I. Property, "Adding Acknowledgement | Floyd, S., "Adding Acknowledgement Congestion Control to | |||
Congestion Control to TCP", draft-floyd-tcpm-ackcc-02 | TCP", draft-floyd-tcpm-ackcc-06 (work in progress), | |||
(work in progress), November 2007. | July 2009. | |||
[I-D.ietf-pcn-marking-behaviour] | ||||
Eardley, P., "Metering and marking behaviour of PCN- | ||||
nodes", draft-ietf-pcn-marking-behaviour-05 (work in | ||||
progress), August 2009. | ||||
[I-D.ietf-tcpm-ecnsyn] | [I-D.ietf-tcpm-ecnsyn] | |||
Floyd, S., "Adding Explicit Congestion Notification (ECN) | Floyd, S., "Adding Explicit Congestion Notification (ECN) | |||
Capability to TCP's SYN/ACK Packets", | Capability to TCP's SYN/ACK Packets", | |||
draft-ietf-tcpm-ecnsyn-05 (work in progress), | draft-ietf-tcpm-ecnsyn-10 (work in progress), May 2009. | |||
February 2008. | ||||
[I-D.ietf-tcpm-rfc2581bis] | ||||
Allman, M., "TCP Congestion Control", | ||||
draft-ietf-tcpm-rfc2581bis-03 (work in progress), | ||||
September 2007. | ||||
[I-D.irtf-iccrg-welzl-congestion-control-open-research] | [I-D.irtf-iccrg-welzl-congestion-control-open-research] | |||
Papadimitriou, D., "Open Research Issues in Internet | Welzl, M., Scharf, M., Briscoe, B., and D. Papadimitriou, | |||
Congestion Control", | "Open Research Issues in Internet Congestion Control", | |||
draft-irtf-iccrg-welzl-congestion-control-open-research-00 | draft-irtf-iccrg-welzl-congestion-control-open-research-05 | |||
(work in progress), July 2007. | (work in progress), September 2009. | |||
[IOSArch] Bollapragada, V., White, R., and C. Murphy, "Inside Cisco | [IOSArch] Bollapragada, V., White, R., and C. Murphy, "Inside Cisco | |||
IOS Software Architecture", Cisco Press: CCIE Professional | IOS Software Architecture", Cisco Press: CCIE Professional | |||
Development ISBN13: 978-1-57870-181-0, July 2000. | Development ISBN13: 978-1-57870-181-0, July 2000. | |||
[MulTCP] Crowcroft, J. and Ph. Oechslin, "Differentiated End to End | [MulTCP] Crowcroft, J. and Ph. Oechslin, "Differentiated End to End | |||
Internet Services using a Weighted Proportional Fair | Internet Services using a Weighted Proportional Fair | |||
Sharing TCP", CCR 28(3) 53--69, July 1998, <http:// | Sharing TCP", CCR 28(3) 53--69, July 1998, <http:// | |||
www.cs.ucl.ac.uk/staff/J.Crowcroft/hipparch/pricing.html>. | www.cs.ucl.ac.uk/staff/J.Crowcroft/hipparch/pricing.html>. | |||
skipping to change at page 29, line 47 | skipping to change at page 30, line 48 | |||
[REDbyte] De Cnodder, S., Elloumi, O., and K. Pauwels, "RED behavior | [REDbyte] De Cnodder, S., Elloumi, O., and K. Pauwels, "RED behavior | |||
with different packet sizes", Proc. 5th IEEE Symposium on | with different packet sizes", Proc. 5th IEEE Symposium on | |||
Computers and Communications (ISCC) 793--799, July 2000, | Computers and Communications (ISCC) 793--799, July 2000, | |||
<http://www.icir.org/floyd/red/Elloumi99.pdf>. | <http://www.icir.org/floyd/red/Elloumi99.pdf>. | |||
[RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, | [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, | |||
"Definition of the Differentiated Services Field (DS | "Definition of the Differentiated Services Field (DS | |||
Field) in the IPv4 and IPv6 Headers", RFC 2474, | Field) in the IPv4 and IPv6 Headers", RFC 2474, | |||
December 1998. | December 1998. | |||
[RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion | ||||
Control", RFC 2581, April 1999. | ||||
[RFC3448] Handley, M., Floyd, S., Padhye, J., and J. Widmer, "TCP | [RFC3448] Handley, M., Floyd, S., Padhye, J., and J. Widmer, "TCP | |||
Friendly Rate Control (TFRC): Protocol Specification", | Friendly Rate Control (TFRC): Protocol Specification", | |||
RFC 3448, January 2003. | RFC 3448, January 2003. | |||
[RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion | [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion | |||
Control for Voice Traffic in the Internet", RFC 3714, | Control for Voice Traffic in the Internet", RFC 3714, | |||
March 2004. | March 2004. | |||
[RFC4782] Floyd, S., Allman, M., Jain, A., and P. Sarolahti, "Quick- | [RFC4782] Floyd, S., Allman, M., Jain, A., and P. Sarolahti, "Quick- | |||
Start for TCP and IP", RFC 4782, January 2007. | Start for TCP and IP", RFC 4782, January 2007. | |||
[RFC4828] Floyd, S. and E. Kohler, "TCP Friendly Rate Control | [RFC4828] Floyd, S. and E. Kohler, "TCP Friendly Rate Control | |||
(TFRC): The Small-Packet (SP) Variant", RFC 4828, | (TFRC): The Small-Packet (SP) Variant", RFC 4828, | |||
April 2007. | April 2007. | |||
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | ||||
Control", RFC 5681, September 2009. | ||||
[Rate_fair_Dis] | [Rate_fair_Dis] | |||
Briscoe, B., "Flow Rate Fairness: Dismantling a Religion", | Briscoe, B., "Flow Rate Fairness: Dismantling a Religion", | |||
ACM CCR 37(2)63--74, April 2007, | ACM CCR 37(2)63--74, April 2007, | |||
<http://portal.acm.org/citation.cfm?id=1232926>. | <http://portal.acm.org/citation.cfm?id=1232926>. | |||
[Re-TCP] Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, | ||||
"Re-ECN: Adding Accountability for Causing Congestion to | ||||
TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-05 (work in | ||||
progress), January 2008. | ||||
[WindowPropFair] | [WindowPropFair] | |||
Siris, V., "Service Differentiation and Performance of | Siris, V., "Service Differentiation and Performance of | |||
Weighted Window-Based Congestion Control and Packet | Weighted Window-Based Congestion Control and Packet | |||
Marking Algorithms in ECN Networks", Computer | Marking Algorithms in ECN Networks", Computer | |||
Communications 26(4) 314--326, 2002, <http:// | Communications 26(4) 314--326, 2002, <http:// | |||
www.ics.forth.gr/netgroup/publications/ | www.ics.forth.gr/netgroup/publications/ | |||
weighted_window_control.html>. | weighted_window_control.html>. | |||
[gentle_RED] | [gentle_RED] | |||
Floyd, S., "Recommendation on using the "gentle_" variant | Floyd, S., "Recommendation on using the "gentle_" variant | |||
skipping to change at page 30, line 50 | skipping to change at page 31, line 47 | |||
[pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End | [pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End | |||
Congestion Control in the Internet", IEEE/ACM Transactions | Congestion Control in the Internet", IEEE/ACM Transactions | |||
on Networking 7(4) 458--472, August 1999, | on Networking 7(4) 458--472, August 1999, | |||
<http://www.aciri.org/floyd/end2end-paper.html>. | <http://www.aciri.org/floyd/end2end-paper.html>. | |||
[pktByteEmail] | [pktByteEmail] | |||
Floyd, S., "RED: Discussions of Byte and Packet Modes", | Floyd, S., "RED: Discussions of Byte and Packet Modes", | |||
email, March 1997, | email, March 1997, | |||
<http://www-nrg.ee.lbl.gov/floyd/REDaveraging.txt>. | <http://www-nrg.ee.lbl.gov/floyd/REDaveraging.txt>. | |||
[xcp-spec] | ||||
Falk, A., "Specification for the Explicit Control Protocol | ||||
(XCP)", draft-falk-xcp-spec-03 (work in progress), | ||||
July 2007. | ||||
(Expired) | ||||
Editorial Comments | Editorial Comments | |||
[Note_Variation] The algorithm of the byte-mode drop variant of RED | [Note_Variation] The algorithm of the byte-mode drop variant of RED | |||
switches off any bias towards small packets | switches off any bias towards small packets | |||
whenever the smoothed queue length dictates that | whenever the smoothed queue length dictates that | |||
the drop probability of large packets should be | the drop probability of large packets should be | |||
100%. In the example in the Introduction, as the | 100%. In the example in the Introduction, as the | |||
large packet drop probability varies around 25% the | large packet drop probability varies around 25% the | |||
small packet drop probability will vary around 1%, | small packet drop probability will vary around 1%, | |||
but with occasional jumps to 100% whenever the | but with occasional jumps to 100% whenever the | |||
instantaneous queue (after drop) manages to sustain | instantaneous queue (after drop) manages to sustain | |||
a length above the 100% drop point for longer than | a length above the 100% drop point for longer than | |||
the queue averaging period. | the queue averaging period. | |||
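
As a rough illustration of the note above, the following Python sketch computes a per-packet drop probability under byte-mode drop. It assumes the probability scales linearly with packet size relative to a maximum packet size, which is one common formulation and not necessarily what any particular RED implementation does. The function name and its parameters are hypothetical.

```python
def byte_mode_drop_prob(p_large: float, size: int, max_size: int = 1500) -> float:
    """Drop probability for a packet of `size` bytes under byte-mode drop.

    p_large is the drop probability RED would apply to a max_size packet.
    Assumption: the probability scales linearly with size / max_size.
    """
    if not 0.0 <= p_large <= 1.0:
        raise ValueError("p_large must be within [0, 1]")
    if not 0 < size <= max_size:
        raise ValueError("size must be within (0, max_size]")
    # Once large packets are dropped with certainty, the bias towards
    # small packets is switched off and every packet is dropped.
    if p_large >= 1.0:
        return 1.0
    return p_large * size / max_size


if __name__ == "__main__":
    for p in (0.25, 1.0):
        print(p, byte_mode_drop_prob(p, 60), byte_mode_drop_prob(p, 1500))
```

With a 25% drop probability for 1500B packets, a 60B packet sees roughly 1%, matching the example in the note. At 100% the function returns 100% for every size, which is the jump the note describes.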
Appendix A. Example Scenarios | Appendix A. Example Scenarios | |||
A.1. Notation | A.1. Notation | |||
To prove the two sets of assertions in the idealised wire protocol | To prove our idealised wire protocol (Section 5) is correct, we will | |||
(Section 5) are true, we will compare two flows with different packet | compare two flows with different packet sizes, s_1 and s_2 [bit/pkt], | |||
sizes, s_1 and s_2 [bit/pkt], to make sure their transports each see | to make sure their transports each see the correct congestion | |||
the correct congestion notification. Initially, within each flow we | notification. Initially, within each flow we will take all packets | |||
will take all packets as having equal sizes, but later we will | as having equal sizes, but later we will generalise to flows within | |||
generalise to flows within which packet sizes vary. A flow's bit | which packet sizes vary. A flow's bit rate, x [bit/s], is related to | |||
rate, x [bit/s], is related to its packet rate, u [pkt/s], by | its packet rate, u [pkt/s], by | |||
x(t) = s.u(t). | x(t) = s.u(t). | |||
We will consider a 2x2 matrix of four scenarios: | We will consider a 2x2 matrix of four scenarios: | |||
+-----------------------------+------------------+------------------+ | +-----------------------------+------------------+------------------+ | |||
| resource type and | A) Equal bit | B) Equal pkt | | | resource type and | A) Equal bit | B) Equal pkt | | |||
| congestion level | rates | rates | | | congestion level | rates | rates | | |||
+-----------------------------+------------------+------------------+ | +-----------------------------+------------------+------------------+ | |||
| i) bit-congestible, p_b | (Ai) | (Bi) | | | i) bit-congestible, p_b | (Ai) | (Bi) | | |||
skipping to change at page 32, line 40 | skipping to change at page 33, line 42 | |||
However, where an absolute target rather than relative volume of | However, where an absolute target rather than relative volume of | |||
congestion caused is important (Section 5), as it is for congestion | congestion caused is important (Section 5), as it is for congestion | |||
accountability [Rate_fair_Dis], the transport must count marked bytes | accountability [Rate_fair_Dis], the transport must count marked bytes | |||
not packets, in this bit-congestible case. Aside from the goal of | not packets, in this bit-congestible case. Aside from the goal of | |||
congestion accountability, this is how the bit rate of a transport | congestion accountability, this is how the bit rate of a transport | |||
can be made independent of packet size; by ensuring the rate of | can be made independent of packet size; by ensuring the rate of | |||
congestion caused is kept to a constant weight [WindowPropFair], | congestion caused is kept to a constant weight [WindowPropFair], | |||
rather than merely responding to the ratio of marked and unmarked | rather than merely responding to the ratio of marked and unmarked | |||
bytes. | bytes. | |||
Note the unit of byte-congestion volume is the byte. | Note the unit of byte-congestion-volume is the byte. | |||
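
The distinction between counting marked bytes and counting marked packets can be sketched as follows. This is a minimal sketch, assuming a transport that records each received packet as a (size, marked) pair; the type alias and function names are hypothetical and are not part of any protocol specification.

```python
from typing import Iterable, List, Tuple

Packet = Tuple[int, bool]  # (size in bytes, congestion-marked?)


def byte_congestion_volume(packets: Iterable[Packet]) -> int:
    """Absolute byte-congestion-volume: total bytes in marked packets."""
    return sum(size for size, marked in packets if marked)


def marked_byte_fraction(packets: Iterable[Packet]) -> float:
    """Relative measure: marked bytes as a fraction of all bytes received."""
    pkts: List[Packet] = list(packets)
    total = sum(size for size, _ in pkts)
    if total == 0:
        return 0.0
    return byte_congestion_volume(pkts) / total


if __name__ == "__main__":
    sample = [(1500, True), (60, True), (60, False), (1500, False)]
    print(byte_congestion_volume(sample), marked_byte_fraction(sample))
```

The first function returns a quantity in bytes, which is what a congestion accountability scheme would accumulate. The second returns a dimensionless ratio.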
A.3. Bit-congestible resource, equal packet rates (Bi) | A.3. Bit-congestible resource, equal packet rates (Bi) | |||
If two flows send different packet sizes but at the same packet rate, | If two flows send different packet sizes but at the same packet rate, | |||
their bit rates will be in the same ratio as their packet sizes, x_2/ | their bit rates will be in the same ratio as their packet sizes, x_2/ | |||
x_1 = s_2/s_1. For instance, a flow sending 1500B packets at the | x_1 = s_2/s_1. For instance, a flow sending 1500B packets at the | |||
same packet rate as another sending 60B packets will be sending at | same packet rate as another sending 60B packets will be sending at | |||
25x greater bit rate. In this case, if a congested resource marks | 25x greater bit rate. In this case, if a congested resource marks | |||
proportion p_b of packets irrespective of size, the ratio of packets | proportion p_b of packets irrespective of size, the ratio of packets | |||
received with the byte-congestion field marked by each transport will | received with the byte-congestion field marked by each transport will | |||
skipping to change at page 34, line 8 | skipping to change at page 35, line 11 | |||
combined packet rate times the marking probability, p_p(u_1+u_2), 26x | combined packet rate times the marking probability, p_p(u_1+u_2), 26x | |||
faster than packet congestion accumulates in the single 1500B packet | faster than packet congestion accumulates in the single 1500B packet | |||
flow of our example, as required. | flow of our example, as required. | |||
But if the transport is interested in the absolute amount of packet | But if the transport is interested in the absolute amount of packet | |||
congestion, it should just count how many marked packets arrive. For | congestion, it should just count how many marked packets arrive. For | |||
instance, a flow sending 60B packets will see 25x more marked packets | instance, a flow sending 60B packets will see 25x more marked packets | |||
than one sending 1500B packets at the same bit rate, because it is | than one sending 1500B packets at the same bit rate, because it is | |||
sending more packets through a packet-congestible resource. | sending more packets through a packet-congestible resource. | |||
Note the unit of packet congestion is packets. | Note the unit of packet congestion is a packet. | |||
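
The 25x figure above follows from x = s.u. A minimal sketch, assuming a fixed marking probability p_p applied to every packet and expressing packet sizes in bits; the function name and the example rate are illustrative only.

```python
def marked_packet_rate(bit_rate: float, packet_size_bits: float, p_p: float) -> float:
    """Expected marked packets per second at a packet-congestible resource.

    Uses u = x / s for the packet rate, then applies the marking
    probability p_p to every packet regardless of its size.
    """
    if packet_size_bits <= 0:
        raise ValueError("packet_size_bits must be positive")
    packet_rate = bit_rate / packet_size_bits  # u [pkt/s]
    return p_p * packet_rate


if __name__ == "__main__":
    x = 1_200_000.0  # same bit rate for both flows [bit/s], an arbitrary value
    small = marked_packet_rate(x, 60 * 8, 0.01)
    large = marked_packet_rate(x, 1500 * 8, 0.01)
    print(small, large, small / large)
```

At equal bit rates the ratio of marked packets equals the inverse ratio of packet sizes, 1500/60 = 25, whatever value of p_p or x is chosen.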
A.5. Pkt-congestible resource, equal packet rates (Bii) | A.5. Pkt-congestible resource, equal packet rates (Bii) | |||
Finally, if two flows with the same packet rate pass through a | Finally, if two flows with the same packet rate pass through a | |||
packet-congestible resource, they will both suffer the same | packet-congestible resource, they will both suffer the same | |||
proportion of marking, p_p, irrespective of their packet sizes. On | proportion of marking, p_p, irrespective of their packet sizes. On | |||
detecting that the pkt-congestion field is marked, the transport | detecting that the pkt-congestion field is marked, the transport | |||
should count packets, and it will be able to extract the ratio p_p of | should count packets, and it will be able to extract the ratio p_p of | |||
marked to unmarked packets from both flows, irrespective of packet | marked to unmarked packets from both flows, irrespective of packet | |||
sizes. | sizes. | |||
skipping to change at page 34, line 34 | skipping to change at page 35, line 37 | |||
And if the two equal packet rates of different size packets are mixed | And if the two equal packet rates of different size packets are mixed | |||
together in one flow, the packet rate will double, so the absolute | together in one flow, the packet rate will double, so the absolute | |||
volume of packet-congestion will accumulate at twice the rate of | volume of packet-congestion will accumulate at twice the rate of | |||
either flow, 2p_p.u_1 = p_p(u_1+u_2). | either flow, 2p_p.u_1 = p_p(u_1+u_2). | |||
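
The mixed-flow case can be checked numerically. This is a small sketch under the same assumption of a size-independent marking probability p_p; the function name is hypothetical.

```python
from typing import Iterable


def packet_congestion_rate(p_p: float, packet_rates: Iterable[float]) -> float:
    """Rate at which packet-congestion accumulates [marked pkt/s].

    The sub-flows' packet rates simply add, because a packet-congestible
    resource marks packets without regard to their size.
    """
    return p_p * sum(packet_rates)


if __name__ == "__main__":
    u_1 = u_2 = 100.0  # equal packet rates [pkt/s], arbitrary values
    print(packet_congestion_rate(0.02, [u_1, u_2]))
```

With u_1 = u_2 the result equals 2p_p.u_1, as stated above.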
Appendix B. Congestion Notification Definition: Further Justification | Appendix B. Congestion Notification Definition: Further Justification | |||
In Section 3 on the definition of congestion notification, load not | In Section 3 on the definition of congestion notification, load not | |||
capacity was used as the denominator. This also has a subtle | capacity was used as the denominator. This also has a subtle | |||
significance in the related debate over the design of new transport | significance in the related debate over the design of new transport | |||
protocols--typical new protocol designs (e.g. in XCP | protocols--typical new protocol designs (e.g. in XCP [xcp-spec] & | |||
[I-D.falk-xcp-spec] & Quickstart [RFC4782]) expect the sending | Quickstart [RFC4782]) expect the sending transport to communicate its | |||
transport to communicate its desired flow rate to the network and | desired flow rate to the network and network elements to | |||
network elements to progressively subtract from this so that the | progressively subtract from this so that the achievable flow rate | |||
achievable flow rate emerges at the receiving transport. | emerges at the receiving transport. | |||
Congestion notification with total load in the denominator can serve | Congestion notification with total load in the denominator can serve | |||
a similar purpose (though in retrospect not in advance like XCP & | a similar purpose (though in retrospect not in advance like XCP & | |||
QuickStart). Congestion notification is a dimensionless fraction but | QuickStart). Congestion notification is a dimensionless fraction but | |||
each source can extract necessary rate information from it because it | each source can extract necessary rate information from it because it | |||
already knows what its own rate is. Even though congestion | already knows what its own rate is. Even though congestion | |||
notification doesn't communicate a rate explicitly, from each | notification doesn't communicate a rate explicitly, from each | |||
source's point of view congestion notification represents the | source's point of view congestion notification represents the | |||
fraction of the rate it was sending a round trip ago that couldn't | fraction of the rate it was sending a round trip ago that couldn't | |||
(or wouldn't) be served by available resources. After they were | (or wouldn't) be served by available resources. After they were | |||
skipping to change at page 35, line 22 | skipping to change at page 36, line 24 | |||
response of _any_ transport to congestion depends on bit-congestible | response of _any_ transport to congestion depends on bit-congestible | |||
network resources only doing packet-mode not byte-mode drop. | network resources only doing packet-mode not byte-mode drop. | |||
To be able to police a transport's response to congestion when | To be able to police a transport's response to congestion when | |||
fairness can only be judged over time and over all an individual's | fairness can only be judged over time and over all an individual's | |||
flows, the policer has to have an integrated view of all the | flows, the policer has to have an integrated view of all the | |||
congestion an individual (not just one flow) has caused due to all | congestion an individual (not just one flow) has caused due to all | |||
traffic entering the Internet from that individual. This is termed | traffic entering the Internet from that individual. This is termed | |||
congestion accountability. | congestion accountability. | |||
But with byte-mode drop, one dropped or marked packet is not | But a byte-mode drop algorithm has to depend on the local MTU of the | |||
necessarily equivalent to another unless you know the MTU that caused | line - an algorithm needs to use some concept of a 'normal' packet | |||
it to be dropped/marked. To have an integrated view of a user, we | size. Therefore, one dropped or marked packet is not necessarily | |||
equivalent to another unless you know the MTU at the queue where | ||||
it was dropped/marked. To have an integrated view of a user, we | ||||
believe congestion policing has to be located at an individual's | believe congestion policing has to be located at an individual's | |||
attachment point to the Internet [Re-TCP]. But from there it cannot | attachment point to the Internet [I-D.briscoe-tsvwg-re-ecn-tcp]. But | |||
know the MTU of each remote queue that caused each drop/mark. | from there it cannot know the MTU of each remote queue that caused | |||
Therefore it cannot take an integrated approach to policing all the | each drop/mark. Therefore it cannot take an integrated approach to | |||
responses to congestion of all the transports of one individual. | policing all the responses to congestion of all the transports of one | |||
Therefore it cannot police anything. | individual. Therefore it cannot police anything. | |||
The security/incentive argument _for_ packet-mode drop is similar. | The security/incentive argument _for_ packet-mode drop is similar. | |||
Firstly, confining RED to packet-mode drop would not preclude | Firstly, confining RED to packet-mode drop would not preclude | |||
bottleneck policing approaches such as [pBox] as it seems likely they | bottleneck policing approaches such as [pBox] as it seems likely they | |||
could work just as well by monitoring the volume of dropped bytes | could work just as well by monitoring the volume of dropped bytes | |||
rather than packets. Secondly packet-mode dropping/marking naturally | rather than packets. Secondly packet-mode dropping/marking naturally | |||
allows the congestion notification of packets to be globally | allows the congestion notification of packets to be globally | |||
meaningful without relying on MTU information held elsewhere. | meaningful without relying on MTU information held elsewhere. | |||
Because we recommend that a dropped/marked packet should be taken to | Because we recommend that a dropped/marked packet should be taken to | |||
skipping to change at page 36, line 9 | skipping to change at page 37, line 14 | |||
In summary, making drop probability depend on the size of the packets | In summary, making drop probability depend on the size of the packets | |||
that bits happen to be divided into simply encourages the bits to be | that bits happen to be divided into simply encourages the bits to be | |||
divided into smaller packets. Byte-mode drop would therefore | divided into smaller packets. Byte-mode drop would therefore | |||
irreversibly complicate any attempt to fix the Internet's incentive | irreversibly complicate any attempt to fix the Internet's incentive | |||
structures. | structures. | |||
Author's Address | Author's Address | |||
Bob Briscoe | Bob Briscoe | |||
BT & UCL | BT | |||
B54/77, Adastral Park | B54/77, Adastral Park | |||
Martlesham Heath | Martlesham Heath | |||
Ipswich IP5 3RE | Ipswich IP5 3RE | |||
UK | UK | |||
Phone: +44 1473 645196 | Phone: +44 1473 645196 | |||
Email: bob.briscoe@bt.com | Email: bob.briscoe@bt.com | |||
URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/ | URI: http://bobbriscoe.net/ | |||
Full Copyright Statement | ||||
Copyright (C) The IETF Trust (2008). | ||||
This document is subject to the rights, licenses and restrictions | ||||
contained in BCP 78, and except as set forth therein, the authors | ||||
retain all their rights. | ||||
This document and the information contained herein are provided on an | ||||
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | ||||
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND | ||||
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS | ||||
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF | ||||
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | ||||
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
Intellectual Property | ||||
The IETF takes no position regarding the validity or scope of any | ||||
Intellectual Property Rights or other rights that might be claimed to | ||||
pertain to the implementation or use of the technology described in | ||||
this document or the extent to which any license under such rights | ||||
might or might not be available; nor does it represent that it has | ||||
made any independent effort to identify any such rights. Information | ||||
on the procedures with respect to rights in RFC documents can be | ||||
found in BCP 78 and BCP 79. | ||||
Copies of IPR disclosures made to the IETF Secretariat and any | ||||
assurances of licenses to be made available, or the result of an | ||||
attempt made to obtain a general license or permission for the use of | ||||
such proprietary rights by implementers or users of this | ||||
specification can be obtained from the IETF on-line IPR repository at | ||||
http://www.ietf.org/ipr. | ||||
The IETF invites any interested party to bring to its attention any | ||||
copyrights, patents or patent applications, or other proprietary | ||||
rights that may cover technology that may be required to implement | ||||
this standard. Please address the information to the IETF at | ||||
ietf-ipr@ietf.org. | ||||
Acknowledgment | ||||
This document was produced using xml2rfc v1.33 (of | ||||
http://xml.resource.org/) from a source in RFC-2629 XML format. | ||||
End of changes. 77 change blocks. | ||||
238 lines changed or deleted | 260 lines changed or added | |||