draft-ietf-tsvwg-byte-pkt-congest-11.txt | draft-ietf-tsvwg-byte-pkt-congest-12.txt | |||
---|---|---|---|---|
Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
Internet-Draft BT | Internet-Draft BT | |||
Updates: 2309 (if approved) J. Manner | Updates: 2309 (if approved) J. Manner | |||
Intended status: BCP Aalto University | Intended status: BCP Aalto University | |||
Expires: February 2, 2014 August 1, 2013 | Expires: May 11, 2014 November 07, 2013 | |||
Byte and Packet Congestion Notification | Byte and Packet Congestion Notification | |||
draft-ietf-tsvwg-byte-pkt-congest-11 | draft-ietf-tsvwg-byte-pkt-congest-12 | |||
Abstract | Abstract | |||
This document provides recommendations of best current practice for | This document provides recommendations of best current practice for | |||
dropping or marking packets using any active queue management (AQM) | dropping or marking packets using any active queue management (AQM) | |||
algorithm, including random early detection (RED), BLUE, pre- | algorithm, including random early detection (RED), BLUE, pre- | |||
congestion notification (PCN) and newer schemes such as CoDel and | congestion notification (PCN) and newer schemes such as CoDel | |||
PIE. We give three strong recommendations: (1) packet size should be | (Controlled Delay) and PIE (Proportional Integral controller | |||
taken into account when transports detect and respond to congestion | Enhanced). We give three strong recommendations: (1) packet size | |||
indications, (2) packet size should not be taken into account when | should be taken into account when transports detect and respond to | |||
network equipment creates congestion signals (marking, dropping), and | congestion indications, (2) packet size should not be taken into | |||
therefore (3) in the specific case of RED, the byte-mode packet drop | account when network equipment creates congestion signals (marking, | |||
variant that drops fewer small packets should not be used. This memo | dropping), and therefore (3) in the specific case of RED, the byte- | |||
updates RFC 2309 to deprecate deliberate preferential treatment of | mode packet drop variant that drops fewer small packets should not be | |||
small packets in AQM algorithms. | used. This memo updates RFC 2309 to deprecate deliberate | |||
preferential treatment of small packets in AQM algorithms. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on February 2, 2014. | This Internet-Draft will expire on May 11, 2014. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2013 IETF Trust and the persons identified as the | Copyright (c) 2013 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 3, line 23 | skipping to change at page 3, line 23 | |||
2.3. Recommendation on Responding to Congestion . . . . . . . . 11 | 2.3. Recommendation on Responding to Congestion . . . . . . . . 11 | |||
2.4. Recommendation on Handling Congestion Indications when | 2.4. Recommendation on Handling Congestion Indications when | |||
Splitting or Merging Packets . . . . . . . . . . . . . . . 12 | Splitting or Merging Packets . . . . . . . . . . . . . . . 12 | |||
3. Motivating Arguments . . . . . . . . . . . . . . . . . . . . . 12 | 3. Motivating Arguments . . . . . . . . . . . . . . . . . . . . . 12 | |||
3.1. Avoiding Perverse Incentives to (Ab)use Smaller Packets . 12 | 3.1. Avoiding Perverse Incentives to (Ab)use Smaller Packets . 12 | |||
3.2. Small != Control . . . . . . . . . . . . . . . . . . . . . 14 | 3.2. Small != Control . . . . . . . . . . . . . . . . . . . . . 14 | |||
3.3. Transport-Independent Network . . . . . . . . . . . . . . 14 | 3.3. Transport-Independent Network . . . . . . . . . . . . . . 14 | |||
3.4. Partial Deployment of AQM . . . . . . . . . . . . . . . . 15 | 3.4. Partial Deployment of AQM . . . . . . . . . . . . . . . . 15 | |||
3.5. Implementation Efficiency . . . . . . . . . . . . . . . . 17 | 3.5. Implementation Efficiency . . . . . . . . . . . . . . . . 17 | |||
4. A Survey and Critique of Past Advice . . . . . . . . . . . . . 17 | 4. A Survey and Critique of Past Advice . . . . . . . . . . . . . 17 | |||
4.1. Congestion Measurement Advice . . . . . . . . . . . . . . 17 | 4.1. Congestion Measurement Advice . . . . . . . . . . . . . . 18 | |||
4.1.1. Fixed Size Packet Buffers . . . . . . . . . . . . . . 18 | 4.1.1. Fixed Size Packet Buffers . . . . . . . . . . . . . . 18 | |||
4.1.2. Congestion Measurement without a Queue . . . . . . . . 19 | 4.1.2. Congestion Measurement without a Queue . . . . . . . . 19 | |||
4.2. Congestion Notification Advice . . . . . . . . . . . . . . 20 | 4.2. Congestion Notification Advice . . . . . . . . . . . . . . 20 | |||
4.2.1. Network Bias when Encoding . . . . . . . . . . . . . . 20 | 4.2.1. Network Bias when Encoding . . . . . . . . . . . . . . 20 | |||
4.2.2. Transport Bias when Decoding . . . . . . . . . . . . . 21 | 4.2.2. Transport Bias when Decoding . . . . . . . . . . . . . 22 | |||
4.2.3. Making Transports Robust against Control Packet | 4.2.3. Making Transports Robust against Control Packet | |||
Losses . . . . . . . . . . . . . . . . . . . . . . . . 23 | Losses . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
4.2.4. Congestion Notification: Summary of Conflicting | 4.2.4. Congestion Notification: Summary of Conflicting | |||
Advice . . . . . . . . . . . . . . . . . . . . . . . . 23 | Advice . . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
5. Outstanding Issues and Next Steps . . . . . . . . . . . . . . 24 | 5. Outstanding Issues and Next Steps . . . . . . . . . . . . . . 25 | |||
5.1. Bit-congestible Network . . . . . . . . . . . . . . . . . 24 | 5.1. Bit-congestible Network . . . . . . . . . . . . . . . . . 25 | |||
5.2. Bit- & Packet-congestible Network . . . . . . . . . . . . 25 | 5.2. Bit- & Packet-congestible Network . . . . . . . . . . . . 25 | |||
6. Security Considerations . . . . . . . . . . . . . . . . . . . 25 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 26 | |||
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26 | 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26 | |||
8. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 26 | 8. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 26 | |||
9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 27 | 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
10. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 28 | 10. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 28 | |||
11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 28 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
11.1. Normative References . . . . . . . . . . . . . . . . . . . 28 | 11.1. Normative References . . . . . . . . . . . . . . . . . . . 28 | |||
11.2. Informative References . . . . . . . . . . . . . . . . . . 28 | 11.2. Informative References . . . . . . . . . . . . . . . . . . 28 | |||
Appendix A. Survey of RED Implementation Status . . . . . . . . . 32 | Appendix A. Survey of RED Implementation Status . . . . . . . . . 32 | |||
Appendix B. Sufficiency of Packet-Mode Drop . . . . . . . . . . . 33 | Appendix B. Sufficiency of Packet-Mode Drop . . . . . . . . . . . 34 | |||
B.1. Packet-Size (In)Dependence in Transports . . . . . . . . . 34 | B.1. Packet-Size (In)Dependence in Transports . . . . . . . . . 35 | |||
B.2. Bit-Congestible and Packet-Congestible Indications . . . . 37 | B.2. Bit-Congestible and Packet-Congestible Indications . . . . 38 | |||
Appendix C. Byte-mode Drop Complicates Policing Congestion | Appendix C. Byte-mode Drop Complicates Policing Congestion | |||
Response . . . . . . . . . . . . . . . . . . . . . . 38 | Response . . . . . . . . . . . . . . . . . . . . . . 39 | |||
Appendix D. Changes from Previous Versions . . . . . . . . . . . 39 | Appendix D. Changes from Previous Versions . . . . . . . . . . . 40 | |||
1. Introduction | 1. Introduction | |||
This document provides recommendations of best current practice for | This document provides recommendations of best current practice for | |||
how we should correctly scale congestion control functions with | how we should correctly scale congestion control functions with | |||
respect to packet size for the long term. It also recognises that | respect to packet size for the long term. It also recognises that | |||
expediency may be necessary to deal with existing widely deployed | expediency may be necessary to deal with existing widely deployed | |||
protocols that don't live up to the long term goal. | protocols that don't live up to the long term goal. | |||
When signalling congestion, the problem of how (and whether) to take | When signalling congestion, the problem of how (and whether) to take | |||
skipping to change at page 5, line 29 | skipping to change at page 5, line 29 | |||
In the particular case of Random Early Detection (RED), this means | In the particular case of Random Early Detection (RED), this means | |||
that the byte-mode packet drop variant should not be used to drop | that the byte-mode packet drop variant should not be used to drop | |||
fewer small packets, because that creates a perverse incentive for | fewer small packets, because that creates a perverse incentive for | |||
transports to use tiny segments, consequently also opening up a DoS | transports to use tiny segments, consequently also opening up a DoS | |||
vulnerability. Fortunately all the RED implementers who responded to | vulnerability. Fortunately all the RED implementers who responded to | |||
our admittedly limited survey (Section 4.2.4) have not followed the | our admittedly limited survey (Section 4.2.4) have not followed the | |||
earlier advice to use byte-mode drop, so the position this memo | earlier advice to use byte-mode drop, so the position this memo | |||
argues for seems to already exist in implementations. | argues for seems to already exist in implementations. | |||
However, at the transport layer, TCP congestion control is a widely | However, at the transport layer, TCP congestion control is a widely | |||
deployed protocol that doesn't scale with packet size. To date this | deployed protocol that doesn't scale with packet size (i.e. its | |||
hasn't been a significant problem because most TCP implementations | reduction in rate does not take into account the size of a lost | |||
have been used with similar packet sizes. But, as we design new | packet). To date this hasn't been a significant problem because most | |||
congestion control mechanisms, this memo recommends that we should | TCP implementations have been used with similar packet sizes. But, | |||
build in scaling with packet size rather than assuming we should | as we design new congestion control mechanisms, this memo recommends | |||
follow TCP's example. | that we should build in scaling with packet size rather than assuming | |||
we should follow TCP's example. | ||||
This memo continues as follows. First it discusses terminology and | This memo continues as follows. First it discusses terminology and | |||
scoping. Section 2 gives the concrete formal recommendations, | scoping. Section 2 gives the concrete formal recommendations, | |||
followed by motivating arguments in Section 3. We then critically | followed by motivating arguments in Section 3. We then critically | |||
survey the advice given previously in the RFC series and the research | survey the advice given previously in the RFC series and the research | |||
literature (Section 4), referring to an assessment of whether or not | literature (Section 4), referring to an assessment of whether or not | |||
this advice has been followed in production networks (Appendix A). | this advice has been followed in production networks (Appendix A). | |||
To wrap up, outstanding issues are discussed that will need | To wrap up, outstanding issues are discussed that will need | |||
resolution both to inform future protocol designs and to handle | resolution both to inform future protocol designs and to handle | |||
legacy (Section 5). Then security issues are collected together in | legacy (Section 5). Then security issues are collected together in | |||
skipping to change at page 6, line 37 | skipping to change at page 6, line 38 | |||
virtual limit smaller than the actual limit to the resource, then | virtual limit smaller than the actual limit to the resource, then | |||
notify when this virtual limit is exceeded in order to avoid | notify when this virtual limit is exceeded in order to avoid | |||
uncontrolled congestion of the actual capacity. | uncontrolled congestion of the actual capacity. | |||
Congestion notification communicates a real number bounded by the | Congestion notification communicates a real number bounded by the | |||
range [ 0 , 1 ]. This ties in with the most well-understood | range [ 0 , 1 ]. This ties in with the most well-understood | |||
measure of congestion notification: drop probability. | measure of congestion notification: drop probability. | |||
Explicit and Implicit Notification: The byte vs. packet dilemma | Explicit and Implicit Notification: The byte vs. packet dilemma | |||
concerns congestion notification irrespective of whether it is | concerns congestion notification irrespective of whether it is | |||
signalled implicitly by drop or using explicit congestion | signalled implicitly by drop or using Explicit Congestion | |||
notification (ECN [RFC3168] or PCN [RFC5670]). Throughout this | Notification (ECN [RFC3168] or PCN [RFC5670]). Throughout this | |||
document, unless clear from the context, the term marking will be | document, unless clear from the context, the term marking will be | |||
used to mean notifying congestion explicitly, while congestion | used to mean notifying congestion explicitly, while congestion | |||
notification will be used to mean notifying congestion either | notification will be used to mean notifying congestion either | |||
implicitly by drop or explicitly by marking. | implicitly by drop or explicitly by marking. | |||
Bit-congestible vs. Packet-congestible: If the load on a resource | Bit-congestible vs. Packet-congestible: If the load on a resource | |||
depends on the rate at which packets arrive, it is called packet- | depends on the rate at which packets arrive, it is called packet- | |||
congestible. If the load depends on the rate at which bits arrive | congestible. If the load depends on the rate at which bits arrive | |||
it is called bit-congestible. | it is called bit-congestible. | |||
skipping to change at page 8, line 41 | skipping to change at page 8, line 43 | |||
size. Because there are 25 times more small packets in one second, | size. Because there are 25 times more small packets in one second, | |||
it naturally drops 25 times more small packets, that is 100 small | it naturally drops 25 times more small packets, that is 100 small | |||
packets but only 4 large packets. But if we count how many bits it | packets but only 4 large packets. But if we count how many bits it | |||
drops, there are 48,000 bits in 100 small packets and 48,000 bits in | drops, there are 48,000 bits in 100 small packets and 48,000 bits in | |||
4 large packets--the same number of bits of small packets as large. | 4 large packets--the same number of bits of small packets as large. | |||
The packet-mode drop algorithm drops any bit with the same | The packet-mode drop algorithm drops any bit with the same | |||
probability whether the bit is in a small or a large packet. | probability whether the bit is in a small or a large packet. | |||
For byte-mode drop, again we use an example drop probability of 0.1%, | For byte-mode drop, again we use an example drop probability of 0.1%, | |||
but only for maximum size packets (assuming the link MTU is 1,500B or | but only for maximum size packets (assuming the link maximum | |||
12,000b). The byte-mode algorithm reduces the drop probability of | transmission unit (MTU) is 1,500B or 12,000b). The byte-mode | |||
smaller packets proportional to their size, making the probability | algorithm reduces the drop probability of smaller packets | |||
that it drops a small packet 25 times smaller at 0.004%. But there | proportional to their size, making the probability that it drops a | |||
are 25 times more small packets, so dropping them with 25 times lower | small packet 25 times smaller at 0.004%. But there are 25 times more | |||
probability results in dropping the same number of packets: 4 drops | small packets, so dropping them with 25 times lower probability | |||
in both cases. The 4 small dropped packets contain 25 times fewer | results in dropping the same number of packets: 4 drops in both | |||
bits than the 4 large dropped packets: 1,920 compared to 48,000. | cases. The 4 small dropped packets contain 25 times fewer bits than | |||
the 4 large dropped packets: 1,920 compared to 48,000. | ||||
The byte-mode drop algorithm drops any bit with a probability | The byte-mode drop algorithm drops any bit with a probability | |||
proportionate to the size of the packet it is in. | proportionate to the size of the packet it is in. | |||
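The arithmetic of the two worked examples above can be checked with a short sketch. The small packet size (60B) and the arrival rates (4,000 large and 100,000 small packets per second) are not stated in this excerpt; they are assumptions inferred from the quoted figures (48,000 bits in 100 small packets, 25 times more small packets, 0.1% drop probability). The function name is illustrative only.

```python
from fractions import Fraction

MTU_BITS = 12_000            # 1,500B link MTU, as stated in the text
SMALL_BITS = 480             # assumed 60B small packet (48,000 bits / 100 packets)
P_MAX = Fraction(1, 1000)    # example drop probability of 0.1%
LARGE_PPS = 4_000            # assumed arrival rate that yields 4 drops at 0.1%
SMALL_PPS = 25 * LARGE_PPS   # "25 times more small packets in one second"


def expected_drops(pps, size_bits, byte_mode):
    """Return (packets dropped, bits dropped) per second for one packet size."""
    # Byte-mode drop scales the probability by packet size relative to the MTU;
    # packet-mode drop applies the same probability to every packet.
    p = P_MAX * Fraction(size_bits, MTU_BITS) if byte_mode else P_MAX
    drops = pps * p
    return drops, drops * size_bits


if __name__ == "__main__":
    for mode, byte_mode in (("packet-mode", False), ("byte-mode", True)):
        small = expected_drops(SMALL_PPS, SMALL_BITS, byte_mode)
        large = expected_drops(LARGE_PPS, MTU_BITS, byte_mode)
        print(mode,
              "small:", tuple(int(x) for x in small),
              "large:", tuple(int(x) for x in large))
```

Exact fractions are used instead of floats so that the expected counts come out as whole numbers and can be compared directly with the figures in the text.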
2. Recommendations | 2. Recommendations | |||
This section gives recommendations related to network equipment in | This section gives recommendations related to network equipment in | |||
Sections 2.1 and 2.2, and in Sections 2.3 and 2.4 we discuss the | Sections 2.1 and 2.2, and in Sections 2.3 and 2.4 we discuss the | |||
implications on the transport protocols. | implications on the transport protocols. | |||
2.1. Recommendation on Queue Measurement | 2.1. Recommendation on Queue Measurement | |||
Ideally, an AQM would measure the service time of the queue to | Ideally, an AQM would measure the service time of the queue to | |||
measure congestion of a resource. However service time can only be | measure congestion of a resource. However service time can only be | |||
measured as packets leave the queue, where it is not always feasible | measured as packets leave the queue, where it is not always expedient | |||
to implement a full AQM algorithm. To predict the service time as | to implement a full AQM algorithm. To predict the service time as | |||
packets join the queue, an AQM algorithm needs to measure the length | packets join the queue, an AQM algorithm needs to measure the length | |||
of the queue. | of the queue. | |||
In this case, if the resource is bit-congestible, the AQM | In this case, if the resource is bit-congestible, the AQM | |||
implementation SHOULD measure the length of the queue in bytes and, | implementation SHOULD measure the length of the queue in bytes and, | |||
if the resource is packet-congestible, the implementation SHOULD | if the resource is packet-congestible, the implementation SHOULD | |||
measure the length of the queue in packets. No other choice makes | measure the length of the queue in packets. Subject to the | |||
sense, because the number of packets waiting in the queue isn't | exceptions below, no other choice makes sense, because the number of | |||
relevant if the resource gets congested by bytes and vice versa. For | packets waiting in the queue isn't relevant if the resource gets | |||
example, the length of the queue into a transmission line would be | congested by bytes and vice versa. For example, the length of the | |||
measured in bytes, while the length of the queue into a firewall | queue into a transmission line would be measured in bytes, while the | |||
would be measured in packets. | length of the queue into a firewall would be measured in packets. | |||
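As a minimal sketch of the measurement rule above, the queue below reports its length in bytes when the resource is bit-congestible and in packets when it is packet-congestible. The class and enum names are hypothetical and chosen only for illustration; they are not taken from any implementation.

```python
from collections import deque
from enum import Enum


class Congestible(Enum):
    BIT = "bit"        # e.g. the queue into a transmission line
    PACKET = "packet"  # e.g. the queue into a firewall


class MeasuredQueue:
    """FIFO queue whose length unit follows the kind of resource it feeds."""

    def __init__(self, kind):
        self.kind = kind       # a property of the resource behind the queue
        self._sizes = deque()  # size in bytes of each queued packet
        self._bytes = 0        # running total of queued bytes

    def enqueue(self, size_bytes):
        self._sizes.append(size_bytes)
        self._bytes += size_bytes

    def dequeue(self):
        size = self._sizes.popleft()
        self._bytes -= size
        return size

    def length(self):
        # Bytes for a bit-congestible resource, packets otherwise.
        if self.kind is Congestible.BIT:
            return self._bytes
        return len(self._sizes)
```

For example, after enqueuing a 60B and a 1,500B packet, a `Congestible.BIT` queue reports a length of 1560 while a `Congestible.PACKET` queue reports 2.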
To avoid the pathological effects of drop tail, the AQM can then | To avoid the pathological effects of drop tail, the AQM can then | |||
transform this service time or queue length into the probability of | transform this service time or queue length into the probability of | |||
dropping or marking a packet (e.g. RED's piecewise linear function | dropping or marking a packet (e.g. RED's piecewise linear function | |||
between thresholds). | between thresholds). | |||
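The piecewise linear function mentioned above can be sketched as follows. This assumes the classic form of the RED curve (zero below the minimum threshold, rising linearly to a maximum probability at the maximum threshold, and 1 beyond it); the parameter names `min_th`, `max_th` and `max_p` are illustrative and not defined in this document.

```python
def red_probability(avg_queue, min_th, max_th, max_p):
    """Map an averaged queue length to a drop (or mark) probability.

    avg_queue, min_th and max_th share one unit: bytes for a
    bit-congestible resource, packets for a packet-congestible one.
    """
    if not (0 <= min_th < max_th and 0.0 <= max_p <= 1.0):
        raise ValueError("need 0 <= min_th < max_th and 0 <= max_p <= 1")
    if avg_queue < min_th:
        return 0.0   # below the lower threshold: never drop
    if avg_queue >= max_th:
        return 1.0   # at or above the upper threshold: always drop
    # Linear ramp from 0 at min_th up to max_p at max_th.
    return max_p * (avg_queue - min_th) / (max_th - min_th)
```

Note that the function takes no per-packet size argument: the probability depends only on the measured queue, which is consistent with the recommendation that network equipment not take packet size into account when creating congestion signals.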
What this advice means for RED as a specific example: | What this advice means for RED as a specific example: | |||
1. A RED implementation SHOULD use byte mode queue measurement for | 1. A RED implementation SHOULD use byte mode queue measurement for | |||
measuring the congestion of bit-congestible resources and packet | measuring the congestion of bit-congestible resources and packet | |||
mode queue measurement for packet-congestible resources. | mode queue measurement for packet-congestible resources. | |||
2. An implementation SHOULD NOT make it possible to configure the | 2. An implementation SHOULD NOT make it possible to configure the | |||
way a queue measures itself, because whether a queue is bit- | way a queue measures itself, because whether a queue is bit- | |||
congestible or packet-congestible is an inherent property of the | congestible or packet-congestible is an inherent property of the | |||
queue. | queue. | |||
Exceptions to these recommendations MAY be necessary, for instance | Exceptions to these recommendations might be necessary, for instance | |||
where a packet-congestible resource has to be configured as a proxy | where a packet-congestible resource has to be configured as a proxy | |||
bottleneck for a bit-congestible resource in an adjacent box that | bottleneck for a bit-congestible resource in an adjacent box that | |||
does not support AQM. | does not support AQM. | |||
The recommended approach in less straightforward scenarios, such as | The recommended approach in less straightforward scenarios, such as | |||
fixed size packet buffers, resources without a queue and buffers | fixed size packet buffers, resources without a queue and buffers | |||
comprising a mix of packet and bit-congestible resources, is | comprising a mix of packet and bit-congestible resources, is | |||
discussed in Section 4.1. For instance, Section 4.1.1 explains that | discussed in Section 4.1. For instance, Section 4.1.1 explains that | |||
the queue into a line should be measured in bytes even if the queue | the queue into a line should be measured in bytes even if the queue | |||
consists of fixed-size packet-buffers, because the root-cause of any | consists of fixed-size packet-buffers, because the root-cause of any | |||
skipping to change at page 11, line 23 | skipping to change at page 11, line 24 | |||
marked, it SHOULD consider the strength of the congestion indication | marked, it SHOULD consider the strength of the congestion indication | |||
as proportionate to the size in octets (bytes) of the missing or | as proportionate to the size in octets (bytes) of the missing or | |||
marked packet. | marked packet. | |||
In other words, when a packet indicates congestion (by being lost or | In other words, when a packet indicates congestion (by being lost or | |||
marked) it can be considered conceptually as if there is a congestion | marked) it can be considered conceptually as if there is a congestion | |||
indication on every octet of the packet, not just one indication per | indication on every octet of the packet, not just one indication per | |||
packet. | packet. | |||
To be clear, the above recommendation solely describes how a | To be clear, the above recommendation solely describes how a | |||
transport should interpret the meaning of a congestion indication. | transport should interpret the meaning of a congestion indication, as | |||
It makes no recommendation on whether a transport should act | a long term goal. It makes no recommendation on whether a transport | |||
differently based on this interpretation. It merely aids | should act differently based on this interpretation. It merely aids | |||
interoperability between transports, if they choose to make their | interoperability between transports, if they choose to make their | |||
actions depend on the strength of congestion indications. | actions depend on the strength of congestion indications. | |||
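To illustrate the interpretation only (not any particular response to it), a receiver could weight each congestion indication by the size of the packet carrying it. The sketch below is hypothetical: the function name and the `(size_bytes, indicated)` input format are assumptions made for this example.

```python
from fractions import Fraction


def congestion_level(packets):
    """Fraction of octets that carried a congestion indication.

    packets: iterable of (size_bytes, indicated) pairs, where indicated
    is True if the packet was lost or marked.
    """
    total = 0
    indicated_octets = 0
    for size_bytes, indicated in packets:
        total += size_bytes
        if indicated:
            # Treat the indication as applying to every octet of the packet.
            indicated_octets += size_bytes
    return Fraction(indicated_octets, total) if total else Fraction(0)
```

With one marked 1,500B packet and two unmarked 60B packets, this gives 1500/1620 (about 0.93), whereas counting per packet would give 1/3.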
This definition will be useful as the IETF transport area continues | This definition will be useful as the IETF transport area continues | |||
its programme of: | its programme of: | |||
o updating host-based congestion control protocols to take account | o updating host-based congestion control protocols to take account | |||
of packet size | of packet size | |||
o making transports less sensitive to losing control packets like | o making transports less sensitive to losing control packets like | |||
skipping to change at page 12, line 6 | skipping to change at page 12, line 8 | |||
2. If it is desired to improve TCP performance by reducing the | 2. If it is desired to improve TCP performance by reducing the | |||
chance that a SYN or a pure ACK will be dropped, this SHOULD be | chance that a SYN or a pure ACK will be dropped, this SHOULD be | |||
done by modifying TCP (Section 4.2.3), not network equipment. | done by modifying TCP (Section 4.2.3), not network equipment. | |||
To be clear, we are not recommending at all that TCPs under | To be clear, we are not recommending at all that TCPs under | |||
equivalent conditions should aim for equal bit-rates. We are merely | equivalent conditions should aim for equal bit-rates. We are merely | |||
saying that anyone trying to do such a thing should modify their TCP | saying that anyone trying to do such a thing should modify their TCP | |||
algorithm, not the network. | algorithm, not the network. | |||
These recommendations are phrased as 'SHOULD' rather than 'MUST', | These recommendations are phrased as 'SHOULD' rather than 'MUST', | |||
because there may be cases where compatibility with pre-existing | because there may be cases where expediency dictates that | |||
versions of a transport protocol make the recommendations | compatibility with pre-existing versions of a transport protocol make | |||
impractical. | the recommendations impractical. | |||
2.4. Recommendation on Handling Congestion Indications when Splitting | 2.4. Recommendation on Handling Congestion Indications when Splitting | |||
or Merging Packets | or Merging Packets | |||
Packets carrying congestion indications may be split or merged in | Packets carrying congestion indications may be split or merged in | |||
some circumstances (e.g. at a RTP/RTCP transcoder or during IP | some circumstances (e.g. at a RTP/RTCP transcoder or during IP | |||
fragment reassembly). Splitting and merging only make sense in the | fragment reassembly). Splitting and merging only make sense in the | |||
context of ECN, not loss. | context of ECN, not loss. | |||
The general rule to follow is that the number of octets in packets | The general rule to follow is that the number of octets in packets | |||
skipping to change at page 23, line 22 | skipping to change at page 23, line 27 | |||
Recently, two RFCs have defined changes to TCP that make it more | Recently, two RFCs have defined changes to TCP that make it more | |||
robust against losing small control packets [RFC5562] [RFC5690]. In | robust against losing small control packets [RFC5562] [RFC5690]. In | |||
both cases they note that the case for these two TCP changes would be | both cases they note that the case for these two TCP changes would be | |||
weaker if RED were biased against dropping small packets. We argue | weaker if RED were biased against dropping small packets. We argue | |||
here that these two proposals are a safer and more principled way to | here that these two proposals are a safer and more principled way to | |||
achieve TCP performance improvements than reverse engineering RED to | achieve TCP performance improvements than reverse engineering RED to | |||
benefit TCP. | benefit TCP. | |||
Although there are no known proposals, it would also be possible and | Although there are no known proposals, it would also be possible and | |||
perfectly valid to make control packets robust against drop by | perfectly valid to make control packets robust against drop by | |||
explicitly requesting a lower drop probability using their Diffserv | requesting a scheduling class with lower drop probability, by re- | |||
code point [RFC2474] to request a scheduling class with lower drop. | marking to a Diffserv code point [RFC2474] within the same behaviour | |||
aggregate. | ||||
Although not brought to the IETF, a simple proposal from Wischik | Although not brought to the IETF, a simple proposal from Wischik | |||
[DupTCP] suggests that the first three packets of every TCP flow | [DupTCP] suggests that the first three packets of every TCP flow | |||
should be routinely duplicated after a short delay. It shows that | should be routinely duplicated after a short delay. It shows that | |||
this would greatly improve the chances of short flows completing | this would greatly improve the chances of short flows completing | |||
quickly, but it would hardly increase traffic levels on the Internet, | quickly, but it would hardly increase traffic levels on the Internet, | |||
because Internet bytes have always been concentrated in the large | because Internet bytes have always been concentrated in the large | |||
flows. It further shows that the performance of many typical | flows. It further shows that the performance of many typical | |||
applications depends on completion of long serial chains of short | applications depends on completion of long serial chains of short | |||
messages. It argues that, given most of the value people get from | messages. It argues that, given most of the value people get from | |||
the Internet is concentrated within short flows, this simple | the Internet is concentrated within short flows, this simple | |||
expedient would greatly increase the value of the best efforts | expedient would greatly increase the value of the best efforts | |||
Internet at minimal cost. | Internet at minimal cost. A similar but more extensive approach has | |||
been evaluated on Google servers [GentleAggro]. | ||||
The proposals discussed in this sub-section are experimental | ||||
approaches that are not yet in wide operational use, but they are | ||||
existence proofs that transports can make themselves robust against | ||||
loss of control packets. The examples are all TCP-based, but | ||||
applications over non-TCP transports could mitigate loss of control | ||||
packets by making similar use of Diffserv, data duplication, FEC etc. | ||||
4.2.4. Congestion Notification: Summary of Conflicting Advice | 4.2.4. Congestion Notification: Summary of Conflicting Advice | |||
+-----------+----------------+-----------------+--------------------+ | +-----------+----------------+-----------------+--------------------+ | |||
| transport | RED_1 (packet | RED_4 (linear | RED_5 (square byte | | | transport | RED_1 (packet | RED_4 (linear | RED_5 (square byte | | |||
| cc | mode drop) | byte mode drop) | mode drop) | | | cc | mode drop) | byte mode drop) | mode drop) | | |||
+-----------+----------------+-----------------+--------------------+ | +-----------+----------------+-----------------+--------------------+ | |||
| TCP or | s/sqrt(p) | sqrt(s/p) | 1/sqrt(p) | | | TCP or | s/sqrt(p) | sqrt(s/p) | 1/sqrt(p) | | |||
| TFRC | | | | | | TFRC | | | | | |||
| TFRC-SP | 1/sqrt(p) | 1/sqrt(sp) | 1/(s.sqrt(p)) | | | TFRC-SP | 1/sqrt(p) | 1/sqrt(sp) | 1/(s.sqrt(p)) | | |||
skipping to change at page 27, line 4 | skipping to change at page 27, line 19 | |||
o When network equipment decides whether to drop (or mark) a packet, | o When network equipment decides whether to drop (or mark) a packet, | |||
it is recommended that the size of the particular packet should | it is recommended that the size of the particular packet should | |||
not be taken into account | not be taken into account | |||
o However, when a transport algorithm responds to a dropped or | o However, when a transport algorithm responds to a dropped or | |||
marked packet, the size of the rate reduction should be | marked packet, the size of the rate reduction should be | |||
proportionate to the size of the packet. | proportionate to the size of the packet. | |||
In summary, the answers are 'it depends', 'no' and 'yes' respectively. | In summary, the answers are 'it depends', 'no' and 'yes' respectively. | |||
For the specific case of RED, this means that byte-mode queue | For the specific case of RED, this means that byte-mode queue | |||
measurement will often be appropriate although byte-mode drop is | measurement will often be appropriate but the use of byte-mode drop | |||
strongly deprecated. | is very strongly discouraged. | |||
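The proportionalities tabulated in Section 4.2.4 can be reproduced with a short numerical sketch. It is illustrative only and rests on assumptions not stated in the text above: packet size s is normalised to the range (0, 1], RED_4 is taken to scale the drop probability linearly with s, and RED_5 with the square of s. The function names are hypothetical.

```python
import math

# Assumed exponent of packet size s in the drop probability for each RED variant.
# RED_1 ignores size; RED_4 and RED_5 are assumed to scale with s and s**2.
SIZE_EXPONENT = {"RED_1": 0, "RED_4": 1, "RED_5": 2}


def effective_drop(p, s, variant):
    """Drop probability seen by packets of normalised size s (0 < s <= 1)."""
    return p * s ** SIZE_EXPONENT[variant]


def tcp_rate(p, s, variant):
    """TCP or TFRC: bit rate proportional to s / sqrt(drop probability)."""
    return s / math.sqrt(effective_drop(p, s, variant))


def tfrc_sp_rate(p, s, variant):
    """TFRC-SP: bit rate proportional to 1 / sqrt(drop probability)."""
    return 1.0 / math.sqrt(effective_drop(p, s, variant))


if __name__ == "__main__":
    p = 0.01
    for s in (1.0, 0.25):
        for variant in ("RED_1", "RED_4", "RED_5"):
            print(variant, s, tcp_rate(p, s, variant), tfrc_sp_rate(p, s, variant))
```

Under these assumptions, only the combinations where the rate does not grow as s shrinks avoid rewarding smaller packets. With TFRC-SP over RED_5 the relative rate rises as 1/s, which is the perverse incentive the conclusions warn against.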
At the transport layer the IETF should continue updating congestion | At the transport layer the IETF should continue updating congestion | |||
control protocols to take account of the size of each packet that | control protocols to take account of the size of each packet that | |||
indicates congestion. Also the IETF should continue to make | indicates congestion. Also the IETF should continue to make | |||
protocols less sensitive to losing control packets like SYNs, pure | protocols less sensitive to losing control packets like SYNs, pure | |||
ACKs and DNS exchanges. Although many control packets happen to be | ACKs and DNS exchanges. Although many control packets happen to be | |||
small, the alternative of network equipment favouring all small | small, the alternative of network equipment favouring all small | |||
packets would be dangerous. That would create perverse incentives to | packets would be dangerous. That would create perverse incentives to | |||
split data transfers into smaller packets. | split data transfers into smaller packets. | |||
skipping to change at page 29, line 5 | skipping to change at page 29, line 24 | |||
www.stanford.edu/~balaji/papers/ | www.stanford.edu/~balaji/papers/ | |||
01approximatefair.pdf>. | 01approximatefair.pdf>. | |||
[DRQ] Shin, M., Chong, S., and I. Rhee, "Dual- | [DRQ] Shin, M., Chong, S., and I. Rhee, "Dual- | |||
Resource TCP/AQM for Processing- | Resource TCP/AQM for Processing- | |||
Constrained Networks", IEEE/ACM | Constrained Networks", IEEE/ACM | |||
Transactions on Networking Vol 16, issue | Transactions on Networking Vol 16, issue | |||
2, April 2008, <http://dx.doi.org/10.1109/ | 2, April 2008, <http://dx.doi.org/10.1109/ | |||
TNET.2007.900415>. | TNET.2007.900415>. | |||
[DupTCP] Wischik, D., "Short messages", Royal | [DupTCP] Wischik, D., "Short messages", | |||
Society workshop on networks: modelling | Philosophical Transactions of the Royal | |||
and control , September 2007, <http:// | Society A 366(1872):1941-1953, June 2008, | |||
www.cs.ucl.ac.uk/staff/ucacdjw/Research/ | <http://rsta.royalsocietypublishing.org/ | |||
shortmsg.html>. | content/366/1872/1941.full.pdf+html>. | |||
[ECNFixedWireless] Siris, V., "Resource Control for Elastic | [ECNFixedWireless] Siris, V., "Resource Control for Elastic | |||
Traffic in CDMA Networks", Proc. ACM | Traffic in CDMA Networks", Proc. ACM | |||
MOBICOM'02, September 2002, <http:// | MOBICOM'02, September 2002, <http:// | |||
www.ics.forth.gr/netlab/publications/ | www.ics.forth.gr/netlab/publications/ | |||
resource_control_elastic_cdma.html>. | resource_control_elastic_cdma.html>. | |||
[Evol_cc] Gibbens, R. and F. Kelly, "Resource | [Evol_cc] Gibbens, R. and F. Kelly, "Resource | |||
pricing and the evolution of congestion | pricing and the evolution of congestion | |||
control", Automatica 35(12)1969--1985, | control", Automatica 35(12)1969--1985, | |||
December 1999, <http:// | December 1999, <http:// | |||
www.statslab.cam.ac.uk/~frank/evol.html>. | www.statslab.cam.ac.uk/~frank/evol.html>. | |||
[GentleAggro] Flach, T., Dukkipati, N., Terzis, A., | ||||
Raghavan, B., Cardwell, N., Cheng, Y., | ||||
Jain, A., Hao, S., Katz-Bassett, E., and | ||||
R. Govindan, "Reducing Web Latency: the | ||||
Virtue of Gentle Aggression", ACM SIGCOMM | ||||
CCR 43(4)159--170, August 2013, <http:// | ||||
doi.acm.org/10.1145/2486001.2486014>. | ||||
[I-D.nichols-tsvwg-codel] Nichols, K. and V. Jacobson, "Controlled | [I-D.nichols-tsvwg-codel] Nichols, K. and V. Jacobson, "Controlled | |||
Delay Active Queue Management", | Delay Active Queue Management", | |||
draft-nichols-tsvwg-codel-01 (work in | draft-nichols-tsvwg-codel-01 (work in | |||
progress), February 2013. | progress), February 2013. | |||
[I-D.pan-tsvwg-pie] Pan, R., Natarajan, P., Piglione, C., and | [I-D.pan-tsvwg-pie] Pan, R., Natarajan, P., Piglione, C., and | |||
M. Prabhu, "PIE: A Lightweight Control | M. Prabhu, "PIE: A Lightweight Control | |||
Scheme To Address the Bufferbloat | Scheme To Address the Bufferbloat | |||
Problem", draft-pan-tsvwg-pie-00 (work in | Problem", draft-pan-tsvwg-pie-00 (work in | |||
progress), December 2012. | progress), December 2012. | |||
skipping to change at page 40, line 5 | skipping to change at page 40, line 35 | |||
across different size flows [Rate_fair_Dis]. | across different size flows [Rate_fair_Dis]. | |||
Appendix D. Changes from Previous Versions | Appendix D. Changes from Previous Versions | |||
To be removed by the RFC Editor on publication. | To be removed by the RFC Editor on publication. | |||
Full incremental diffs between each version are available at | Full incremental diffs between each version are available at | |||
<http://tools.ietf.org/wg/tsvwg/draft-ietf-tsvwg-byte-pkt-congest/> | <http://tools.ietf.org/wg/tsvwg/draft-ietf-tsvwg-byte-pkt-congest/> | |||
(courtesy of the rfcdiff tool): | (courtesy of the rfcdiff tool): | |||
From -11 to -12: Following the second pass through the IESG: | ||||
* Section 2.1 [Barry Leiba]: | ||||
+ s/No other choice makes sense,/Subject to the exceptions | ||||
below, no other choice makes sense,/ | ||||
+ s/Exceptions to these recommendations MAY be necessary | ||||
/Exceptions to these recommendations may be necessary / | ||||
* Sections 3.2 and 4.2.3 [Joel Jaeggli]: | ||||
+ Added comment to section 4.2.3 that the examples given are | ||||
not in widespread production use, but they give evidence | ||||
that it is possible to follow the advice given. | ||||
+ Section 4.2.3: | ||||
- OLD: Although there are no known proposals, it would also | ||||
be possible and perfectly valid to make control packets | ||||
robust against drop by explicitly requesting a lower drop | ||||
probability using their Diffserv code point [RFC2474] to | ||||
request a scheduling class with lower drop. | ||||
NEW: Although there are no known proposals, it would also | ||||
be possible and perfectly valid to make control packets | ||||
robust against drop by requesting a scheduling class with | ||||
lower drop probability, by re-marking to a Diffserv code | ||||
point [RFC2474] within the same behaviour aggregate. | ||||
- appended "Similarly applications, over non-TCP transports | ||||
could make any packets that are effectively control | ||||
packets more robust by using Diffserv, data duplication, | ||||
FEC etc." | ||||
+ Updated Wischik ref and added "Reducing Web Latency: the | ||||
Virtue of Gentle Aggression" ref. | ||||
* Expanded more abbreviations (CoDel, PIE, MTU). | ||||
* Section 1. Intro [Stephen Farrell]: | ||||
+ In the places where the doc describes the dichotomy between | ||||
'long-term goal' and 'expediency' the words long term goal | ||||
and expedient have been introduced, to more explicitly refer | ||||
back to this introductory para (S.2.1 & S.2.3). | ||||
+ Added explanation of what scaling with packet size means. | ||||
* Conclusions [Benoit Claise]: | ||||
+ OLD: For the specific case of RED, this means that byte-mode | ||||
queue measurement will often be appropriate although byte- | ||||
mode drop is strongly deprecated. | ||||
NEW: For the specific case of RED, this means that byte-mode | ||||
queue measurement will often be appropriate but the use of | ||||
byte-mode drop is very strongly discouraged. | ||||
From -10 to -11: Following a further WGLC: | From -10 to -11: Following a further WGLC: | |||
* Abstract: clarified that advice applies to all AQMs including | * Abstract: clarified that advice applies to all AQMs including | |||
newer ones | newer ones | |||
* Abstract & Intro: changed 'read' to 'detect', because you don't | * Abstract & Intro: changed 'read' to 'detect', because you don't | |||
read losses, you detect them. | read losses, you detect them. | |||
* S.1. Introduction: Disambiguated summary of advice on queue | * S.1. Introduction: Disambiguated summary of advice on queue | |||
measurement. | measurement. | |||
End of changes. 26 change blocks. | ||||
64 lines changed or deleted | 142 lines changed or added | |||