draft-ietf-opsawg-large-flow-load-balancing-04.txt | draft-ietf-opsawg-large-flow-load-balancing-05.txt | |||
---|---|---|---|---|
OPSAWG R. Krishnan | OPSAWG R. Krishnan | |||
Internet Draft S. Khanna | Internet Draft Brocade Communications | |||
Intended status: Informational Brocade Communications | Intended status: Informational L. Yong | |||
Expires: January 9, 2014 L. Yong | Expires: February 23, 2014 Huawei USA | |||
July 9, 2013 Huawei USA | August 23, 2013 A. Ghanwani | |||
A. Ghanwani | ||||
Dell | Dell | |||
Ning So | Ning So | |||
Tata Communications | Tata Communications | |||
S. Khanna | ||||
Cisco Systems | ||||
B. Khasnabish | B. Khasnabish | |||
ZTE Corporation | ZTE Corporation | |||
Mechanisms for Optimal LAG/ECMP Component Link Utilization in | Mechanisms for Optimal LAG/ECMP Component Link Utilization in | |||
Networks | Networks | |||
draft-ietf-opsawg-large-flow-load-balancing-04.txt | draft-ietf-opsawg-large-flow-load-balancing-05.txt | |||
Status of this Memo | Status of this Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. This document may not be modified, | provisions of BCP 78 and BCP 79. This document may not be modified, | |||
and derivative works of it may not be created, except to publish it | and derivative works of it may not be created, except to publish it | |||
as an RFC and to translate it into languages other than English. | as an RFC and to translate it into languages other than English. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
skipping to change at page 1, line 42 | skipping to change at page 1, line 43 | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt | http://www.ietf.org/ietf/1id-abstracts.txt | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html | http://www.ietf.org/shadow.html | |||
This Internet-Draft will expire on January 9, 2014. | This Internet-Draft will expire on February 23, 2014. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2013 IETF Trust and the persons identified as the | Copyright (c) 2013 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 27 | skipping to change at page 2, line 27 | |||
Demands on networking infrastructure are growing exponentially; the | Demands on networking infrastructure are growing exponentially; the | |||
drivers are bandwidth-hungry rich media applications, inter-data | drivers are bandwidth-hungry rich media applications, inter-data | |||
center communications, etc. In this context, it is important to | center communications, etc. In this context, it is important to | |||
optimally use the bandwidth in wired networks that extensively use | optimally use the bandwidth in wired networks that extensively use | |||
LAG/ECMP techniques for bandwidth scaling. This draft explores some | LAG/ECMP techniques for bandwidth scaling. This draft explores some | |||
of the mechanisms useful for achieving this. | of the mechanisms useful for achieving this. | |||
Table of Contents | Table of Contents | |||
1. Introduction...................................................3 | 1. Introduction...................................................3 | |||
1.1. Acronyms..................................................3 | 1.1. Acronyms..................................................4 | |||
1.2. Terminology...............................................4 | 1.2. Terminology...............................................4 | |||
2. Flow Categorization............................................4 | 2. Flow Categorization............................................4 | |||
3. Hash-based Load Distribution in LAG/ECMP.......................5 | 3. Hash-based Load Distribution in LAG/ECMP.......................5 | |||
4. Mechanisms for Optimal LAG/ECMP Component Link Utilization.....7 | 4. Mechanisms for Optimal LAG/ECMP Component Link Utilization.....7 | |||
4.1. Differences in LAG vs ECMP................................8 | 4.1. Differences in LAG vs ECMP................................8 | |||
4.2. Overview of the mechanism.................................9 | 4.2. Overview of the mechanism.................................9 | |||
4.3. Large Flow Recognition...................................10 | 4.3. Large Flow Recognition...................................10 | |||
4.3.1. Flow Identification.................................10 | 4.3.1. Flow Identification.................................10 | |||
4.3.2. Criteria for Identifying a Large Flow...............10 | 4.3.2. Criteria for Identifying a Large Flow...............10 | |||
4.3.3. Sampling Techniques.................................11 | 4.3.3. Sampling Techniques.................................11 | |||
skipping to change at page 3, line 10 | skipping to change at page 3, line 10 | |||
5. Information Model for Flow Re-balancing.......................15 | 5. Information Model for Flow Re-balancing.......................15 | |||
5.1. Configuration Parameters for Flow Re-balancing...........16 | 5.1. Configuration Parameters for Flow Re-balancing...........16 | |||
5.2. System Configuration and Identification Parameters.......16 | 5.2. System Configuration and Identification Parameters.......16 | |||
5.3. Information for Alternative Placement of Large Flows.....17 | 5.3. Information for Alternative Placement of Large Flows.....17 | |||
5.4. Information for Redistribution of Small Flows............17 | 5.4. Information for Redistribution of Small Flows............17 | |||
5.5. Export of Flow Information...............................17 | 5.5. Export of Flow Information...............................17 | |||
5.6. Monitoring information...................................18 | 5.6. Monitoring information...................................18 | |||
5.6.1. Interface (link) utilization........................18 | 5.6.1. Interface (link) utilization........................18 | |||
5.6.2. Other monitoring information........................18 | 5.6.2. Other monitoring information........................18 | |||
6. Operational Considerations....................................18 | 6. Operational Considerations....................................18 | |||
6.1. Rebalancing Frequency....................................19 | ||||
6.2. Handling Route Changes...................................19 | ||||
7. IANA Considerations...........................................19 | 7. IANA Considerations...........................................19 | |||
8. Security Considerations.......................................19 | 8. Security Considerations.......................................19 | |||
9. Acknowledgements..............................................20 | 9. Acknowledgements..............................................20 | |||
10. References...................................................20 | 10. References...................................................20 | |||
10.1. Normative References....................................20 | 10.1. Normative References....................................20 | |||
10.2. Informative References..................................20 | 10.2. Informative References..................................20 | |||
1. Introduction | 1. Introduction | |||
Networks extensively use LAG/ECMP techniques for capacity scaling. | Networks extensively use LAG/ECMP techniques for capacity scaling. | |||
skipping to change at page 19, line 4 | skipping to change at page 19, line 6 | |||
5.6.2. Other monitoring information | 5.6.2. Other monitoring information | |||
Additional monitoring information includes: | Additional monitoring information includes: | |||
. Number of times rebalancing was done. | . Number of times rebalancing was done. | |||
. Time since the last rebalancing event. | . Time since the last rebalancing event. | |||
6. Operational Considerations | 6. Operational Considerations | |||
6.1. Rebalancing Frequency | ||||
Flows should be re-balanced only when the imbalance in the | Flows should be re-balanced only when the imbalance in the | |||
utilization across component links exceeds a certain threshold. | utilization across component links exceeds a certain threshold. | |||
Frequent re-balancing to achieve precisely equitable utilization across | Frequent re-balancing to achieve precisely equitable utilization across | |||
component links could be counter-productive, as it may result in | component links could be counter-productive, as it may result in | |||
moving flows back and forth between the component links, impacting | moving flows back and forth between the component links, impacting | |||
packet ordering and system stability. This applies regardless of | packet ordering and system stability. This applies regardless of | |||
whether large flows or small flows are re-distributed. It should be | whether large flows or small flows are re-distributed. It should be | |||
noted that reordering is a concern for TCP flows with even a few | noted that reordering is a concern for TCP flows with even a few | |||
packets because three out-of-order packets would trigger sufficient | packets because three out-of-order packets would trigger sufficient | |||
duplicate ACKs to the sender, resulting in a retransmission [RFC | duplicate ACKs to the sender, resulting in a retransmission [RFC | |||
5681]. | 5681]. | |||
The operator would have to experiment with various values of the | The operator would have to experiment with various values of the | |||
large flow recognition parameters (minimum bandwidth threshold, | large flow recognition parameters (minimum bandwidth threshold, | |||
observation interval) and the imbalance threshold across component | observation interval) and the imbalance threshold across component | |||
links to tune the solution for their environment. | links to tune the solution for their environment. | |||
6.2. Handling Route Changes | ||||
Large flow rebalancing must be aware of any changes to the FIB. In | ||||
cases where the next-hop of a route no longer points to the LAG, | ||||
or to an ECMP group, any PBR entries added as described in Sections | ||||
4.4.1 and 4.4.2 must be withdrawn in order to avoid the creation of | ||||
forwarding loops. | ||||
7. IANA Considerations | 7. IANA Considerations | |||
This memo includes no request to IANA. | This memo includes no request to IANA. | |||
8. Security Considerations | 8. Security Considerations | |||
This document does not directly impact the security of the Internet | This document does not directly impact the security of the Internet | |||
infrastructure or its applications. In fact, it could help if there | infrastructure or its applications. In fact, it could help if there | |||
is a DoS attack pattern which causes a hash imbalance resulting in | is a DoS attack pattern which causes a hash imbalance resulting in | |||
heavy overloading of large flows to certain LAG/ECMP component | heavy overloading of large flows to certain LAG/ECMP component | |||
skipping to change at page 21, line 33 | skipping to change at page 21, line 38 | |||
the ACM SIGCOMM, August 2011. | the ACM SIGCOMM, August 2011. | |||
[NDTM] Estan, C. and G. Varghese, "New directions in traffic | [NDTM] Estan, C. and G. Varghese, "New directions in traffic | |||
measurement and accounting," Proceedings of ACM SIGCOMM, August 2002. | measurement and accounting," Proceedings of ACM SIGCOMM, August 2002. | |||
[bin-pack] Coffman, Jr., E., M. Garey, and D. Johnson. Approximation | [bin-pack] Coffman, Jr., E., M. Garey, and D. Johnson. Approximation | |||
Algorithms for Bin-Packing -- An Updated Survey. In Algorithm Design | Algorithms for Bin-Packing -- An Updated Survey. In Algorithm Design | |||
for Computer System Design, ed. by Ausiello, Lucertini, and Serafini. | for Computer System Design, ed. by Ausiello, Lucertini, and Serafini. | |||
Springer-Verlag, 1984. | Springer-Verlag, 1984. | |||
Appendix A. Internet Traffic Analysis and Load Balancing Simulation | Appendix A. Internet Traffic Analysis and Load Balancing Simulation | |||
Internet traffic [CAIDA] has been analyzed to obtain flow statistics | Internet traffic [CAIDA] has been analyzed to obtain flow statistics | |||
such as the number of packets in a flow and the flow duration. The | such as the number of packets in a flow and the flow duration. The | |||
five fields in the packet header (IP addresses, TCP/UDP ports, and IP | five fields in the packet header (IP addresses, TCP/UDP ports, and IP | |||
protocol) are used for flow identification. The analysis indicates | protocol) are used for flow identification. The analysis indicates | |||
that < ~2% of the flows take ~30% of total traffic volume while the | that < ~2% of the flows take ~30% of total traffic volume while the | |||
rest of the flows (> ~98%) contribute ~70% [YONG]. | rest of the flows (> ~98%) contribute ~70% [YONG]. | |||
The simulation has shown that given the Internet traffic pattern, the | The simulation has shown that given the Internet traffic pattern, the | |||
hash-based technique does not evenly distribute the flows over ECMP | hash-based technique does not evenly distribute the flows over ECMP | |||
skipping to change at page 22, line 23 | skipping to change at page 22, line 30 | |||
traffic characteristics [YONG]. | traffic characteristics [YONG]. | |||
Authors' Addresses | Authors' Addresses | |||
Ram Krishnan | Ram Krishnan | |||
Brocade Communications | Brocade Communications | |||
San Jose, 95134, USA | San Jose, 95134, USA | |||
Phone: +1-408-406-7890 | Phone: +1-408-406-7890 | |||
Email: ramk@brocade.com | Email: ramk@brocade.com | |||
Sanjay Khanna | ||||
Brocade Communications | ||||
San Jose, 95134, USA | ||||
Phone: +1-408-333-4850 | ||||
Email: skhanna@brocade.com | ||||
Lucy Yong | Lucy Yong | |||
Huawei USA | Huawei USA | |||
5340 Legacy Drive | 5340 Legacy Drive | |||
Plano, TX 75025, USA | Plano, TX 75025, USA | |||
Phone: +1-469-277-5837 | Phone: +1-469-277-5837 | |||
Email: lucy.yong@huawei.com | Email: lucy.yong@huawei.com | |||
Anoop Ghanwani | Anoop Ghanwani | |||
Dell | Dell | |||
San Jose, CA 95134 | San Jose, CA 95134 | |||
Phone: +1-408-571-3228 | Phone: +1-408-571-3228 | |||
Email: anoop@alumni.duke.edu | Email: anoop@alumni.duke.edu | |||
Ning So | Ning So | |||
Tata Communications | Tata Communications | |||
Plano, TX 75082, USA | Plano, TX 75082, USA | |||
Phone: +1-972-955-0914 | Phone: +1-972-955-0914 | |||
Email: ning.so@tatacommunications.com | Email: ning.so@tatacommunications.com | |||
Sanjay Khanna | ||||
Cisco Systems | ||||
Email: sanjakha@gmail.com | ||||
Bhumip Khasnabish | Bhumip Khasnabish | |||
ZTE Corporation | ZTE Corporation | |||
New Jersey, 07960, USA | New Jersey, 07960, USA | |||
Phone: +1-781-752-8003 | Phone: +1-781-752-8003 | |||
Email: bhumip.khasnabish@zteusa.com | Email: bhumip.khasnabish@zteusa.com | |||
End of changes. 11 change blocks. | ||||
15 lines changed or deleted | 27 lines changed or added | |||
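
Section 6.1 (Rebalancing Frequency, new in -05) advises re-balancing only when the utilization imbalance across component links exceeds a threshold, and Section 5.6.2 lists the number of rebalancing events and the time since the last one as monitoring information. The Python sketch below gates re-balancing on both ideas. It is illustrative, not the drafts' mechanism: the imbalance metric (spread between the most and least utilized component links), the hold-down interval `min_interval_s`, and all class and function names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ComponentLink:
    """One member of a LAG or ECMP group (hypothetical model)."""
    name: str
    capacity_bps: float
    load_bps: float = 0.0

    @property
    def utilization(self) -> float:
        return self.load_bps / self.capacity_bps


def imbalance(links: List[ComponentLink]) -> float:
    """Assumed metric: spread between the most and least utilized links."""
    utils = [link.utilization for link in links]
    return max(utils) - min(utils)


@dataclass
class RebalanceGate:
    """Decides whether a re-balancing pass is warranted.

    imbalance_threshold: operator-tuned fraction of link capacity.
    min_interval_s: hypothetical hold-down between passes, so flows are
    not moved back and forth between component links.
    """
    imbalance_threshold: float
    min_interval_s: float
    rebalance_count: int = 0                 # "number of times rebalancing was done"
    last_rebalance_s: Optional[float] = None

    def time_since_last(self, now_s: float) -> Optional[float]:
        # "time since the last rebalancing event"; None if never re-balanced
        if self.last_rebalance_s is None:
            return None
        return now_s - self.last_rebalance_s

    def should_rebalance(self, links: List[ComponentLink], now_s: float) -> bool:
        if imbalance(links) <= self.imbalance_threshold:
            return False  # imbalance is tolerable: leave flows where they are
        elapsed = self.time_since_last(now_s)
        return elapsed is None or elapsed >= self.min_interval_s

    def record_rebalance(self, now_s: float) -> None:
        self.rebalance_count += 1
        self.last_rebalance_s = now_s


links = [ComponentLink("link-a", 10e9, 7e9), ComponentLink("link-b", 10e9, 3e9)]
gate = RebalanceGate(imbalance_threshold=0.2, min_interval_s=60.0)
print(gate.should_rebalance(links, now_s=0.0))   # True: spread 0.4 > 0.2
gate.record_rebalance(now_s=0.0)
print(gate.should_rebalance(links, now_s=30.0))  # False: within hold-down
print(gate.should_rebalance(links, now_s=90.0))  # True
```

The hold-down timer is one simple way to honor the warning against frequent re-balancing; as Section 6 notes, the operator would still need to experiment with both values for their environment.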
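
Section 6.2 (Handling Route Changes, new in -05) requires that PBR entries installed for large flows be withdrawn when a route's next hop no longer points to the LAG or ECMP group, to avoid forwarding loops. A minimal Python sketch follows. It assumes a toy FIB that maps IPv4 prefixes to next-hop group names and PBR entries keyed by the flow's five-tuple; `PbrEntry`, `lookup_next_hop`, and `withdraw_stale_entries` are hypothetical names, and a real router would react to FIB change notifications rather than rescanning a list.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network
from typing import Dict, List, Optional, Tuple

# (src IP, dst IP, src port, dst port, IP protocol)
FiveTuple = Tuple[str, str, int, int, int]


@dataclass
class PbrEntry:
    """Hypothetical PBR entry pinning one large flow to a component link."""
    flow: FiveTuple
    group: str           # LAG/ECMP group the flow was re-balanced within
    component_link: str  # member link chosen by the re-balancing logic


def lookup_next_hop(fib: Dict[IPv4Network, str], dst: str) -> Optional[str]:
    """Longest-prefix match in a toy FIB that maps prefixes to group names."""
    addr = IPv4Address(dst)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]


def withdraw_stale_entries(
    pbr: List[PbrEntry], fib: Dict[IPv4Network, str]
) -> List[PbrEntry]:
    """Keep only entries whose destination still resolves to their group.

    Every other entry is withdrawn so the flow follows the new route
    instead of staying pinned to a link the route no longer uses.
    """
    return [e for e in pbr if lookup_next_hop(fib, e.flow[1]) == e.group]


fib = {IPv4Network("10.0.0.0/8"): "lag1", IPv4Network("10.1.0.0/16"): "ecmp2"}
pbr = [
    PbrEntry(("192.0.2.1", "10.1.2.3", 1234, 80, 6), "ecmp2", "link-3"),
    PbrEntry(("192.0.2.1", "10.2.0.5", 5555, 443, 6), "lag1", "link-1"),
]
fib[IPv4Network("10.1.0.0/16")] = "lag1"  # route change: /16 now via lag1
pbr = withdraw_stale_entries(pbr, fib)
print([e.component_link for e in pbr])  # ['link-1']
```

Only the entry whose destination now resolves to a different group is withdrawn; the entry still served by `lag1` keeps its pinned component link.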