draft-ietf-opsawg-large-flow-load-balancing-04.txt   draft-ietf-opsawg-large-flow-load-balancing-05.txt 
draft-ietf-opsawg-large-flow-load-balancing-04.txt
OPSAWG                                  R. Krishnan
Internet Draft                          S. Khanna
Intended status: Informational          Brocade Communications
Expires: January 9, 2014                L. Yong
July 9, 2013                            Huawei USA
                                        A. Ghanwani
                                        Dell
                                        Ning So
                                        Tata Communications
                                        B. Khasnabish
                                        ZTE Corporation

draft-ietf-opsawg-large-flow-load-balancing-05.txt
OPSAWG                                  R. Krishnan
Internet Draft                          Brocade Communications
Intended status: Informational          L. Yong
Expires: February 23, 2014              Huawei USA
August 23, 2013                         A. Ghanwani
                                        Dell
                                        Ning So
                                        Tata Communications
                                        S. Khanna
                                        Cisco Systems
                                        B. Khasnabish
                                        ZTE Corporation
Mechanisms for Optimal LAG/ECMP Component Link Utilization in
Networks
draft-ietf-opsawg-large-flow-load-balancing-04.txt draft-ietf-opsawg-large-flow-load-balancing-05.txt
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. This document may not be modified,
and derivative works of it may not be created, except to publish it
as an RFC and to translate it into languages other than English.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 42 skipping to change at page 1, line 43
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on January 9, 2014. This Internet-Draft will expire on February 23, 2014.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
skipping to change at page 2, line 27 skipping to change at page 2, line 27
Demands on networking infrastructure are growing exponentially; the
drivers are bandwidth-hungry rich-media applications, inter-data-center
communications, etc. In this context, it is important to
optimally use the bandwidth in wired networks that extensively use
LAG/ECMP techniques for bandwidth scaling. This draft explores some
of the mechanisms useful for achieving this.
Table of Contents
1. Introduction...................................................3
1.1. Acronyms..................................................3 1.1. Acronyms..................................................4
1.2. Terminology...............................................4
2. Flow Categorization............................................4
3. Hash-based Load Distribution in LAG/ECMP.......................5
4. Mechanisms for Optimal LAG/ECMP Component Link Utilization.....7
4.1. Differences in LAG vs ECMP................................8
4.2. Overview of the mechanism.................................9
4.3. Large Flow Recognition...................................10
4.3.1. Flow Identification.................................10
4.3.2. Criteria for Identifying a Large Flow...............10
4.3.3. Sampling Techniques.................................11
skipping to change at page 3, line 10 skipping to change at page 3, line 10
5. Information Model for Flow Re-balancing.......................15
5.1. Configuration Parameters for Flow Re-balancing...........16
5.2. System Configuration and Identification Parameters.......16
5.3. Information for Alternative Placement of Large Flows.....17
5.4. Information for Redistribution of Small Flows............17
5.5. Export of Flow Information...............................17
5.6. Monitoring information...................................18
5.6.1. Interface (link) utilization........................18
5.6.2. Other monitoring information........................18
6. Operational Considerations....................................18
6.1. Rebalancing Frequency....................................19
6.2. Handling Route Changes...................................19
7. IANA Considerations...........................................19
8. Security Considerations.......................................19
9. Acknowledgements..............................................20
10. References...................................................20
10.1. Normative References....................................20
10.2. Informative References..................................20
1. Introduction
Networks extensively use LAG/ECMP techniques for capacity scaling.
skipping to change at page 19, line 4 skipping to change at page 19, line 6
5.6.2. Other monitoring information
Additional monitoring information includes:
. Number of times rebalancing was done.
. Time since the last rebalancing event.
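The two monitoring items above could be maintained by the rebalancing logic roughly as follows. This is a minimal Python sketch, not part of the information model: the class and method names (RebalanceMonitor, record_rebalance, seconds_since_last_rebalance) are hypothetical, and the injectable clock exists only so the sketch can be tested deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RebalanceMonitor:
    """Hypothetical holder for the two monitoring items listed above."""

    # Monotonic clock by default; replaceable for testing.
    clock: Callable[[], float] = time.monotonic
    rebalance_count: int = 0
    last_rebalance_at: Optional[float] = None

    def record_rebalance(self) -> None:
        """Call once each time a rebalancing event completes."""
        self.rebalance_count += 1
        self.last_rebalance_at = self.clock()

    def seconds_since_last_rebalance(self) -> Optional[float]:
        """Elapsed time since the last event, or None if none has occurred."""
        if self.last_rebalance_at is None:
            return None
        return self.clock() - self.last_rebalance_at
```

Using a monotonic clock rather than wall-clock time keeps the "time since the last rebalancing event" value from jumping if the system time is adjusted.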
6. Operational Considerations
6.1. Rebalancing Frequency
Flows should be re-balanced only when the imbalance in the
utilization across component links exceeds a certain threshold.
Frequent re-balancing to achieve precise equitable utilization across
component links could be counter-productive, as it may result in
moving flows back and forth between the component links, impacting
packet ordering and system stability. This applies regardless of
whether large flows or small flows are re-distributed. It should be
noted that reordering is a concern for TCP flows with even a few
packets: if a segment is overtaken by three later segments, the
receiver sends three duplicate ACKs, which triggers a fast
retransmit at the sender [RFC5681].
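The duplicate-ACK effect can be illustrated with a simplified model of a cumulative-ACK receiver. This Python sketch assumes segments are numbered from 1 and that every arriving segment elicits exactly one ACK carrying the next expected segment number; it ignores SACK, window updates, and the additional conditions RFC 5681 places on counting an ACK as a duplicate. The function names are hypothetical.

```python
from typing import List

# Number of duplicate ACKs that triggers fast retransmit (RFC 5681).
DUPACK_THRESHOLD = 3


def cumulative_acks(arrival_order: List[int]) -> List[int]:
    """ACK value (next expected segment) sent after each arriving segment."""
    received = set()
    next_expected = 1
    acks = []
    for seg in arrival_order:
        received.add(seg)
        # Advance past every segment that is now contiguous.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def longest_dupack_run(acks: List[int]) -> int:
    """Length of the longest run of ACKs that repeat the previous ACK."""
    longest = run = 0
    for prev, cur in zip(acks, acks[1:]):
        run = run + 1 if cur == prev else 0
        longest = max(longest, run)
    return longest


def triggers_fast_retransmit(arrival_order: List[int]) -> bool:
    """True if reordering alone produces enough duplicate ACKs."""
    return longest_dupack_run(cumulative_acks(arrival_order)) >= DUPACK_THRESHOLD
```

With arrival order [1, 3, 4, 5, 2], segment 2 is overtaken by three later segments and the receiver emits ACKs [2, 2, 2, 2, 6]: three duplicates, which reaches the fast-retransmit threshold although nothing was lost. With [1, 3, 4, 2, 5], segment 2 is overtaken by only two segments and the two duplicates do not trigger a retransmission.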
The operator would have to experiment with various values of the
large flow recognition parameters (minimum bandwidth threshold,
observation interval) and the imbalance threshold across component
links to tune the solution for their environment.
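The three tunable parameters named above can be sketched as follows. This minimal Python sketch assumes that a flow is recognized as large when its average rate over one observation interval reaches the minimum bandwidth threshold, and it assumes an imbalance metric (highest minus lowest component-link utilization, expressed as fractions of capacity) chosen only for illustration. The function names are hypothetical.

```python
from typing import Dict, Hashable, List


def recognize_large_flows(
    bytes_per_flow: Dict[Hashable, int],
    observation_interval_s: float,
    min_bandwidth_bps: float,
) -> List[Hashable]:
    """Flows whose average rate over the interval reaches the threshold.

    Assumed criterion: bytes * 8 / interval >= minimum bandwidth.
    """
    return [
        flow
        for flow, nbytes in bytes_per_flow.items()
        if nbytes * 8 / observation_interval_s >= min_bandwidth_bps
    ]


def imbalance_exceeds(utilizations: List[float], imbalance_threshold: float) -> bool:
    """True if busiest minus least-busy link utilization exceeds the threshold.

    Assumed metric; utilizations are fractions of link capacity (0.0-1.0).
    """
    return max(utilizations) - min(utilizations) > imbalance_threshold
```

For example, a flow that sent 125,000,000 bytes during a 1-second observation interval averages 1 Gb/s and is recognized as large against a 100 Mb/s threshold; the same byte count over a 20-second interval averages 50 Mb/s and is not.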
6.2. Handling Route Changes
Large flow rebalancing must be aware of any changes to the FIB. In
cases where the next-hop of a route no longer points to the LAG or
to the ECMP group, any PBR entries added as described in Sections
4.4.1 and 4.4.2 must be withdrawn in order to avoid the creation of
forwarding loops.
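The withdrawal rule above can be sketched as a handler that runs on every FIB update. In this Python sketch the PbrEntry record, the dictionary model of the FIB (prefix mapped to a next-hop group name), and the function name are all hypothetical; a real router would apply the same check against its own route and policy tables.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PbrEntry:
    """Hypothetical record of a PBR entry that pins a large flow."""
    flow: Tuple          # e.g. the flow's 5-tuple
    prefix: str          # FIB prefix the flow's destination matched
    group: str           # LAG or ECMP group the entry was installed for
    member_link: str     # component link the flow is pinned to


def on_fib_update(fib: Dict[str, str], pbr_entries: List[PbrEntry]) -> List[PbrEntry]:
    """Withdraw PBR entries whose route no longer resolves to their group.

    `fib` maps prefix -> next-hop group name (hypothetical model). A route
    that was removed entirely also causes withdrawal. `pbr_entries` is
    updated in place and the withdrawn entries are returned.
    """
    kept: List[PbrEntry] = []
    withdrawn: List[PbrEntry] = []
    for entry in pbr_entries:
        if fib.get(entry.prefix) == entry.group:
            kept.append(entry)
        else:
            withdrawn.append(entry)
    pbr_entries[:] = kept
    return withdrawn
```

Checking every entry against the current FIB on each update is the simplest correct approach; an implementation with many entries might instead index them by prefix so that only entries for the changed routes are examined.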
7. IANA Considerations
This memo includes no request to IANA.
8. Security Considerations
This document does not directly impact the security of the Internet
infrastructure or its applications. In fact, it could help if there
is a DoS attack pattern that causes a hash imbalance resulting in
heavy overloading of large flows to certain LAG/ECMP component
skipping to change at page 21, line 33 skipping to change at page 21, line 38
the ACM SIGCOMM, August 2011.
[NDTM] Estan, C. and G. Varghese, "New directions in traffic
measurement and accounting," Proceedings of ACM SIGCOMM, August 2002.
[bin-pack] Coffman, Jr., E., M. Garey, and D. Johnson, "Approximation
Algorithms for Bin-Packing -- An Updated Survey," in Algorithm Design
for Computer System Design, ed. by Ausiello, Lucertini, and Serafini,
Springer-Verlag, 1984.
Appendix A. Internet Traffic Analysis and Load Balancing Simulation
Internet traffic [CAIDA] has been analyzed to obtain flow statistics
such as the number of packets in a flow and the flow duration. The
five-tuple in the packet header (source and destination IP addresses,
source and destination TCP/UDP ports, and IP protocol) is used for
flow identification. The analysis indicates that fewer than ~2% of
the flows carry ~30% of the total traffic volume, while the rest of
the flows (more than ~98%) contribute ~70% [YONG].
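The kind of statistic quoted above (the share of total bytes carried by the largest few percent of flows) can be computed from per-flow byte counts as in this Python sketch. The function name is hypothetical, and the sketch does not reproduce the CAIDA analysis; it only shows the arithmetic.

```python
import math
from typing import Iterable


def top_flow_share(flow_bytes: Iterable[int], top_fraction: float) -> float:
    """Fraction of all bytes carried by the largest `top_fraction` of flows.

    At least one flow is always counted; an empty input yields 0.0.
    """
    sizes = sorted(flow_bytes, reverse=True)
    total = sum(sizes)
    if total == 0:
        return 0.0
    # Number of flows in the "top" group, rounded up.
    k = max(1, math.ceil(top_fraction * len(sizes)))
    return sum(sizes[:k]) / total
```

For example, with per-flow byte counts [10, 50, 10, 30], the largest 25% of flows (one flow) carries 0.5 of the bytes, and the largest 50% carries 0.8.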
The simulation has shown that, given the Internet traffic pattern, the
hash-based technique does not evenly distribute the flows over ECMP
skipping to change at page 22, line 23 skipping to change at page 22, line 30
traffic characteristics [YONG].
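The following Python sketch shows why static hash-based placement can be uneven: the component link is chosen from the 5-tuple alone, so a flow's size plays no part in its placement. CRC32 over a string encoding of the 5-tuple is an arbitrary illustrative hash, not the function used by any particular LAG/ECMP implementation, and the names are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple

# (source IP, destination IP, source port, destination port, IP protocol)
FiveTuple = Tuple[str, str, int, int, int]


def member_for(flow: FiveTuple, n_members: int) -> int:
    """Pick a component link by hashing the 5-tuple (illustrative CRC32)."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % n_members


def link_loads(flows: Dict[FiveTuple, int], n_members: int) -> List[int]:
    """Total bytes per component link under static hash-based placement."""
    loads = [0] * n_members
    for flow, nbytes in flows.items():
        loads[member_for(flow, n_members)] += nbytes
    return loads
```

Because a large flow adds all of its bytes to whichever link its hash selects, a handful of large flows can leave some component links far busier than others even when each link carries a similar number of flows.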
Authors' Addresses
Ram Krishnan
Brocade Communications
San Jose, 95134, USA
Phone: +1-408-406-7890
Email: ramk@brocade.com
Sanjay Khanna
Brocade Communications
San Jose, 95134, USA
Phone: +1-408-333-4850
Email: skhanna@brocade.com
Lucy Yong
Huawei USA
5340 Legacy Drive
Plano, TX 75025, USA
Phone: +1-469-277-5837
Email: lucy.yong@huawei.com
Anoop Ghanwani
Dell
San Jose, CA 95134
Phone: +1-408-571-3228
Email: anoop@alumni.duke.edu
Ning So
Tata Communications
Plano, TX 75082, USA
Phone: +1-972-955-0914
Email: ning.so@tatacommunications.com
Sanjay Khanna
Cisco Systems
Email: sanjakha@gmail.com
Bhumip Khasnabish
ZTE Corporation
New Jersey, 07960, USA
Phone: +1-781-752-8003
Email: bhumip.khasnabish@zteusa.com
End of changes. 11 change blocks.
15 lines changed or deleted, 27 lines changed or added.
