draft-ietf-opsawg-large-flow-load-balancing-11.txt   draft-ietf-opsawg-large-flow-load-balancing-12.txt 
OPSAWG R. Krishnan OPSAWG R. Krishnan
Internet Draft Brocade Communications Internet Draft Brocade Communications
Intended status: Informational L. Yong Intended status: Informational L. Yong
Expires: October 22, 2014 Huawei USA Expires: December 13, 2014 Huawei USA
A. Ghanwani A. Ghanwani
Dell Dell
Ning So Ning So
Tata Communications Tata Communications
B. Khasnabish B. Khasnabish
ZTE Corporation ZTE Corporation
April 22, 2014 June 13, 2014
Mechanisms for Optimizing LAG/ECMP Component Link Utilization in Mechanisms for Optimizing LAG/ECMP Component Link Utilization in
Networks Networks
draft-ietf-opsawg-large-flow-load-balancing-11.txt draft-ietf-opsawg-large-flow-load-balancing-12.txt
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. This document may not be modified, provisions of BCP 78 and BCP 79. This document may not be modified,
and derivative works of it may not be created, except to publish it and derivative works of it may not be created, except to publish it
as an RFC and to translate it into languages other than English. as an RFC and to translate it into languages other than English.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 42 skipping to change at page 1, line 42
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
This Internet-Draft will expire on October 22, 2014. This Internet-Draft will expire on December 13, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 31 skipping to change at page 2, line 31
link aggregation groups and equal cost multi-paths as techniques for link aggregation groups and equal cost multi-paths as techniques for
bandwidth scaling. This draft explores some of the mechanisms useful bandwidth scaling. This draft explores some of the mechanisms useful
for achieving this. for achieving this.
Table of Contents Table of Contents
1. Introduction...................................................3 1. Introduction...................................................3
1.1. Acronyms..................................................4 1.1. Acronyms..................................................4
1.2. Terminology...............................................4 1.2. Terminology...............................................4
2. Flow Categorization............................................5 2. Flow Categorization............................................5
3. Hash-based Load Distribution in LAG/ECMP.......................5 3. Hash-based Load Distribution in LAG/ECMP.......................6
4. Mechanisms for Optimizing LAG/ECMP Component Link Utilization..7 4. Mechanisms for Optimizing LAG/ECMP Component Link Utilization..7
4.1. Differences in LAG vs ECMP................................8 4.1. Differences in LAG vs ECMP................................8
4.2. Operational Overview......................................9 4.2. Operational Overview......................................9
4.3. Large Flow Recognition...................................10 4.3. Large Flow Recognition...................................10
4.3.1. Flow Identification.................................10 4.3.1. Flow Identification.................................10
4.3.2. Criteria and Techniques for Large Flow Recognition..11 4.3.2. Criteria and Techniques for Large Flow Recognition..11
4.3.3. Sampling Techniques.................................11 4.3.3. Sampling Techniques.................................11
4.3.4. Inline Data Path Measurement........................13 4.3.4. Inline Data Path Measurement........................13
4.3.5. Use of More Than One Method for Large Flow Recognition13 4.3.5. Use of More Than One Method for Large Flow
Recognition.........................................13
4.4. Load Rebalancing Options.................................14 4.4. Load Rebalancing Options.................................14
4.4.1. Alternative Placement of Large Flows................14 4.4.1. Alternative Placement of Large Flows................14
4.4.2. Redistributing Small Flows..........................15 4.4.2. Redistributing Small Flows..........................15
4.4.3. Component Link Protection Considerations............15 4.4.3. Component Link Protection Considerations............15
4.4.4. Load Rebalancing Algorithms.........................15 4.4.4. Load Rebalancing Algorithms.........................15
4.4.5. Load Rebalancing Example............................16 4.4.5. Load Rebalancing Example............................16
5. Information Model for Flow Rebalancing........................17 5. Information Model for Flow Rebalancing........................17
5.1. Configuration Parameters for Flow Rebalancing............17 5.1. Configuration Parameters for Flow Rebalancing............17
5.2. System Configuration and Identification Parameters.......18 5.2. System Configuration and Identification Parameters.......18
5.3. Information for Alternative Placement of Large Flows.....19 5.3. Information for Alternative Placement of Large Flows.....19
5.4. Information for Redistribution of Small Flows............19 5.4. Information for Redistribution of Small Flows............19
5.5. Export of Flow Information...............................20 5.5. Export of Flow Information...............................20
5.6. Monitoring information...................................20 5.6. Monitoring information...................................20
5.6.1. Interface (link) utilization........................20 5.6.1. Interface (link) utilization........................20
5.6.2. Other monitoring information........................21 5.6.2. Other monitoring information........................20
6. Operational Considerations....................................21 6. Operational Considerations....................................21
6.1. Rebalancing Frequency....................................21 6.1. Rebalancing Frequency....................................21
6.2. Handling Route Changes...................................22 6.2. Handling Route Changes...................................21
6.3. Forwarding Resources.....................................21
7. IANA Considerations...........................................22 7. IANA Considerations...........................................22
8. Security Considerations.......................................22 8. Security Considerations.......................................22
9. Contributing Authors..........................................22 9. Contributing Authors..........................................22
10. Acknowledgements.............................................22 10. Acknowledgements.............................................22
11. References...................................................22 11. References...................................................23
11.1. Normative References....................................22 11.1. Normative References....................................23
11.2. Informative References..................................22 11.2. Informative References..................................23
1. Introduction 1. Introduction
Networks extensively use link aggregation groups (LAG) [802.1AX] and Networks extensively use link aggregation groups (LAG) [802.1AX] and
equal cost multi-paths (ECMP) [RFC 2991] as techniques for capacity equal cost multi-paths (ECMP) [RFC 2991] as techniques for capacity
scaling. For the problems addressed by this document, network traffic scaling. For the problems addressed by this document, network traffic
can be predominantly categorized into two traffic types: long-lived can be predominantly categorized into two traffic types: long-lived
large flows and other flows. These other flows, which include long- large flows and other flows. These other flows, which include long-
lived small flows, short-lived small flows, and short-lived large lived small flows, short-lived small flows, and short-lived large
flows, are referred to as "small flows" in this document. Long-lived flows, are referred to as "small flows" in this document. Long-lived
skipping to change at page 4, line 11 skipping to change at page 4, line 12
of bandwidth on a link, e.g. greater than 5% of link bandwidth. The of bandwidth on a link, e.g. greater than 5% of link bandwidth. The
number of such flows would necessarily be fairly small, e.g. on the number of such flows would necessarily be fairly small, e.g. on the
order of 10's or 100's per LAG/ECMP. In other words, the number of order of 10's or 100's per LAG/ECMP. In other words, the number of
large flows is NOT expected to be on the order of millions of flows. large flows is NOT expected to be on the order of millions of flows.
Examples of such large flows would be IPsec tunnels in service Examples of such large flows would be IPsec tunnels in service
provider backbone networks or storage backup traffic in data center provider backbone networks or storage backup traffic in data center
networks. networks.
1.1. Acronyms 1.1. Acronyms
COTS: Commercial Off-the-shelf
DOS: Denial of Service DOS: Denial of Service
ECMP: Equal Cost Multi-path ECMP: Equal Cost Multi-path
GRE: Generic Routing Encapsulation GRE: Generic Routing Encapsulation
LAG: Link Aggregation Group LAG: Link Aggregation Group
MPLS: Multiprotocol Label Switching MPLS: Multiprotocol Label Switching
skipping to change at page 4, line 37 skipping to change at page 4, line 36
QoS: Quality of Service QoS: Quality of Service
STT: Stateless Transport Tunneling STT: Stateless Transport Tunneling
TCAM: Ternary Content Addressable Memory TCAM: Ternary Content Addressable Memory
VXLAN: Virtual Extensible LAN VXLAN: Virtual Extensible LAN
1.2. Terminology 1.2. Terminology
Central management entity: Refers to an entity that is capable of
monitoring information about link utilization and flows in routers
across the network and may be capable of making traffic engineering
decisions for placement of large flows. It may include the functions
of a collector if the routers employ a sampling technique [RFC 7011].
ECMP component link: An individual nexthop within an ECMP group. An ECMP component link: An individual nexthop within an ECMP group. An
ECMP component link may itself comprise a LAG. ECMP component link may itself comprise a LAG.
ECMP table: A table that is used as the nexthop of an ECMP route that ECMP table: A table that is used as the nexthop of an ECMP route that
comprises the set of component links and the weights associated with comprises the set of component links and the weights associated with
each of those component links. The weights are used to determine each of those component links. The weights are used to determine
which values of the hash function map to a given component link. which values of the hash function map to a given component link.
LAG component link: An individual link within a LAG. A LAG component LAG component link: An individual link within a LAG. A LAG component
link is typically a physical link. link is typically a physical link.
skipping to change at page 7, line 11 skipping to change at page 7, line 11
o The presence of 2 large flows causes congestion on this o The presence of 2 large flows causes congestion on this
component link. component link.
+-----------+ -> +-----------+ +-----------+ -> +-----------+
| | -> | | | | -> | |
| | ===> | | | | ===> | |
| (1)|--------|(1) | | (1)|--------|(1) |
| | -> | | | | -> | |
| | -> | | | | -> | |
| (R1) | -> | (R2) | | (R1) | -> | (R2) |
| (2)|--------|(2) | | (2)|--------|(2) |
| | -> | | | | -> | |
| | -> | | | | -> | |
| | ===> | | | | ===> | |
| | ===> | | | | ===> | |
| (3)|--------|(3) | | (3)|--------|(3) |
| | | | | | | |
+-----------+ +-----------+ +-----------+ +-----------+
Where: -> small flow Where: -> small flow
skipping to change at page 8, line 47 skipping to change at page 8, line 47
+-----+ +-----+ +-----+ +-----+
/ \ \ / /\ / \ \ / /\
/ +---------+ / \ / +---------+ / \
/ / \ \ / \ / / \ \ / \
/ / \ +------+ \ / / \ +------+ \
/ / \ / \ \ / / \ / \ \
+-----+ +-----+ +-----+ +-----+ +-----+ +-----+
| L1 | | L2 | | L3 | | L1 | | L2 | | L3 |
+-----+ +-----+ +-----+ +-----+ +-----+ +-----+
Figure 3: Two-level Fat Tree Figure 3: Two-level Clos Network
To demonstrate the limitations of local optimization, consider a two- To demonstrate the limitations of local optimization, consider a two-
level fat-tree topology with three leaf nodes (L1, L2, L3) and two level Clos network topology as shown in Figure 3 with three leaf
spine nodes (S1, S2) and assume all of the links are 10 Gbps. nodes (L1, L2, L3) and two spine nodes (S1, S2). Assume all of the
links are 10 Gbps.
Let L1 have two flows of 4 Gbps each towards L3, and let L2 have one Let L1 have two flows of 4 Gbps each towards L3, and let L2 have one
flow of 7 Gbps also towards L3. If L1 balances the load optimally flow of 7 Gbps also towards L3. If L1 balances the load optimally
between S1 and S2, and L2 sends the flow via S1, then the downlink between S1 and S2, and L2 sends the flow via S1, then the downlink
from S1 to L3 would get congested resulting in packet discards. On from S1 to L3 would get congested resulting in packet discards. On
the other hand, if L1 had sent both its flows towards S1 and L2 had the other hand, if L1 had sent both its flows towards S1 and L2 had
sent its flow towards S2, there would have been no congestion at sent its flow towards S2, there would have been no congestion at
either S1 or S2. either S1 or S2.
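
The arithmetic behind this example can be checked with a minimal sketch. It assumes the only links of interest are the two spine-to-L3 downlinks; the function and variable names are hypothetical and are not part of the draft.

```python
LINK_CAPACITY_GBPS = 10  # all links in the example are 10 Gbps


def downlink_load(placements):
    """Sum the load toward L3 on each spine from (flow_gbps, spine) pairs."""
    load = {"S1": 0, "S2": 0}
    for gbps, spine in placements:
        load[spine] += gbps
    return load


def congested(load):
    """Return the spines whose downlink to L3 exceeds the link capacity."""
    return sorted(s for s, gbps in load.items() if gbps > LINK_CAPACITY_GBPS)


# Local optimization: L1 splits its two 4 Gbps flows across S1 and S2,
# and L2 independently sends its 7 Gbps flow via S1.
local_choice = downlink_load([(4, "S1"), (4, "S2"), (7, "S1")])

# Coordinated placement: L1 sends both flows via S1, L2 sends its flow via S2.
coordinated_choice = downlink_load([(4, "S1"), (4, "S1"), (7, "S2")])

print(local_choice, congested(local_choice))
print(coordinated_choice, congested(coordinated_choice))
```

In the first placement the S1 downlink carries 11 Gbps against a 10 Gbps capacity; in the second the downlinks carry 8 Gbps and 7 Gbps respectively.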
The other issue with applying this scheme to ECMP groups is that it The other issue with applying this scheme to ECMP groups is that it
skipping to change at page 10, line 40 skipping to change at page 10, line 40
. IP header: IP Protocol, IP source address, IP destination . IP header: IP Protocol, IP source address, IP destination
address, flow label (IPv6 only), TCP/UDP source port, TCP/UDP address, flow label (IPv6 only), TCP/UDP source port, TCP/UDP
destination port. destination port.
. MPLS Labels. . MPLS Labels.
For tunneling protocols like Generic Routing Encapsulation (GRE) For tunneling protocols like Generic Routing Encapsulation (GRE)
[RFC 2784], Virtual eXtensible Local Area Network (VXLAN) [VXLAN], [RFC 2784], Virtual eXtensible Local Area Network (VXLAN) [VXLAN],
Network Virtualization using Generic Routing Encapsulation (NVGRE) Network Virtualization using Generic Routing Encapsulation (NVGRE)
[NVGRE], Stateless Transport Tunneling (STT) [STT], etc., flow [NVGRE], Stateless Transport Tunneling (STT) [STT], Layer 2 Tunneling
identification is possible based on inner and/or outer headers. The Protocol (L2TP) [RFC 3931], etc., flow identification is possible
above list is not exhaustive. The mechanisms described in this based on inner and/or outer headers as well as fields introduced by
document are agnostic to the fields that are used for flow the tunnel header, as any or all such fields may be used for load
identification. balancing decisions [RFC 5640]. The above list is not exhaustive.
The mechanisms described in this document are agnostic to the fields
that are used for flow identification.
This method of flow identification is consistent with that of IPFIX This method of flow identification is consistent with that of IPFIX
[RFC 7011]. [RFC 7011].
4.3.2. Criteria and Techniques for Large Flow Recognition 4.3.2. Criteria and Techniques for Large Flow Recognition
From a bandwidth and time duration perspective, in order to recognize From a bandwidth and time duration perspective, in order to recognize
large flows we define an observation interval and observe the large flows we define an observation interval and observe the
bandwidth of the flow over that interval. A flow that exceeds a bandwidth of the flow over that interval. A flow that exceeds a
certain minimum bandwidth threshold over that observation interval certain minimum bandwidth threshold over that observation interval
skipping to change at page 14, line 13 skipping to change at page 14, line 13
to reliably determine the mapping of large flows to component links to reliably determine the mapping of large flows to component links
of a LAG/ECMP group, it is acceptable for the router to use more than of a LAG/ECMP group, it is acceptable for the router to use more than
one method for large flow recognition. one method for large flow recognition.
If both methods are supported, inline data path measurement may be If both methods are supported, inline data path measurement may be
preferable because of its speed of detection [FLOW-ACC]. preferable because of its speed of detection [FLOW-ACC].
4.4. Load Rebalancing Options 4.4. Load Rebalancing Options
Below are suggested techniques for load rebalancing. Equipment Below are suggested techniques for load rebalancing. Equipment
vendors should implement all of these techniques and allow the vendors may implement more than one technique, including those not
operator to choose one or more techniques based on their described in this document, allowing the operator to choose between
applications. them.
Note that regardless of the method used, perfect rebalancing of large Note that regardless of the method used, perfect rebalancing of large
flows may not be possible since flows arrive and depart at different flows may not be possible since flows arrive and depart at different
times. Also, any flows that are moved from one component link to times. Also, any flows that are moved from one component link to
another may experience momentary packet reordering. another may experience momentary packet reordering.
4.4.1. Alternative Placement of Large Flows 4.4.1. Alternative Placement of Large Flows
Within a LAG/ECMP group, the member component links with least Within a LAG/ECMP group, the member component links with least
average port utilization are identified. Some large flow(s) from the average port utilization are identified. Some large flow(s) from the
skipping to change at page 16, line 28 skipping to change at page 16, line 28
flow -- and the link utilization is normal now. flow -- and the link utilization is normal now.
+-----------+ -> +-----------+ +-----------+ -> +-----------+
| | -> | | | | -> | |
| | ===> | | | | ===> | |
| (1)|--------|(1) | | (1)|--------|(1) |
| | | | | | | |
| | ===> | | | | ===> | |
| | -> | | | | -> | |
| | -> | | | | -> | |
| (R1) | -> | (R2) | | (R1) | -> | (R2) |
| (2)|--------|(2) | | (2)|--------|(2) |
| | | | | | | |
| | -> | | | | -> | |
| | -> | | | | -> | |
| | ===> | | | | ===> | |
| (3)|--------|(3) | | (3)|--------|(3) |
| | | | | | | |
+-----------+ +-----------+ +-----------+ +-----------+
Where: -> small flow Where: -> small flow
===> large flow ===> large flow
Figure 4: Evenly Utilized Composite Links Figure 4: Evenly Utilized Composite Links
Basically, the use of the mechanisms described in Section 4.4.1 Basically, the use of the mechanisms described in Section 4.4.1
resulted in a rebalancing of flows where one of the large flows on resulted in a rebalancing of flows where one of the large flows on
component link (3) which was previously congested was moved to component link (3) which was previously congested was moved to
component link (2) which was previously under-utilized. component link (2) which was previously under-utilized.
5. Information Model for Flow Rebalancing 5. Information Model for Flow Rebalancing
skipping to change at page 17, line 46 skipping to change at page 17, line 46
be recognized as a large flow until it falls below this be recognized as a large flow until it falls below this
threshold. This is also configured as a percentage of link threshold. This is also configured as a percentage of link
speed and is typically lower than the minimum bandwidth speed and is typically lower than the minimum bandwidth
threshold defined above. threshold defined above.
. Imbalance threshold: A measure of the deviation of the . Imbalance threshold: A measure of the deviation of the
component link utilizations from the utilization of the overall component link utilizations from the utilization of the overall
LAG/ECMP group. Since component links can be of a different LAG/ECMP group. Since component links can be of a different
speed, the imbalance can be computed as follows. Let the speed, the imbalance can be computed as follows. Let the
utilization of each component link in a LAG/ECMP group with n utilization of each component link in a LAG/ECMP group with n
links of speed b_1, b_2, .., b_n, be u_1, u_2, .., u_n. The mean links of speed b_1, b_2 ... b_n, be u_1, u_2 ... u_n. The mean
utilization is computed as u_ave = [ (u_1 x b_1) + (u_2 x b_2) + utilization is computed as u_ave = [ (u_1 x b_1) + (u_2 x b_2) +
.. + (u_n x b_n) ] / [b_1 + b_2 + b_n]. The imbalance is then ... + (u_n x b_n) ] / [b_1 + b_2 + ... + b_n]. The imbalance is
computed as max_{i=1..n} | u_i - u_ave | / u_ave. then computed as max_{i=1 ... n} | u_i - u_ave |.
. Rebalancing interval: The minimum amount of time between . Rebalancing interval: The minimum amount of time between
rebalancing events. This parameter ensures that rebalancing is rebalancing events. This parameter ensures that rebalancing is
not invoked too frequently as it impacts packet ordering. not invoked too frequently as it impacts packet ordering.
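
The imbalance computation described above can be sketched as follows. This is an illustration only: the function names and the "relative" flag are hypothetical. The flag selects between the two forms that appear in this comparison, the maximum deviation divided by u_ave (left column) and the plain maximum deviation (right column).

```python
def mean_utilization(utils, speeds):
    """Speed-weighted mean utilization u_ave over all component links."""
    return sum(u * b for u, b in zip(utils, speeds)) / sum(speeds)


def imbalance(utils, speeds, relative=False):
    """Maximum deviation of any component link utilization from u_ave.

    With relative=True the deviation is divided by u_ave.
    """
    u_ave = mean_utilization(utils, speeds)
    deviation = max(abs(u - u_ave) for u in utils)
    if relative:
        # An idle group has no meaningful relative imbalance.
        return deviation / u_ave if u_ave > 0 else 0.0
    return deviation


# Two 10 Gbps component links at 80% and 40% utilization.
utils = [0.8, 0.4]
speeds = [10, 10]
print(mean_utilization(utils, speeds))
print(imbalance(utils, speeds))
print(imbalance(utils, speeds, relative=True))
```

A rebalancing event would be considered when the computed value exceeds the configured imbalance threshold and the rebalancing interval has elapsed.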
These parameters may be configured on a system-wide basis or they may These parameters may be configured on a system-wide basis or they may
apply to an individual LAG. They may be applied to an ECMP group apply to an individual LAG. They may be applied to an ECMP group
provided the component links are not shared with any other ECMP provided the component links are not shared with any other ECMP
group. group.
skipping to change at page 19, line 12 skipping to change at page 19, line 12
ECMP groups, or it may be configured specifically for a given LAG or ECMP groups, or it may be configured specifically for a given LAG or
ECMP group. ECMP group.
5.3. Information for Alternative Placement of Large Flows 5.3. Information for Alternative Placement of Large Flows
In cases where large flow recognition is handled by an external In cases where large flow recognition is handled by an external
management station (see Section 4.3.3), an information model for management station (see Section 4.3.3), an information model for
flows is required to allow the import of large flow information to flows is required to allow the import of large flow information to
the router. the router.
The following are some of the elements of information model for Typical fields used for identifying large flows were discussed in
importing of flows: Section 4.3.1. The IPFIX information model [RFC 7012] can be
leveraged for large flow identification.
. Layer 2: source MAC address, destination MAC address, VLAN ID.
. Layer 3 IP: IP Protocol, IP source address, IP destination
address, flow label (IPv6 only), TCP/UDP source port, TCP/UDP
destination port.
. MPLS Labels.
This list is not exhaustive. For example, with overlay protocols
such as VXLAN and NVGRE, fields from the outer and/or inner headers
may be specified. In general, all fields in the packet that can be
used by forwarding decisions should be available for use when
importing flow information from an external management station.
The IPFIX information model [RFC 7012] can be leveraged for large
flow identification.
Large Flow placement is achieved by specifying the relevant flow Large Flow placement is achieved by specifying the relevant flow
information along with the following: information along with the following:
. For LAG: Router's IP address, LAG ID, LAG component link ID. . For LAG: Router's IP address, LAG ID, LAG component link ID.
. For ECMP: Router's IP address, ECMP group, ECMP component link . For ECMP: Router's IP address, ECMP group, ECMP component link
ID. ID.
In the case where the ECMP component link itself comprises a LAG, we In the case where the ECMP component link itself comprises a LAG, we
skipping to change at page 20, line 40 skipping to change at page 20, line 27
5.6. Monitoring information 5.6. Monitoring information
5.6.1. Interface (link) utilization 5.6.1. Interface (link) utilization
The incoming bytes (ifInOctets), outgoing bytes (ifOutOctets) and The incoming bytes (ifInOctets), outgoing bytes (ifOutOctets) and
interface speed (ifSpeed) can be measured from the Interface table interface speed (ifSpeed) can be measured from the Interface table
(ifTable) MIB [RFC 1213]. (ifTable) MIB [RFC 1213].
The link utilization can then be computed as follows: The link utilization can then be computed as follows:
Incoming link utilization = (ifInOctets 8 / ifSpeed) Incoming link utilization = (ifInOctets/8) / ifSpeed
Outgoing link utilization = (ifOutOctets 8 / ifSpeed) Outgoing link utilization = (ifOutOctets/8) / ifSpeed
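
As a hedged illustration of how such a utilization figure might be derived in practice, the sketch below assumes the octet counter is sampled twice, some number of seconds apart, and that the counter is a 32-bit value that may wrap between samples. Because ifInOctets and ifOutOctets count octets while ifSpeed is expressed in bits per second, the sketch converts octets to bits by multiplying by 8. The function name, parameter names, and the sampling interval are assumptions and do not appear in either version of the draft.

```python
def link_utilization(octets_start, octets_end, if_speed_bps, interval_s,
                     counter_bits=32):
    """Fraction of link capacity used between two octet counter samples.

    octets_start, octets_end: ifInOctets or ifOutOctets at the two sample times
    if_speed_bps:             ifSpeed, in bits per second
    interval_s:               seconds between the two samples (assumed known)
    """
    # Modular subtraction tolerates a single counter wrap between samples.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return (delta_octets * 8) / (if_speed_bps * interval_s)


# 125,000,000 octets observed in 10 seconds on a 1 Gbps link.
print(link_utilization(0, 125_000_000, 1_000_000_000, 10))
```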
For high speed Ethernet links, the etherStatsHighCapacityTable MIB For high speed Ethernet links, the etherStatsHighCapacityTable MIB
[RFC 3273] can be used. [RFC 3273] can be used.
For scalability, it is recommended to use the counter push mechanism For scalability, it is recommended to use the counter push mechanism
in [sFlow-v5] for the interface counters. Doing so would help avoid in [sFlow-v5] for the interface counters. Doing so would help avoid
counter polling through the MIB interface. counter polling through the MIB interface.
The outgoing link utilization of the component links within a The outgoing link utilization of the component links within a
LAG/ECMP group can be used to compute the imbalance (See Section 5.1) LAG/ECMP group can be used to compute the imbalance (See Section 5.1)
skipping to change at page 22, line 13 skipping to change at page 21, line 48
links to tune the solution for their environment. links to tune the solution for their environment.
6.2. Handling Route Changes 6.2. Handling Route Changes
Large flow rebalancing must be aware of any changes to the FIB. In Large flow rebalancing must be aware of any changes to the FIB. In
cases where the nexthop of a route no longer points to the LAG, or cases where the nexthop of a route no longer points to the LAG, or
to an ECMP group, any PBR entries added as described in Section 4.4.1 to an ECMP group, any PBR entries added as described in Section 4.4.1
and 4.4.2 must be withdrawn in order to avoid the creation of and 4.4.2 must be withdrawn in order to avoid the creation of
forwarding loops. forwarding loops.
6.3. Forwarding Resources
Hash-based techniques used for load balancing with LAG/ECMP are
usually stateless. The mechanisms described in this document require
additional resources in the forwarding plane of routers for creating
PBR rules that are capable of overriding the forwarding decision from
the hash-based approach. These resources may limit the number of
flows that can be rebalanced and may also impact the latency
experienced by packets due to the additional lookups that are
required.
7. IANA Considerations 7. IANA Considerations
This memo includes no request to IANA. This memo includes no request to IANA.
8. Security Considerations 8. Security Considerations
This document does not directly impact the security of the Internet This document does not directly impact the security of the Internet
infrastructure or its applications. In fact, it could help if there infrastructure or its applications. In fact, it could help if there
is a DOS attack pattern which causes a hash imbalance resulting in is a DOS attack pattern which causes a hash imbalance resulting in
heavy overloading of large flows to certain LAG/ECMP component heavy overloading of large flows to certain LAG/ECMP component
links. links.
An attacker with knowledge of the large flow recognition algorithm
and any stateless distribution method can generate flows that are
distributed in a way that overloads a specific path. This could be
used to cause the creation of PBR rules that exhaust the available
rule capacity on nodes. If PBR rules are consequently discarded,
this could result in congestion on the attacker-selected path.
Alternatively, tracking large numbers of PBR rules could result in
performance degradation.
9. Contributing Authors 9. Contributing Authors
Sanjay Khanna Sanjay Khanna
Cisco Systems Cisco Systems
Email: sanjakha@gmail.com Email: sanjakha@gmail.com
10. Acknowledgements 10. Acknowledgements
The authors would like to thank the following individuals for their The authors would like to thank the following individuals for their
review and valuable feedback on earlier versions of this document: review and valuable feedback on earlier versions of this document:
Shane Amante, Fred Baker, Michael Bugenhagen, Zhen Cao, Brian Shane Amante, Fred Baker, Michael Bugenhagen, Zhen Cao, Brian
Carpenter, Benoit Claise, Michael Fargano, Wes George, Sriganesh Carpenter, Benoit Claise, Michael Fargano, Wes George, Sriganesh
Kini, Roman Krzanowski, Andrew Malis, Dave McDysan, Pete Moyer, Kini, Roman Krzanowski, Andrew Malis, Dave McDysan, Pete Moyer,
Peter Phaal, Dan Romascanu, Curtis Villamizar, Jianrong Wong, George Peter Phaal, Dan Romascanu, Curtis Villamizar, Jianrong Wong, George
Yum, and Weifeng Zhang. Yum, and Weifeng Zhang. As a part of the IETF Last Call process,
valuable comments were received from Martin Thomson,
11. References 11. References
11.1. Normative References 11.1. Normative References
11.2. Informative References
[802.1AX] IEEE Standards Association, "IEEE Std 802.1AX-2008 IEEE [802.1AX] IEEE Standards Association, "IEEE Std 802.1AX-2008 IEEE
Standard for Local and Metropolitan Area Networks - Link Standard for Local and Metropolitan Area Networks - Link
Aggregation", 2008. Aggregation", 2008.
[RFC 2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
Multicast," November 2000.
[RFC 7011] Claise, B. et al., "Specification of the IP Flow
Information Export (IPFIX) Protocol for the Exchange of IP Traffic
Flow Information," September 2013.
[RFC 7012] Claise, B. and B. Trammell, "Information Model for IP Flow
Information Export (IPFIX)," September 2013.
[sFlow-v5] Phaal, P. and M. Lavine, "sFlow version 5,"
http://www.sflow.org/sflow_version_5.txt, July 2004.
11.2. Informative References
[bin-pack] Coffman, Jr., E., M. Garey, and D. Johnson. Approximation [bin-pack] Coffman, Jr., E., M. Garey, and D. Johnson. Approximation
Algorithms for Bin-Packing -- An Updated Survey. In Algorithm Design Algorithms for Bin-Packing -- An Updated Survey. In Algorithm Design
for Computer System Design, ed. by Ausiello, Lucertini, and Serafini. for Computer System Design, ed. by Ausiello, Lucertini, and Serafini.
Springer-Verlag, 1984. Springer-Verlag, 1984.
[CAIDA] Caida Internet Traffic Analysis, http://www.caida.org/home. [CAIDA] "Caida Internet Traffic Analysis," http://www.caida.org/home.
[DevoFlow] Mogul, J., et al., "DevoFlow: Cost-Effective Flow [DevoFlow] Mogul, J., et al., "DevoFlow: Cost-Effective Flow
Management for High Performance Enterprise Networks," Proceedings of Management for High Performance Enterprise Networks," Proceedings of
the ACM SIGCOMM, August 2011. the ACM SIGCOMM, August 2011.
[FLOW-ACC] Zseby, T., et al., "Packet sampling for flow accounting: [FLOW-ACC] Zseby, T., et al., "Packet sampling for flow accounting:
challenges and limitations," Proceedings of the 9th international challenges and limitations," Proceedings of the 9th international
conference on Passive and active network measurement, 2008. conference on Passive and active network measurement, 2008.
[ID.ietf-rtgwg-cl-requirement] Villamizar, C. et al., "Requirements [ID.ietf-rtgwg-cl-requirement] Villamizar, C. et al., "Requirements
for MPLS over a Composite Link," September 2013. for MPLS over a Composite Link," September 2013.
skipping to change at page 23, line 35 skipping to change at page 24, line 18
[NDTM] Estan, C. and G. Varghese, "New directions in traffic [NDTM] Estan, C. and G. Varghese, "New directions in traffic
measurement and accounting," Proceedings of ACM SIGCOMM, August 2002. measurement and accounting," Proceedings of ACM SIGCOMM, August 2002.
[NVGRE] Sridharan, M. et al., "NVGRE: Network Virtualization using [NVGRE] Sridharan, M. et al., "NVGRE: Network Virtualization using
Generic Routing Encapsulation," draft-sridharan-virtualization- Generic Routing Encapsulation," draft-sridharan-virtualization-
nvgre-04, February 2014. nvgre-04, February 2014.
[RFC 2784] Farinacci, D. et al., "Generic Routing Encapsulation [RFC 2784] Farinacci, D. et al., "Generic Routing Encapsulation
(GRE)," March 2000. (GRE)," March 2000.
[RFC 2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
Multicast," November 2000.
[RFC 6790] Kompella, K. et al., "The Use of Entropy Labels in MPLS [RFC 6790] Kompella, K. et al., "The Use of Entropy Labels in MPLS
Forwarding," November 2012. Forwarding," November 2012.
[RFC 1213] McCloghrie, K., "Management Information Base for Network [RFC 1213] McCloghrie, K., "Management Information Base for Network
Management of TCP/IP-based internets: MIB-II," March 1991. Management of TCP/IP-based internets: MIB-II," March 1991.
[RFC 2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path [RFC 2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path
Algorithm," November 2000. Algorithm," November 2000.
[RFC 3273] Waldbusser, S., "Remote Network Monitoring Management [RFC 3273] Waldbusser, S., "Remote Network Monitoring Management
Information Base for High Capacity Networks," July 2002. Information Base for High Capacity Networks," July 2002.
[RFC 3931] Lau, J. (Ed.), M. Townsley (Ed.), and I. Goyret (Ed.),
"Layer 2 Tunneling Protocol - Version 3," March 2005.
[RFC 3954] Claise, B., "Cisco Systems NetFlow Services Export Version [RFC 3954] Claise, B., "Cisco Systems NetFlow Services Export Version
9," October 2004. 9," October 2004.
[RFC 5470] G. Sadasivan et al., "Architecture for IP Flow Information [RFC 5470] G. Sadasivan et al., "Architecture for IP Flow Information
Export," March 2009. Export," March 2009.
[RFC 5475] Zseby, T. et al., "Sampling and Filtering Techniques for [RFC 5475] Zseby, T. et al., "Sampling and Filtering Techniques for
IP Packet Selection," March 2009. IP Packet Selection," March 2009.
[RFC 5640] Filsfils, C., P. Mohapatra, and C. Pignataro, "Load
Balancing for Mesh Softwires," August 2009.
[RFC 5681] Allman, M. et al., "TCP Congestion Control," September [RFC 5681] Allman, M. et al., "TCP Congestion Control," September
2009. 2009.
[RFC 7011] Claise, B. et al., "Specification of the IP Flow
Information Export (IPFIX) Protocol for the Exchange of IP Traffic
Flow Information," September 2013.
[RFC 7012] Claise, B. and B. Trammell, "Information Model for IP Flow
Information Export (IPFIX)," September 2013.
[SAMP-BASIC] Phaal, P. and S. Panchen, "Packet Sampling Basics," [SAMP-BASIC] Phaal, P. and S. Panchen, "Packet Sampling Basics,"
http://www.sflow.org/packetSamplingBasics/. http://www.sflow.org/packetSamplingBasics/.
[sFlow-LAG] Phaal, P. and A. Ghanwani, "sFlow LAG counters [sFlow-LAG] Phaal, P. and A. Ghanwani, "sFlow LAG counters
structure," http://www.sflow.org/sflow_lag.txt, September 2012. structure," http://www.sflow.org/sflow_lag.txt, September 2012.
[sFlow-v5] Phaal, P. and M. Lavine, "sFlow version 5," [STT] Davie, B. (Ed.) and J. Gross, "A Stateless Transport Tunneling
http://www.sflow.org/sflow_version_5.txt, July 2004.
[STT] Davie, B. (ed) and J. Gross, "A Stateless Transport Tunneling
Protocol for Network Virtualization (STT)," draft-davie-stt-06, March Protocol for Network Virtualization (STT)," draft-davie-stt-06, March
2014. 2014.
[VXLAN] Mahalingam, M. et al., "VXLAN: A Framework for Overlaying [VXLAN] Mahalingam, M. et al., "VXLAN: A Framework for Overlaying
Virtualized Layer 2 Networks over Layer 3 Networks," draft- Virtualized Layer 2 Networks over Layer 3 Networks," draft-
mahalingam-dutt-dcops-vxlan-09, April 2014. mahalingam-dutt-dcops-vxlan-09, April 2014.
[YONG] Yong, L., "Enhanced ECMP and Large Flow Aware Transport," [YONG] Yong, L., "Enhanced ECMP and Large Flow Aware Transport,"
draft-yong-pwe3-enhance-ecmp-lfat-01, September 2010. draft-yong-pwe3-enhance-ecmp-lfat-01, September 2010.
 End of changes. 34 change blocks. 
69 lines changed or deleted 90 lines changed or added
