--- 1/draft-ietf-intarea-frag-fragile-14.txt 2019-07-06 13:13:10.459375086 -0700 +++ 2/draft-ietf-intarea-frag-fragile-15.txt 2019-07-06 13:13:10.515376830 -0700 @@ -1,27 +1,27 @@ Internet Area WG R. Bonica Internet-Draft Juniper Networks Intended status: Best Current Practice F. Baker -Expires: January 6, 2020 Unaffiliated +Expires: January 7, 2020 Unaffiliated G. Huston APNIC R. Hinden Check Point Software O. Troan Cisco F. Gont SI6 Networks - July 5, 2019 + July 6, 2019 IP Fragmentation Considered Fragile - draft-ietf-intarea-frag-fragile-14 + draft-ietf-intarea-frag-fragile-15 Abstract This document describes IP fragmentation and explains how it introduces fragility to Internet communication. This document also proposes alternatives to IP fragmentation and provides recommendations for developers and network operators. Status of This Memo @@ -32,21 +32,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on January 6, 2020. + This Internet-Draft will expire on January 7, 2020. Copyright Notice Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents @@ -57,59 +57,59 @@ described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. 
IP-in-IP Tunnels . . . . . . . . . . . . . . . . . . . . 3 1.2. Requirements Language . . . . . . . . . . . . . . . . . . 3 2. IP Fragmentation . . . . . . . . . . . . . . . . . . . . . . 4 2.1. Links, Paths, MTU and PMTU . . . . . . . . . . . . . . . 4 2.2. Fragmentation Procedures . . . . . . . . . . . . . . . . 6 - 2.3. Upper-Layer Reliance on IP Fragmentation . . . . . . . . 6 + 2.3. Upper-Layer Reliance on IP Fragmentation . . . . . . . . 7 3. Increased Fragility . . . . . . . . . . . . . . . . . . . . . 7 3.1. Virtual Reassembly . . . . . . . . . . . . . . . . . . . 7 3.2. Policy-Based Routing . . . . . . . . . . . . . . . . . . 8 3.3. Network Address Translation (NAT) . . . . . . . . . . . . 9 3.4. Stateless Firewalls . . . . . . . . . . . . . . . . . . . 9 3.5. Equal Cost Multipath, Link Aggregate Groups and Stateless - Load-Balancers . . . . . . . . . . . . . . . . . . . . . 9 + Load-Balancers . . . . . . . . . . . . . . . . . . . . . 10 3.6. IPv4 Reassembly Errors at High Data Rates . . . . . . . . 11 3.7. Security Vulnerabilities . . . . . . . . . . . . . . . . 11 3.8. PMTU Blackholing Due to ICMP Loss . . . . . . . . . . . . 12 3.8.1. Transient Loss . . . . . . . . . . . . . . . . . . . 13 3.8.2. Incorrect Implementation of Security Policy . . . . . 13 3.8.3. Persistent Loss Caused By Anycast . . . . . . . . . . 14 3.8.4. Persistent Loss Caused By Unidirectional Routing . . 14 3.9. Blackholing Due To Filtering or Loss . . . . . . . . . . 14 4. Alternatives to IP Fragmentation . . . . . . . . . . . . . . 15 4.1. Transport Layer Solutions . . . . . . . . . . . . . . . . 15 - 4.2. Application Layer Solutions . . . . . . . . . . . . . . . 16 + 4.2. Application Layer Solutions . . . . . . . . . . . . . . . 17 5. Applications That Rely on IPv6 Fragmentation . . . . . . . . 17 5.1. Domain Name Service (DNS) . . . . . . . . . . . . . . . . 18 5.2. Open Shortest Path First (OSPF) . . . . . . . . . . . . . 18 5.3. Packet-in-Packet Encapsulations . . . . . . . . . 
. . . . 18 5.4. UDP Applications Enhancing Performance . . . . . . . . . 19 6. Recommendations . . . . . . . . . . . . . . . . . . . . . . . 19 6.1. For Application and Protocol Developers . . . . . . . . . 19 6.2. For System Developers . . . . . . . . . . . . . . . . . . 20 6.3. For Middle Box Developers . . . . . . . . . . . . . . . . 20 6.4. For ECMP, LAG and Load-Balancer Developers And Operators 20 - 6.5. For Network Operators . . . . . . . . . . . . . . . . . . 20 + 6.5. For Network Operators . . . . . . . . . . . . . . . . . . 21 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 8. Security Considerations . . . . . . . . . . . . . . . . . . . 21 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 21 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 21 10.1. Normative References . . . . . . . . . . . . . . . . . . 21 10.2. Informative References . . . . . . . . . . . . . . . . . 23 Appendix A. Contributors' Address . . . . . . . . . . . . . . . 26 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 26 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 27 1. Introduction Operational experience [Kent] [Huston] [RFC7872] reveals that IP fragmentation introduces fragility to Internet communication. This document describes IP fragmentation and explains the fragility it introduces. It also proposes alternatives to IP fragmentation and provides recommendations for developers and network operators. While this document identifies issues associated with IP @@ -139,35 +139,36 @@ "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. 2. IP Fragmentation 2.1. Links, Paths, MTU and PMTU An Internet path connects a source node to a destination node. A - path can contain links and routers. 
If a path contains more than one + path may contain links and routers. If a path contains more than one link, the links are connected in series and a router connects each link to the next. Internet paths are dynamic. Assume that the path from one node to another contains a set of links and routers. If a link fails, the path can also change so that it includes a different set of links and routers. Each link is constrained by the number of bytes that it can convey in a single IP packet. This constraint is called the link Maximum - Transmission Unit (MTU). IPv4 [RFC0791] requires every link to - support a specified MTU (see NOTE 1). IPv6 [RFC8200] requires every - link to support an MTU of 1280 bytes or greater. These are called - the IPv4 and IPv6 minimum link MTU's. + Transmission Unit (MTU). While the end-to-end Path MTU is the size + of a single IPv4 header, IPv4 [RFC0791] requires every link to + support at least a specified MTU (see NOTE 1). IPv6 [RFC8200] + similarly requires every link to support an MTU of 1280 bytes or + greater. These are called the IPv4 and IPv6 minimum link MTU's. Likewise, each Internet path is constrained by the number of bytes that it can convey in a single IP packet. This constraint is called the Path MTU (PMTU). For any given path, the PMTU is equal to the smallest of its link MTU's. Because Internet paths are dynamic, PMTU is also dynamic. For reasons described below, source nodes estimate the PMTU between themselves and destination nodes. A source node can produce extremely conservative PMTU estimates in which: @@ -227,80 +228,82 @@ to minimize the requirement for fragmentation en route. So, for the purposes of this document, we assume that the IPv4 minimum path MTU is 576 bytes. NOTE 2: A non-fragmentable packet can be fragmented at its source. However, it cannot be fragmented by a downstream node. An IPv4 packet whose DF-bit is set to 0 is fragmentable. An IPv4 packet whose DF-bit is set to 1 is non-fragmentable.
All IPv6 packets are also non-fragmentable. - NOTE 3:: The ICMP PTB message has two instantiations. In ICMPv4 + NOTE 3: The ICMP PTB message has two instantiations. In ICMPv4 [RFC0792], the ICMP PTB message is a Destination Unreachable message with Code equal to 4 fragmentation needed and DF set. This message was augmented by [RFC1191] to indicate the MTU of the link through which the packet could not be forwarded. In ICMPv6 [RFC4443], the ICMP PTB message is a Packet Too Big Message with Code equal to 0. This message also indicates the MTU of the link through which the packet could not be forwarded. 2.2. Fragmentation Procedures When an upper-layer protocol submits data to the underlying IP module, and the resulting IP packet's length is greater than the PMTU, the packet is divided into fragments. Each fragment includes an IP header and a portion of the original packet. [RFC0791] describes IPv4 fragmentation procedures. An IPv4 packet - whose DF-bit is set to 1 can be fragmented by the source node, but - cannot be fragmented by a downstream router. An IPv4 packet whose - DF-bit is set to 0 can be fragmented by the source node or by a + whose DF-bit is set to 1 may be fragmented by the source node, but + may not be fragmented by a downstream router. An IPv4 packet whose + DF-bit is set to 0 may be fragmented by the source node or by a downstream router. When an IPv4 packet is fragmented, all IP options - appear in the first fragment, but only options whose "copy" bit is - set to 1 appear in subsequent fragments. + (which are within the IPv4 header) appear in the first fragment, but + only options whose "copy" bit is set to 1 appear in subsequent + fragments. - [RFC8200] describes IPv6 fragmentation procedures. An IPv6 packet - can be fragmented at the source node only. When an IPv6 packet is - fragmented, all extension headers appear in the first fragment, but - only per-fragment headers appear in subsequent fragments. 
Per- - fragment headers include the following: + [RFC8200], notably in section 4.5, describes IPv6 fragmentation + procedures. An IPv6 packet may be fragmented only at the source + node. When an IPv6 packet is fragmented, all extension headers + appear in the first fragment, but only per-fragment headers appear in + subsequent fragments. Per-fragment headers include the following: o The IPv6 header. o The Hop-by-hop Options header (if present) o The Destination Options header (if present and if it precedes a Routing header) o The Routing Header (if present) o The Fragment Header - In both IPv4 and IPv6, the upper-layer header appears in the first - fragment only. It does not appear in subsequent fragments. + In IPv4, the upper-layer header usually appears in the first + fragment, due to the sizes of the headers involved; in IPv6, it is + required to. 2.3. Upper-Layer Reliance on IP Fragmentation Upper-layer protocols can operate in the following modes: o Do not rely on IP fragmentation. o Rely on IP fragmentation by the source node only. o Rely on IP fragmentation by any node. Upper-layer protocols running over IPv4 can operate in all of the above-mentioned modes. Upper-layer protocols running over IPv6 can operate in the first and second modes only. Upper-layer protocols that operate in the first two modes (above) - require access to the PMTU estimate. In order to fulfil this + require access to the PMTU estimate. In order to fulfill this requirement, they can: o Estimate the PMTU to be equal to the IPv4 or IPv6 minimum link MTU. o Access the estimate that PMTUD produced. o Execute PMTUD procedures themselves. o Execute Packetization Layer PMTUD (PLPMTUD) [RFC4821] @@ -314,31 +317,39 @@ dropped messages. Therefore, PLPMTUD does not rely on the network's ability to deliver ICMP PTB messages to the source. 3. Increased Fragility This section explains how IP fragmentation introduces fragility to Internet communication. 3.1. 
Virtual Reassembly - Virtual reassembly is a procedure in which a device reassembles a - packet, forwards its fragments, and discards the reassembled copy. - In A+P and CGN, virtual reassembly is required in order to correctly - translate fragment addresses. It can be useful in Section 3.2, - Section 3.3, Section 3.4, and Section 3.5. + Virtual reassembly is a procedure in which a device conceptually + reassembles a packet, forwards its fragments, and discards the + reassembled copy. In A+P and CGN, virtual reassembly is required in + order to correctly translate fragment addresses. It could be useful + to address the problems in Section 3.2, Section 3.3, Section 3.4, and + Section 3.5. Virtual reassembly in the network is problematic, however, because it is computationally expensive and because it holds state for indeterminate periods of time, is prone to errors, and is prone to attacks (Section 3.7). + One of the benefits of fragmenting at the source, as IPv6 does, is + that there is no question of temporary state or involved processes as + required in virtual fragmentation. The sender has the entire + message, and is fragmenting it as needed - and can apply that + knowledge consistently across the fragments it produces. It is + better than virtual fragmentation in that sense. + 3.2. Policy-Based Routing IP Fragmentation causes problems for routers that implement policy-based routing. When a router receives a packet, it identifies the next-hop on route to the packet's destination and forwards the packet to that next-hop. In order to identify the next-hop, the router interrogates a local data structure called the Forwarding Information Base (FIB). @@ -414,21 +425,21 @@ o Block all trailing fragments, possibly blocking legitimate traffic. Neither option is attractive. 3.5.
Equal Cost Multipath, Link Aggregate Groups and Stateless Load-Balancers IP fragmentation causes problems for Equal Cost Multipath (ECMP), - Link Aggregate Groups (LAG) and other stateless load-balancing + Link Aggregate Groups (LAG) and other stateless load-distribution technologies. In order to assign a packet or packet fragment to a link, an intermediate node executes a hash (i.e., load-distributing) algorithm. The following paragraphs describe a commonly deployed hash algorithm. If the packet or packet fragment contains a transport-layer header, the algorithm accepts the following 5-tuple as input: o IP Source Address. @@ -445,26 +456,26 @@ o IP Source Address. o IP Destination Address. o IPv4 Protocol or IPv6 Next Header. Therefore, non-fragmented packets belonging to a flow can be assigned to one link while fragmented packets belonging to the same flow can be divided between that link and another. This can cause suboptimal - load-balancing. + load-distribution. [RFC6438] offers a partial solution to this problem for IPv6 devices only. According to [RFC6438]: - "At intermediate routers that perform load distribution, the hash + "At intermediate routers that perform load balancing, the hash algorithm used to determine the outgoing component-link in an ECMP and/or LAG toward the next hop MUST minimally include the 3-tuple {dest addr, source addr, flow label} and MAY also include the remaining components of the 5-tuple." If the algorithm includes only the 3-tuple {dest addr, source addr, flow label}, it will assign all fragments belonging to a packet to the same link. (See [RFC6437] and [RFC7098]). In order to avoid the problem described above, implementations SHOULD @@ -918,21 +927,21 @@ that is not compliant with RFC 791 or RFC 8200, or even discard IP fragments completely. Such behaviors are NOT RECOMMENDED. If a middlebox implements non-standard behavior with respect to IP fragmentation, then that behavior MUST be clearly documented. 6.4.
For ECMP, LAG and Load-Balancer Developers And Operators In their default configuration, when the IPv6 Flow Label is not equal to zero, IPv6 devices that implement Equal-Cost Multipath (ECMP) Routing as described in OSPF [RFC2328] and other routing protocols, - Link Aggregation Grouping (LAG) [RFC7424], or other load-balancing + Link Aggregation Grouping (LAG) [RFC7424], or other load-distribution technologies SHOULD accept only the following fields as input to their hash algorithm: o IP Source Address. o IP Destination Address. o Flow Label. Operators SHOULD deploy these devices in their default configuration.
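The 3-tuple hash input recommended in Section 6.4 can be illustrated with a minimal sketch. It is not taken from the draft: the function name select_link, the use of CRC-32 as the hash, and the example addresses and fragment offsets are all assumptions chosen for illustration, and real devices typically use vendor-specific hash functions. The sketch only shows why hashing on {source address, destination address, flow label} keeps every fragment of a packet on the same link: each fragment carries the same IPv6 header fields, whereas only the first fragment carries the transport-layer ports.

```python
import ipaddress
import zlib


def select_link(src: str, dst: str, flow_label: int, num_links: int) -> int:
    """Pick an outgoing link index from the 3-tuple {src, dst, flow label}.

    Hypothetical helper: CRC-32 stands in for whatever hash a real
    ECMP/LAG implementation uses. The recommendation in the text applies
    when the Flow Label is non-zero; the zero case is outside this sketch.
    """
    if not 0 <= flow_label < 2**20:
        raise ValueError("the IPv6 Flow Label is a 20-bit field")
    key = (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + flow_label.to_bytes(3, "big")
    )
    return zlib.crc32(key) % num_links


# Three fragments of one packet (illustrative values). The offset differs
# per fragment, but the hash input does not include it.
fragments = [
    {"src": "2001:db8::1", "dst": "2001:db8::2", "flow_label": 0x12345, "offset": off}
    for off in (0, 1232, 2464)
]

links = {
    select_link(f["src"], f["dst"], f["flow_label"], num_links=4)
    for f in fragments
}
```

Because the key is built only from fields present in every fragment, the set links holds a single link index. A hash that also consumed transport-layer ports would have no port values available for the trailing fragments, which is the situation described in Section 3.5.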