--- 1/draft-ietf-intarea-gue-05.txt 2018-08-31 15:13:44.720368774 -0700 +++ 2/draft-ietf-intarea-gue-06.txt 2018-08-31 15:13:44.792370529 -0700 @@ -1,21 +1,21 @@ Internet Area WG T. Herbert Internet-Draft Quantonium Intended status: Standard track L. Yong -Expires June 30, 2017 Huawei USA +Expires March 4, 2019 Huawei USA O. Zia Microsoft - December 30, 2017 + August 31, 2018 Generic UDP Encapsulation - draft-ietf-intarea-gue-05 + draft-ietf-intarea-gue-06 Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. @@ -24,25 +24,25 @@ and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html - This Internet-Draft will expire on June 30, 2018. + This Internet-Draft will expire on March 4, 2019. Copyright Notice - Copyright (c) 2017 IETF Trust and the persons identified as the + Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. This document is subject to BCP 78 and the IETF Trust's Legal @@ -68,83 +68,82 @@ part of the encapsulation, and is generic in that it can encapsulate packets of various IP protocols. Table of Contents 1. 
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.1. Terminology and acronyms . . . . . . . . . . . . . . . . . 5 1.2. Requirements Language . . . . . . . . . . . . . . . . . . 6 2. Base packet format . . . . . . . . . . . . . . . . . . . . . . 7 2.1. GUE variant . . . . . . . . . . . . . . . . . . . . . . . . 7 - 3. Variant 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 + 3. Variant 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.1. Header format . . . . . . . . . . . . . . . . . . . . . . . 8 3.2. Proto/ctype field . . . . . . . . . . . . . . . . . . . . . 9 3.2.1 Proto field . . . . . . . . . . . . . . . . . . . . . . 9 3.2.2 Ctype field . . . . . . . . . . . . . . . . . . . . . . 10 - 3.3. Flags and extension fields . . . . . . . . . . . . . . . . 10 - 3.3.1. Requirements . . . . . . . . . . . . . . . . . . . . . 10 + 3.3. Flags and extension fields . . . . . . . . . . . . . . . . 11 + 3.3.1. Requirements . . . . . . . . . . . . . . . . . . . . . 11 3.3.2. Example GUE header with extension fields . . . . . . . 11 3.4. Private data . . . . . . . . . . . . . . . . . . . . . . . 12 - 3.5. Message types . . . . . . . . . . . . . . . . . . . . . . . 12 - 3.5.1. Control messages . . . . . . . . . . . . . . . . . . . 12 + 3.5. Message types . . . . . . . . . . . . . . . . . . . . . . . 13 + 3.5.1. Control messages . . . . . . . . . . . . . . . . . . . 13 3.5.2. Data messages . . . . . . . . . . . . . . . . . . . . . 13 3.6. Hiding the transport layer protocol number . . . . . . . . 13 - 4. Variant 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 - 4.1. Direct encapsulation of IPv4 . . . . . . . . . . . . . . . 14 - 4.2. Direct encapsulation of IPv6 . . . . . . . . . . . . . . . 15 - 5. Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 - 5.1. Network tunnel encapsulation . . . . . . . . . . . . . . . 16 - 5.2. Transport layer encapsulation . . . . . . . . . . . . . . . 16 - 5.3. Encapsulator operation . 
. . . . . . . . . . . . . . . . . 16 - 5.4. Decapsulator operation . . . . . . . . . . . . . . . . . . 17 - 5.4.1. Processing a received data message . . . . . . . . . . 17 - 5.4.2. Processing a received control message . . . . . . . . . 18 - 5.5. Router and switch operation . . . . . . . . . . . . . . . . 18 - 5.6. Middlebox interactions . . . . . . . . . . . . . . . . . . 18 - 5.6.1. Inferring connection semantics . . . . . . . . . . . . 19 - 5.6.2. NAT . . . . . . . . . . . . . . . . . . . . . . . . . . 19 - 5.7. Checksum Handling . . . . . . . . . . . . . . . . . . . . . 19 - 5.7.1. Requirements . . . . . . . . . . . . . . . . . . . . . 19 - 5.7.2. UDP Checksum with IPv4 . . . . . . . . . . . . . . . . 20 - 5.7.3. UDP Checksum with IPv6 . . . . . . . . . . . . . . . . 20 - 5.8. MTU and fragmentation . . . . . . . . . . . . . . . . . . . 21 - 5.9. Congestion control . . . . . . . . . . . . . . . . . . . . 21 - 5.10. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 22 - 5.11. Flow entropy for ECMP . . . . . . . . . . . . . . . . . . 22 - 5.11.1. Flow classification . . . . . . . . . . . . . . . . . 22 - 5.11.2. Flow entropy properties . . . . . . . . . . . . . . . 23 - 5.12 Negotiation of acceptable flags and extension fields . . . 24 - 6. Motivation for GUE . . . . . . . . . . . . . . . . . . . . . . 24 - 6.1. Benefits of GUE . . . . . . . . . . . . . . . . . . . . . . 24 - 6.2 Comparison of GUE to other encapsulations . . . . . . . . . 25 - 7. Security Considerations . . . . . . . . . . . . . . . . . . . . 26 - 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 27 - 8.1. UDP source port . . . . . . . . . . . . . . . . . . . . . . 27 - 8.2. GUE variant number . . . . . . . . . . . . . . . . . . . . 28 - 8.3. Control types . . . . . . . . . . . . . . . . . . . . . . . 28 - 8.4. Flag-fields . . . . . . . . . . . . . . . . . . . . . . . . 28 + 4. Variant 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 + 4.1. 
Direct encapsulation of IPv4 . . . . . . . . . . . . . . . 15 + 4.2. Direct encapsulation of IPv6 . . . . . . . . . . . . . . . 16 + 5. Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 + 5.1. Network tunnel encapsulation . . . . . . . . . . . . . . . 17 + 5.2. Transport layer encapsulation . . . . . . . . . . . . . . . 17 + 5.3. Encapsulator operation . . . . . . . . . . . . . . . . . . 18 + 5.4. Decapsulator operation . . . . . . . . . . . . . . . . . . 18 + 5.4.1. Processing a received data message . . . . . . . . . . 18 + 5.4.2. Processing a received control message . . . . . . . . . 19 + 5.5. Router and switch operation . . . . . . . . . . . . . . . . 19 + 5.6. Middlebox interactions . . . . . . . . . . . . . . . . . . 20 + 5.6.1. Inferring connection semantics . . . . . . . . . . . . 20 + 5.6.2. NAT . . . . . . . . . . . . . . . . . . . . . . . . . . 20 + 5.7. Checksum Handling . . . . . . . . . . . . . . . . . . . . . 20 + 5.7.1. Requirements . . . . . . . . . . . . . . . . . . . . . 21 + 5.7.2. UDP Checksum with IPv4 . . . . . . . . . . . . . . . . 21 + 5.7.3. UDP Checksum with IPv6 . . . . . . . . . . . . . . . . 22 + 5.8. MTU and fragmentation . . . . . . . . . . . . . . . . . . . 22 + 5.9. Congestion control . . . . . . . . . . . . . . . . . . . . 22 + 5.10. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 23 + 5.11. Flow entropy for ECMP . . . . . . . . . . . . . . . . . . 23 + 5.11.1. Flow classification . . . . . . . . . . . . . . . . . 23 + 5.11.2. Flow entropy properties . . . . . . . . . . . . . . . 24 + 5.12 Negotiation of acceptable flags and extension fields . . . 25 + 6. Motivation for GUE . . . . . . . . . . . . . . . . . . . . . . 26 + 6.1. Benefits of GUE . . . . . . . . . . . . . . . . . . . . . . 26 + 6.2 Comparison of GUE to other encapsulations . . . . . . . . . 26 + 7. Security Considerations . . . . . . . . . . . . . . . . . . . . 28 + 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 
28 + 8.1. UDP source port . . . . . . . . . . . . . . . . . . . . . . 28 + 8.2. GUE variant number . . . . . . . . . . . . . . . . . . . . 29 + 8.3. Control types . . . . . . . . . . . . . . . . . . . . . . . 29 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 29 - 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 29 - 10.1. Normative References . . . . . . . . . . . . . . . . . . . 29 + 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 30 + 10.1. Normative References . . . . . . . . . . . . . . . . . . . 30 10.2. Informative References . . . . . . . . . . . . . . . . . . 30 - Appendix A: NIC processing for GUE . . . . . . . . . . . . . . . . 32 - A.1. Receive multi-queue . . . . . . . . . . . . . . . . . . . . 32 - A.2. Checksum offload . . . . . . . . . . . . . . . . . . . . . 33 - A.2.1. Transmit checksum offload . . . . . . . . . . . . . . . 33 - A.2.2. Receive checksum offload . . . . . . . . . . . . . . . 34 - A.3. Transmit Segmentation Offload . . . . . . . . . . . . . . . 34 - A.4. Large Receive Offload . . . . . . . . . . . . . . . . . . . 35 + Appendix A: NIC processing for GUE . . . . . . . . . . . . . . . . 33 + A.1. Receive multi-queue . . . . . . . . . . . . . . . . . . . . 33 + A.2. Checksum offload . . . . . . . . . . . . . . . . . . . . . 34 + A.2.1. Transmit checksum offload . . . . . . . . . . . . . . . 34 + A.2.2. Receive checksum offload . . . . . . . . . . . . . . . 35 + A.3. Transmit Segmentation Offload . . . . . . . . . . . . . . . 35 + A.4. Large Receive Offload . . . . . . . . . . . . . . . . . . . 36 Appendix B: Implementation considerations . . . . . . . . . . . . 36 - B.1. Privileged ports . . . . . . . . . . . . . . . . . . . . . 36 - B.2. Setting flow entropy as a route selector . . . . . . . . . 36 - B.3. Hardware protocol implementation considerations . . . . . . 36 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 37 + B.1. Privileged ports . . . . . . . . . . . 
. . . . . . . . . . 37 + B.2. Setting flow entropy as a route selector . . . . . . . . . 37 + B.3. Hardware protocol implementation considerations . . . . . . 37 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 38 1. Introduction This specification describes Generic UDP Encapsulation (GUE) which is a general method for encapsulating packets of arbitrary IP protocols within User Datagram Protocol (UDP) [RFC0768] packets. Encapsulating packets in UDP facilitates efficient transport across networks. Networking devices widely provide protocol specific processing and optimizations for UDP (as well as TCP) packets. Packets for atypical IP protocols (those not usually parsed by networking hardware) can be @@ -154,21 +153,21 @@ GUE provides an extensible header format for including optional data in the encapsulation header. This data potentially covers items such as the virtual networking identifier, security data for validating or authenticating the GUE header, congestion control data, etc. GUE also allows private optional data in the encapsulation header. This feature can be used by a site or implementation to define local custom optional data, and allows experimentation of options that may eventually become standard. This document does not define any specific GUE extensions. [GUEEXTEN] - specifies a set of core extensions. + specifies a set of initial extensions. The motivation for the GUE protocol is described in section 6. 1.1. 
Terminology and acronyms GUE Generic UDP Encapsulation GUE Header A variable length protocol header that is composed of a primary four byte header and zero or more four byte words for optional header data @@ -287,24 +286,24 @@ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | ~ Private data (optional) ~ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The contents of the UDP header are: o Source port: If connection semantics (section 5.6.1) are applied to an encapsulation, this is set to the local source port for - the connection. When connection semantics are not applied, this - is set to a flow entropy value for use with ECMP (Equal-Cost - Mulit-Path [RFC2992]); the properties of flow entropy are - described in section 5.11. + the connection. When connection semantics are not applied, the + source port is either set to a flow entropy value as described + in section 5.11, or it should be set to the GUE assigned port + number, 6080. o Destination port: If connection semantics (section 5.6.1) are applied to an encapsulation, this is set to the destination port for the tuple. If connection semantics are not applied this is set to the GUE assigned port number, 6080. o Length: Canonical length of the UDP packet (length of UDP header and payload). o Checksum: Standard UDP checksum (handling is described in @@ -393,53 +392,56 @@ might be the case when the payload is a fragment of a control message, where only the reassembled packet can be interpreted as a control message. Control messages will be defined in an IANA registry. Control message types 1 through 127 may be defined in standards. Types 128 through 255 are reserved to be user defined for experimentation or private control messages. This document does not specify any standard control message types - other than type 0. + other than type 0. Type 0 does not define a format of the control + message. 
Instead, it indicates that the GUE payload is a control + message, or part of a control message (as might be the case in GUE + fragmentation), that cannot be correctly parsed or interpreted + without additional context. 3.3. Flags and extension fields Flags and associated extension fields are the primary mechanism of extensibility in GUE. As mentioned in section 3.1, GUE header flags indicate the presence of optional extension fields in the GUE header. - [GUEXTENS] defines a basic set of GUE extensions. + [GUEXTENS] defines an initial set of GUE extensions. 3.3.1. Requirements There are sixteen flag bits in the GUE header. Flags may indicate the presence of an extension field. The size of an extension field indicated by a flag MUST be fixed. Flags can be paired together to allow different lengths for an extension field. For example, if two flag bits are paired, a field can possibly be three different lengths; that is, a bit value of 00 indicates no field present; 01, 10, and 11 indicate three possible lengths for the field. Regardless of how flag bits are paired, the lengths and offsets of optional fields corresponding to a set of flags MUST be well defined. Extension fields are placed in order of the flags. New flags are to be allocated from high to low order bit contiguously without holes. Flags allow random access: for instance, to inspect the field corresponding to the Nth flag bit, an implementation only considers the previous N-1 flags to determine the offset. Flags after the Nth - flag are not pertinent in calculating the offset of the Nth flag. - Random access of flags and fields permits processing of optional - extensions in an order that is independent of their position in the - packet. The processing order of extensions defined in [GUEEXTEN] - demonstrates this property. + flag are not pertinent in calculating the offset of the field for the + Nth flag. 
Random access of flags and fields permits processing of + optional extensions in an order that is independent of their position + in the packet. Flags (or paired flags) are idempotent such that new flags MUST NOT cause reinterpretation of old flags. Also, new flags MUST NOT alter interpretation of other elements in the GUE header nor how the message is parsed (for instance, in a data message the proto/ctype field always holds an IP protocol number as an invariant). The set of available flags can be extended in the future by defining a "flag extensions bit" that refers to a field containing a new set of flags. @@ -558,21 +561,21 @@ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ For IPv4, it is permitted in GUE to use this precise destination option to contain the obfuscated protocol number. In this case next header MUST refer to a valid IP protocol for IPv4. No other extension headers or destination options are permitted with IPv4. 4. Variant 1 Variant 1 of GUE allows direct encapsulation of IPv4 and IPv6 in UDP. - In this varinant there is no GUE header; a UDP packet carries an IP + In this variant there is no GUE header; a UDP packet carries an IP packet. The first two bits of the UDP payload for GUE are the GUE variant and coincide with the first two bits of the version number in the IP header. The first two version bits of IPv4 and IPv6 are 01, so we use GUE variant 1 for direct IP encapsulation, which makes the two GUE variant bits also 01. This technique is effectively a means to compress out the variant 0 GUE header when encapsulating IPv4 or IPv6 packets and there are no flags or extension fields present. 
This method can be used on the same port number as packets with the GUE header (GUE variant 0 @@ -595,20 +598,23 @@ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Identification |Flags| Fragment Offset | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Time to Live | Protocol | Header Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Source IPv4 Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Destination IPv4 Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + The UDP fields are set in a similar manner as described in section + 3.1. + Note that the 0100 value in the first four bits of the UDP payload expresses the GUE variant as 1 (bits 01) and IP version as 4 (bits 0100). 4.2. Direct encapsulation of IPv6 The format for encapsulating IPv6 directly in UDP is demonstrated below: 0 1 2 3 @@ -632,33 +638,36 @@ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + + | | + Destination IPv6 Address + | | + + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + The UDP fields are set in a similar manner as described in section + 3.1. + Note that the 0110 value in the first four bits of the UDP payload expresses the GUE variant as 1 (bits 01) and IP version as 6 (bits 0110). 5. Operation The figure below illustrates the use of GUE encapsulation between two hosts. Host 1 is sending packets to Host 2. An encapsulator performs encapsulation of packets from Host 1. These encapsulated packets traverse the network as UDP packets. At the decapsulator, packets are decapsulated and sent on to Host 2. Packet flow in the reverse - direction need not be symmetric; GUE encapsulation is not required in - the reverse path. + direction need not be symmetric; for example, the reverse path might + not use GUE and/or any other form of encapsulation. 
+---------------+ +---------------+ | | | | | Host 1 | | Host 2 | | | | | +---------------+ +---------------+ | ^ V | +---------------+ +---------------+ +---------------+ | | | | | | @@ -713,60 +722,60 @@ A decapsulator performs decapsulation of GUE packets. A decapsulator is addressed by the outer destination IP address of a GUE packet. The decapsulator validates packets, including fields of the GUE header. If a decapsulator receives a GUE packet with an unsupported variant, unknown flag, bad header length (too small for included extension fields), unknown control message type, bad protocol number, an unsupported payload type, or an otherwise malformed header, it MUST drop the packet. Such events MAY be logged subject to configuration - and rate limiting of logging messages. No error message is returned - back to the encapsulator. Note that set flags in a GUE header that - are unknown to a decapsulator MUST NOT be ignored. If a GUE packet is - received by a decapsulator with unknown flags, the packet MUST be - dropped. + and rate limiting of logging messages. Note that set flags in a GUE + header that are unknown to a decapsulator MUST NOT be ignored. If a + GUE packet is received by a decapsulator with unknown flags, the + packet MUST be dropped. 5.4.1. Processing a received data message If a valid data message is received, the UDP header and GUE header are removed from the packet. The outer IP header remains intact and the next protocol in the IP header is set to the protocol from the proto field in the GUE header. The resulting packet is then resubmitted into the protocol stack to process that packet as though it was received with the protocol in the GUE header. As an example, consider that a data message is received where GUE - encapsulates an IP packet. In this case proto field in the GUE header - is set 94 for IPIP: + encapsulates an IPv4 packet using GUE variant 0. 
In this case proto + field in the GUE header is set to 4 for IPv4 encapsulation: +-------------------------------------+ | IP header (next proto = 17,UDP) | |-------------------------------------| | UDP | |-------------------------------------| - | GUE (proto = 94,IPIP) | + | GUE (proto = 4,IPv4 encapsulation) | |-------------------------------------| - | IP header and packet | + | IPv4 header and packet | +-------------------------------------+ + The receiver removes the UDP and GUE headers and sets the next - protocol field in the IP packet to IPIP, which is derived from the - GUE proto field. The resultant packet would have the format: + protocol field in the IP packet to 4, which is derived from the GUE + proto field. The resultant packet would have the format: +-------------------------------------+ - | IP header (next proto = 94,IPIP) | + | IP header (next proto = 4,IPv4) | |-------------------------------------| | IP header and packet | +-------------------------------------+ This packet is then resubmitted into the protocol stack to be - processed as an IPIP packet. + processed as an IPv4 encapsulated packet. 5.4.2. Processing a received control message If a valid control message is received, the packet MUST be processed as a control message. The specific processing to be performed depends on the value in the ctype field of the GUE header. 5.5. Router and switch operation Routers and switches SHOULD forward GUE packets as standard UDP/IP @@ -779,45 +788,45 @@ ports are fixed to provide connection semantics (section 5.6.1), then the encapsulated packet MAY be parsed to determine flow entropy. A router MUST NOT modify a GUE header when forwarding a packet. It MAY encapsulate a GUE packet in another GUE packet, for instance to implement a network tunnel (i.e. by encapsulating an IP packet with a GUE payload in another IP packet as a GUE payload). 
In this case, the router takes the role of an encapsulator, and the corresponding decapsulator is the logical endpoint of the tunnel. When encapsulating a GUE packet within another GUE packet, there are no - provisions to automatically GUE copy flags or fields to the outer GUE + provisions to automatically copy flags or fields to the outer GUE header. Each layer of encapsulation is considered independent. 5.6. Middlebox interactions A middle box MAY interpret some flags and extension fields of the GUE header for classification purposes, but is not required to understand - any of the flags or extension fields in GUE packets. A middle box - MUST NOT drop a GUE packet merely because there are flags unknown to - it. The header length in the GUE header allows a middlebox to inspect - the payload packet without needing to parse the flags or extension + any of the flags or extension fields in GUE packets. A middlebox MUST + NOT drop a GUE packet merely because there are flags unknown to it. + The header length in the GUE header allows a middlebox to inspect the + payload packet without needing to parse the flags or extension fields. 5.6.1. Inferring connection semantics A middlebox might infer bidirectional connection semantics for a UDP flow. For instance, a stateful firewall might create a five-tuple rule to match flows on egress, and a corresponding five-tuple rule for matching ingress packets where the roles of source and destination are reversed for the IP addresses and UDP port numbers. To operate in this environment, a GUE tunnel should be configured to assume connected semantics defined by the UDP five tuple and the use of GUE encapsulation needs to be symmetric between both endpoints. The source port set in the UDP header MUST be the destination port - the peer would set for replies. In this case the UDP source port for + the peer would set for replies. 
In this case, the UDP source port for a tunnel would be a fixed value and not set to be flow entropy as described in section 5.11. The selection of whether to make the UDP source port fixed or set to a flow entropy value for each packet sent SHOULD be configurable for a tunnel. The default MUST be to set the flow entropy value in the UDP source port. 5.6.2. NAT @@ -856,38 +866,38 @@ For UDP in IPv4, the UDP checksum MUST be processed as specified in [RFC768] and [RFC1122] for both transmit and receive. An encapsulator MAY set the UDP checksum to zero for performance or implementation considerations. The IPv4 header includes a checksum that protects against mis-delivery of the packet due to corruption of IP addresses. The UDP checksum potentially provides protection against corruption of the UDP header, GUE header, and GUE payload. Enabling or disabling the use of checksums is a deployment consideration that should take into account the risk and effects of packet corruption, and whether the packets in the network are - already adequately protected by other, possibly stronger mechanisms + already adequately protected by other, possibly stronger mechanisms, such as the Ethernet CRC. If an encapsulator sets a zero UDP checksum for IPv4, it SHOULD use the GUE header checksum as described in [GUEEXTEN] assuming there are no other mechanisms used to protect the GUE packet. When a decapsulator receives a packet, the UDP checksum field MUST be processed. If the UDP checksum is non-zero, the decapsulator MUST verify the checksum before accepting the packet. By default, a decapsulator SHOULD accept UDP packets with a zero checksum. A node MAY be configured to disallow zero checksums per [RFC1122]. Configuration of zero checksums can be selective. For instance, zero checksums might be disallowed from certain hosts that are known to be traversing paths subject to packet corruption. 
If verification of a non-zero checksum fails, a decapsulator lacks the capability to verify a non-zero checksum, or a packet with a zero-checksum was - received and the decapsulator is configured to disallow, the packet - MUST be dropped. + received and the decapsulator is configured to disallow, then the + packet MUST be dropped. 5.7.3. UDP Checksum with IPv6 In IPv6, there is no checksum in the IPv6 header that protects against mis-delivery due to address corruption. Therefore, when GUE is used over IPv6, either the UDP checksum or the GUE header checksum SHOULD be used unless there are alternative mechanisms in use that protect against misdelivery. The UDP checksum and GUE header checksum SHOULD NOT be used at the same time since that would be mostly redundant. @@ -1112,24 +1122,22 @@ provides layer 2 tunneling of Ethernet frames over IP. GRE [RFC2784], MPLS [RFC4023], and L2TP [RFC2661] provide methods for tunneling layer 2 and layer 3 packets over IP. NVGRE [RFC7637] and VXLAN [RFC7348] are proposals for encapsulation of layer 2 packets for network virtualization. IPIP [RFC2003] and Generic packet tunneling in IPv6 [RFC2473] provide methods for tunneling IP packets over IP. Several proposals exist for encapsulating packets over UDP including ESP over UDP [RFC3948], TCP directly over UDP [TCPUDP], VXLAN [RFC7348], LISP [RFC6830] which encapsulates layer 3 packets, - MPLS/UDP [RFC7510], GENEVE [GENEVE], and Generic UDP Encapsulation - for IP Tunneling (GRE over UDP)[RFC8086]. Generic UDP tunneling [GUT] - is a proposal similar to GUE in that it aims to tunnel packets of IP - protocols over UDP. + MPLS/UDP [RFC7510], GENEVE [GENEVE], and GRE-in-UDP Encapsulation + [RFC8086]. GUE has the following discriminating features: o UDP encapsulation leverages specialized network device processing for efficient transport. The semantics for using the UDP source port for flow entropy as input to ECMP are defined in section 5.11. 
o GUE permits encapsulation of arbitrary IP protocols, which includes layer 2, 3, and 4 protocols. @@ -1154,20 +1162,23 @@ o GUE includes both data messages (encapsulation of packets) and control messages (such as OAM). o The flags-field model facilitates efficient implementation of extensibility in hardware. For instance, a TCAM can be used to parse a known set of N flags where the number of entries in the TCAM is 2^N. By comparison, the number of TCAM entries needed to parse a set of N arbitrarily ordered TLVs is approximately e*N!. + o GUE includes a variant that encapsulates IPv4 and IPv6 packets + directly within UDP. + 7. Security Considerations There are two important security considerations with respect to GUE. o Authentication and integrity of the GUE header. o Authentication, integrity, and confidentiality of the GUE payload. @@ -1222,39 +1233,30 @@ 8.3. Control types IANA is requested to set up a registry for the GUE control types. Control types are 8-bit values. New values for control types 1-127 are assigned in accordance with RFC Required policy [RFC5226]. +----------------+------------------+---------------+ | Control type | Description | Reference | +----------------+------------------+---------------+ - | 0 | Need further | This document | + | 0 | Control payload | This document | + | | needs more | | + | | context for | | | | interpretation | | | | | | | 1..127 | Unassigned | | | | | | | 128..255 | User defined | This document | +----------------+------------------+---------------+ -8.4. Flag-fields - - IANA is requested to create a "GUE flag-fields" registry to allocate - flags and extension fields used with GUE. This shall be a registry of - bit assignments for flags, length of extension fields for - corresponding flags, and descriptive strings. There are sixteen bits - for primary GUE header flags (bit number 0-15). New values are - assigned in accordance with RFC Required policy [RFC5226]. 
New flags - should be allocated from high to low order bit contiguously without - holes. [GUEXTENS] requests an initial set of flag assignments. - 9. Acknowledgements The authors would like to thank David Liu, Erik Nordmark, Fred Templin, Adrian Farrel, Bob Briscoe, and Murray Kucherawy for valuable input on this draft. 10. References 10.1. Normative References @@ -1409,23 +1411,20 @@ "Encapsulation of TCP and other Transport Protocols over UDP" draft-cheshire-tcp-over-udp-00 [TOU] Herbert, T., "Transport layer protocols over UDP" draft- herbert-transports-over-udp-00 [GENEVE] Gross, J., Ed., Ganga, I. Ed., and Sridhar, T., "Geneve: Generic Network Virtualization Encapsulation", draft-ietf- nvo3-geneve-05 - [GUT] Manner, J., Varia, N., and Briscoe, B., "Generic UDP - Tunnelling (GUT) draft-manner-tsvwg-gut-02.txt" - [LCO] Cree, E., https://www.kernel.org/doc/Documentation/ networking/checksum-offloads.txt Appendix A: NIC processing for GUE This appendix provides some guidelines for Network Interface Cards (NICs) to implement common offloads and accelerations to support GUE. Note that most of this discussion is generally applicable to other methods of UDP based encapsulation. @@ -1443,21 +1442,21 @@ GUE encapsulation is compatible with multi-queue NICs that support five-tuple hash calculation for UDP/IP packets as input to RSS. The flow entropy in the UDP source port ensures classification of the encapsulated flow even in the case that the outer source and destination addresses are the same for all flows (e.g. all flows are going over a single tunnel). By default, UDP RSS support is often disabled in NICs to avoid out- of-order reception that can occur when UDP packets are fragmented. As - discussed above, fragmentation of GUE packets is be mostly avoided by + discussed above, fragmentation of GUE packets is mostly avoided by fragmenting packets before entering a tunnel, GUE fragmentation, path MTU discovery in higher layer protocols, or operator adjusting MTUs. 
Other UDP traffic might not implement such procedures to avoid fragmentation, so enabling UDP RSS support in the NIC might be a considered tradeoff during configuration. A.2. Checksum offload Many NICs provide capabilities to calculate standard ones complement payload checksum for packets in transmit or receive. When using GUE @@ -1610,46 +1608,46 @@ deprecated. To approximate this behavior, an implementation could restrict a user from sending a packet destined to the GUE port without proper credentials. B.2. Setting flow entropy as a route selector An encapsulator generating flow entropy in the UDP source port could modulate the value to perform a type of multipath source routing. Assuming that networking switches perform ECMP based on the flow hash, a sender can affect the path by altering the flow entropy. For - instance, a host can store a flow hash in its PCB for an inner flow, - and might alter the value upon detecting that packets are traversing - a lossy path. Changing the flow entropy for a flow SHOULD be subject - to hysteresis (at most once every thirty seconds) to limit the number - of out of order packets. + instance, a host can store a flow hash in its protocol control block + (PCB) for an inner flow, and might alter the value upon detecting + that packets are traversing a lossy path. Changing the flow entropy + for a flow SHOULD be subject to hysteresis (at most once every thirty + seconds) to limit the number of out of order packets. B.3. Hardware protocol implementation considerations - Low level data path protocol, such as GUE, are often supported in + Low level data path protocols, such as GUE, are often supported in high speed network device hardware. Variable length header (VLH) protocols like GUE are often considered difficult to efficiently implement in hardware. In order to retain the important characteristics of an extensible and robust protocol, hardware vendors may practice "constrained flexibility". 
In this model, only certain combinations or protocol header parameterizations are implemented in hardware fast path. Each such parameterization is fixed length so that the particular instance can be optimized as a fixed length protocol. In the case of GUE this constitutes specific combinations of GUE flags, fields, and next protocol. The selected combinations would naturally be the most common cases which form the "fast path", and other combinations are assumed to take the "slow path". In time, needs and requirements of the protocol may change which may manifest themselves as new parameterizations to be supported in the - fast path. To allow allow this extensibility, a device practicing + fast path. To allow this extensibility, a device practicing constrained flexibility should allow the fast path parameterizations to be programmable. Authors' Addresses Tom Herbert Quantonium 4701 Patrick Henry Santa Clara, CA 95054 US