draft-ietf-intarea-gue-06.txt | draft-ietf-intarea-gue-07.txt | |||
---|---|---|---|---|
Internet Area WG T. Herbert | Internet Area WG T. Herbert | |||
Internet-Draft Quantonium | Internet-Draft Quantonium | |||
Intended status: Standard track L. Yong | Intended status: Standard track L. Yong | |||
Expires March 4, 2019 Huawei USA | Expires September 8, 2019 Independent | |||
O. Zia | O. Zia | |||
Microsoft | Microsoft | |||
August 31, 2018 | March 7, 2019 | |||
Generic UDP Encapsulation | Generic UDP Encapsulation | |||
draft-ietf-intarea-gue-06 | draft-ietf-intarea-gue-07 | |||
Status of this Memo | Status of this Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
Drafts. | Drafts. | |||
skipping to change at page 1, line 35 ¶ | skipping to change at page 1, line 35 ¶ | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt | http://www.ietf.org/ietf/1id-abstracts.txt | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html | http://www.ietf.org/shadow.html | |||
This Internet-Draft will expire on March 4, 2019. | This Internet-Draft will expire on September 8, 2019. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. | to this document. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
skipping to change at page 3, line 21 ¶ | skipping to change at page 3, line 21 ¶ | |||
efficient handling of UDP packets can be leveraged. GUE specifies | efficient handling of UDP packets can be leveraged. GUE specifies | |||
basic encapsulation methods upon which higher level constructs, such | basic encapsulation methods upon which higher level constructs, such | |||
as tunnels and overlay networks for network virtualization, can be | as tunnels and overlay networks for network virtualization, can be | |||
constructed. GUE is extensible by allowing optional data fields as | constructed. GUE is extensible by allowing optional data fields as | |||
part of the encapsulation, and is generic in that it can encapsulate | part of the encapsulation, and is generic in that it can encapsulate | |||
packets of various IP protocols. | packets of various IP protocols. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
1.1. Terminology and acronyms . . . . . . . . . . . . . . . . . 5 | 1.1. Applicability . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
1.2. Requirements Language . . . . . . . . . . . . . . . . . . 6 | 1.2. Terminology and acronyms . . . . . . . . . . . . . . . . . 6 | |||
2. Base packet format . . . . . . . . . . . . . . . . . . . . . . 7 | 1.3. Requirements Language . . . . . . . . . . . . . . . . . . . 7 | |||
2.1. GUE variant . . . . . . . . . . . . . . . . . . . . . . . . 7 | 2. Base packet format . . . . . . . . . . . . . . . . . . . . . . 8 | |||
3. Variant 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 2.1. GUE variant . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
3.1. Header format . . . . . . . . . . . . . . . . . . . . . . . 8 | 3. Variant 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
3.2. Proto/ctype field . . . . . . . . . . . . . . . . . . . . . 9 | 3.1. Header format . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
3.2.1 Proto field . . . . . . . . . . . . . . . . . . . . . . 9 | 3.2. Proto/ctype field . . . . . . . . . . . . . . . . . . . . . 10 | |||
3.2.2 Ctype field . . . . . . . . . . . . . . . . . . . . . . 10 | 3.2.1. Proto field . . . . . . . . . . . . . . . . . . . . . . 10 | |||
3.2.2. Ctype field . . . . . . . . . . . . . . . . . . . . . . 11 | ||||
3.3. Flags and extension fields . . . . . . . . . . . . . . . . 11 | 3.3. Flags and extension fields . . . . . . . . . . . . . . . . 11 | |||
3.3.1. Requirements . . . . . . . . . . . . . . . . . . . . . 11 | 3.3.1. Requirements . . . . . . . . . . . . . . . . . . . . . 11 | |||
3.3.2. Example GUE header with extension fields . . . . . . . 11 | 3.3.2. Example GUE header with extension fields . . . . . . . 12 | |||
3.4. Private data . . . . . . . . . . . . . . . . . . . . . . . 12 | 3.4. Private data . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
3.5. Message types . . . . . . . . . . . . . . . . . . . . . . . 13 | 3.5. Message types . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
3.5.1. Control messages . . . . . . . . . . . . . . . . . . . 13 | 3.5.1. Control messages . . . . . . . . . . . . . . . . . . . 13 | |||
3.5.2. Data messages . . . . . . . . . . . . . . . . . . . . . 13 | 3.5.2. Data messages . . . . . . . . . . . . . . . . . . . . . 14 | |||
3.6. Hiding the transport layer protocol number . . . . . . . . 13 | 4. Variant 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 | |||
4. Variant 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | ||||
4.1. Direct encapsulation of IPv4 . . . . . . . . . . . . . . . 15 | 4.1. Direct encapsulation of IPv4 . . . . . . . . . . . . . . . 15 | |||
4.2. Direct encapsulation of IPv6 . . . . . . . . . . . . . . . 16 | 4.2. Direct encapsulation of IPv6 . . . . . . . . . . . . . . . 16 | |||
5. Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 | 5. Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
5.1. Network tunnel encapsulation . . . . . . . . . . . . . . . 17 | 5.1. Network tunnel encapsulation . . . . . . . . . . . . . . . 17 | |||
5.2. Transport layer encapsulation . . . . . . . . . . . . . . . 17 | 5.2. Transport layer encapsulation . . . . . . . . . . . . . . . 17 | |||
5.3. Encapsulator operation . . . . . . . . . . . . . . . . . . 18 | 5.3. Encapsulator operation . . . . . . . . . . . . . . . . . . 18 | |||
5.4. Decapsulator operation . . . . . . . . . . . . . . . . . . 18 | 5.4. Decapsulator operation . . . . . . . . . . . . . . . . . . 18 | |||
5.4.1. Processing a received data message . . . . . . . . . . 18 | 5.4.1. Processing a received data message . . . . . . . . . . 18 | |||
5.4.2. Processing a received control message . . . . . . . . . 19 | 5.4.2. Processing a received control message . . . . . . . . . 19 | |||
5.5. Router and switch operation . . . . . . . . . . . . . . . . 19 | 5.5. Middlebox inspection . . . . . . . . . . . . . . . . . . . 19 | |||
5.6. Middlebox interactions . . . . . . . . . . . . . . . . . . 20 | 5.6. Router and switch operation . . . . . . . . . . . . . . . . 20 | |||
5.6.1. Inferring connection semantics . . . . . . . . . . . . 20 | 5.6.1. Connection semantics . . . . . . . . . . . . . . . . . 20 | |||
5.6.2. NAT . . . . . . . . . . . . . . . . . . . . . . . . . . 20 | 5.6.2. NAT . . . . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
5.7. Checksum Handling . . . . . . . . . . . . . . . . . . . . . 20 | 5.7. Checksum Handling . . . . . . . . . . . . . . . . . . . . . 21 | |||
5.7.1. Requirements . . . . . . . . . . . . . . . . . . . . . 21 | 5.7.1. Requirements . . . . . . . . . . . . . . . . . . . . . 21 | |||
5.7.2. UDP Checksum with IPv4 . . . . . . . . . . . . . . . . 21 | 5.7.2. UDP Checksum with IPv4 . . . . . . . . . . . . . . . . 21 | |||
5.7.3. UDP Checksum with IPv6 . . . . . . . . . . . . . . . . 22 | 5.7.3. UDP Checksum with IPv6 . . . . . . . . . . . . . . . . 22 | |||
5.8. MTU and fragmentation . . . . . . . . . . . . . . . . . . . 22 | 5.8. MTU and fragmentation . . . . . . . . . . . . . . . . . . . 22 | |||
5.9. Congestion control . . . . . . . . . . . . . . . . . . . . 22 | 5.9. Congestion control . . . . . . . . . . . . . . . . . . . . 23 | |||
5.10. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 23 | 5.10. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
5.11. Flow entropy for ECMP . . . . . . . . . . . . . . . . . . 23 | 5.11. Flow entropy for ECMP . . . . . . . . . . . . . . . . . . 23 | |||
5.11.1. Flow classification . . . . . . . . . . . . . . . . . 23 | 5.11.1. Flow classification . . . . . . . . . . . . . . . . . 24 | |||
5.11.2. Flow entropy properties . . . . . . . . . . . . . . . 24 | 5.11.2. Flow entropy properties . . . . . . . . . . . . . . . 24 | |||
5.12 Negotiation of acceptable flags and extension fields . . . 25 | 5.12. Negotiation of acceptable flags and extension fields . . . 25 | |||
6. Motivation for GUE . . . . . . . . . . . . . . . . . . . . . . 26 | 6. Motivation for GUE . . . . . . . . . . . . . . . . . . . . . . 26 | |||
6.1. Benefits of GUE . . . . . . . . . . . . . . . . . . . . . . 26 | 6.1. Benefits of GUE . . . . . . . . . . . . . . . . . . . . . . 26 | |||
6.2 Comparison of GUE to other encapsulations . . . . . . . . . 26 | 6.2. Comparison of GUE to other encapsulations . . . . . . . . . 26 | |||
7. Security Considerations . . . . . . . . . . . . . . . . . . . . 28 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . . 28 | |||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 28 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 28 | |||
8.1. UDP source port . . . . . . . . . . . . . . . . . . . . . . 28 | 8.1. UDP source port . . . . . . . . . . . . . . . . . . . . . . 28 | |||
8.2. GUE variant number . . . . . . . . . . . . . . . . . . . . 29 | 8.2. GUE variant number . . . . . . . . . . . . . . . . . . . . 29 | |||
8.3. Control types . . . . . . . . . . . . . . . . . . . . . . . 29 | 8.3. Control types . . . . . . . . . . . . . . . . . . . . . . . 29 | |||
9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 29 | 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 29 | |||
10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 30 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 30 | |||
10.1. Normative References . . . . . . . . . . . . . . . . . . . 30 | 10.1. Normative References . . . . . . . . . . . . . . . . . . . 30 | |||
10.2. Informative References . . . . . . . . . . . . . . . . . . 30 | 10.2. Informative References . . . . . . . . . . . . . . . . . . 31 | |||
Appendix A: NIC processing for GUE . . . . . . . . . . . . . . . . 33 | Appendix A: NIC processing for GUE . . . . . . . . . . . . . . . . 34 | |||
A.1. Receive multi-queue . . . . . . . . . . . . . . . . . . . . 33 | A.1. Receive multi-queue . . . . . . . . . . . . . . . . . . . . 34 | |||
A.2. Checksum offload . . . . . . . . . . . . . . . . . . . . . 34 | A.2. Checksum offload . . . . . . . . . . . . . . . . . . . . . 34 | |||
A.2.1. Transmit checksum offload . . . . . . . . . . . . . . . 34 | A.2.1. Transmit checksum offload . . . . . . . . . . . . . . . 35 | |||
A.2.2. Receive checksum offload . . . . . . . . . . . . . . . 35 | A.2.2. Receive checksum offload . . . . . . . . . . . . . . . 35 | |||
A.3. Transmit Segmentation Offload . . . . . . . . . . . . . . . 35 | A.3. Transmit Segmentation Offload . . . . . . . . . . . . . . . 36 | |||
A.4. Large Receive Offload . . . . . . . . . . . . . . . . . . . 36 | A.4. Large Receive Offload . . . . . . . . . . . . . . . . . . . 37 | |||
Appendix B: Implementation considerations . . . . . . . . . . . . 36 | Appendix B: Implementation considerations . . . . . . . . . . . . 37 | |||
B.1. Privileged ports . . . . . . . . . . . . . . . . . . . . . 37 | B.1. Privileged ports . . . . . . . . . . . . . . . . . . . . . 37 | |||
B.2. Setting flow entropy as a route selector . . . . . . . . . 37 | B.2. Setting flow entropy as a route selector . . . . . . . . . 38 | |||
B.3. Hardware protocol implementation considerations . . . . . . 37 | B.3. Hardware protocol implementation considerations . . . . . . 38 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 38 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 39 | |||
1. Introduction | 1. Introduction | |||
This specification describes Generic UDP Encapsulation (GUE) which is | This specification describes Generic UDP Encapsulation (GUE) which is | |||
a general method for encapsulating packets of arbitrary IP protocols | a general method for encapsulating packets of arbitrary IP protocols | |||
within User Datagram Protocol (UDP) [RFC0768] packets. Encapsulating | within User Datagram Protocol (UDP) [RFC0768] packets. Encapsulating | |||
packets in UDP facilitates efficient transport across networks. | packets in UDP facilitates efficient transport across networks. | |||
Networking devices widely provide protocol specific processing and | Networking devices widely provide protocol specific processing and | |||
optimizations for UDP (as well as TCP) packets. Packets for atypical | optimizations for UDP (as well as TCP) packets. Packets for atypical | |||
IP protocols (those not usually parsed by networking hardware) can be | IP protocols (those not usually parsed by networking hardware) can be | |||
encapsulated in UDP packets to maximize deliverability and to | encapsulated in UDP packets to maximize deliverability and to | |||
leverage flow specific mechanisms for routing and packet steering. | leverage flow specific mechanisms for routing and packet steering. | |||
GUE provides an extensible header format for including optional data | GUE provides an extensible header format for including optional data | |||
in the encapsulation header. This data potentially covers items such | in the encapsulation header. This data potentially covers items such | |||
as the virtual networking identifier, security data for validating or | as a virtual networking identifier, security data for validating or | |||
authenticating the GUE header, congestion control data, etc. GUE also | authenticating the GUE header, congestion control data, etc. GUE also | |||
allows private optional data in the encapsulation header. This | allows private optional data in the encapsulation header. This | |||
feature can be used by a site or implementation to define local | feature can be used by a site or implementation to define local | |||
custom optional data, and allows experimentation with options that may | custom optional data, and allows experimentation with options that may | |||
eventually become standard. | eventually become standard. | |||
This document does not define any specific GUE extensions. [GUEEXTEN] | This document does not define any specific GUE extensions. [GUEEXTEN] | |||
specifies a set of initial extensions. | specifies a set of initial extensions. | |||
The motivation for the GUE protocol is described in section 6. | 1.1. Applicability | |||
1.1. Terminology and acronyms | GUE is a network encapsulation protocol that encapsulates packets for | |||
various IP protocols. Potential use cases include network tunneling, | ||||
multi-tenant network virtualization, tunneling for mobility, and | ||||
transport layer encapsulation. GUE is intended for deploying overlay | ||||
networks in public or private data center environments, as well as | ||||
providing a general tunneling mechanism usable in the Internet. | ||||
GUE is a UDP based encapsulation protocol transported over existing | ||||
IPv4 and IPv6 networks. Hence, as a UDP based protocol, GUE adheres | ||||
to the UDP usage guidelines as specified in [RFC8085]. Applicability | ||||
of these guidelines is dependent on the underlay IP network and the | ||||
nature of the GUE payload protocol (for example TCP/IP or IP/Ethernet). | ||||
[RFC8085] outlines two applicability scenarios for UDP applications, | ||||
1) general Internet and 2) controlled environment. GUE is intended to | ||||
allow deployment in both controlled environments and in the | ||||
uncontrolled Internet. The requirements of [RFC8085] pertaining to | ||||
deployment of a UDP encapsulation protocol in these environments are | ||||
applicable. Section 5 provides the specifics for satisfying | ||||
requirements of [RFC8085]. It is the responsibility of the operator | ||||
deploying GUE to ensure that the necessary operational requirements | ||||
are met for the environment in which GUE is being deployed. | ||||
GUE has much of the same applicability and benefits as GRE-in-UDP | ||||
[RFC8086] that are afforded by UDP encapsulation protocols. GUE | ||||
offers the possibility of good performance for load-balancing | ||||
encapsulated IP traffic in transit networks using existing Equal-Cost | ||||
Multipath (ECMP) mechanisms that use a hash of the five-tuple of | ||||
source IP address, destination IP address, UDP/TCP source port, | ||||
UDP/TCP destination port, and protocol number. Encapsulating packets | ||||
in UDP enables use of the UDP source port to provide entropy to ECMP | ||||
hashing. | ||||
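The use of the UDP source port as ECMP entropy can be illustrated with a minimal sketch. It assumes that an encapsulator derives the outer source port from a hash of the inner flow's five-tuple, so that all packets of one inner flow follow the same path. The function name, the choice of SHA-256, and the 49152-65535 port range are assumptions made for illustration; they are not taken from this section.

```python
import hashlib
import ipaddress
import struct


def entropy_source_port(src_ip, dst_ip, proto, sport, dport,
                        lo=49152, hi=65535):
    """Map an inner five-tuple to an outer UDP source port.

    Hypothetical helper: the hash function and the port range are
    illustrative assumptions, not requirements stated in the text.
    """
    # Serialize the five-tuple into a stable byte string.
    key = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed
           + struct.pack("!BHH", proto, sport, dport))
    # Take 32 bits of the digest and fold them into the port range.
    value = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return lo + value % (hi - lo + 1)
```

Because the mapping is deterministic, every packet of a given inner flow receives the same outer source port, while different flows are spread across the range and therefore across ECMP paths.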
In addition, GUE enables extending the use of atypical IP protocols | ||||
(those other than TCP and UDP) across networks that might otherwise | ||||
filter packets carrying those protocols. GUE may also be used with | ||||
connection oriented UDP semantics in order to facilitate traversal | ||||
through stateful firewalls and stateful NAT. | ||||
Additional motivation for the GUE protocol is provided in section 6. | ||||
1.2. Terminology and acronyms | ||||
GUE Generic UDP Encapsulation | GUE Generic UDP Encapsulation | |||
GUE Header A variable length protocol header that is composed | GUE Header A variable length protocol header that is composed | |||
of a primary four byte header and zero or more four | of a primary four byte header and zero or more four | |||
byte words for optional header data | byte words of optional header data | |||
GUE packet A UDP/IP packet that contains a GUE header and GUE | GUE packet A UDP/IP packet that contains a GUE header and GUE | |||
payload within the UDP payload | payload within the UDP payload | |||
GUE variant A version of the GUE protocol or an alternate form | GUE variant A version of the GUE protocol or an alternate form | |||
of a version | of a version | |||
Encapsulator A network node that encapsulates packets in GUE | Encapsulator A network node that encapsulates packets in GUE | |||
Decapsulator A network node that decapsulates and processes | Decapsulator A network node that decapsulates and processes | |||
packets encapsulated in GUE | packets encapsulated in GUE | |||
Data message An encapsulated packet in the GUE payload that is | Data message An encapsulated packet in a GUE payload that is | |||
addressed to the protocol stack for an associated | addressed to the protocol stack for an associated | |||
protocol | protocol | |||
Control message A formatted message in the GUE payload that is | Control message A formatted message in the GUE payload that is | |||
implicitly addressed to the decapsulator to monitor | implicitly addressed to the decapsulator to monitor | |||
or control the state or behavior of a tunnel | or control the state or behavior of a tunnel | |||
Flags A set of bit flags in the primary GUE header | Flags A set of bit flags in the primary GUE header | |||
Extension field | Extension field | |||
An optional field in a GUE header whose presence is | An optional field in a GUE header whose presence is | |||
indicated by corresponding flag(s) | indicated by corresponding flag(s) | |||
C-bit A single bit flag in the primary GUE header that | C-bit A single bit flag in the primary GUE header that | |||
indicates whether the GUE packet contains a control | indicates whether the GUE packet contains a control | |||
message or data message | message or data message | |||
Hlen A field in the primary GUE header that gives the | Hlen A field in the primary GUE header that gives the | |||
length of the GUE header | length of the GUE header | |||
skipping to change at page 6, line 39 ¶ | skipping to change at page 7, line 32 ¶ | |||
Outer IP header Refers to the outermost IP header or packet when | Outer IP header Refers to the outermost IP header or packet when | |||
encapsulating a packet over IP | encapsulating a packet over IP | |||
Inner IP header Refers to an encapsulated IP header when an IP | Inner IP header Refers to an encapsulated IP header when an IP | |||
packet is encapsulated | packet is encapsulated | |||
Outer packet Refers to an encapsulating packet | Outer packet Refers to an encapsulating packet | |||
Inner packet Refers to a packet that is encapsulated | Inner packet Refers to a packet that is encapsulated | |||
1.2. Requirements Language | 1.3. Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
2. Base packet format | 2. Base packet format | |||
A GUE packet is comprised of a UDP packet whose payload is a GUE | A GUE packet is comprised of a UDP packet whose payload is a GUE | |||
header followed by a payload which is either an encapsulated packet | header followed by a payload which is either an encapsulated packet | |||
of some IP protocol or a control message such as an OAM (Operations, | of some IP protocol or a control message such as an OAM (Operations, | |||
skipping to change at page 7, line 29 ¶ | skipping to change at page 8, line 29 ¶ | |||
| GUE Header | | | GUE Header | | |||
| | | | | | |||
|-------------------------------| | |-------------------------------| | |||
| | | | | | |||
| Encapsulated packet | | | Encapsulated packet | | |||
| or control message | | | or control message | | |||
| | | | | | |||
+-------------------------------+ | +-------------------------------+ | |||
The GUE header is variable length as determined by the presence of | The GUE header is variable length as determined by the presence of | |||
optional extension fields. | optional extension fields and private data. | |||
2.1. GUE variant | 2.1. GUE variant | |||
The first two bits of the GUE header contain the GUE protocol variant | The first two bits of the GUE header contain the GUE protocol variant | |||
number. The variant number can indicate the version of the GUE | number. The variant number can indicate the version of the GUE | |||
protocol as well as alternate forms of a version. | protocol as well as alternate forms of a version. | |||
Variants 0 and 1 are described in this specification; variants 2 and | Variants 0 and 1 are described in this specification; variants 2 and | |||
3 are reserved. | 3 are reserved. | |||
skipping to change at page 8, line 21 ¶ | skipping to change at page 9, line 21 ¶ | |||
The header format for variant 0 of GUE in UDP is: | The header format for variant 0 of GUE in UDP is: | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\ | |||
| Source port | Destination port | | | | Source port | Destination port | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP | |||
| Length | Checksum | | | | Length | Checksum | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/ | |||
| 0 |C| Hlen | Proto/ctype | Flags | | | 0 |C| Hlen | Proto/ctype | Flags |\ | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | |||
| | | | | | | |||
~ Extensions Fields (optional) ~ | ~ Extensions Fields (optional) ~ | | |||
| | | | | GUE | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | |||
| | | | | | | |||
~ Private data (optional) ~ | ~ Private data (optional) ~ | | |||
| | | | | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/ | |||
The contents of the UDP header are: | The contents of the UDP header are: | |||
o Source port: If connection semantics (section 5.6.1) are applied | o Source port: If connection semantics (section 5.6.1) are applied | |||
to an encapsulation, this is set to the local source port for | to an encapsulation, this is set to the local source port for | |||
the connection. When connection semantics are not applied, the | the connection. When connection semantics are not applied, the | |||
source port is either set to a flow entropy value as described | source port is either set to a flow entropy value, as described | |||
in section 5.11, or it should be set to the GUE assigned port | in section 5.11, or is set to the GUE assigned port number, | |||
number, 6080. | 6080. | |||
o Destination port: If connection semantics (section 5.6.1) are | o Destination port: If connection semantics (section 5.6.1) are | |||
applied to an encapsulation, this is set to the destination port | applied to an encapsulation, this is set to the destination port | |||
for the tuple. If connection semantics are not applied this is | for the tuple. If connection semantics are not applied then the | |||
set to the GUE assigned port number, 6080. | destination port is set to the GUE assigned port number, 6080. | |||
o Length: Canonical length of the UDP packet (length of UDP header | o Length: Canonical length of the UDP packet (length of UDP header | |||
and payload). | and payload). | |||
o Checksum: Standard UDP checksum (handling is described in | o Checksum: Standard UDP checksum (handling is described in | |||
section 5.7). | section 5.7). | |||
The GUE header consists of: | The GUE header consists of: | |||
o Variant: 0 indicates GUE protocol version 0 with a header. | o Variant: 0 indicates GUE protocol version 0 with a header. | |||
o C: C-bit: When set indicates a control message, not set | o C: C-bit: When set indicates a control message. When not set | |||
indicates a data message. | indicates a data message. | |||
o Hlen: Length in 32-bit words of the GUE header, including | o Hlen: Length in 32-bit words of the GUE header, including | |||
optional extension fields but not the first four bytes of the | optional extension fields but not the first four bytes of the | |||
header. Computed as (header_len - 4) / 4, where header_len is | header. Computed as (header_len - 4) / 4, where header_len is | |||
the total header length in bytes. All GUE headers are a multiple | the total header length in bytes. All GUE headers are a multiple | |||
of four bytes in length. Maximum header length is 128 bytes. | of four bytes in length. Maximum header length is 128 bytes. | |||
o Proto/ctype: When the C-bit is set, this field contains a | o Proto/ctype: When the C-bit is set, this field contains a | |||
control message type for the payload (section 3.2.2). When the | control message type for the payload (section 3.2.2). When the | |||
C-bit is not set, the field holds the Internet protocol number | C-bit is not set, the field holds the Internet protocol number | |||
for the encapsulated packet in the payload (section 3.2.1). The | for the encapsulated packet in the payload (section 3.2.1). The | |||
control message or encapsulated packet begins at the offset | control message or encapsulated packet begins at the offset | |||
provided by Hlen. | provided by Hlen. | |||
o Flags: Header flags that may be allocated for various purposes | o Flags: Header flags that may be allocated for various purposes | |||
and may indicate presence of extension fields. Undefined header | and may indicate the presence of extension fields. Undefined | |||
flag bits MUST be set to zero on transmission. | header flag bits MUST be set to zero on transmission. | |||
o Extension Fields: Optional fields whose presence is indicated by | o Extension Fields: Optional fields whose presence is indicated by | |||
corresponding flags. | corresponding flags. | |||
o Private data: Optional private data block (see section 3.4). If | o Private data: Optional private data block (see section 3.4). If | |||
the private block is present, it immediately follows the last | the private block is present, it immediately follows the last | |||
extension field present in the header. The private block is | extension field present in the header. The private block is | |||
considered to be part of the GUE header. The length of this data | considered to be part of the GUE header. The length of this data | |||
is determined by subtracting the starting offset from the header | is determined by subtracting the starting offset of the private | |||
length. | data from the header length. | |||
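The field descriptions above can be sketched as a small parser for the primary four bytes of a variant 0 GUE header. The bit widths used here are assumptions inferred from the text: a two-bit variant, a one-bit C flag, a five-bit Hlen (which matches the 128-byte maximum, since 4 + 31 * 4 = 128), an eight-bit proto/ctype, and the remaining sixteen bits as flags. The function and key names are hypothetical.

```python
import struct

GUE_PORT = 6080  # GUE assigned UDP port number quoted in the text


def hlen_from_header_len(header_len: int) -> int:
    """Hlen = (header_len - 4) / 4, with header_len in bytes."""
    if header_len < 4 or header_len > 128 or header_len % 4:
        raise ValueError("GUE header must be 4..128 bytes, multiple of 4")
    return (header_len - 4) // 4


def parse_gue_v0(udp_payload: bytes) -> dict:
    """Split a UDP payload into GUE header fields and GUE payload.

    Assumed layout of the first 32-bit word (not spelled out bit by
    bit in the text): variant(2) | C(1) | Hlen(5) | proto/ctype(8) |
    flags(16), in network byte order.
    """
    if len(udp_payload) < 4:
        raise ValueError("too short for a primary GUE header")
    first, proto_ctype, flags = struct.unpack("!BBH", udp_payload[:4])
    variant = first >> 6
    if variant != 0:
        raise ValueError("not a variant 0 GUE header")
    hlen = first & 0x1F
    header_len = 4 + 4 * hlen  # inverse of the Hlen formula
    if header_len > len(udp_payload):
        raise ValueError("Hlen exceeds the UDP payload")
    return {
        "is_control": bool(first & 0x20),  # C-bit
        "hlen": hlen,
        "header_len": header_len,
        "proto_ctype": proto_ctype,
        "flags": flags,
        # Extension fields followed by any private data.
        "ext_and_private": udp_payload[4:header_len],
        # Encapsulated packet or control message.
        "payload": udp_payload[header_len:],
    }
```

Splitting extension fields from private data is left out on purpose: doing so requires knowing the length of each extension field selected by the flags, which this document does not define.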
3.2. Proto/ctype field | 3.2. Proto/ctype field | |||
The proto/ctype field contains either an Internet protocol number | The proto/ctype field contains either an Internet protocol number | |||
(when the C-bit is not set) or GUE control message type (when the C- | (when the C-bit is not set) or GUE control message type (when the C- | |||
bit is set). | bit is set). | |||
3.2.1 Proto field | 3.2.1. Proto field | |||
When the C-bit is not set, the proto/ctype field MUST contain an IANA | When the C-bit is not set, the proto/ctype field MUST contain an IANA | |||
Internet Protocol Number. The protocol number is interpreted relative | Internet Protocol Number [IANA-PN]. The protocol number is | |||
to the IP protocol that encapsulates the UDP packet (i.e. protocol of | interpreted relative to the IP protocol that encapsulates the UDP | |||
the outer IP header). The protocol number serves as an indication of | packet (i.e. protocol of the outer IP header). The protocol number | |||
the type of the next protocol header which is contained in the GUE | serves as an indication of the type of the next protocol header which | |||
payload at the offset indicated in Hlen. Intermediate devices MAY | is contained in the GUE payload at the offset indicated in Hlen. | |||
parse the GUE payload per the number in the proto/ctype field, and | ||||
header flags cannot affect the interpretation of the proto/ctype | ||||
field. | ||||
When the outer IP protocol is IPv4, the proto field MUST be set to a | ||||
valid IP protocol number usable with IPv4; it MUST NOT be set to a | ||||
number for IPv6 extension headers or ICMPv6 options (number 58). An | ||||
exception is that the destination options extension header using the | ||||
PadN option MAY be used with IPv4 as described in section 3.6. The | ||||
"no next header" protocol number (59) also MAY be used with IPv4 as | ||||
described below. | ||||
When the outer IP protocol is IPv6, the proto field can be set to any | ||||
defined protocol number except that it MUST NOT be set to Hop-by-hop | ||||
options (number 0). If a received GUE packet in IPv6 contains a | ||||
protocol number that is an extension header (e.g. Destination | ||||
Options) then the extension header is processed after the GUE header | ||||
is processed as though the GUE header is an extension header. | ||||
IP protocol number 59 ("No next header") can be set to indicate that | IP protocol number 59 ("No next header") can be set to indicate that | |||
the GUE payload does not begin with the header of an IP protocol. | the GUE payload does not begin with the header of an IP protocol. | |||
This would be the case, for instance, if the GUE payload were a | This would be the case, for instance, if the GUE payload were a | |||
fragment when performing GUE level fragmentation. The interpretation | fragment when performing GUE level fragmentation. The interpretation | |||
of the payload is performed through other means (such as flags and | of the payload is performed through other means such as flags and | |||
extension fields), and intermediate devices MUST NOT parse packets | extension fields, and nodes MUST NOT parse packets based on the IP | |||
based on the IP protocol number in this case. | protocol number in this case. | |||
3.2.2 Ctype field | 3.2.2. Ctype field | |||
When the C-bit is set, the proto/ctype field MUST be set to a valid | When the C-bit is set, the proto/ctype field MUST be set to a valid | |||
control message type. A value of zero indicates that the GUE payload | control message type. A value of zero indicates that the GUE payload | |||
requires further interpretation to deduce the control type. This | requires further interpretation to deduce the control type. This | |||
might be the case when the payload is a fragment of a control | might be the case when the payload is a fragment of a control | |||
message, where only the reassembled packet can be interpreted as a | message, where only the reassembled packet can be interpreted as a | |||
control message. | control message. | |||
Control messages will be defined in an IANA registry. Control message | Control messages will be defined in an IANA registry. Control message | |||
types 1 through 127 may be defined in standards. Types 128 through | types 1 through 127 may be defined in standards. Types 128 through | |||
skipping to change at page 11, line 10 ¶ | skipping to change at page 11, line 39 ¶ | |||
message. Instead, it indicates that the GUE payload is a control | message. Instead, it indicates that the GUE payload is a control | |||
message, or part of a control message (as might be the case in GUE | message, or part of a control message (as might be the case in GUE | |||
fragmentation), that cannot be correctly parsed or interpreted | fragmentation), that cannot be correctly parsed or interpreted | |||
without additional context. | without additional context. | |||
3.3. Flags and extension fields | 3.3. Flags and extension fields | |||
Flags and associated extension fields are the primary mechanism of | Flags and associated extension fields are the primary mechanism of | |||
extensibility in GUE. As mentioned in section 3.1, GUE header flags | extensibility in GUE. As mentioned in section 3.1, GUE header flags | |||
indicate the presence of optional extension fields in the GUE header. | indicate the presence of optional extension fields in the GUE header. | |||
[GUEXTENS] defines an initial set of GUE extensions. | [GUEEXTEN] defines an initial set of GUE extensions. | |||
3.3.1. Requirements | 3.3.1. Requirements | |||
There are sixteen flag bits in the GUE header. Flags may indicate | There are sixteen flag bits in the GUE header. Flags may indicate | |||
presence of an extension fields. The size of an extension field | presence of extension fields. The size of an extension field | |||
indicated by a flag MUST be fixed. | indicated by a flag MUST be fixed in the specification of the flag. | |||
Flags can be paired together to allow different lengths for an | Flags can be paired together to allow different lengths for an | |||
extension field. For example, if two flag bits are paired, a field | extension field. For example, if two flag bits are paired, a field | |||
can possibly be three different lengths-- that is bit value of 00 | can possibly be three different lengths-- that is bit value of 00 | |||
indicates no field present; 01, 10, and 11 indicate three possible | indicates no field present; 01, 10, and 11 indicate three possible | |||
lengths for the field. Regardless of how flag bits are paired, the | lengths for the field. Regardless of how flag bits are paired, the | |||
lengths and offsets of optional fields corresponding to a set of | lengths and offsets of extension fields corresponding to a set of | |||
flags MUST be well defined. | flags MUST be well defined and deterministic. | |||
Extension fields are placed in order of the flags. New flags are to | Extension fields are placed in order of the flags. New flags are to | |||
be allocated from high to low order bit contiguously without holes. | be allocated from high to low order bit contiguously without holes. | |||
Flags allow random access, for instance to inspect the field | Flags allow random access, for instance to inspect the field | |||
corresponding to the Nth flag bit, an implementation only considers | corresponding to the Nth flag bit, an implementation only considers | |||
the previous N-1 flags to determine the offset. Flags after the Nth | the previous N-1 flags to determine the offset. Flags after the Nth | |||
flag are not pertinent in calculating the offset of the field for the | flag are not pertinent in calculating the offset of the field for the | |||
Nth flag. Random access of flags and fields permits processing of | Nth flag. Random access of flags and fields permits processing of | |||
optional extensions in an order that is independent of their position | optional extensions in an order that is independent of their position | |||
in the packet. | in the packet. | |||
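The offset rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the specification: the FLAG_GROUPS table and the function names are hypothetical, the 4-byte Group Identifier and the 8-byte Security field for value 001 follow the example in section 3.3.2, and the other Security sizes are placeholders.

```python
# Hypothetical table of flag groups, ordered from the high-order flag bit down.
# Each entry: (name, shift, width in bits, {flag value: field length in bytes}).
FLAG_GROUPS = [
    ("group_id", 15, 1, {1: 4}),
    ("security", 12, 3, {1: 8, 2: 16, 3: 32}),  # values 2 and 3 are placeholders
]


def _field_length(flags, group):
    """Length in bytes of the field selected by one flag group (0 if absent)."""
    _name, shift, width, sizes = group
    value = (flags >> shift) & ((1 << width) - 1)
    if value == 0:
        return 0
    if value not in sizes:
        raise ValueError("flag value has no defined field length")
    return sizes[value]


def field_offset(flags, index):
    """Byte offset of the field for FLAG_GROUPS[index] within the extension
    fields, or None if that field is not present.

    Only the flag groups before `index` are examined; later flags do not
    affect the result.
    """
    if _field_length(flags, FLAG_GROUPS[index]) == 0:
        return None
    return sum(_field_length(flags, g) for g in FLAG_GROUPS[:index])
```

Because each flag value maps to one fixed length, the offset of a field depends only on the flags that precede it, which is what makes random access possible.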
skipping to change at page 11, line 50 ¶ | skipping to change at page 12, line 30 ¶ | |||
field always holds an IP protocol number as an invariant). | field always holds an IP protocol number as an invariant). | |||
The set of available flags can be extended in the future by defining | The set of available flags can be extended in the future by defining | |||
a "flag extensions bit" that refers to a field containing a new set | a "flag extensions bit" that refers to a field containing a new set | |||
of flags. | of flags. | |||
3.3.2. Example GUE header with extension fields | 3.3.2. Example GUE header with extension fields | |||
An example GUE header for a data message encapsulating an IPv4 packet | An example GUE header for a data message encapsulating an IPv4 packet | |||
and containing the Group Identifier and Security extension fields | and containing the Group Identifier and Security extension fields | |||
(both defined in [GUEXTENS]) is shown below: | (both defined in [GUEEXTEN]) is shown below: | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| 0 |0| 3 | 94 |1|0 0 1| 0 | | | 0 |0| 3 | 4 |1|0 0 1| 0 | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Group Identifier | | | Group Identifier | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | | | | | |||
+ Security + | + Security + | |||
| | | | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
In the above example, the first flag bit is set which indicates that | In the above example, the first flag bit is set which indicates that | |||
the Group Identifier extension is present which is a 32 bit field. | the Group Identifier extension is present which is a 32 bit field. | |||
The second through fourth bits of the flags are paired flags that | The second through fourth bits of the flags are paired flags that | |||
indicate the presence of a Security field with seven possible sizes. | indicate the presence of a Security field with seven possible sizes. | |||
In this example 001 indicates a sixty-four bit security field. | In this example 001 indicates a sixty-four bit security field. | |||
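As a rough check of the example, the first 32-bit word can be decoded with the sketch below. It assumes the field widths drawn in the figure (2-bit variant, 1-bit C, 5-bit Hlen, 8-bit proto/ctype, 16-bit flags) and a proto/ctype value of 4 (IPv4); the function name and the dictionary keys are illustrative only.

```python
import struct


def parse_gue_word(data):
    """Split the first 32-bit word of a GUE variant 0 header into its fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "variant": word >> 30,
        "c_bit": (word >> 29) & 0x1,
        "hlen": (word >> 24) & 0x1F,        # in 32-bit words
        "proto_ctype": (word >> 16) & 0xFF,
        "flags": word & 0xFFFF,
    }


# Variant 0, C=0, Hlen=3, proto=4, flags=1001 0000 0000 0000
example = bytes([0x03, 0x04, 0x90, 0x00])
fields = parse_gue_word(example)

group_id_present = (fields["flags"] >> 15) & 0x1   # first flag bit
security_value = (fields["flags"] >> 12) & 0x7     # second through fourth bits
```

With Hlen equal to 3, the header carries twelve bytes after the first word: four for the Group Identifier and eight for the Security field.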
3.4. Private data | 3.4. Private data | |||
An implementation MAY use private data for its own use. The private | An implementation MAY use private data for its own use. The private | |||
data immediately follows the last field in the GUE header and is not | data immediately follows the last extension field in the GUE header | |||
a fixed length. This data is considered part of the GUE header and | and is not a fixed length. This data is considered part of the GUE | |||
MUST be accounted for in header length (Hlen). The length of the | header and MUST be accounted for in header length (Hlen). The length | |||
private data MUST be a multiple of four and is determined by | of the private data MUST be a multiple of four bytes and is | |||
subtracting the offset of private data in the GUE header from the | determined by subtracting the offset of private data in the GUE | |||
header length. Specifically: | header from the header length. Specifically: | |||
Private_length = (Hlen * 4) - Length(flags) | Private_length = (Hlen * 4) - Length(flags) | |||
where "Length(flags)" returns the sum of lengths of all the extension | where "Length(flags)" returns the sum of lengths of all the extension | |||
fields present in the GUE header. When there is no private data | fields present in the GUE header. When there is no private data | |||
present, the length of the private data is zero. | present, the length of the private data is zero. | |||
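The calculation can be illustrated with a short Python sketch. The function name and the list argument are hypothetical; the caller is assumed to have already derived the lengths of the extension fields from the flags.

```python
def private_data_length(hlen, extension_lengths):
    """Private_length = (Hlen * 4) - Length(flags), in bytes.

    hlen: value of the Hlen field (header length in 32-bit words).
    extension_lengths: byte lengths of the extension fields that are present.
    """
    length = hlen * 4 - sum(extension_lengths)
    if length < 0:
        raise ValueError("Hlen is too small for the extension fields present")
    if length % 4:
        raise ValueError("private data length must be a multiple of four bytes")
    return length
```

For instance, with Hlen of 5 and extension fields of 4 and 8 bytes, the private data would occupy the remaining 8 bytes of the header.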
The semantics and interpretation of private data are implementation | The semantics and interpretation of private data are implementation | |||
specific. The private data may be structured as necessary, for | specific. The private data may be structured as necessary, for | |||
instance it might contain its own set of flags and extension fields. | instance it might contain its own set of flags and extension fields. | |||
skipping to change at page 13, line 10 ¶ | skipping to change at page 13, line 41 ¶ | |||
If a decapsulator receives a GUE packet with private data, it MUST | If a decapsulator receives a GUE packet with private data, it MUST | |||
validate the private data appropriately. If a decapsulator does not | validate the private data appropriately. If a decapsulator does not | |||
expect private data from an encapsulator, the packet MUST be dropped. | expect private data from an encapsulator, the packet MUST be dropped. | |||
If a decapsulator cannot validate the contents of private data per | If a decapsulator cannot validate the contents of private data per | |||
the provided semantics, the packet MUST also be dropped. An | the provided semantics, the packet MUST also be dropped. An | |||
implementation MAY place security data in GUE private data which if | implementation MAY place security data in GUE private data which if | |||
present MUST be verified for packet acceptance. | present MUST be verified for packet acceptance. | |||
3.5. Message types | 3.5. Message types | |||
There are two message types in GUE variant 0: control messages and | ||||
data messages. | ||||
3.5.1. Control messages | 3.5.1. Control messages | |||
Control messages carry formatted data that are implicitly addressed | Control messages carry formatted data that are implicitly addressed | |||
to the decapsulator to monitor or control the state or behavior of a | to the decapsulator to monitor or control the state or behavior of a | |||
tunnel (OAM). For instance, an echo request and corresponding echo | tunnel (OAM). For instance, an echo request and corresponding echo | |||
reply message can be defined to test for liveness. | reply message can be defined to test for liveness. | |||
Control messages are indicated in the GUE header when the C-bit is | Control messages are indicated in the GUE header when the C-bit is | |||
set. The payload is interpreted as a control message with type | set. The payload is interpreted as a control message with type | |||
specified in the proto/ctype field. The format and contents of the | specified in the proto/ctype field. The format and contents of the | |||
control message are indicated by the type and can be variable length. | control message are indicated by the type and can be variable length. | |||
Other than interpreting the proto/ctype field as a control message | Other than interpreting the proto/ctype field as a control message | |||
type, the meaning and semantics of the rest of the elements in the | type, the meaning and semantics of the rest of the elements in the | |||
GUE header are the same as that of data messages. Forwarding and | GUE header are the same as that of data messages. Forwarding and | |||
routing of control messages should be the same as that of a data | routing of control messages should be the same as that of a data | |||
message with the same outer IP and UDP header and GUE flags; this | message with the same outer IP and UDP header; this ensures that | |||
ensures that control messages can be created that follow the same | control messages can be created that follow the same path through the | |||
path as data messages. | network as data messages. | |||
3.5.2. Data messages | 3.5.2. Data messages | |||
Data messages carry encapsulated packets that are addressed to the | Data messages carry encapsulated packets that are addressed to the | |||
protocol stack for the associated protocol. Data messages are a | protocol stack for the associated protocol. Data messages are a | |||
primary means of encapsulation and can be used to create tunnels for | primary means of encapsulation and can be used to create tunnels for | |||
overlay networks. | overlay networks. | |||
Data messages are indicated in GUE header when the C-bit is not set. | Data messages are indicated in GUE header when the C-bit is not set. | |||
The payload of a data message is interpreted as an encapsulated | The payload of a data message is interpreted as an encapsulated | |||
packet of an Internet protocol indicated in the proto/ctype field. | packet of an Internet protocol indicated in the proto/ctype field. | |||
The packet immediately follows the GUE header. | The encapsulated packet immediately follows the GUE header. | |||
3.6. Hiding the transport layer protocol number | ||||
The GUE header indicates the Internet protocol of the encapsulated | ||||
packet. A protocol number is either contained in the Proto/ctype | ||||
field of the primary GUE header or in the Payload Type field of a GUE | ||||
Transform extension field (used to encrypt the payload with DTLS, | ||||
[GUEEXTEN]). If the transport protocol number needs to be hidden from | ||||
the network, then a trivial destination option can be used. | |||
The PadN destination option [RFC2460] can be used to encode the | ||||
transport protocol as a next header of an extension header (and | ||||
maintain alignment of encapsulated transport headers). The | ||||
Proto/ctype field or Payload Type field of the GUE Transform field is | ||||
set to 60 to indicate that the first encapsulated header is a | ||||
destination options extension header. | ||||
The format of the extension header is below: | ||||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| Next Header | 2 | 1 | 0 | | ||||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
For IPv4, it is permitted in GUE to use this precise destination | |||
option to contain the obfuscated protocol number. In this case next | ||||
header MUST refer to a valid IP protocol for IPv4. No other extension | ||||
headers or destination options are permitted with IPv4. | ||||
4. Variant 1 | 4. Variant 1 | |||
Variant 1 of GUE allows direct encapsulation of IPv4 and IPv6 in UDP. | Variant 1 of GUE allows direct encapsulation of IPv4 and IPv6 in UDP. | |||
In this variant there is no GUE header; a UDP packet carries an IP | In this variant there is no GUE header, a UDP packet carries an IP | |||
packet. The first two bits of the UDP payload for GUE are the GUE | packet. The first two bits of the UDP payload are the GUE variant | |||
variant and coincide with the first two bits of the version number in | field and coincide with the first two bits of the version number in | |||
the IP header. The first two version bits of IPv4 and IPv6 are 01, so | the IP header. The first two version bits of IPv4 and IPv6 are 01, so | |||
we use GUE variant 1 for direct IP encapsulation which makes two bits | we use GUE variant 1 for direct IP encapsulation which makes the two | |||
of GUE variant to also be 01. | bits of GUE variant to also be 01. | |||
This technique is effectively a means to compress out the version 0 | This technique is effectively a means to compress out the GUE version | |||
GUE header when encapsulating IPv4 or IPv6 packets and there are no | 0 header when encapsulating IPv4 or IPv6 packets and there are no | |||
flags or extension fields present. This method is compatible to use | flags, extension fields, or private data present. This method is | |||
on the same port number as packets with the GUE header (GUE variant 0 | compatible to use on the same port number as packets with the GUE | |||
packets). This technique saves encapsulation overhead on costly links | header (GUE variant 0 packets). This technique saves encapsulation | |||
for the common use of IP encapsulation, and also obviates the need to | overhead on costly links for the common use of IP encapsulation, and | |||
allocate a separate port number for IP-over-UDP encapsulation. | also obviates the need to allocate a separate UDP port number for IP- | |||
over-UDP encapsulation. | ||||
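A receiver sharing one port between both variants might distinguish them along the following lines. This is a minimal sketch; the function name and the returned labels are illustrative and not defined by the specification.

```python
def classify_gue_payload(udp_payload):
    """Classify a UDP payload received on the GUE port by its leading bits."""
    if not udp_payload:
        raise ValueError("empty UDP payload")
    first = udp_payload[0]
    variant = first >> 6            # first two bits of the payload
    if variant == 0:
        return "gue-variant-0"      # a GUE header follows
    if variant == 1:
        version = first >> 4        # same bits begin the IP version field
        if version == 4:
            return "ipv4"           # 0100: directly encapsulated IPv4
        if version == 6:
            return "ipv6"           # 0110: directly encapsulated IPv6
    return "unsupported"
```

No bits are added to the IP packet in variant 1; the variant is read from bits that are already part of the IP version field.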
4.1. Direct encapsulation of IPv4 | 4.1. Direct encapsulation of IPv4 | |||
The format for encapsulating IPv4 directly in UDP is: | The format for encapsulating IPv4 directly in UDP is: | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\ | |||
| Source port | Destination port | | | | Source port | Destination port | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP | |||
skipping to change at page 15, line 48 ¶ | skipping to change at page 15, line 30 ¶ | |||
| Time to Live | Protocol | Header Checksum | | | Time to Live | Protocol | Header Checksum | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Source IPv4 Address | | | Source IPv4 Address | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Destination IPv4 Address | | | Destination IPv4 Address | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
The UDP fields are set in a similar manner as described in section | The UDP fields are set in a similar manner as described in section | |||
3.1. | 3.1. | |||
Note that the 0100 value in the first four bits of the the UDP | Note that the 0100 value in the first four bits of the UDP payload | |||
payload expresses the GUE variant as 1 (bits 01) and IP version as 4 | expresses the GUE variant as 1 (bits 01) and IP version as 4 (bits | |||
(bits 0100). | 0100). | |||
4.2. Direct encapsulation of IPv6 | 4.2. Direct encapsulation of IPv6 | |||
The format for encapsulating IPv6 directly in UDP is demonstrated | The format for encapsulating IPv6 directly in UDP is demonstrated | |||
below: | below: | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\ | |||
| Source port | Destination port | | | | Source port | Destination port | | | |||
skipping to change at page 17, line 13 ¶ | skipping to change at page 17, line 13 ¶ | |||
(bits 0110). | (bits 0110). | |||
5. Operation | 5. Operation | |||
The figure below illustrates the use of GUE encapsulation between two | The figure below illustrates the use of GUE encapsulation between two | |||
hosts. Host 1 is sending packets to Host 2. An encapsulator performs | hosts. Host 1 is sending packets to Host 2. An encapsulator performs | |||
encapsulation of packets from Host 1. These encapsulated packets | encapsulation of packets from Host 1. These encapsulated packets | |||
traverse the network as UDP packets. At the decapsulator, packets are | traverse the network as UDP packets. At the decapsulator, packets are | |||
decapsulated and sent on to Host 2. Packet flow in the reverse | decapsulated and sent on to Host 2. Packet flow in the reverse | |||
direction need not be symmetric; for example, the reverse path might | direction need not be symmetric; for example, the reverse path might | |||
not use GUE and/or any other form of encapsulation. | not use GUE or any other form of encapsulation. | |||
+---------------+ +---------------+ | +---------------+ +---------------+ | |||
| | | | | | | | | | |||
| Host 1 | | Host 2 | | | Host 1 | | Host 2 | | |||
| | | | | | | | | | |||
+---------------+ +---------------+ | +---------------+ +---------------+ | |||
| ^ | | ^ | |||
V | | V | | |||
+---------------+ +---------------+ +---------------+ | +---------------+ +---------------+ +---------------+ | |||
| | | | | | | | | | | | | | |||
| Encapsulator |-->| Layer 3 |-->| Decapsulator | | | Encapsulator |-->| Layer 3 |-->| Decapsulator | | |||
| | | Network | | | | | | | Network | | | | |||
+---------------+ +---------------+ +---------------+ | +---------------+ +---------------+ +---------------+ | |||
The encapsulator and decapsulator may be co-resident with the | The encapsulator and decapsulator may be co-resident with the | |||
corresponding hosts, or may be on separate nodes in the network. | corresponding hosts, or may be on separate nodes in the network. | |||
5.1. Network tunnel encapsulation | 5.1. Network tunnel encapsulation | |||
Network tunneling can be achieved by encapsulating layer 2 or layer 3 | Network tunneling can be achieved by encapsulating layer 2 or layer 3 | |||
packets. In this case the encapsulator and decapsulator nodes are the | packets. In this case, the encapsulator and decapsulator nodes are | |||
tunnel endpoints. These could be routers that provide network tunnels | the tunnel endpoints. These could be routers that provide network | |||
on behalf of communicating hosts. | tunnels on behalf of communicating hosts. | |||
5.2. Transport layer encapsulation | 5.2. Transport layer encapsulation | |||
When encapsulating layer 4 packets, the encapsulator and decapsulator | When encapsulating layer 4 packets, the encapsulator and decapsulator | |||
should be co-resident with the hosts. In this case, the encapsulation | should be co-resident with the hosts. In this case, the encapsulation | |||
headers are inserted between the IP header and the transport packet. | headers are inserted between the IP header and the transport packet. | |||
The addresses in the IP header refer to both the endpoints of the | The addresses in the IP header refer to both the endpoints of the | |||
encapsulation and the endpoints for terminating the transport | encapsulation and the endpoints for terminating the encapsulated | |||
protocol. Note that the transport layer ports in the encapsulated | transport protocol. Note that the transport layer ports in the | |||
packet are independent of the UDP ports in the outer packet. | encapsulated packet are independent of the UDP ports in the outer | |||
packet. | ||||
Details about performing transport layer encapsulation are discussed | ||||
in [TOU]. | ||||
5.3. Encapsulator operation | 5.3. Encapsulator operation | |||
Encapsulators create GUE data messages, set the fields of the UDP | Encapsulators create GUE data messages, set the fields of the UDP | |||
header, set flags and optional extension fields in the GUE header, | header, set flags and optional extension fields in the GUE header, | |||
and forward packets to a decapsulator. | and forward packets to a decapsulator. | |||
An encapsulator can be an end host originating the packets of a flow, | An encapsulator can be an end host originating the packets of a flow, | |||
or can be a network device performing encapsulation on behalf of | or can be a network device performing encapsulation on behalf of | |||
hosts (routers implementing tunnels for instance). In either case, | hosts (routers implementing tunnels for instance). In either case, | |||
the intended target (decapsulator) is indicated by the outer | the intended target (decapsulator) is indicated by the outer | |||
destination IP address and destination port in the UDP header. | destination IP address and destination port in the UDP header. | |||
If an encapsulator is tunneling packets -- that is encapsulating | If an encapsulator is tunneling packets -- that is encapsulating | |||
packets of layer 2 or layer 3 protocols (e.g. EtherIP, IPIP, ESP | packets of layer 2 or layer 3 protocols (e.g. EtherIP, IPIP, ESP | |||
tunnel mode) -- it SHOULD follow standard conventions for tunneling | tunnel mode) -- it SHOULD follow standard conventions for tunneling | |||
of one protocol over another. For instance, if an IP packet is being | one protocol over another. For instance, if an IP packet is being | |||
encapsualated in GUE then diffserv interaction [RFC2983] and ECN | encapsulated in GUE then diffserv interaction [RFC2983] and ECN | |||
propagation for tunnels [RFC6040] SHOULD be followed. | propagation for tunnels [RFC6040] SHOULD be followed. | |||
5.4. Decapsulator operation | 5.4. Decapsulator operation | |||
A decapsulator performs decapsulation of GUE packets. A decapsulator | A decapsulator performs decapsulation of GUE packets. A decapsulator | |||
is addressed by the outer destination IP address of a GUE packet. | is addressed by the outer destination IP address and UDP destination | |||
The decapsulator validates packets, including fields of the GUE | port of a GUE packet. The decapsulator validates packets, including | |||
header. | fields of the GUE header. | |||
If a decapsulator receives a GUE packet with an unsupported variant, | If a decapsulator receives a GUE packet with an unsupported variant, | |||
unknown flag, bad header length (too small for included extension | unknown flag, bad header length (too small for included extension | |||
fields), unknown control message type, bad protocol number, an | fields), unknown control message type, bad protocol number, an | |||
unsupported payload type, or an otherwise malformed header, it MUST | unsupported payload type, or an otherwise malformed header, it MUST | |||
drop the packet. Such events MAY be logged subject to configuration | drop the packet. Such events MAY be logged subject to configuration | |||
and rate limiting of logging messages. Note that set flags in a GUE | and rate limiting of logging messages. Note that set flags in a GUE | |||
header that are unknown to a decapsulator MUST NOT be ignored. If a | header that are unknown to a decapsulator MUST NOT be ignored. If a | |||
GUE packet is received by a decapsulator with unknown flags, the | GUE packet is received by a decapsulator with unknown flags, the | |||
packet MUST be dropped. | packet MUST be dropped. | |||
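Some of these checks can be sketched as follows for variant 0 packets. The function name, the known_flags mask and the ext_len argument are assumptions made for illustration; a real decapsulator would also validate the control message type, the protocol number and the contents of each extension field.

```python
def accept_gue_header(udp_payload, known_flags, ext_len):
    """Return True if the variant 0 header passes basic checks, else False.

    known_flags: bit mask of the flags this decapsulator implements.
    ext_len: total byte length of the extension fields implied by the set
             flags, computed elsewhere from the flag definitions.
    """
    if len(udp_payload) < 4:
        return False                       # too short for a GUE header
    first = udp_payload[0]
    if first >> 6 != 0:
        return False                       # variant not handled in this sketch
    hlen_bytes = (first & 0x1F) * 4
    flags = int.from_bytes(udp_payload[2:4], "big")
    if flags & ~known_flags:
        return False                       # unknown flags are never ignored
    if hlen_bytes < ext_len:
        return False                       # header too small for its fields
    if len(udp_payload) < 4 + hlen_bytes:
        return False                       # header runs past the payload
    return True
```

A False result corresponds to dropping the packet; logging, if enabled, would be rate limited as described above.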
5.4.1. Processing a received data message | 5.4.1. Processing a received data message | |||
If a valid data message is received, the UDP header and GUE header | If a valid data message is received, the UDP header and GUE header | |||
are removed from the packet. The outer IP header remains intact and | are (logically) removed from the packet. The outer IP header remains | |||
the next protocol in the IP header is set to the protocol from the | intact and the next protocol in the IP header is set to the protocol | |||
proto field in the GUE header. The resulting packet is then | from the proto field in the GUE header. The resulting packet is then | |||
resubmitted into the protocol stack to process that packet as though | resubmitted into the protocol stack to process the packet as though | |||
it was received with the protocol in the GUE header. | it was received with the protocol indicated in the GUE header. | |||
As an example, consider that a data message is received where GUE | As an example, consider that a data message is received where GUE | |||
encapsulates an IPv4 packet using GUE variant 0. In this case proto | encapsulates an IPv4 packet using GUE variant 0. In this case proto | |||
field in the GUE header is set to 4 for IPv4 encapsulation: | field in the GUE header is set to 4 for IPv4 encapsulation: | |||
+-------------------------------------+ | +-------------------------------------+ | |||
| IP header (next proto = 17,UDP) | | | IP header (next proto = 17,UDP) | | |||
|-------------------------------------| | |-------------------------------------| | |||
| UDP | | | UDP | | |||
|-------------------------------------| | |-------------------------------------| | |||
skipping to change at page 19, line 22 ¶ | skipping to change at page 19, line 22 ¶ | |||
| IPv4 header and packet | | | IPv4 header and packet | | |||
+-------------------------------------+ | +-------------------------------------+ | |||
The receiver removes the UDP and GUE headers and sets the next | The receiver removes the UDP and GUE headers and sets the next | |||
protocol field in the IP packet to 4, which is derived from the GUE | protocol field in the IP packet to 4, which is derived from the GUE | |||
proto field. The resultant packet would have the format: | proto field. The resultant packet would have the format: | |||
+-------------------------------------+ | +-------------------------------------+ | |||
| IP header (next proto = 4,IPv4) | | | IP header (next proto = 4,IPv4) | | |||
|-------------------------------------| | |-------------------------------------| | |||
| IP header and packet | | | IPv4 header and packet | | |||
+-------------------------------------+ | +-------------------------------------+ | |||
This packet is then resubmitted into the protocol stack to be | This packet is then resubmitted into the protocol stack to be | |||
processed as an IPv4 encapsulated packet. | processed as an IPv4 encapsulated packet. | |||
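The decapsulation steps above can be sketched in Python for an IPv4 outer header. This is a minimal illustration under stated assumptions, not a normative procedure. It assumes the GUE variant 0 base header is four bytes (two-bit variant, one-bit C flag, five-bit Hlen counting 32-bit words beyond those four bytes, then the proto/ctype byte), which should be checked against the header format section of the draft. The function names are hypothetical. The sketch also rewrites the IP total length and header checksum, which the text leaves implicit.

```python
import struct

UDP_HLEN = 8  # fixed size of a UDP header in bytes


def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over an IPv4 header.

    Returns the value to store when the checksum field is zero, and
    returns 0 when run over a header whose checksum is already valid.
    """
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def decap_data_message(packet: bytes) -> bytes:
    """Strip the UDP and GUE headers from an IPv4/UDP/GUE data message."""
    ihl = (packet[0] & 0x0F) * 4          # outer IPv4 header length
    gue = ihl + UDP_HLEN                  # offset of the GUE header
    first, proto = packet[gue], packet[gue + 1]
    variant, control, hlen = first >> 6, (first >> 5) & 1, first & 0x1F
    if variant != 0 or control:
        raise ValueError("not a GUE variant 0 data message")
    payload = packet[gue + 4 + 4 * hlen:]  # skip base header and extensions

    hdr = bytearray(packet[:ihl])
    struct.pack_into("!H", hdr, 2, ihl + len(payload))  # new total length
    hdr[9] = proto                                      # next proto from GUE
    struct.pack_into("!H", hdr, 10, 0)                  # zero, then recompute
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr) + payload
```

The returned bytes correspond to the second diagram: an IP header whose protocol field is 4, followed directly by the encapsulated IPv4 packet.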
5.4.2. Processing a received control message | 5.4.2. Processing a received control message | |||
If a valid control message is received, the packet MUST be processed | If a valid control message is received, the packet MUST be processed | |||
as a control message. The specific processing to be performed depends | as a control message. The specific processing to be performed depends | |||
on the value in the ctype field of the GUE header. | on the value in the ctype field of the GUE header. | |||
5.5. Router and switch operation | 5.5. Middlebox inspection | |||
A middlebox MAY inspect a GUE header. A middlebox MUST NOT modify a | ||||
GUE header or UDP payload. | ||||
To inspect a GUE header, a middlebox needs to identify GUE packets. | ||||
The obvious method is to match the destination UDP port number to be | ||||
the GUE port number (i.e. 6080). Per [RFC7605], transport port | ||||
numbers only have meaning at the endpoints of communications, so | ||||
inferring the type of a UDP payload based on port number may be | ||||
incorrect. Middleboxes MUST NOT take any action that would have | ||||
harmful side effects if a UDP packet were misinterpreted as being a | ||||
GUE packet. In particular, a middlebox MUST NOT modify a UDP payload | ||||
based on inferring the payload type from the port number, lest the | ||||
middlebox cause silent data corruption. | ||||
A middlebox MAY interpret some flags and extension fields of the GUE | ||||
header for classification purposes, but is not required to understand | ||||
any of the flags or extension fields in GUE packets. A middlebox MUST | ||||
NOT drop a GUE packet merely because there are flags unknown to it. | ||||
Similarly, a middlebox MUST NOT arbitrarily filter packets based on | ||||
GUE flags or extension fields that are present or not present. The | ||||
header length in the GUE header allows a middlebox to inspect the | ||||
payload packet without needing to parse the flags or extension | ||||
fields. | ||||
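A read-only inspection along these lines might look like the following Python sketch. It is illustrative only and assumes the GUE variant 0 base header layout (two-bit variant, C flag, five-bit Hlen in 32-bit words beyond the first four bytes, then proto/ctype); the function name and return shape are hypothetical. The port match is treated purely as a hint, and nothing in the packet is modified.

```python
GUE_PORT = 6080  # destination port used as a hint that the payload is GUE


def inspect_gue(dst_port: int, udp_payload: bytes):
    """Locate the encapsulated packet without parsing flags or extensions.

    Returns (is_control, proto_or_ctype, payload_offset), or None when the
    UDP payload is not plausibly a GUE variant 0 packet.
    """
    if dst_port != GUE_PORT or len(udp_payload) < 4:
        return None
    first = udp_payload[0]
    if first >> 6 != 0:                 # only variant 0 is handled here
        return None
    hlen_words = first & 0x1F           # extension length in 32-bit words
    offset = 4 + 4 * hlen_words         # base header plus extension fields
    if offset > len(udp_payload):       # inconsistent length: do not guess
        return None
    return bool(first & 0x20), udp_payload[1], offset
```

Because the offset comes from Hlen alone, unknown flags never prevent the middlebox from finding the payload, and returning None for anything implausible keeps a misidentified UDP packet from being acted upon.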
5.6. Router and switch operation | ||||
Routers and switches SHOULD forward GUE packets as standard UDP/IP | Routers and switches SHOULD forward GUE packets as standard UDP/IP | |||
packets. The outer five-tuple should contain sufficient information | packets. The outer five-tuple should contain sufficient information | |||
to perform flow classification corresponding to the flow of the inner | to perform flow classification corresponding to the flow of the inner | |||
packet. A router does not normally need to parse a GUE header, and | packet. A router does not normally need to parse a GUE header, and | |||
none of the flags or extension fields in the GUE header are expected | none of the flags or extension fields in the GUE header are expected | |||
to affect routing. In cases where the outer five-tuple does not | to affect routing. In cases where the outer five-tuple does not | |||
provide sufficient entropy for flow classification, for instance UDP | provide sufficient entropy for flow classification, for instance UDP | |||
ports are fixed to provide connection semantics (section 5.6.1), then | ports are fixed to provide connection semantics (section 5.6.1), then | |||
the encapsulated packet MAY be parsed to determine flow entropy. | the encapsulated packet MAY be parsed to determine flow entropy. | |||
A router MUST NOT modify a GUE header when forwarding a packet. It | A router MUST NOT modify a GUE header or payload when forwarding a | |||
MAY encapsulate a GUE packet in another GUE packet, for instance to | packet. It MAY encapsulate a GUE packet in another GUE packet, for | |||
implement a network tunnel (i.e. by encapsulating an IP packet with a | instance to implement a network tunnel (i.e. by encapsulating an IP | |||
GUE payload in another IP packet as a GUE payload). In this case, the | packet with a GUE payload in another IP packet as a GUE payload). In | |||
router takes the role of an encapsulator, and the corresponding | this case, the router takes the role of an encapsulator, and the | |||
decapsulator is the logical endpoint of the tunnel. When | corresponding decapsulator is the logical endpoint of the tunnel. | |||
encapsulating a GUE packet within another GUE packet, there are no | When encapsulating a GUE packet within another GUE packet, there are | |||
provisions to automatically copy flags or fields to the outer GUE | no provisions to automatically copy flags or fields to the outer GUE | |||
header. Each layer of encapsulation is considered independent. | header. Each layer of encapsulation is considered independent. | |||
5.6. Middlebox interactions | 5.6.1. Connection semantics | |||
A middlebox MAY interpret some flags and extension fields of the GUE | ||||
header for classification purposes, but is not required to understand | ||||
any of the flags or extension fields in GUE packets. A middlebox MUST | ||||
NOT drop a GUE packet merely because there are flags unknown to it. | ||||
The header length in the GUE header allows a middlebox to inspect the | ||||
payload packet without needing to parse the flags or extension | ||||
fields. | ||||
5.6.1. Inferring connection semantics | ||||
A middlebox might infer bidirectional connection semantics for a UDP | A middlebox might infer bidirectional connection semantics for a UDP | |||
flow. For instance, a stateful firewall might create a five-tuple | flow. For instance, a stateful firewall might create a five-tuple | |||
rule to match flows on egress, and a corresponding five-tuple rule | rule to match flows on egress, and a corresponding five-tuple rule | |||
for matching ingress packets where the roles of source and | for matching ingress packets where the roles of source and | |||
destination are reversed for the IP addresses and UDP port numbers. | destination are reversed for the IP addresses and UDP port numbers. | |||
To operate in this environment, a GUE tunnel should be configured to | To operate in this environment, a GUE tunnel should be configured to | |||
assume connected semantics defined by the UDP five tuple and the use | assume connected semantics defined by the UDP five tuple and the use | |||
of GUE encapsulation needs to be symmetric between both endpoints. | of GUE encapsulation needs to be symmetric between both endpoints. | |||
The source port set in the UDP header MUST be the destination port | The source port set in the UDP header MUST be the destination port | |||
skipping to change at page 20, line 40 ¶ | skipping to change at page 21, line 8 ¶ | |||
described in section 5.11. | described in section 5.11. | |||
The selection of whether to make the UDP source port fixed or set to | The selection of whether to make the UDP source port fixed or set to | |||
a flow entropy value for each packet sent SHOULD be configurable for | a flow entropy value for each packet sent SHOULD be configurable for | |||
a tunnel. The default MUST be to set the flow entropy value in the | a tunnel. The default MUST be to set the flow entropy value in the | |||
UDP source port. | UDP source port. | |||
5.6.2. NAT | 5.6.2. NAT | |||
IP address and port translation can be performed on the UDP/IP | IP address and port translation can be performed on the UDP/IP | |||
headers adhering to the requirements for NAT with UDP [RFC4787]. In | headers adhering to the requirements for NAT (Network Address | |||
the case of stateful NAT, connection semantics MUST be applied to a | Translation) with UDP [RFC4787]. In the case of stateful NAT, | |||
GUE tunnel as described in section 5.6.1. GUE endpoints MAY also | connection semantics MUST be applied to a GUE tunnel as described in | |||
invoke STUN [RFC5389] or ICE [RFC5245] to manage NAT port mappings | section 5.6.1. GUE endpoints MAY also invoke STUN [RFC5389] or ICE | |||
for encapsulations. | [RFC5245] to manage NAT port mappings for encapsulations. | |||
5.7. Checksum Handling | 5.7. Checksum Handling | |||
The potential for mis-delivery of packets due to corruption of IP, | The potential for mis-delivery of packets due to corruption of IP, | |||
UDP, or GUE headers needs to be considered. Historically, the UDP | UDP, or GUE headers needs to be considered. Historically, the UDP | |||
checksum would be considered sufficient as a check against corruption | checksum would be considered sufficient as a check against corruption | |||
of either the UDP header and payload or the IP addresses. | of either the UDP header and payload or the IP addresses. | |||
Encapsulation protocols, such as GUE, can be originated or terminated | Encapsulation protocols, such as GUE, can be originated or terminated | |||
on devices incapable of computing the UDP checksum for a packet. This | on devices incapable of computing the UDP checksum for a packet. This | |||
section discusses the requirements around checksum and alternatives | section discusses the requirements around checksum and alternatives | |||
that might be used when an endpoint does not support UDP checksum. | that might be used when an endpoint does not support UDP checksum. | |||
5.7.1. Requirements | 5.7.1. Requirements | |||
One of the following requirements MUST be met: | One of the following requirements MUST be met: | |||
o UDP checksums are enabled (for IPv4 or IPv6). | o UDP checksums are enabled (for IPv4 or IPv6). | |||
o The GUE header checksum is used (defined in [GUEEXTEN]). | o The GUE header checksum is used (defined in [GUEEXTEN]). | |||
o Use zero UDP checksums. This is always permissible with IPv4; in | o Use zero UDP checksums. This is always permissible with IPv4; in | |||
IPv6, they can only be used in accordance with applicable | IPv6, they can only be used in accordance with applicable | |||
requirements in [RFC8086], [RFC6935], and [RFC6936]. | requirements in [RFC8086], [RFC6935], and [RFC6936]. | |||
5.7.2. UDP Checksum with IPv4 | 5.7.2. UDP Checksum with IPv4 | |||
For UDP in IPv4, the UDP checksum MUST be processed as specified in | For UDP in IPv4, the UDP checksum MUST be processed as specified in | |||
[RFC768] and [RFC1122] for both transmit and receive. An | [RFC0768] and [RFC1122] for both transmit and receive. An | |||
encapsulator MAY set the UDP checksum to zero for performance or | encapsulator MAY set the UDP checksum to zero for performance or | |||
implementation considerations. The IPv4 header includes a checksum | implementation considerations. The IPv4 header includes a checksum | |||
that protects against mis-delivery of the packet due to corruption | that protects against mis-delivery of the packet due to corruption of | |||
of IP addresses. The UDP checksum potentially provides protection | IP addresses. The UDP checksum potentially provides protection | |||
against corruption of the UDP header, GUE header, and GUE payload. | against corruption of the UDP header, GUE header, and GUE payload. | |||
Enabling or disabling the use of checksums is a deployment | Enabling or disabling the use of checksums is a deployment | |||
consideration that should take into account the risk and effects of | consideration that should take into account the risk and effects of | |||
packet corruption, and whether the packets in the network are | packet corruption, and whether the packets in the network are already | |||
already adequately protected by other, possibly stronger mechanisms, | adequately protected by other, possibly stronger mechanisms, such as | |||
such as the Ethernet CRC. If an encapsulator sets a zero UDP | the Ethernet CRC. If an encapsulator sets a zero UDP checksum for | |||
checksum for IPv4, it SHOULD use the GUE header checksum as | IPv4, it SHOULD use the GUE header checksum as described in | |||
described in [GUEEXTEN] assuming there are no other mechanisms used | [GUEEXTEN] if there are no other mechanisms used that would detect | |||
to protect the GUE packet. | corruption of GUE packets. | |||
When a decapsulator receives a packet, the UDP checksum field MUST | When a decapsulator receives a packet, the UDP checksum field MUST be | |||
be processed. If the UDP checksum is non-zero, the decapsulator MUST | processed. If the UDP checksum is non-zero, the decapsulator MUST | |||
verify the checksum before accepting the packet. By default, a | verify the checksum before accepting the packet. By default, a | |||
decapsulator SHOULD accept UDP packets with a zero checksum. A node | decapsulator SHOULD accept UDP packets with a zero checksum. A node | |||
MAY be configured to disallow zero checksums per [RFC1122]. | MAY be configured to disallow zero checksums per [RFC1122]. | |||
Configuration of zero checksums can be selective. For instance, zero | Configuration of zero checksums can be selective. For instance, zero | |||
checksums might be disallowed from certain hosts that are known to | checksums might be disallowed from certain hosts that are known to be | |||
be traversing paths subject to packet corruption. If verification of | traversing paths subject to packet corruption. If verification of a | |||
a non-zero checksum fails, a decapsulator lacks the capability to | non-zero checksum fails, a decapsulator lacks the capability to | |||
verify a non-zero checksum, or a packet with a zero-checksum was | verify a non-zero checksum, or a packet with a zero-checksum was | |||
received and the decapsulator is configured to disallow, then the | received and the decapsulator is configured to disallow that, then | |||
packet MUST be dropped. | the packet MUST be dropped. | |||
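The IPv4 receive rules above reduce to a small decision function. The Python sketch below is one reading of that paragraph, with hypothetical parameter names: `verified` is the result of checking a non-zero UDP checksum, `can_verify` says whether the decapsulator is able to check it at all, and `allow_zero` is the local configuration for zero checksums.

```python
def ipv4_accept(udp_checksum: int, verified: bool,
                can_verify: bool = True, allow_zero: bool = True) -> bool:
    """Return True to accept a received GUE-over-IPv4 packet, False to drop."""
    if udp_checksum == 0:
        # Zero checksum: accepted by default, unless configured otherwise.
        return allow_zero
    # Non-zero checksum: it must be verifiable and must verify.
    return can_verify and verified
```

A selective policy, such as disallowing zero checksums from particular hosts, could be expressed by choosing `allow_zero` per source address before calling the function.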
5.7.3. UDP Checksum with IPv6 | 5.7.3. UDP Checksum with IPv6 | |||
In IPv6, there is no checksum in the IPv6 header that protects | In IPv6, there is no checksum in the IPv6 header that protects | |||
against mis-delivery due to address corruption. Therefore, when GUE | against mis-delivery due to address corruption. Therefore, when GUE | |||
is used over IPv6, either the UDP checksum or the GUE header | is used over IPv6, either the UDP checksum or the GUE header checksum | |||
checksum SHOULD be used unless there are alternative mechanisms in | SHOULD be used unless there are alternative mechanisms in use that | |||
use that protect against misdelivery. The UDP checksum and GUE | protect against misdelivery. The UDP checksum and GUE header checksum | |||
header checksum SHOULD NOT be used at the same time since that would | SHOULD NOT be used at the same time since that would be mostly | |||
be mostly redundant. | redundant. | |||
If neither the UDP checksum or the GUE header checksum is used, then | If neither the UDP checksum nor the GUE header checksum is used, then | |||
the requirements for using zero IPv6 UDP checksums in [RFC6935] and | the requirements for using zero IPv6 UDP checksums in [RFC6935] and | |||
[RFC6936] MUST be met. | [RFC6936] MUST be met. | |||
When a decapsulator receives a packet, the UDP checksum field MUST | When a decapsulator receives a packet, the UDP checksum field MUST be | |||
be processed. If the UDP checksum is non-zero, the decapsulator MUST | processed. If the UDP checksum is non-zero, the decapsulator MUST | |||
verify the checksum before accepting the packet. By default a | verify the checksum before accepting the packet. By default a | |||
decapsulator MUST only accept UDP packets with a zero checksum if | decapsulator MUST only accept UDP packets with a zero checksum if the | |||
the GUE header checksum is used and is verified. If verification of | GUE header checksum is used and is verified. If verification of a | |||
a non-zero checksum fails, a decapsulator lacks the capability to | non-zero checksum fails or a decapsulator lacks the capability to | |||
verify a non-zero checksum, or a packet with a zero-checksum and no | verify a non-zero checksum then the packet MUST be dropped. If a | |||
GUE header checksum was received, the packet MUST be dropped. | packet is received with a zero UDP checksum, no GUE header checksum, | |||
and zero UDP checksums are disallowed then the packet MUST be | ||||
dropped. | ||||
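For IPv6 the decision also depends on the GUE header checksum. The following Python sketch is one interpretation of the rules above and is not normative. The parameter names are hypothetical, and it assumes that a GUE header checksum which is present but fails verification causes a drop, which the text implies ("used and is verified") but does not spell out.

```python
def ipv6_accept(udp_checksum: int, udp_verified: bool, can_verify: bool,
                gue_csum_present: bool, gue_csum_verified: bool,
                allow_zero: bool = False) -> bool:
    """Return True to accept a received GUE-over-IPv6 packet, False to drop."""
    if udp_checksum != 0:
        # Non-zero UDP checksum must be verifiable and must verify.
        return can_verify and udp_verified
    if gue_csum_present:
        # Zero UDP checksum is acceptable when the GUE header checksum holds.
        return gue_csum_verified
    # Zero UDP checksum and no GUE header checksum: dropped unless zero
    # checksums are explicitly allowed (see RFC 6935 and RFC 6936).
    return allow_zero
```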
5.8. MTU and fragmentation | 5.8. MTU and fragmentation | |||
Standard conventions for handling of MTU (Maximum Transmission Unit) | Standard conventions for handling of MTU (Maximum Transmission Unit) | |||
and fragmentation in conjunction with networking tunnels | and fragmentation in conjunction with networking tunnels | |||
(encapsulation of layer 2 or layer 3 packets) SHOULD be followed. | (encapsulation of layer 2 or layer 3 packets) SHOULD be followed. | |||
Details are described in MTU and Fragmentation Issues with In-the- | Details are described in MTU and Fragmentation Issues with In-the- | |||
Network Tunneling [RFC4459]. | Network Tunneling [RFC4459]. | |||
If a packet is fragmented before encapsulation in GUE, all the | If a packet is fragmented before encapsulation in GUE, all the | |||
related fragments MUST be encapsulated using the same UDP source | related fragments MUST be encapsulated using the same UDP source | |||
port. An operator SHOULD set MTU to account for encapsulation | port. An operator SHOULD set MTU to account for encapsulation | |||
overhead and reduce the likelihood of fragmentation. | overhead and reduce the likelihood of fragmentation. | |||
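As a worked example of accounting for encapsulation overhead, the Python sketch below computes the largest inner packet that fits without fragmentation. It assumes an outer IPv4 header without options (20 bytes) or a fixed IPv6 header (40 bytes), an 8-byte UDP header, and a 4-byte GUE variant 0 base header plus any extension fields; the function name is hypothetical.

```python
UDP_HLEN = 8        # UDP header
GUE_BASE_HLEN = 4   # assumed GUE variant 0 base header, without extensions


def inner_mtu(path_mtu: int, outer_ipv6: bool = False,
              gue_ext_bytes: int = 0) -> int:
    """Largest encapsulated packet that avoids fragmenting the outer packet."""
    outer_ip = 40 if outer_ipv6 else 20
    return path_mtu - outer_ip - UDP_HLEN - GUE_BASE_HLEN - gue_ext_bytes
```

With a 1500-byte path MTU this leaves 1468 bytes for the inner packet over IPv4 and 1448 bytes over IPv6 when no extension fields are present.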
As an alternative to IP fragmentation, the GUE fragmentation extension can | As an alternative to IP fragmentation, the GUE fragmentation extension can | |||
be used. GUE fragmentation is described in [GUEEXTEN]. | be used. GUE fragmentation is described in [GUEEXTEN]. | |||
5.9. Congestion control | 5.9. Congestion control | |||
Per requirements of [RFC5405], if the IP traffic encapsulated with | Per requirements of [RFC8085], if the IP traffic encapsulated with | |||
GUE implements proper congestion control no additional mechanisms | GUE implements proper congestion control then no additional | |||
should be required. | mechanisms should be required. | |||
In the case that the encapsulated traffic does not implement any or | In the case that the encapsulated traffic does not implement any or | |||
sufficient control, or it is not known whether a transmitter will | sufficient control, or it is not known whether a transmitter will | |||
consistently implement proper congestion control, then congestion | consistently implement proper congestion control, then congestion | |||
control at the encapsulation layer MUST be provided per [RFC5405]. | control at the encapsulation layer MUST be provided per [RFC8085]. | |||
Note that this case applies to a significant use case in network | Note that this case applies to a significant use case in network | |||
virtualization in which guests run third party networking stacks | virtualization in which guests run third party networking stacks that | |||
that cannot be implicitly trusted to implement conformant congestion | cannot be implicitly trusted to implement conformant congestion | |||
control. | control. | |||
Out of band mechanisms such as rate limiting, Managed Circuit | Out of band mechanisms such as rate limiting, Managed Circuit Breaker | |||
Breaker [RFC8084], or traffic isolation MAY be used to provide | [RFC8084], or traffic isolation MAY be used to provide rudimentary | |||
rudimentary congestion control. For finer-grained congestion control | congestion control. For finer-grained congestion control that allows | |||
that allows alternate congestion control algorithms, reaction time | alternate congestion control algorithms, reaction time within an RTT, | |||
within an RTT, and interaction with ECN, in-band mechanisms might be | and interaction with ECN, in-band mechanisms might be warranted. | |||
warranted. | ||||
5.10. Multicast | 5.10. Multicast | |||
GUE packets can be multicast to decapsulators using a multicast | GUE packets can be multicast to decapsulators using a multicast | |||
destination address in the encapsulating IP headers. Each receiving | destination address in the outer IP header. Each receiving host will | |||
host will decapsulate the packet independently following normal | decapsulate the packet independently following normal decapsulator | |||
decapsulator operations. The receiving decapsulators need to agree | operations. The receiving decapsulators need to agree on the same set | |||
on the same set of GUE parameters and properties; how such an | of GUE parameters and properties; how such an agreement is reached is | |||
agreement is reached is outside the scope of this document. | outside the scope of this document. | |||
GUE allows encapsulation of unicast, broadcast, or multicast | GUE allows encapsulation of unicast, broadcast, or multicast traffic. | |||
traffic. Flow entropy (the value in the UDP source port) can be | Flow entropy (the value in the UDP source port) can be generated from | |||
generated from the header of encapsulated unicast or | the header of encapsulated unicast or broadcast/multicast packets at | |||
broadcast/multicast packets at an encapsulator. The mapping | an encapsulator. The mapping mechanism between the encapsulated | |||
mechanism between the encapsulated multicast traffic and the | multicast traffic and the multicast capability in the IP network is | |||
multicast capability in the IP network is transparent and | transparent and independent of the encapsulation and is otherwise | |||
independent of the encapsulation and is otherwise outside the scope | outside the scope of this document. | |||
of this document. | ||||
5.11. Flow entropy for ECMP | 5.11. Flow entropy for ECMP | |||
A major objective of using GUE is that a network device can perform | ||||
flow classification corresponding to the flow of the inner | ||||
encapsulated packet based on the contents of the outer headers. | ||||
5.11.1. Flow classification | 5.11.1. Flow classification | |||
A major objective of using GUE is that a network device can perform | When a packet is encapsulated with GUE and connection semantics are | |||
flow classification corresponding to the flow of the inner | not applied, the source port in the outer UDP packet is set to a flow | |||
encapsulated packet based on the contents in the outer headers. | entropy value that corresponds to the flow of the inner packet. When | |||
a device computes a five-tuple hash on the outer UDP/IP header of a | ||||
Hardware devices commonly perform hash computations on packet | GUE packet, the resultant value classifies the packet per its inner | |||
headers to classify packets into flows or flow buckets. Flow | flow. | |||
classification is done to support load balancing of flows across a | ||||
set of networking resources. Examples of such load balancing | ||||
techniques are Equal Cost Multipath routing (ECMP), port selection | ||||
in Link Aggregation, and NIC device Receive Side Scaling (RSS). | ||||
Hashes are usually either a three-tuple hash of IP protocol, source | ||||
address, and destination address; or a five-tuple hash consisting of | ||||
IP protocol, source address, destination address, source port, and | ||||
destination port. Typically, networking hardware will compute five- | ||||
tuple hashes for TCP and UDP, but only three-tuple hashes for other | ||||
IP protocols. Since the five-tuple hash provides more granularity, | ||||
load balancing can be finer-grained with better distribution. When a | ||||
packet is encapsulated with GUE and connection semantics are not | ||||
applied, the source port in the outer UDP packet is set to a flow | ||||
entropy value that corresponds to the flow of the inner packet. When | ||||
a device computes a five-tuple hash on the outer UDP/IP header of a | ||||
GUE packet, the resultant value classifies the packet per its inner | ||||
flow. | ||||
Examples of deriving flow entropy for encapsulation are: | Examples of deriving flow entropy for encapsulation are: | |||
o If the encapsulated packet is a layer 4 packet, TCP/IPv4 for | o If the encapsulated packet is a layer 4 packet, TCP/IPv4 for | |||
instance, the flow entropy could be based on the canonical five- | instance, the flow entropy could be based on the canonical five- | |||
tuple hash of the inner packet. | tuple hash of the inner packet. | |||
o If the encapsulated packet is an AH transport mode packet with | o If the encapsulated packet is an AH transport mode packet with | |||
TCP as next header, the flow entropy could be a hash over a | TCP as next header, the flow entropy could be a hash over a | |||
three-tuple: TCP protocol and TCP ports of the encapsulated | three-tuple: TCP protocol and TCP ports of the encapsulated | |||
packet. | packet. | |||
o If a node is encrypting a packet using ESP tunnel mode and GUE | o If a node is encrypting a packet using ESP tunnel mode and GUE | |||
encapsulation, the flow entropy could be based on the contents | encapsulation, the flow entropy could be based on the contents | |||
of the clear-text packet. For instance, a canonical five-tuple | of the clear-text packet. For instance, a canonical five-tuple | |||
hash for a TCP/IP packet could be used. | hash for a TCP/IP packet could be used. | |||
[RFC6438] discusses methods to compute and set flow entropy value for | [RFC6438] discusses methods to compute and set flow entropy value for | |||
IPv6 flow labels. Such methods can also be used to create flow | IPv6 flow labels, such methods can also be used to create flow | |||
entropy values for GUE. | entropy values for GUE. | |||
5.11.2. Flow entropy properties | 5.11.2. Flow entropy properties | |||
The flow entropy is the value set in the UDP source port of a GUE | The flow entropy is the value set in the UDP source port of a GUE | |||
packet. Flow entropy in the UDP source port SHOULD adhere to the | packet. Flow entropy in the UDP source port SHOULD adhere to the | |||
following properties: | following properties: | |||
o The value set in the source port is within the ephemeral port | o The value set in the source port is within the ephemeral port | |||
range (49152 to 65535 [RFC6335]). Since the high order two bits | range (49152 to 65535 [RFC6335]). Since the high order two bits | |||
skipping to change at page 25, line 16 ¶ | skipping to change at page 25, line 18 ¶ | |||
o Decapsulators, or any networking devices, SHOULD NOT attempt to | o Decapsulators, or any networking devices, SHOULD NOT attempt to | |||
interpret flow entropy as anything more than an opaque value. | interpret flow entropy as anything more than an opaque value. | |||
Neither should they attempt to reproduce the hash calculation | Neither should they attempt to reproduce the hash calculation | |||
used by an encapsulator in creating a flow entropy value. They | used by an encapsulator in creating a flow entropy value. They | |||
MAY use the value to match further receive packets for steering | MAY use the value to match further receive packets for steering | |||
decisions, but MUST NOT assume that the hash uniquely or | decisions, but MUST NOT assume that the hash uniquely or | |||
permanently identifies a flow. | permanently identifies a flow. | |||
o Input to the flow entropy calculation is not restricted to ports | o Input to the flow entropy calculation is not restricted to ports | |||
and addresses; input could include flow label from an IPv6 | and addresses; input could include the flow label from an IPv6 | |||
packet, SPI from an ESP packet, or other flow related state in | packet, SPI from an ESP packet, or other flow related state in | |||
the encapsulator that is not necessarily conveyed in the packet. | the encapsulator that is not necessarily conveyed in the packet. | |||
o The assignment function for flow entropy SHOULD be randomly | o The assignment function for flow entropy SHOULD be randomly | |||
seeded to mitigate denial of service attacks. The seed SHOULD be | seeded to mitigate denial of service attacks. The seed SHOULD be | |||
changed periodically. | changed periodically. | |||
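One way an encapsulator might satisfy these properties is sketched below in Python. This is an assumption-laden illustration, not the method the draft prescribes: the class name, the choice of keyed BLAKE2b as the hash, and the string encoding of the five-tuple are all hypothetical. It shows a randomly seeded hash of the inner five-tuple mapped onto the ephemeral port range by forcing the two high-order bits to one, which leaves fourteen bits of entropy.

```python
import hashlib
import os

EPHEMERAL_MIN = 49152  # 0xC000: both high-order bits of the port set


class FlowEntropy:
    """Derives a UDP source port from the inner flow, using a random seed."""

    def __init__(self, seed: bytes = b""):
        self.seed = seed or os.urandom(16)

    def reseed(self) -> None:
        """Change the seed; intended to be called periodically."""
        self.seed = os.urandom(16)

    def source_port(self, proto: int, src: str, dst: str,
                    sport: int, dport: int) -> int:
        key = f"{proto}|{src}|{dst}|{sport}|{dport}".encode()
        digest = hashlib.blake2b(key, key=self.seed, digest_size=2).digest()
        # Keep 14 bits of the hash and place them in the ephemeral range.
        return EPHEMERAL_MIN | (int.from_bytes(digest, "big") & 0x3FFF)
```

Packets of the same inner flow map to the same source port for as long as the seed is unchanged. After a reseed the mapping may change, which is one reason receivers are told to treat the value as opaque.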
5.12 Negotiation of acceptable flags and extension fields | 5.12. Negotiation of acceptable flags and extension fields | |||
An encapsulator and decapsulator need to achieve agreement about GUE | An encapsulator and decapsulator need to achieve agreement about GUE | |||
parameters that will be used in communications. Parameters include | parameters that will be used in communications. Parameters include | |||
supported GUE variants, flags and extension fields that can be used, | supported GUE variants, flags and extension fields that can be used, | |||
security algorithms and keys, supported protocols and control | security algorithms and keys, supported protocols and control | |||
messages, etc. This document proposes different general methods to | messages, etc. This document proposes different general methods to | |||
accomplish this; however, the details of implementing these are | accomplish this; however, the details of implementing these are | |||
considered out of scope. | considered out of scope. | |||
General methods for this are: | General methods for this are: | |||
o Configuration. The parameters used for a tunnel are configured | o Configuration. The parameters used for a tunnel are configured | |||
at each endpoint. | at each endpoint. | |||
o Negotiation. A tunnel negotiation can be performed. This could | o Negotiation. A tunnel negotiation can be performed. This could | |||
be accomplished in-band of GUE using control messages or private | be accomplished in-band of GUE using control messages. | |||
data. | ||||
o Via a control plane. Parameters for communicating with a tunnel | o Via a control plane. Parameters for communicating with a tunnel | |||
endpoint can be set in a control plane protocol (such as that | endpoint can be set in a control plane protocol (such as that | |||
needed for network virtualization). | needed for network virtualization). | |||
o Via security negotiation. Use of security typically implies a | o Via security negotiation. Use of security typically implies a | |||
key exchange between endpoints. Other GUE parameters may be | key exchange between endpoints. Other GUE parameters may be | |||
conveyed as part of that process. | conveyed as part of that process. | |||
6. Motivation for GUE | 6. Motivation for GUE | |||
This section presents the motivation for GUE with respect to other | This section provides the motivation for GUE with respect to other | |||
encapsulation methods. | encapsulation methods. | |||
6.1. Benefits of GUE | 6.1. Benefits of GUE | |||
* GUE is a generic encapsulation protocol. GUE can encapsulate | * GUE is a generic encapsulation protocol. GUE can encapsulate | |||
protocols that are represented by an IP protocol number. This | protocols that are represented by an IP protocol number. This | |||
includes layer 2, layer 3, and layer 4 protocols. | includes layer 2, layer 3, and layer 4 protocols. | |||
* GUE is an extensible encapsulation protocol. Standardized | * GUE is an extensible encapsulation protocol. Standardized | |||
optional data such as security, virtual networking identifiers, | optional data such as security, virtual networking identifiers, | |||
and fragmentation are being defined. | and fragmentation are defined. | |||
* For extensilbity, GUE uses flag fields as opposed to TLVs as | * For extensibility, GUE uses flag fields as opposed to TLVs as | |||
some other encapsulation protocols do. Flag fields are strictly | some other encapsulation protocols do. Flag fields are strictly | |||
ordered, allow random access, and are efficient in use of header | ordered, allow random access, and are efficient in use of header | |||
space. | space. | |||
* GUE allows private data to be sent as part of the encapsulation. | * GUE allows private data to be sent as part of the encapsulation. | |||
This permits experimentation or customization in deployment. | This permits experimentation or customization in deployment. | |||
* GUE allows sending of control messages such as OAM using the | * GUE allows sending of control messages such as OAM using the | |||
same GUE header format (for routing purposes) as normal data | same GUE header format (for routing purposes) as normal data | |||
messages. | messages. | |||
* GUE maximizes deliverability of non-UDP and non-TCP protocols. | * GUE maximizes deliverability of non-UDP and non-TCP protocols. | |||
* GUE provides a means for exposing per flow entropy for ECMP for | * GUE provides a means for exposing per flow entropy for ECMP for | |||
atypical protocols such as SCTP, DCCP, ESP, etc. | atypical protocols such as SCTP, DCCP, ESP, etc. | |||
6.2 Comparison of GUE to other encapsulations | 6.2. Comparison of GUE to other encapsulations | |||
A number of different encapsulation techniques have been proposed for | A number of different encapsulation techniques have been proposed for | |||
the encapsulation of one protocol over another. EtherIP [RFC3378] | the encapsulation of one protocol over another. EtherIP [RFC3378] | |||
provides layer 2 tunneling of Ethernet frames over IP. GRE [RFC2784], | provides layer 2 tunneling of Ethernet frames over IP. GRE [RFC2784], | |||
MPLS [RFC4023], and L2TP [RFC2661] provide methods for tunneling | MPLS [RFC4023], and L2TP [RFC2661] provide methods for tunneling | |||
layer 2 and layer 3 packets over IP. NVGRE [RFC7637] and VXLAN | layer 2 and layer 3 packets over IP. NVGRE [RFC7637] and VXLAN | |||
[RFC7348] are proposals for encapsulation of layer 2 packets for | [RFC7348] are proposals for encapsulation of layer 2 packets for | |||
network virtualization. IPIP [RFC2003] and Generic packet tunneling | network virtualization. IPIP [RFC2003] and Generic packet tunneling | |||
in IPv6 [RFC2473] provide methods for tunneling IP packets over IP. | in IPv6 [RFC2473] provide methods for tunneling IP packets over IP. | |||
skipping to change at page 27, line 13 ¶ | skipping to change at page 27, line 13 ¶ | |||
[RFC8086]. | [RFC8086]. | |||
GUE has the following discriminating features: | GUE has the following discriminating features: | |||
o UDP encapsulation leverages specialized network device | o UDP encapsulation leverages specialized network device | |||
processing for efficient transport. The semantics for using the | processing for efficient transport. The semantics for using the | |||
UDP source port for flow entropy as input to ECMP are defined in | UDP source port for flow entropy as input to ECMP are defined in | |||
section 5.11. | section 5.11. | |||
o GUE permits encapsulation of arbitrary IP protocols, which | o GUE permits encapsulation of arbitrary IP protocols, which | |||
includes layer 2 3, and 4 protocols. | includes layer 2, 3, and 4 protocols. | |||
o Multiple protocols can be multiplexed over a single UDP port | o Multiple protocols can be multiplexed over a single UDP port | |||
number. This is in contrast to techniques to encapsulate | number. This is in contrast to techniques to encapsulate | |||
protocols over UDP using a protocol specific port number (such | protocols over UDP using a protocol specific port number (such | |||
as ESP/UDP, GRE/UDP, SCTP/UDP). GUE provides a uniform and | as ESP/UDP, GRE/UDP, SCTP/UDP). GUE provides a uniform and | |||
extensible mechanism for encapsulating all IP protocols in UDP | extensible mechanism for encapsulating all IP protocols in UDP | |||
with minimal overhead (four bytes of additional header). | with minimal overhead (four bytes of additional header). | |||
o GUE is extensible. New flags and extension fields can be | o GUE is extensible. New flags and extension fields can be | |||
defined. | defined. | |||
skipping to change at page 27, line 37 ¶ | skipping to change at page 27, line 37 ¶ | |||
to parse the full encapsulation header. | to parse the full encapsulation header. | |||
o Private data in the encapsulation header allows local | o Private data in the encapsulation header allows local | |||
customization and experimentation while being compatible with | customization and experimentation while being compatible with | |||
processing in network nodes (routers and middleboxes). | processing in network nodes (routers and middleboxes). | |||
o GUE includes both data messages (encapsulation of packets) and | o GUE includes both data messages (encapsulation of packets) and | |||
control messages (such as OAM). | control messages (such as OAM). | |||
o The flags-field model facilitates efficient implementation of | o The flags-field model facilitates efficient implementation of | |||
extensibility in hardware. For instance, a TCAM can be use to | extensibility in hardware. For instance, a TCAM can be used to | |||
parse a known set of N flags where the number of entries in the | parse a known set of N flags where the number of entries in the | |||
TCAM is 2^N. By comparison, the number of TCAM entries needed to | TCAM is 2^N. By comparison, the number of TCAM entries needed to | |||
parse a set of N arbitrarily ordered TLVS is approximately e*N!. | parse a set of N arbitrarily ordered TLVs is approximately e*N!. | |||
o GUE includes a variant that encapsulates IPv4 and IPv6 packets | o GUE includes a variant that encapsulates IPv4 and IPv6 packets | |||
directly within UDP. | directly within UDP. | |||
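As a rough illustration of the TCAM comparison in the list above, the two entry-count estimates can be tabulated for small N. This is only a sketch of the arithmetic stated in the text (2^N entries for N ordered flags versus approximately e*N! entries for N arbitrarily ordered TLVs); the function names are hypothetical and not part of GUE.

```python
import math


def flag_tcam_entries(n: int) -> int:
    """Entries needed to match every combination of n strictly ordered flags."""
    return 2 ** n


def tlv_tcam_entries(n: int) -> int:
    """Approximate entries for n arbitrarily ordered TLVs: e * n!, rounded down."""
    return math.floor(math.e * math.factorial(n))


# Print both estimates side by side for a few values of N.
for n in (1, 2, 4, 8):
    print(n, flag_tcam_entries(n), tlv_tcam_entries(n))
```

The gap grows quickly: the flag count doubles with each added flag, while the TLV estimate is multiplied by roughly N.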
7. Security Considerations | 7. Security Considerations | |||
There are two important considerations of security with respect to | There are two important considerations of security with respect to | |||
GUE. | GUE. | |||
o Authentication and integrity of the GUE header. | o Authentication and integrity of the GUE header. | |||
o Authentication, integrity, and confidentiality of the GUE | o Authentication, integrity, and confidentiality of the GUE | |||
payload. | payload. | |||
GUE security is provided by extensions for security defined in | GUE security is provided by extensions for security defined in | |||
[GUEEXTEN]. These extensions include methods to authenticate the GUE | [GUEEXTEN]. These extensions include methods to authenticate the GUE | |||
header and encrypt the GUE payload. | header and encrypt the GUE payload. | |||
The GUE header can be authenticated using a security extension for an | The GUE header can be authenticated using a security extension for an | |||
HMAC. Securing the GUE payload can be accomplished by use of the GUE | HMAC (Hashed Message Authentication Code). Securing the GUE payload | |||
Payload Transform. This extension can be used to perform DTLS in the | can be accomplished by use of the GUE Payload Transform extension. This | |||
payload of a GUE packet to encrypt the payload. | extension allows the use of DTLS (Datagram Transport Layer Security) | |||
to encrypt and authenticate the GUE payload. | ||||
A hash function for computing flow entropy (section 5.11) SHOULD be | A hash function for computing flow entropy (section 5.11) SHOULD be | |||
randomly seeded to mitigate some possible denial of service attacks. | randomly seeded to mitigate some possible denial of service attacks. | |||
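A minimal sketch of a randomly seeded flow entropy hash follows. It is not taken from the draft: the keyed BLAKE2b hash, the tuple encoding, the function name, and the mapping into the port range 49152 to 65535 are all assumptions made for illustration. The point is only that the seed is chosen at random and kept private, so an outside party cannot predict which flows map to the same entropy value.

```python
import hashlib
import os

# Random seed chosen once at start-up (assumption: 16 bytes is sufficient).
SEED = os.urandom(16)

# Assumption: entropy values are mapped into this UDP source port range.
PORT_MIN, PORT_MAX = 49152, 65535


def flow_entropy_port(src, dst, proto, sport, dport, seed=SEED):
    """Map an inner flow tuple to a UDP source port using a keyed hash."""
    flow = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.blake2b(flow, key=seed, digest_size=4).digest()
    value = int.from_bytes(digest, "big")
    return PORT_MIN + value % (PORT_MAX - PORT_MIN + 1)


# Packets of the same flow always receive the same source port.
print(flow_entropy_port("10.0.0.1", "10.0.0.2", 6, 1234, 80))
```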
8. IANA Considerations | 8. IANA Considerations | |||
8.1. UDP source port | 8.1. UDP source port | |||
A user UDP port number assignment for GUE has been assigned: | A user UDP port number assignment for GUE has been assigned: | |||
skipping to change at page 29, line 8 ¶ | skipping to change at page 29, line 8 ¶ | |||
Description: Generic UDP Encapsulation | Description: Generic UDP Encapsulation | |||
Reference: draft-herbert-gue | Reference: draft-herbert-gue | |||
Port Number: 6080 | Port Number: 6080 | |||
Service Code: N/A | Service Code: N/A | |||
Known Unauthorized Uses: N/A | Known Unauthorized Uses: N/A | |||
Assignment Notes: N/A | Assignment Notes: N/A | |||
8.2. GUE variant number | 8.2. GUE variant number | |||
IANA is requested to set up a registry for the GUE variant number. | IANA is requested to set up a registry for the GUE variant number. | |||
The GUE variant number is 2 bits containing four possible values. | The GUE variant number is two bits containing four possible values. | |||
This document defines version 0 and 1. New values are assigned in | This document defines variants 0 and 1. New values are assigned in | |||
accordance with RFC Required policy [RFC5226]. | accordance with RFC Required policy [RFC5226]. | |||
+----------------+----------------+---------------+ | +----------------+----------------+---------------+ | |||
| Variant number | Description | Reference | | | Variant number | Description | Reference | | |||
+----------------+----------------+---------------+ | +----------------+----------------+---------------+ | |||
| 0 | GUE Version 0 | This document | | | 0 | GUE Version 0 | This document | | |||
| | with header | | | | | with header | | | |||
| | | | | | | | | | |||
| 1 | GUE Version 0 | This document | | | 1 | GUE Version 0 | This document | | |||
| | with direct IP | | | | | with direct IP | | | |||
skipping to change at page 29, line 48 ¶ | skipping to change at page 29, line 48 ¶ | |||
| | | | | | | | | | |||
| 1..127 | Unassigned | | | | 1..127 | Unassigned | | | |||
| | | | | | | | | | |||
| 128..255 | User defined | This document | | | 128..255 | User defined | This document | | |||
+----------------+------------------+---------------+ | +----------------+------------------+---------------+ | |||
9. Acknowledgements | 9. Acknowledgements | |||
The authors would like to thank David Liu, Erik Nordmark, Fred | The authors would like to thank David Liu, Erik Nordmark, Fred | |||
Templin, Adrian Farrel, Bob Briscoe, and Murray Kucherawy for | Templin, Adrian Farrel, Bob Briscoe, and Murray Kucherawy for | |||
valuable input on this draft. | valuable input on this draft. Special thanks to Fred Templin who is | |||
serving as document shepherd. | ||||
10. References | 10. References | |||
10.1. Normative References | 10.1. Normative References | |||
[RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI | [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI | |||
10.17487/RFC0768, August 1980, <http://www.rfc- | 10.17487/RFC0768, August 1980, <http://www.rfc- | |||
editor.org/info/rfc768>. | editor.org/info/rfc768>. | |||
[RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - | [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage | |||
Communication Layers", STD 3, RFC 1122, DOI | Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, | |||
10.17487/RFC1122, October 1989, <http://www.rfc- | March 2017, <https://www.rfc-editor.org/info/rfc8085>. | |||
editor.org/info/rfc1122>. | ||||
[RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
IANA Considerations Section in RFCs", RFC 2434, DOI | Requirement Levels", BCP 14, RFC 2119, DOI | |||
10.17487/RFC2434, October 1998, <http://www.rfc- | 10.17487/RFC2119, March 1997, <https://www.rfc- | |||
editor.org/info/rfc2434>. | editor.org/info/rfc2119>. | |||
[RFC2983] Black, D., "Differentiated Services and Tunnels", RFC | [RFC2983] Black, D., "Differentiated Services and Tunnels", RFC | |||
2983, DOI 10.17487/RFC2983, October 2000, <http://www.rfc- | 2983, DOI 10.17487/RFC2983, October 2000, <http://www.rfc- | |||
editor.org/info/rfc2983>. | editor.org/info/rfc2983>. | |||
[RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion | [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion | |||
Notification", RFC 6040, DOI 10.17487/RFC6040, November | Notification", RFC 6040, DOI 10.17487/RFC6040, November | |||
2010, <http://www.rfc-editor.org/info/rfc6040>. | 2010, <http://www.rfc-editor.org/info/rfc6040>. | |||
[RFC6935] Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and | [RFC6935] Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and | |||
UDP Checksums for Tunneled Packets", RFC 6935, DOI | UDP Checksums for Tunneled Packets", RFC 6935, DOI | |||
10.17487/RFC6935, April 2013, <http://www.rfc- | 10.17487/RFC6935, April 2013, <http://www.rfc- | |||
editor.org/info/rfc6935>. | editor.org/info/rfc6935>. | |||
[RFC6936] Fairhurst, G. and M. Westerlund, "Applicability Statement | [RFC6936] Fairhurst, G. and M. Westerlund, "Applicability Statement | |||
for the Use of IPv6 UDP Datagrams with Zero Checksums", | for the Use of IPv6 UDP Datagrams with Zero Checksums", | |||
RFC 6936, DOI 10.17487/RFC6936, April 2013, | RFC 6936, DOI 10.17487/RFC6936, April 2013, | |||
<http://www.rfc-editor.org/info/rfc6936>. | <http://www.rfc-editor.org/info/rfc6936>. | |||
[RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - | ||||
Communication Layers", STD 3, RFC 1122, DOI | ||||
10.17487/RFC1122, October 1989, <http://www.rfc- | ||||
editor.org/info/rfc1122>. | ||||
[RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the- | [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the- | |||
Network Tunneling", RFC 4459, DOI 10.17487/RFC4459, April | Network Tunneling", RFC 4459, DOI 10.17487/RFC4459, April | |||
2006, <http://www.rfc-editor.org/info/rfc4459>. | 2006, <http://www.rfc-editor.org/info/rfc4459>. | |||
10.2. Informative References | [RFC6335] Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S. | |||
Cheshire, "Internet Assigned Numbers Authority (IANA) | ||||
[RFC3828] Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., Ed., | Procedures for the Management of the Service Name and | |||
and G. Fairhurst, Ed., "The Lightweight User Datagram | Transport Protocol Port Number Registry", BCP 165, RFC | |||
Protocol (UDP-Lite)", RFC 3828, July 2004, | 6335, DOI 10.17487/RFC6335, August 2011, <https://www.rfc- | |||
<http://www.rfc-editor.org/info/rfc3828>. | editor.org/info/rfc6335>. | |||
[RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, | [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an | |||
L., Sridhar, T., Bursell, M., and C. Wright, "Virtual | IANA Considerations Section in RFCs", RFC 5226, DOI | |||
eXtensible Local Area Network (VXLAN): A Framework for | 10.17487/RFC5226, May 2008, <https://www.rfc- | |||
Overlaying Virtualized Layer 2 Networks over Layer 3 | editor.org/info/rfc5226>. | |||
Networks", RFC 7348, August 2014, <http://www.rfc- | ||||
editor.org/info/rfc7348>. | ||||
[RFC7605] Touch, J., "Recommendations on Using Assigned Transport | [GUEEXTEN] Herbert, T., Yong, L., and Templin, F., "Extensions for | |||
Port Numbers", BCP 165, RFC 7605, DOI 10.17487/RFC7605, | Generic UDP Encapsulation", draft-herbert-gue-extensions- | |||
August 2015, <http://www.rfc-editor.org/info/rfc7605>. | 06 | |||
[RFC7637] Garg, P., Ed., and Y. Wang, Ed., "NVGRE: Network | 10.2. Informative References | |||
Virtualization Using Generic Routing Encapsulation", RFC | ||||
7637, DOI 10.17487/RFC7637, September 2015, | ||||
<http://www.rfc-editor.org/info/rfc7637>. | ||||
[RFC8086] Yong, L., Ed., Crabbe, E., Xu, X., and T. Herbert, "GRE- | [RFC8086] Yong, L., Ed., Crabbe, E., Xu, X., and T. Herbert, "GRE- | |||
in-UDP Encapsulation", RFC 8086, DOI 10.17487/RFC8086, | in-UDP Encapsulation", RFC 8086, DOI 10.17487/RFC8086, | |||
March 2017, <http://www.rfc-editor.org/info/rfc8086>. | March 2017, <http://www.rfc-editor.org/info/rfc8086>. | |||
[RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, | [RFC7605] Touch, J., "Recommendations on Using Assigned Transport | |||
"Encapsulating MPLS in UDP", RFC 7510, DOI | Port Numbers", BCP 165, RFC 7605, DOI 10.17487/RFC7605, | |||
10.17487/RFC7510, April 2015, <http://www.rfc- | August 2015, <https://www.rfc-editor.org/info/rfc7605>. | |||
editor.org/info/rfc7510>. | ||||
[RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram | ||||
Congestion Control Protocol (DCCP)", RFC 4340, DOI | ||||
10.17487/RFC4340, March 2006, <http://www.rfc- | ||||
editor.org/info/rfc4340>. | ||||
[RFC4787] Audet, F., Ed., and C. Jennings, "Network Address | [RFC4787] Audet, F., Ed., and C. Jennings, "Network Address | |||
Translation (NAT) Behavioral Requirements for Unicast | Translation (NAT) Behavioral Requirements for Unicast | |||
UDP", BCP 127, RFC 4787, DOI 10.17487/RFC4787, January | UDP", BCP 127, RFC 4787, DOI 10.17487/RFC4787, January | |||
2007, <http://www.rfc-editor.org/info/rfc4787>. | 2007, <http://www.rfc-editor.org/info/rfc4787>. | |||
[RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, | [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, | |||
"Session Traversal Utilities for NAT (STUN)", RFC 5389, | "Session Traversal Utilities for NAT (STUN)", RFC 5389, | |||
DOI 10.17487/RFC5389, October 2008, <http://www.rfc- | DOI 10.17487/RFC5389, October 2008, <http://www.rfc- | |||
editor.org/info/rfc5389>. | editor.org/info/rfc5389>. | |||
[RFC5285] Rosenberg, J., "Interactive Connectivity Establishment | [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment | |||
(ICE): A Protocol for Network Address Translator (NAT) | (ICE): A Protocol for Network Address Translator (NAT) | |||
Traversal for Offer/Answer Protocols", RFC 5245, DOI | Traversal for Offer/Answer Protocols", RFC 5245, DOI | |||
10.17487/RFC5245, April 2010, <http://www.rfc- | 10.17487/RFC5245, April 2010, <http://www.rfc- | |||
editor.org/info/rfc5245>. | editor.org/info/rfc5245>. | |||
[RFC5405] Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines | [RFC8084] Fairhurst, G., "Network Transport Circuit Breakers", BCP | |||
for Application Designers", BCP 145, RFC 5405, DOI | 208, RFC 8084, DOI 10.17487/RFC8084, March 2017, | |||
10.17487/RFC5405, November 2008, <http://www.rfc- | <https://www.rfc-editor.org/info/rfc8084>. | |||
editor.org/info/rfc5405>. | ||||
[RFC6438] Carpenter, B. and S. Amante, "Using the IPv6 Flow Label | [RFC6438] Carpenter, B. and S. Amante, "Using the IPv6 Flow Label | |||
for Equal Cost Multipath Routing and Link Aggregation in | for Equal Cost Multipath Routing and Link Aggregation in | |||
Tunnels", RFC 6438, DOI 10.17487/RFC6438, November 2011, | Tunnels", RFC 6438, DOI 10.17487/RFC6438, November 2011, | |||
<http://www.rfc-editor.org/info/rfc6438>. | <http://www.rfc-editor.org/info/rfc6438>. | |||
[RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, DOI | ||||
10.17487/RFC2003, October 1996, <http://www.rfc- | ||||
editor.org/info/rfc2003>. | ||||
[RFC3948] Huttunen, A., Swander, B., Volpe, V., DiBurro, L., and M. | ||||
Stenberg, "UDP Encapsulation of IPsec ESP Packets", RFC | ||||
3948, DOI 10.17487/RFC3948, January 2005, <http://www.rfc- | ||||
editor.org/info/rfc3948>. | ||||
[RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The | ||||
Locator/ID Separation Protocol (LISP)", RFC 6830, DOI | ||||
10.17487/RFC6830, January 2013, <http://www.rfc- | ||||
editor.org/info/rfc6830>. | ||||
[RFC3378] Housley, R. and S. Hollenbeck, "EtherIP: Tunneling | [RFC3378] Housley, R. and S. Hollenbeck, "EtherIP: Tunneling | |||
Ethernet Frames in IP Datagrams", RFC 3378, DOI | Ethernet Frames in IP Datagrams", RFC 3378, DOI | |||
10.17487/RFC3378, September 2002, <http://www.rfc- | 10.17487/RFC3378, September 2002, <http://www.rfc- | |||
editor.org/info/rfc3378>. | editor.org/info/rfc3378>. | |||
[RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. | [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. | |||
Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, | Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, | |||
DOI 10.17487/RFC2784, March 2000, <http://www.rfc- | DOI 10.17487/RFC2784, March 2000, <http://www.rfc- | |||
editor.org/info/rfc2784>. | editor.org/info/rfc2784>. | |||
[RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed., | [RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed., | |||
"Encapsulating MPLS in IP or Generic Routing Encapsulation | "Encapsulating MPLS in IP or Generic Routing Encapsulation | |||
(GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005, | (GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005, | |||
<http://www.rfc-editor.org/info/rfc4023>. | <http://www.rfc-editor.org/info/rfc4023>. | |||
[RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, | [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, | |||
G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", | G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", | |||
RFC 2661, DOI 10.17487/RFC2661, August 1999, | RFC 2661, DOI 10.17487/RFC2661, August 1999, | |||
<http://www.rfc-editor.org/info/rfc2661>. | <http://www.rfc-editor.org/info/rfc2661>. | |||
[RFC8084] Fairhurst, G., "Network Transport Circuit Breakers", BCP | [RFC7637] Garg, P., Ed., and Y. Wang, Ed., "NVGRE: Network | |||
208, RFC 8084, DOI 10.17487/RFC8084, March 2017, | Virtualization Using Generic Routing Encapsulation", RFC | |||
<https://www.rfc-editor.org/info/rfc8084>. | 7637, DOI 10.17487/RFC7637, September 2015, | |||
<https://www.rfc-editor.org/info/rfc7637>. | ||||
[GUEEXTEN] Herbert, T., Yong, L., and Templin, F., "Extensions for | [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, | |||
Generic UDP Encapsulation" draft-herbert-gue-extensions-00 | L., Sridhar, T., Bursell, M., and C. Wright, "Virtual | |||
eXtensible Local Area Network (VXLAN): A Framework for | ||||
Overlaying Virtualized Layer 2 Networks over Layer 3 | ||||
Networks", RFC 7348, August 2014, <http://www.rfc- | ||||
editor.org/info/rfc7348>. | ||||
[GUE4NVO3] Yong, L., Herbert, T., Zia, O., "Generic UDP Encapsulation | [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, DOI | |||
(GUE) for Network Virtualization Overlay" draft-hy-nvo3- | 10.17487/RFC2003, October 1996, <http://www.rfc- | |||
gue-4-nvo-03 | editor.org/info/rfc2003>. | |||
[GUESEC] Yong, L., Herbert, T., "Generic UDP Encapsulation (GUE) | [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in | |||
for Secure Transport" draft-hy-gue-4-secure-transport-03 | IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473, | |||
December 1998, <https://www.rfc-editor.org/info/rfc2473>. | ||||
[RFC3948] Huttunen, A., Swander, B., Volpe, V., DiBurro, L., and M. | ||||
Stenberg, "UDP Encapsulation of IPsec ESP Packets", RFC | ||||
3948, DOI 10.17487/RFC3948, January 2005, <http://www.rfc- | ||||
editor.org/info/rfc3948>. | ||||
[RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The | ||||
Locator/ID Separation Protocol (LISP)", RFC 6830, DOI | ||||
10.17487/RFC6830, January 2013, <http://www.rfc- | ||||
editor.org/info/rfc6830>. | ||||
[RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, | ||||
"Encapsulating MPLS in UDP", RFC 7510, DOI | ||||
10.17487/RFC7510, April 2015, <http://www.rfc- | ||||
editor.org/info/rfc7510>. | ||||
[IANA-PN] IANA, "Protocol Numbers", | ||||
<https://www.iana.org/assignments/protocol-numbers>. | ||||
[TCPUDP] Cheshire, S., Graessley, J., and McGuire, R., | [TCPUDP] Cheshire, S., Graessley, J., and McGuire, R., | |||
"Encapsulation of TCP and other Transport Protocols over | "Encapsulation of TCP and other Transport Protocols over | |||
UDP" draft-cheshire-tcp-over-udp-00 | UDP", draft-cheshire-tcp-over-udp-00 | |||
[TOU] Herbert, T., "Transport layer protocols over UDP" draft- | ||||
herbert-transports-over-udp-00 | ||||
[GENEVE] Gross, J., Ed., Ganga, I. Ed., and Sridhar, T., "Geneve: | [GENEVE] Gross, J., Ed., Ganga, I. Ed., and Sridhar, T., "Geneve: | |||
Generic Network Virtualization Encapsulation", draft-ietf- | Generic Network Virtualization Encapsulation", draft-ietf- | |||
nvo3-geneve-05 | nvo3-geneve-10 | |||
[LCO] Cree, E., https://www.kernel.org/doc/Documentation/ | [UDPENCAP] Herbert, T., "UDP Encapsulation in Linux", | |||
networking/checksum-offloads.txt | <http://people.netfilter.org/pablo/netdev0.1/papers/UDP- | |||
Encapsulation-in-Linux.pdf> | ||||
[MULTIQ] Herbert, T. and de Bruijn, W., "Scaling in the Linux | ||||
Networking Stack", <https://www.kernel.org/doc/ | ||||
Documentation/networking/scaling.txt> | ||||
[CSUMOFF] Cree, E., "Checksum Offloads in the Linux Networking | ||||
Stack", <https://www.kernel.org/doc/Documentation/ | ||||
networking/checksum-offloads.txt> | ||||
[SEGOFF] Duyck, A., "Segmentation Offloads in the Linux Networking | ||||
Stack", <https://www.kernel.org/doc/ | ||||
Documentation/networking/segmentation-offloads.txt> | ||||
Appendix A: NIC processing for GUE | Appendix A: NIC processing for GUE | |||
This appendix is informational and does not constitute a normative | ||||
part of this document. | ||||
This appendix provides some guidelines for Network Interface Cards | This appendix provides some guidelines for Network Interface Cards | |||
(NICs) to implement common offloads and accelerations to support GUE. | (NICs) to implement common offloads and accelerations to support GUE. | |||
Note that most of this discussion is generally applicable to other | Note that most of this discussion is generally applicable to other | |||
methods of UDP based encapsulation. | methods of UDP based encapsulation. An overview of UDP based | |||
encapsulation and acceleration is in [UDPENCAP]. | ||||
A.1. Receive multi-queue | A.1. Receive multi-queue | |||
Contemporary NICs support multiple receive descriptor queues (multi- | Contemporary NICs support multiple receive descriptor queues (multi- | |||
queue). Multi-queue enables load balancing of network processing for | queue) [MULTIQ]. Multi-queue enables load balancing of network | |||
a NIC across multiple CPUs. On packet reception, a NIC selects the | processing for a NIC across multiple CPUs. On packet reception, a NIC | |||
appropriate queue for host processing. Receive Side Scaling is a | selects an appropriate queue for host processing. Receive Side | |||
common method which uses the flow hash for a packet to index an | Scaling (RSS) is a common method which uses the flow hash for a | |||
indirection table where each entry stores a queue number. Flow | packet to index an indirection table where each entry stores a queue | |||
Director and Accelerated Receive Flow Steering (aRFS) allow a host to | number. Flow Director and Accelerated Receive Flow Steering (aRFS) | |||
program the queue that is used for a given flow which is identified | allow a host to program the queue that is used for a given flow which | |||
either by an explicit five-tuple or by the flow's hash. | is identified either by an explicit five-tuple or by the flow's hash. | |||
GUE encapsulation is compatible with multi-queue NICs that support | GUE encapsulation is compatible with multi-queue NICs that support | |||
five-tuple hash calculation for UDP/IP packets as input to RSS. The | five-tuple hash calculation for UDP/IP packets as input to RSS. The | |||
flow entropy in the UDP source port ensures classification of the | flow entropy in the UDP source port ensures classification of the | |||
encapsulated flow even in the case that the outer source and | encapsulated flow even in the case that the outer source and | |||
destination addresses are the same for all flows (e.g. all flows are | destination addresses are the same for all flows (e.g. all flows are | |||
going over a single tunnel). | going over a single tunnel). | |||
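The RSS lookup described above can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular NIC: CRC-32 stands in for the device's five-tuple hash (devices commonly use a Toeplitz hash), and the table size, queue count, and function names are hypothetical.

```python
import zlib

NUM_QUEUES = 4
# Indirection table: each entry stores a queue number (128 entries is an
# illustrative size only).
INDIRECTION_TABLE = [i % NUM_QUEUES for i in range(128)]


def flow_hash(src_ip, dst_ip, proto, sport, dport):
    """Stand-in for the NIC's five-tuple hash over the outer UDP/IP headers."""
    return zlib.crc32(f"{src_ip},{dst_ip},{proto},{sport},{dport}".encode())


def select_queue(src_ip, dst_ip, proto, sport, dport):
    """Index the indirection table with the flow hash to pick a receive queue."""
    h = flow_hash(src_ip, dst_ip, proto, sport, dport)
    return INDIRECTION_TABLE[h % len(INDIRECTION_TABLE)]


# All flows share one tunnel: same outer addresses and destination port 6080.
# Only the UDP source port, which carries the flow entropy, differs.
queues = {select_queue("192.0.2.1", "192.0.2.2", 17, sport, 6080)
          for sport in range(49152, 49152 + 64)}
print(sorted(queues))
```

Because the source port is part of the hash input, encapsulated flows can be spread over several queues even though the outer addresses are identical.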
By default, UDP RSS support is often disabled in NICs to avoid out- | By default, UDP RSS support is often disabled in NICs to avoid out- | |||
of-order reception that can occur when UDP packets are fragmented. As | of-order reception that can occur when UDP packets are fragmented. As | |||
discussed above, fragmentation of GUE packets is mostly avoided by | discussed in section 5.8, fragmentation of GUE packets is mostly | |||
fragmenting packets before entering a tunnel, GUE fragmentation, path | avoided by fragmenting packets before entering a tunnel, GUE | |||
MTU discovery in higher layer protocols, or operator adjusting MTUs. | fragmentation, path MTU discovery in higher layer protocols, or | |||
Other UDP traffic might not implement such procedures to avoid | operator adjusting MTUs. Other UDP traffic might not implement such | |||
fragmentation, so enabling UDP RSS support in the NIC might be a | procedures to avoid fragmentation, so enabling UDP RSS support in the | |||
considered tradeoff during configuration. | NIC might be a considered tradeoff during configuration. | |||
A.2. Checksum offload | A.2. Checksum offload | |||
Many NICs provide capabilities to calculate standard ones complement | Many NICs provide capabilities to calculate the standard ones | |||
payload checksum for packets in transmit or receive. When using GUE | complement checksum for packets in transmit or receive [CSUMOFF]. | |||
encapsulation, there are at least two checksums that are of interest: | When using GUE encapsulation, there are at least two checksums that | |||
the encapsulated packet's transport checksum, and the UDP checksum in | are of interest: the encapsulated packet's transport checksum, and | |||
the outer header. | the UDP checksum in the outer header. | |||
A.2.1. Transmit checksum offload | A.2.1. Transmit checksum offload | |||
NICs can provide a protocol agnostic method to offload transmit | NICs can provide a protocol agnostic method to offload the transmit | |||
checksum (NETIF_F_HW_CSUM in Linux parlance) that can be used with | checksum (NETIF_F_HW_CSUM in Linux parlance) that can be used with | |||
GUE. In this method, the host provides checksum related parameters in | GUE. In this method, the host provides checksum related parameters in | |||
a transmit descriptor for a packet. These parameters include the | a transmit descriptor for a packet. These parameters include the | |||
starting offset of data to checksum, the length of data to checksum, | starting offset of data to checksum, the length of data to checksum, | |||
and the offset in the packet where the computed checksum is to be | and the offset in the packet where the computed checksum is to be | |||
written. The host initializes the checksum field to pseudo header | written. The host initializes the checksum field to a pseudo header | |||
checksum. | checksum. | |||
In the case of GUE, the checksum for an encapsulated transport layer | In the case of GUE, the checksum for an encapsulated transport layer | |||
packet, a TCP packet for instance, can be offloaded by setting the | packet, a TCP packet for instance, can be offloaded by setting the | |||
appropriate checksum parameters. | appropriate checksum parameters. | |||
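The descriptor-driven method above can be emulated in a few lines. This is a sketch under assumptions: the function and parameter names are hypothetical, the 32 bytes of outer headers are a zero-filled placeholder (IPv4, UDP, and a four-byte GUE header), and the inner packet is a bare TCP-like segment whose checksum field sits at byte 16. The host seeds the checksum field with the pseudo header checksum, and the emulated device sums the indicated range and writes the complement at the given offset.

```python
def ones_complement_sum(data) -> int:
    """16-bit ones complement sum (odd lengths are padded with a zero byte)."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def nic_tx_checksum(packet: bytearray, csum_start: int, csum_len: int,
                    csum_offset: int) -> None:
    """Emulate the device: sum the indicated range, store the complement."""
    total = ones_complement_sum(packet[csum_start:csum_start + csum_len])
    packet[csum_offset:csum_offset + 2] = (~total & 0xFFFF).to_bytes(2, "big")


# Host side: build the packet and seed the inner checksum field.
outer = bytes(32)                      # placeholder outer IP/UDP/GUE headers
segment = bytearray(20) + b"data"      # TCP-like header plus payload
pseudo_header = (bytes([192, 0, 2, 1, 192, 0, 2, 2, 0, 6])
                 + len(segment).to_bytes(2, "big"))
segment[16:18] = ones_complement_sum(pseudo_header).to_bytes(2, "big")
packet = bytearray(outer) + segment

# "Descriptor" parameters: start of data, length of data, where to write.
nic_tx_checksum(packet, len(outer), len(segment), len(outer) + 16)
```

A receiver validates the result by summing the pseudo header together with the finished segment; a correct checksum yields 0xFFFF.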
NICs typically can offload only one transmit checksum per packet, so | NICs typically can offload only one transmit checksum per packet, so | |||
simultaneously offloading both an inner transport packet's checksum | simultaneously offloading both an inner transport packet's checksum | |||
and the outer UDP checksum is likely not possible. | and the outer UDP checksum is likely not possible. | |||
If an encapsulator is co-resident with a host, then checksum offload | If an encapsulator is co-resident with a host, then checksum offload | |||
may be performed using remote checksum offload (described in | may be performed using remote checksum offload (RCO) [GUEEXTEN]. | |||
[GUEEXTEN]). Remote checksum offload relies on NIC offload of the | Remote checksum offload relies on NIC offload of the simple UDP/IP | |||
simple UDP/IP checksum which is commonly supported even in legacy | checksum which is commonly supported even in legacy devices. In | |||
devices. In remote checksum offload, the outer UDP checksum is set | remote checksum offload, the outer UDP checksum is set and the GUE | |||
and the GUE header includes an option indicating the start and offset | header includes an option indicating the start and offset of the | |||
of the inner "offloaded" checksum. The inner checksum is initialized | inner "offloaded" checksum. The inner checksum is initialized to the | |||
to the pseudo header checksum. When a decapsulator receives a GUE | pseudo header checksum. When a decapsulator receives a GUE packet | |||
packet with the remote checksum offload option, it completes the | with the remote checksum offload option, it completes the offload | |||
offload operation by determining the packet checksum from the | operation by determining the packet checksum from the indicated start | |||
indicated start point to the end of the packet, and then adds this | point to the end of the packet, and then adds this into the checksum | |||
into the checksum field at the offset given in the option. Computing | field at the offset given in the option. Computing the checksum from | |||
the checksum from the start to end of packet is efficient if | the start to end of packet is efficient if checksum-complete is | |||
checksum-complete is provided on the receiver. | provided on the receiver. | |||
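The decapsulator side of remote checksum offload can be sketched as follows. This is a minimal illustration, not the method defined in [GUEEXTEN]. It assumes that both `start` and `offset` are byte offsets from the beginning of the encapsulated payload, and that "adds this into the checksum field" means the usual transmit-offload completion: the sender left the pseudo header sum in the field, so the receiver stores the complement of the sum from `start` to the end of the packet. The function names are hypothetical.

```python
def csum_add(a: int, b: int) -> int:
    """Ones' complement 16-bit addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def ones_sum(data: bytes) -> int:
    """Ones' complement sum of big-endian 16-bit words (odd tail zero padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total = csum_add(total, (data[i] << 8) | data[i + 1])
    return total


def rco_complete(payload: bytes, start: int, offset: int) -> bytes:
    """Finish a remotely offloaded checksum (hypothetical helper).

    Assumes the 16-bit field at `offset` was initialized by the sender to
    the (non-inverted) pseudo header sum, and that `offset - start` is even.
    """
    buf = bytearray(payload)
    # Sum from the indicated start point to the end of the packet; this
    # includes the pseudo header sum the sender left in the checksum field.
    total = ones_sum(bytes(buf[start:]))
    # Store the complemented result as the final transport checksum.
    buf[offset:offset + 2] = (~total & 0xFFFF).to_bytes(2, "big")
    return bytes(buf)
```

If the NIC supplies a checksum-complete value, the loop in `ones_sum` over `buf[start:]` could be replaced by subtracting the sum of the bytes before `start` from that value, which is why the draft calls the operation efficient in that case.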
Another alternative when an encapsulator is co-resident with a host | Another alternative when an encapsulator is co-resident with a host | |||
is to perform Local Checksum Offload [LCO]. In this method, the inner | is to perform Local Checksum Offload (LCO) [CSUMOFF]. In this method, | |||
transport layer checksum is offloaded and the outer UDP checksum can | the inner transport layer checksum is offloaded and the outer UDP | |||
be deduced based on the fact that the portion of the packet covered | checksum can be deduced based on the fact that the portion of the | |||
by the inner transport checksum will sum to zero (or at least the bit | packet covered by the inner transport checksum will sum to zero or at | |||
wise "not" of the inner pseudo header). | least the bitwise "not" of the inner pseudo header. | |||
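The deduction used by Local Checksum Offload can be illustrated with a short sketch. It assumes that the NIC will later complete the inner transport checksum, so the bytes from the inner checksum start to the end of the packet sum to the bitwise "not" of the inner pseudo header sum. The outer UDP checksum then depends only on the outer pseudo header, the headers in front of the inner transport header, and that known value. The function names and the way the headers are passed in are assumptions made for illustration.

```python
def csum_add(a: int, b: int) -> int:
    """Ones' complement 16-bit addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def ones_sum(data: bytes) -> int:
    """Ones' complement sum of big-endian 16-bit words (odd tail zero padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total = csum_add(total, (data[i] << 8) | data[i + 1])
    return total


def lco_outer_csum(outer_pseudo: bytes, headers: bytes,
                   inner_pseudo_sum: int) -> int:
    """Outer UDP checksum computed without touching the inner payload.

    `headers` is everything from the outer UDP header (checksum field
    zeroed) up to, but not including, the inner transport header. Its
    length is assumed to be even.
    """
    total = csum_add(ones_sum(outer_pseudo), ones_sum(headers))
    # The offloaded inner region will sum to ~inner_pseudo_sum once the
    # NIC has filled in the inner checksum.
    total = csum_add(total, ~inner_pseudo_sum & 0xFFFF)
    csum = ~total & 0xFFFF
    # A computed UDP checksum of zero is transmitted as all ones.
    return csum or 0xFFFF
```

The point of the sketch is that the cost of the outer checksum is proportional to the header length only; the payload is summed once, by the NIC, for the inner checksum.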
A.2.2. Receive checksum offload | A.2.2. Receive checksum offload | |||
GUE is compatible with NICs that perform a protocol agnostic receive | GUE is compatible with NICs that perform a protocol agnostic receive | |||
checksum (CHECKSUM_COMPLETE in Linux parlance). In this technique, a | checksum (CHECKSUM_COMPLETE in Linux parlance). In this technique, a | |||
NIC computes a ones complement checksum over all (or some predefined | NIC computes a ones complement checksum over all (or some predefined | |||
portion) of a packet. The computed value is provided to the host | portion) of a packet. The computed value is provided to the host | |||
stack in the packet's receive descriptor. The host driver can use | stack in the packet's receive descriptor. The host driver can use | |||
this checksum to "patch up" and validate any inner packet transport | this checksum to "patch up" and validate any inner packet transport | |||
checksum, as well as the outer UDP checksum if it is non-zero. | checksums, as well as the outer UDP checksum if it is non-zero. | |||
Many legacy NICs don't provide checksum-complete but instead provide | Many legacy NICs don't provide checksum-complete but instead provide | |||
an indication that a checksum has been verified (CHECKSUM_UNNECESSARY | an indication that a checksum has been verified (CHECKSUM_UNNECESSARY | |||
in Linux). Usually, such validation is only done for simple TCP/IP or | in Linux). Usually, such validation is only done for simple TCP/IP or | |||
UDP/IP packets. If a NIC indicates that a UDP checksum is valid, the | UDP/IP packets. If a NIC indicates that a UDP checksum is valid, the | |||
checksum-complete value for the UDP packet is the "not" of the pseudo | checksum-complete value for the UDP packet is the bitwise "not" of | |||
header checksum. In this way, checksum-unnecessary can be converted | the pseudo header checksum. In this way, checksum-unnecessary can be | |||
to checksum-complete. So, if the NIC provides checksum-unnecessary | converted to checksum-complete. So, if the NIC provides checksum- | |||
for the outer UDP header in an encapsulation, checksum conversion can | unnecessary for the outer UDP header in an encapsulation, checksum | |||
be done so that the checksum-complete value is derived and can be | conversion can be done so that the checksum-complete value is derived | |||
used by the stack to validate checksums in the encapsulated packet. | and can be used by the stack to validate checksums in the | |||
encapsulated packet. | ||||
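The checksum conversion described above can be sketched as follows. The sketch assumes IPv4-style pseudo headers passed in as raw bytes and a NIC that has already reported the outer UDP checksum as valid; all function names are hypothetical. Once the checksum-complete value is derived, the sum over the inner transport segment is obtained by subtracting the sum of the headers that precede it.

```python
def csum_add(a: int, b: int) -> int:
    """Ones' complement 16-bit addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def csum_sub(a: int, b: int) -> int:
    """Ones' complement subtraction: add the bitwise "not" of b."""
    return csum_add(a, ~b & 0xFFFF)


def ones_sum(data: bytes) -> int:
    """Ones' complement sum of big-endian 16-bit words (odd tail zero padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total = csum_add(total, (data[i] << 8) | data[i + 1])
    return total


def unnecessary_to_complete(outer_pseudo: bytes) -> int:
    """Derive checksum-complete for a UDP packet the NIC reported as valid.

    A valid UDP packet plus its pseudo header sums to zero, so the sum
    over the UDP packet alone is the "not" of the pseudo header sum.
    """
    return ~ones_sum(outer_pseudo) & 0xFFFF


def inner_csum_valid(complete: int, outer_headers: bytes,
                     inner_pseudo: bytes) -> bool:
    """Validate the inner transport checksum from a checksum-complete value.

    `outer_headers` spans the outer UDP header (as received) through the
    last byte before the inner transport header; even length is assumed.
    """
    inner_sum = csum_sub(complete, ones_sum(outer_headers))
    total = csum_add(inner_sum, ones_sum(inner_pseudo))
    # Both encodings of ones' complement zero indicate a valid checksum.
    return total in (0x0000, 0xFFFF)
```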
A.3. Transmit Segmentation Offload | A.3. Transmit Segmentation Offload | |||
Transmit Segmentation Offload (TSO) is a NIC feature where a host | Transmit Segmentation Offload (TSO) [SEGOFF] is a NIC feature where a | |||
provides a large (>MTU size) TCP packet to the NIC, which in turn | host provides a large (>MTU size) TCP packet to the NIC, which in | |||
splits the packet into separate segments and transmits each one. This | turn splits the packet into separate segments and transmits each one. | |||
is useful to reduce CPU load on the host. | This is useful to reduce CPU load on the host. | |||
The process of TSO can be generalized as: | The process of TSO can be generalized as: | |||
- Split the TCP payload into segments which allow packets with | - Split the TCP payload into segments of size less than or equal | |||
size less than or equal to MTU. | to MTU. | |||
- For each created segment: | - For each created segment: | |||
1. Replicate the TCP header and all preceding headers of the | 1. Replicate the TCP header and all preceding headers of the | |||
original packet. | original packet. | |||
2. Set payload length fields in any headers to reflect the | 2. Set payload length fields in any headers to reflect the | |||
length of the segment. | length of the segment. | |||
3. Set TCP sequence number to correctly reflect the offset of | 3. Set TCP sequence number to correctly reflect the offset of | |||
skipping to change at page 36, line 31 ¶ | skipping to change at page 37, line 16 ¶ | |||
To facilitate TSO with GUE, it is recommended that extension fields | To facilitate TSO with GUE, it is recommended that extension fields | |||
do not contain values that need to be updated on a per segment basis. | do not contain values that need to be updated on a per segment basis. | |||
For example, extension fields should not include checksums, lengths, | For example, extension fields should not include checksums, lengths, | |||
or sequence numbers that refer to the payload. If the GUE header does | or sequence numbers that refer to the payload. If the GUE header does | |||
not contain such fields then the TSO engine only needs to copy the | not contain such fields then the TSO engine only needs to copy the | |||
bits in the GUE header when creating each segment and does not need | bits in the GUE header when creating each segment and does not need | |||
to parse the GUE header. | to parse the GUE header. | |||
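A simplified model of this recommendation is sketched below. It covers only the visible steps (header replication, length updates, and the TCP sequence number) and treats the GUE header as opaque bytes copied unchanged into every segment. The `Segment` record, the fixed header sizes (no IP or TCP options), and the use of a maximum segment payload size `mss` are assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed fixed header sizes: outer UDP, inner IPv4 and TCP without options.
UDP_HLEN, IP_HLEN, TCP_HLEN = 8, 20, 20


@dataclass(frozen=True)
class Segment:
    udp_len: int        # outer UDP length field
    gue_hdr: bytes      # GUE header, copied bit for bit
    ip_total_len: int   # inner IPv4 total length field
    tcp_seq: int        # inner TCP sequence number
    payload: bytes


def tso_segment(gue_hdr: bytes, tcp_seq: int, payload: bytes,
                mss: int) -> list:
    """Split one large TCP payload into per-segment header/payload records."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for off in range(0, len(payload), mss):
        chunk = payload[off:off + mss]
        ip_total_len = IP_HLEN + TCP_HLEN + len(chunk)
        segments.append(Segment(
            # Length fields are recomputed for each segment.
            udp_len=UDP_HLEN + len(gue_hdr) + ip_total_len,
            # The GUE header is never parsed, only copied.
            gue_hdr=gue_hdr,
            ip_total_len=ip_total_len,
            # The sequence number advances by the payload offset.
            tcp_seq=(tcp_seq + off) & 0xFFFFFFFF,
            payload=chunk,
        ))
    return segments
```

If the GUE header carried a per-payload checksum, length, or sequence number, the `gue_hdr=gue_hdr` line would have to become a parse-and-rewrite step, which is the cost the recommendation avoids.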
A.4. Large Receive Offload | A.4. Large Receive Offload | |||
Large Receive Offload (LRO) is a NIC feature where packets of a TCP | Large Receive Offload (LRO) [SEGOFF] is a NIC feature where packets | |||
connection are reassembled, or coalesced, in the NIC and delivered to | of a TCP connection are reassembled, or coalesced, in the NIC and | |||
the host as one large packet. This feature can reduce CPU utilization | delivered to the host as one large packet. This feature can reduce | |||
in the host. | CPU utilization in the host. | |||
LRO requires significant protocol awareness to be implemented | LRO requires significant protocol awareness to be implemented | |||
correctly and is difficult to generalize. Packets in the same flow | correctly and is difficult to generalize. Packets in the same flow | |||
need to be unambiguously identified. In the presence of tunnels or | need to be unambiguously identified. In the presence of tunnels or | |||
network virtualization, this may require more than a five-tuple match | network virtualization, this may require more than a five-tuple match | |||
(for instance packets for flows in two different virtual networks may | (for instance packets for flows in two different virtual networks may | |||
have identical five-tuples). Additionally, a NIC needs to perform | have identical five-tuples). Additionally, a NIC needs to perform | |||
validation over packets that are being coalesced, and needs to | validation over packets that are being coalesced, and needs to | |||
fabricate a single meaningful header from all the coalesced packets. | fabricate a single meaningful header from all the coalesced packets. | |||
skipping to change at page 37, line 35 ¶ | skipping to change at page 38, line 21 ¶ | |||
Assuming that networking switches perform ECMP based on the flow | Assuming that networking switches perform ECMP based on the flow | |||
hash, a sender can affect the path by altering the flow entropy. For | hash, a sender can affect the path by altering the flow entropy. For | |||
instance, a host can store a flow hash in its protocol control block | instance, a host can store a flow hash in its protocol control block | |||
(PCB) for an inner flow, and might alter the value upon detecting | (PCB) for an inner flow, and might alter the value upon detecting | |||
that packets are traversing a lossy path. Changing the flow entropy | that packets are traversing a lossy path. Changing the flow entropy | |||
for a flow SHOULD be subject to hysteresis (at most once every thirty | for a flow SHOULD be subject to hysteresis (at most once every thirty | |||
seconds) to limit the number of out of order packets. | seconds) to limit the number of out of order packets. | |||
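The hysteresis rule can be sketched as a small per-flow state object. This is an illustration only: the class name, the 16-bit entropy value (for example, one carried in the outer UDP source port), and the caller-supplied timestamps are assumptions, and how a lossy path is detected is out of scope.

```python
import random


class FlowEntropy:
    """Per-flow entropy value that may change at most once per interval."""

    MIN_INTERVAL = 30.0  # seconds, per the "at most once every thirty seconds" rule

    def __init__(self, rng=None):
        self._rng = rng or random.Random()
        self.value = self._pick()
        self._last_change = None  # no change made yet

    def _pick(self) -> int:
        # Assumed 16-bit, non-zero entropy value.
        return self._rng.randrange(1, 1 << 16)

    def on_lossy_path(self, now: float) -> bool:
        """Request a new entropy value; returns True if it was changed."""
        if (self._last_change is not None
                and now - self._last_change < self.MIN_INTERVAL):
            return False  # too soon; keep the current path
        old = self.value
        while self.value == old:
            self.value = self._pick()
        self._last_change = now
        return True
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the sketch deterministic and easy to test.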
B.3. Hardware protocol implementation considerations | B.3. Hardware protocol implementation considerations | |||
Low level data path protocols, such is GUE, are often supported in | Low level data path protocols, such as GUE, are often supported in | |||
high speed network device hardware. Variable length header (VLH) | high speed network device hardware. Variable length header (VLH) | |||
protocols like GUE are often considered difficult to efficiently | protocols like GUE are sometimes considered difficult to efficiently | |||
implement in hardware. In order to retain the important | implement in hardware. In order to retain the important | |||
characteristics of an extensible and robust protocol, hardware | characteristics of an extensible and robust protocol, hardware | |||
vendors may practice "constrained flexibility". In this model, only | vendors may practice "constrained flexibility". In this model, only | |||
certain combinations or protocol header parameterizations are | certain combinations or protocol header parameterizations are | |||
implemented in hardware fast path. Each such parameterization is | implemented in the hardware fast path. Each such parameterization is | |||
fixed length so that the particular instance can be optimized as a | fixed length so that the particular instance can be optimized as a | |||
fixed length protocol. In the case of GUE this constitutes specific | fixed length protocol. In the case of GUE, this constitutes specific | |||
combinations of GUE flags, fields, and next protocol. The selected | combinations of GUE flags, fields, and next protocol. The selected | |||
combinations would naturally be the most common cases which form the | combinations would naturally be the most common cases which form the | |||
"fast path", and other combinations are assumed to take the "slow | "fast path", and other combinations are assumed to take the "slow | |||
path". | path". | |||
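One way to picture constrained flexibility is a lookup over the first four bytes of the GUE header. The sketch below assumes the variant 0 base header layout (2-bit variant, control bit, 5-bit Hlen, 8-bit proto/ctype, 16-bit flags) and a hypothetical table of supported combinations; the table contents and the flag value used in the usage note are illustrative and not taken from the draft.

```python
import struct

# Hypothetical set of (hlen, flags, proto) combinations handled in the fast
# path. Each entry implies a fixed header length of 4 + 4 * hlen bytes.
fast_path_table = {
    (0, 0x0000, 4),    # no extension fields, IPv4 payload
    (0, 0x0000, 41),   # no extension fields, IPv6 payload
}


def classify(pkt: bytes, table=fast_path_table) -> str:
    """Return "fast", "slow", or "drop" for a GUE packet (UDP payload)."""
    if len(pkt) < 4:
        return "drop"
    b0, proto, flags = struct.unpack("!BBH", pkt[:4])
    variant = b0 >> 6
    control = (b0 >> 5) & 1
    hlen = b0 & 0x1F
    if len(pkt) < 4 + 4 * hlen:
        return "drop"
    if variant != 0 or control:
        return "slow"  # control messages and other variants are not optimized
    # Only exact, pre-programmed parameterizations take the fast path.
    return "fast" if (hlen, flags, proto) in table else "slow"
```

Because the table is ordinary data, supporting a new combination later amounts to adding an entry, for example `fast_path_table.add((1, 0x8000, 4))` for a hypothetical one-word extension field.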
In time, needs and requirements of the protocol may change which may | In time, the needs and requirements of a protocol may change which | |||
manifest themselves as new parameterizations to be supported in the | may manifest themselves as new parameterizations to be supported in | |||
fast path. To allow this extensibility, a device practicing | the fast path. To allow this extensibility, a device practicing | |||
constrained flexibility should allow the fast path parameterizations | constrained flexibility should allow fast path parameterizations to | |||
to be programmable. | be programmable. | |||
Authors' Addresses | Authors' Addresses | |||
Tom Herbert | Tom Herbert | |||
Quantonium | Quantonium | |||
4701 Patrick Henry | 4701 Patrick Henry | |||
Santa Clara, CA 95054 | Santa Clara, CA 95054 | |||
US | US | |||
Email: tom@herbertland.com | Email: tom@herbertland.com | |||
Lucy Yong | Lucy Yong | |||
Huawei USA | Independent | |||
5340 Legacy Dr. | Austin, TX | |||
Plano, TX 75024 | ||||
US | US | |||
Email: lucy.yong@huawei.com | ||||
Osama Zia | Osama Zia | |||
Microsoft | Microsoft | |||
1 Microsoft Way | 1 Microsoft Way | |||
Redmond, WA 98029 | Redmond, WA 98029 | |||
US | US | |||
Email: osamaz@microsoft.com | Email: osamaz@microsoft.com | |||
End of changes. 132 change blocks. | ||||
477 lines changed or deleted | 490 lines changed or added | |||