draft-ietf-ipngwg-esd-analysis-01.txt   draft-ietf-ipngwg-esd-analysis-02.txt 
INTERNET-DRAFT Matt Crawford INTERNET-DRAFT Matt Crawford
Fermilab Fermilab
<draft-ietf-ipngwg-esd-analysis-01.txt> Allison Mankin <draft-ietf-ipngwg-esd-analysis-02.txt> Allison Mankin
ISI ISI
Thomas Narten Thomas Narten
IBM IBM
John W. Stewart, III John W. Stewart, III
ISI Juniper
Lixia Zhang Lixia Zhang
UCLA UCLA
July 30, 1997 March 13, 1998
Separating Identifiers and Locators in Addresses: Separating Identifiers and Locators in Addresses:
An Analysis of the GSE Proposal for IPv6 An Analysis of the GSE Proposal for IPv6
<draft-ietf-ipngwg-esd-analysis-01.txt> <draft-ietf-ipngwg-esd-analysis-02.txt>
Status of this Memo Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts. working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
skipping to change at page 1, line 40 skipping to change at page 1, line 40
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
To learn the current status of any Internet-Draft, please check the To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim). Rim).
Distribution of this memo is unlimited. Distribution of this memo is unlimited.
This Internet Draft expires January 30, 1997. This Internet-Draft expires May 7, 1998.
Abstract Abstract
On February 27-28, 1997, the IPng Working Group held an interim On February 27-28, 1997, the IPng Working Group held an interim
meeting in Palo Alto, California to consider adopting Mike O'Dell's meeting in Palo Alto, California to consider adopting Mike O'Dell's
'GSE - An Alternate Addressing Architecture for IPv6' proposal [GSE]. "GSE - An Alternate Addressing Architecture for IPv6" proposal [GSE].
In GSE, 16-byte IPv6 addresses are split into three portions: a In GSE, 16-byte IPv6 addresses are split into distinct portions for
globally unique End System Designator (ESD), a Site Topology global routing, local routing and end-point identification. GSE
Partition (STP) and a Routing Goop (RG) portion. The STP corresponds includes the feature of configuring a node internal to a site with
(roughly) to a site's subnet portion of an IPv4 address, whereas the only the local routing and end-point identification portions of the
RG identifies the attachment point to the public Internet. Routers address, thus hiding the full address from the node. When such a node
use the RG+STP portions of addresses (called 'Routing Stuff' in this generates a packet, only the low-order bytes of the source address
document) to route packets to the link to which the destination is are specified; the high-order bytes of the address are filled in by a
directly attached; the ESD is used to deliver the packet across the border router when the packet leaves the site.
last hop link. An important idea in GSE is that nodes within a site
do not know the RG portion of their addresses. A border router at the
site's Internet connect point would dynamically replace the RG part
of source addresses of all outgoing IP datagrams and the RG part of
destination addresses on incoming traffic.
This document provides a detailed analysis of the GSE plan. Much of There is a long history of a vague assertion in certain circles that
the analysis presented here is an expansion of official meeting IPv4 "got it wrong" by treating its addresses simultaneously as
minutes, though it also includes issues uncovered by the authors in locators and identifiers. Despite these claims, however, there was
the process of fully fleshing out the analysis. In summary, the never a complete proposal for a scalable network protocol which
working group eventually decided that the full addresses of nodes separated the functions. As a result, it wasn't possible to do a
within a site should not be hidden from those nodes, so as a result serious analysis comparing and contrasting a "separated" architecture
it is not necessary for routers to rewrite the Routing Goop portion and an "overloaded" architecture. The GSE proposal serves as a
of addresses. However, other parts of the GSE plan were adopted vehicle for just such an analysis, and that is the purpose of this
(e.g., having 64-bit interface identifiers with an option for paper.
specifying them as globally unique and easing the renumbering of the
high-order portion of addresses within DNS).
In addition to analyzing the GSE proposal in particular, the document We conclude that an architecture that clearly separates locators and
also studies the general issue of separating network layer addresses identifiers in addresses introduces new issues and problems that do
into two separate values satisfying location and identification not have an easy or clear solution. Indeed, the alleged disadvantages
purposes, respectively. of overloading addresses turn out to provide some significant
benefits over the non-overloaded approach.
Contents Contents
Status of this Memo.......................................... 1 Status of this Memo.......................................... 1
1. Introduction............................................. 4 1. Introduction............................................. 3
2. Addressing and Routing in IPv4........................... 5 2. Definitions and Terminology.............................. 4
2.1. The Need for Aggregation............................ 7
2.2. The Pre-CIDR Internet............................... 7
2.3. CIDR and Provider-Based Addressing.................. 8
2.4. Multi-Homing and Aggregation........................ 11
3. GSE Background........................................... 14 3. Addressing and Routing in IPv4........................... 5
3.1. Motivation For GSE.................................. 14 3.1. The Need for Aggregation............................ 7
3.2. GSE Address Format.................................. 15 3.2. The Pre-CIDR Internet............................... 7
3.3. Routing Stuff (RG and STP).......................... 15 3.3. CIDR and Provider-Based Addressing.................. 8
3.4. End-System Designator............................... 17 3.4. Multi-Homing and Aggregation........................ 12
3.5. Address Rewriting by Border Routers................. 18
3.6. Renumbering and Rehoming Mid-Level ISPs............. 19
3.7. Support for Multi-Homed Sites....................... 20
3.8. Explicit Non-Goals for GSE.......................... 21
4. Analysis of GSE's Advantages and Disadvantages........... 21 4. The GSE Proposal......................................... 14
4.1. End System Designator............................... 21 4.1. Motivation For GSE.................................. 14
4.1.1. Uniqueness Enforcement in the IPv4 Internet.... 21 4.2. GSE Address Format.................................. 15
4.1.2. Overloading Addresses: Network Layer Issues.... 23 4.2.1. Routing Stuff (RG and STP)..................... 15
4.1.3. Overloading Addresses: Transport Layer Issues.. 24 4.2.2. End-System Designator.......................... 17
4.1.4. Potential Benefits of Globally Unique ESDs..... 25 4.3. Address Rewriting by Border Routers................. 18
4.1.5. ESD: Network Layer Issues...................... 26 4.4. Renumbering and Rehoming Mid-Level ISPs............. 19
4.1.6. ESD: Transport Layer Issues.................... 28 4.5. Support for Multi-Homed Sites....................... 20
4.1.7. On The Uniqueness Of ESDs...................... 34 4.6. Explicit Non-Goals for GSE.......................... 21
4.1.8. DNS PTR Queries................................ 35
4.1.9. Reverse Mapping of ESDs........................ 37
4.1.10. Reverse Mapping of Complete GSE Addresses..... 38
4.1.11. The ICMP "Who Are You" Message................ 39
4.2. Renumbering and Domain Name System (DNS) Issues..... 40
4.2.1. How Frequently Can We Renumber?................ 40
4.2.2. Efficient DNS support for Site Renumbering..... 41
4.2.3. Two-Faced DNS.................................. 42
4.2.4. Bootstrapping Issues........................... 43
4.2.5. Renumbering and Reverse DNS Lookups............ 44
4.3. Address Rewriting Routers........................... 44
4.3.1. Load Balancing................................. 45
4.3.2. End-To-End Argument: Don't Hide RG from Hosts.. 45
4.4. Multi-Homing........................................ 46
5. Results.................................................. 48 5. Analysis: The Pros and Cons of Overloading Addresses..... 21
6. Security Considerations.................................. 49 5.1. Purpose of an Identifier............................ 22
5.2. Mapping an Identifier to a Locator.................. 24
5.2.1. Scalable Mapping of Identifiers to Locators.... 25
5.2.2. Insufficient Hierarchy Space in ESDs........... 26
5.2.3. Reverse Mapping of Complete GSE Addresses...... 27
5.2.4. DNS-Like Reverse Mapping of Full GSE Addresses. 27
5.2.5. The ICMP Who-Are-You Message................... 28
5.3. Authentication of Identifiers....................... 29
5.3.1. Identifier Authentication in IPv4.............. 30
5.3.2. Identifier Authentication in GSE............... 31
5.3.3. Transport Layer: What Locator Should Be Used?.. 31
5.3.4. RG Selection On An Active Open................. 32
5.3.5. RG Selection On A Passive Open................. 32
5.3.6. Mid-Connection RG Changes...................... 32
5.3.7. The Impact of Corrupt Routing Goop............. 33
5.3.8. On The Uniqueness Of ESDs...................... 35
5.3.9. New Denial of Service Attacks.................. 36
5.3.10. Summary of Identifier Authentication Issues... 36
5.4. Miscellaneous....................................... 38
5.4.1. Renumbering and Domain Name System (DNS) Issues 38
5.4.2. How Frequently Can We Renumber?................ 38
5.4.3. Efficient DNS support for Site Renumbering..... 39
5.4.4. Two-Faced DNS.................................. 40
5.4.5. Bootstrapping Issues........................... 41
7. Acknowledgments.......................................... 49 6. Conclusion............................................... 41
8. References............................................... 49 7. Security Considerations.................................. 42
9. Authors' Addresses....................................... 51 8. Acknowledgments.......................................... 42
9. References............................................... 43
10. Authors' Addresses...................................... 44
1. Introduction 1. Introduction
In October of 1996, Mike O'Dell published an Internet-Draft (dubbed In October of 1996, Mike O'Dell published an Internet-Draft (dubbed
"8+8") that proposed significant changes to the IPv6 addressing "8+8") that proposed significant changes to the IPv6 addressing
architecture. The 8+8 proposal was the topic of considerable architecture. The 8+8 proposal was the topic of considerable
discussion at the December 1996 IETF meeting in San Jose. Because the discussion at the December 1996 IETF meeting in San Jose. Because the
proposal offered both potential benefits (e.g., enhanced routing proposal offered both potential benefits (e.g., enhanced routing
scalability) and risks (e.g., changes to the basic IPv6 scalability) and risks (e.g., changes to the basic IPv6
architecture), the IPng Working Group held an interim meeting on architecture), the IPng Working Group held an interim meeting on
February 27-28, 1997 to consider adopting the 8+8 proposal. The February 27-28, 1997 to consider adopting the 8+8 proposal.
meeting, which over 45 persons attended, was held at Sun
Microsystems' PAL1 facility in Palo Alto, CA.
Shortly before the interim meeting, an updated version of the Shortly before the interim meeting, an updated version of the
Internet-Draft was produced, in which the name of the proposal was Internet-Draft was produced. This version changed the name of the
changed from "8+8" to "GSE," to identify the three separate proposal from "8+8" to "GSE" to identify the three separate
components of the address: Global, site and End-System Designator. components of the address: Global, Site and End-System Designator.
This last version of the GSE proposal was published as an
Informational RFC [GSE] for historical purposes.
The purpose of the meeting was to evaluate the GSE proposal and
decide whether to adopt it in whole or in part or to reject it.
The well-attended meeting generated high caliber, focused technical The well-attended meeting generated high caliber, focused technical
discussions on the issues involved, with participation by almost all discussions on the issues involved, with participation by almost all
of the attendees. By the middle of the second day there was unanimous of the attendees. By the middle of the second day there was unanimous
agreement by the attendees that the GSE proposal as written presented agreement that the GSE proposal as written presented too many risks
too many risks and should not be adopted as the basis for IPv6. and should not be adopted as the basis for IPv6. The proposal did,
However, the attendees also concluded that some of the issues however, challenge the group to make improvements to the then
discussed in the GSE proposal were equally applicable to the current existing IPv6 specifications (e.g., increasing the aggregatability of
IPv6 provider-based addressing plan and had enough benefit to warrant addresses, having hard boundaries in addresses between routing parts
further consideration apart from the GSE address format. These and non-routing parts and easing the DNS aspects of renumbering).
changes include:
1) Making changes to the IPv6 provider-based addressing document to This document focuses primarily on the issue of separating addresses
facilitate increased aggregation. into distinct portions for identification and location: a separation
that GSE has but IPv4 does not. We start with a discussion of the
current architecture of IPv4 addressing and its impact on route
scalability, identification, multi-homing, etc. Next, the details of
the GSE proposal are described. Finally, the fundamental issue of
decomposing addresses into multiple separate functional parts is
analyzed in the context of the GSE proposal. Here we detail some of
the practical reasons why separating addresses into locators and
identifiers poses a number of challenging problems, making it clear
that having such a separation is no panacea. An appendix contains a
summary of the IPng Working Group's deliberations of GSE and the
results on IPv6 addressing.
2) Creating hard boundaries in IPv6 addresses to clearly 2. Definitions and Terminology
distinguish between the portions used for identifying hosts and
for routing.
3) Having an option to indicate that the low-order 8 bytes of an The following terminology is used throughout this document.
IPv6 address is a globally unique End System Designator (ESD).
This change has potential benefits to future transport protocols
(e.g., TCPng).
4) Making a clear distinction between the "locator" part of an Routing Goop --- A term defined by the GSE document that refers to
address and the "identifier" part of the address. The former is the first six bytes of an IPv6 GSE address. The Routing
used to route a packet to its end-point, the latter is used to Goop portion of an address identifies where a site
identify an end-point, independent of the path used to deliver connects to the public Internet. More generally,
the packet. the term refers to the portion of an address's
routing prefix that identifies where a site at which
an address resides connects to the public Internet.
5) Making changes to the way AAAA records are stored within the Site Topology Partition --- A term defined by the GSE document
DNS, so that renumbering a site (e.g., when a site changes ISPs) that refers to the two bytes of an IPv6 GSE address
requires few changes to the DNS database in order to effectively immediately to the right of the Routing Goop. The
change all of a site's address AAAA RRs. Site Topology Partition part of an address
identifies which link within a site an address
resides on.
While this document does contain an analysis of the specific Routing Stuff --- The part of an address that identifies which
mechanisms of the GSE proposal, much of the document's analysis applies link the address resides on. Within the context of
to any proposal in which the identifying and locating properties of GSE, the Routing Goop and Site Topology Partition
an address (which are combined in IPv4) are split apart into parts of an address comprise the Routing Stuff.
separable pieces.
2. Addressing and Routing in IPv4 identifier --- a value that indicates the sender of a packet, or
the intended recipient of a packet. Within the
context of GSE, the ESD portion of the address is an
identifier.
locator --- a field in a packet header that is used by the routing
subsystem to deliver a packet to the link on which a
destination resides. The terms locator and Routing
Stuff are similar; we use Routing Stuff when
referring to the specific locator in GSE.
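The field boundaries defined above can be illustrated with a short sketch. It assumes the layout as read from this document (six bytes of Routing Goop, two bytes of Site Topology Partition, and the remaining eight bytes as the ESD). The function names and the sample addresses are hypothetical and are not taken from the GSE document; the rewrite helper only mimics, in simplified form, a border router substituting the Routing Goop.

```python
import ipaddress

RG_BYTES = 6   # Routing Goop: first six bytes (per the definition above)
STP_BYTES = 2  # Site Topology Partition: the next two bytes
# The remaining eight bytes are treated here as the End System Designator.


def split_gse(addr):
    """Split a 16-byte IPv6 address into the three GSE portions."""
    raw = ipaddress.IPv6Address(addr).packed
    return {
        "routing_goop": raw[:RG_BYTES],
        "site_topology_partition": raw[RG_BYTES:RG_BYTES + STP_BYTES],
        "esd": raw[RG_BYTES + STP_BYTES:],
    }


def rewrite_routing_goop(addr, new_rg):
    """Replace only the Routing Goop, leaving STP and ESD untouched.

    Hypothetical helper: a simplified stand-in for what a border
    router would do to the source address of an outgoing packet.
    """
    if len(new_rg) != RG_BYTES:
        raise ValueError("Routing Goop must be exactly six bytes")
    raw = ipaddress.IPv6Address(addr).packed
    return str(ipaddress.IPv6Address(new_rg + raw[RG_BYTES:]))


# Illustrative address only; the prefix is not a real GSE allocation.
parts = split_gse("2001:db8:aaaa:1:200:5eff:fe00:5301")
rewritten = rewrite_routing_goop(
    "2001:db8:aaaa:1:200:5eff:fe00:5301", bytes.fromhex("20010db8bbbb")
)
```

Note that the ESD and the Site Topology Partition are identical before and after the rewrite; only the six high-order bytes differ.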
3. Addressing and Routing in IPv4
Before dealing with details of GSE, we present some background about Before dealing with details of GSE, we present some background about
how routing and addressing work in "classical IP" (i.e., IPv4). We how routing and addressing work in "classical IP" (i.e., IPv4). We
present this background because the GSE proposal proposes a fairly present this background because the GSE proposal proposes a fairly
major change to the base model. In order to properly evaluate the major change to the base model. In order to properly evaluate GSE,
benefits of GSE, one must understand what problems in IPv4 it claims one must understand what problems in IPv4 it claims to improve or
to improve or fix. fix.
The structure and semantics of a network layer protocol's addresses The structure and semantics of a network layer protocol's addresses
are absolutely core to that protocol. Addressing substantially are absolutely core to that protocol. Addressing substantially
impacts the way packets are routed, the ability of a protocol to impacts the way packets are routed, the ability of a protocol to
scale and the kinds of functionality higher layer protocols can scale and the kinds of functionality higher layer protocols can
provide. Indeed, addressing is intertwined with both routing and provide. Indeed, addressing is intertwined with both routing and
transport layer issues; a change in any one of these can impact transport layer issues; a change in any one of these can impact
another. Issues of administration and operation (e.g., address another. Issues of administration and operation (e.g., address
allocation and required renumbering), while not part of the pure allocation and required renumbering), while not part of the pure
exercise of engineering a network layer protocol, turn out to be exercise of engineering a network layer protocol, turn out to be
critical to the scalability of that protocol in a global and critical to the scalability of that protocol in a global and
commercial network. The interaction between addressing, routing and commercial network. The interaction between addressing, routing and
especially aggregation is particularly relevant to this document, so especially aggregation is particularly relevant to this document, so
some time will be spent describing it. some time will be spent describing it.
Addresses in IPv4 serve two purposes: Addresses in IPv4 serve two purposes:
1) Unique identification of an interface. An IP address by itself 1) Unique identification of an interface. A sending host tells the
identifies which interface a packet should be delivered to. network the identity of the intended recipient by placing an IP
address into the destination address field. In addition, the
receiving host checks the destination address field of received
packets to ensure that the packet is, in fact, for it.
2) Location information of that interface. Routers extract location 2) Location information of that interface. Routers use the packet's
information from a packet's destination address in order to destination address in deciding where to forward the packet to
route it towards its ultimate destination. That is, addresses get it closer to its ultimate destination. That is, addresses
identify "where" the intended recipient is located within the identify "where" the intended recipient is located within the
Internet topology. Internet topology.
For scalability, the location information contained in addresses For scalability, the location information contained in addresses
must be aggregatable. In practice, this means nodes must be aggregatable. In practice, this means that nodes
topologically close to each other (e.g., connected to the same topologically close to each other (e.g., connected to the same
link, residing at the same site, or customers of the same ISP) link, residing at the same site, or customers of the same ISP)
must use addresses that share a common prefix. must use addresses that share a common prefix.
What is important to note is that these identification and location What is important to note is that these identification and location
requirements have been met through the use of the same value, namely requirements have been met through the use of the same value, namely
the IP address. As will be noted repeatedly in this document, the the IP address. As will be noted repeatedly in this document, the
"over-loading" of IPv4 addresses with multiple semantics has some "overloading" of IPv4 addresses with multiple semantics has some
undesirable implications. For example, the embedding of IPv4 undesirable implications. For example, the embedding of IPv4
addresses within transport protocol addresses that identify the end- addresses within transport protocol addresses that identify the end-
point of a connection couples those transport protocols with routing. point of a connection couples those transport protocols with routing.
This entanglement is inconsistent with a strictly layered model in This entanglement is inconsistent with a strictly layered model in
which routing would be a completely independent function of the which routing would be a completely independent function of the
network layer and not directly impact the transport layer. network layer and not directly impact the transport layer.
Combining locator and identifier functions also has the practical Combining locator and identifier functions also has the practical
impact of complicating the support for mobility. In a mobile impact of complicating the support for mobility. In a mobile
environment, the location of an end-station may change even though environment, the location of an end-station may change even though
its identity stays the same; ideally, transport connections should be its identity stays the same; ideally, transport connections should be
able to survive such changes. In IPv4, however, one cannot change the able to survive such changes. In IPv4, however, one cannot change the
locator without also changing the identifier. Consequently, locator without also changing the identifier.
conventional wisdom for some time has been that having separate
values for location and identification could be of significant Consequently, there has been a train of thought for some time
benefit. The GSE proposal attempts to make such a separation. that having separate values for location and identification
could be of significant benefit. The GSE proposal, among other
things, attempts to make such a separation.
This document frequently uses mobility as an example to demonstrate This document frequently uses mobility as an example to demonstrate
the pros and cons of separating the identifier from the locator. the pros and cons of separating the identifier from the locator.
However, the reader should note the fundamental equivalence between However, the reader should note the fundamental equivalence between
the problems faced by mobile hosts and the problem faced by sites the problems faced by mobile hosts and the problem faced by sites
that change providers yet don't want to be required to renumber their that change providers yet don't want to renumber their network. When
network. When a site changes providers, it moves (topologically) in a site changes providers, it moves topologically in much the same way
much the same way a mobile node does when it moves from one place to a mobile node does when it moves from one place to another.
another. Consequently, techniques that help (or hinder) mobility are Consequently, techniques that help or hinder mobility are often
often relevant to the issue of site renumbering. relevant to the issue of site renumbering.
2.1. The Need for Aggregation 3.1. The Need for Aggregation
IPv4 has seen a number of different addressing schemes. Since the IPv4 has seen a number of different addressing schemes. Since the
original specification, the two major additions have been subnetting original specification, the two major additions have been subnetting
and classless routing. The motivation for adding subnetting was to and classless routing. The motivation for adding subnetting was to
allow a collection of networks located at one site to be viewed from allow a collection of networks located at one site to be viewed from
afar as being just one IP network (i.e., to aggregate all of the afar as a single IP network (i.e., to aggregate all of the individual
individual networks into one bigger network). The practical benefit networks into one bigger network). The practical benefit of
of subnetting was that all of a site's hosts, even if scattered among subnetting was that all of a site's hosts, even if scattered among
tens or hundreds of LANs, could be represented via a single routing tens or hundreds of LANs, could be represented with a single routing
table entry in routers located far from the site. In contrast, prior table entry in routers located far from the site. In contrast, prior
to subnetting, a site with ten LANs would advertise ten separate to subnetting, a site with ten LANs would advertise ten separate
network entries, and all routers would have to maintain ten separate network entries, and all routers would have to maintain ten separate
entries, even though they contained redundant information. entries, even though they contained essentially redundant
information.
The benefits of aggregation should be clear. The amount of work The benefits of aggregation should be clear. The amount of work
involved in computing forwarding tables from routing tables is involved in constructing forwarding tables (i.e., selecting best
dependent in part on the number of network routes (i.e., routes and installing them into the switching subsystem) is dependent
destinations) to which best paths are computed. If each site has 10 in part on the number of network routes (i.e., destinations) to which
internal networks, and each of those networks is individually best paths are computed. If each site has 10 internal networks, and
advertised to the global routing subsystem, the complexity of each of those networks is individually advertised to the global
computing forwarding tables can easily be an order of magnitude routing system, the complexity of computing forwarding tables can
greater than if each site advertised just a single entry that covered easily be an order of magnitude greater than if each site advertised
all of the addresses used within the site. a single entry that covered all of the addresses used within the
site.
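The effect of subnetting on distant routers can be sketched as follows. This is a minimal illustration rather than a model of any real routing protocol: the site prefix, the ten LAN prefixes and the function name are all assumptions chosen for the example.

```python
import ipaddress

# Hypothetical site: one covering prefix and ten internal LANs.
site = ipaddress.ip_network("128.128.0.0/16")
lans = [ipaddress.ip_network(f"128.128.{i}.0/24") for i in range(10)]


def remote_routes(lans, site_prefix=None):
    """Return the entries a router far from the site must hold.

    With a covering site prefix (subnetting), one entry suffices.
    Without one, every LAN is advertised and stored separately.
    """
    if site_prefix is not None and all(
        lan.subnet_of(site_prefix) for lan in lans
    ):
        return [site_prefix]
    return list(lans)


without_subnetting = remote_routes(lans)
with_subnetting = remote_routes(lans, site)
```

The internal structure of the site is still needed by routers inside the site; the saving applies only to routers that can treat the site as a single destination.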
2.2. The Pre-CIDR Internet 3.2. The Pre-CIDR Internet
In the early days of the Internet, the Internet's topology and its In the early days of the Internet, its topology and addressing were
addressing were treated as orthogonal. Specifically, when a site orthogonal. Specifically, when a site wanted to connect to the
wanted to connect to the Internet, it approached a centralized Internet, it approached a centralized address allocation authority to
address allocation authority to obtain an address and then approached obtain an address and then approached a provider about procuring
a provider about procuring connectivity. This procedure for address connectivity. This procedure for address allocation resulted in a
allocation resulted in a system where the addresses used by customers system where the addresses used by customers of the same provider
of the same provider bore little relation to the addresses used by bore little relation to the addresses used by other customers of that
other customers of that provider. In other words, though the topology same provider. In other words, though the topology of the Internet
of the Internet was mostly hierarchical (i.e., customers connected to was mostly hierarchical, the addressing was not. An example of such a
only one provider and the same path was used to reach all customers topology and addressing scheme is shown in Figure 1.
of the same provider), the addressing was not, and little aggregation
of routes took place. An example of such a topology and addressing
scheme is shown in Figure 1.
+----------------+ +----------------+
| |------- Customer1 (192.2.2.0) | |------- Customer1 (192.2.2.0)
| |------- Customer2 (128.128.0.0) | |------- Customer2 (128.128.0.0)
| Provider A |------- Customer3 (18.0.0.0) | Provider A |------- Customer3 (18.0.0.0)
| |------- Customer4 (193.3.3.0) | |------- Customer4 (193.3.3.0)
| |------- Customer5 (194.4.4.0) | |------- Customer5 (194.4.4.0)
+----------------+ +----------------+
| |
| |
| |
| |
+----------------+ +----------------+
| Provider B | | Provider B |
+----------------+ +----------------+
Figure 1 Figure 1
Figure 1 shows Provider A having 5 customers, each with their own Figure 1 shows Provider A having 5 customers, each with their own
independently obtained network addresses. Providers A and B connect independently obtained network address. Providers A and B connect to
to each other. In order for Provider B to be able to send traffic to each other. In order for Provider B to be able to send traffic to
Customers1-5, Provider A must announce each of the 5 networks to Customers1-5, Provider A must announce a separate route to Provider B
Provider B. That is, the routers within Provider B must have explicit for each of the 5 networks. That is, the routers within Provider B
routing entries for each of Provider A's customers, 5 separate routes must have explicit routing entries for each of Provider A's customers
in Figure 1. -- 5 separate routes.
Experience has shown that this approach scales very poorly. In the Experience has shown that this approach scales very poorly. In the
Default-Free Zone (DFZ) of the Public Internet, where routers must Default-Free Zone (DFZ) of the Public Internet, where routers must
maintain routing entries for all reachable destinations, the cost of maintain routing entries for all reachable destinations, the cost of
computing forwarding tables quickly becomes unacceptably large. A computing forwarding tables quickly becomes unacceptably large. A
large part of the cost is related to the seemingly redundant large part of the cost is related to the seemingly redundant
computations that must be made for each individual network, even computations that must be made for each individual network, even
though the reality is that many reside in the same topological though the reality is that many reside in the same topological
location (e.g., the same provider). Looking at Figure 1, the problem location (e.g., under the same provider). Looking at Figure 1, the
is that Provider B performs 5 separate calculations to construct the problem is that Provider B performs 5 separate calculations to
routing tables needed to reach each of A's customers. construct the forwarding table needed to reach each of A's customers.
Said another way, from Provider B's perspective, it doesn't matter
where Provider A's customers connect to Provider A because Provider B
is going to take the same path for all of them; in other words, there
is an opportunity to do data abstraction.
2.3. CIDR and Provider-Based Addressing 3.3. CIDR and Provider-Based Addressing
One of the reasons Classless Inter-Domain Routing (CIDR) and its One of the reasons CIDR (Classless Inter-Domain Routing) and its
associated provider-assigned address allocation policy were associated provider-assigned address allocation policy were
introduced was to help reduce the size of and cost of computing introduced was to help reduce the size of a routing table and the
forwarding tables. CIDR reduces the cost of computing forwarding complexity of computing a forwarding table from that routing table.
tables by aggressively aggregating addresses. Aggregating addresses
means structuring them in such a way that the location of the nodes
having those addresses can be represented by a single routing entry.
In CIDR, this means that addresses share a common prefix. The common
prefix provides location information for all addresses sharing that
same prefix.
In CIDR, sites that want to connect to the Internet approach a CIDR does this by aggressively aggregating network addresses.
provider to procure both connectivity and a network address; Aggregating network addresses means "merging" multiple addresses into
individual providers have a large block of address space covered by a single "bigger" one. In CIDR, this means that addresses share a
one prefix and assign pieces of their space to customers. common prefix. The common prefix provides location information for
Consequently, customers of the same provider have addresses that all addresses sharing that same prefix.
share the same prefix. Note that CIDR started the use of the term
"prefix" to refer to a Classless network. The combination of CIDR and With CIDR, sites that want to connect to the Internet approach a
provider-based addressing results in the ability for a provider to provider to procure both connectivity and a network address.
address many hundreds of sites while introducing just *one* network Individual providers have a block of address space covered by one
address into the global routing system, i.e., aggregating all of its prefix and assign pieces of that space to customers. Consequently,
customers addresses under one prefix. An example of such a topology customers of the same provider have addresses that share the same
and addressing scheme is shown in Figure 2. prefix. Note that CIDR started to use the term "prefix" to refer to a
classless network. The combination of CIDR and provider-based
addressing results in the ability of a provider to address many
hundreds of sites while introducing just one network address into the
global routing system. An example of such a topology and addressing
scheme is shown in Figure 2.
+----------------+ +----------------+
| |------- Customer1 (204.1.0.0/19) | |------- Customer1 (204.1.0.0/19)
| |------- Customer2 (204.1.32.0/23) | |------- Customer2 (204.1.32.0/23)
| Provider A |------- Customer3 (204.1.34.0/24) | Provider A |------- Customer3 (204.1.34.0/24)
| |------- Customer4 (204.1.35.0/24) | |------- Customer4 (204.1.35.0/24)
| |------- Customer5 (204.1.36.0/23) | |------- Customer5 (204.1.36.0/23)
+----------------+ +----------------+
| |
| A announces | A announces
| 204.1/16 to B | 204.1/16 to B
| |
+----------------+ +----------------+
| Provider B | | Provider B |
+----------------+ +----------------+
Figure 2 Figure 2
In Figure 2, Provider A has been assigned the classless block, or In Figure 2, Provider A has been assigned the classless block, or
"aggregate," 204.1.0.0/16 (i.e., a network prefix with 16 bits for "aggregate," 204.1.0.0/16 (i.e., a prefix with the high-order 16 bits
the network part and 16 bits for local use). Provider A has 5 denoting a single network). Provider A has 5 customers, each of which
customers, each of which has been assigned a prefix subordinate to has been assigned a prefix subordinate to the aggregate. In order
the aggregate. In order for Provider B to be able to reach for Provider B to be able to reach Customers1-5, Provider A only
Customers1-5, Provider A need only announce a single prefix, needs to announce the single prefix 204.1.0.0/16. The benefit for
204.1.0.0/16, because that prefix covers all of its customers. The Provider B is that its routers need only a single routing table entry
benefit for Provider B is that its routers need only a single routing to reach all of Provider A's customers. Note the difference between
table entry to reach all of Provider A's customers. Note the the cases described in Figures 1 and 2. The important difference in
difference between the cases described in Figures 1 and 2. The the two Figures is that the latter example uses fewer entries in the
important difference in the two Figures is that the latter example routing table to reach the same number of destinations.
uses fewer slots in the routing table to reach the same number of
destinations.
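The contrast between Figures 1 and 2 can be sketched with Python's standard ipaddress module. This is only an illustration, not part of either draft: the function name routes_announced is hypothetical, and the classful mask lengths given to the Figure 1 networks (/8, /16, /24) are the conventional Class A/B/C lengths, which the figure itself does not spell out.

```python
import ipaddress

# Figure 1: independently obtained classful networks (mask lengths assumed
# from the usual Class A/B/C conventions).
FIGURE_1 = ["192.2.2.0/24", "128.128.0.0/16", "18.0.0.0/8",
            "193.3.3.0/24", "194.4.4.0/24"]

# Figure 2: customer prefixes assigned out of Provider A's aggregate.
FIGURE_2 = ["204.1.0.0/19", "204.1.32.0/23", "204.1.34.0/24",
            "204.1.35.0/24", "204.1.36.0/23"]


def routes_announced(customer_prefixes, aggregate=None):
    """Return the prefixes Provider A must announce to Provider B.

    Customer prefixes covered by the aggregate are represented by the
    aggregate alone; every other prefix needs its own route.
    """
    nets = [ipaddress.ip_network(p) for p in customer_prefixes]
    if aggregate is None:
        return nets
    agg = ipaddress.ip_network(aggregate)
    uncovered = [n for n in nets if not n.subnet_of(agg)]
    covered_any = len(uncovered) < len(nets)
    return ([agg] if covered_any else []) + uncovered


fig1_routes = routes_announced(FIGURE_1)
fig2_routes = routes_announced(FIGURE_2, aggregate="204.1.0.0/16")
print(len(fig1_routes), "routes without aggregation")
print(len(fig2_routes), "route with aggregation:", fig2_routes[0])
```

With the Figure 2 assignments every customer prefix falls inside 204.1.0.0/16, so the list announced to Provider B collapses to the aggregate alone.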
CIDR was a critical step for the Internet: in the early 1990s the CIDR was a critical step for the Internet: in the early 1990s the
size of default-free routing tables required to support the Classful size of default-free routing tables required to support the classful
Internet was almost more than the commercially-available hardware and Internet was almost more than the commercially-available hardware and
software of the day could handle. The introduction of BGP4's software of the day could handle. The introduction of BGP4's
classless routing and provider-based address allocation policies classless routing and provider-based address allocation policies
resulted in an immediate relief. Having said that, however, there are resulted in an immediate relief. At the same time, however, CIDR
some weaknesses of the system. First, the Internet addressing model introduced some new weaknesses. First, the Internet addressing model
shifted from one of "address owning" to "address lending." In pre- had to shift from one of "address owning" to "address lending." In
CIDR days sites acquired addresses from a central authority pre-CIDR days sites acquired addresses from a central authority
independent of who their network provider was, and a site could independent of their provider, and a site could assume it "owned" the
assume it "owned" the address it was given. Owning addresses meant address it was given. Owning addresses meant that once one had been
that once one had been given a set of network addresses, one could given a set of network addresses, one could always use them and
always use them and assume that no matter where a site connected to assume that no matter where a site connected to the Internet, the
the Internet, the prefix for that network could be injected into the prefix for that network could be injected into the public routing
public routing system. Today, however, it is simply no longer system. Today, however, it is simply no longer possible for each
possible for each individual site to have its own private prefix individual site to have its own private prefix injected into the DFZ;
injected into the DFZ; there would simply be too many of them. there would simply be too many of them. Consequently, if a site
Consequently, if a site decides to change providers, then it needs to decides to change providers, then it needs to renumber all of its
number itself out of space given to it by the new provider and give nodes using address space given to it by the new provider. The "old"
its old address back to the old provider. To understand this, addresses it had used are returned back to its previous provider. To
consider if, from Figure 2, Customer3 changes its provider from understand this, consider if, from Figure 2, Customer3 changes its
Provider A to Provider C, but does not renumber. The picture would be provider from Provider A to Provider C, but does not renumber. The
as follows: picture would be as follows:
+----------------+ +----------------+
| |---- Customer1 (204.1.0.0/19) | |---- Customer1 (204.1.0.0/19)
| |---- Customer2 (204.1.32.0/23) | |---- Customer2 (204.1.32.0/23)
| Provider A | | Provider A |
+---------------| |---- Customer4 (204.1.35.0/24) +---------------| |---- Customer4 (204.1.35.0/24)
| A announces | |---- Customer5 (204.1.36.0/23) | A announces | |---- Customer5 (204.1.36.0/23)
| 204.1/16 to B +----------------+ | 204.1/16 to B +----------------+
| | | |
| |
| |
+----------------+ | +----------------+ |
| Provider B | | | Provider B | |
+----------------+ | +----------------+ |
| | | |
| |
| |
| C announces | | C announces |
| 204.1.34/24 | | 204.1.34/24 |
| to B +----------------+ | to B +----------------+
+---------------| Provider C |---- Customer3 (204.1.34.0/24) +---------------| Provider C |---- Customer3 (204.1.34.0/24)
+----------------+ +----------------+
Figure 3 Figure 3
In Figure 3, Providers A, B and C are all directly connected to each
In Figure 3, each of Provider A, B and C are directly connected to other. In order for Provider B to reach Customers 1, 2, 4 and 5,
each other provider. In order for Provider B to reach Customers 1, 2, Provider A still only announces the 204.1.0.0/16 aggregate. However,
4 and 5, Provider A still only announces the 204.1.0.0/16 aggregate. in order for Provider B to reach Customer 3, Provider C must announce
However, in order for Provider B to reach Customer 3, Provider C must the prefix 204.1.34.0/24. Prefix 204.1.34.0/24 is called a "more-
announce the prefix 204.1.34.0/24. Prefix 204.1.34.0/24 is called a specific" of 204.1.0.0/16; another term used is that Customer3 and
"more-specific" of 204.1.0.0/16; another term used is that Customer3 Provider C have "punched a hole in" Provider A's block. The result
and Provider C have "punched a hole in" Provider A's block. The of this is that from Provider B's view, the address space underneath
result of this is that from Provider B's view, the address space 204.1.0.0/16 is no longer cleanly aggregated into a single prefix and
underneath 204.1.0.0/16 is no longer cleanly aggregated into a single instead the aggregation has been broken because the addressing is
prefix and instead the aggregation has been broken because the inconsistent with the topology; in order to maintain reachability to
addressing is inconsistent with the topology; in order to maintain Customer3, Provider B must carry two prefixes where it used to have
reachability to Customer3, Provider B must carry two prefixes where to carry only one.
it used to have to carry only one.
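A minimal sketch of how Provider B's forwarding decision behaves in Figure 3, assuming a plain longest-prefix-match lookup over a two-entry table. The function name longest_match and the sample host addresses are hypothetical; only the two prefixes and their announcing providers come from the figure.

```python
import ipaddress


def longest_match(table, destination):
    """Return the (prefix, next_hop) entry whose prefix matches the
    destination with the greatest prefix length, or None if none match."""
    addr = ipaddress.ip_address(destination)
    matches = [(prefix, hop) for prefix, hop in table if addr in prefix]
    return max(matches, key=lambda entry: entry[0].prefixlen, default=None)


# Provider B's view after Customer3 moves to Provider C without renumbering.
PROVIDER_B_TABLE = [
    (ipaddress.ip_network("204.1.0.0/16"), "Provider A"),   # the aggregate
    (ipaddress.ip_network("204.1.34.0/24"), "Provider C"),  # the more-specific
]

# A host in Customer3's block follows the more-specific route.
print(longest_match(PROVIDER_B_TABLE, "204.1.34.7"))
# A host in Customer4's block still follows the aggregate.
print(longest_match(PROVIDER_B_TABLE, "204.1.35.7"))
```

The table needs both entries: dropping the /24 would send Customer3's traffic to Provider A, which no longer connects that site.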
The example in Figure 3 explains why sites must renumber if existing The example in Figure 3 explains why sites must renumber if existing
levels of aggregation are to be maintained. While it is certainly levels of aggregation are to be maintained. While it is certainly
clear that one or two "exceptions" to the ideal case can be clear that a small number of exceptions can be tolerated, the reality
tolerated, the reality in today's Internet is that there are in today's Internet is that there are thousands of providers, many
thousands of providers, many with thousands of individual customers. with thousands of individual customers. It is generally accepted that
It is generally accepted that some renumbering of sites is essential renumbering of sites is essential for maintaining sufficient
for maintaining sufficient aggregation. aggregation.
The empirical cost of renumbering a site in order to maintain The empirical cost of renumbering a site in order to maintain
aggregation has been the subject of much discussion. The practical aggregation has been the subject of much discussion. The practical
reality, however, is that forcing all sites to renumber is difficult reality, however, is that forcing all sites to renumber is difficult
given the size and wealth of companies that now depend on the given the size and wealth of companies that now depend on the
Internet for running their business. Thus, although the technical Internet for running their business. Thus, although the technical
community came to consensus that address lending was necessary in community came to consensus that address lending was necessary in
order for the Internet to continue to operate and grow, the reality order for the Internet to continue to operate and grow, the reality
has been that some of CIDR's benefits have been lost because sites has been that some of CIDR's benefits have been lost because not all
refuse to renumber. sites renumber. It is worth noting that a number of providers do
route filtering based, in part, on prefix length; as a result, a site
which does not renumber may have, at best, partial connectivity to
the Internet.
One unfortunate characteristic of CIDR at an architectural level is One unfortunate characteristic of CIDR at an architectural level is
that the pieces of the infrastructure which benefit from the that the pieces of the infrastructure that benefit from the
aggregation (i.e., the providers whose major headache is managing aggregation (i.e., the providers which make up the DFZ) are not the
routing table growth in the DFZ) are not the pieces that incur the pieces that incur the cost (i.e., the end site). The logical
cost (i.e., the end site). The logical corollary of this statement is corollary of this statement is that the pieces of the infrastructure
that the pieces of the infrastructure which do incur cost to achieve that do incur cost to achieve aggregation (e.g., sites which renumber
aggregation (e.g., sites which renumber when they change providers) when they change providers) don't directly see the benefit. (The word
don't directly see the benefit. (The word "directly" is used here "directly" is used here because the continued operation of the
because one could claim that the continued operation of the Internet Internet is a benefit, though it requires selflessness on the part of
is a benefit, though it is an indirect benefit and requires the site to recognize.)
selflessness on the part of the site in order to recognize it.)
2.4. Multi-Homing and Aggregation 3.4. Multi-Homing and Aggregation
As sites become more dependent on the Internet, they have begun to As sites become more dependent on the Internet, they have begun to
install additional connections to the Internet to improve robustness install additional connections to the Internet to improve robustness
and performance. Such sites are called "multi-homed." Unfortunately, and performance. Such sites are called "multi-homed." Unfortunately,
when a site connects to the Internet at multiple places, the impact when a site connects to the Internet at multiple places, the impact
on routing can be much like a site that switches providers but on routing can be much like a site that switches providers but
refuses to renumber. refuses to renumber.
In the pre-CIDR days, multi-homed sites were typically known by only In the pre-CIDR days, multi-homed sites were typically known by only
one network prefix. When that site's providers announced the site's one network prefix. When that site's providers announced the site's
network into the global routing system, a "shortest path" type of network into the global routing system, a "shortest path" type of
routing would occur so that pieces of the Internet closest to the routing would occur so that pieces of the Internet closest to the
first provider would use the first provider while other pieces of the first provider would use the first provider while other pieces of the
Internet might use the second provider. This allowed sites to use the Internet would use the second provider. This allowed sites to use the
routing system itself to load balance traffic across their multiple routing system itself to load balance traffic across their multiple
connections. This type of multi-homing assumes that a site's prefix connections. This type of multi-homing assumes that a site's prefix
can be propagated throughout the DFZ, an assumption that is no longer can be propagated throughout the DFZ, an assumption that is no longer
universally true. universally true.
With CIDR, issues of addressing and aggregation complicate matters With CIDR, issues of addressing and aggregation complicate matters
significantly. At the highest levels, there are three possible ways significantly. At the highest levels, there are three possible ways
to deal with multi-homed sites. The first approach is for multi- to deal with multi-homed sites. The first approach is for multi-
homed sites to receive address space directly from a registry, homed sites to receive address space directly from a registry,
independent of its providers. The problem with this approach is independent of its providers. The problem with this approach is
that, because the address space is obtained independent of either that, because the address space is obtained independent of either
provider, it is not aggregatable and therefore has a negative impact provider, it is not aggregatable and therefore has a negative impact
on the scaling of global routing. on the scaling of global routing.
The second approach is for a multi-homed site to receive an The second approach is for a multi-homed site to receive an
allocation from one of its providers and just use that single prefix. allocation from one of its providers and just use that single prefix.
The site would advertise its prefix to all of the providers to which The site would advertise its prefix to all of the providers to which
it connects. Their are two problems with this approach. First, it connects. There are two problems with this approach. First,
although the prefix is aggregatable by the provider which made the although the prefix is aggregatable by the provider which made the
allocation, it is not aggregatable by the other providers. To the allocation, it is not aggregatable by the other providers. To the
other providers, the site's prefix poses the same problem as a other providers, the site's prefix poses the same problem that a
provider-independent address would. This has a negative impact on provider-independent address would. This has a negative impact on
the scaling of global routing. Second, due to CIDR's longest-match the scaling of global routing. Second, due to CIDR's rule for
routing rules, it turns out that the site's prefix is not always longest-match routing, it turns out that the site's prefix is not
aggregable in practice by the provider that made the allocation. always aggregatable in practice even by the provider that made the
Consider Figure 4. Provider C has two paths for reaching customer 1. allocation. Consider Figure 4. Provider C has two paths for reaching
Provider A advertises 204.1/16, which includes customer 1. But Customer 1. Provider A advertises 204.1/16, an aggregate which
Provider C will also receive an advertisement for prefix 204.1.0/19 includes Customer 1. But Provider C will also receive an
from Provider B, and because the prefix match through B is longer, C advertisement for prefix 204.1.0/19 from Provider B, and because the
will choose that path. In order for Provider C to be able to choose prefix match through B is longer, C will choose that path. In order
between the two paths, Provider A would also have to advertise the for Provider C to be able to choose between the two paths, Provider A
longer prefix for 204.1.0/19 in addition to the shorter 204.1/16. At would also have to advertise the longer prefix for 204.1.0/19 in
this point, from the routing perspective, the situation is very addition to the shorter 204.1/16. At this point, from the routing
similar to the general problem posed by the use of provider- perspective, the situation is very similar to the general problem
independent addresses. posed by the use of provider-independent addresses.
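The effect described for Figure 4 can be illustrated as follows. This is a sketch under the assumption that Provider C considers only prefix length when comparing advertisements (real route selection involves further policy). The helper name usable_neighbors and the sample host address are hypothetical, and the shorthand prefixes in the text are written out in full (204.1/16 as 204.1.0.0/16, 204.1.0/19 as 204.1.0.0/19).

```python
import ipaddress


def usable_neighbors(advertisements, destination):
    """Return the set of neighbors whose advertised prefix is the longest
    match for the destination. Shorter matching prefixes are never used."""
    addr = ipaddress.ip_address(destination)
    matching = []
    for prefix, neighbor in advertisements:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matching.append((net.prefixlen, neighbor))
    if not matching:
        return set()
    best = max(length for length, _ in matching)
    return {neighbor for length, neighbor in matching if length == best}


HOST_IN_CUSTOMER_1 = "204.1.1.1"  # hypothetical host inside 204.1.0.0/19

# Case 1: A advertises only its aggregate, B advertises the site's /19.
case_1 = [("204.1.0.0/16", "Provider A"), ("204.1.0.0/19", "Provider B")]
print(usable_neighbors(case_1, HOST_IN_CUSTOMER_1))

# Case 2: A also advertises the /19, so C has a real choice of paths.
case_2 = case_1 + [("204.1.0.0/19", "Provider A")]
print(usable_neighbors(case_2, HOST_IN_CUSTOMER_1))
```

In the first case only Provider B is eligible. In the second, both providers are eligible, at the price of an extra more-specific route in Provider C's table.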
It should be noted that the above example simplifies a very complex It should be noted that the above example simplifies a very complex
issue. For example, consider the example in Figure 4 again. Provider issue. For example, consider the example in Figure 4 again. Provider
A could choose *not* to propagate a route entry for the longer A could choose not to propagate a route entry for the longer
2.4.1.0/19 prefix, advertising only the shorter 204.1/16. In such 204.1.0/19 prefix, advertising only the shorter 204.1/16. In such
cases, Provider C would always select Provider B. Internally, cases, Provider C would always select Provider B. Internally,
Provider A would continue to router traffic from its other customers Provider A would continue to route traffic from its other customers
to customer 1 directly. If Provider A had a large enough customer to Customer 1 directly. If Provider A had a large enough customer
base, effective load sharing would achieved. base, effective load sharing might be achieved.
+------------+ +------------+ A advertises
_____| Provider A |---| Provider C | +------------+ 204.1/16 to C +------------+
___| Provider A |-----------------| Provider C |
/ +------------+ +------------+ / +------------+ +------------+
/ 204.1/16 / / +----------/
/ / / /
Customer 1 --- / B advertises 204.1.0/19 to C Customer 1 --- / B advertises 204.1.0/19 to C
204.1.0.0/19 | / 204.1.0.0/19 | /
| +------------+ | +------------+
----- | Provider B | ----- | Provider B |
+------------+ +------------+
Figure 4 Figure 4
The third approach is for a multi-homed site to receive an allocation The third approach is for a multi-homed site to receive an allocation
skipping to change at page 13, line 44 skipping to change at page 14, line 4
prefixes. Consider a configuration where a site is connected to prefixes. Consider a configuration where a site is connected to
ISP-A and ISP-B. If the link to ISP-A goes down, then unless the ISP-A and ISP-B. If the link to ISP-A goes down, then unless the
ISP-A prefix is announced to ISP-B (which breaks aggregation), ISP-A prefix is announced to ISP-B (which breaks aggregation),
the hosts numbered out of the ISP-A prefix would be unreachable. the hosts numbered out of the ISP-A prefix would be unreachable.
2) The site could assign each host multiple addresses (i.e., one 2) The site could assign each host multiple addresses (i.e., one
address for each ISP connection). There are two problems with address for each ISP connection). There are two problems with
this. First, it accelerates the consumption of the address this. First, it accelerates the consumption of the address
space. Second, when the connection to ISP-A goes down, space. Second, when the connection to ISP-A goes down,
addresses numbered out of ISP-A's space become unreachable. addresses numbered out of ISP-A's space become unreachable.
Remote peers would have to have sufficient intelligence to use Remote peers would have to have sufficient intelligence to use
the second address. For example, when initiating a connection to the second address. For example, when initiating a connection to
a host, the DNS would return multiple candidate addresses. a host, the DNS would return multiple candidate addresses.
Clients would need to try them all before concluding that a Clients would need to try them all before concluding that a
destination is unreachable (something not all hosts currently destination is unreachable (something not all hosts currently
do). In addition, a site's hosts would need a significant amount do). In addition, a site's hosts would need a significant amount
of intelligence for choosing the source addresses they use. A of intelligence for choosing the source addresses they use. A
host shouldn't choose a source address corresponding to a host shouldn't choose a source address corresponding to a link
addresses that are not reachable from the Public Internet. At that is down. At present, hosts do not have such sophistication.
present, hosts do not have such sophistication.
In summary, how best to achieve multi-homing with IPv4 in the face of In summary, how best to achieve multi-homing with IPv4 in the face of
CIDR is an unsolved problem. There is a delicate balance between the CIDR is an unsolved problem. There is a delicate balance between the
scalability of routing versus the site's requirements of robustness scalability of routing versus the site's requirements of robustness
and load-sharing. At this point in time, no solution has been and load-sharing. At this point in time, no solution has been
discovered that satisfies the competing requirements of route scaling discovered that satisfies the competing requirements of route scaling
and robustness/performance. It is worth noting, however, that some and robustness/performance. It is worth noting, however, that some
people are beginning to study the issue more closely and propose people are beginning to study the issue more closely and propose
novel ideas [BATES]. novel ideas [BATES].
3. GSE Background 4. The GSE Proposal
This section provides background information about GSE with the This section provides a description of GSE with the intent of making
intent of making this document stand-alone with respect to the GSE this document stand-alone with respect to the GSE "specification." We
"specification." Additional details on GSE can be found in [GSE]. begin by reviewing the motivation for GSE. Next we review the salient
We begin by reviewing the motivation for GSE. Next we review the technical details, and we conclude by listing the explicit non-goals
salient technical details, and we conclude by listing the explicit of the GSE proposal.
non-goals of the GSE proposal.
3.1. Motivation For GSE 4.1. Motivation For GSE
The primary motivation for GSE is the fact that the chief IPv6 global The primary motivation for GSE was the fact that the chief initial
unicast address structure, provider-based [RFC 2073], is IPv6 global unicast address structure, provider-based [RFC 2073], was
fundamentally the same as IPv4 with CIDR and provider-based fundamentally the same as IPv4 with CIDR and provider-based
aggregation. Provider-based addressing requires that sites renumber aggregation. Provider-based addressing requires that sites renumber
when they switch providers, so that sites are always aggregated when they switch providers, so that sites are always aggregated
within their provider's prefix. In practice, the cost of renumbering within their provider's prefix. In practice, the cost of renumbering
(which can only grow as a site grows in size and becomes more (which can only grow as a site grows in size and becomes more
dependent on the Internet for day-to-day business) is high enough dependent on the Internet for day-to-day business) is high enough
that an increasing number of sites refuse to renumber. This cost is that an increasing number of sites refuse to renumber. This cost is
particularly relevant in cases where end-users are asked to renumber particularly relevant in cases where end-users are asked to renumber
because an upstream provider has changed its transit provider (i.e., because an upstream provider has changed its transit provider (i.e.,
the end site is asked to renumber for reasons outside of its control the end site is asked to renumber for reasons outside of its control
and for which it sees no direct benefit). Consequently, The GSE and for which it sees no direct benefit). Consequently, the GSE
draft asserts that IPv4 with CIDR has not achieved the aggressive draft asserted that IPv4 with CIDR has not achieved the aggressive
aggregation required for the route computation functions of the aggregation required for the route computation functions of the DFZ
default-free zone of the Internet to scale for IPv4, and that the of the Internet to scale for IPv4 and that the larger addresses of
larger addresses of IPv6 simply exacerbate the problem. IPv6 simply exacerbate the problem.
The GSE proposal does not propose to eliminate the need for The GSE proposal did not propose to eliminate the need for
renumbering. Indeed, it asserts that end sites will have to be renumbering. Indeed, it asserted that end sites will have to be
renumbered more frequently in order to continue scaling the Internet. renumbered more frequently in order to continue scaling the Internet.
However, GSE proposes to make the cost of such a renumbering so However, GSE proposed to make the cost of such a renumbering so small
small, that sites could be renumbered at essentially any time with that sites could be renumbered at essentially any time with little or
only minor disruption to the site. no disruption.
Finally, GSE deals significantly with sites that have multiple Finally, GSE dealt significantly with sites that have multiple
Internet connections. In some addressing schemes (e.g., CIDR), this Internet connections. In some addressing schemes (e.g., CIDR), this
"multi-homing" can create exceptions to the aggregation and result in "multi-homing" can create exceptions to the aggregation and result in
poor scaling. That is, the public routing infrastructure needs to poor scaling. That is, the public routing infrastructure needs to
carry multiple distinct routes for the multi-homed site, one for each carry multiple distinct routes for the multi-homed site, one for each
independent path. GSE recognizes the "special work done by the global independent path. GSE recognized the "special work done by the global
Internet infrastructure on behalf of multi-homed sites," [GSE] and Internet infrastructure on behalf of multi-homed sites," [GSE] and
proposes a way for multi-homed sites to gain some benefit without proposed a way for multi-homed sites to gain some benefit without
impacting global scaling. This includes a specific mechanism that impacting global scaling. This included a specific mechanism that
providers could use to support multi-homed sites, presumably at a providers could use to support multi-homed sites, presumably at a
cost that the Site would consider when deciding whether or not to cost that the site would consider when deciding whether or not to
become multi-homed. become multi-homed.
3.2. GSE Address Format 4.2. GSE Address Format
The key departure of GSE from classical IP addressing (both v4 and The key departure of GSE from classical IP addressing (both v4 and
v6) is that rather than over-loading addresses with both locator and v6) was that rather than over-loading addresses with both locator and
identifier purposes, it splits the address into two elements: the identifier purposes, it split the address into two elements: the
high-order 8 bytes for routing (called "Routing Stuff" throughout the high-order 8 bytes for routing (called "Routing Stuff" throughout the
rest of this document) and the low-order 8 bytes for unique rest of this document) and the low-order 8 bytes for unique
identification of an end-point. The structure of GSE addresses is: identification of an end-point. The structure of GSE addresses was:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| Routing Goop | STP| End System Designator | | Routing Goop | STP| End System Designator |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
6+ bytes ~2 bytes 8 bytes 6+ bytes ~2 bytes 8 bytes
Figure 5 Figure 5
3.3. Routing Stuff (RG and STP) 4.2.1. Routing Stuff (RG and STP)
The Routing Goop (RG) identifies the place in the Public Internet The Routing Goop (RG) identifies the place in the Internet topology
topology where a Site connects and is used to route datagrams to the where a site connects and is used to route datagrams to the site. RG
Site. RG is structured as follows: is structured as follows:
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| xxx | 13 Bits of LSID | Upper 16 bits of Goop | | xxx | 13 Bits of LSID | Upper 16 bits of Goop |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3 4 3 4
2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Bottom 18 bits of Routing Goop | | Bottom 18 bits of Routing Goop |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6 Figure 6
The RG describes the location of a Site's connection by identifying The RG describes the location of a site's connection by identifying
smaller and smaller regions of topology until finally it identifies a smaller and smaller regions of topology until finally it identifies
single link to which the site. Before interpreting the bits in the the link which connects the site. Before interpreting the bits in the
RG, it is important to understand that routing with GSE depends on RG, it is important to understand that routing with GSE depends on
decomposing the Internet's topology into a specific graph. At the decomposing the Internet's topology into a specific graph. At the
highest level, the topology is broken into Large Structures (LSs). An highest level, the topology is broken into Large Structures (LSs). An
LS is basically a region that can aggregate significant amounts of LS is basically a region that can aggregate significant amounts of
topology. Examples of potential LSs are large providers and exchange topology. Examples of potential LSs are large providers and exchange
points. Within an LS the topology is further divided into another points. Within an LS the topology is further divided into another
graph of structures, with each LS dividing itself however it sees graph of structures, with each LS dividing itself however it sees
fit. This division of the topology into smaller and smaller fit. This division of the topology into smaller and smaller
structures can recurse for a number of levels, where the trade-off is structures can recurse for a number of levels, where the trade-off is
"between the flat-routing complexity within a region and minimizing "between the flat-routing complexity within a region and minimizing
total depth of the substructure." [ESD] total depth of the substructure." [ESD]
Having described the decomposition process, we can now examine the Having described the decomposition process, we can now examine the
bits in the RG. After the 3-bit prefix identifying the address as bits in the RG. After the 3-bit prefix identifying the address as
GSE, the next 13 bits identify the LS. By limiting the field to 13 GSE, the next 13 bits identify the LS. By limiting the field to 13
bits, a ceiling is defined on the complexity of the top-most routing bits, a ceiling is defined on the complexity of the top-most routing
level. In the next 34 bits, a series of subordinate structure(s) are level (i.e., what we currently call the DFZ). In the next 34 bits, a
identified until finally the leaf subordinate structure is series of subordinate structure(s) are identified until finally the
identified, at which point the remaining bits identify the individual leaf subordinate structure is identified, at which point the
link within that leaf structure. The remaining 14 bits of the Routing remaining bits identify the individual link within that leaf
Stuff comprise the STP and are used for routing structure within a structure.
Site, similar to subnetting with IPv4, though these bits are *not*
part of the Routing Goop. The distinction between Routing Stuff and
Routing Goop is that RG controls routing in the Public Internet,
while Routing Stuff includes the RG plus the Site Topology Partition
(STP). The STP is used for routing structure within a Site.
The GSE proposal formalizes the ideas of sites and of public versus The remaining 14 bits of the Routing Stuff (i.e., the low-order 14
private topology. In the first case, a Site is a set of hosts, bits of the high-order 8 bytes) comprise the STP and are used for
routers and media which have one or more connections to the Internet. routing structure within a site, similar to subnetting with IPv4.
These bits are not part of the Routing Goop per se. The distinction
between Routing Stuff and Routing Goop is that RG controls routing in
the Public Internet, while Routing Stuff includes the RG plus the
Site Topology Partition (STP). The STP is used for routing structure
within a site. [Note that the term "Routing Stuff" was a creation of
the authors of this analysis document and was not used in the GSE
document.]
A Site can have an arbitrarily complicated topology, but all of that The GSE proposal formalized the ideas of sites and of public versus
complexity is hidden from everyone outside of the Site. A Site only private topology. In the first case, a site is a set of hosts,
carries packets which originated from, or are destined to, that Site; routers and media under the same administrative control which have
in other words, a Site cannot be a transit network. A Site is private zero or more connections to the Internet. A site can have an
arbitrarily complicated topology, but all of that complexity is
hidden from everyone outside of the site. A site only carries
packets which originated from, or are destined to, that site; in
other words, a site cannot be a transit network. A site is private
topology, while the transit networks form the public topology. topology, while the transit networks form the public topology.
A datagram is routed through public topology using just the RG, but A datagram is routed through public topology using just the RG, but
within the destination Site routing is based on the Site Topology within the destination site, routing is based on the Site Topology
Partition (STP) field. Partition (STP).
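The field boundaries described in this section can be illustrated with a
short sketch. It assumes the widths stated in the text (a 3-bit prefix, a
13-bit LSID, 34 further bits of Routing Goop, a 14-bit STP and a 64-bit
ESD). The function and field names are illustrative only and appear in
neither draft.

```python
# Field widths taken from the text above; together they cover 128 bits.
FIELDS = (
    ("prefix", 3),   # identifies the address as GSE
    ("lsid", 13),    # Large Structure identifier
    ("goop", 34),    # subordinate structures and the leaf link
    ("stp", 14),     # Site Topology Partition (not part of the RG)
    ("esd", 64),     # End-System Designator
)


def split_gse_address(addr: bytes) -> dict:
    """Split a 16-byte address into the fields named in FIELDS."""
    if len(addr) != 16:
        raise ValueError("a GSE address is 16 bytes")
    value = int.from_bytes(addr, "big")
    fields = {}
    shift = 128
    for name, width in FIELDS:
        shift -= width
        fields[name] = (value >> shift) & ((1 << width) - 1)
    return fields
```

In this sketch the Routing Goop is the first three fields taken together
(50 bits), and the Routing Stuff is the Routing Goop plus the STP (64 bits).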
3.4. End-System Designator 4.2.2. End-System Designator
The End-System Designator (ESD) is an unstructured 8-byte field that The End-System Designator (ESD) is an unstructured 8-byte field that
uniquely identifies that interface from all others. The most uniquely identifies an interface from all others. The most important
important feature of the ESD is that it alone identifies an end feature of the ESD is that it alone identifies an interface; the
point; the Routing Stuff portion of an address, although used to help Routing Stuff portion of an address, although used to help deliver a
deliver a packet to its destination, is not used to actually identify packet to its destination, is not used to actually identify an end
an end point. End-points of communication care about the ESD; as point. End-points of communication care about the ESD; as examples,
examples, TCP peers could be identified by the source and destination TCP peers could be identified by the source and destination ESDs
ESDs alone (together with port numbers), checksums would exclude the alone (together with port numbers), checksums would exclude the RG
RG (the sender doesn't know its RG, so can't include it in the (the sender doesn't know its RG, as described later) and on receipt
checksum), and on receipt of a datagram only the ESD would be used in of a datagram only the ESD would be used in testing whether a packet
testing whether a packet is intended for local delivery. is intended for local delivery.
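As a rough illustration of identifying TCP peers by ESD alone, the sketch
below derives a connection key from only the low-order 8 bytes of each
address plus the port numbers. It assumes addresses are held as 128-bit
integers; connection_key is a hypothetical name, not something defined by
GSE.

```python
ESD_MASK = (1 << 64) - 1  # the low-order 8 bytes of a 16-byte address


def connection_key(src_addr: int, src_port: int,
                   dst_addr: int, dst_port: int) -> tuple:
    """Return a key that ignores the Routing Stuff of both addresses."""
    return (src_addr & ESD_MASK, src_port, dst_addr & ESD_MASK, dst_port)
```

Two packets whose addresses differ only in the high-order 8 bytes map to
the same key, which is the property the paragraph above describes.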
The leading contender for the role of a 64-bit globally unique ESD is The leading contender for the role of a 64-bit globally unique ESD is
the recently defined "EUI-64" identifier [EUI64]. These identifiers the recently defined "EUI-64" identifier. [EUI64] These identifiers
consist of a 24-bit "company_id" concatenated with a 40-bit consist of a 24-bit "company_id" concatenated with a 40-bit
"extension." (Company_id is just a new name for the Organizationally "extension." (Company_id is just a new name for the
Unique Identifier (OUI) that forms the first half of an 802 MAC "Organizationally Unique Identifier" that forms the first half of an
address.) Manufacturers are expected to assign locally unique values 802 MAC address.) Manufacturers are expected to assign locally unique
to the extension field, guaranteeing global uniqueness for the values to the extension field, guaranteeing global uniqueness for the
complete 64-bit identifier. complete 64-bit identifier.
A range of the EUI-64 space is reserved to cover pre-existing 48-bit A range of the EUI-64 space is reserved to cover pre-existing 48-bit
MAC addresses, and a defined mapping ensures that an ESD derived from MAC addresses, and a defined mapping ensures that an ESD derived from
a MAC address will not duplicate the ESD of a device that has a a MAC address will not duplicate the ESD of a device that has a
built-in EUI-64. built-in EUI-64.
In some cases, interfaces may not have access to an appropriate MAC In some cases, interfaces may not have access to an appropriate MAC
address or EUI-64 identifier. A globally unique ESD must then be address or EUI-64 identifier. A globally unique ESD must then be
obtained through some alternate mechanism. Several possible obtained through some alternate mechanism. Several possible
mechanisms can be imagined (e.g., the IANA could hand out addresses mechanisms can be imagined (e.g., the IANA could hand out addresses
from the company id assigned it has been allocated), but we do not from the company_id it has been allocated), but we do not explore
explore them in detail here. them in detail here.
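The structure of an EUI-64 and one possible MAC-to-ESD mapping can be
sketched as follows. The text above does not state which filler value the
defined mapping uses; the 0xFFFE value below is an assumption borrowed from
common EUI-64 practice, and both function names are illustrative.

```python
def mac48_to_esd(mac: bytes) -> bytes:
    """Expand a 6-byte MAC address into an 8-byte identifier."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte MAC address")
    # ASSUMPTION: 0xFFFE is inserted between the 24-bit company_id and the
    # remaining 24 bits; the draft text does not specify the filler value.
    return mac[:3] + b"\xff\xfe" + mac[3:]


def split_eui64(esd: bytes) -> tuple:
    """Return (company_id, extension) for an 8-byte identifier."""
    if len(esd) != 8:
        raise ValueError("expected an 8-byte identifier")
    company_id = int.from_bytes(esd[:3], "big")   # 24 bits
    extension = int.from_bytes(esd[3:], "big")    # 40 bits
    return company_id, extension
```

Because every mapped MAC address lands in the part of the extension space
that begins with the filler value, it cannot collide with a built-in EUI-64
whose manufacturer avoids that range.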
3.5. Address Rewriting by Border Routers 4.3. Address Rewriting by Border Routers
GSE Site border routers rewrite addresses of the packets they forward GSE site border routers rewrite addresses of the packets they forward
across the Site/Public Topology boundary. Within a Site, nodes need across the boundary between the site and public topology. Within a
not know the RG associated with their addresses. They simply use a site, nodes need not know the RG associated with their addresses.
designated "Site-Local RG" value for internal addresses. When a They simply use a designated "Site-Local RG" value for internal
packet is forwarded to the Public Topology, the border router addresses. When a packet is forwarded to the public topology, the
replaces the Site-Local RG portion of packet's source address with an border router replaces the Site-Local RG portion of the packet's
appropriate value. Likewise, when a packet from the Public Topology source address with an appropriate value. Likewise, when a packet
is forwarded into a Site, the border router replaces the RG part of from the public topology is forwarded into a site, the border router
the destination address with the designated Site-Local RG. replaces the RG part of the destination address with the designated
Site-Local RG.
To simplify discussion, the following discussion uses the singular To simplify discussion, the following text uses the singular term RG
term RG as if a site could have only one RG value (i.e., one as if a site could have only one RG value (i.e., one connection to
connection to the Public Internet). Of course, a site could have the Internet). In fact, a site could have multiple Internet
multiple Internet connections and consequently multiple RGs. connections and consequently multiple RGs.
Having border routers rewrite addresses obviates the need to renumber Having border routers rewrite addresses obviates the need to renumber
devices within sites because of changing providers --- GSE's approach devices within sites because of changing providers --- GSE's approach
isn't so much to ease renumbering as to make it transparent to end wasn't so much to ease renumbering as to make it transparent. To
sites. To achieve transparency, the RG by which a Site is known is achieve transparency, the RG by which a site is known is hidden
hidden (i.e., kept secret) from hosts or routers within that Site. (i.e., kept secret) from nodes within that site. Instead, the RG for
Instead, the RG for the Site would be known only by the exit router, the site would be known only by the exit router, either through
either through static configuration or through a dynamic protocol static configuration or through a dynamic protocol with an upstream
with an upstream provider. provider.
Because end-hosts don't know their RG, they don't know their entire Because end hosts don't know their RG, they don't know their entire
16-byte public address, so they can't specify the full address in the 16-byte address, so they can't specify the full address in the source
source fields of packets they originate. Consequently, when a fields of packets they originate. Consequently, when a datagram
datagram leaves a Site, the egress border router fills in the high- leaves a site, the egress border router fills in the high-order
order portion of the source address with the appropriate RG. portion of the source address with the appropriate RG.
The point of keeping the RG hidden from nodes within the core of a The point of keeping the RG hidden from nodes within the core of a
Site is to ensure the changeability of this value without impacting site was to ensure the changeability of the RG without impacting the
the Site itself. It is expected that the RG will need to change site itself. It was expected that the RG would need to change
relatively frequently (e.g., several times a year) in order to relatively frequently (e.g., several times a year) in order to
support scalable aggregation as the topology of the Public Internet support scalable aggregation as the topology of the Internet changes.
changes. A change to a Site's RG would only require a change at the A change to a site's RG would only require a change at the site's
Site's egress point (or points, in the case of a multi-homed Site); egress point, and it's well possible that this change could be
and it's well possible that this change would be accomplished through accomplished through a dynamic protocol with the upstream provider.
a dynamic protocol with the upstream provider.
Hiding a Site's RG from its internal nodes does not, however, mean Hiding a site's RG from its internal nodes does not, however, mean
that changes to RG have no impact on end sites. Since the full 16- that changes to RG have no impact on end sites. Since the full 16-
byte address of a node isn't a stable value (the RG portion can byte address of a node isn't a stable value (the RG portion can
change), a stored address may contain invalid RG and be unusable if change), a stored address may contain invalid RG and be unusable if
it isn't "refreshed" through some other means. For example, opening a it isn't "refreshed" through some other means. For example, opening a
TCP connection, writing the address of the peer to a file and then TCP connection, writing the address of the peer to a file and then
later trying to reestablish a connection to that peer is likely to later trying to reestablish a connection to that peer is likely to
fail. For intra-Site communication, however, it is expected that fail. For intra-site communication, however, it is expected that
only the Site-Local RG would be used (and stored) which would only the Site-Local RG would be used (and stored) which would
continue to work for intra-Site communication regardless of changes continue to work for intra-site communication regardless of changes
to the Site's external RG. This has the benefit of shielding a site's to the site's external RG. This has the benefit of shielding a site's
internal traffic from the effects of renumbering changes outside of intra-site traffic from any instabilities resulting from renumbering.
the site.
In addition to rewriting source addresses upon leaving a Site, In addition to rewriting source addresses upon leaving a site,
destination addresses are rewritten upon entering a Site. To destination addresses are rewritten upon entering a site. To
understand the motivation behind this, consider a Site with understand the motivation behind this, consider a site with
connection to three Internet providers. Because each of those connections to three Internet providers. Because each of those
connections has its own RG, each destination within the Site would be connections has its own RG, each destination within the site would be
known by three different 16-byte addresses. As a result, intra-Site known by three different 16-byte addresses. As a result, intra-site
routers would have to carry a routing table three times larger than routers would have to carry a routing table three times larger than
expected. Instead, GSE proposes replacing the RG in inbound packets expected. To work around this, GSE proposed replacing the RG in
with the special "Site-local RG" value to reduce intra-Site routing inbound packets with the special "Site-Local RG" value to reduce
tables to the minimum necessary. intra-site routing tables to the minimum necessary.
In summary, when a node initiates a flow to a node in another Site, In summary, when a node initiates a flow to a node at another site,
the initiating node knows the full 16-byte address for the the initiating node knows the full 16-byte address for the
destination through some mechanism like a DNS query. The initiating destination through some mechanism like a DNS query. The initiating
node places the full 16-byte address in the destination address field node does not, however, know its RG, so uses the Site-Local RG values
of the datagram, and that field stays intact through the first Site in the RG part of the source address. When the datagram reaches the
and through all of the Public Topology. When the datagram reaches exit border router, the router replaces the RG of the packet's source
the exit border router, the router replaces the RG of the packet's address. When the datagram arrives at the entry router at the
source address. When the datagram arrives at entry router at the destination site, the router replaces the RG portion of the
destination Site, the router replaces the RG portion of the
destination address with the distinguished "Site-Local RG" value. destination address with the distinguished "Site-Local RG" value.
When the destination host needs to send return traffic, that host When the destination host needs to send return traffic, that host
knows the full 16-byte address for the destination because it knows the full 16-byte address for the other host because it appeared
appeared in the source address field of the arriving packet. in the source address field of the arriving packet.
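The rewriting steps summarized above can be sketched in a few lines. The
sketch assumes the RG is the high-order 50 bits of the address (3-bit
prefix, 13-bit LSID and 34 further bits) and that the STP and ESD are left
untouched. The SITE_LOCAL_RG constant is a hypothetical placeholder, since
no concrete value is given here, and packets are modeled as plain
dictionaries of 128-bit integers.

```python
RG_BITS = 50                         # prefix + LSID + remaining goop
LOW_BITS = 128 - RG_BITS             # STP (14 bits) + ESD (64 bits)
LOW_MASK = (1 << LOW_BITS) - 1
SITE_LOCAL_RG = (1 << RG_BITS) - 1   # HYPOTHETICAL placeholder value


def replace_rg(addr: int, rg: int) -> int:
    """Return addr with its high-order RG_BITS replaced by rg."""
    return (rg << LOW_BITS) | (addr & LOW_MASK)


def egress(packet: dict, public_rg: int) -> dict:
    """Leaving the site: fill in the source RG; destination is untouched."""
    out = dict(packet)
    out["src"] = replace_rg(packet["src"], public_rg)
    return out


def ingress(packet: dict) -> dict:
    """Entering the site: replace the destination RG with Site-Local RG."""
    out = dict(packet)
    out["dst"] = replace_rg(packet["dst"], SITE_LOCAL_RG)
    return out
```

Note that neither function touches the low-order 78 bits, so intra-site
routing on the STP and end-point identification by ESD are unaffected by
the rewrite.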
3.6. Renumbering and Rehoming Mid-Level ISPs 4.4. Renumbering and Rehoming Mid-Level ISPs
One of the most difficult-to-solve components of the renumbering One of the most difficult-to-solve components of the renumbering
problem is that of renumbering mid-level service providers. problem with CIDR is that of renumbering mid-level service providers.
Specifically, if SmallISP1 changes its transit provider from BigISP1 Specifically, if SmallISP1 changes its transit provider from BigISP1
to BigISP2 (in the CIDR model), then all of SmallISP1's customers to BigISP2, then in order for the overall size of the routing tables
would have to renumber into address space covered by an aggregate of to stay the same, all of SmallISP1's customers would have to renumber
BigISP2 (if the overall size of routing tables is to stay the same). into address space covered by an aggregate of BigISP2. GSE dealt
GSE deals with this problem by handling the RG in DNS with with this problem by handling the RG in DNS with indirection.
indirection. Specifically, a Site's DNS server specifies the RG Specifically, a site's DNS server specifies the RG portion of its
portion of its addresses by referencing the *name* of its immediate addresses by referencing the "name" of its immediate provider, which
provider, which is a resolvable DNS name (this obviously implies a is a resolvable DNS name (this implies a new Resource Record type).
new Resource Record type). That provider may define some of the low- That provider may define some of the low-order bits of the RG and
order bits of the RG and then reference its immediate provider. This then reference its immediate provider. This chain of reference allows
chain of reference allows mid-level service providers to change mid-level service providers to change transit providers, and the
transit providers, and the customers of that mid-level will simply customers of that mid-level will simply "inherit" the change in RG.
"inherit" the change in RG.
3.7. Support for Multi-Homed Sites 4.5. Support for Multi-Homed Sites
GSE defines a specific mechanism for providers to use to support GSE defined a specific mechanism for providers to use to support
multi-homed customers that gives those customers more reliability multi-homed customers that gives those customers more reliability
than singly-homed sites, but without a negative impact on the scaling than singly-homed sites, but without a negative impact on the scaling
of global routing. This mechanism is not specific to GSE and could be of global routing. This mechanism is not specific to GSE and could be
applied to any multi-homing scenario where a site is known by applied to any multi-homing scenario where a site is known by
multiple prefixes (including provider-based addressing). Assume the multiple prefixes (including provider-based addressing). Assume the
following topology: following topology:
Provider1 Provider2 Provider1 Provider2
+------+ +------+ +------+ +------+
| | | | | | | |
skipping to change at page 20, line 35 skipping to change at page 20, line 38
| | | |
+--x-----------x--+ +--x-----------x--+
| SBR1 SBR2 | | SBR1 SBR2 |
| | | |
+-----------------+ +-----------------+
Site Site
Figure 7 Figure 7
PBR1 is Provider1's border router while PBR2 is Provider2's border PBR1 is Provider1's border router while PBR2 is Provider2's border
router. SBR1 is the Site's border router that connects to Provider1 router. SBR1 is the site's border router that connects to Provider1
while SBR2 is the Site's border router that connects to Provider2. while SBR2 is the site's border router that connects to Provider2.
Imagine, for example, that the line between Provider1 and the Site Imagine, for example, that the line between Provider1 and the site
goes down. Any already existing flows that use a destination address goes down. Any already existing flows that use a destination address
including RG1 would stop working. In addition, any DNS queries that including RG1 would stop working. In addition, any DNS queries that
return addresses including RG1 would not be viable addresses. If PBR1 return addresses including RG1 would not be viable addresses. If PBR1
and PBR2 knew about each other, however, then in this case PBR1 could and PBR2 knew about each other, however, then in this case PBR1 could
tunnel packets destined for RG1-prefixed addresses to PBR2, thus tunnel packets destined for RG1-prefixed addresses to PBR2, thus
keeping the communication working. (Note that true tunneling, i.e., keeping the communication working. (Note that true tunneling, i.e.,
re-encapsulation, is necessary since routers between PBR1 and PBR2 re-encapsulation, is necessary since routers between PBR1 and PBR2
would forward RG1 addresses towards PBR1.) would forward RG1 addresses towards PBR1.)
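The failover behavior described above might look like the following
sketch. It models only the decision at PBR1; the dictionary layout, the
function name and the use of router names as addresses are all
illustrative assumptions.

```python
def pbr1_forward(packet: dict, site_link_up: bool) -> dict:
    """Decide what PBR1 does with a packet addressed to an RG1 prefix."""
    if site_link_up:
        # Normal case: hand the packet to the site's border router SBR1.
        return {"via": "SBR1", "packet": packet}
    # Failure case: wrap the original packet in an outer header addressed
    # to PBR2, so that intermediate routers do not send it back to PBR1.
    outer = {"src": "PBR1", "dst": "PBR2", "payload": packet}
    return {"via": "tunnel", "packet": outer}
```

PBR2 would then remove the outer header and deliver the original packet
over its own link to the site.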
3.8. Explicit Non-Goals for GSE 4.6. Explicit Non-Goals for GSE
It is worth noting explicitly that GSE does not attempt to address It is worth noting explicitly that GSE did not attempt to address the
the following issues: following issues:
1) Survival of TCP connections through renumbering events. If a 1) Survival of TCP connections through renumbering events. If a
Site is renumbered, TCP connections using a previous address site is renumbered, TCP connections using a previous address
will continue to work only as long as the previous address still will continue to work only as long as the previous address still
works (i.e., while it is still "valid" using RFC 1971 works (i.e., while it is still "valid" using RFC 1971
terminology). No attempt is made to have existing connections terminology). No attempt is made to have existing connections
switch to the new address. switch to the new address.
2) It is not known how mobility can be made to work under GSE. 2) It is not known how mobility can be made to work under GSE.
3) It is not known how multicast can be made to work under GSE. 3) It is not known how multicast can be made to work under GSE.
4) The performance impact of having routers rewrite portions of the 4) The performance impact of having routers rewrite portions of the
source and destination address in packet headers requires source and destination address in packet headers requires
further study. further study.
That GSE doesn't address the above does not mean they cannot be That GSE didn't address the above does not mean they cannot be
solved. Rather the issues haven't been studied in sufficient depth. solved. Rather the issues weren't studied in sufficient depth.
4. Analysis of GSE's Advantages and Disadvantages
This section contains the bulk of the GSE analysis and the analysis
of the general locator/identifier split.
4.1. End System Designator
4.1.1. Uniqueness Enforcement in the IPv4 Internet
As described earlier, in the IPv4 Public Internet, IP addresses
contain two pieces of information: a unique identifier and a locator.
Embedding location information within an address has the side-effect
of helping ensure that all addresses are globally unique. If
interfaces on two different nodes are assigned the same unicast
address, the routing subsystem will (generally) deliver packets to
only one of those nodes. The other node will quickly realize that
something is wrong (since communication using the duplicate address
fails) and take corrective action (e.g., obtain a proper address).
This is important for two reasons. It helps detect misconfigurations
(use of the wrong address prevents communication from taking place),
and helps thwart intruders.
In IPv4, communication usually fails quickly when addresses are not
unique. There are two cases to consider, depending on whether the two
interfaces assigned duplicate addresses are attached to the same or
to different links.
When two interfaces on the same link use the same address, a node
(host or router) sending traffic to the duplicate address will in
practice send all packets to one of the nodes. On Ethernets, for
example, the sender will use ARP (or Neighbor Discovery in IPv6) to
determine the link layer address corresponding to the destination
address. When multiple ARP replies for the target IP address are
received, the most recently received response replaces whatever is
already in the cache. Consequently, the destinations a node using a
duplicate IP address can communicate with depends on what its
neighboring nodes have in their ARP caches. In most cases, such
communication failures become apparent relatively quickly, since it
is unlikely that communication can proceed correctly on both nodes.
It is also the case that a number of ARP implementations (e.g., BSD-
derived implementations) log warning messages when an ARP request is
received from a node using the same address as the machine receiving
the ARP request.
When two interfaces on different links use the same address, the
routing subsystem will generally deliver packets to only one of the
nodes because only one of the links has the right "prefix" or "subnet
part" corresponding to the IP address. Consequently, the node using
the address on the "wrong" link will generally never receive any
packets sent to it and will be unable to communicate with anyone. For
obvious reasons, this condition is usually detected quickly.
An important observation is that, with classical IP, when different
nodes mistakenly assign the same IP address to different interfaces,
problems become apparent relatively quickly because communication
with several (if not all) destinations fails. In contrast, failure
scenarios differ when globally unique ESDs are assumed, but two nodes
mistakenly select the same one.
Embedding location information within an address also provides some, 5. Analysis: The Pros and Cons of Overloading Addresses
though not much, protection from forged addresses. Although it is
trivial to forge a source address in today's Internet, the routing
subsystem will in most cases forward any return traffic sent to that
address to its proper destination --- not to an arbitrary node
masquerading as someone else. To masquerade as someone else requires
subverting the routing subsystem, placing the intruder somewhere on
the normal routing path between the masqueraded host and its peer,
etc.
4.1.2. Overloading Addresses: Network Layer Issues At this point we have given complete descriptions of two addressing
architectures: IPv4, which uses the overloading technique, and GSE,
which uses the separated technique. We now compare and contrast the
two techniques.
At the network layer, a node compares the destination address of The following discussion is organized around three fundamental
received packets against the addresses of its attached interfaces. points:
Only if the addresses of received packets match are packets handed up
to higher layer protocols. In IPv4, the entire address must match.
Otherwise, the packet is assumed to be intended for some other node
and forwarded on (if received by a router) or silently discarded (if
received by a host). This has subtle but significant implications:
1) If a receiving host has multiple interfaces, it has multiple IP
addresses. When a packet addressed to a multi-homed host is
received on an interface other than the one to which a packet is
addressed, the host may reject (i.e., silently discard) the
packet, if it implements the "Strong ES Model" defined in
[RFC1122].
2) In recent IPv4 stacks, an interface may have more than one
unicast IP address assigned to it. Indeed, one way to renumber
an end site is to phase out an address (i.e., "deprecate" it
using RFC 1971 terminology) while simultaneously phasing in a
new one. Once the deprecated address becomes invalid, packets
sent to the invalid address will no longer be accepted by the
node, even though the packet may have intuitively reached its
intended recipient. Thus, even if a packet sent to an invalid
address is somehow delivered to the intended recipient (e.g.,
via tunneling), the receiver would reject the packet because the
address it was sent to no longer belongs to any of the node's
interfaces. Consequently, any communication using the invalid
address will fail (e.g., new and existing TCP connections).
Anyone wishing to communicate with the node must learn and
switch to the new address.
3) Because an address also indicates "where" the destination
resides within the Internet, a mobile node that moves from one
part of the Internet to another must obtain a new address that
reflects its new location. Moreover, the routing subsystem will
continue to forward packets sent to the mobile node's previous
address to the node's previous point of attachment where they
are likely to be discarded. That is, even if a mobile node is
willing to continue accepting packets addressed to one of its
previous addresses, it is unlikely that they will be received
(in the absence of something like Mobile IP [RFC2002]).
4) A multi-homed host has multiple interfaces, each with its own
address(es). If one of its interfaces fails, packets could, in
theory, be delivered to one of the host's other interfaces. In
practice, however, the routing subsystem has no way of knowing
that the interface to which a packet is addressed has failed and
what alternate interface addresses the packet could be delivered
to. Consequently, packets sent to a failed interface of a
multi-homed host won't be delivered, even though the node is
reachable through alternate interfaces.
Note that the above problems fall into two general categories:
1) Today's routing subsystem is unable to automatically deliver a
packet to a host's "alternate" addresses (if the host is
multi-homed) or a new address (if the host moves), should there be a
problem delivering a packet to the destination address listed in
the packet. It is possible to imagine, however, future routing
advances addressing this problem (e.g., Mobile IP).
2) Even if a packet is delivered to its intended destination, the
packet may still be rejected because the packet's destination
address does not match any of the addresses assigned to a
destination's interfaces. This problem does not appear to be
insurmountable and could be rectified (for example) by having a
host remember its previous addresses.
4.1.3. Overloading Addresses: Transport Layer Issues
The problems discussed previously create particular complications at
the transport level. Transport protocols such as TCP and UDP use
embedded IP addresses to identify the end-points of a transport
connection. Specifically, the communicating end-points of a transport
connection are uniquely identified by the sender's source IP address
and source port number together with the recipient's destination IP
address and port number. Once a connection has been established, the
IP addresses can not change. In particular, if a mobile host moves to
a new location and obtains a new address, packets intended for a TCP
connection created prior to the move cannot use the new address. TCP
will treat any packets sent to the new address as belonging to a
different TCP connection.
It is possible to imagine changes to TCP that might allow connections
to change the addresses they are using mid-connection without
breaking the connection. However, some subtle issues arise:
1) Packets intended for a pre-existing connection must be
demultiplexed to that connection as part of any negotiation to
change the addresses that identify that transport end-point.
However, because the demultiplexing operation uses the transport
addresses of the pre-existing TCP connection (which is based on
the previous address), TCP packets sent to a new address won't
be delivered to the desired transport end-point (which still
uses the previous address). Consequently, packets would need to
be sent to the previous address. However, by the time a mobile
node has moved and knows its new address, packets sent to the
previous address may no longer be delivered (i.e., they may not
be forwarded to the mobile host's new location).
2) When a mobile host moves, it could inform its TCP peers that it
has a new address. However, such a message could not be
delivered to the remote TCP connection if it was sent using its
new address for its source address. Just as above, such packets
would not be demultiplexed to the correct TCP connection. On the
other hand, it is infeasible to send packets using its previous
address from its new location. Because of the danger of spoofing
attacks, routers are now encouraged to actively look for, and
discard traffic from, a source address that does not match known
addresses for that region of the Internet [CERT]. Consequently,
such packets cannot be expected to be delivered.
Although the previous discussion used mobile nodes as an example, the
same problem arises in other contexts. For example, if a site is
being renumbered in IPv6, it may have two addresses, a previous
(i.e., deprecated) one being phased out and a new (i.e., preferred)
one being phased in. At the transport level, the problem of switching
addresses is similar in many respects to the mobility problem.
4.1.4. Potential Benefits of Globally Unique ESDs
Having a clear separation between the Routing Stuff and the ESD
portion of an address gives protocols some additional flexibility. At
the network layer, for example, recipients can examine just the ESD
portion of the destination addresses when determining whether a
packet is intended for them. This means that if a packet is delivered
to the correct destination node, the node will accept the packet,
regardless of how the packet got there, i.e., without regard to the
Routing Stuff of the address, which interface it arrived on, etc.
Such packets would then be delivered and accepted by the target host.
The idea of using addresses that cleanly separate the Routing Stuff
from an ESD is not new [references XXX]. However, there are several
different flavors. In its pure form, a sender would only need to know
the ESD of an end-point in order to send packets to it. When
presented with a datagram to send, network software would be
responsible for finding the Routing Stuff associated with the ESD so
that the packet can be delivered. A key question is who is
responsible for finding the Routing Stuff associated with a given
ESD? There are a number of possibilities:
1) The network layer could be responsible for doing the mapping.
The advantage of such a system is that an ESD could be stored
essentially forever (e.g., in configuration files), but whenever
it is actually used, network layer software would automatically
perform the mapping to determine the appropriate Routing Stuff
for the destination. Likewise, should an existing mapping become
invalid, network layer software could dynamically determine the
updated quantity. Unfortunately, building such a mapping
mechanism that is scalable is a hard problem.
2) The transport layer could be responsible for doing the mapping.
It could perform the mapping when a connection is first opened,
periodically refreshing the binding for long-running
connections. Implementing such a scheme would change the
existing transport layer protocols TCP and UDP significantly.
3) Higher-layer software (e.g., the application itself) could be
responsible for performing the mapping. This potentially
increases the burden on application programmers significantly,
especially if long-running connections are required to survive
renumbering and/or deal with mobile nodes.
It should be noted that the GSE proposal does not embrace the general
model. Indeed, it proposes the last. The network layer (and indeed
the transport layer) is always presented both the Routing Stuff (RG +
STP) and the ESD together in one IPv6 address. It is not the network
(or transport) layer's job to determine the Routing Stuff given only
the ESD or to validate that the Routing Stuff is correct. When an
application has data to send, it queries the DNS to obtain the IPv6
AAAA record for a destination. The returned AAAA record contains both
the Routing Stuff and the ESD of the specified destination. While
such an approach eliminates the need for the lower layers to be able
to map ESDs into corresponding Routing Stuff, it also means that when
presented with an address containing an incorrect (i.e., no longer
valid) Routing Stuff, the network is unable to deliver the packet to
its correct destination. It is up to applications themselves to deal
with such failures. Note that addresses containing invalid Routing
Stuff will result any time cached addresses are used after the
Routing Stuff of the address becomes invalid. This may happen if
addresses are stored in configuration files, or with long-running
communication.
4.1.5. ESD: Network Layer Issues
Along with the flexibility offered by separating the ESD from the
Routing Stuff come additional issues that must be considered
at the network layer:
1) Addresses must have a locator embedded within them. It is not
feasible to route packets solely on an ESD; doing so would make
it impossible to aggregate routing information in a scalable
way. The GSE proposal assumes that the locator part of an
address is filled with an appropriate value by higher layers
(i.e., the transport or application layer).
2) If a receiver observes that recent packets are arriving with a
different Routing Stuff in the source address than before, it
may want to send return traffic using the new Routing Stuff.
However, such information should not be accepted without
appropriate authentication of the new Routing Stuff, otherwise
it would be trivial to hijack existing transport connections.
Always using the most recently received Routing Stuff of an
address to send return traffic without appropriate
authentication leads to a vulnerability that is equivalent in
potential danger to "reversing and using an unauthenticated
received source route."
Note also that in the GSE proposal, since a sender does not know
its own RG, it is not possible for the sender to compute an
Authentication Header via IPSec that covers the RG portion of an
address. Thus, a recipient of new RG would need to authenticate
the received information via some alternate (undefined)
mechanism.
Finally, receipt of packets from different Routing Stuff than
before does not necessarily indicate a permanent change. In the
GSE proposal, for example, when a Site is multi-homed, some of
its packets may exit via one egress router while other packets
exit via a different egress router. Even packets originated from
the same source may exit through multiple egress routers.
Consequently, a node may receive traffic from the same sender in
which the Routing Stuff part changes on every packet.
3) In general, whenever an address is embedded within a packet
(including within data), one must consider whether all the bits
in the address should be used in computations, or whether just
the ESD portion should be used. Examples where such decisions
would need to be made include, but are not limited to, Neighbor
Discovery packets containing Neighbor Solicitations and
Responses [RFC 1970], IPSec packets being demultiplexed to their
appropriate Security Association, IP deciding whether to accept
an IP datagram (before reaching the transport level), the
reassembly of fragments, transport layer demultiplexing of
received packets to end-points, etc.
4.1.6. ESD: Transport Layer Issues
Previous sections have made clear that the embedding of full IPv6
addresses (i.e., Routing Stuff) within transport connection end-point
identifiers poses problems for mobility and site renumbering. This
section discusses an alternate approach, in which transport end-point
identifiers use ESDs rather than full addresses (with embedded
Routing Stuff).
In the following discussion, it should be kept in mind that the IPng
Recommendation [RFC 1752] states that a transition to IPv6 cannot
also require deployment of a "TCPng." In addition, although we focus
on TCP, UDP-based protocols also depend on the Routing Stuff in
similar ways, e.g., starting with the UDP checksum of the peers'
addresses. Indeed, we believe that TCP is the "easy" case to deal
with, for two reasons. First, TCP is a stateful protocol in which
both ends of the connection can negotiate with each other. Some
UDP-based protocols are stateless, and remember nothing from one packet
to the next. Consequently, changing UDP-based protocols may require
the introduction of "session" features, perhaps as part of a common
<draft-ietf-ipngwg-esd-analysis-02.txt>
1) Identifiers indicate who the intended recipient of a packet is,
2) Identifiers must be mapped into a locator that the network layer
uses to actually deliver a packet to its intended destination,
and
3) There must be a suitable way to sufficiently authenticate the
user of an identifier, so that peers using identifiers have
sufficient confidence that packets sent to or received from a
particular identifier correspond to the intended recipient.
5.1. Purpose of an Identifier
An identifier gives an entity the ability to refer to a communication
end point and to refer to the same endpoint over an extended period
of time. In terms of semantics, two or more packets sent to the same
identifier should be delivered to the same end point. Likewise, one
expects multiple packets received from the same identifier to have
been originated by the same sending entity. That is, a source
identifier indicates who the packet is from and a destination
identifier indicates who the packet is intended for.
When applications communicate, "identifiers" consist of addresses and
port numbers. For the purposes of this discussion, the term
"identifier" means the identifier of an interface. It is assumed that
port numbers will be present when higher layer entities communicate;
the exact port numbers used are not relevant to this discussion.
In small networks, flat routing can be used to deliver packets to
their destination based only on the destination identifier carried in
a packet header (i.e., the identifier is the locator and is not
required to have any structure). However, in such systems, a distinct
route entry is required for every destination, an approach that does
not scale. In larger networks, packet addresses include a locator
that helps the network layer deliver a packet to its destination.
Such a locator typically has structure (i.e., is an aggregate for
many destinations) that keeps routing tables small relative to the
total number of reachable destinations. In IPv4, the identifier and
locator are combined in a single address; it is not possible to
separate the locator portion of an address from the identifier
portion. In contrast, the ESD portion of a GSE address (which can
easily be extracted from the address) serves as an identifier, while
the Routing Stuff plays the role of a locator.
Having a clear separation between the locator and the identifier
portion of an address appears to give protocols some additional
flexibility. Once a packet has been delivered to its intended
destination interface (i.e., node), for example, the locator has
served its purpose and is no longer needed to further demultiplex a
packet to its higher-layer end point. This means that if a packet is
delivered to the correct destination node, the node will accept the
packet, regardless of how the packet got there. The exact locator
used does not matter, so long as it corresponds to one that delivers
a packet to its proper destination.
The most obvious example that could benefit from the separation of
locators and identifiers involves communication with a mobile host.
Transport protocols such as TCP are unable to keep connections open
if either of the endpoint identifiers for an open connection changes.
Fundamentally, the endpoint identifiers indicate the two endpoint
entities that are communicating. If a node were to receive a packet
from a node with which it had been communicating previously, but the
identifier used by the sending node has changed, the recipient would
be unable to distinguish this case from that of a packet received
from a completely different node.
In the specific case of TCP and IPv4, connections are identified
uniquely by the tuple: (srcIPaddr, dstIPaddr, srcport, dstport).
Because IPv4 addresses contain a combined locator/identifier, it is
not possible to have a node's location change without also having its
identifier change. Consequently, when a mobile node moves, its
existing connections no longer work, in the absence of special
protocols such as Mobile IP [RFC2002].
In contrast, connections in GSE are identified by the ESDs rather
than full IPv6 addresses. That is, connections are identified
uniquely by the tuple: (srcESD, dstESD, srcport, dstport).
Consequently, when demultiplexing incoming packets to their proper
end point, TCP would ignore the Routing Stuff portions of addresses.
Because the Routing Stuff portion of an address is ignored during
demultiplexing operations, a mobile node is free to move -- and
change its Routing Stuff -- without consequences for the
demultiplexing operation.
As a side note, it is a requirement in GSE that packets be
demultiplexed on ESDs alone independent of the Routing Stuff. If a
site is multi-homed, the packets it sends may exit the site at
different egress border routers during the lifetime of a connection.
Because each border router will place its own RG into the source
addresses of outgoing packets, the receiving TCP must ignore (at
least) the RG portion of addresses when demultiplexing received
packets. The alternative would be to make TCP unable to cope with
common routing changes, i.e., if the path changed, packets delivered
correctly would be discarded by the receiving TCP rather than
processed.
Not surprisingly, having separate locators and identifiers in
addresses leads to some additional problems. First, an identifier by
itself provides only limited value. In order to actually deliver
packets to a destination identifier, a corresponding locator must be
known. The general problem of mapping identifiers into locators is
non-trivial to solve, and is the topic of the next Section. Second,
because the Routing Stuff is ignored when demultiplexing packets
upward in the protocol stack, it becomes much easier for an intruder
to masquerade as someone else.
5.2. Mapping an Identifier to a Locator
The idea of using addresses that cleanly separate location and
identification information is not new [references XXX]. However,
there are several different flavors. In its pure form, a sender need
only know the identifier of an end-point in order to send packets to
it. When presented with a datagram to send, network software would be
responsible for finding the locator associated with an identifier so
that the packet can be delivered. A key question is: "who is
responsible for finding the Routing Stuff associated with a given
identifier?" There are a number of possibilities, each with a
different set of implications:
1) The network layer could be responsible for doing the mapping.
The advantage of such a system is that an ESD could be stored
essentially forever (e.g., in configuration files), but whenever
it is actually used, network layer software would automatically
perform the mapping to determine the appropriate Routing Stuff
for the destination. Likewise, should an existing mapping become
invalid, network layer software could dynamically determine the
updated value. Unfortunately, building such a mapping mechanism
that scales is a hard problem.
2) The transport layer could be responsible for doing the mapping.
It could perform the mapping when a connection is first opened,
periodically refreshing the binding for long-running
connections. Implementing such a scheme would change the
existing transport layer protocols TCP and UDP significantly.
3) Higher-layer software (e.g., the application itself) could be
responsible for performing the mapping. This potentially
increases the burden on application programmers significantly,
especially if long-running connections are required to survive
renumbering and/or deal with mobile nodes.
It should be noted that the GSE proposal does not embrace the general
model; it uses the last. The network and transport layers are always
presented with both the Routing Stuff (RG + STP) and the ESD together
in one IPv6 address. It is neither of these layers' jobs to determine
the Routing Stuff given only the ESD or to validate that the Routing
Stuff is correct. When an application has data to send, it queries
the DNS to obtain the IPv6 AAAA record for a destination. The
returned AAAA record contains both the Routing Stuff and the ESD of
the specified destination. While such an approach eliminates the need
for the lower layers to be able to map ESDs into corresponding
Routing Stuff, it also means that when presented with an address
containing an incorrect (i.e., no longer valid) Routing Stuff, the
network is unable to deliver the packet to its correct destination.
It is up to applications themselves to deal with such failures. Note
that addresses containing invalid Routing Stuff will result any time
cached addresses are used after the Routing Stuff of the address
becomes invalid. This may happen if addresses are stored in
configuration files, a mobile node moves to a new location,
long-running applications (clients and servers) cache the result of DNS
queries, a long-running connection attempts to continue operating
during a site renumbering event, etc.
A network architecture must provide the ability to map an identifier
to a locator. In IPv4, this mapping is trivial (the identity
function), since the identifier and locator are combined in a single
quantity (i.e., the IPv4 address). GSE does not provide mapping
functionality directly. Indeed, GSE uses two different identifiers.
At the highest level, a node's DNS name serves as its identifier, with
normal DNS queries used to map the DNS "identifier" into a locator
(i.e., the first 8 bytes of the IPv6 address). At a lower layer, the
IPv6 address contains the ESD identifier together with its Routing
Stuff (i.e., locator). Note that the DNS name is expected to be the
stable identifier that can be mapped into an appropriate locator at
any time. In contrast, the ESD identifier cannot be mapped into a
locator by itself.
The use of two identifiers contributes to making GSE appear simple.
However, there are two fundamental problems with this approach, if
the intention is to make it transparently easy to change locators
over time. First, the burden of performing the mapping from
identifier to locator is placed directly on the application,
requiring active participation from the application. Second, the
lower layers (i.e., transport and network layers) cannot make use of
this mapping themselves due to layering violation concerns (i.e., TCP
and UDP can't depend on the DNS to perform a query).
The following subsections discuss a number of issues related to
keeping track of or determining the locator associated with an
identifier.
5.2.1. Scalable Mapping of Identifiers to Locators
It is not difficult to construct a mapping from an identifier (such
as an ESD) to a locator (as well as other information such as a name,
cryptographic keys, etc.) provided one can structure the identifier
appropriately to support such lookups. In particular, identifiers
must have sufficient structure to support the delegating mechanism of
a distributed database such as DNS. On the other hand, no scalable
mechanism is known for performing such a mapping on arbitrary
identifiers taken from a flat space lacking structure.
Imposing a hierarchy on identifiers poses the following difficulties:
- it increases the size of the identifier. The exact size
necessary to support sufficient hierarchy is unclear, though it
is likely to be roughly the same as that used for the routing
hierarchy. Analysis done during the original IPng debates
[RFC1752] suggests that close to 48 bits of hierarchy are needed
to identify all the possible sites 30-40 years from now.
- the assignment of identifiers must be tied to the delegation
structure. That is, the site that "owns" an identifier is the
one responsible for maintaining the identifier-to-locator
mapping information about it.
- a mechanism would be needed to make it possible for a node to
determine what its identifier is. To be practical, such a
mechanism would need to be automated and avoid the need for
manual configuration.
5.2.2. Insufficient Hierarchy Space in ESDs
In the case of GSE's 8-byte ESD, the size of the identifier is not
large enough to contain sufficient hierarchy to both create DNS-like
delegation points and support stateless address autoconfiguration.
Stateless address autoconfiguration [RFC1971] already assumes that an
interface's 6-byte link-layer (i.e., MAC) address can be appended to
a link's routing prefix to produce a globally unique IPv6 address.
With GSE, only two bytes would be available for hierarchy and
delegation.
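The byte budget described above can be checked with a few lines of
arithmetic. The following is only an illustrative sketch: the constant
and function names are ours, and the 48-bit requirement is the
estimate attributed to [RFC1752] in Section 5.2.1, not a figure from
the GSE proposal itself.

```python
# Sizes taken from the surrounding text; all names here are illustrative.
ESD_BYTES = 8                # size of a GSE End System Designator
MAC_BYTES = 6                # link-layer address used by autoconfiguration
NEEDED_HIERARCHY_BITS = 48   # estimate cited from [RFC1752] in Section 5.2.1


def hierarchy_bits_available(esd_bytes=ESD_BYTES, mac_bytes=MAC_BYTES):
    """Bits left in the ESD for delegation once the MAC address is embedded."""
    return (esd_bytes - mac_bytes) * 8


available = hierarchy_bits_available()
shortfall = NEEDED_HIERARCHY_BITS - available
print(f"available: {available} bits, shortfall: {shortfall} bits")
```

Under these assumptions the ESD leaves 16 bits for hierarchy, roughly a
third of the estimated requirement.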
It is also the case that the sorts of built-in identifiers now found
in computing hardware, such as "EUI-48" and "EUI-64" addresses
[IEEE802, IEEE1212], do not have the structure required for this
delegation. Such identifiers have only two levels of hierarchy; the
top level typically identifies a manufacturer, with the remaining
part of the address being the equivalent of the serial number unique
to the manufacturer. In addition, the delegation of the two-level
hierarchy (i.e., equipment manufacturer) does not correspond to the
administration under which the end-user operates. Hence, stateless
autoconfiguration [RFC1971] cannot create addresses with the
necessary hierarchical property in the ESD portion of an address.
Finally, imposing a required hierarchical structure on identifiers
such as an ESD would also introduce a new administrative burden and a
new or expanded registry system to manage ESD space (i.e., to ensure
that ESDs are globally unique). While the procedures for assigning
ESDs, which need only organizational and not topological
significance, would be simpler than the procedures for managing IPv4
addresses (or DNS names), it is hard to imagine such a process being
universally well-received or without controversy; it seems a laudable
goal to avoid the problem altogether if possible. In addition, it
would likely increase the complexity of connecting new nodes to the
Internet, an outcome inconsistent with the goals of stateless address
autoconfiguration [RFC1971].
5.2.3. Reverse Mapping of Complete GSE Addresses
The following two sections describe techniques for mapping a full
IPv6 address back into some quantity (e.g., a DNS name or locator).
We include these descriptions for completeness even though they do
not address the fundamental problem of how to perform the mapping on
an identifier alone. It should also be noted that because both
techniques operate on complete IPv6 addresses, they are both directly
applicable to provider-based addressing schemes and are not specific
to GSE.
5.2.4. DNS-Like Reverse Mapping of Full GSE Addresses
Although a global-scale reverse mapping of ESDs seems infeasible,
one could imagine maintaining, within a site, a database
keyed on unstructured 8-byte ESDs. However, it is a matter of debate
whether such a database can be kept up-to-date at reasonable cost,
without making unreasonable assumptions as to how large sites are
going to grow, and how frequently ESD registrations will be made or
updated. Note that the issue isn't just the physical database itself,
but the operational issues involved in keeping it up-to-date. For the
rest of this section, however, let us assume that such a database can
be built.
A mechanism supporting a lookup keyed on a flat-space ESD from an
arbitrary site requires having sufficient structure to identify the
site that needs to be queried. In practice, an ESD will almost always
be used in conjunction with Routing Stuff (i.e., a full 16-byte
address). Since the Routing Stuff is organized hierarchically, it
becomes feasible to maintain a DNS-like tree that maps full GSE
addresses into DNS names, in a fashion analogous to what is done with
IPv4 PTR records today.
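As a rough illustration of such a tree, the sketch below builds a
nibble-reversed lookup name from a full 16-byte address using Python's
standard ipaddress module. The "ip6.arpa" suffix produced by that
module, the helper names, and the two example addresses are
assumptions of this sketch; the GSE proposal does not specify them.

```python
import ipaddress


def reverse_name(addr):
    """Return a nibble-reversed DNS name for a full 16-byte address."""
    # reverse_pointer yields an ip6.arpa name; the zone suffix is an
    # assumption of this sketch, not something GSE defines.
    return ipaddress.IPv6Address(addr).reverse_pointer


def esd_labels(name):
    """First 16 nibble labels, i.e. the low 8 bytes (the ESD portion)."""
    return name.split(".")[:16]


# Two hypothetical addresses: same ESD, different Routing Stuff.
old = reverse_name("2001:db8:aaaa:1:0200:5eff:fe00:5301")
new = reverse_name("2001:db8:bbbb:1:0200:5eff:fe00:5301")

print(esd_labels(old) == esd_labels(new))  # ESD labels are identical
print(old == new)                          # full names differ
```

The ESD occupies the most specific labels, while the delegation points
higher in the tree are derived from the Routing Stuff. A name built
from stale Routing Stuff therefore lands in a different part of the
tree.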
It should be noted that a GSE address lookup will work only if the
Routing Stuff portion of the address is correctly entered in the DNS
tree. Because the Routing Stuff portion of an address is expected to
change over time, this assumption will not be valid indefinitely. As
a consequence, a packet trace recorded in the past might not contain
enough information to identify the off-Site sources of the packets in
the present. This problem can be addressed by requiring that the
database of RG delegations be maintained for some period of time
after the RG is no longer usable for routing packets.
Finally, it should be noted that the problem where an address's RG
"expires" with the implication that the mapping of "expired"
addresses into DNS names may no longer hold is not a problem specific
to the GSE proposal. With provider-based addressing, the same issue
arises when a site renumbers into a new provider prefix and releases
the allocation from a previous block. The authors are aware of one
such renumbering in IPv4 where a block of returned addresses was
reassigned and reused within 24 hours of the renumbering event.
5.2.5. The ICMP Who-Are-You Message
Although there is widespread agreement on the utility of being able
to determine the DNS name one is communicating with, there is also
widespread concern that repeating the experience of the
"IN-ADDR.ARPA" domain is undesirable. In practice, the IN-ADDR.ARPA
domain is not fully populated and is poorly maintained. Consequently,
an old proposal to define an ICMP Who-Are-You message was resurrected
[RFC1788]. A client would send such a message to a peer, and that
peer would return an ICMP message containing its DNS name.
Asking a remote host to supply its own name in no way implies that
the returned information is accurate. However, having a remote peer
provide a piece of information that a client can use as input to a
separate authentication procedure provides a starting point for
performing strong authentication. The actual strength of the
authentication depends on the authentication procedure invoked,
rather than the untrusted piece of information provided by a remote
peer.
Reconsidering the "cheap" authentication procedure described earlier,
the ICMP Who-Are-You replaces the DNS PTR query used to obtain the
DNS name of a remote peer. The second DNS query, to map the DNS name
back into a set of addresses, would be performed as before. Because
the latter DNS query provides the strength of the authentication,
the use of an ICMP Who-Are-You message does not in any way weaken the
strength of the authentication method. Indeed, it can only make it
more useful in practice, because virtually all hosts can be expected
to implement the Who-Are-You message.
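The two-step check can be sketched as follows. This is a minimal
illustration, not the procedure as specified anywhere: the function
name, the injected forward_lookup callable, and the sample zone data
are all hypothetical, and the acceptance rule (the peer's address must
appear in the forward lookup result) is our reading of the text above.

```python
import ipaddress


def cheap_authenticate(peer_addr, claimed_name, forward_lookup):
    """Accept the claimed name only if it maps back to the peer's address."""
    # claimed_name is untrusted: it may come from a PTR query or from a
    # Who-Are-You reply. The forward lookup supplies the actual check.
    peer = ipaddress.ip_address(peer_addr)
    return any(ipaddress.ip_address(a) == peer
               for a in forward_lookup(claimed_name))


# Hypothetical forward zone data standing in for real DNS queries.
ZONE = {"host.example.org": ["2001:db8:aaaa:1::5301"]}


def lookup(name):
    return ZONE.get(name, [])


print(cheap_authenticate("2001:db8:aaaa:1::5301", "host.example.org", lookup))
print(cheap_authenticate("2001:db8:ffff:9::1", "host.example.org", lookup))
```

Because only the forward lookup is trusted, it makes no difference to
the outcome whether the candidate name was obtained from the DNS or
from the peer itself.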
The Who-Are-You message is robust against renumbering, since it
follows the paths of valid routable prefixes. Essentially, it uses
the Internet routing system in place of the DNS delegation scheme. It
is attractive in the context of GSE-style renumbering, since no host
or DNS server needs to be updated after a renumbering event for Who-
Are-You-based lookups to work. It has advantages outside the context
of GSE as well, including a more decentralized, and hence more
scalable, administration and easier upkeep than a DNS reverse-lookup
zone. It also has drawbacks: it requires the target node to be up and
reachable at the time of the query and to know its fully qualified
domain name. It is also not possible to resolve addresses once those
addresses become unroutable. In contrast, the DNS PTR mirrors, but is
independent of, the routing hierarchy. The DNS can maintain mappings
long after the routing subsystem stops delivering packets to certain
addresses.
The requirement that the target node be up and reachable at the time
of the query makes it very uncertain that one would be able to take
addresses from a packet log and translate them to correct domain
names at a later date. One can argue that this is a design flaw in
the logging system, as it violates the architectural principle,
"Avoid any design that requires addresses to be ... stored on non-
volatile storage." [RFC1958] A better-designed system would look up
domain names promptly from logged addresses. Indeed, one of the
authors has been doing that for some years.
5.3. Authentication of Identifiers
The true value of a globally unique identifier lies not in its
uniqueness but in the ability to use the same identifier repeatedly
and have it refer to the same end point. That is, when an identifier
is used, there is an expectation that repeated and subsequent use of
the identifier results in continued communication with the same end
point. To be useful then, a valid identifier must either be easily
distinguishable from a fraudulant one, or the system must have a way
to prevent identifiers from being used in an unauthorized manner.
The remainder of this section discusses how identifier authentication
is done in both IPv4 and GSE, and shows how overloading an address
with both an identifier and a locator provides automatic identifier
authentication. In contrast, there is essentially no identifier
authentication in GSE. It should be noted that the actual strength
of authentication that would be considered sufficient is a topic in
its own right, and we do not spend much time on it. Instead, we focus
on the relative strengths in the two schemes.
5.3.1. Identifier Authentication in IPv4
As described earlier, an IPv4 address simultaneously plays two roles:
a unique identifier and a locator. Using an overloaded address as an
identifier has the side-effect of ensuring that (for all practical
purposes) the identifier is globally unique. Furthermore, because
the same number is used both to identify an interface and to deliver
data to that interface, it is impossible for some interface A to use
the identification of another interface B in an attempt to receive
data destined to B without being detected, unless the routing system
is compromised.
When both interfaces A and B claim the same unicast address, the
routing subsystem generally delivers packets to only one of them. The
other node will quickly realize that something is wrong (since
communication using the duplicate address fails) and take corrective
actions, either correcting a misconfiguration or otherwise detecting
and thwarting the intruder. To understand how the routing subsystem
prevents the same address from being used in multiple locations,
there are two cases to consider, depending on whether the two
interfaces using duplicate addresses are attached to the same or to
different links.
When two interfaces on the same link use the same address, a node
(host or router) sending traffic to the duplicate address will in
practice send all packets to one of the nodes. On Ethernets, for
example, the sender will use ARP (or Neighbor Discovery in IPv6) to
determine the link-layer address corresponding to the destination
address. When multiple ARP replies for the target IP address are
received, the most recently received response replaces whatever is
already in the cache. Consequently, the destinations a node using a
duplicate IP address can communicate with depend on what its
neighboring nodes have in their ARP caches. In most cases, such
communication failures become apparent relatively quickly, since it
is unlikely that communication can proceed correctly on both nodes.
It is also the case that a number of ARP implementations (e.g., BSD-
derived implementations) log warning messages when an ARP request is
received from a node using the same address as the machine receiving
the ARP request.
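The cache behavior described above can be illustrated with a minimal
sketch. The class and method names are hypothetical, the addresses
are documentation values, and the conflict list is an illustrative
addition rather than the behavior of any particular ARP
implementation.

```python
class ArpCache:
    """Toy ARP cache in which the most recent reply for an IP address wins."""

    def __init__(self):
        self.entries = {}    # IP address -> link-layer address
        self.conflicts = []  # (ip, old_mac, new_mac), kept only for illustration

    def on_reply(self, ip, mac):
        old = self.entries.get(ip)
        if old is not None and old != mac:
            # A different link-layer address answered for the same IP.
            self.conflicts.append((ip, old, mac))
        # The most recently received response replaces the cached entry.
        self.entries[ip] = mac

    def resolve(self, ip):
        return self.entries.get(ip)


cache = ArpCache()
cache.on_reply("192.0.2.10", "00:00:5e:00:53:0a")  # interface A
cache.on_reply("192.0.2.10", "00:00:5e:00:53:0b")  # interface B, same IP address
```

After both replies, every packet this sender addresses to 192.0.2.10
goes to interface B; interface A stops hearing from this neighbor.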
When two interfaces on different links use the same address, the
routing subsystem generally delivers packets to only one of the nodes
because only one of the links has the right subnet corresponding to
the IP address. Consequently, the node using the address on the
"wrong" link will generally never receive any packets sent to it and
will be unable to communicate with anyone. For obvious reasons, this
condition is usually detected quickly.
It should be noted that although an address containing a combined
identifier and locator can be forged, the routing subsystem
significantly limits communication using the forged address. First,
return traffic will be sent to the correct destination and not the
originator of the forged address. Second, routers performing ingress
filtering can refuse to forward traffic claiming to originate from a
source whose claimed address does not match the expected addresses
(from a topology perspective) for sources located within a particular
region [RFC 2267]. To effectively masquerade as someone else
requires subverting the intermediate routing subsystem.
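As a rough sketch of the ingress filtering check mentioned above,
assuming the router is configured with the list of prefixes expected
behind a given interface (the function name and prefix list are
hypothetical):

```python
import ipaddress


def ingress_permits(source, allowed_prefixes):
    """Return True if the claimed source address lies in an expected prefix."""
    addr = ipaddress.ip_address(source)
    return any(addr in ipaddress.ip_network(prefix) for prefix in allowed_prefixes)


# Hypothetical configuration: prefixes topologically behind this interface.
CUSTOMER_PREFIXES = ["192.0.2.0/24"]
```

A packet claiming source 192.0.2.7 would be forwarded, while one
claiming 198.51.100.7 would be dropped at the edge. The check works
only because the address carries locator information.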
5.3.2. Identifier Authentication in GSE
In GSE, it is not possible for the routing subsystem to provide any
enforcement on the authenticity of identifiers with respect to their
corresponding Routing Stuff, since the Routing Stuff and ESD portions
of an address are by definition completely orthogonal quantities.
This fundamental problem is compounded by the fact that GSE provides
no way (at the transport or network layer) to map an ESD into its
corresponding Routing Stuff. Thus, when looking at the source address
of a received packet, there is no way to ascertain whether the
Routing Stuff portion of the address corresponds to legitimate
Routing Stuff with respect to the corresponding ESD. Consequently, it
becomes trivial in many cases for one node to masquerade as another.
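A minimal sketch of why masquerading becomes easy, assuming the ESD
occupies the low-order 8 bytes of the address and that connections
are keyed on ESDs alone, as GSE specifies. The helper names and the
addresses are hypothetical.

```python
import ipaddress

ESD_BITS = 64  # assumption: the ESD is the low-order 8 bytes of the address


def split_address(addr):
    """Split an IPv6 address into (Routing Stuff, ESD) as integers."""
    value = int(ipaddress.IPv6Address(addr))
    routing_stuff = value >> ESD_BITS
    esd = value & ((1 << ESD_BITS) - 1)
    return routing_stuff, esd


def demux_key(src, dst, sport, dport):
    """Connection key built from ESDs and ports only; Routing Stuff is ignored."""
    return (split_address(src)[1], split_address(dst)[1], sport, dport)


# Same source ESD, sent from two unrelated sites (different Routing Stuff).
legit = demux_key("2001:db8:aaaa:1::1234", "2001:db8:cccc:1::80", 40000, 80)
forged = demux_key("2001:db8:bbbb:9::1234", "2001:db8:cccc:1::80", 40000, 80)
```

Both keys compare equal, so the receiver has nothing at this layer
with which to tell the two senders apart.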
5.3.3. Transport Layer: What Locator Should Be Used?
In the following, we focus on what Routing Stuff to use with TCP.
UDP-based protocols also depend on the Routing Stuff in a similar way.
Indeed, we believe that TCP is the "easier" case to deal with, for
two reasons. First, TCP is a stateful protocol in which both ends of
the connection can negotiate with each other. Some UDP-based
protocols are stateless, and remember nothing from one packet to the
next. Consequently, changing UDP-based protocols may require the
introduction of "session" features, perhaps as part of a common
"library", for use by applications whose transport protocol is "library", for use by applications whose transport protocol is
relatively stateless. Second, changes to UDP-based protocols in relatively stateless. Second, changes to UDP-based protocols in
practice mean changing individual applications themselves, raising practice mean changing individual applications themselves, raising
deployability questions. deployability questions.
4.1.6.1. Demultiplexing Packets to Transport Endpoints There are three cases of interest from TCP's perspective:
Connections in GSE are identified by the ESDs rather than full IPv6
addresses (with embedded Routing Stuff). That is:
unique IPv4 TCP connection: srcaddr dstaddr srcport destport - the sending side of an active open
unique GSE TCP connection: srcESD dstESD srcport dstport
Consequently, with GSE, when demultiplexing incoming packets, TCP - the sending side of a passive open (i.e., how to respond to an
would ignore the Routing Stuff portions of addresses when delivering active open)
packets to their proper end-point.
Although there are potential benefits to this approach (discussed - changes to the Routing Stuff during an open connection.
below), demultiplexing on ESDs alone without the RS is, in fact,
required with GSE. If a site is multi-homed, the packets it sends may
exit different egress border routers during the lifetime of a
connection. Because each border router will place its own RG into the
source addresses of outgoing packets, the receiving TCP must ignore
(at least) the RG portion of addresses when demultiplexing received
packets. The alternative would be to make TCP less robust with
respect to changes in routing, i.e., if the path changed, packets
delivered correctly would be discarded by the receiving TCP rather
than processed.
4.1.6.2. Pseudo-Header Checksum Calculations 5.3.4. RG Selection On An Active Open
Having routers rewrite the RG portion of addresses means that TCP If the host is performing a TCP "active open", the application first
cannot include the RG in its checksum calculation; the sender does queries the DNS to obtain the destination address, which contains
not know its own RG. Consequently, upon receipt of a TCP segment, the appropriate RG. That is, the initiator of communication is assumed to
receiver has no way of determining whether the RG portion of an provide the correct Routing Stuff when initiating communication to a
address has been corrupted (or modified) in transit (the implications specific destination.
of this are discussed below).
4.1.6.3. RG Selection When Sending Packets 5.3.5. RG Selection On A Passive Open
When a host has a packet to send, there are three cases for deciding When a server passively accepts connections from arbitrary clients,
what RG to use in the destination. If the host is performing an it has no choice but to assume that the Routing Stuff in the source
"active open", it queries the DNS to obtain the destination address, address of a received packet that initiated the communication is
which contains appropriate RG. If the host is responding to an active correct, because it has no way to authenticate its validity. Note
open from a remote peer, the source address of packets from that peer that the Routing Stuff is "correct" only in the sense that it
contains usable RG. Note that assuming that the RG on an incoming TCP corresponds to the site originating the connection. Whether the
connection is "correct" needs qualification. It is "correct" in the Routing Stuff paired with the received ESD is actually located at
sense that it corresponds to the site originating the connection. that site where the legitimate owner of the ESD currently resides is
Whether the ESD paired with the RG is actually located at that site not known. Because the ESD alone cannot be mapped into a locator (or
cannot be assumed. The issue of spoofing is discussed in more detail some other quantity that can provide input to an authentication
later. The last (and most interesting) case is when RG changes mid- procedure), there is no way to determine whether the received Routing
connection. Although, the GSE proposal calls for always using the Stuff corresponds to that legitimately associated with the source
first RG learned (and then never switching), we explored the identifier of the received packet. The issue of spoofing is
possibility of doing so in order to better understand the issues. discussed in more detail later.
4.1.6.4. Mid-Connection RG Changes 5.3.6. Mid-Connection RG Changes
During a connection, the RG appearing on subsequent packets is While packets are flowing as part of an open connection, the RG
susceptible to change through renumbering events, and indeed more appearing on subsequent packets is susceptible to change through
frequently, to change through Site-internal routing changes that renumbering events, or as a result of site-internal routing changes
cause the egress point for off-Site traffic to change. It is even that cause the egress point for off-site traffic to change. It is
possible (in the worst case) that traffic-balancing schemes could even possible (in the worst case) that traffic-balancing schemes
result in the use of two egress routers, with roughly every other could result in the use of two egress routers, with roughly every
packet exiting through a different egress router. Consequently it may other packet exiting through a different egress router. In GSE, the
be desirable to switch to the just-received RG, as the old RG may no RG does not change once a connection has been opened.
longer be valid (e.g., a border router has failed), but care must be
taken not to thrash. Moreover, simply using the most-recently-
received RG makes it trivial for an intruder to hijack connections.
Because TCP under GSE demultiplexes packets using only ESDs, packets Because TCP under GSE demultiplexes packets using only ESDs, packets
will be delivered to the correct end-point regardless of what source will be delivered to the correct end-point regardless of what source
RG is used. However, return traffic will continue to be sent via the RG is used. However, in GSE return traffic continues to be sent via
"old" RG, even though it may have been deprecated or become less the "old" RG, even though it may have been deprecated or become less
optimal because the peer's border router has changed. It would seem optimal because the peer's border router has changed. It would seem
highly desirable for TCP connections to be able to survive such highly desirable for TCP connections to be able to survive such
events. However, the completion of renumbering events (so that an events. However, the completion of renumbering events (so that an
earlier RG is now invalid) and certain topology changes would require earlier RG is now invalid) and certain topology changes would require
TCP to switch sending to a new RG mid-connection. To explore the TCP to switch sending to a new RG mid-connection. To explore the
whole space, we considered ways of allowing this mid-connection RG whole space, we considered ways of allowing this mid-connection RG
change to happen. change to happen.
If TCP connection identifiers are based on ESDs rather than full If TCP connection identifiers are based on ESDs rather than full
addresses, traffic from the same ESD would be viewed as coming from addresses, traffic from the same ESD would be viewed as coming from
the same peer, regardless of its source RG. This makes it trivial for the same peer, regardless of its source RG. Because this
any Internet host to impersonate another, and have such traffic be vulnerability is already present in today's Internet (forging full
accepted by TCP. Because this vulnerability is already present in source addresses is trivial), the mere delivery of incoming datagrams
today's Internet (forging full source addresses is trivial), the mere with the same ESD but a different RG does not introduce new
delivery of incoming datagrams with the same ESD but a different RG vulnerability to TCP. In today's Internet, any node can already
does not introduce new vulnerability to TCP. In today's Internet, originate FINs/RSTs from an arbitrary source address and potentially
any node can already originate FINs/RSTs from an arbitrary source or definitely disrupt the connection. Therefore, changing RG for
address and potentially or definitely disrupt the connection. acceptance, or acceptance of traffic independent of its source RG,
Therefore, changing RG for acceptance, or acceptance of traffic does not appear to significantly worsen existing robustness. (See the
independent of its source RG, does not appear to significantly worsen comment on ingress filtering in Section 5.3.1, however.)
existing robustness.
We also considered allowing TCP to reply to each segment using the RG We also considered allowing TCP to reply to each segment using the RG
of the most recently-received segment. Although this allows TCP to of the most recently-received segment. Although this allows TCP to
survive some important events (e.g., renumbering), it also makes it survive some important events (e.g., renumbering), it also makes it
trivial to hijack connections, unacceptably weakening robustness trivial to hijack connections, unacceptably weakening robustness
compared with today's Internet. A sender simply needs to guess the compared with today's Internet. A sender simply needs to guess the
sequence numbers in use by a given TCP connection [Bellovin 89] and sequence numbers in use by a given TCP connection [Bellovin 89] and
send traffic with a bogus RG to hijack a connection to an intruder send traffic with a bogus RG to hijack a connection to an intruder at
at an arbitrary location. an arbitrary location.
Providing protection from hijacking implies that the RG used to send Providing protection from hijacking implies that the RG used to send
packets must be bound to a connection end-point (e.g., it is part of packets must be bound to a connection end-point (e.g., it is part of
the connection state). Although it may be reasonable to accept the connection state). Although it may be reasonable to accept
incoming traffic independent of the source RG, the choice of sending incoming traffic independent of the source RG, the choice of sending
RG requires more careful consideration. Indeed, any subsequent change RG requires more careful consideration. Indeed, any subsequent change
in what RG is used for sending traffic must be properly authenticated in what RG is used for sending traffic must be properly authenticated
using cryptographic means. In the GSE proposal, it is not clear how (e.g., using cryptographic means). In the GSE proposal, it is not
to authenticate such a change, since the remote peer doesn't even clear how to authenticate such a change, since the remote peer
know what RG it is using! Consequently, the only reasonable approach doesn't even know its own RG. Consequently, the only reasonable
in GSE is to send to the peer at the first RG used by the peer for approach in GSE is to send to the peer using the first RG used for
the entire life of a connection. That is, always continue to use the the entire life of a connection. That is, always use the first RG
first RG seen. seen.
In summary, changing the RG dynamically in a safe way for a
connection requires that an originator of traffic be able to
authenticate a proposed change in the RG before sending to a
particular ESD via that RG. Such a mechanism would need to be
invented, as the TCP/IP suite has no obvious candidate that operates
at or below the transport layer (using the DNS, an application
protocol that resides above IP, would be problematic due to layering
circularity considerations).
4.1.6.5. Passive Opens
One question that arises is what impact corrupted RG would have on
robustness. Because the RG is not covered by any checksums, it would
be difficult to detect such corruption. Moreover, once a specific RG
is in use, it does not change for the duration of a connection. The
interesting case occurs on the passive side of a TCP connection,
where a server accepts incoming connections from remote clients. If
the initial SYN from the client includes corrupted RG, the server TCP
will create a TCP connection (in the SYN-RECEIVED state) and cache
the corrupted RG with the connection. The second packet of the 3-way
handshake, the SYN-ACK packet, would be sent to the wrong RG and
consequently not reach the correct destination. Later, when the
client retransmits the unacknowledged SYN, the server will continue
to send the SYN-ACK using the bad RG. Eventually the client times
out, and the attempt to open a TCP connection fails. Figure 8 shows
the details.
TCP A TCP B
1. CLOSED LISTEN
2. SYN-SENT --> <SRC RG=BITERR><SEQ=100><CTL=SYN> --> SYN-RECEIVED
3. <-- <DST RG=BITERR><SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
4. SYN-SENT --> <SRC RG><SEQ=100><CTL=SYN> --> SYN-RECEIVED
5. <-- <DST RG=BITERR><SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
... TCP A times out 5.3.7. The Impact of Corrupt Routing Goop
Figure 8 Another interesting issue that arises is what impact corrupted RG
would have on robustness. Because the RG is not covered by the TCP
checksum (the sender doesn't know what source RG will be inserted),
it would be difficult to detect such corruption at the receiver.
Moreover, once a specific RG is in use, it does not change for the
duration of a connection. The interesting case occurs on the passive
side of a TCP connection, where a server accepts incoming connections
from remote clients. If the initial SYN from the client includes
corrupted RG, the server TCP will create a TCP connection (in the
SYN-RECEIVED state) and cache the corrupted RG with the connection.
The second packet of the 3-way handshake, the SYN-ACK packet, would
be sent to the wrong RG and consequently not reach the correct
destination. Later, when the client retransmits the unacknowledged
SYN, the server will continue to send the SYN-ACK using the bad RG.
Eventually the client times out, and the attempt to open a TCP
connection fails.
We next consider relaxing the restriction on switching RGs in an We next consider relaxing the restriction on switching RGs in an
attempt to avoid the previous failure scenario. The situation is attempt to avoid the previous failure scenario. The situation is
complicated by the fact that the RG on received packets may change complicated by the fact that the RG on received packets may change
for legitimate reasons (e.g., a multi-homed site load-shares traffic for legitimate reasons (e.g., a multi-homed site load-shares traffic
across multiple border routers). The key question is how can one across multiple border routers). The key question is how one can
determine which RG is valid and which is not. That is, for each of determine which RG is valid and which is not. That is, for each of
the RGs a sender attempts to use, how can it determine which RG the RGs a sender attempts to use, how can it determine which RG
worked and which did not? Solving this problem is more difficult than worked and which did not? Solving this problem is more difficult than
first appears, since one must cover the cases of delayed segments, first appears, since one must cover the cases of delayed segments,
lost segments, simultaneous opens, etc. If a SYN-ACK is retransmitted lost segments, simultaneous opens, etc. If a SYN-ACK is retransmitted
using different RGs, it is not possible to determine which of those using different RGs, it is not possible to determine which of those
RGs worked correctly. We conclude that the only way TCP could RGs worked correctly. We conclude that the only way TCP could
determine that a particular RG was used to deliver segments was if it determine that a particular RG is correct is by receiving an ACK for
received an ACK for a specific sequence number in which all a specific sequence number in which all transmissions of that
transmissions of that sequence number used the same RG (a non-trivial sequence number used the same RG (a non-trivial addition to TCP).
addition to TCP).
We analyze multiple cases of RG changing within the time of the
opening handshake. One example is diagrammed in Figure 9, and it and
two others are summarized in Table 1. We observe that RG flap and
large numbers of passive opens may coincide, for instance, when a
power failure at a server farm affects both internal routers and
servers.
time TCP A time TCP B
t0 --> <SRC RG=M><SEQ=100><SYN> t1
t3 <-- <DST RG=M><SEQ=300><ACK=101><SYN,ACK> t1
TCP B's SYN,ACK is delayed and crosses with retransmit of TCP A's
SYN on which RG has changed from M to N
t2 --> <SRC RG=N><SEQ=100><SYN> t3
t4 --> <SRC RG=N><SEQ=101><ACK=301> t3 ESTABLISHED
TCP B decides to use DST RG=M for TCP A, because it heard from
RG=M and was ACK'd on a send to RG=M
Figure 9
SYNFROM SYNACKTO ACKFROM SELECT
W W X W
------------------------------------
W
X W X W
------------------------------------
W W
X X Y ??
Table 1
At best, an RG selection algorithm for TCP would be relatively At best, an RG selection algorithm for TCP would be relatively
straightforward but would require new logic in implementations of straightforward but would require new logic in implementations of
TCP's opening handshake --- a significant transition issue. We are TCP's opening handshake --- a significant transition/deployment
not certain that a valid algorithm is attainable, however. RG changes issue. We are not certain that a valid algorithm is attainable,
would have to be handled in all cases handled by the opening however. RG changes would have to be handled in all cases handled by
handshake: delayed segments, lost segments, undetected bit errors in the opening handshake: delayed segments, lost segments, undetected
RG, simultaneous opens, old segments and so on. bit errors in RG, simultaneous opens, old segments, etc.
In the end, we conclude that although the corrupted SYN case of
Figure 8 was a potential problem, the changes that would need to be
made to TCP to robustly deal with such corruption would be
significant, if tractable at all. This would result in transition to
GSE needing a significant TCPng transition.
Our final conclusion is that transport protocol end-points must make
an early, single choice of the RG to use when sending to a peer and
stick with that choice for the duration of the connection.
Specifically:
1) The demultiplexing of arriving packets to their transport end
points should use only the ESD, and not the Routing Stuff.
2) If the application chooses an RG for the remote peer (i.e., an
active open), use the provided RG for all traffic sent to that
peer, even if alternative RGs are received on subsequent
incoming datagrams from the same ESD.
3) For all other cases, use the first RG received with a given ESD
for all sending. We recommend that a means be found for RGs to
be checksummed if the GSE address structure is used.
Consequently, there does not appear to be a straightforward way to
use ESDs in conjunction with mobility or site renumbering (in which
existing connections survive the renumbering).
4.1.6.6. Summary: ESD and RG Not Strictly Independent
We cannot emphasize enough that the use of an ESD independent of an
associated RG can be very dangerous. That is, communicating with a
peer implies that one is always talking to the same peer for the
duration of the communication. But as has been described in previous
sections, such assurance can only take place if there are assurances
that only properly authenticated RG is used.
We conclude that the rules for transport processing when ESDs are
present differ from classical IP. Specifically:
1) The demultiplexing of packets to transport connection end-points
should use ESDs, but should not use the Routing Stuff part of
addresses. This ensures that packets are delivered to their
intended destination independent of RG.
2) Once a packet has been delivered to its transport end-point, a In the end, we conclude that although the corrupted SYN case
separate (i.e., distinct) decision should be made concerning introduces potential problems, the changes that would need to be made
whether and how to act upon the received packet. Such a decision to TCP to robustly deal with such corruption would be significant, if
would be transport-protocol specific. A protocol could choose to tractable at all. This would result in a transition to GSE also
completely ignore the packet, it could selectively use parts of having a significant TCPng component, a significant drawback.
the packet (e.g., to attempt out-of-band authentication of the
RG), or it could process the packet in its entirety. It must
not, however, use the received RG to send subsequent return
traffic without first authenticating the RG.
4.1.7. On The Uniqueness Of ESDs 5.3.8. On The Uniqueness Of ESDs
The uniqueness requirements for ESDs depend on what purpose they The uniqueness requirements for ESDs depend on what purpose they
serve. In GSE, ESDs identify end systems, requiring that they be serve and how they are used. In GSE, ESDs identify interfaces,
globally unique. It does not make sense for two different end systems requiring that they be globally unique. It does not make sense for
to use the same ESD; every end system must have its own ESD to two different interfaces to use the same ESD; every interface must
distinguish from other end systems. have its own ESD to distinguish it from others.
If ESDs are only used to identify session endpoints, the situation If ESDs are only used to identify session endpoints, the situation
becomes more complex. At first glance it might appear that two nodes becomes more complex. At first glance it might appear that two nodes
using the same ESD cannot communicate. However, this is not using the same ESD cannot communicate. However, this is not
necessarily the case. In the GSE proposal, for example, a node necessarily the case. In the GSE proposal, for example, a node
queries the DNS to obtain an IPv6 address. The returned address queries the DNS to obtain an IPv6 address. The returned address
includes the Routing Stuff of an address (the RG+STP portions). Since includes the Routing Stuff of an address (the RG+STP portions). Since
the sending host transmits packets based on the entire destination the sending host transmits packets based on the entire destination
IPv6 address, the sender may well forward the packet to a router that IPv6 address, the sender may well forward the packet to a router that
delivers the packet to its correct destination (using the information delivers the packet to its correct destination (using the information
in the Routing Stuff). It is only on receipt of a packet that a node in the Routing Stuff). It is only on receipt of a packet that a node
would extract the ESD portion of a datagram's destination address and would extract the ESD portion of a datagram's destination address and
ask "is this for me?" ask "is this for me?" That is, a sender may not notice the
destination ESD is the same as the sending ESD because of the Routing
Stuff part of the address.
A more problematic case occurs if two nodes using the same ESD A more problematic case occurs if two nodes using the same ESD
communicate with a third party. To the third party, packets received communicate with a third party. To the third party, packets received
from either machine might appear to be coming from the same machine from either machine might appear to be coming from the same machine
since they are both using the same ESD. Consequently, at the since they are both using the same ESD. Consequently, at the
transport level, if both machines choose the same source and transport level, if both machines choose the same source and
destination port numbers (one of the ports --- a server's well-known destination port numbers (one of the ports --- a server's well-known
port number will likely be the same), packets belonging to two port number --- will likely be the same), packets belonging to two
distinct transport connections will be demultiplexed to a single distinct transport connections will be demultiplexed to a single
transport end-point. transport end-point.
When packets from different sources using the same source ESD are When packets from different sources using the same source ESD are
delivered to the same transport end-point, a number of possibilities delivered to the same transport end-point, a number of possibilities
come to mind: come to mind:
1) The transport end-point could accept the packet, without regard 1) The transport end-point could accept the packet, without regard
to the Routing Stuff of the source address. This may lead to a to the Routing Stuff of the source address. This may lead to a
number of robustness problems, if data from two different number of robustness problems, if data from two different
skipping to change at page 35, line 44 skipping to change at page 36, line 18
3) When a packet is received with an unexpected Routing Stuff the 3) When a packet is received with an unexpected Routing Stuff the
receiver could invoke special-purpose code to deal with this receiver could invoke special-purpose code to deal with this
case. Possible actions include attempting to verify whether the case. Possible actions include attempting to verify whether the
Routing Stuff is indeed correct (the saved values may have Routing Stuff is indeed correct (the saved values may have
expired) or attempting to verify whether duplicate ESDs are in expired) or attempting to verify whether duplicate ESDs are in
use (e.g., by inventing a protocol that sends packets using both use (e.g., by inventing a protocol that sends packets using both
Routing Stuff and verifies that they are delivered to the same Routing Stuff and verifies that they are delivered to the same
end-point). end-point).
4.1.8. DNS PTR Queries 5.3.9. New Denial of Service Attacks.
IPv4 uses the domain "IN-ADDR.ARPA" to hold PTR Resource Records. PTR
RRs allow a client to map IP addresses back into the domain name
corresponding to that address. IPv4 addresses can be put into the DNS
because they have hierarchical structure -- the same hierarchy used
to aggregate routes.
The ability to map an IP address into its corresponding DNS name is
used in several contexts:
1) Network packet tracing utilities (e.g., tcpdump) display the
contents of packets. Printing out the DNS names appearing in
those packets (rather than dotted IP addresses) requires access
to an address-to-name mapping mechanism.
2) Some applications perform "cheap" authentication by using the
DNS to map a source address of a peer into a DNS name. Then, the
client queries the DNS a second time, this time asking for the
address(es) corresponding to the peer's DNS name. Only if one of
the addresses returned by the DNS matches the peer address of
the TCP connection is the source of the TCP connection accepted
as being from the indicated DNS name.
It is important to note that although two DNS queries are made
during the above operation, it is the second one --- mapping the
peer's DNS name back into an IP address --- that provides the
authentication property. The first transaction simply obtains
the peer's DNS name, but no assumption is made that the returned
DNS name is correct. Thus, the first DNS query could be
replaced by an alternate mechanism without weakening the already
weak authentication check described above. One possible
alternate mechanism, an ICMP "Who Are You" message, is described
in Section 4.1.11.
3) Applications that log all incoming network connections (e.g.,
anonymous FTP servers) may prefer logging recognizable DNS names
to addresses.
4) Network administrators examining logs or other trace data
containing addresses may wish to determine the DNS name of some
addresses. Note that this may occur sometime after those
addresses were actually used.
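
The "cheap" authentication procedure in item 2 above can be sketched as follows. This is a minimal illustration, not part of any proposal: the function name cheap_authenticate and its two injectable lookup parameters are hypothetical, and addresses are compared as text, which assumes both lookups return them in the same textual form.

```python
import socket


def cheap_authenticate(peer_addr, reverse_lookup=None, forward_lookup=None):
    """Return the peer's DNS name if a forward lookup confirms it, else None.

    The reverse lookup only supplies a candidate name (an untrusted hint).
    The forward lookup of that name is what provides the (weak) check.
    """
    if reverse_lookup is None:
        # Default: ordinary PTR-style lookup through the system resolver.
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # Default: collect every address the system resolver returns for the name.
        forward_lookup = lambda name: {
            info[4][0] for info in socket.getaddrinfo(name, None)
        }
    try:
        name = reverse_lookup(peer_addr)   # first query: obtain a candidate name
        addresses = forward_lookup(name)   # second query: map the name back
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    # Accept only if the peer address is among the addresses for that name.
    return name if peer_addr in addresses else None
```

Because the first lookup is only a hint, passing a different reverse_lookup callable (for example, one backed by some other name-discovery mechanism) leaves the strength of the check unchanged.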
Although DNS PTR records have proven useful in several contexts,
there is also widespread agreement that, in practice, many IP
addresses in use today are not properly registered in the IN-
ADDR.ARPA namespace. Consequently, PTR queries frequently fail to
return usable information. Thus, the overall utility of PTR records
is questionable.
It is also worth noting that the primary reason that so few addresses
are properly registered in the PTR space is the absence of incentive
for doing so. With no key piece of the Internet infrastructure
depending on such mappings being in place or correct, there is little
practical harm in failing to keep it up-to-date.
Finally, it might appear at first glance that secure DNS [RFC2065]
provides a means for cryptographically signing a PTR record and
thereby providing authentication. Things are not so simple, however.
The signature on a PTR record indicates that the entity owning an
address has given it a DNS name. It does not mean that the owner of
the address is authorized to use that specific name. For example,
anyone owning an address can set up a PTR record indicating that the
address corresponds to the name "www.ietf.org". However, the name
"www.ietf.org" belongs to only one entity, regardless of how many PTR
records indicate otherwise.
4.1.9. Reverse Mapping of ESDs
It is reasonable to ask if it is necessary or desirable to be able to
map an ESD (alone) into some other meaningful quantity, such as a
fully qualified domain name. The benefits of being able to perform
such a mapping are analogous to those described in the preceding
section.
The primary difficulty with constructing such a mapping is that it
requires that ESDs have sufficient structure to support the
delegating mechanism of a distributed database such as DNS. The sorts
of built-in identifiers now found in computing hardware, such as
"EUI-48" and "EUI-64" addresses [IEEE802, IEEE1212], do not have the
structure required for this delegation. Hence, stateless
autoconfiguration [RFC1971] cannot create addresses with the
necessary hierarchical property.
Another possibility would be to define ESDs with sufficient structure It is clear that there are potential problems if identifiers are not
to permit the construction of a mapping mechanism. However, analysis globally unique. How commonly such problems would actually occur in
performed during the IPng deliberations concluded that close to 48- practice depends on how many duplicates there actually are. Thus,
bits of hierarchy were needed to identify all the possible sites one might be tempted to make the argument that a scheme for assigning
30-40 years from now. That would leave only 2 bytes for host identifiers could be made to be "unique enough" in practice. This
numbering at a site, a number clearly incompatible with stateless would be a dangerous and naive assumption, because intruders will
autoconfiguration [RFC1971]. actively impersonate other sites for the sole purpose of invalidating
the uniqueness assumption. For example, one could deny service to
host foo.bar.com by querying the DNS for its corresponding ESD, and
then impersonating that ESD.
There are several arguments against having a global ESD-lookup As a specific example, one GSE-specific denial-of-service attack
capability. Adding sufficient structure to an 8-byte ESD would be would be for an intruder to masquerade as another host and "wedge"
incompatible with stateless autoconfiguration, which already uses 6 connections in a SYN-RECEIVED state by sending SYN segments
bytes for its token; two additional bytes for hierarchy are clearly containing an invalid RG in the source IP address for a specific ESD.
insufficient. In addition, experience with the IN-ADDR.ARPA domain Subsequent connection attempts to the wedged host from the legitimate
suggests that the required databases will be poorly maintained. owner of the ESD (if they used the same TCP port numbers) would then
Finally, imposing a required hierarchical structure on ESDs would not complete, since return traffic would be sent to the wrong place.
also introduce a new administrative burden and a new or expanded
registry system to manage ESD space. While the procedures for
assigning ESDs, which need only organizational and not topological
significance, would be simpler than the procedures for managing IPv4
addresses (or DNS names), it is hard to imagine such a process being
universally well-received or without controversy; it seems a laudable
goal to avoid the problem altogether if possible.
4.1.10. Reverse Mapping of Complete GSE Addresses 5.3.10. Summary of Identifier Authentication Issues
Although it seems infeasible to have a global scale, reverse mapping In summary, changing the RG dynamically in a safe way for a
of ESDs, within a Site, one could imagine maintaining a database connection requires that an originator of traffic be able to
keyed on unstructured 8-byte ESDs. However, it is a matter of debate authenticate a proposed change in the RG before sending to a
whether such a database can be kept up-to-date at reasonable cost, particular ESD via that RG. This is difficult for several reasons:
without making unreasonable assumptions as to how large sites are
going to grow, and how frequently ESD registrations will be made or
updated. Note that the issue isn't just the physical database itself,
but the operational issues involved in keeping it up-to-date. For the
rest of this section, however, let us assume such a database can be
built.
A mechanism supporting a lookup keyed on a flat-space ESD from an 1) It can't be done on an end-to-end basis in GSE (e.g., via IPSec)
arbitrary Site requires having sufficient structure to identify the because the sender doesn't know what the RG portion of the
Site that needs to be queried. In practice, an ESD will almost always address will be when the packet reaches the receiver.
be used in conjunction with Routing Stuff (i.e., a full 16-byte
address). Since the Routing Stuff is organized hierarchically, it
becomes feasible to maintain a DNS tree that maps full GSE addresses
into DNS names, in a fashion analogous to what is done with IPv4 PTR
records today.
It should be noted that a GSE address lookup will work only if the 2) It can't easily be done in GSE because there is no mechanism at
Routing Stuff portion of the address is correctly entered in the DNS or below the transport layer to map ESDs into a quantity that
tree. Because the RG portion of an address is expected to change over can be used as a key to jump start the authentication process
time, this assumption will not be valid indefinitely. As a (using the DNS would be problematic due to layering circularity
consequence, a packet trace recorded in the past might not contain considerations).
enough information to identify the off-Site sources of the packets in
the present. This problem can be addressed by requiring that the
database of RG delegations be maintained for some period of time
after the RG is no longer usable for routing packets.
Finally, it should be noted that the problem where an address's RG 3) Any scheme that uses the full IPv6 address to do the
"expires" with the implication that the mapping of "expired" authentication can be used with standard provider-based
addresses into DNS names may no longer hold is not a problem specific addressing, raising the question of what benefit is retained
to the GSE proposal. With provider-based addressing, the same issue from having separate identifiers and locators.
arises when a site renumbers into a new provider prefix and releases
the allocation from a previous block. The authors are aware of one
such renumbering in IPv4 where a block of returned addresses was
reassigned and reused within 24 hours of the renumbering.
4.1.11. The ICMP "Who Are You" Message Our final conclusion is that with the GSE approach, transport
protocol end-points must make an early, single choice of the RG to
use when sending to a peer and stick with that choice for the
duration of the connection. Specifically:
Although there is widespread agreement on the utility of being able 1) The demultiplexing of arriving packets to their transport end
to determine the DNS name one is communicating with, there is also points should use only the ESD, and not the Routing Stuff.
widespread concern that repeating the experience of the "IN-
ADDR.ARPA" domain is undesirable. Consequently, an old proposal to
define an ICMP "Who Are You?" message was resurrected [RFC1788]. A
client would send such a message to a peer, and that peer would
return an ICMP message containing its DNS name.
Asking a remote host to supply its own name in no way implies that 2) If the application chooses an RG for the remote peer (i.e., an
the returned information is accurate. However, having a remote peer active open), use the provided RG for all traffic sent to that
provide a piece of information that a client can use as input to a peer, even if alternative RGs are received on subsequent
separate authentication procedure provides a starting point for incoming datagrams from the same ESD.
performing strong authentication. The actual strength of the
authentication depends on the authentication procedure invoked,
rather than the untrustable piece of information provided by a remote
peer.
Reconsidering the "cheap" authentication procedure described in 3) For all other cases, use the first RG received with a given ESD
Section 4.1.9, the ICMP "Who Are You" replaces the DNS PTR query used for all sending. Simultaneously, we understand that with this
to obtain the DNS name of a remote peer. The second DNS query, to map rule, there are still open issues with regard to invalid RGs,
the DNS name back into a set of addresses, would be performed as either through corruption or through a active hostile attacks.
before. Because the latter DNS query provides the strength of the
authentication, the use of an ICMP "Who Are You" message does not in
any way weaken the strength of the authentication method. Indeed, it
can only make it more useful in practice, because virtually all hosts
can be expected to implement the "Who Are You" message.
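
The three RG-selection rules stated above (demultiplex on the ESD only, keep an application-supplied RG, otherwise keep the first RG received) can be sketched as a small table. This is an illustrative sketch under simplifying assumptions: the class PeerTable and its method names are hypothetical, state is keyed by ESD alone rather than by a full transport end-point, and no authentication of the RG is attempted.

```python
class PeerTable:
    """Tracks, per ESD, the single RG used when sending to that peer."""

    def __init__(self):
        self._rg_for_esd = {}

    def active_open(self, esd, rg):
        # Rule 2: an RG chosen by the application is pinned for all
        # traffic sent to this ESD.
        self._rg_for_esd[esd] = rg

    def packet_received(self, esd, source_rg):
        # Rule 3: if no RG is recorded yet, the first one received wins.
        # An RG that differs from the recorded one is ignored for sending.
        self._rg_for_esd.setdefault(esd, source_rg)
        # Rule 1: the demultiplexing key is the ESD alone; the Routing
        # Stuff of the arriving packet plays no part in it.
        return esd

    def rg_for_sending(self, esd):
        # The RG recorded first (or by the application) is used throughout.
        return self._rg_for_esd[esd]
```

Note that the sketch deliberately has no method for replacing a recorded RG. That omission is the point of the recommendation, and it is also why the sketch cannot follow a peer through mobility or renumbering.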
The "Who Are You" message could contain an identifier for matching With the above recommendation, there does not appear to be a
replies to requests, and perhaps a nonce value to provide resistance straightforward way to use ESDs in conjunction with mobility or site
to spoofing. In order to minimize the number of WRU packets on the renumbering (in which existing connections survive the renumbering).
Internet, the WRU messages should be sent by DNS servers who would This presents a quandary. The main benefit of separating identifiers
then cache the answers. This has the pleasant side-effect of reducing and locators is the ability to have communication (e.g., a TCP
the impact on existing applications (i.e., they would continue to connection) continue transparently, even when the Routing Stuff
look up addresses using the same API as before). In many cases there associated with a particular ESD changes. However, switching to a new
is a natural TTL that the target node can provide in its reply: Routing Stuff without properly authenticating it makes it trivial to
either the remaining lifetime of a DHCP lease or the remaining valid hijack connections.
time of a prefix from which the address was derived through stateless
autoconfiguration.
The "Who Are You?" (WRU) message described in Section 4.1.10 is We cannot emphasize enough that the use of an ESD independent of an
robust against renumbering, since it follows the paths of valid associated RG can be very dangerous. That is, communicating with a
routable prefixes. Essentially, it uses the Internet routing system peer implies that one is always talking to the same peer for the
in place of the DNS delegation scheme. It is attractive in the duration of the communication. But as has been described in previous
context of GSE-style renumbering, since no host or DNS server needs sections, such assurance can only come from a properly authenticated
to be updated after a renumbering event for WRU-based lookups to RG.
work. It has advantages outside the context of GSE as well, including
a more decentralized, and hence more scalable, administration and
easier upkeep than a DNS reverse-lookup zone. It also has drawbacks:
it requires the target node to be up and reachable at the time of the
query and to know its fully qualified domain name. It is also not
possible to resolve addresses once those addresses become unroutable.
In contrast, the DNS PTR mirrors, but is independent of, the routing
hierarchy. The DNS can maintain mappings long after the routing
subsystem stops delivering packets to certain addresses.
The requirement that the target node be up and reachable at the time 5.4. Miscellaneous
of the query makes it very uncertain that one would be able to take
addresses from a packet log and translate them to correct domain
names at a later date. This is a design flaw in the logging system,
as it violates the architectural principle, "Avoid any design that
requires addresses to be ... stored on non-volatile storage."
[RFC1958] A better-designed system would look up domain names
promptly from logged addresses. Indeed, one of the authors is pleased
to be able to state that his site has been doing that for some years.
(Speculative note: Proxy servers to answer WRU queries are possible. 5.4.1. Renumbering and Domain Name System (DNS) Issues
If the boundary between the global and site portions of addresses are
fixed and/or the boundary between the routing and the end-node
portions are fixed, then one could define a well-known anycast
address for proxy WRU service per site and/or per subnet. The low-
order portion of this address would presumably be created from the
IANA's IEEE OUI. The WRU client-side interface would have to be
defined to try this address after or before sending a query to the
target address itself. Nodes answering to this anycast address could
reply to WRU queries using a database maintained by private means.
By carrying a /128 route site-wide or in the site's provider, these
servers need not even be located within the subnet or site they
serve. Co-location of the proxy WRU servers with some DNS servers is
a natural choice in some scenarios.)
4.2. Renumbering and Domain Name System (DNS) Issues Because any mapping scheme is complicated by renumbering, and because
recent IPv4 experience has shown a requirement for renumbering at
some frequency, it is worthwhile to explore the general renumbering
issue.
4.2.1. How Frequently Can We Renumber? 5.4.2. How Frequently Can We Renumber?
One premise of the GSE proposal [GSE] is that an ISP can renumber the One premise of the GSE proposal [GSE] is that an ISP can renumber the
Routing Goop portion of a Site's addresses transparently to the Site Routing Goop portion of a site's addresses transparently to the site
(i.e., without coordinating the change with the Site). This would (i.e., without coordinating the change with the site). This would
make it possible for backbone providers to aggressively renumber the make it possible for backbone providers to aggressively renumber the
Routing Goop part of addresses and achieve a high degree of route Routing Goop part of addresses and achieve a high degree of route
aggregation. On closer examination, frequent (e.g., daily) aggregation. On closer examination, frequent (e.g., daily)
renumbering turns out to be difficult in practice because of a renumbering turns out to be difficult in practice because of a
circular dependency between the DNS and routing. Specifically, if a circular dependency between the DNS and routing. Specifically, if a
Site's Routing Stuff changes, nodes communicating with the Site need site's Routing Stuff changes, nodes communicating with the site need
to obtain the new Routing Stuff. In the GSE proposal, one queries the to obtain the new Routing Stuff. In the GSE proposal, one queries the
DNS to obtain this information. However, in order to reach a Site's DNS to obtain this information. However, in order to reach a site's
DNS servers, the pointers controlling the downward delegation of DNS servers, the pointers controlling the downward delegation of
authoritative DNS servers (i.e., DNS "glue records") must use authoritative DNS servers (i.e., DNS "glue records") must use
addresses (with Routing Stuff) that are reachable. That is, in order addresses with Routing Stuff that are reachable. That is, in order to
to find the address for the web server "www.foo.bar.com", DNS queries find the address for the web server "www.foo.bar.com", DNS queries
might need to be sent to a root DNS servers, as well as DNS servers might need to be sent to a root DNS server, as well as DNS servers
for "bar.com" and "foo.bar.com". Each of these servers must be for "bar.com" and "foo.bar.com". Each of these servers must be
reachable from the querying client. Consequently, there must be an reachable from the querying client. Consequently, there must be an
overlap period during which both the old Routing Stuff and the new overlap period during which both the old Routing Stuff and the new
Routing Stuff can be used simultaneously. During the overlap period, Routing Stuff can be used simultaneously. During the overlap period,
DNS glue records would need to be updated to use the new addresses DNS glue records would need to be updated to use the new addresses
(including Routing Stuff). Only after all relevant DNS servers have (including Routing Stuff). Only after all relevant DNS servers have
been updated and older cached RRs containing the old addresses have been updated and older cached RRs containing the old addresses have
timed out can the old address be deleted. timed out can the old address be deleted.
An important observation is that the above issue is not specific to An important observation is that the above issue is not specific to
GSE: the same requirement exists with today's provider-based GSE: the same requirement exists with today's provider-based
addressing architecture. When a site is renumbered (e.g., it switches addressing architecture. When a site is renumbered (e.g., it switches
ISPs and obtains a new set of addresses from its new provider), the ISPs and obtains a new set of addresses from its new provider), the
DNS must be updated in a similar fashion. DNS must be updated in a similar fashion.
4.2.2. Efficient DNS support for Site Renumbering 5.4.3. Efficient DNS support for Site Renumbering
When a site renumbers to satisfy its ISP, only the site's routing When a site renumbers to satisfy its ISP, only the site's routing
prefix needs to change. That is, the prefix reflects where within the prefix needs to change. That is, the prefix reflects where within the
Internet the site resides. Although some sites may also change the Internet the site resides.
numbering of their internal topology when switching providers, this
is not a requirement. Rather, it may be a convenient time to also
perform any desired internal renumbering since in practice that any
address renumbering tends to cause disruptions.
In the current Internet, when a site is renumbered, the addresses of In the current Internet, when a site is renumbered, the addresses of
all the site's internal nodes change. This requires a potentially all the site's internal nodes change. This requires a potentially
large update to the RR database for that site. Although Dynamic DNS large update to the RR database for that site. Although Dynamic DNS
[DDNS] could potentially be used, the cost is likely to be large due [DDNS] could potentially be used, the cost is likely to be large due
to the large number of individual records that would need to be to the large number of individual records that would need to be
updated. In addition, when DHCP and DDNS are used together [DHCP- updated. In addition, when DHCP and DDNS are used together [DHCP-
DDNS], it may be the case that individual hosts "own" their own A or DDNS], it may be the case that individual hosts "own" their own A or
AAAA records, further complicating the question of who is able to AAAA records, further complicating the question of who is able to
update the contents of DNS RRs. update the contents of DNS RRs.
One change that could reduce the cost of updating the DNS when a site One change that could reduce the cost of updating the DNS when a site
is renumbered is to split addresses into two distinct portions: a is renumbered is to split addresses into two distinct portions: a
Routing Goop that reflects where a node attaches to the Internet and Routing Goop that reflects where a node attaches to the Internet and
a "site internal part" that is the site-specific part of an address. a STP-plus-ESD that is the site-specific part of an address. During a
renumbering, the Routing Goop would change, but the "site internal
During a renumbering, only the Routing Goop would change; the "site part" would remain fixed. Furthermore, the two parts of the address
internal part" would remain fixed. Furthermore, the two parts of the could be stored in the DNS as separate RRs. That way, renumbering a
address could be stored in the DNS as separate RRs. That way, site would only require that the Routing Goop RR of a site be
renumbering a site would only require that the Routing Goop RR of a updated; the "site-internal part" of individual addresses would not
site be updated; the "site-internal part" of individual addresses change.
would not change.
To obtain the address of a node from the DNS, a DNS query for the To obtain the address of a node from the DNS, a DNS query for the
name would return two quantities: the "site internal part" and the name would return two quantities: the "site internal part" and the
DNS name of the Routing Stuff for the site. An additional DNS query DNS name of the Routing Stuff for the site. An additional DNS query
would then obtain the specific RR of the site, and the complete would then obtain the specific RR of the site, and the complete
address would be synthesized by concatenating the two pieces of address would be synthesized by concatenating the two pieces of
information. information.
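
The two-step lookup described above can be sketched as follows. The sketch is illustrative only. The 6-byte Routing Goop length, the 10-byte length of the "site internal part", the function names, and the dictionary layout standing in for DNS records are all assumptions made for the example, not values taken from this document.

```python
import ipaddress

RG_LEN = 6              # assumed length of the Routing Goop, in bytes
SITE_INTERNAL_LEN = 10  # assumed length of the "site internal part", in bytes


def synthesize_address(routing_goop, site_internal):
    """Concatenate the two parts into one 16-byte IPv6 address."""
    if len(routing_goop) != RG_LEN or len(site_internal) != SITE_INTERNAL_LEN:
        raise ValueError("unexpected length for address component")
    return ipaddress.IPv6Address(routing_goop + site_internal)


def resolve(name, host_records, rg_records):
    """Look up a name in two steps and return the complete address.

    host_records maps a host name to (site_internal, rg_name).
    rg_records maps the DNS name of a site's Routing Goop to its value.
    """
    site_internal, rg_name = host_records[name]  # first query
    routing_goop = rg_records[rg_name]           # additional query
    return synthesize_address(routing_goop, site_internal)
```

In this arrangement, renumbering a site touches a single entry in rg_records, and every host record that refers to it yields the new complete address on the next lookup.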
Implementing these DNS changes increases the practicality of using Implementing these DNS changes increases the practicality of using
Dynamic DNS to update a site's DNS records as it is renumbered. Only Dynamic DNS to update a site's DNS records as it is renumbered. Only
skipping to change at page 42, line 37 skipping to change at page 40, line 8
further study. further study.
If AAAA records are comprised of multiple distinct RRs, then one If AAAA records are comprised of multiple distinct RRs, then one
question is who should be responsible for synthesizing the AAAA from question is who should be responsible for synthesizing the AAAA from
its components: the resolver running on the querying client's machine its components: the resolver running on the querying client's machine
or the queried name server? To minimize the impact on client hosts or the queried name server? To minimize the impact on client hosts
and make it easier to deploy future changes, it is recommended that and make it easier to deploy future changes, it is recommended that
the synthesis of AAAA records from its constituent parts be done on the synthesis of AAAA records from its constituent parts be done on
name servers rather than in client resolvers. name servers rather than in client resolvers.
4.2.3. Two-Faced DNS 5.4.4. Two-Faced DNS
The GSE proposal attempts to hide the RG part of addresses from nodes The GSE proposal attempts to hide the RG part of addresses from nodes
within a Site. If the nodes do not know their own RG, then they can't within a site. If the nodes do not know their own RG, then they can't
store or use them in ways that cause problems should the Site be store or use them in ways that cause problems should the site be
renumbered and its RG change (i.e., the cached RG become invalid). A renumbered and its RG change (i.e., the cached RG become invalid). A
Site's DNS servers, however, will need to have more information about site's DNS servers, however, will need to have more information about
the RG its Site uses. Moreover, the responses it returns will depend the RG its site uses. Moreover, the responses it returns will depend
on who queries the server. A query from a node within the Site should on who queries the server. A query from a node within the site should
return an address with an RG portion equal to "Site local," whereas a return an address with a Site Local RG, whereas a query for the same
query for the same name from a client located at a different Site name from a client located at a different site should return the
would return the appropriate RG portion. This facilitates intra-site global scope RG. This facilitates intra-site communication to be
communication to be more resilient to failures outside of the site. more resilient to failures outside of the site. Such context-
Such context-dependent DNS servers are commonly referred to as "two- dependent DNS servers are commonly referred to as "two-faced" DNS
faced" DNS servers. servers.
Some issues that must be considered in this context: Some issues that must be considered in this context:
1) A DNS server may recursively attempt to resolve a query on 1) A DNS server may recursively attempt to resolve a query on
behalf of a requesting client. Consequently, a DNS query might behalf of a requesting client. Consequently, a DNS query might
be received from a proxy rather than from the client that be received from a proxy rather than from the client that
actually seeks the information. Because the proxy may not be actually seeks the information. Because the proxy may not be
located at the same Site as the originating client, a DNS server located at the same site as the originating client, a DNS server
cannot reliably determine whether a DNS request is coming from cannot reliably determine whether a DNS request is coming from
the same Site or a remote Site. One solution would be to the same site or a remote site. One solution would be to
disallow recursive queries for off-Site requesters, though this disallow recursive queries for off-site requesters, though this
raises additional questions. raises additional questions.
2) Since cached responses are, in general, context sensitive, a 2) Since cached responses are, in general, context sensitive, a
name server may be unable to correctly answer a query from its name server may be unable to correctly answer a query from its
cache, since the information it has is incomplete. That is, it cache, since the information it has is incomplete. That is, it
may have loaded the information via a query from a local client, may have loaded the information via a query from a local client,
and the information has a Site-local prefix. If a subsequent and the information has a site-local prefix. If a subsequent
request comes in from an off-Site requester, the DNS server request comes in from an off-site requester, the DNS server
cannot return a correct response (i.e., one containing the cannot return a correct response (i.e., one containing the
correct RG). correct RG).
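
The context-dependent answer of a two-faced DNS server can be sketched as follows. This is a hedged illustration: the function select_rg and its parameters are hypothetical, and the sketch assumes that on-site queriers can be recognized by a source address falling within a configured on-site prefix. As issue 1 above notes, that assumption fails when the query arrives through an off-site recursive proxy.

```python
import ipaddress


def select_rg(querier, on_site_prefix, site_local_rg, global_rg):
    """Choose which RG to place in the answer for this querier.

    querier         source address of the DNS query (text form)
    on_site_prefix  prefix assumed to cover all on-site source addresses
    site_local_rg   RG value returned to on-site queriers
    global_rg       RG value returned to everyone else
    """
    source = ipaddress.ip_address(querier)
    on_site = source in ipaddress.ip_network(on_site_prefix)
    # The source address may belong to a recursive proxy rather than the
    # real client, so this test is only as good as that assumption.
    return site_local_rg if on_site else global_rg
```

A cache in front of such a server would need to record which context an answer was produced for; an answer built with the Site Local RG cannot be reused for an off-site requester.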
4.2.4. Bootstrapping Issues 5.4.5. Bootstrapping Issues
If Routing Stuff information is distributed via the DNS, key DNS If Routing Stuff information is distributed via the DNS, key DNS
servers must always be reachable. In particular, the addresses servers must always be reachable. In particular, the addresses
(including Routing Stuff) of all root DNS servers are, for all (including Routing Stuff) of all root DNS servers are, for all
practical purposes, well-known and assumed to never change. It is not practical purposes, well-known and assumed to never change. It is not
uncommon for the addresses of root servers to be hard-coded into uncommon for the addresses of root servers to be hard-coded into
software distributions. Consequently, the Routing Stuff associated software distributions. Consequently, the Routing Stuff associated
with such addresses must always be usable for reaching root servers. with such addresses must always be usable for reaching root servers.
If it becomes necessary or desirable to change the Routing Stuff of If it becomes necessary or desirable to change the Routing Stuff of
an address at which a root DNS server resides, the routing subsystem an address at which a root DNS server resides, the routing subsystem
skipping to change at page 44, line 5 skipping to change at page 41, line 27
addresses. Because the total number of root DNS servers is relatively addresses. Because the total number of root DNS servers is relatively
small, the routing subsystem is expected to be able to handle this small, the routing subsystem is expected to be able to handle this
requirement. requirement.
All other DNS server addresses can be changed, since their addresses All other DNS server addresses can be changed, since their addresses
are typically learned from an upper-level DNS server that has are typically learned from an upper-level DNS server that has
delegated a part of the name space to them. So long as the delegating delegated a part of the name space to them. So long as the delegating
server is configured with the new address, the addresses of other server is configured with the new address, the addresses of other
servers can change. servers can change.
4.2.5. Renumbering and Reverse DNS Lookups 6. Conclusion
It is certain that many sites will, from time to time, undergo a
renumbering event, either through the mechanisms proposed for GSE or
using the facilities already specified for IPv6. It would be useful
to an outside node corresponding with such a site to be able to
distinguish a legitimate renumbering from an attempt to impersonate
the site. We claim that the DNS IP6.INT zone, without security
extensions [RFC2065], is of no use in making this determination and
that even a completely secured IP6.INT zone is of little use compared
with the "forward" DNS zone.
The first half of the claim is almost self-evident. An impersonator
can set up an insecure zone at some point in the IP6.INT hierarchy
and load it with any desired data. This is the reason that current
applications doing minimal access control follow a reverse lookup
with a forward lookup.
With a secured reverse zone, the problem of verifying an apparent
renumbering of a site can still be quite complex in the general case,
and will certainly be outside the scope of a transport protocol, if
survival of long-running sessions is contemplated. Under provider-
based addressing [RFC2073], renumbering is expected to occur due to a
change in network topology (e.g., a change in a provider relationship
at some point in the address aggregation tree). This alters the
global prefixes in use below the point of the change, and
correspondingly alters the chain of delegations of the DNS reverse-
mapping tree. And, although operational experience with secure DNS is
quite limited, it seems likely that there would also be a change in
the chain of certifications of the signing key of the leaf zone
representing the site. It is then problematic to translate
established trust in the old reverse mapping zone into trust in the
new zone. Certainly it's simpler to rely on the forward zone only.
The only function of the reverse zone, then, is to suggest an entry
point to the forward zone's database. It is this function which we
propose to achieve by means of a new ICMP message exchange.
4.3. Address Rewriting Routers
One of the most novel pieces of GSE is the rewriting of addresses as
datagrams enter and leave sites. If only a small number of routers
know the RG portion of the addresses, then the operational impact of
renumbering a Site would be small. In fact, assuming that the
critical security issues are dealt with, one could imagine a dynamic
protocol that a Site uses with its upstream provider to be told what
RG to use, so it might even be possible to renumber a Site
transparently.
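A minimal sketch of the rewriting step at a Site border router, under stated assumptions: the RG is taken to occupy the high-order 48 bits of the 16-byte address (the width is an assumption here, since this section does not give it), and the name rewrite_rg and the example values are hypothetical.

```python
import ipaddress

RG_BITS = 48  # assumed width of the Routing Goop field (high-order bits)


def rewrite_rg(addr, rg):
    """Replace the high-order RG_BITS of addr with rg; keep the rest."""
    if not 0 <= rg < (1 << RG_BITS):
        raise ValueError("rg does not fit in RG_BITS")
    low_bits = 128 - RG_BITS
    low_mask = (1 << low_bits) - 1
    value = int(ipaddress.IPv6Address(addr))
    # Only the RG changes; the site-internal part and the ESD are untouched.
    return str(ipaddress.IPv6Address((rg << low_bits) | (value & low_mask)))
```

Because only the border router holds the RG value, renumbering in this model amounts to changing the rg argument in one place; hosts inside the Site keep emitting the same low-order bits.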
GSE's ability to ensure that the RG portion of a Site's addresses
reflects the actual location of that Site within the Public Internet
means that very aggressive aggregation (i.e., better route scaling)
can be achieved. Both GSE and other route-scaling approaches that use
provider-based addressing depend on aggressive aggregation, but while
other schemes rely largely on operational policies, GSE attempts to
include mechanisms in its core to ensure that aggressive aggregation
happens in practice.
GSE has an advantage over other provider-based addressing schemes
like IPv4's CIDR with respect to the "fair distribution of work."
CIDR addresses the scaling of routing in DFZ portions of the
Internet, but the cost of carrying out the renumbering to maintain
the aggregation falls on the shoulders of subscribers who are far
away from the DFZ; in other words, subscribers must do the work of
renumbering so that their provider (or possibly even their provider's
provider) sees better aggregation. With GSE, the majority of the cost
required to make the routing scale would be incurred by the parties
who reap the benefits.
4.3.1. Load Balancing
While not considered a major advantage, with GSE, multi-homed sites
can more easily achieve symmetry with respect to which of their links
is used for a given flow. With GSE, if HostA in multi-homed Site1
initiates a flow to HostB in Site2, then when the initial packet
leaves Site1 the source address will be rewritten with an RG that
identifies the egress link used. As a result, when HostB needs to
send return traffic, it will use the full 16-byte address from the
arriving packet and this necessarily means that traffic for this flow
coming into Site1 will use the same circuit that outgoing traffic for
that flow took. In contrast, if the source address (i.e., Routing
Stuff) is fixed by the sending host, the same return path is used for
return traffic coming back to a site, regardless of which egress
router packets traverse when leaving that site.
4.3.2. End-To-End Argument: Don't Hide RG from Hosts
Despite these significant advantages, however, the overwhelming
consensus was that address rewriting by routers should not be pursued
as part of the current standardization effort. Although hiding RG
knowledge from hosts has advantages in some scenarios, that lack of
knowledge also makes it difficult to solve important problems.
For example, a host in a multi-homed site is known by multiple
addresses, but without knowing its address the host can play no role
in the source address selection; instead, the host relies on the
routing infrastructure to magically select the right one, i.e., by
selecting the egress router closest to the sender. For many sites,
this is the desired behavior. For others, this is not the desired
behavior. In those cases, the historically difficult-to-solve problem
of source address selection is made more difficult by moving it from
an intra-host decision to a distributed one. Now a site's internal
routers would have to have sufficient knowledge to decide which
egress router to forward traffic to, perhaps on a source-by-source
(or worse) basis.
Another end-to-end problem resulting from address rewriting has to do
with how transport connections should deal with the RG portion of the
address in incoming packets, particularly when authenticating the RG
changes. The sections on transport issues deal with the subject in
much more detail.
Interesting questions arise about address rewriting when dealing with
tunnels. Any node that acts as a tunnel for which the other end
resides in a different Site must be able to behave as a Site border
router and do address rewriting. This means that the RG may need to
be configured in more than just a Site's egress router, thus making
renumbering more problematic.
Another problem related to both performance and "architectural
cleanliness" has to do with IPv6's Routing Headers. It may be
necessary for addresses other than just the simple source and
destination to be rewritten. And again, this rewriting would need to
be done by both egress routers and nodes which terminate tunnels that
go to other sites.
4.4. Multi-Homing
Multi-Homing can mean many things. In the context of GSE, multi-
homing refers to a Site having more than one connection to the
Internet and therefore being known by multiple RGs. In many ways this
is close to multi-homing with IPv6 provider-based addressing. It is
hard to make comparisons to IPv4 because multi-homing has
traditionally been done in an ad hoc fashion.
With GSE, the ability of a Site to control the load-sharing over its
multiple links is not clear, partially because there is little
operational experience with multi-homed sites known by multiple
prefixes (with IPv4 the site is generally only known by a single
prefix). The following analysis is relevant to any scheme where an
Internet-connected site is known by multiple prefixes. For flows that
the multi-homed site initiates, load-sharing is impacted by the
source address used because that is the address that the remote site
will use for return traffic. If we assume the model of routers
rewriting source addresses, then the outgoing link selected
determines the load-sharing because that also determines what RG is
contained in the source address. If the routers do not rewrite source
addresses, then the end-host itself will have to make the source
address selection, and the optimal choice may require knowledge of
the topology. For flows initiated by someone outside of the multi-
homed site, the load-sharing is dependent on the destination address
specified, so the DNS has a large impact on load-sharing. There is
some amount of operational experience in using DNS to control load on
servers (e.g., having a Web server resolve to multiple addresses),
though that is load-sharing of a different resource and at a
different scope and scale. It is also worth noting that the selection
of the optimal outgoing link may well depend on the destination,
which has particularly interesting implications for how much topology
the DNS must understand (and raises the question of whether the DNS
servers or the resolvers are responsible for knowing the topology).
One advantage that GSE has for multi-homed sites is symmetry. Because
the source address is selected based on the outgoing link, and that
source address is what determines the return path, flows initiated by
the Site will be symmetric with respect to which of the Site's links
is used.
The multi-homing mechanism described in Section 3.7 has some
weaknesses and complexities. First, the mechanism only supports
healing a failed link and not a router; in other words, referencing
Figure 7, from Section 3.7, if PBR1 were not up at all, then it could
not tunnel the packets anywhere. One could imagine ways of
distributing PBR1's knowledge of PBR2 to other routers within
Provider1 to add more reliability, though this makes the problem
distributed rather than point-to-point and therefore more difficult.
Second, in the general case, static identification of PBR2 to PBR1,
and vice-versa, is not adequate. Imagine, for example, that the link
to PBR1 is much faster than the link to PBR2. In this case, it's
possible that packets whose destination addresses contain RG1 might
normally transit PBR2 without going directly to the Site. So there
seems to be a need for a dynamic protocol between PBR1 and PBR2 to
notify when PBR2, for example, should forward packets with RG1-prefaced
destinations directly to the Site as opposed to forwarding them towards
PBR1.
Another note about multi-homing is the potential impact of internal
topology changes in the face of address rewriting. Using the
previously referenced diagram, if a flow from a host within the Site
is leaving via SBR1, but then something happens such that SBR2
becomes the host's closest exit point, then the remote end-point of
the flow will begin seeing a different RG. Reasons such as this are why
the repercussions on the transport layer are so important (e.g.,
whether or not transport peers pay attention to the RG).
5. Results
This section summarizes the results of the GSE deliberations on the
IPv6 process.
1) Make changes to the IPv6 provider-based addressing document to
facilitate aggressive aggregation that is also operationally
realistic.
2) Create hard boundaries in IPv6 addresses to clearly distinguish
between the portions used to identify hosts, for routing within
a site, and for routing within the Public Internet.
3) Allow an option for the low-order 8 bytes of IPv6 addresses to
be designated as a globally unique End System Designator (ESD).
This change has potential benefits to future transport protocols
(e.g., TCPng).
4) Make a clear distinction between the "locator" part of an
address and the "identifier" part of the address. The former is
used to route a packet to its end-point, the latter is used to
identify an end-point, independent of the path used to deliver
the packet. Although this is a potentially revolutionary change
to the IPv6 addressing model, existing transport protocols such as
TCP and UDP will not take advantage of the split. Future
transport protocols (e.g., TCPng), however, may.
5) Make changes to the way AAAA records are stored within the DNS, The GSE proposal provides a concrete example of a network protocol
so that renumbering a site (e.g., when a site changes ISPs) design that separates identifiers from locators in addresses. In
requires few changes to the DNS database in order to effectively this paper we compared GSE with IPv4 to better understand the pros
change all of a site's address AAAA RRs. and cons of the respective design approaches.
6) Don't hide a node's full address from that node. In a scheme Functionally speaking, identifiers and locators each have a logically
where all nodes know their full address, address rewriting different role to play. Thus overloading both in one field causes
should not be necessary. problems whenever the location of a node changes but its identity
does not. However, our analysis shows that overloading also presents
two critically important benefits.
7) Consider multi-homing and its effect on aggregation and route First, for network entity A to send data to network entity B, A must
scaling from the beginning. Have a goal of architecting a way to not only know B's end identifier but also B's locator. No scalable
do multi-homing that is both scalable and operationally way is known at this time to provide this mapping at the network
practical, and consider related issues such as load-sharing. layer, other than overloading the two quantities into an address as
is done in IPv4. Fundamentally, a scalable mapping algorithm strongly
suggests that the identifier space be structured hierarchically, yet
identifiers in GSE are not sufficiently large to both contain
sufficient hierarchy and support stateless address autoconfiguration.
Instead, GSE forces applications to supply up-to-date locators.
However, relying on the locator provided at the time communication is
established as GSE does is inadequate when the remote locator can
change dynamically, precisely the scenario that is supposed to
benefit from the separation. That is, the benefits of separating the
identifier from the locator are largely lost, if the changes in the
identifier to locator binding are not tracked quickly.
8) Consider the issue of subnetting. For example, how are point- Secondly, when communicating with a remote site, a receiver must be
to-point links numbered? With IPv4, current practice is to able to ensure (with reasonable certainty) that received data does
number point-to-point links out of "/30" subnets. However, do indeed come from the expected remote entity. In IPv4, it is possible
network masks longer than 64 bits make sense with the concept of to receive packets from a forged source, but the potential for
the low-order 8 bytes being a globally unique ESD? If not, then mischief between communicating peers is significantly limited because
is it acceptable to either leave point-to-point links un- return traffic will not reach the source of the forged traffic. That
numbered or to use an entire subnet for each point-to-point is, communication involving packets sent in both directions will not
link? Will there need to be an exception for IPv6 host routes succeed. In contrast, architectures like GSE that decouple the
(i.e., /128s) as a work-around for the bootstrapping issue of identifier and locator functions have great difficulty assuring that
addressing root DNS servers? If /128s are allowed, but not masks traffic from a source identified only by an identifier actually comes
between /65 and /127, inclusive, then a possible way to number from the correct source. Short of using cryptographic techniques in
point-to-point links within a backbone is to dedicate a single which both end points share a private secret (e.g., using IPSec),
subnet to them and route them as /128s. there is no known mechanism that can use an identifier alone to
perform this remote entity authentication in a scalable way. That
is, using an identifier alone for authentication of received packets
is dangerously unsafe.
9) Search for ways to minimize the impact that renumbering has on In summary, although overloading the address field with a combined
intra-site communication. Renumbering operations that change identifier and locator leads to difficulties in retaining the
only the RG portion of addresses should not impact existing identity of a node whenever its address changes, analysis in this
intra-site communication. One possible approach is to encourage paper suggests that the benefit of the overloading actually out-
the use of site-local addresses for all intra-site weighs its cost. Completely separating an identifier from its
communication. locator renders the identifier untrustworthy, thus useless, in the
absence of an accompanying authentication system.
6. Security Considerations 7. Security Considerations
The primary security consideration with GSE or, more generally, a The primary security consideration with GSE or, more generally, a
network layer with addresses split into locator and identifier parts, network layer with addresses split into locator and identifier parts,
is that of one node impersonating another by copying the is that of one node impersonating another by copying the
identification without the location. identification without the location.
7. Acknowledgments 8. Acknowledgments
Thanks go to Steve Deering and Bob Hinden (the Chairs of the IPng Thanks go to Steve Deering and Bob Hinden (the Chairs of the IPng
Working Group) as well as Sun Microsystems (the host for the PAL1 Working Group) as well as Sun Microsystems (the host for the PAL1
meeting) for the planning and execution of the interim meeting. meeting) for the planning and execution of the interim meeting.
Thanks also goes to Mike O'Dell for writing the 8+8 and GSE drafts. Thanks also go to Mike O'Dell for writing the 8+8 and GSE drafts; by
By publishing these documents and speaking on their behalf, Mike was publishing these documents and speaking on their behalf, Mike was the
the catalyst for some very valuable discussions that are expected to catalyst for some valuable discussions, both for IPv6 addressing and
result in improved IPv6 addressing. Special thanks to the attendees for addressing architectures in general. Special thanks to the
of the meeting who carried on the high caliber discussions which were attendees of the PAL1 meeting whose high caliber discussions helped
the source for this document. motivate and shape this document.
8. References 9. References
[BATES] Scalable support for multi-homed multi-provider [BATES] Scalable support for multi-homed multi-provider
connectivity, Internet Draft, Tony Bates & Yakov Rekhter, connectivity, Internet Draft, Tony Bates & Yakov Rekhter,
draft-bates-multihoming-01.txt. draft-bates-multihoming-01.txt.
[Bellovin 89] "Security Problems in the TCP/IP Protocol Suite", [Bellovin 89] "Security Problems in the TCP/IP Protocol Suite",
Bellovin, Steve, Computer Communications Review, Vol. 19, Bellovin, Steve, Computer Communications Review, Vol. 19,
No. 2, pp32-48, April 1989. No. 2, pp32-48, April 1989.
[CERT] CERT(sm) Advisory CA-96.21 [CERT] CERT(sm) Advisory CA-96.21
(ftp://info.cert.org/pub/cert_advisories) (ftp://info.cert.org/pub/cert_advisories)
[DANVERS] Minutes of the IPNG working Group, April 1995. [DANVERS] Minutes of the IPNG working Group, April 1995.
ftp://ftp.ietf.cnri.reston.va.us/ietf-online-proceedings/ ftp://ftp.ietf.cnri.reston.va.us/ietf-online-proceedings/
95apr/area.and.wg.reports/ipng/ipngwg/ ipngwg-minutes- 95apr/area.and.wg.reports/ipng/ipngwg/ ipngwg-minutes-
95apr.txt. 95apr.txt.
[DHCP-DDNS] Interaction between DHCP and DNS, Internet Draft, Yakov [DHCP-DDNS] Interaction between DHCP and DNS, Internet Draft, Yakov
Rekhtor, draft-ietf-dhc-dhcp-dns-04.txt. Rekhter, draft-ietf-dhc-dhcp-dns-04.txt.
[DDNS] "Dynamic Updates in the Domain Name System (DNS UPDATE)", [DDNS] "Dynamic Updates in the Domain Name System (DNS UPDATE)",
Paul Vixie (Editor), draft-ietf-dnsind-dynDNS-11.txt, Paul Vixie (Editor), draft-ietf-dnsind-dynDNS-11.txt,
November, 1996. November, 1996.
[EUI64] 64-Bit Global Identifier Format Tutorial. [EUI64] 64-Bit Global Identifier Format Tutorial.
http://standards.ieee.org/db/oui/tutorials/EUI64.html. http://standards.ieee.org/db/oui/tutorials/EUI64.html.
Note: "EUI-64" is claimed as a trademark by an organization Note: "EUI-64" is claimed as a trademark by an organization
which also forbids reference to itself in association with which also forbids reference to itself in association with
that term in a standards document which is not their own, that term in a standards document which is not their own,
skipping to change at page 51, line 13 skipping to change at page 44, line 24
[RFC1752] "The Recommendation for the IP Next Generation Protocol," [RFC1752] "The Recommendation for the IP Next Generation Protocol,"
S. Bradner, A. Mankin, 01/18/1995. S. Bradner, A. Mankin, 01/18/1995.
[RFC1788] "ICMP Domain Name Messages", W. Simpson, 04/14/1995 [RFC1788] "ICMP Domain Name Messages", W. Simpson, 04/14/1995
[RFC1958] Architectural Principles of the Internet. B. Carpenter. [RFC1958] Architectural Principles of the Internet. B. Carpenter.
[RFC1971] IPv6 Stateless Address Autoconfiguration. S. Thomson, T. [RFC1971] IPv6 Stateless Address Autoconfiguration. S. Thomson, T.
Narten. Narten.
[RFC2002] "IP Mobility Support", 10/22/1996, C. Perkins. [RFC2002] "IP Mobility Support", C. Perkins, RFC 2002, October,
1996.
[RFC2008] "Implications of Various Address Allocation Policies for [RFC2008] "Implications of Various Address Allocation Policies for
Internet Routing", Y. Rekhter, T. Li. Internet Routing", Y. Rekhter, T. Li.
[RFC2065] Domain Name System Security Extensions. D. Eastlake, C. [RFC2065] Domain Name System Security Extensions. D. Eastlake, C.
Kaufman. Kaufman.
[RFC2073] An IPv6 Provider-Based Unicast Address Format. Y. [RFC2073] An IPv6 Provider-Based Unicast Address Format. Y.
Rekhter, P. Lothberg, R. Hinden, S. Deering, J. Postel Rekhter, P. Lothberg, R. Hinden, S. Deering, J. Postel
9. Authors' Addresses [RFC2267] Network Ingress Filtering: Defeating Denial of Service
Attacks which employ IP Source Address Spoofing, P.
Ferguson, D. Senie, January 1998.
10. Authors' Addresses
Matt Crawford John Stewart Matt Crawford John Stewart
Fermilab MS 368 USC/ISI Fermilab MS 368 Juniper Networks, Inc.
PO Box 500 4350 North Fairfax Drive PO Box 500 385 Ravendale Drive
Batavia, IL 60510 USA Suite 620 Batavia, IL 60510 USA Mountain View, CA 94043
Phone: 708-840-3461 Arlington, VA 22203 USA Phone: 630-840-3461 Phone: +1 650 526 8000
EMail: crawdad@fnal.gov Phone: 703-807-0132 EMail: crawdad@fnal.gov EMail: jstewart@juniper.net
EMail: jstewart@isi.edu
Allison Mankin Lixia Zhang Allison Mankin Lixia Zhang
USC/ISI UCLA Computer Science Department USC/ISI UCLA Computer Science Department
4350 North Fairfax Drive 4531G Boelter Hall 4350 North Fairfax Drive 4531G Boelter Hall
Suite 620 Los Angeles, CA 90095-1596 USA Suite 620 Los Angeles, CA 90095-1596 USA
Arlington, VA 22203 USA Phone: 310-825-2695 Arlington, VA 22203 USA Phone: 310-825-2695
EMail: mankin@isi.edu EMail: lixia@cs.ucla.edu EMail: mankin@isi.edu EMail: lixia@cs.ucla.edu
Phone: 703-807-0132 Phone: 703-807-0132
Thomas Narten Thomas Narten
IBM Corporation IBM Corporation
3039 Cornwallis Ave. 3039 Cornwallis Ave.
PO Box 12195 - F11/502 PO Box 12195 - F11/502
Research Triangle Park, NC 27709-2195 Research Triangle Park, NC 27709-2195
Phone: 919-254-7798 Phone: 919-254-7798
EMail: narten@raleigh.ibm.com EMail: narten@raleigh.ibm.com
Appendix B -- Ideas Incorporated Into IPv6
This section summarizes changes made to IPv6 specifications which
originated in the GSE proposal or in the discussions arising from it.
First and most visible was the change to the unicast address format.
Instead of a topologically insignificant Registry ID immediately
following the Format Prefix, there is now a Top-Level Aggregation
Identifier. This field will identify a large routable aggregate to
which an address belongs rather than an administrative unit by which
an address was assigned. The TLA corresponds to the "Large
Structure" of GSE. The IPv6 Next-Level Aggregation Identifier (NLA)
is roughly the rest of the GSE "Routing Goop" and the Site-Level
Aggregation Identifier (SLA) is a slightly expanded GSE Site Topology
Partition.
The decision to put fixed boundaries between parts of the unicast
address (TLA, NLA, SLA, Interface Identifier) also came from GSE.
The previous "provider-based" addressing architecture for IPv6 had
fluid boundaries between Registry ID, Provider ID, Subscriber ID and
the Intra-Subscriber part, as well as undefined divisions within the
Provider-ID and Intra-Subscriber part. (On subnetworks with a MAC-
layer address, the latter boundary was generally placed to
accommodate use of that address as an Interface ID.) The new
addressing architecture still expects divisions within the NLA
portion of the address, placed to reflect topological aggregation
points.
Defining a fixed boundary between the routable portion of the address
and the node-on-link identifier required the specification of an
Interface Identifier which would be as suitable as possible for all
subnetwork technologies. The IEEE "EUI-64" identifier was selected,
having the advantages of an easy mapping from 48-bit MAC addresses
and a defined escape flag into locally-administered values.
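The mapping from a 48-bit MAC address to a 64-bit Interface Identifier can be sketched as below. The sketch assumes the convention used in the IPv6 addressing architecture: the two octets FF FE are inserted between the two halves of the MAC address, and the universal/local bit of the first octet is inverted. Those details, and the name mac_to_interface_id, are assumptions not stated in this document.

```python
def mac_to_interface_id(mac):
    """Build a 64-bit interface identifier from a 48-bit MAC address."""
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Insert FF FE between the company ID and the extension ID.
    eui64 = octets[:3] + b"\xff\xfe" + octets[3:]
    # Invert the universal/local bit (0x02 in the first octet).
    return bytes([eui64[0] ^ 0x02]) + eui64[1:]
```

For example, under these assumptions the MAC address 00:a0:c9:12:34:56 maps to the identifier 02a0:c9ff:fe12:3456.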
The second change to come out of the GSE discussions relates to
reducing the number of DNS record changes required in the event of
site renumbering. This work is not finalized as of this writing, but
the result may be that individual IPv6 addresses are stored (and
signed, in the case of Secure DNS) as a partial address and an
indirect pointer which leads to the high-order part of the address.
There may be multiple levels of indirection and a changed record at
any one level would suffice to update the DNS's record of the IPv6
addresses of every node in a given branch of the addressing
hierarchy.
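Since this work was not finalized at the time of writing, the following is only a toy model of the idea, not a proposed record format. Each hypothetical record holds some low-order address bits plus an optional pointer to the record holding the bits above them; the record names, field widths, and prefix values are all invented for illustration.

```python
import ipaddress


def resolve(name, records):
    """Return (value, nbits): address bits assembled by following pointers."""
    value, nbits, pointer = records[name]
    if pointer is None:
        return value, nbits
    high, high_bits = resolve(pointer, records)
    return (high << nbits) | value, high_bits + nbits


def lookup(name, records):
    """Assemble a full 128-bit address for name, or raise ValueError."""
    value, nbits = resolve(name, records)
    if nbits != 128:
        raise ValueError("chain does not add up to 128 bits")
    return str(ipaddress.IPv6Address(value))


# name -> (low-order bits, number of bits, record holding higher bits)
records = {
    "host.site.example": (0x02A0C9FFFE123456, 64, "site-prefix"),
    "site-prefix": (0x0001, 16, "provider-prefix"),
    "provider-prefix": (0x20010DB80000, 48, None),
}
```

In this model, replacing the single "provider-prefix" record changes the assembled address of every host whose chain passes through it, which is the property the paragraph above describes. The sketch does not guard against pointer loops, which a real design would need to do.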
A change in the method of doing DNS address-to-name lookups is also
in the works. This may be a change in the form and/or operation of
the ip6.int domain or some new mechanism which involves participation
by the routers or the end-nodes themselves.
Two other changes arising from GSE will not affect the IPv6 base
specifications themselves, but do direct additional work. Those are
the injection of global prefix information into a site from a
provider or exchange, and some inter-provider cooperative method of
providing multihoming to mutual customers with minimal impact on
routing tables in distant parts of the network.