draft-ietf-lsvr-bgp-spf-03.txt   draft-ietf-lsvr-bgp-spf-04.txt 
Network Working Group K. Patel Network Working Group K. Patel
Internet-Draft Arrcus, Inc. Internet-Draft Arrcus, Inc.
Intended status: Standards Track A. Lindem Intended status: Standards Track A. Lindem
Expires: March 31, 2019 Cisco Systems Expires: June 23, 2019 Cisco Systems
S. Zandi S. Zandi
Linkedin Linkedin
W. Henderickx W. Henderickx
Nokia Nokia
September 27, 2018 December 20, 2018
Shortest Path Routing Extensions for BGP Protocol Shortest Path Routing Extensions for BGP Protocol
draft-ietf-lsvr-bgp-spf-03.txt draft-ietf-lsvr-bgp-spf-04.txt
Abstract Abstract
Many Massively Scaled Data Centers (MSDCs) have converged on Many Massively Scaled Data Centers (MSDCs) have converged on
simplified layer 3 routing. Furthermore, requirements for simplified layer 3 routing. Furthermore, requirements for
operational simplicity have led many of these MSDCs to converge on operational simplicity have led many of these MSDCs to converge on
BGP as their single routing protocol for both their fabric routing BGP as their single routing protocol for both their fabric routing
and their Data Center Interconnect (DCI) routing. This document and their Data Center Interconnect (DCI) routing. This document
describes a solution which leverages BGP Link-State distribution and describes a solution which leverages BGP Link-State distribution and
the Shortest Path First (SPF) algorithm similar to Internal Gateway the Shortest Path First (SPF) algorithm similar to Internal Gateway
skipping to change at page 1, line 42 skipping to change at page 1, line 42
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 31, 2019. This Internet-Draft will expire on June 23, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 41 skipping to change at page 2, line 41
1.2. Requirements Language . . . . . . . . . . . . . . . . . . 5 1.2. Requirements Language . . . . . . . . . . . . . . . . . . 5
2. BGP Peering Models . . . . . . . . . . . . . . . . . . . . . 5 2. BGP Peering Models . . . . . . . . . . . . . . . . . . . . . 5
2.1. BGP Single-Hop Peering on Network Node Connections . . . 5 2.1. BGP Single-Hop Peering on Network Node Connections . . . 5
2.2. BGP Peering Between Directly Connected Network Nodes . . 6 2.2. BGP Peering Between Directly Connected Network Nodes . . 6
2.3. BGP Peering in Route-Reflector or Controller Topology . . 6 2.3. BGP Peering in Route-Reflector or Controller Topology . . 6
3. BGP-LS Shortest Path Routing (SPF) SAFI . . . . . . . . . . . 6 3. BGP-LS Shortest Path Routing (SPF) SAFI . . . . . . . . . . . 6
4. Extensions to BGP-LS . . . . . . . . . . . . . . . . . . . . 7 4. Extensions to BGP-LS . . . . . . . . . . . . . . . . . . . . 7
4.1. Node NLRI Usage and Modifications . . . . . . . . . . . . 7 4.1. Node NLRI Usage and Modifications . . . . . . . . . . . . 7
4.2. Link NLRI Usage . . . . . . . . . . . . . . . . . . . . . 8 4.2. Link NLRI Usage . . . . . . . . . . . . . . . . . . . . . 8
4.2.1. BGP-LS Link NLRI Attribute Prefix-Length TLVs . . . . 9 4.2.1. BGP-LS Link NLRI Attribute Prefix-Length TLVs . . . . 9
4.3. Prefix NLRI Usage . . . . . . . . . . . . . . . . . . . . 9 4.2.2. BGP-LS Link NLRI Attribute BGP SPF Status TLV . . . . 9
4.4. BGP-LS Attribute Sequence-Number TLV . . . . . . . . . . 9 4.2.3. BGP-LS Prefix NLRI Attribute SPF Status TLV . . . . . 10
5. Decision Process with SPF Algorithm . . . . . . . . . . . . . 10 4.3. Prefix NLRI Usage . . . . . . . . . . . . . . . . . . . . 10
5.1. Phase-1 BGP NLRI Selection . . . . . . . . . . . . . . . 11 4.4. BGP-LS Attribute Sequence-Number TLV . . . . . . . . . . 10
5.2. Dual Stack Support . . . . . . . . . . . . . . . . . . . 12 5. Decision Process with SPF Algorithm . . . . . . . . . . . . . 11
5.3. SPF Calculation based on BGP-LS NLRI . . . . . . . . . . 12 5.1. Phase-1 BGP NLRI Selection . . . . . . . . . . . . . . . 12
5.4. NEXT_HOP Manipulation . . . . . . . . . . . . . . . . . . 15 5.2. Dual Stack Support . . . . . . . . . . . . . . . . . . . 13
5.5. IPv4/IPv6 Unicast Address Family Interaction . . . . . . 15 5.3. SPF Calculation based on BGP-LS NLRI . . . . . . . . . . 13
5.6. NLRI Advertisement and Convergence . . . . . . . . . . . 15 5.4. NEXT_HOP Manipulation . . . . . . . . . . . . . . . . . . 16
5.7. Error Handling . . . . . . . . . . . . . . . . . . . . . 16 5.5. IPv4/IPv6 Unicast Address Family Interaction . . . . . . 16
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 5.6. NLRI Advertisement and Convergence . . . . . . . . . . . 17
7. Security Considerations . . . . . . . . . . . . . . . . . . . 16 5.6.1. Link/Prefix Failure Convergence . . . . . . . . . . . 17
8. Management Considerations . . . . . . . . . . . . . . . . . . 16 5.6.2. Node Failure Convergence . . . . . . . . . . . . . . 17
8.1. Configuration . . . . . . . . . . . . . . . . . . . . . . 16 5.7. Error Handling . . . . . . . . . . . . . . . . . . . . . 18
8.2. Operational Data . . . . . . . . . . . . . . . . . . . . 16 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18
9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 17 7. Security Considerations . . . . . . . . . . . . . . . . . . . 18
10. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 17 8. Management Considerations . . . . . . . . . . . . . . . . . . 18
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 17 8.1. Configuration . . . . . . . . . . . . . . . . . . . . . . 18
11.1. Normative References . . . . . . . . . . . . . . . . . . 17 8.2. Operational Data . . . . . . . . . . . . . . . . . . . . 18
11.2. Information References . . . . . . . . . . . . . . . . . 18 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 19
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 20 10. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 19
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 19
11.1. Normative References . . . . . . . . . . . . . . . . . . 19
11.2. Information References . . . . . . . . . . . . . . . . . 20
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 22
1. Introduction 1. Introduction
Many Massively Scaled Data Centers (MSDCs) have converged on Many Massively Scaled Data Centers (MSDCs) have converged on
simplified layer 3 routing. Furthermore, requirements for simplified layer 3 routing. Furthermore, requirements for
operational simplicity have led many of these MSDCs to converge on operational simplicity have led many of these MSDCs to converge on
BGP [RFC4271] as their single routing protocol for both their fabric BGP [RFC4271] as their single routing protocol for both their fabric
routing and their Data Center Interconnect (DCI) routing. routing and their Data Center Interconnect (DCI) routing.
Requirements and procedures for using BGP are described in [RFC7938]. Requirements and procedures for using BGP are described in [RFC7938].
This document describes an alternative solution which leverages BGP- This document describes an alternative solution which leverages BGP-
skipping to change at page 9, line 27 skipping to change at page 9, line 27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| TBD IPv4 or IPv6 Type | Length | | TBD IPv4 or IPv6 Type | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Prefix-Length | | Prefix-Length |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
Prefix-length - A one-octet length restricted to 1-32 for IPv4 Prefix-length - A one-octet length restricted to 1-32 for IPv4
Link NLRI endpoint prefixes and 1-128 for IPv6 Link NLRI endpoint prefixes and 1-128 for IPv6
Link NLRI endpoint prefixes. Link NLRI endpoint prefixes.
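The Prefix-Length TLV layout above can be sketched in Python as follows. This is only an illustration: the draft lists both type code points as TBD, so the constants IPV4_PREFIX_LENGTH_TLV and IPV6_PREFIX_LENGTH_TLV are hypothetical placeholders, and the function name is an assumption made for this example.

```python
import struct

# Hypothetical placeholder code points: the draft lists both types as TBD.
IPV4_PREFIX_LENGTH_TLV = 0xFF01
IPV6_PREFIX_LENGTH_TLV = 0xFF02


def encode_prefix_length_tlv(prefix_length: int, ipv6: bool = False) -> bytes:
    """Encode a one-octet Prefix-Length value as a Type/Length/Value triple."""
    max_len = 128 if ipv6 else 32
    # The draft restricts the value to 1-32 (IPv4) or 1-128 (IPv6).
    if not 1 <= prefix_length <= max_len:
        raise ValueError(f"prefix length must be in 1-{max_len}")
    tlv_type = IPV6_PREFIX_LENGTH_TLV if ipv6 else IPV4_PREFIX_LENGTH_TLV
    # 16-bit Type, 16-bit Length (always 1 here), 8-bit value, network order.
    return struct.pack("!HHB", tlv_type, 1, prefix_length)
```

For instance, encode_prefix_length_tlv(24) yields five octets: the placeholder type, a length of 1, and the value 24.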
4.2.2. BGP-LS Link NLRI Attribute BGP SPF Status TLV
A BGP-LS Attribute TLV to BGP-LS Link NLRI is defined to indicate the
status of the link with respect to the BGP SPF calculation. This
will be used to expedite convergence for link failures as discussed
in Section 5.6.1. If the BGP SPF Status TLV is not included with the
Link NLRI, the link is considered up and available.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           TBD Type            |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| BGP SPF Status|
+-+-+-+-+-+-+-+-+
BGP Status Values: 0 - Reserved
                   1 - Link Unreachable with respect to BGP SPF
                   2-254 - Undefined
                   255 - Reserved
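A receiver's handling of the BGP SPF Status TLV might look like the following Python sketch. It assumes the attribute is a plain sequence of TLVs with 16-bit Type and Length fields, as in the layout above. The type value BGP_SPF_STATUS_TLV is a hypothetical placeholder (the draft says TBD), and treating only value 1 as "unreachable" is an assumption of this example, since the excerpt does not say how reserved or undefined values are handled.

```python
import struct
from typing import Optional

BGP_SPF_STATUS_TLV = 0xFF03  # hypothetical placeholder; the draft says TBD
LINK_UNREACHABLE = 1         # status value 1 from the table above


def parse_spf_status(attr: bytes) -> Optional[int]:
    """Walk the TLVs in an attribute; return the SPF status or None if absent."""
    offset = 0
    while offset + 4 <= len(attr):
        tlv_type, length = struct.unpack_from("!HH", attr, offset)
        value = attr[offset + 4:offset + 4 + length]
        if len(value) < length:
            raise ValueError("truncated TLV")
        if tlv_type == BGP_SPF_STATUS_TLV:
            if length != 1:
                raise ValueError("BGP SPF Status TLV must carry one octet")
            return value[0]
        offset += 4 + length
    return None


def link_usable(attr: bytes) -> bool:
    """A link with no status TLV is considered up and available."""
    return parse_spf_status(attr) != LINK_UNREACHABLE
```

The absent-TLV case returns None, which keeps the default of "up and available" separate from any explicitly signalled value.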
4.2.3. BGP-LS Prefix NLRI Attribute SPF Status TLV
A BGP-LS Attribute TLV to BGP-LS Prefix NLRI is defined to indicate
the status of the prefix with respect to the BGP SPF calculation.
This will be used to expedite convergence for prefix unreachability
as discussed in Section 5.6.1. If the SPF Status TLV is not included
with the Prefix NLRI, the prefix is considered reachable.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           TBD Type            |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| BGP SPF Status|
+-+-+-+-+-+-+-+-+
BGP Status Values: 0 - Reserved
                   1 - Prefix down with respect to SPF
                   2-254 - Undefined
                   255 - Reserved
4.3. Prefix NLRI Usage 4.3. Prefix NLRI Usage
Prefix NLRI is advertised with a local node descriptor as described Prefix NLRI is advertised with a local node descriptor as described
above and the prefix and length used as the descriptors (TLV 265) as above and the prefix and length used as the descriptors (TLV 265) as
described in [RFC7752]. The prefix metric attribute TLV (TLV 1155) described in [RFC7752]. The prefix metric attribute TLV (TLV 1155)
as well as any others required for non-SPF purposes SHOULD be as well as any others required for non-SPF purposes SHOULD be
advertised. For loopback prefixes, the metric should be 0. For non- advertised. For loopback prefixes, the metric should be 0. For non-
loopback prefixes, the setting of the metric is a local matter and loopback prefixes, the setting of the metric is a local matter and
beyond the scope of this document. beyond the scope of this document.
skipping to change at page 13, line 44 skipping to change at page 14, line 44
list for processing. The Node corresponding to this NLRI will be list for processing. The Node corresponding to this NLRI will be
referred to as the Current Node. If the candidate list is empty, referred to as the Current Node. If the candidate list is empty,
the SPF calculation has completed and the algorithm proceeds to the SPF calculation has completed and the algorithm proceeds to
step 6. step 6.
4. All the Prefix NLRI with the same Node Identifiers as the Current 4. All the Prefix NLRI with the same Node Identifiers as the Current
Node will be considered for installation. The cost for each Node will be considered for installation. The cost for each
prefix is the metric advertised in the Prefix NLRI added to the prefix is the metric advertised in the Prefix NLRI added to the
cost to reach the Current Node. cost to reach the Current Node.
* If the prefix is not in the local RIB, the prefix is installed * If the BGP-LS Prefix attribute includes a BGP-SPF Status TLV
and will inherit the Current Node's next hops. indicating the prefix is unreachable, the BGP-LS Prefix NLRI
is considered unreachable and the next BGP-LS Prefix NLRI is
examined.
* If the prefix is in the local RIB and the cost is greater than * If the prefix is in the local RIB and the cost is greater than
the Current route's metric, the Prefix NLRI does not the Current route's metric, the Prefix NLRI does not
contribute to the route and is ignored. contribute to the route and is ignored.
* If the prefix is in the local RIB and the cost is less than * If the prefix is in the local RIB and the cost is less than
the current route's metric, the Prefix is installed with the the current route's metric, the Prefix is installed with the
Current Node's next-hops replacing the local RIB route's next- Current Node's next-hops replacing the local RIB route's next-
hops and the metric being updated. hops and the metric being updated.
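The prefix rules visible in this step can be sketched in Python as below. The Route class, the consider_prefix function, and the dictionary used as a RIB are all assumptions made for illustration. The sketch covers only the cases shown in this excerpt: an unreachable status, a prefix not yet in the RIB, a higher cost, and a lower cost. The equal-cost case is not shown here, so it is deliberately left as a no-op.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Route:
    metric: int
    next_hops: Set[str] = field(default_factory=set)


def consider_prefix(rib: Dict[str, Route], prefix: str, prefix_metric: int,
                    node_cost: int, node_next_hops: Iterable[str],
                    unreachable: bool = False) -> bool:
    """Apply the visible step 4 rules; return True if the RIB changed."""
    if unreachable:
        # A status TLV marking the prefix unreachable: skip this Prefix NLRI.
        return False
    # Cost is the advertised prefix metric plus the cost to the Current Node.
    cost = node_cost + prefix_metric
    current = rib.get(prefix)
    if current is None or cost < current.metric:
        # New prefix, or a cheaper path: take the Current Node's next hops.
        rib[prefix] = Route(cost, set(node_next_hops))
        return True
    # Higher cost is ignored; equal cost is not covered by this excerpt.
    return False
```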
skipping to change at page 14, line 28 skipping to change at page 15, line 31
* Optionally, the prefix(es) associated with the Current Link * Optionally, the prefix(es) associated with the Current Link
are installed into the local RIB using the same rules as were are installed into the local RIB using the same rules as were
used for Prefix NLRI in the previous steps. used for Prefix NLRI in the previous steps.
* The Current Link's endpoint Node NLRI is accessed (i.e., the * The Current Link's endpoint Node NLRI is accessed (i.e., the
Node NLRI with the same Node identifiers as the Link Node NLRI with the same Node identifiers as the Link
endpoint). If it exists, it will be referred to as the endpoint). If it exists, it will be referred to as the
Endpoint Node NLRI and the algorithm will proceed as follows: Endpoint Node NLRI and the algorithm will proceed as follows:
+ If the BGP-LS Link NLRI includes a BGP-SPF Status TLV
indicating the link is down, the BGP-LS Link NLRI is
considered down and the next BGP-LS Link NLRI is examined.
+ All the Link NLRI corresponding to the Endpoint Node NLRI will + All the Link NLRI corresponding to the Endpoint Node NLRI will
be searched for a back-link NLRI pointing to the current be searched for a back-link NLRI pointing to the current
node. Both the Node identifiers and the Link endpoint node. Both the Node identifiers and the Link endpoint
identifiers in the Endpoint Node's Link NLRI must match for identifiers in the Endpoint Node's Link NLRI must match for
a match. If there is no corresponding Link NLRI a match. If there is no corresponding Link NLRI
corresponding to the Endpoint Node NLRI, the Endpoint Node corresponding to the Endpoint Node NLRI, the Endpoint Node
NLRI fails the bi-directional connectivity test and is not NLRI fails the bi-directional connectivity test and is not
processed further. processed further.
+ If the Endpoint Node NLRI is not on the candidate list, it + If the Endpoint Node NLRI is not on the candidate list, it
skipping to change at page 15, line 46 skipping to change at page 17, line 7
Given the fact that SPF algorithms are based on the assumption that Given the fact that SPF algorithms are based on the assumption that
all routers in the routing domain calculate precisely the same all routers in the routing domain calculate precisely the same
SPF tree and install the same set of routes, it is RECOMMENDED that SPF tree and install the same set of routes, it is RECOMMENDED that
BGP-LS SPF IPv4/IPv6 routes be given priority by default when BGP-LS SPF IPv4/IPv6 routes be given priority by default when
installed into their respective RIBs. In common implementations the installed into their respective RIBs. In common implementations the
prioritization is governed by route preference or administrative prioritization is governed by route preference or administrative
distance with lower being more preferred. distance with lower being more preferred.
5.6. NLRI Advertisement and Convergence 5.6. NLRI Advertisement and Convergence
5.6.1. Link/Prefix Failure Convergence
A local failure will prevent a link from being used in the SPF A local failure will prevent a link from being used in the SPF
calculation due to the IGP bi-directional connectivity requirement. calculation due to the IGP bi-directional connectivity requirement.
Consequently, local link failures should always be given priority Consequently, local link failures should always be given priority
over updates (e.g., withdrawing all routes learned on a session) in over updates (e.g., withdrawing all routes learned on a session) in
order to ensure the highest priority propagation and optimal order to ensure the highest priority propagation and optimal
convergence. convergence.
Delaying the withdrawal of non-local routes is an area for further An IGP such as OSPF [RFC2328] will stop using the link as soon as the
study as more IGP-like mechanisms would be required to prevent usage Router-LSA for one side of the link is received. With normal BGP
of stale NLRI. advertisement, the link would continue to be used until the last copy
of the BGP-LS Link NLRI is withdrawn. In order to avoid this delay,
the originator of the Link NLRI will advertise a more recent version
of the BGP-LS Link NLRI including the BGP-SPF Status TLV
(Section 4.2.2) indicating the link is down with respect to BGP-SPF.
After some configurable period of time, e.g., 2-3 seconds, the BGP-LS
Link NLRI can be withdrawn with no consequence. If the link becomes
available in that period, the originator of the BGP-LS Link NLRI will
simply advertise a more recent version of the BGP-LS Link NLRI
without the BGP-SPF status TLV in the BGP-LS Link Attributes.
Similarly, when a prefix becomes unreachable, a more recent version
of the BGP-LS Prefix NLRI will be advertised with the BGP-SPF status
TLV (Section 4.2.3) indicating the prefix is unreachable in the BGP-LS
Prefix Attributes and the prefix will be considered unreachable with
respect to BGP SPF. After some configurable period of time, e.g.,
2-3 seconds, the BGP-LS Prefix NLRI can be withdrawn with no
consequence. If the prefix becomes reachable in that period, the
originator of the BGP-LS Prefix NLRI will simply advertise a more
recent version of the BGP-LS Prefix NLRI without the BGP-SPF status
TLV in the BGP-LS Prefix Attributes.
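The originator behavior described in this section can be sketched as a small state machine in Python. The LinkOriginator class, its method names, and the tuple it returns are hypothetical and exist only for this illustration. The sequence counter stands in for "a more recent version" of the NLRI, and time is passed in explicitly rather than read from a clock so that the logic is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Action = Tuple[str, int, bool]  # (action, sequence, status TLV says "down")


@dataclass
class LinkOriginator:
    withdraw_delay: float = 3.0  # configurable; the text suggests 2-3 seconds
    sequence: int = 0
    down_since: Optional[float] = None
    advertised: bool = True

    def _advertise(self, status_down: bool) -> Action:
        self.sequence += 1  # each re-advertisement is a more recent version
        self.advertised = True
        return ("advertise", self.sequence, status_down)

    def link_down(self, now: float) -> Action:
        # Advertise with the status TLV first instead of withdrawing at once.
        self.down_since = now
        return self._advertise(True)

    def link_up(self, now: float) -> Action:
        # Recovery: advertise a newer version without the status TLV.
        self.down_since = None
        return self._advertise(False)

    def tick(self, now: float) -> Optional[Action]:
        # Withdraw only after the configured delay has elapsed.
        if (self.advertised and self.down_since is not None
                and now - self.down_since >= self.withdraw_delay):
            self.advertised = False
            self.down_since = None
            return ("withdraw", self.sequence, True)
        return None
```

The same pattern applies to Prefix NLRI, with the prefix status TLV taking the place of the link status TLV.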
5.6.2. Node Failure Convergence
With BGP without graceful restart [RFC4724], all the NLRI advertised
by a node are implicitly withdrawn when a session failure is detected.
If fast failure detection such as BFD is utilized and the node is on
the fastest converging path, the most recent versions of BGP-LS NLRI
may be withdrawn while these versions are in-flight on longer paths.
This will result in the older version of the NLRI being used until the
new versions arrive and, potentially, unnecessary route flaps.
Therefore, BGP-LS SPF NLRI SHOULD always be retained before being
implicitly withdrawn for a brief configurable interval, e.g., 2-3
seconds. This will not delay convergence since the adjacent nodes
will detect the link failure and advertise a more recent NLRI
indicating the link is down with respect to BGP SPF (Section 5.6.1) and
the BGP-SPF calculation will fail the bi-directional connectivity
check.
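The retention behavior on the receiving side can be sketched in Python as follows. The NlriStore class is a deliberately simplified, hypothetical model: it keeps one copy of each NLRI keyed by an opaque string, tracks the session it was learned on, and uses an integer sequence to decide which version is more recent. None of these names come from the draft.

```python
from typing import Dict, Optional, Tuple

Entry = Tuple[int, str, Optional[float]]  # (sequence, peer, purge deadline)


class NlriStore:
    def __init__(self, retain_interval: float = 3.0):
        # Configurable; the text suggests a brief interval of 2-3 seconds.
        self.retain_interval = retain_interval
        self.entries: Dict[str, Entry] = {}

    def receive(self, key: str, sequence: int, peer: str) -> None:
        current = self.entries.get(key)
        # A more recent version replaces the old one and cancels any purge.
        if current is None or sequence > current[0]:
            self.entries[key] = (sequence, peer, None)

    def session_down(self, peer: str, now: float) -> None:
        # Retain NLRI learned on the failed session for a brief interval.
        for key, (seq, learned_from, deadline) in list(self.entries.items()):
            if learned_from == peer and deadline is None:
                self.entries[key] = (seq, learned_from,
                                     now + self.retain_interval)

    def expire(self, now: float) -> None:
        # Complete the implicit withdrawal once the interval has passed.
        for key, (_, _, deadline) in list(self.entries.items()):
            if deadline is not None and now >= deadline:
                del self.entries[key]
```

If a newer version arrives over a longer path during the interval, the retained copy is replaced rather than purged, which is the route flap the retention is meant to avoid.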
5.7. Error Handling 5.7. Error Handling
When a BGP speaker receives a BGP Update containing a malformed SPF When a BGP speaker receives a BGP Update containing a malformed SPF
Capability TLV in the Node NLRI BGP-LS Attribute [RFC7752], it MUST Capability TLV in the Node NLRI BGP-LS Attribute [RFC7752], it MUST
ignore the received TLV and the Node NLRI and not pass it to other ignore the received TLV and the Node NLRI and not pass it to other
BGP peers as specified in [RFC7606]. When discarding a Node NLRI BGP peers as specified in [RFC7606]. When discarding a Node NLRI
with malformed TLV, a BGP speaker SHOULD log an error for further with malformed TLV, a BGP speaker SHOULD log an error for further
analysis. analysis.
skipping to change at page 17, line 17 skipping to change at page 19, line 17
SPF triggering events. Additionally, to troubleshoot SPF scheduling SPF triggering events. Additionally, to troubleshoot SPF scheduling
and backoff [RFC8405], the current SPF backoff state, remaining time- and backoff [RFC8405], the current SPF backoff state, remaining time-
to-learn, remaining holddown, last trigger event time, last SPF time, to-learn, remaining holddown, last trigger event time, last SPF time,
and next SPF time should be available. and next SPF time should be available.
9. Acknowledgements 9. Acknowledgements
The authors would like to thank Sue Hares, Jorge Rabadan, Boris The authors would like to thank Sue Hares, Jorge Rabadan, Boris
Hassanov, Dan Frost, and Fred Baker for their review and comments. Hassanov, Dan Frost, and Fred Baker for their review and comments.
The authors extend special thanks to Eric Rosen for fruitful
discussions on BGP-LS SPF convergence as compared to IGPs.
10. Contributors 10. Contributors
In addition to the authors listed on the front page, the following In addition to the authors listed on the front page, the following
co-authors have contributed to the document. co-authors have contributed to the document.
Derek Yeung Derek Yeung
Arrcus, Inc. Arrcus, Inc.
derek@arrcus.com derek@arrcus.com
Gunter Van De Velde Gunter Van De Velde
 End of changes. 11 change blocks. 
30 lines changed or deleted 124 lines changed or added
