draft-ietf-opsawg-ntf-10.txt | draft-ietf-opsawg-ntf-11.txt | |||
---|---|---|---|---|
OPSAWG H. Song | OPSAWG H. Song | |||
Internet-Draft Futurewei | Internet-Draft Futurewei | |||
Intended status: Informational F. Qin | Intended status: Informational F. Qin | |||
Expires: 12 May 2022 China Mobile | Expires: 2 June 2022 China Mobile | |||
P. Martinez-Julia | P. Martinez-Julia | |||
NICT | NICT | |||
L. Ciavaglia | L. Ciavaglia | |||
Rakuten Mobile | Rakuten Mobile | |||
A. Wang | A. Wang | |||
China Telecom | China Telecom | |||
8 November 2021 | 29 November 2021 | |||
Network Telemetry Framework | Network Telemetry Framework | |||
draft-ietf-opsawg-ntf-10 | draft-ietf-opsawg-ntf-11 | |||
Abstract | Abstract | |||
Network telemetry is a technology for gaining network insight and | Network telemetry is a technology for gaining network insight and | |||
facilitating efficient and automated network management. It | facilitating efficient and automated network management. It | |||
encompasses various techniques for remote data generation, | encompasses various techniques for remote data generation, | |||
collection, correlation, and consumption. This document describes an | collection, correlation, and consumption. This document describes an | |||
architectural framework for network telemetry, motivated by | architectural framework for network telemetry, motivated by | |||
challenges that are encountered as part of the operation of networks | challenges that are encountered as part of the operation of networks | |||
and by the requirements that ensue. This document clarifies the | and by the requirements that ensue. This document clarifies the | |||
skipping to change at page 1, line 48 ¶ | skipping to change at page 1, line 48 ¶ | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on 12 May 2022. | This Internet-Draft will expire on 2 June 2022. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2021 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
and restrictions with respect to this document. Code Components | and restrictions with respect to this document. Code Components | |||
extracted from this document must include Simplified BSD License text | extracted from this document must include Revised BSD License text as | |||
as described in Section 4.e of the Trust Legal Provisions and are | described in Section 4.e of the Trust Legal Provisions and are | |||
provided without warranty as described in the Simplified BSD License. | provided without warranty as described in the Revised BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
2. Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.1. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
3.1. Telemetry Data Coverage . . . . . . . . . . . . . . . . . 7 | 2.1. Telemetry Data Coverage . . . . . . . . . . . . . . . . . 7 | |||
3.2. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 7 | 2.2. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
3.3. Challenges . . . . . . . . . . . . . . . . . . . . . . . 9 | 2.3. Challenges . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
3.4. Network Telemetry . . . . . . . . . . . . . . . . . . . . 10 | 2.4. Network Telemetry . . . . . . . . . . . . . . . . . . . . 10 | |||
3.5. The Necessity of a Network Telemetry Framework . . . . . 13 | 2.5. The Necessity of a Network Telemetry Framework . . . . . 13 | |||
4. Network Telemetry Framework . . . . . . . . . . . . . . . . . 14 | 3. Network Telemetry Framework . . . . . . . . . . . . . . . . . 14 | |||
4.1. Top Level Modules . . . . . . . . . . . . . . . . . . . . 14 | 3.1. Top Level Modules . . . . . . . . . . . . . . . . . . . . 14 | |||
4.1.1. Management Plane Telemetry . . . . . . . . . . . . . 18 | 3.1.1. Management Plane Telemetry . . . . . . . . . . . . . 18 | |||
4.1.2. Control Plane Telemetry . . . . . . . . . . . . . . . 18 | 3.1.2. Control Plane Telemetry . . . . . . . . . . . . . . . 18 | |||
4.1.3. Forwarding Plane Telemetry . . . . . . . . . . . . . 19 | 3.1.3. Forwarding Plane Telemetry . . . . . . . . . . . . . 19 | |||
4.1.4. External Data Telemetry . . . . . . . . . . . . . . . 21 | 3.1.4. External Data Telemetry . . . . . . . . . . . . . . . 21 | |||
4.2. Second Level Function Components . . . . . . . . . . . . 22 | 3.2. Second Level Function Components . . . . . . . . . . . . 22 | |||
4.3. Data Acquisition Mechanism and Type Abstraction . . . . . 24 | 3.3. Data Acquisition Mechanism and Type Abstraction . . . . . 24 | |||
4.4. Mapping Existing Mechanisms into the Framework . . . . . 26 | 3.4. Mapping Existing Mechanisms into the Framework . . . . . 26 | |||
5. Evolution of Network Telemetry Applications . . . . . . . . . 27 | 4. Evolution of Network Telemetry Applications . . . . . . . . . 27 | |||
6. Security Considerations . . . . . . . . . . . . . . . . . . . 27 | 5. Security Considerations . . . . . . . . . . . . . . . . . . . 28 | |||
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 29 | 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 29 | |||
8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 29 | 7. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 29 | |||
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 30 | 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 29 | |||
10. Informative References . . . . . . . . . . . . . . . . . . . 30 | 9. Informative References . . . . . . . . . . . . . . . . . . . 29 | |||
Appendix A. A Survey on Existing Network Telemetry Techniques . 35 | Appendix A. A Survey on Existing Network Telemetry Techniques . 35 | |||
A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 35 | A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 35 | |||
A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 36 | A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 35 | |||
A.1.2. gRPC Network Management Interface . . . . . . . . . . 36 | A.1.2. gRPC Network Management Interface . . . . . . . . . . 36 | |||
A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 36 | A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 36 | |||
A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . . . 36 | A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . . . 36 | |||
A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 37 | A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 36 | |||
A.3.1. The Alternate Marking (AM) technology . . . . . . . . 37 | A.3.1. The Alternate Marking (AM) technology . . . . . . . . 36 | |||
A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 38 | A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 38 | |||
A.3.3. IP Flow Information Export (IPFIX) Protocol . . . . . 39 | A.3.3. IP Flow Information Export (IPFIX) Protocol . . . . . 38 | |||
A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 39 | A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 38 | |||
A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 39 | A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 39 | |||
A.3.6. Existing OAM for Specific Data Planes . . . . . . . . 39 | A.3.6. Existing OAM for Specific Data Planes . . . . . . . . 39 | |||
A.4. External Data and Event Telemetry . . . . . . . . . . . . 40 | A.4. External Data and Event Telemetry . . . . . . . . . . . . 39 | |||
A.4.1. Sources of External Events . . . . . . . . . . . . . 40 | A.4.1. Sources of External Events . . . . . . . . . . . . . 39 | |||
A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 41 | A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 41 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41 | |||
1. Introduction | 1. Introduction | |||
Network visibility is the ability of management tools to see the | Network visibility is the ability of management tools to see the | |||
state and behavior of a network, which is essential for successful | state and behavior of a network, which is essential for successful | |||
network operation. Network Telemetry revolves around network data | network operation. Network Telemetry revolves around network data | |||
that can help provide insights about the current state of the | that can help provide insights about the current state of the | |||
network, including network devices, forwarding, control, and | network, including network devices, forwarding, control, and | |||
skipping to change at page 4, line 20 ¶ | skipping to change at page 4, line 20 ¶ | |||
maintaining, and understanding a network telemetry system. Finally, | maintaining, and understanding a network telemetry system. Finally, | |||
we outline the evolution stages of the network telemetry system and | we outline the evolution stages of the network telemetry system and | |||
discuss the potential security concerns. | discuss the potential security concerns. | |||
The purpose of the framework and taxonomy is to set a common ground | The purpose of the framework and taxonomy is to set a common ground | |||
for the collection of related work and provide guidance for future | for the collection of related work and provide guidance for future | |||
technique and standard developments. To the best of our knowledge, | technique and standard developments. To the best of our knowledge, | |||
this document is the first such effort for network telemetry in | this document is the first such effort for network telemetry in | |||
industry standards organizations. | industry standards organizations. | |||
2. Glossary | 1.1. Glossary | |||
Before further discussion, we list some key terminology and acronyms | Before further discussion, we list some key terminology and acronyms | |||
used in this document. We make an intentional distinction between | used in this document. We make an intentional distinction between | |||
the terms network telemetry and OAM. However, it should be | the terms network telemetry and OAM. However, it should be | |||
understood that there is not a hard-line distinction between the two | understood that there is not a hard-line distinction between the two | |||
concepts. Rather, network telemetry is considered as an extension of | concepts. Rather, network telemetry is considered as an extension of | |||
OAM. It covers all the existing OAM protocols but puts more emphasis | OAM. It covers all the existing OAM protocols but puts more emphasis | |||
on the newer and emerging techniques and protocols concerning all | on the newer and emerging techniques and protocols concerning all | |||
aspects of network data from acquisition to consumption. | aspects of network data from acquisition to consumption. | |||
skipping to change at page 4, line 48 ¶ | skipping to change at page 4, line 48 ¶ | |||
BMP: BGP Monitoring Protocol, specified in [RFC7854]. | BMP: BGP Monitoring Protocol, specified in [RFC7854]. | |||
DPI: Deep Packet Inspection, referring to techniques that | DPI: Deep Packet Inspection, referring to techniques that | |||
examine packets beyond the L3/L4 headers. | examine packets beyond the L3/L4 headers. | |||
gNMI: gRPC Network Management Interface, a network management | gNMI: gRPC Network Management Interface, a network management | |||
protocol from OpenConfig Operator Working Group, mainly | protocol from OpenConfig Operator Working Group, mainly | |||
contributed by Google. See [gnmi] for details. | contributed by Google. See [gnmi] for details. | |||
GPB: Google Protocol Buffer, an extensible mechanism for serializing | GPB: Google Protocol Buffer, an extensible mechanism for serializing | |||
structured data. | structured data. See [gpb] for details. | |||
gRPC: gRPC Remote Procedure Call, an open source high performance | gRPC: gRPC Remote Procedure Call, an open source high performance | |||
RPC framework that gNMI is based on. See [grpc] for details. | RPC framework that gNMI is based on. See [grpc] for details. | |||
IPFIX: IP Flow Information Export Protocol, specified in [RFC7011]. | IPFIX: IP Flow Information Export Protocol, specified in [RFC7011]. | |||
IOAM: In-situ OAM, a dataplane on-path telemetry technique. | IOAM: In-situ OAM [I-D.ietf-ippm-ioam-data], a dataplane on-path | |||
telemetry technique. | ||||
JSON: An open standard file format and data interchange format that | JSON: An open standard file format and data interchange format that | |||
uses human-readable text to store and transmit data objects, | uses human-readable text to store and transmit data objects, | |||
specified in [RFC8259]. | specified in [RFC8259]. | |||
MIB: Management Information Base, a database used for managing the | MIB: Management Information Base, a database used for managing the | |||
entities in a network. | entities in a network. | |||
NETCONF: Network Configuration Protocol, specified in [RFC6241]. | NETCONF: Network Configuration Protocol, specified in [RFC6241]. | |||
NetFlow: A Cisco protocol for flow record collecting, described in | NetFlow: A Cisco protocol for flow record collecting, described in | |||
[RFC3594]. | [RFC3954]. | |||
Network Telemetry: The process and instrumentation for acquiring and | Network Telemetry: The process and instrumentation for acquiring and | |||
utilizing network data remotely for network monitoring and | utilizing network data remotely for network monitoring and | |||
operation. A general term for a large set of network visibility | operation. A general term for a large set of network visibility | |||
techniques and protocols, concerning aspects like data generation, | techniques and protocols, concerning aspects like data generation, | |||
collection, correlation, and consumption. Network telemetry | collection, correlation, and consumption. Network telemetry | |||
addresses the current network operation issues and enables smooth | addresses the current network operation issues and enables smooth | |||
evolution toward future intent-driven autonomous networks. | evolution toward future intent-driven autonomous networks. | |||
NMS: Network Management System, referring to applications that allow | NMS: Network Management System, referring to applications that allow | |||
network administrators to manage a network. | network administrators to manage a network. | |||
OAM: Operations, Administration, and Maintenance. A group of | OAM: Operations, Administration, and Maintenance. A group of | |||
network management functions that provide network fault | network management functions that provide network fault | |||
indication, fault localization, performance information, and data | indication, fault localization, performance information, and data | |||
and diagnosis functions. Most conventional network monitoring | and diagnosis functions. Most conventional network monitoring | |||
techniques and protocols belong to network OAM. | techniques and protocols belong to network OAM. | |||
PBT: Postcard-Based Telemetry, a dataplane on-path telemetry | PBT: Postcard-Based Telemetry, a dataplane on-path telemetry | |||
technique. | technique. A representative technique is described in | |||
[I-D.ietf-ippm-ioam-direct-export]. | ||||
RESTCONF: An HTTP-based protocol that provides a programmatic | RESTCONF: An HTTP-based protocol that provides a programmatic | |||
interface for accessing data defined in YANG, using the datastore | interface for accessing data defined in YANG, using the datastore | |||
concepts defined in NETCONF, as specified in [RFC8040]. | concepts defined in NETCONF, as specified in [RFC8040]. | |||
SMIv2 Structure of Management Information Version 2, defining MIB | SMIv2: Structure of Management Information Version 2, defining MIB | |||
objects, specified in [RFC2578]. | objects, specified in [RFC2578]. | |||
SNMP: Simple Network Management Protocol. Version 1 and 2 are | SNMP: Simple Network Management Protocol. Version 1, 2, and 3 are | |||
specified in [RFC1157] and [RFC3416], respectively. | specified in [RFC1157], [RFC3416], and [RFC3414], respectively. | |||
XML; Extensible Markup Language is a markup language for data | XML: Extensible Markup Language is a markup language for data | |||
encoding that is both human-readable and machine-readable, | encoding that is both human-readable and machine-readable, | |||
specified by W3C [xml]. | specified by W3C [xml]. | |||
YANG: YANG is a data modeling language for the definition of data | YANG: YANG is a data modeling language for the definition of data | |||
sent over network management protocols such as the NETCONF and | sent over network management protocols such as the NETCONF and | |||
RESTCONF. YANG is defined in [RFC6020] and [RFC7950]. | RESTCONF. YANG is defined in [RFC6020] and [RFC7950]. | |||
YANG ECA A YANG model for Event-Condition-Action policies, defined | YANG ECA: A YANG model for Event-Condition-Action policies, defined | |||
in [I-D.wwx-netmod-event-yang]. | in [I-D.wwx-netmod-event-yang]. | |||
YANG-Push: A mechanism that allows subscriber applications to | YANG-Push: A mechanism that allows subscriber applications to | |||
request a stream of updates from a YANG datastore on a network | request a stream of updates from a YANG datastore on a network | |||
device. Details are specified in [RFC8641] and [RFC8639]. | device. Details are specified in [RFC8641] and [RFC8639]. | |||
3. Background | 2. Background | |||
The term "big data" is used to describe the extremely large volume of | The term "big data" is used to describe the extremely large volume of | |||
data sets that can be analyzed computationally to reveal patterns, | data sets that can be analyzed computationally to reveal patterns, | |||
trends, and associations. Networks are undoubtedly a source of big | trends, and associations. Networks are undoubtedly a source of big | |||
data because of their scale and the volume of network traffic they | data because of their scale and the volume of network traffic they | |||
forward. When a network's endpoints do not represent individual | forward. When a network's endpoints do not represent individual | |||
users (e.g. in industrial, datacenter, and infrastructure contexts), | users (e.g. in industrial, datacenter, and infrastructure contexts), | |||
network operations can often benefit from large-scale data collection | network operations can often benefit from large-scale data collection | |||
without breaching user privacy. | without breaching user privacy. | |||
skipping to change at page 7, line 28 ¶ | skipping to change at page 7, line 28 ¶ | |||
In the remainder of this section, first we clarify the scope of | In the remainder of this section, first we clarify the scope of | |||
network data (i.e., telemetry data) concerned in the context. Then, | network data (i.e., telemetry data) concerned in the context. Then, | |||
we discuss several key use cases for today's and future network | we discuss several key use cases for today's and future network | |||
operations. Next, we show why the current network OAM techniques and | operations. Next, we show why the current network OAM techniques and | |||
protocols are insufficient for these use cases. The discussion | protocols are insufficient for these use cases. The discussion | |||
underlines the need for new methods, techniques, and protocols, as | underlines the need for new methods, techniques, and protocols, as | |||
well as the extensions of existing ones, which we assign under the | well as the extensions of existing ones, which we assign under the | |||
umbrella term - Network Telemetry. | umbrella term - Network Telemetry. | |||
3.1. Telemetry Data Coverage | 2.1. Telemetry Data Coverage | |||
Any information that can be extracted from networks (including data | Any information that can be extracted from networks (including data | |||
plane, control plane, and management plane) and used to gain | plane, control plane, and management plane) and used to gain | |||
visibility or as basis for actions is considered telemetry data. It | visibility or as basis for actions is considered telemetry data. It | |||
includes statistics, event records and logs, snapshots of state, | includes statistics, event records and logs, snapshots of state, | |||
configuration data, etc. It also covers the outputs of any active | configuration data, etc. It also covers the outputs of any active | |||
and passive measurements [RFC7799]. In some cases, raw data is | and passive measurements [RFC7799]. In some cases, raw data is | |||
processed in network before being sent to a data consumer. Such | processed in network before being sent to a data consumer. Such | |||
processed data is also considered telemetry data. The value of | processed data is also considered telemetry data. The value of | |||
telemetry data varies. A smaller amount of high-quality data is often better | telemetry data varies. A smaller amount of high-quality data is often better | |||
than a large amount of low-quality data. A classification of telemetry data is | than a large amount of low-quality data. A classification of telemetry data is | |||
provided in Section 4. | provided in Section 3. To preserve user privacy, the user packet | |||
content should not be collected. | ||||
3.2. Use Cases | 2.2. Use Cases | |||
The following set of use cases is essential for network operations. | The following set of use cases is essential for network operations. | |||
While the list is by no means exhaustive, it is enough to highlight | While the list is by no means exhaustive, it is enough to highlight | |||
the requirements for data velocity, variety, volume, and veracity in | the requirements for data velocity, variety, volume, and veracity, | |||
networks. | the attributes of big data, in networks. | |||
* Security: Network intrusion detection and prevention systems need | * Security: Network intrusion detection and prevention systems need | |||
to monitor network traffic and activities and act upon anomalies. | to monitor network traffic and activities and act upon anomalies. | |||
Given increasingly sophisticated attack vectors coupled with | Given increasingly sophisticated attack vectors coupled with | |||
increasingly severe consequences of security breaches, new tools | increasingly severe consequences of security breaches, new tools | |||
and techniques need to be developed, relying on wider and deeper | and techniques need to be developed, relying on wider and deeper | |||
visibility into networks. The ultimate goal is to achieve the | visibility into networks. The ultimate goal is to achieve the | |||
ideal security with no, or only minimal, human intervention. | ideal security with no, or only minimal, human intervention. | |||
* Policy and Intent Compliance: Network policies are the rules that | * Policy and Intent Compliance: Network policies are the rules that | |||
skipping to change at page 9, line 18 ¶ | skipping to change at page 9, line 19 ¶ | |||
be applied to avoid network congestion. Long-term planning of | be applied to avoid network congestion. Long-term planning of | |||
network capacity and topology requires analysis of real-world | network capacity and topology requires analysis of real-world | |||
network telemetry data that is obtained over long periods of time. | network telemetry data that is obtained over long periods of time. | |||
* Event Tracking and Prediction: The visibility into traffic path | * Event Tracking and Prediction: The visibility into traffic path | |||
and performance is critical for services and applications that | and performance is critical for services and applications that | |||
rely on healthy network operation. Numerous related network | rely on healthy network operation. Numerous related network | |||
events are of interest to network operators. For example, network | events are of interest to network operators. For example, network | |||
operators want to learn where and why packets are dropped for an | operators want to learn where and why packets are dropped for an | |||
application flow. They also want to be warned of issues in | application flow. They also want to be warned of issues in | |||
advance so proactive actions can be taken to avoid catastrophic | advance, so proactive actions can be taken to avoid catastrophic | |||
consequences. | consequences. | |||
3.3. Challenges | 2.3. Challenges | |||
For a long time, network operators have relied upon SNMP [RFC3416], | For a long time, network operators have relied upon SNMP [RFC3416], | |||
Command-Line Interface (CLI), or Syslog to monitor the network. Some | Command-Line Interface (CLI), or Syslog [RFC5424] to monitor the | |||
other OAM techniques as described in [RFC7276] are also used to | network. Some other OAM techniques as described in [RFC7276] are | |||
facilitate network troubleshooting. These conventional techniques | also used to facilitate network troubleshooting. These conventional | |||
are not sufficient to support the above use cases for the following | techniques are not sufficient to support the above use cases for the | |||
reasons: | following reasons: | |||
* Most use cases need to continuously monitor the network and | * Most use cases need to continuously monitor the network and | |||
dynamically refine the data collection in real-time. The poll- | dynamically refine the data collection in real-time. The poll- | |||
based low-frequency data collection is ill-suited for these | based low-frequency data collection is ill-suited for these | |||
applications. Subscription-based streaming data directly pushed | applications. Subscription-based streaming data directly pushed | |||
from the data source (e.g., the forwarding chip) is preferred to | from the data source (e.g., the forwarding chip) is preferred to | |||
provide enough data quantity and precision at scale. | provide enough data quantity and precision at scale. | |||
* Comprehensive data is needed from packet processing engine to | * Comprehensive data is needed from packet processing engine to | |||
traffic manager, from line cards to main control board, from user | traffic manager, from line cards to main control board, from user | |||
flows to control protocol packets, from device configurations to | flows to control protocol packets, from device configurations to | |||
operations, and from physical layer to application layer. | operations, and from physical layer to application layer. | |||
Conventional OAM only covers a narrow range of data (e.g., SNMP | Conventional OAM only covers a narrow range of data (e.g., SNMP | |||
only handles data from the Management Information Base (MIB)). | only handles data from the Management Information Base (MIB)). | |||
Traditional network devices cannot provide all the necessary | Classical network devices cannot provide all the necessary probes. | |||
probes. More open and programmable network devices are therefore | More open and programmable network devices are therefore needed. | |||
needed. | ||||
* Many application scenarios need to correlate network-wide data | * Many application scenarios need to correlate network-wide data | |||
from multiple sources (i.e., from distributed network devices, | from multiple sources (i.e., from distributed network devices, | |||
different components of a network device, or different network | different components of a network device, or different network | |||
planes). A piecemeal solution often lacks the capability to | planes). A piecemeal solution often lacks the capability to | |||
consolidate the data from multiple sources. The composition of a | consolidate the data from multiple sources. The composition of a | |||
complete solution, as partly proposed by Autonomic Resource | complete solution, as partly proposed by Autonomic Resource | |||
Control Architecture (ARCA) | Control Architecture (ARCA) | |||
[I-D.pedro-nmrg-anticipated-adaptation], will be empowered and | [I-D.pedro-nmrg-anticipated-adaptation], will be empowered and | |||
guided by a comprehensive framework. | guided by a comprehensive framework. | |||
* Some conventional OAM techniques (e.g., CLI and Syslog) lack a | * Some conventional OAM techniques (e.g., CLI and Syslog) lack a | |||
formal data model. Unstructured data hinders tool | formal data model. Unstructured data hinders tool | |||
automation and application extensibility. Standardized data | automation and application extensibility. Standardized data | |||
models are essential to support programmable networks. | models are essential to support programmable networks. | |||
* Although some conventional OAM techniques support data push (e.g., | * Although some conventional OAM techniques support data push (e.g., | |||
SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow), the pushed data | SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow [RFC3176]), the | |||
are limited to only predefined management plane warnings (e.g., | pushed data are limited to only predefined management plane | |||
SNMP Trap) or sampled user packets (e.g., sFlow). Network | warnings (e.g., SNMP Trap) or sampled user packets (e.g., sFlow). | |||
operators require the data with arbitrary source, granularity, and | Network operators require the data with arbitrary source, | |||
precision which are beyond the capability of the existing | granularity, and precision which are beyond the capability of the | |||
techniques. | existing techniques. | |||
* The conventional passive measurement techniques can either consume | * The conventional passive measurement techniques can either consume | |||
excessive network resources and render excessive redundant data, | excessive network resources and render excessive redundant data, | |||
or lead to inaccurate results; on the other hand, the conventional | or lead to inaccurate results; on the other hand, the conventional | |||
active measurement techniques can interfere with the user traffic | active measurement techniques can interfere with the user traffic | |||
and their results are indirect. Techniques that can collect | and their results are indirect. Techniques that can collect | |||
direct and on-demand data from user traffic are more favorable. | direct and on-demand data from user traffic are more favorable. | |||
These challenges were addressed by newer standards and techniques | These challenges were addressed by newer standards and techniques | |||
(e.g., IPFIX/Netflow, PSAMP, IOAM, and YANG-Push) and more are | (e.g., IPFIX/Netflow, Packet Sampling (PSAMP), IOAM, and YANG-Push) | |||
emerging. These standards and techniques need to be recognized and | and more are emerging. These standards and techniques need to be | |||
accommodated in a new framework. | recognized and accommodated in a new framework. | |||
3.4. Network Telemetry | 2.4. Network Telemetry | |||
Network telemetry has emerged as a mainstream technical term to refer | Network telemetry has emerged as a mainstream technical term to refer | |||
to the network data collection and consumption techniques. Several | to the network data collection and consumption techniques. Several | |||
network telemetry techniques and protocols (e.g., IPFIX [RFC7011] and | network telemetry techniques and protocols (e.g., IPFIX [RFC7011] and | |||
gRPC [grpc]) have been widely deployed. Network telemetry allows | gRPC [grpc]) have been widely deployed. Network telemetry allows | |||
separate entities to acquire data from network devices so that data | separate entities to acquire data from network devices so that data | |||
can be visualized and analyzed to support network monitoring and | can be visualized and analyzed to support network monitoring and | |||
operation. Network telemetry covers the conventional network OAM and | operation. Network telemetry covers the conventional network OAM and | |||
has a wider scope. It is expected that network telemetry can provide | has a wider scope. It is expected that network telemetry can provide | |||
the necessary network insight for autonomous networks and address the | the necessary network insight for autonomous networks and address the | |||
skipping to change at page 11, line 39 ¶ | skipping to change at page 11, line 39 ¶ | |||
overall network automation needs. Efforts are made to normalize | overall network automation needs. Efforts are made to normalize | |||
the data representation and unify the protocols, so as to simplify | the data representation and unify the protocols, so as to simplify | |||
data analysis and provide integrated analysis across heterogeneous | data analysis and provide integrated analysis across heterogeneous | |||
devices and data sources across a network. | devices and data sources across a network. | |||
* Model-based: The telemetry data is modeled in advance which allows | * Model-based: The telemetry data is modeled in advance which allows | |||
applications to configure and consume data with ease. | applications to configure and consume data with ease. | |||
* Data Fusion: The data for a single application can come from | * Data Fusion: The data for a single application can come from | |||
multiple data sources (e.g., cross-domain, cross-device, and | multiple data sources (e.g., cross-domain, cross-device, and | |||
cross-layer) and needs to be correlated to take effect. | cross-layer) based on common naming/ID and needs to be correlated | |||
to take effect. | ||||
* Dynamic and Interactive: Since network telemetry is meant to be | * Dynamic and Interactive: Since network telemetry is meant to be | |||
used in a closed control loop for network automation, it needs to | used in a closed control loop for network automation, it needs to | |||
run continuously and adapt to the dynamic and interactive queries | run continuously and adapt to the dynamic and interactive queries | |||
from the network operation controller. | from the network operation controller. | |||
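The Data Fusion property above can be illustrated with a minimal sketch, assuming each source tags its records with a shared identifier. The function name `fuse_by_key`, the `device_id` key, and the source names are hypothetical and are not defined by the framework.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def fuse_by_key(sources: Mapping[str, Iterable[Mapping[str, Any]]],
                key: str = "device_id") -> Dict[Any, Dict[str, Any]]:
    """Correlate records from several telemetry sources on a shared identifier.

    `sources` maps a (hypothetical) source name to its records. Each field is
    prefixed with the source name so values from different sources never
    collide. Records without the key are skipped, since they cannot be
    correlated with anything.
    """
    fused: Dict[Any, Dict[str, Any]] = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            if key not in record:
                continue
            entry = fused[record[key]]
            for field_name, value in record.items():
                if field_name != key:
                    # A later record from the same source overwrites an earlier one.
                    entry[f"{source_name}.{field_name}"] = value
    return dict(fused)
```

Joining on a common identifier is only one possible correlation strategy; timestamps or topology information could serve the same purpose.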
In addition, an ideal network telemetry solution may also have the | In addition, an ideal network telemetry solution may also have the | |||
following features or properties: | following features or properties: | |||
* In-Network Customization: The data that is generated can be | * In-Network Customization: The data that is generated can be | |||
skipping to change at page 13, line 5 ¶ | skipping to change at page 13, line 5 ¶ | |||
Although in many cases a system for network telemetry involves a | Although in many cases a system for network telemetry involves a | |||
remote data collecting and consuming entity, it is important to | remote data collecting and consuming entity, it is important to | |||
understand that there are no inherent assumptions about how a system | understand that there are no inherent assumptions about how a system | |||
should be architected. While a network architecture with centralized | should be architected. While a network architecture with centralized | |||
controller (e.g., SDN) seems a natural fit for network telemetry, | controller (e.g., SDN) seems a natural fit for network telemetry, | |||
network telemetry can work in distributed fashions as well. For | network telemetry can work in distributed fashions as well. For | |||
example, telemetry data producers and consumers can have a peer-to- | example, telemetry data producers and consumers can have a peer-to- | |||
peer relationship, in which a network node can be the direct consumer | peer relationship, in which a network node can be the direct consumer | |||
of telemetry data from other nodes. | of telemetry data from other nodes. | |||
3.5. The Necessity of a Network Telemetry Framework | 2.5. The Necessity of a Network Telemetry Framework | |||
Network data analytics and machine-learning technologies are applied | Network data analytics and machine-learning technologies are applied | |||
for network operation automation, relying on abundant and coherent | for network operation automation, relying on abundant and coherent | |||
data from networks. Data acquisition that is limited to a single | data from networks. Data acquisition that is limited to a single | |||
source and static in nature will in many cases not be sufficient to | source and static in nature will in many cases not be sufficient to | |||
meet an application's telemetry data needs. As a result, multiple | meet an application's telemetry data needs. As a result, multiple | |||
data sources, involving a variety of techniques and standards, will | data sources, involving a variety of techniques and standards, will | |||
need to be integrated. It is desirable to have a framework that | need to be integrated. It is desirable to have a framework that | |||
classifies and organizes different telemetry data sources and types, | classifies and organizes different telemetry data sources and types, | |||
defines different components of a network telemetry system and their | defines different components of a network telemetry system and their | |||
skipping to change at page 13, line 49 ¶ | skipping to change at page 13, line 49 ¶ | |||
comprehensive information. | comprehensive information. | |||
* Applications require network telemetry to be elastic in order to | * Applications require network telemetry to be elastic in order to | |||
make efficient use of network resources and reduce the impact of | make efficient use of network resources and reduce the impact of | |||
processing related to network telemetry on network performance. | processing related to network telemetry on network performance. | |||
For example, routine network monitoring should cover the entire | For example, routine network monitoring should cover the entire | |||
network with a low data sampling rate. Only when issues arise or | network with a low data sampling rate. Only when issues arise or | |||
critical trends emerge should telemetry data source be modified | critical trends emerge should telemetry data source be modified | |||
and telemetry data rates boosted as needed. | and telemetry data rates boosted as needed. | |||
* Efficient data fusion is critical for applications to reduce the | * Efficient data aggregation is critical for applications to reduce | |||
overall quantity of data and improve the accuracy of analysis. | the overall quantity of data and improve the accuracy of analysis. | |||
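The elasticity requirement in the list above (a low sampling rate for routine monitoring, boosted only when issues arise) can be sketched as follows. This is an illustrative assumption, not a mechanism defined by the framework: the class name, the rates, and the utilization threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ElasticSampler:
    """Pick a sampling rate: low for routine monitoring, boosted on issues."""
    baseline_rate: float = 0.001  # hypothetical fraction sampled routinely
    boosted_rate: float = 0.1     # hypothetical fraction sampled while investigating
    threshold: float = 0.8        # hypothetical utilization treated as an "issue"

    def rate_for(self, utilization: float) -> float:
        """Return the sampling rate to apply for the observed utilization."""
        if utilization >= self.threshold:
            return self.boosted_rate
        return self.baseline_rate
```

A real deployment would likely add hysteresis so that the rate does not flap when utilization hovers around the threshold.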
A telemetry framework collects together all the telemetry-related | A telemetry framework collects together all the telemetry-related | |||
works from different sources and working groups within IETF. This | works from different sources and working groups within IETF. This | |||
makes it possible to assemble a comprehensive network telemetry | makes it possible to assemble a comprehensive network telemetry | |||
system and to avoid repetitious or redundant work. The framework | system and to avoid repetitious or redundant work. The framework | |||
should cover the concepts and components from the standardization | should cover the concepts and components from the standardization | |||
perspective. This document describes the modules which make up a | perspective. This document describes the modules which make up a | |||
network telemetry framework and decomposes the telemetry system into | network telemetry framework and decomposes the telemetry system into | |||
a set of distinct components that existing and future work can easily | a set of distinct components that existing and future work can easily | |||
map to. | map to. | |||
4. Network Telemetry Framework | Disclaimer: large-scale network data collection is a major threat to | |||
user privacy [RFC7258]. The network telemetry framework presented in | ||||
this document should not be applied to collect and retain individual | ||||
user data or any data that can identify end users without consent. | ||||
Any data collection or retention using the framework must be tightly | ||||
limited to protect user privacy. | ||||
3. Network Telemetry Framework | ||||
The top level network telemetry framework partitions the network | The top level network telemetry framework partitions the network | |||
telemetry into four modules based on the telemetry data object source | telemetry into four modules based on the telemetry data object source | |||
and represents their relationship. At the next level, the framework | and represents their relationship. At the next level, the framework | |||
decomposes each module into separate components. Each of the modules | decomposes each module into separate components. Each of the modules | |||
follows the same underlying structure, with one component dedicated | follows the same underlying structure, with one component dedicated | |||
to the configuration of data subscriptions and data sources, a second | to the configuration of data subscriptions and data sources, a second | |||
component dedicated to encoding and exporting data, and a third | component dedicated to encoding and exporting data, and a third | |||
component instrumenting the generation of telemetry related to the | component instrumenting the generation of telemetry related to the | |||
underlying resources. Throughout the framework, the same set of | underlying resources. Throughout the framework, the same set of | |||
abstract data acquiring mechanisms and data types (Section 4.3) are | abstract data acquiring mechanisms and data types (Section 3.3) are | |||
applied. The two-level architecture with the uniform data | applied. The two-level architecture with the uniform data | |||
abstraction helps accurately pinpoint a protocol or technique to its | abstraction helps accurately pinpoint a protocol or technique to its | |||
position in a network telemetry system or disaggregate a network | position in a network telemetry system or disaggregate a network | |||
telemetry system into manageable parts. | telemetry system into manageable parts. | |||
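The two-level structure described above (four modules, each with the same three components) can be sketched as a small data structure that records where a technique sits. The short component labels and the `Framework` class are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# The four top-level modules named in the text.
MODULES: Tuple[str, ...] = (
    "management plane", "control plane", "forwarding plane", "external data",
)

# Hypothetical short labels for the three per-module components:
# subscription/source configuration, encoding/export, and data generation.
COMPONENTS: Tuple[str, ...] = ("configure", "encode_export", "instrument")


@dataclass
class Framework:
    """Records the (module, component) position assigned to each technique."""
    placements: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def place(self, technique: str, module: str, component: str) -> None:
        """Assign a technique to a position, rejecting unknown positions."""
        if module not in MODULES:
            raise ValueError(f"unknown module: {module}")
        if component not in COMPONENTS:
            raise ValueError(f"unknown component: {component}")
        self.placements[technique] = (module, component)

    def position_of(self, technique: str) -> Tuple[str, str]:
        """Return the (module, component) pair recorded for a technique."""
        return self.placements[technique]
```

Which position a given protocol occupies is a judgment made by whoever uses the framework; the sketch only enforces that the position exists.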
4.1. Top Level Modules | 3.1. Top Level Modules | |||
Telemetry can be applied on the forwarding plane, the control plane, | Telemetry can be applied on the forwarding plane, the control plane, | |||
and the management plane in a network, as well as other sources out | and the management plane in a network, as well as other sources out | |||
of the network, as shown in Figure 1. Therefore, we categorize the | of the network, as shown in Figure 1. Therefore, we categorize the | |||
network telemetry into four distinct modules with each having its own | network telemetry into four distinct modules with each having its own | |||
interface to Network Operation Applications. | interface to Network Operation Applications. | |||
+------------------------------+ | +------------------------------+ | |||
| | | | | | |||
| Network Operation |<-------+ | | Network Operation |<-------+ | |||
skipping to change at page 15, line 39 ¶ | skipping to change at page 15, line 39 ¶ | |||
Figure 1: Modules in Layer Category of NTF | Figure 1: Modules in Layer Category of NTF | |||
The rationale of this partition lies in the different telemetry data | The rationale of this partition lies in the different telemetry data | |||
objects which result in different data source and export locations. | objects which result in different data source and export locations. | |||
Such differences have profound implications on in-network data | Such differences have profound implications on in-network data | |||
programming and processing capability, data encoding and transport | programming and processing capability, data encoding and transport | |||
protocol, and required data bandwidth and latency. Data can be sent | protocol, and required data bandwidth and latency. Data can be sent | |||
directly, or proxied via the control and management planes. There | directly, or proxied via the control and management planes. There | |||
are advantages/disadvantages to both approaches. | are advantages/disadvantages to both approaches. | |||
Note that in some cases the network controller itself may be the | ||||
source of telemetry data that is unique to it or derived from the | ||||
telemetry data collected from the network elements. Some of the | ||||
principles and taxonomy specific to the control plane and management | ||||
plane telemetry could also be applied to the controller when it is | ||||
required to provide the telemetry data to Network Operation | ||||
Applications hosted outside. The scope of the document is focused on | ||||
the network elements telemetry and further details related to | ||||
controllers are thus out of scope. | ||||
We summarize the major differences of the four modules in the | We summarize the major differences of the four modules in the | |||
following table. They are compared from six angles: | following table. They are compared from six angles: | |||
* Data Object | * Data Object | |||
* Data Export Location | * Data Export Location | |||
* Data Model | * Data Model | |||
* Data Encoding | * Data Encoding | |||
* Telemetry Protocol | * Telemetry Application Protocol | |||
* Data Transport Method | ||||
* Transport Method | ||||
Data Object is the target and source of each module. Because the | Data Object is the target and source of each module. Because the | |||
data source varies, the location where data is most conveniently | data source varies, the location where data is most conveniently | |||
exported also varies. For example, forwarding plane data mainly | exported also varies. For example, forwarding plane data mainly | |||
originates as data exported from the forwarding ASICs, while control | originates as data exported from the forwarding Application-Specific | |||
plane data mainly originates from the protocol daemons running on the | Integrated Circuits (ASICs), while control plane data mainly | |||
control CPU(s). For convenience and efficiency, it is preferred to | originates from the protocol daemons running on the control CPU(s). | |||
export the data off the device from locations near the source. | For convenience and efficiency, it is preferred to export the data | |||
Because the locations that can export data have different | off the device from locations near the source. Because the locations | |||
capabilities, different choices of data model, encoding, and | that can export data have different capabilities, different choices | |||
transport method are made to balance the performance and cost. For | of data model, encoding, and transport method are made to balance the | |||
example, the forwarding chip has high throughput but limited capacity | performance and cost. For example, the forwarding chip has high | |||
for processing complex data and maintaining states, while the main | throughput but limited capacity for processing complex data and | |||
control CPU is capable of complex data and state processing, but has | maintaining states, while the main control CPU is capable of complex | |||
limited bandwidth for high throughput data. As a result, the | data and state processing, but has limited bandwidth for high | |||
suitable telemetry protocol for each module can be different. Some | throughput data. As a result, the suitable telemetry protocol for | |||
representative techniques are shown in the corresponding table blocks | each module can be different. Some representative techniques are | |||
to highlight the technical diversity of these modules. Note that the | shown in the corresponding table blocks to highlight the technical | |||
selected techniques just reflect the de facto state of the art and | diversity of these modules. Note that the selected techniques just | |||
are by no means exhaustive (e.g., IPFIX can also be implemented over | reflect the de facto state of the art and are by no means exhaustive | |||
TCP and SCTP but that is not recommended for forwarding plane). The | (e.g., IPFIX can also be implemented over TCP and SCTP, but that is | |||
key point is that one cannot expect to use a universal protocol to | not recommended for forwarding plane). The key point is that one | |||
cover all the network telemetry requirements. | cannot expect to use a universal protocol to cover all the network | |||
telemetry requirements. | ||||
+-----------+-------------+-------------+--------------+----------+ | +-----------+-------------+-------------+--------------+----------+ | |||
| Module |Management |Control |Forwarding |External | | | Module |Management |Control |Forwarding |External | | |||
| |Plane |Plane |Plane |Data | | | |Plane |Plane |Plane |Data | | |||
+-----------+-------------+-------------+--------------+----------+ | +-----------+-------------+-------------+--------------+----------+ | |||
|Object |config. & |control |flow & packet |terminal, | | |Object |config. & |control |flow & packet |terminal, | | |||
| |operation |protocol & |QoS, traffic |social & | | | |operation |protocol & |QoS, traffic |social & | | |||
| |state |signaling, |stat., buffer |environ- | | | |state |signaling, |stat., buffer |environ- | | |||
| | |RIB |& queue stat.,|mental | | | | |RIB |& queue stat.,|mental | | |||
| | | |ACL, FIB | | | | | | |ACL, FIB | | | |||
+-----------+-------------+-------------+--------------+----------+ | +-----------+-------------+-------------+--------------+----------+ | |||
|Export |main control |main control |fwding chip |various | | |Export |main control |main control |fwding chip |various | | |||
|Location |CPU |CPU, |or linecard | | | |Location |CPU |CPU, |or linecard | | | |||
| | |linecard CPU |CPU; main | | | | | |linecard CPU |CPU; main | | | |||
| | |or forwarding|control CPU | | | | | |or forwarding|control CPU | | | |||
| | |chip |unlikely | | | | | |chip |unlikely | | | |||
+-----------+-------------+-------------+--------------+----------+ | +-----------+-------------+-------------+--------------+----------+ | |||
|Data |YANG, MIB, |YANG, |template, |YANG, | | |Data |YANG, MIB, |YANG, |YANG |YANG, | | |||
|Model |syslog |custom |YANG, |custom | | |Model |syslog |custom |custom, |custom | | |||
| | | |custom | | | ||||
+-----------+-------------+-------------+--------------+----------+ | +-----------+-------------+-------------+--------------+----------+ | |||
|Data |GPB, JSON, |GPB, JSON, |plain |GPB, JSON | | |Data |GPB, JSON, |GPB, JSON, |plain text |GPB, JSON | | |||
|Encoding |XML |XML, plain | |XML, plain| | |Encoding |XML |XML, | |XML, plain| | |||
| | |plain text | |text | | ||||
+-----------+-------------+-------------+--------------+----------+ | +-----------+-------------+-------------+--------------+----------+ | |||
|Application|gRPC,NETCONF,|gRPC,NETCONF,|IPFIX, mirror,|gRPC | | |Application|gRPC,NETCONF,|gRPC,NETCONF,|IPFIX, traffic|gRPC | | |||
|Protocol |RESTCONF |IPFIX, mirror|gRPC, NETFLOW | | | |Protocol |RESTCONF |IPFIX,traffic|mirroring, | | | |||
| | |mirroring |gRPC, NETFLOW | | | ||||
+-----------+-------------+-------------+--------------+----------+ | +-----------+-------------+-------------+--------------+----------+ | |||
|Data |HTTP, TCP |HTTP, TCP, |UDP |HTTP,TCP | | |Data |HTTP(S), TCP |HTTP(S), TCP,|UDP |HTTP(S), | | |||
|Transport | |UDP | |UDP | | |Transport | |UDP | |TCP, UDP | | |||
+-----------+-------------+-------------+--------------+----------+ | +-----------+-------------+-------------+--------------+----------+ | |||
Figure 2: Comparison of the Data Object Modules | Figure 2: Comparison of the Data Object Modules | |||
Note that the interaction with the applications that consume network | Note that the interaction with the applications that consume network | |||
telemetry data can be indirect. Some in-device data transfer is | telemetry data can be indirect. Some in-device data transfer is | |||
possible. For example, in the management plane telemetry, the | possible. For example, in the management plane telemetry, the | |||
management plane will need to acquire data from the data plane. Some | management plane will need to acquire data from the data plane. Some | |||
operational states can only be derived from data plane data sources | operational states can only be derived from data plane data sources | |||
such as the interface status and statistics. As another example, | such as the interface status and statistics. As another example, | |||
skipping to change at page 18, line 10 ¶ | skipping to change at page 18, line 10 ¶ | |||
On the other hand, an application may involve more than one plane and | On the other hand, an application may involve more than one plane and | |||
interact with multiple planes simultaneously. For example, an SLA | interact with multiple planes simultaneously. For example, an SLA | |||
compliance application may require both the data plane telemetry and | compliance application may require both the data plane telemetry and | |||
the control plane telemetry. | the control plane telemetry. | |||
The requirements and challenges for each module are summarized as | The requirements and challenges for each module are summarized as | |||
follows (note that the requirements may pertain across all telemetry | follows (note that the requirements may pertain across all telemetry | |||
modules; however, we emphasize those that are most pronounced for a | modules; however, we emphasize those that are most pronounced for a | |||
particular plane). | particular plane). | |||
4.1.1. Management Plane Telemetry | 3.1.1. Management Plane Telemetry | |||
The management plane of network elements interacts with the Network | The management plane of network elements interacts with the Network | |||
Management System (NMS), and provides information such as performance | Management System (NMS), and provides information such as performance | |||
data, network logging data, network warning and defects data, and | data, network logging data, network warning and defects data, and | |||
network statistics and state data. The management plane includes | network statistics and state data. The management plane includes | |||
many protocols, including some that are considered "legacy", such as | many protocols, including some that are considered "legacy", such as | |||
SNMP and syslog. Regardless of the protocol, management plane telemetry | SNMP and syslog. Regardless of the protocol, management plane telemetry | |||
must address the following requirements: | must address the following requirements: | |||
* Convenient Data Subscription: An application should have the | * Convenient Data Subscription: An application should have the | |||
freedom to choose which data is exported (see section 4.3) and the | freedom to choose which data is exported (see section 4.3) and the | |||
means and frequency of how that data is exported (e.g., on-change | means and frequency of how that data is exported (e.g., on-change | |||
or periodic subscription). | or periodic subscription). | |||
* Structured Data: For automatic network operation, machines will | * Structured Data: For automatic network operation, machines will | |||
replace humans in network data comprehension. Data modeling | replace humans in network data comprehension. Data modeling | |||
languages, such as YANG, can efficiently describe structured data | languages, such as YANG, can efficiently describe structured data | |||
and normalize data encoding and transformation. | and normalize data encoding and transformation. | |||
* High Speed Data Transport: In order to keep up with the velocity | * High Speed Data Transport: In order to keep up with the velocity | |||
of information, a server needs to be able to send large amounts of | of information, a data source needs to be able to send large | |||
data at high frequency. Compact encoding formats or data | amounts of data at high frequency. Compact encoding formats or | |||
compression schemes are needed to reduce the quantity of data and | data compression schemes are needed to reduce the quantity of data | |||
improve the data transport efficiency. The subscription mode, by | and improve the data transport efficiency. The subscription mode, | |||
replacing the query mode, reduces the interactions between clients | by replacing the query mode, reduces the interactions between | |||
and servers and helps to improve the server's efficiency. | clients and servers and helps to improve the data source's | |||
efficiency. | ||||
* Network Congestion Avoidance: The application must protect the | * Network Congestion Avoidance: The application must protect the | |||
network from congestion by congestion control mechanisms or at | network from congestion by congestion control mechanisms or at | |||
least circuit breakers. [RFC8084] and [RFC8085] provide some | least circuit breakers. [RFC8084] and [RFC8085] provide some | |||
solutions in this space. | solutions in this space. | |||
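The two subscription modes mentioned under Convenient Data Subscription (on-change and periodic) can be contrasted with a minimal sketch. It assumes samples arrive as (timestamp, value) pairs in time order; the function names are hypothetical and do not correspond to any particular protocol's API.

```python
from typing import Any, Iterable, Iterator, Optional, Tuple

Sample = Tuple[float, Any]  # (timestamp in seconds, observed value)


def on_change(samples: Iterable[Sample]) -> Iterator[Sample]:
    """Export a sample only when its value differs from the last exported one."""
    unset = object()  # sentinel so that None remains a legitimate value
    last: Any = unset
    for ts, value in samples:
        if last is unset or value != last:
            last = value
            yield ts, value


def periodic(samples: Iterable[Sample], interval: float) -> Iterator[Sample]:
    """Export at most one sample per `interval` seconds, changed or not."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    next_due: Optional[float] = None
    for ts, value in samples:
        if next_due is None or ts >= next_due:
            next_due = ts + interval
            yield ts, value
```

On-change export suits state that rarely changes, such as interface status, while periodic export suits counters whose values change continuously.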
4.1.2. Control Plane Telemetry | 3.1.2. Control Plane Telemetry | |||
The control plane telemetry refers to the health condition monitoring | The control plane telemetry refers to the health condition monitoring | |||
of different network control protocols at all layers of the protocol | of different network control protocols at all layers of the protocol | |||
stack. Keeping track of the operational status of these protocols is | stack. Keeping track of the operational status of these protocols is | |||
beneficial for detecting, localizing, and even predicting various | beneficial for detecting, localizing, and even predicting various | |||
network issues, as well as network optimization, in real-time and | network issues, as well as network optimization, in real-time and | |||
with fine granularity. Some particular challenges and issues faced | with fine granularity. Some particular challenges and issues faced | |||
by the control plane telemetry are as follows: | by the control plane telemetry are as follows: | |||
* One challenging problem for the control plane telemetry is how to | * One challenging problem for the control plane telemetry is how to | |||
correlate the End-to-End (E2E) Key Performance Indicators (KPI) to | correlate the End-to-End (E2E) Key Performance Indicators (KPI) to | |||
a specific layer's KPIs. For example, an IPTV user may describe | a specific layer's KPIs. For example, IPTV users may describe | |||
his User Experience (UE) by the video fluency and definition. | their User Experience (UE) by the video smoothness and definition. | |||
Then in case of an unusually poor UE KPI or a service | Then in case of an unusually poor UE KPI or a service | |||
disconnection, it is non-trivial to delimit and pinpoint the issue | disconnection, it is non-trivial to delimit and pinpoint the issue | |||
in the responsible protocol layer (e.g., the Transport Layer or | in the responsible protocol layer (e.g., the Transport Layer or | |||
the Network Layer), the responsible protocol (e.g., ISIS or BGP at | the Network Layer), the responsible protocol (e.g., ISIS or BGP at | |||
the Network Layer), and finally the responsible device(s) with | the Network Layer), and finally the responsible device(s) with | |||
specific reasons. | specific reasons. | |||
* Traditional OAM-based approaches for control plane KPI measurement | * Conventional OAM-based approaches for control plane KPI | |||
include Ping (L3), Traceroute (L3), Y.1731 (L2), and so on. One | measurement include Ping (L3), Traceroute (L3), Y.1731 [y1731] | |||
common issue behind these methods is that they only measure the | (L2), and so on. One common issue behind these methods is that | |||
KPIs instead of reflecting the actual running status of these | they only measure the KPIs instead of reflecting the actual | |||
protocols, making them less effective or efficient for control | running status of these protocols, making them less effective or | |||
plane troubleshooting and network optimization. | efficient for control plane troubleshooting and network | |||
optimization. | ||||
* An example of the control plane telemetry is the BGP monitoring | * An example of the control plane telemetry is the BGP monitoring | |||
protocol (BMP), which is currently used for monitoring the BGP routes | protocol (BMP), which is currently used for monitoring the BGP routes | |||
and enables rich applications, such as BGP peer analysis, AS | and enables rich applications, such as BGP peer analysis, AS | |||
analysis, prefix analysis, and security analysis. However, the | analysis, prefix analysis, and security analysis. However, the | |||
monitoring of other layers, protocols and the cross-layer, cross- | monitoring of other layers, protocols and the cross-layer, cross- | |||
protocol KPI correlations are still in their infancy (e.g., IGP | protocol KPI correlations are still in their infancy (e.g., IGP | |||
monitoring is not as extensive as BMP), which require further | monitoring is not as extensive as BMP), which require further | |||
research. | research. | |||
* The requirement and solutions for network congestion avoidance are | * The requirement and solutions for network congestion avoidance are | |||
also applicable to the control plane telemetry. | also applicable to the control plane telemetry. | |||
4.1.3. Forwarding Plane Telemetry | 3.1.3. Forwarding Plane Telemetry | |||
An effective forwarding plane telemetry system relies on the data | An effective forwarding plane telemetry system relies on the data | |||
that the network device can expose. The quality, quantity, and | that the network device can expose. The quality, quantity, and | |||
timeliness of data must meet some stringent requirements. This | timeliness of data must meet some stringent requirements. This | |||
raises some challenges to the network data plane devices where the | raises some challenges to the network data plane devices where the | |||
first-hand data originates. | first-hand data originates. | |||
* A data plane device's main function is user traffic processing and | * A data plane device's main function is user traffic processing and | |||
forwarding. While supporting network visibility is important, the | forwarding. While supporting network visibility is important, the | |||
telemetry is just an auxiliary function, and it should strive to | telemetry is just an auxiliary function, and it should strive to | |||
skipping to change at page 21, line 8 ¶ | skipping to change at page 21, line 8 ¶ | |||
equipped with advanced telemetry features and provide flexibility to | equipped with advanced telemetry features and provide flexibility to | |||
support customized telemetry functions. | support customized telemetry functions. | |||
Technique Taxonomy: Regarding how one instruments the | Technique Taxonomy: Regarding how one instruments the | |||
telemetry, there are multiple dimensions for classifying | telemetry, there are multiple dimensions for classifying | |||
forwarding plane telemetry techniques. | forwarding plane telemetry techniques. | |||
* Active, Passive, and Hybrid: This dimension concerns the | * Active, Passive, and Hybrid: This dimension concerns the | |||
end-to-end measurement. Active and passive methods (as well as | end-to-end measurement. Active and passive methods (as well as | |||
the hybrid types) are well documented in [RFC7799]. Passive | the hybrid types) are well documented in [RFC7799]. Passive | |||
methods include TCPDUMP, IPFIX [RFC7011], sflow, and traffic | methods include TCPDUMP, IPFIX [RFC7011], sFlow, and traffic | |||
mirroring. These methods usually have low data coverage, and | mirroring. These methods usually have low data coverage, and | |||
improving the coverage comes at a very high bandwidth cost. | improving the coverage comes at a very high bandwidth cost. | |||
On the other hand, active methods include Ping, OWAMP [RFC4656], | On the other hand, active methods include Ping, OWAMP [RFC4656], | |||
TWAMP [RFC5357], STAMP [RFC8762], and Cisco's SLA Protocol | TWAMP [RFC5357], STAMP [RFC8762], and Cisco's SLA Protocol | |||
[RFC6812]. These methods are intrusive and only provide indirect | [RFC6812]. These methods are intrusive and only provide indirect | |||
network measurements. Hybrid methods, including in-situ OAM | network measurements. Hybrid methods, including in-situ OAM | |||
[I-D.ietf-ippm-ioam-data], Alternate-Marking (AM) [RFC8321], and | [I-D.ietf-ippm-ioam-data], Alternate-Marking (AM) [RFC8321], and | |||
Multipoint Alternate Marking [I-D.ietf-ippm-multipoint-alt-mark], | Multipoint Alternate Marking [RFC8889], provide a well-balanced | |||
provide a well-balanced and more flexible approach. However, | and more flexible approach. However, these methods are also more | |||
these methods are also more complex to implement. | complex to implement. | |||
* In-Band and Out-of-Band: Telemetry data carried in user packets | * In-Band and Out-of-Band: Telemetry data carried in user packets | |||
before being exported to a data collector is considered in-band | before being exported to a data collector is considered in-band | |||
(e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]). Telemetry data | (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]). Telemetry data | |||
that is directly exported to a data collector without modifying | that is directly exported to a data collector without modifying | |||
user packets is considered out-of-band (e.g., the postcard-based | user packets is considered out-of-band (e.g., the postcard-based | |||
approach described in Appendix A.3.5). It is also possible to | approach described in Appendix A.3.5). It is also possible to | |||
have hybrid methods, where only the telemetry instruction or | have hybrid methods, where only the telemetry instruction or | |||
partial data is carried by user packets (e.g., AM [RFC8321]). | partial data is carried by user packets (e.g., AM [RFC8321]). | |||
skipping to change at page 21, line 40 ¶ | skipping to change at page 21, line 40 ¶ | |||
at, the network end hosts (e.g., Ping). In-Network methods work | at, the network end hosts (e.g., Ping). In-Network methods work | |||
in networks and are transparent to end hosts. However, if needed, | in networks and are transparent to end hosts. However, if needed, | |||
In-Network methods can be easily extended into end hosts. | In-Network methods can be easily extended into end hosts. | |||
* Data Subject: Depending on the telemetry objective, the methods | * Data Subject: Depending on the telemetry objective, the methods | |||
can be flow-based (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]), | can be flow-based (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]), | |||
path-based (e.g., Traceroute), and node-based (e.g., IPFIX | path-based (e.g., Traceroute), and node-based (e.g., IPFIX | |||
[RFC7011]). The various data objects can be packet, flow record, | [RFC7011]). The various data objects can be packet, flow record, | |||
measurement, states, and signal. | measurement, states, and signal. | |||
4.1.4. External Data Telemetry | 3.1.4. External Data Telemetry | |||
Events that occur outside the boundaries of the network system are | Events that occur outside the boundaries of the network system are | |||
another important source of network telemetry. Correlating both | another important source of network telemetry. Correlating both | |||
internal telemetry data and external events with the requirements of | internal telemetry data and external events with the requirements of | |||
network systems, as presented in | network systems, as presented in | |||
[I-D.pedro-nmrg-anticipated-adaptation], provides a strategic and | [I-D.pedro-nmrg-anticipated-adaptation], provides a strategic and | |||
functional advantage to management operations. | functional advantage to management operations. | |||
As with other sources of telemetry information, the data and events | As with other sources of telemetry information, the data and events | |||
must meet strict requirements, especially in terms of timeliness, | must meet strict requirements, especially in terms of timeliness, | |||
which is essential to properly incorporate external event information | which is essential to properly incorporate external event information | |||
into network management applications. The specific challenges are | into network management applications. The specific challenges are | |||
described as follows: | described as follows: | |||
* The role of the external event detector can be played by multiple | * The role of the external event detector can be played by multiple | |||
elements, including hardware (e.g., physical sensors, such as | elements, including hardware (e.g., physical sensors, such as | |||
seismometers) and software (e.g., Big Data sources that analyze | seismometers) and software (e.g., Big Data sources that can | |||
streams of information, such as Twitter messages). Thus, the | analyze streams of information, such as Twitter messages). Thus, | |||
transmitted data must support different shapes but, at the same | the transmitted data must support different shapes but, at the | |||
time, follow a common but extensible schema. | same time, follow a common but extensible schema. | |||
* Since the main function of the external event detectors is to | * Since the main function of the external event detectors is to | |||
perform the notifications, their timeliness is assumed. However, | perform the notifications, their timeliness is assumed. However, | |||
once messages have been dispatched, they must be quickly collected | once messages have been dispatched, they must be quickly collected | |||
and inserted into the control plane with variable priority, which | and inserted into the control plane with variable priority, which | |||
is higher for important sources and events and lower for secondary | is higher for important sources and events and lower for secondary | |||
ones. | ones. | |||
* The schema used by external detectors must be easily adopted by | * The schema used by external detectors must be easily adopted by | |||
current and future devices and applications. Therefore, it must | current and future devices and applications. Therefore, it must | |||
skipping to change at page 22, line 41 ¶ | skipping to change at page 22, line 41 ¶ | |||
of congestion is even more relevant in this context and proper | of congestion is even more relevant in this context and proper | |||
counter-measures must be taken. Solutions such as network | counter-measures must be taken. Solutions such as network | |||
transport circuit breakers are needed as well. | transport circuit breakers are needed as well. | |||
Organizing both internal and external telemetry information together | Organizing both internal and external telemetry information together | |||
will be key for the general exploitation of the management | will be key for the general exploitation of the management | |||
possibilities of current and future network systems, as reflected in | possibilities of current and future network systems, as reflected in | |||
the incorporation of cognitive capabilities to new hardware and | the incorporation of cognitive capabilities to new hardware and | |||
software (virtual) elements. | software (virtual) elements. | |||
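The requirement above that dispatched external events be collected quickly and inserted with variable priority can be illustrated with a small sketch. It is not part of the framework. The names ExternalEvent and EventIntake, the numeric priority scale (lower value means more important), and the example sources are assumptions made only for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(order=True)
class ExternalEvent:
    # Hypothetical event record; only priority and seq take part in ordering.
    priority: int                      # assumption: lower value = more important
    seq: int                           # tie-breaker that preserves arrival order
    source: str = field(compare=False)
    payload: Dict[str, Any] = field(compare=False, default_factory=dict)


class EventIntake:
    """Collects external events and hands them out by priority."""

    def __init__(self) -> None:
        self._heap: List[ExternalEvent] = []
        self._counter = itertools.count()

    def collect(self, source: str, payload: Dict[str, Any], priority: int) -> None:
        # Push the event; heapq keeps the smallest (priority, seq) at the front.
        event = ExternalEvent(priority, next(self._counter), source, payload)
        heapq.heappush(self._heap, event)

    def next_event(self) -> Optional[ExternalEvent]:
        # Return the most important pending event, or None if nothing is queued.
        return heapq.heappop(self._heap) if self._heap else None


intake = EventIntake()
intake.collect("social-media-feed", {"text": "outage reported"}, priority=5)
intake.collect("seismometer", {"magnitude": 6.1}, priority=0)
first = intake.next_event()
```

In this sketch the seismometer event is delivered first even though it arrived second, and events of equal priority keep their arrival order.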
4.2. Second Level Function Components | 3.2. Second Level Function Components | |||
The telemetry module at each plane can be further partitioned into | The telemetry module at each plane can be further partitioned into | |||
five distinct conceptual components: | five distinct conceptual components: | |||
* Data Query, Analysis, and Storage: This component works at the | * Data Query, Analysis, and Storage: This component works at the | |||
application layer. It is normally a part of the network | application layer. It is normally a part of the network | |||
management system at the receiver side. On the one hand, it is | management system at the receiver side. On the one hand, it is | |||
responsible for issuing data requirements. The data of interest | responsible for issuing data requirements. The data of interest | |||
can be modeled data through configuration or custom data through | can be modeled data through configuration or custom data through | |||
programming. The data requirements can be queries for one-shot | programming. The data requirements can be queries for one-shot | |||
skipping to change at page 24, line 34 ¶ | skipping to change at page 24, line 34 ¶ | |||
| & Processing | | | | | & Processing | | | | |||
| | | | | | | | | | |||
+----------------------------------------| | | | +----------------------------------------| | | | |||
| | | | | | | | | | |||
| Data Object and Source | |-+ | | Data Object and Source | |-+ | |||
| |-+ | | |-+ | |||
+----------------------------------------+ | +----------------------------------------+ | |||
Figure 3: Components in the Network Telemetry Framework | Figure 3: Components in the Network Telemetry Framework | |||
4.3. Data Acquisition Mechanism and Type Abstraction | 3.3. Data Acquisition Mechanism and Type Abstraction | |||
Broadly speaking, network data can be acquired through subscription | Broadly speaking, network data can be acquired through subscription | |||
(push) and query (poll). A subscription is a contract between | (push) and query (poll). A subscription is a contract between | |||
publisher and subscriber. After initial setup, the subscribed data | publisher and subscriber. After initial setup, the subscribed data | |||
is automatically delivered to registered subscribers until the | is automatically delivered to registered subscribers until the | |||
subscription expires. There are two variations of subscription. The | subscription expires. There are two variations of subscription. The | |||
subscriptions can be either pre-defined, or the subscribers are | subscriptions can be either pre-defined, or the subscribers are | |||
allowed to configure and tailor the published data to their specific | allowed to configure and tailor the published data to their specific | |||
needs. | needs. | |||
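The difference between subscription (push) and query (poll) can be sketched in a few lines. This is a toy model rather than any protocol defined by the IETF. The Publisher class, its method names, and the choice to express subscription expiry as a maximum number of updates are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

Callback = Callable[[str, Any], None]


class Publisher:
    """Toy data source supporting both query (poll) and subscription (push)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        # key -> list of [callback, remaining_updates]
        self._subscribers: Dict[str, List[list]] = {}

    def query(self, key: str) -> Any:
        # Poll: the consumer asks and gets the current value once.
        return self._data.get(key)

    def subscribe(self, key: str, callback: Callback, max_updates: int) -> None:
        # Push: after this initial setup, updates are delivered automatically
        # until the subscription expires (modeled here as an update budget).
        self._subscribers.setdefault(key, []).append([callback, max_updates])

    def update(self, key: str, value: Any) -> None:
        self._data[key] = value
        still_active = []
        for callback, remaining in self._subscribers.get(key, []):
            callback(key, value)
            if remaining - 1 > 0:
                still_active.append([callback, remaining - 1])
        self._subscribers[key] = still_active


pub = Publisher()
received: List[Any] = []
pub.subscribe("if0/rx_packets", lambda k, v: received.append(v), max_updates=2)
for count in (10, 20, 30):
    pub.update("if0/rx_packets", count)
```

Here the subscriber receives the first two updates and then the subscription expires, while a later query still returns the latest value on demand.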
skipping to change at page 25, line 35 ¶ | skipping to change at page 25, line 35 ¶ | |||
events, including using Finite State Machine (FSM) or Event | events, including using Finite State Machine (FSM) or Event | |||
Condition Action (ECA) [I-D.wwx-netmod-event-yang]. | Condition Action (ECA) [I-D.wwx-netmod-event-yang]. | |||
* Streaming Data: The data are continuously generated. They can be | * Streaming Data: The data are continuously generated. They can be | |||
time series or the dump of databases. For example, an interface | time series or the dump of databases. For example, an interface | |||
packet counter is exported every second. The streaming data | packet counter is exported every second. The streaming data | |||
reflect real-time network states and metrics and require large | reflect real-time network states and metrics and require large | |||
bandwidth and processing power. The streaming data are always | bandwidth and processing power. The streaming data are always | |||
actively pushed to the subscribers. | actively pushed to the subscribers. | |||
The above data types are not mutually exclusive. Rather, they are | The above telemetry data types are not mutually exclusive. Rather, | |||
often composite. Derived data is composed of simple data; event- | they are often composite. Derived data is composed of simple data; | |||
triggered data can be simple or derived; streaming data can be based | event-triggered data can be simple or derived; streaming data can be | |||
on some recurring event. The relationships of these data types are | based on some recurring event. The relationships of these data types | |||
illustrated in Figure 4. | are illustrated in Figure 4. | |||
+----------------------+ +-----------------+ | +----------------------+ +-----------------+ | |||
| Event-triggered Data |<----+ Streaming Data | | | Event-triggered Data |<----+ Streaming Data | | |||
+-------+---+----------+ +-----+---+-------+ | +-------+---+----------+ +-----+---+-------+ | |||
| | | | | | | | | | |||
| | | | | | | | | | |||
| | +--------------+ | | | | | +--------------+ | | | |||
| +-->| Derived Data |<--+ | | | +-->| Derived Data |<--+ | | |||
| +------+------ + | | | +------+------ + | | |||
| | | | | | | | |||
skipping to change at page 26, line 27 ¶ | skipping to change at page 26, line 27 ¶ | |||
+--------------+ | +--------------+ | |||
Figure 4: Data Type Relationship | Figure 4: Data Type Relationship | |||
Subscription usually deals with event-triggered data and streaming | Subscription usually deals with event-triggered data and streaming | |||
data, and query usually deals with simple data and derived data, but | data, and query usually deals with simple data and derived data, but | |||
other combinations are also possible. Advanced network telemetry | other combinations are also possible. Advanced network telemetry | |||
techniques are designed mainly for event-triggered or streaming data | techniques are designed mainly for event-triggered or streaming data | |||
subscription, and derived data query. | subscription, and derived data query. | |||
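The composition of data types described above can be made concrete with a short sketch. It assumes a raw interface packet counter as the simple data, a packet rate computed from consecutive samples as the derived data, and a threshold crossing as the event trigger. The function names, the sample values, and the threshold are hypothetical.

```python
from typing import List, Tuple


def derive_rate(prev_count: int, curr_count: int, interval_s: float) -> float:
    # Derived data: packets per second computed from two simple counter samples.
    return (curr_count - prev_count) / interval_s


def threshold_events(samples: List[int], interval_s: float,
                     threshold_pps: float) -> List[Tuple[int, float]]:
    # Event-triggered data: report (sample index, rate) only when the derived
    # rate exceeds the threshold.
    events = []
    for i, (prev, curr) in enumerate(zip(samples, samples[1:]), start=1):
        rate = derive_rate(prev, curr, interval_s)
        if rate > threshold_pps:
            events.append((i, rate))
    return events


# Simple data: a counter sampled once per second (a recurring event that
# could equally drive a stream of exports).
counter_samples = [0, 100, 250, 1250, 1300]
events = threshold_events(counter_samples, interval_s=1.0, threshold_pps=500.0)
```

Only the third interval, where the counter jumps by 1000 packets in one second, produces an event in this example.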
4.4. Mapping Existing Mechanisms into the Framework | 3.4. Mapping Existing Mechanisms into the Framework | |||
The following table shows how the existing mechanisms (mainly | The following table shows how the existing mechanisms (mainly | |||
published by the IETF and with an emphasis on the latest | published by the IETF and with an emphasis on the latest | |||
technologies) are positioned in the framework. Given the vast body | technologies) are positioned in the framework. Given the vast body | |||
of existing work, we cannot provide an exhaustive list, so the | of existing work, we cannot provide an exhaustive list, so the | |||
mechanisms in the table should be considered as just examples. | mechanisms in the table should be considered as just examples. | |||
Also, some comprehensive protocols and techniques may cover multiple | Also, some comprehensive protocols and techniques may cover multiple | |||
aspects or modules of the framework, so a name in a block only | aspects or modules of the framework, so a name in a block only | |||
emphasizes one particular characteristic of it. More details about | emphasizes one particular characteristic of it. More details about | |||
some listed mechanisms can be found in Appendix A. | some listed mechanisms can be found in Appendix A. | |||
skipping to change at page 27, line 6 ¶ | skipping to change at page 27, line 6 ¶ | |||
| | YANG-Push | YANG-Push | YANG-Push | | | | YANG-Push | YANG-Push | YANG-Push | | |||
+-------------+-----------------+---------------+--------------+ | +-------------+-----------------+---------------+--------------+ | |||
| data gen. & | MIB, | YANG | IOAM, PSAMP | | | data gen. & | MIB, | YANG | IOAM, PSAMP | | |||
| process | YANG | | PBT, AM, | | | process | YANG | | PBT, AM, | | |||
+-------------+-----------------+---------------+--------------+ | +-------------+-----------------+---------------+--------------+ | |||
| data encode.| gRPC, HTTP, TCP | BMP, TCP | IPFIX, UDP | | | data encode.| gRPC, HTTP, TCP | BMP, TCP | IPFIX, UDP | | |||
| & export | | | | | | & export | | | | | |||
+-------------+-----------------+---------------+--------------+ | +-------------+-----------------+---------------+--------------+ | |||
Figure 5: Existing Work Mapping | Figure 5: Existing Work Mapping | |||
5. Evolution of Network Telemetry Applications | Although the framework is generally suitable for any network | |||
environment, multi-domain telemetry has some unique challenges | ||||
that deserve further architectural consideration, which is out of | ||||
the scope of this document. | ||||
4. Evolution of Network Telemetry Applications | ||||
Network telemetry is an evolving technical area. As the network | Network telemetry is an evolving technical area. As the network | |||
moves towards automated operation, network telemetry applications | moves towards automated operation, network telemetry applications | |||
undergo several stages of evolution that add new layers of | undergo several stages of evolution that add new layers of | |||
requirements to the underlying network telemetry techniques. Each | requirements to the underlying network telemetry techniques. Each | |||
stage is built upon the techniques adopted by the previous stages | stage is built upon the techniques adopted by the previous stages | |||
plus some new requirements. | plus some new requirements. | |||
Stage 0 - Static Telemetry: The telemetry data source and type are | Stage 0 - Static Telemetry: The telemetry data source and type are | |||
determined at design time. The network operator can only | determined at design time. The network operator can only | |||
skipping to change at page 27, line 44 ¶ | skipping to change at page 28, line 5 ¶ | |||
issues the telemetry data requests, analyzes the data, and updates | issues the telemetry data requests, analyzes the data, and updates | |||
the network operations in closed control loops. | the network operations in closed control loops. | |||
Existing technologies are ready for stage 0 and stage 1. Individual | Existing technologies are ready for stage 0 and stage 1. Individual | |||
stage 2 and stage 3 applications are also possible now. However, the | stage 2 and stage 3 applications are also possible now. However, the | |||
future autonomic networks may need a comprehensive operation | future autonomic networks may need a comprehensive operation | |||
management system which works at stage 2 and stage 3 to cover all the | management system which works at stage 2 and stage 3 to cover all the | |||
network operation tasks. A well-defined network telemetry framework | network operation tasks. A well-defined network telemetry framework | |||
is the first step in this direction. | is the first step in this direction. | |||
6. Security Considerations | 5. Security Considerations | |||
The complexity of network telemetry raises significant security | The complexity of network telemetry raises significant security | |||
implications. For example, telemetry data can be manipulated to | implications. For example, telemetry data can be manipulated to | |||
exhaust various network resources at each plane as well as the data | exhaust various network resources at each plane as well as the data | |||
consumer; falsified or tampered data can mislead the decision-making | consumer; falsified or tampered data can mislead the decision-making | |||
and paralyze networks; wrong configuration and programming for | and paralyze networks; wrong configuration and programming for | |||
telemetry is equally harmful. Telemetry data is highly | telemetry is equally harmful. Telemetry data is highly | |||
sensitive because it exposes a lot of information about the network and | sensitive because it exposes a lot of information about the network and | |||
its configuration. Some of that information can make designing | its configuration. Some of that information can make designing | |||
attacks against the network much easier (e.g., exact details of what | attacks against the network much easier (e.g., exact details of what | |||
skipping to change at page 29, line 19 ¶ | skipping to change at page 29, line 19 ¶ | |||
Access Control and Event-Condition-Action policies. Also, potential | Access Control and Event-Condition-Action policies. Also, potential | |||
conflicts between network telemetry mechanisms must be detected | conflicts between network telemetry mechanisms must be detected | |||
accurately and resolved quickly to avoid unnecessary network | accurately and resolved quickly to avoid unnecessary network | |||
telemetry traffic propagation escalating into an unintended or | telemetry traffic propagation escalating into an unintended or | |||
intended denial of service attack. | intended denial of service attack. | |||
Further study of the security issues will be required, and it is | Further study of the security issues will be required, and it is | |||
expected that the security mechanisms and protocols are developed and | expected that the security mechanisms and protocols are developed and | |||
deployed along with a network telemetry system. | deployed along with a network telemetry system. | |||
In addition to security, privacy is also an important issue. Large- | 6. IANA Considerations | |||
scale network data collection is a major threat to user privacy | ||||
[RFC7258]. The Network Telemetry Framework is not applicable to | ||||
networks whose endpoints represent individual users, such as general- | ||||
purpose access networks. Any collection or retention of data in | ||||
those networks must be tightly limited to protect user privacy. | ||||
7. IANA Considerations | ||||
This document includes no request to IANA. | This document includes no request to IANA. | |||
8. Contributors | 7. Contributors | |||
The other contributors of this document are listed as follows. | ||||
* Tianran Zhou | ||||
* Zhenbin Li | ||||
* Zhenqiang Li | ||||
* Daniel King | ||||
* Adrian Farrel | ||||
* Alexander Clemm | The other contributors of this document are Tianran Zhou, Zhenbin Li, | |||
Zhenqiang Li, Daniel King, Adrian Farrel, and Alexander Clemm | ||||
9. Acknowledgments | 8. Acknowledgments | |||
We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe | We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe | |||
Clarke, Victor Liu, James Guichard, Uri Blumenthal, Giuseppe | Clarke, Victor Liu, James Guichard, Uri Blumenthal, Giuseppe | |||
Fioccola, Yunan Gu, Parviz Yegani, Young Lee, Qin Wu, Gyan Mishra, | Fioccola, Yunan Gu, Parviz Yegani, Young Lee, Qin Wu, Gyan Mishra, | |||
Ben Schwartz, Alexey Melnikov, Michael Scharf, and many others who | Ben Schwartz, Alexey Melnikov, Michael Scharf, Dhruv Dhody, Martin | |||
have provided helpful comments and suggestions to improve this | Duke, and many others who have provided helpful comments and | |||
document. | suggestions to improve this document. | |||
10. Informative References | 9. Informative References | |||
[gnmi] "gNMI - gRPC Network Management Interface", | [gnmi] "gNMI - gRPC Network Management Interface", | |||
<https://github.com/openconfig/reference/tree/master/rpc/ | <https://github.com/openconfig/reference/tree/master/rpc/ | |||
gnmi>. | gnmi>. | |||
[gpb] "Google Protocol Buffers", | ||||
<https://developers.google.com/protocol-buffers>. | ||||
[grpc] "gRPC, A high performance, open-source universal RPC | [grpc] "gRPC, A high performance, open-source universal RPC | |||
framework", <https://grpc.io>. | framework", <https://grpc.io>. | |||
[I-D.ietf-grow-bmp-adj-rib-out] | ||||
Evens, T., Bayraktar, S., Lucente, P., Mi, P., and S. | ||||
Zhuang, "Support for Adj-RIB-Out in the BGP Monitoring | ||||
Protocol (BMP)", Work in Progress, Internet-Draft, draft- | ||||
ietf-grow-bmp-adj-rib-out-07, 5 August 2019, | ||||
<https://www.ietf.org/archive/id/draft-ietf-grow-bmp-adj- | ||||
rib-out-07.txt>. | ||||
[I-D.ietf-grow-bmp-local-rib] | [I-D.ietf-grow-bmp-local-rib] | |||
Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, | Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, | |||
"Support for Local RIB in BGP Monitoring Protocol (BMP)", | "Support for Local RIB in BGP Monitoring Protocol (BMP)", | |||
Work in Progress, Internet-Draft, draft-ietf-grow-bmp- | Work in Progress, Internet-Draft, draft-ietf-grow-bmp- | |||
local-rib-13, 31 August 2021, | local-rib-13, 31 August 2021, | |||
<https://www.ietf.org/archive/id/draft-ietf-grow-bmp- | <https://www.ietf.org/archive/id/draft-ietf-grow-bmp- | |||
local-rib-13.txt>. | local-rib-13.txt>. | |||
[I-D.ietf-ippm-ioam-data] | [I-D.ietf-ippm-ioam-data] | |||
Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields | Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields | |||
for In-situ OAM", Work in Progress, Internet-Draft, draft- | for In-situ OAM", Work in Progress, Internet-Draft, draft- | |||
ietf-ippm-ioam-data-16, 8 November 2021, | ietf-ippm-ioam-data-16, 8 November 2021, | |||
<https://www.ietf.org/archive/id/draft-ietf-ippm-ioam- | <https://www.ietf.org/archive/id/draft-ietf-ippm-ioam- | |||
data-16.txt>. | data-16.txt>. | |||
[I-D.ietf-ippm-multipoint-alt-mark] | [I-D.ietf-ippm-ioam-direct-export] | |||
Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto, | Song, H., Gafni, B., Zhou, T., Li, Z., Brockners, F., | |||
"Multipoint Alternate-Marking Method for Passive and | Bhandari, S., Sivakolundu, R., and T. Mizrahi, "In-situ | |||
Hybrid Performance Monitoring", Work in Progress, | OAM Direct Exporting", Work in Progress, Internet-Draft, | |||
Internet-Draft, draft-ietf-ippm-multipoint-alt-mark-09, 23 | draft-ietf-ippm-ioam-direct-export-07, 13 October 2021, | |||
March 2020, <https://www.ietf.org/archive/id/draft-ietf- | <https://www.ietf.org/archive/id/draft-ietf-ippm-ioam- | |||
ippm-multipoint-alt-mark-09.txt>. | direct-export-07.txt>. | |||
[I-D.ietf-netconf-distributed-notif] | [I-D.ietf-netconf-distributed-notif] | |||
Zhou, T., Zheng, G., Voit, E., Graf, T., and P. Francois, | Zhou, T., Zheng, G., Voit, E., Graf, T., and P. Francois, | |||
"Subscription to Distributed Notifications", Work in | "Subscription to Distributed Notifications", Work in | |||
Progress, Internet-Draft, draft-ietf-netconf-distributed- | Progress, Internet-Draft, draft-ietf-netconf-distributed- | |||
notif-02, 6 May 2021, <https://www.ietf.org/archive/id/ | notif-02, 6 May 2021, <https://www.ietf.org/archive/id/ | |||
draft-ietf-netconf-distributed-notif-02.txt>. | draft-ietf-netconf-distributed-notif-02.txt>. | |||
[I-D.ietf-netconf-udp-notif] | [I-D.ietf-netconf-udp-notif] | |||
Zheng, G., Zhou, T., Graf, T., Francois, P., Feng, A. H., | Zheng, G., Zhou, T., Graf, T., Francois, P., Feng, A. H., | |||
skipping to change at page 31, line 28 ¶ | skipping to change at page 31, line 5 ¶ | |||
notif-04.txt>. | notif-04.txt>. | |||
[I-D.irtf-nmrg-ibn-concepts-definitions] | [I-D.irtf-nmrg-ibn-concepts-definitions] | |||
Clemm, A., Ciavaglia, L., Granville, L. Z., and J. | Clemm, A., Ciavaglia, L., Granville, L. Z., and J. | |||
Tantsura, "Intent-Based Networking - Concepts and | Tantsura, "Intent-Based Networking - Concepts and | |||
Definitions", Work in Progress, Internet-Draft, draft- | Definitions", Work in Progress, Internet-Draft, draft- | |||
irtf-nmrg-ibn-concepts-definitions-05, 2 September 2021, | irtf-nmrg-ibn-concepts-definitions-05, 2 September 2021, | |||
<https://www.ietf.org/archive/id/draft-irtf-nmrg-ibn- | <https://www.ietf.org/archive/id/draft-irtf-nmrg-ibn- | |||
concepts-definitions-05.txt>. | concepts-definitions-05.txt>. | |||
[I-D.kumar-rtgwg-grpc-protocol] | ||||
Kumar, A., Kolhe, J., Ghemawat, S., and L. Ryan, "gRPC | ||||
Protocol", Work in Progress, Internet-Draft, draft-kumar- | ||||
rtgwg-grpc-protocol-00, 8 July 2016, | ||||
<https://www.ietf.org/archive/id/draft-kumar-rtgwg-grpc- | ||||
protocol-00.txt>. | ||||
[I-D.openconfig-rtgwg-gnmi-spec] | ||||
Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack, | ||||
C., and C. Morrow, "gRPC Network Management Interface | ||||
(gNMI)", Work in Progress, Internet-Draft, draft- | ||||
openconfig-rtgwg-gnmi-spec-01, 5 March 2018, | ||||
<https://www.ietf.org/archive/id/draft-openconfig-rtgwg- | ||||
gnmi-spec-01.txt>. | ||||
[I-D.pedro-nmrg-anticipated-adaptation] | [I-D.pedro-nmrg-anticipated-adaptation] | |||
Martinez-Julia, P., "Exploiting External Event Detectors | Martinez-Julia, P., "Exploiting External Event Detectors | |||
to Anticipate Resource Requirements for the Elastic | to Anticipate Resource Requirements for the Elastic | |||
Adaptation of SDN/NFV Systems", Work in Progress, | Adaptation of SDN/NFV Systems", Work in Progress, | |||
Internet-Draft, draft-pedro-nmrg-anticipated-adaptation- | Internet-Draft, draft-pedro-nmrg-anticipated-adaptation- | |||
02, 29 June 2018, <https://www.ietf.org/archive/id/draft- | 02, 29 June 2018, <https://www.ietf.org/archive/id/draft- | |||
pedro-nmrg-anticipated-adaptation-02.txt>. | pedro-nmrg-anticipated-adaptation-02.txt>. | |||
[I-D.song-ippm-postcard-based-telemetry] | [I-D.song-ippm-postcard-based-telemetry] | |||
Song, H., Mirsky, G., Filsfils, C., Abdelsalam, A., Zhou, | Song, H., Mirsky, G., Filsfils, C., Abdelsalam, A., Zhou, | |||
T., Li, Z., Shin, J., and K. Lee, "Postcard-based On-Path | T., Li, Z., Shin, J., and K. Lee, "In-Situ OAM Marking- | |||
Flow Data Telemetry using Packet Marking", Work in | based Direct Export", Work in Progress, Internet-Draft, | |||
Progress, Internet-Draft, draft-song-ippm-postcard-based- | draft-song-ippm-postcard-based-telemetry-11, 15 November | |||
telemetry-10, 9 July 2021, | 2021, <https://www.ietf.org/archive/id/draft-song-ippm- | |||
<https://www.ietf.org/archive/id/draft-song-ippm-postcard- | postcard-based-telemetry-11.txt>. | |||
based-telemetry-10.txt>. | ||||
[I-D.song-opsawg-dnp4iq] | [I-D.song-opsawg-dnp4iq] | |||
Song, H. and J. Gong, "Requirements for Interactive Query | Song, H. and J. Gong, "Requirements for Interactive Query | |||
with Dynamic Network Probes", Work in Progress, Internet- | with Dynamic Network Probes", Work in Progress, Internet- | |||
Draft, draft-song-opsawg-dnp4iq-01, 19 June 2017, | Draft, draft-song-opsawg-dnp4iq-01, 19 June 2017, | |||
<https://www.ietf.org/archive/id/draft-song-opsawg-dnp4iq- | <https://www.ietf.org/archive/id/draft-song-opsawg-dnp4iq- | |||
01.txt>. | 01.txt>. | |||
[I-D.song-opsawg-ifit-framework] | [I-D.song-opsawg-ifit-framework] | |||
Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "In- | Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "In- | |||
skipping to change at page 33, line 5 ¶ | skipping to change at page 32, line 9 ¶ | |||
[RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J. | [RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J. | |||
Schoenwaelder, Ed., "Structure of Management Information | Schoenwaelder, Ed., "Structure of Management Information | |||
Version 2 (SMIv2)", STD 58, RFC 2578, | Version 2 (SMIv2)", STD 58, RFC 2578, | |||
DOI 10.17487/RFC2578, April 1999, | DOI 10.17487/RFC2578, April 1999, | |||
<https://www.rfc-editor.org/info/rfc2578>. | <https://www.rfc-editor.org/info/rfc2578>. | |||
[RFC2981] Kavasseri, R., Ed., "Event MIB", RFC 2981, | [RFC2981] Kavasseri, R., Ed., "Event MIB", RFC 2981, | |||
DOI 10.17487/RFC2981, October 2000, | DOI 10.17487/RFC2981, October 2000, | |||
<https://www.rfc-editor.org/info/rfc2981>. | <https://www.rfc-editor.org/info/rfc2981>. | |||
[RFC3176] Phaal, P., Panchen, S., and N. McKee, "InMon Corporation's | ||||
sFlow: A Method for Monitoring Traffic in Switched and | ||||
Routed Networks", RFC 3176, DOI 10.17487/RFC3176, | ||||
September 2001, <https://www.rfc-editor.org/info/rfc3176>. | ||||
[RFC3414] Blumenthal, U. and B. Wijnen, "User-based Security Model | ||||
(USM) for version 3 of the Simple Network Management | ||||
Protocol (SNMPv3)", STD 62, RFC 3414, | ||||
DOI 10.17487/RFC3414, December 2002, | ||||
<https://www.rfc-editor.org/info/rfc3414>. | ||||
[RFC3416] Presuhn, R., Ed., "Version 2 of the Protocol Operations | [RFC3416] Presuhn, R., Ed., "Version 2 of the Protocol Operations | |||
for the Simple Network Management Protocol (SNMP)", | for the Simple Network Management Protocol (SNMP)", | |||
STD 62, RFC 3416, DOI 10.17487/RFC3416, December 2002, | STD 62, RFC 3416, DOI 10.17487/RFC3416, December 2002, | |||
<https://www.rfc-editor.org/info/rfc3416>. | <https://www.rfc-editor.org/info/rfc3416>. | |||
[RFC3594] Duffy, P., "PacketCable Security Ticket Control Sub-Option | ||||
for the DHCP CableLabs Client Configuration (CCC) Option", | ||||
RFC 3594, DOI 10.17487/RFC3594, September 2003, | ||||
<https://www.rfc-editor.org/info/rfc3594>. | ||||
[RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management | [RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management | |||
Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, | Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, | |||
September 2004, <https://www.rfc-editor.org/info/rfc3877>. | September 2004, <https://www.rfc-editor.org/info/rfc3877>. | |||
[RFC3954] Claise, B., Ed., "Cisco Systems NetFlow Services Export | ||||
Version 9", RFC 3954, DOI 10.17487/RFC3954, October 2004, | ||||
<https://www.rfc-editor.org/info/rfc3954>. | ||||
[RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. | [RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. | |||
Zekauskas, "A One-way Active Measurement Protocol | Zekauskas, "A One-way Active Measurement Protocol | |||
(OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006, | (OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006, | |||
<https://www.rfc-editor.org/info/rfc4656>. | <https://www.rfc-editor.org/info/rfc4656>. | |||
[RFC5085] Nadeau, T., Ed. and C. Pignataro, Ed., "Pseudowire Virtual | [RFC5085] Nadeau, T., Ed. and C. Pignataro, Ed., "Pseudowire Virtual | |||
Circuit Connectivity Verification (VCCV): A Control | Circuit Connectivity Verification (VCCV): A Control | |||
Channel for Pseudowires", RFC 5085, DOI 10.17487/RFC5085, | Channel for Pseudowires", RFC 5085, DOI 10.17487/RFC5085, | |||
December 2007, <https://www.rfc-editor.org/info/rfc5085>. | December 2007, <https://www.rfc-editor.org/info/rfc5085>. | |||
[RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. | [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. | |||
Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", | Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", | |||
RFC 5357, DOI 10.17487/RFC5357, October 2008, | RFC 5357, DOI 10.17487/RFC5357, October 2008, | |||
<https://www.rfc-editor.org/info/rfc5357>. | <https://www.rfc-editor.org/info/rfc5357>. | |||
[RFC5424] Gerhards, R., "The Syslog Protocol", RFC 5424, | ||||
DOI 10.17487/RFC5424, March 2009, | ||||
<https://www.rfc-editor.org/info/rfc5424>. | ||||
[RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for | [RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for | |||
the Network Configuration Protocol (NETCONF)", RFC 6020, | the Network Configuration Protocol (NETCONF)", RFC 6020, | |||
DOI 10.17487/RFC6020, October 2010, | DOI 10.17487/RFC6020, October 2010, | |||
<https://www.rfc-editor.org/info/rfc6020>. | <https://www.rfc-editor.org/info/rfc6020>. | |||
[RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., | [RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., | |||
and A. Bierman, Ed., "Network Configuration Protocol | and A. Bierman, Ed., "Network Configuration Protocol | |||
(NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, | (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, | |||
<https://www.rfc-editor.org/info/rfc6241>. | <https://www.rfc-editor.org/info/rfc6241>. | |||
skipping to change at page 35, line 29 ¶ | skipping to change at page 34, line 46 ¶ | |||
[RFC8639] Voit, E., Clemm, A., Gonzalez Prieto, A., Nilsen-Nygaard, | [RFC8639] Voit, E., Clemm, A., Gonzalez Prieto, A., Nilsen-Nygaard, | |||
E., and A. Tripathy, "Subscription to YANG Notifications", | E., and A. Tripathy, "Subscription to YANG Notifications", | |||
RFC 8639, DOI 10.17487/RFC8639, September 2019, | RFC 8639, DOI 10.17487/RFC8639, September 2019, | |||
<https://www.rfc-editor.org/info/rfc8639>. | <https://www.rfc-editor.org/info/rfc8639>. | |||
[RFC8641] Clemm, A. and E. Voit, "Subscription to YANG Notifications | [RFC8641] Clemm, A. and E. Voit, "Subscription to YANG Notifications | |||
for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641, | for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641, | |||
September 2019, <https://www.rfc-editor.org/info/rfc8641>. | September 2019, <https://www.rfc-editor.org/info/rfc8641>. | |||
[RFC8671] Evens, T., Bayraktar, S., Lucente, P., Mi, P., and S. | ||||
Zhuang, "Support for Adj-RIB-Out in the BGP Monitoring | ||||
Protocol (BMP)", RFC 8671, DOI 10.17487/RFC8671, November | ||||
2019, <https://www.rfc-editor.org/info/rfc8671>. | ||||
[RFC8762] Mirsky, G., Jun, G., Nydell, H., and R. Foote, "Simple | [RFC8762] Mirsky, G., Jun, G., Nydell, H., and R. Foote, "Simple | |||
Two-Way Active Measurement Protocol", RFC 8762, | Two-Way Active Measurement Protocol", RFC 8762, | |||
DOI 10.17487/RFC8762, March 2020, | DOI 10.17487/RFC8762, March 2020, | |||
<https://www.rfc-editor.org/info/rfc8762>. | <https://www.rfc-editor.org/info/rfc8762>. | |||
[RFC8889] Fioccola, G., Ed., Cociglio, M., Sapio, A., and R. Sisto, | ||||
"Multipoint Alternate-Marking Method for Passive and | ||||
Hybrid Performance Monitoring", RFC 8889, | ||||
DOI 10.17487/RFC8889, August 2020, | ||||
<https://www.rfc-editor.org/info/rfc8889>. | ||||
[RFC8924] Aldrin, S., Pignataro, C., Ed., Kumar, N., Ed., Krishnan, | [RFC8924] Aldrin, S., Pignataro, C., Ed., Kumar, N., Ed., Krishnan, | |||
R., and A. Ghanwani, "Service Function Chaining (SFC) | R., and A. Ghanwani, "Service Function Chaining (SFC) | |||
Operations, Administration, and Maintenance (OAM) | Operations, Administration, and Maintenance (OAM) | |||
Framework", RFC 8924, DOI 10.17487/RFC8924, October 2020, | Framework", RFC 8924, DOI 10.17487/RFC8924, October 2020, | |||
<https://www.rfc-editor.org/info/rfc8924>. | <https://www.rfc-editor.org/info/rfc8924>. | |||
[xml] "Extensible Markup Language (XML) 1.0 (Fifth Edition)", | [xml] "Extensible Markup Language (XML) 1.0 (Fifth Edition)", | |||
<https://www.w3.org/TR/2008/REC-xml-20081126/>. | <https://www.w3.org/TR/2008/REC-xml-20081126/>. | |||
[y1731] "ITU-T Y.1731: OAM Functions and Mechanisms for Ethernet | ||||
based networks, 2015", | ||||
<https://www.itu.int/rec/T-REC-Y.1731/en>. | ||||
Appendix A. A Survey on Existing Network Telemetry Techniques | Appendix A. A Survey on Existing Network Telemetry Techniques | |||
In this non-normative appendix, we provide an overview of some | In this non-normative appendix, we provide an overview of some | |||
existing techniques and standard proposals for each network telemetry | existing techniques and standard proposals for each network telemetry | |||
module. | module. | |||
A.1. Management Plane Telemetry | A.1. Management Plane Telemetry | |||
A.1.1. Push Extensions for NETCONF | A.1.1. Push Extensions for NETCONF | |||
NETCONF [RFC6241] is a popular network management protocol | NETCONF [RFC6241] is a popular network management protocol | |||
recommended by the IETF. Its core strength is managing | recommended by the IETF. Its core strength is managing | |||
configuration, but it can also be used for data collection. YANG-Push | configuration, but it can also be used for data collection. YANG-Push | |||
[RFC8641] [RFC8639] extends NETCONF and enables subscriber | [RFC8641] [RFC8639] extends NETCONF and enables subscriber | |||
applications to request a continuous, customized stream of updates | applications to request a continuous, customized stream of updates | |||
from a YANG datastore. Providing such visibility into changes made | from a YANG datastore. Providing such visibility into changes made | |||
upon YANG configuration and operational objects enables new | upon YANG configuration and operational objects enables new | |||
capabilities based on the remote mirroring of configuration and | capabilities based on the remote mirroring of configuration and | |||
operational state. Moreover, a distributed data collection mechanism | operational state. Moreover, a distributed data collection mechanism | |||
[I-D.ietf-netconf-distributed-notif] via a UDP-based publication | [I-D.ietf-netconf-distributed-notif] via a UDP-based publication | |||
channel [I-D.ietf-netconf-udp-notif] provides enhanced efficiency for | channel [I-D.ietf-netconf-udp-notif] provides enhanced efficiency for | |||
NETCONF-based telemetry. | NETCONF-based telemetry. | |||
A.1.2. gRPC Network Management Interface | A.1.2. gRPC Network Management Interface | |||
gRPC Network Management Interface (gNMI) | gRPC Network Management Interface (gNMI) [gnmi] is a network | |||
[I-D.openconfig-rtgwg-gnmi-spec] is a network management protocol | management protocol based on the gRPC [grpc] RPC (Remote Procedure | |||
based on the gRPC [I-D.kumar-rtgwg-grpc-protocol] RPC (Remote | Call) framework. With a single gRPC service definition, both | |||
Procedure Call) framework. With a single gRPC service definition, | configuration and telemetry can be covered. gRPC is an HTTP/2 | |||
both configuration and telemetry can be covered. gRPC is an HTTP/2 | [RFC7540]-based open-source micro-service communication framework. | |||
[RFC7540] based open-source micro-service communication framework. | ||||
It provides a number of capabilities which are well-suited for | It provides a number of capabilities which are well-suited for | |||
network telemetry, including: | network telemetry, including: | |||
* Full-duplex streaming transport model combined with a binary | * Full-duplex streaming transport model combined with a binary | |||
encoding mechanism provides good telemetry efficiency. | encoding mechanism provides good telemetry efficiency. | |||
* gRPC provides consistent higher-level features across platforms, | * gRPC provides consistent higher-level features across platforms, | |||
which common HTTP/2 libraries typically do not. This | which common HTTP/2 libraries typically do not. This | |||
characteristic is especially valuable because telemetry | characteristic is especially valuable because telemetry | |||
data collectors normally reside on a large variety of platforms. | data collectors normally reside on a large variety of platforms. | |||
skipping to change at page 37, line 4 ¶ | skipping to change at page 36, line 37 ¶ | |||
A.2.1. BGP Monitoring Protocol | A.2.1. BGP Monitoring Protocol | |||
BGP Monitoring Protocol (BMP) [RFC7854] is used to monitor BGP | BGP Monitoring Protocol (BMP) [RFC7854] is used to monitor BGP | |||
sessions and is intended to provide a convenient interface for | sessions and is intended to provide a convenient interface for | |||
obtaining route views. | obtaining route views. | |||
The BGP routing information is collected from the monitored device(s) | The BGP routing information is collected from the monitored device(s) | |||
to the BMP monitoring station by setting up the BMP TCP session. The | to the BMP monitoring station by setting up the BMP TCP session. The | |||
BGP peers are monitored by the BMP Peer Up and Peer Down | BGP peers are monitored by the BMP Peer Up and Peer Down | |||
Notifications. The BGP routes (including Adjacency_RIB_In [RFC7854], | Notifications. The BGP routes (including Adjacency_RIB_In [RFC7854], | |||
Adjacency_RIB_out [I-D.ietf-grow-bmp-adj-rib-out], and Local_Rib | Adjacency_RIB_out [RFC8671], and Local_Rib | |||
[I-D.ietf-grow-bmp-local-rib]) are encapsulated in the BMP Route | [I-D.ietf-grow-bmp-local-rib]) are encapsulated in the BMP Route | |||
Monitoring Message and the BMP Route Mirroring Message, providing | Monitoring Message and the BMP Route Mirroring Message, providing | |||
both an initial table dump and real-time route updates. In addition, | both an initial table dump and real-time route updates. In addition, | |||
BGP statistics are reported through the BMP Stats Report Message, | BGP statistics are reported through the BMP Stats Report Message, | |||
which could be either timer triggered or event-driven. Future BMP | which could be either timer triggered or event-driven. Future BMP | |||
extensions could further enrich BGP monitoring applications. | extensions could further enrich BGP monitoring applications. | |||
A.3. Data Plane Telemetry | A.3. Data Plane Telemetry | |||
A.3.1. The Alternate Marking (AM) technology | A.3.1. The Alternate Marking (AM) technology | |||
The Alternate Marking method enables efficient measurements of packet | The Alternate Marking method enables efficient measurements of packet | |||
loss, delay, and jitter both in IP and Overlay Networks, as presented | loss, delay, and jitter both in IP and Overlay Networks, as presented | |||
in [RFC8321] and [I-D.ietf-ippm-multipoint-alt-mark]. | in [RFC8321] and [RFC8889]. | |||
This technique can be applied to point-to-point and multipoint-to- | This technique can be applied to point-to-point and multipoint-to- | |||
multipoint flows. Alternate Marking creates batches of packets by | multipoint flows. Alternate Marking creates batches of packets by | |||
alternating the value of 1 bit (or a label) of the packet header. | alternating the value of 1 bit (or a label) of the packet header. | |||
These batches of packets are unambiguously recognized over the | These batches of packets are unambiguously recognized over the | |||
network and the comparison of packet counters for each batch allows | network and the comparison of packet counters for each batch allows | |||
the packet loss calculation. The same idea can be applied to delay | the packet loss calculation. The same idea can be applied to delay | |||
measurement by selecting ad hoc packets with a marking bit dedicated | measurement by selecting ad hoc packets with a marking bit dedicated | |||
for delay measurements. | for delay measurements. | |||
The Alternate Marking method needs two counters per marking period for | The Alternate Marking method needs two counters per marking period for | |||
each monitored flow. For instance, with n measurement | each monitored flow. For instance, with n measurement | |||
points and m monitored flows, the number of packet | points and m monitored flows, the number of packet | |||
counters for each time interval is on the order of n*m*2 (1 per color). | counters for each time interval is on the order of n*m*2 (1 per color). | |||
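As an illustration only, the batch comparison described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure specified in [RFC8321]: the names BatchCount, packet_loss, and counter_count are hypothetical, and the sample packet counts are invented for the example. It assumes each measurement point reports one packet count per marking period and color for a single flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchCount:
    """One counter reading at a measurement point (hypothetical structure)."""
    period: int   # index of the marking period
    color: int    # value of the marking bit for this batch (0 or 1)
    packets: int  # packets seen with that color during the period


def packet_loss(upstream, downstream):
    """Compare per-batch counters from two measurement points.

    Returns {period: packets lost between the two points} for every
    batch that both points reported.
    """
    seen_downstream = {(b.period, b.color): b.packets for b in downstream}
    loss = {}
    for batch in upstream:
        key = (batch.period, batch.color)
        if key in seen_downstream:
            loss[batch.period] = batch.packets - seen_downstream[key]
    return loss


def counter_count(points, flows):
    """Order of magnitude of counters per interval: n*m*2 (1 per color)."""
    return points * flows * 2


# Invented sample data: two marking periods with alternating colors.
upstream = [BatchCount(0, 0, 1000), BatchCount(1, 1, 980)]
downstream = [BatchCount(0, 0, 998), BatchCount(1, 1, 980)]
losses = packet_loss(upstream, downstream)
```

With this sample data, two packets of the first batch are missing at the downstream point and none of the second. Because the color alternates between consecutive periods, each batch can be matched across measurement points by its period and color.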
Since networks offer rich sets of network performance measurement | Since networks offer rich sets of network performance measurement | |||
data (e.g., packet counters), traditional approaches run into | data (e.g., packet counters), conventional approaches run into | |||
limitations. The bottleneck is the generation and export of the data | limitations. The bottleneck is the generation and export of the data | |||
and the amount of data that can be reasonably collected from the | and the amount of data that can be reasonably collected from the | |||
network. In addition, management tasks related to determining and | network. In addition, management tasks related to determining and | |||
configuring which data to generate lead to significant deployment | configuring which data to generate lead to significant deployment | |||
challenges. | challenges. | |||
The Multipoint Alternate Marking approach, described in | The Multipoint Alternate Marking approach, described in [RFC8889], | |||
[I-D.ietf-ippm-multipoint-alt-mark], aims to resolve this issue and | aims to resolve this issue and make the performance monitoring more | |||
make the performance monitoring more flexible in case a detailed | flexible in case a detailed analysis is not needed. | |||
analysis is not needed. | ||||
An application orchestrates network performance measurement tasks | An application orchestrates network performance measurement tasks | |||
across the network to allow for optimized monitoring. The | across the network to allow for optimized monitoring. The | |||
application can choose how roughly or precisely to configure | application can choose how roughly or precisely to configure | |||
measurement points depending on the application's requirements. | measurement points depending on the application's requirements. | |||
Using Alternate Marking, it is possible to monitor a Multipoint | Using Alternate Marking, it is possible to monitor a Multipoint | |||
Network without in depth examination by using the Network Clustering | Network without in depth examination by using the Network Clustering | |||
(subnetworks that are portions of the entire network that preserve | (subnetworks that are portions of the entire network that preserve | |||
the same property of the entire network, called clusters). So in the | the same property of the entire network, called clusters). So in the | |||
skipping to change at page 39, line 19 ¶ | skipping to change at page 38, line 39 ¶ | |||
provides a means of transmitting traffic flow information for | provides a means of transmitting traffic flow information for | |||
administrative or other purposes. A typical IPFIX enabled system | administrative or other purposes. A typical IPFIX enabled system | |||
includes a pool of Metering Processes that collects data packets at | includes a pool of Metering Processes that collects data packets at | |||
one or more Observation Points, optionally filters them and | one or more Observation Points, optionally filters them and | |||
aggregates information about these packets. An Exporter then gathers | aggregates information about these packets. An Exporter then gathers | |||
each of the Observation Points together into an Observation Domain | each of the Observation Points together into an Observation Domain | |||
and sends this information via the IPFIX protocol to a Collector. | and sends this information via the IPFIX protocol to a Collector. | |||
A.3.4. In-Situ OAM | A.3.4. In-Situ OAM | |||
Traditional passive and active monitoring and measurement techniques | Classical passive and active monitoring and measurement techniques | |||
are either inaccurate or resource-consuming. It is preferable to | are either inaccurate or resource-consuming. It is preferable to | |||
directly acquire data associated with a flow's packets when the | directly acquire data associated with a flow's packets when the | |||
packets pass through a network. In-situ OAM (iOAM) | packets pass through a network. In-situ OAM (iOAM) | |||
[I-D.ietf-ippm-ioam-data], a data generation technique, embeds a new | [I-D.ietf-ippm-ioam-data], a data generation technique, embeds a new | |||
instruction header in user packets and the instruction directs the | instruction header in user packets and the instruction directs the | |||
network nodes to add the requested data to the packets. Thus, at the | network nodes to add the requested data to the packets. Thus, at the | |||
path end, the packet's experience gained on the entire forwarding | path end, the packet's experience gained on the entire forwarding | |||
path can be collected. Such firsthand data is invaluable to many | path can be collected. Such firsthand data is invaluable to many | |||
network OAM applications. | network OAM applications. | |||
However, iOAM also faces some challenges. The issues on performance | However, iOAM also faces some challenges. The issues on performance | |||
impact, security, scalability and overhead limits, encapsulation | impact, security, scalability and overhead limits, encapsulation | |||
difficulties in some protocols, and cross-domain deployment need to | difficulties in some protocols, and cross-domain deployment need to | |||
be addressed. | be addressed. | |||
A.3.5. Postcard Based Telemetry | A.3.5. Postcard Based Telemetry | |||
PBT [I-D.song-ippm-postcard-based-telemetry] is a proposed | The postcard-based telemetry, as embodied in IOAM DEX | |||
complementary technique to IOAM. PBT directly exports data at each | [I-D.ietf-ippm-ioam-direct-export] and IOAM Marking | |||
node through an independent packet. At the cost of higher bandwidth | [I-D.song-ippm-postcard-based-telemetry], is a complementary | |||
overhead and the need for data correlation, PBT shows several | technique to the passport-based IOAM. PBT directly exports data at | |||
advantages over IOAM. It can also help to identify packet drop | each node through an independent packet. At the cost of higher | |||
bandwidth overhead and the need for data correlation, PBT shows | ||||
several unique advantages. It can also help to identify packet drop | ||||
location in case a packet is dropped on its forwarding path. | location in case a packet is dropped on its forwarding path. | |||
A.3.6. Existing OAM for Specific Data Planes | A.3.6. Existing OAM for Specific Data Planes | |||
Various data planes raise unique OAM requirements. The IETF has | Various data planes raise unique OAM requirements. The IETF has | |||
published OAM technique and framework documents (e.g., [RFC8924] and | published OAM technique and framework documents (e.g., [RFC8924] and | |||
[RFC5085]) targeting different data planes such as MPLS, L2-VPN, | [RFC5085]) targeting different data planes such as Multi-Protocol | |||
NVO3, VXLAN, BIER, SFC, and DETNET. The aforementioned data plane | Label Switching (MPLS), L2 Virtual Private Network (L2-VPN), Network | |||
telemetry techniques can be used to enhance the OAM capability on | Virtualization Overlays (NVO3), Virtual Extensible LAN (VXLAN), Bit | |||
such data planes. | Indexed Explicit Replication (BIER), Service Function Chaining (SFC), | |||
Segment Routing (SR), and Deterministic Networking (DETNET). The | ||||
aforementioned data plane telemetry techniques can be used to enhance | ||||
the OAM capability on such data planes. | ||||
A.4. External Data and Event Telemetry | A.4. External Data and Event Telemetry | |||
A.4.1. Sources of External Events | A.4.1. Sources of External Events | |||
To ensure that the information provided by external event detectors | To ensure that the information provided by external event detectors | |||
and used by the network management solutions is meaningful for | and used by the network management solutions is meaningful for | |||
management purposes, the network telemetry framework must ensure that | management purposes, the network telemetry framework must ensure that | |||
such detectors (sources) are easily connected to the management | such detectors (sources) are easily connected to the management | |||
solutions (sinks). This requires the specification of a list of | solutions (sinks). This requires the specification of a list of | |||
skipping to change at page 40, line 52 ¶ | skipping to change at page 40, line 31 ¶ | |||
protocol and data format, the sources of this kind of information | protocol and data format, the sources of this kind of information | |||
usually follow a relaxed but structured format. This format will | usually follow a relaxed but structured format. This format will | |||
be part of both the ontology and information model of the | be part of both the ontology and information model of the | |||
telemetry framework. | telemetry framework. | |||
* Global event analyzers. The advance of Big Data analyzers | * Global event analyzers. The advance of Big Data analyzers | |||
provides a huge amount of information and, more interestingly, the | provides a huge amount of information and, more interestingly, the | |||
identification of events detected by analyzing many data streams | identification of events detected by analyzing many data streams | |||
from different origins. In contrast with the other types of | from different origins. In contrast with the other types of | |||
sources, which are focused on specific events, the detectors of | sources, which are focused on specific events, the detectors of | |||
this source type will detect generic events. For example, a | this source type will detect generic events. For example, during | |||
sports event takes place and some unexpected movement makes it | a sport event some unexpected movement makes it fascinating and | |||
fascinating and many people connect to sites that are reporting on | many people connect to sites that are reporting on the event. The | |||
the event. The underlying networks supporting the services that | underlying networks supporting the services that cover the event | |||
cover the event can be affected by such situation so their | can be affected by such situation, so their management solutions | |||
management solutions should be aware of it. In contrast with the | should be aware of it. In contrast with the other source types, a | |||
other source types, a new information model, format, and reporting | new information model, format, and reporting protocol is required | |||
protocol is required to integrate the detectors of this type with | to integrate the detectors of this type with the management | |||
the management solution. | solution. | |||
Additional detector types can be added to the system, but | Additional detector types can be added to the system, but | |||
they will generally be the result of composing the properties offered | they will generally be the result of composing the properties offered | |||
by these main classes. | by these main classes. | |||
A.4.2. Connectors and Interfaces | A.4.2. Connectors and Interfaces | |||
For allowing external event detectors to be properly integrated with | For allowing external event detectors to be properly integrated with | |||
other management solutions, both elements must expose interfaces and | other management solutions, both elements must expose interfaces and | |||
protocols that are subject to their particular objective. Since | protocols that are subject to their particular objective. Since | |||
End of changes. 90 change blocks. | ||||
246 lines changed or deleted | 269 lines changed or added | |||