draft-ietf-opsawg-ntf-05.txt | draft-ietf-opsawg-ntf-06.txt | |||
---|---|---|---|---|
OPSAWG H. Song | OPSAWG H. Song | |||
Internet-Draft Futurewei | Internet-Draft Futurewei | |||
Intended status: Informational F. Qin | Intended status: Informational F. Qin | |||
Expires: April 12, 2021 China Mobile | Expires: July 25, 2021 China Mobile | |||
P. Martinez-Julia | P. Martinez-Julia | |||
NICT | NICT | |||
L. Ciavaglia | L. Ciavaglia | |||
Nokia | Nokia | |||
A. Wang | A. Wang | |||
China Telecom | China Telecom | |||
October 9, 2020 | January 21, 2021 | |||
Network Telemetry Framework | Network Telemetry Framework | |||
draft-ietf-opsawg-ntf-05 | draft-ietf-opsawg-ntf-06 | |||
Abstract | Abstract | |||
Network telemetry is the technology for gaining network insight and | Network telemetry is a technology for gaining network insight and | |||
facilitating efficient and automated network management. It engages | facilitating efficient and automated network management. It | |||
various techniques for remote data collection, correlation, and | encompasses various techniques for remote data generation, | |||
consumption. This document provides an architectural framework for | collection, correlation, and consumption. This document describes an | |||
network telemetry, motivated by the network operation challenges and | architectural framework for network telemetry, motivated by | |||
requirements. As evidenced by some key characteristics and industry | challenges that are encountered as part of the operation of networks | |||
practices, network telemetry covers technologies and protocols beyond | and by the requirements that ensue. Network telemetry, as | |||
the conventional network Operations, Administration, and Management | necessitated by best industry practices, covers technologies and | |||
(OAM). It promises better flexibility, scalability, accuracy, | protocols that extend beyond conventional network Operations, | |||
coverage, and performance and allows automated control loops to suit | Administration, and Management (OAM). The presented network | |||
both today's and tomorrow's network operation. This document | telemetry framework promises better flexibility, scalability, | |||
clarifies the terminologies and classifies the modules and components | accuracy, coverage, and performance. In addition, it facilitates the | |||
of a network telemetry system from several different perspectives. | implementation of automated control loops to address both today's and | |||
The framework and taxonomy help to set a common ground for the | tomorrow's network operational needs. This document clarifies the | |||
collection of related work and provide guidance for related technique | terminologies and classifies the modules and components of a network | |||
and standard developments. | telemetry system from several different perspectives. The framework | |||
and taxonomy help to set a common ground for the collection of | ||||
related work and provide guidance for related technique and standard | ||||
developments. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on April 12, 2021. | This Internet-Draft will expire on July 25, 2021. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2020 IETF Trust and the persons identified as the | Copyright (c) 2021 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2.1. Telemetry Data Coverage . . . . . . . . . . . . . . . . . 5 | 2.1. Telemetry Data Coverage . . . . . . . . . . . . . . . . . 5 | |||
2.2. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 5 | 2.2. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
2.3. Challenges . . . . . . . . . . . . . . . . . . . . . . . 6 | 2.3. Challenges . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
2.4. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 8 | 2.4. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
2.5. Network Telemetry . . . . . . . . . . . . . . . . . . . . 9 | 2.5. Network Telemetry . . . . . . . . . . . . . . . . . . . . 10 | |||
3. The Necessity of a Network Telemetry Framework . . . . . . . 11 | 3. The Necessity of a Network Telemetry Framework . . . . . . . 12 | |||
4. Network Telemetry Framework . . . . . . . . . . . . . . . . . 13 | 4. Network Telemetry Framework . . . . . . . . . . . . . . . . . 13 | |||
4.1. Top Level Modules . . . . . . . . . . . . . . . . . . . . 13 | 4.1. Top Level Modules . . . . . . . . . . . . . . . . . . . . 13 | |||
4.1.1. Management Plane Telemetry . . . . . . . . . . . . . 16 | 4.1.1. Management Plane Telemetry . . . . . . . . . . . . . 17 | |||
4.1.2. Control Plane Telemetry . . . . . . . . . . . . . . . 16 | 4.1.2. Control Plane Telemetry . . . . . . . . . . . . . . . 17 | |||
4.1.3. Data Plane Telemetry . . . . . . . . . . . . . . . . 17 | 4.1.3. Forwarding Plane Telemetry . . . . . . . . . . . . . 18 | |||
4.1.4. External Data Telemetry . . . . . . . . . . . . . . . 19 | 4.1.4. External Data Telemetry . . . . . . . . . . . . . . . 20 | |||
4.2. Second Level Function Components . . . . . . . . . . . . 19 | 4.2. Second Level Function Components . . . . . . . . . . . . 20 | |||
4.3. Data Acquiring Mechanism and Type Abstraction . . . . . . 21 | 4.3. Data Acquiring Mechanism and Type Abstraction . . . . . . 22 | |||
4.4. Existing Works Mapped in the Framework . . . . . . . . . 23 | 4.4. Existing Works Mapped in the Framework . . . . . . . . . 24 | |||
5. Evolution of Network Telemetry . . . . . . . . . . . . . . . 24 | 5. Evolution of Network Telemetry . . . . . . . . . . . . . . . 26 | |||
6. Security Considerations . . . . . . . . . . . . . . . . . . . 25 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 26 | |||
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26 | 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 27 | |||
8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26 | 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 26 | 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
10. Informative References . . . . . . . . . . . . . . . . . . . 26 | 10. Informative References . . . . . . . . . . . . . . . . . . . 28 | |||
Appendix A. A Survey on Existing Network Telemetry Techniques . 30 | Appendix A. A Survey on Existing Network Telemetry Techniques . 32 | |||
A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 30 | A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 32 | |||
A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 30 | A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 32 | |||
A.1.2. gRPC Network Management Interface . . . . . . . . . . 31 | A.1.2. gRPC Network Management Interface . . . . . . . . . . 33 | |||
A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 31 | A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 33 | |||
A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . . . 31 | A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . . . 33 | |||
A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 32 | A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 34 | |||
A.3.1. The Alternate Marking technology . . . . . . . . . . 32 | A.3.1. The Alternate Marking technology . . . . . . . . . . 34 | |||
A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 33 | A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 35 | |||
A.3.3. IP Flow Information Export (IPFIX) protocol . . . . . 33 | A.3.3. IP Flow Information Export (IPFIX) protocol . . . . . 35 | |||
A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 34 | A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 35 | |||
A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 34 | A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 36 | |||
A.4. External Data and Event Telemetry . . . . . . . . . . . . 34 | A.4. External Data and Event Telemetry . . . . . . . . . . . . 36 | |||
A.4.1. Sources of External Events . . . . . . . . . . . . . 34 | A.4.1. Sources of External Events . . . . . . . . . . . . . 36 | |||
A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 36 | A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 37 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 36 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 38 | |||
1. Introduction | 1. Introduction | |||
Network visibility is the ability of management tools to see the | Network visibility is the ability of management tools to see the | |||
state and behavior of a network. It is essential for successful | state and behavior of a network, which is essential for successful | |||
network operation. Network telemetry is the process of measuring, | network operation. Network Telemetry revolves around network data | |||
correlating, recording, and distributing information about the | that can help provide insights about the current state of the | |||
behavior of a network. Network telemetry has been considered as an | network, including network devices, forwarding, control, and | |||
ideal means to gain sufficient network visibility with better | management planes, and that can be generated and obtained through a | |||
flexibility, scalability, accuracy, coverage, and performance than | variety of techniques, including but not limited to network | |||
some conventional network Operations, Administration, and Management | instrumentation and measurements, and that can be processed for | |||
(OAM) techniques. | purposes ranging from service assurance to network security using a | |||
wide variety of techniques including machine learning, data analysis, | ||||
and correlation. In this document, Network Telemetry refers to both | ||||
the data itself (i.e., "Network Telemetry Data"), and the techniques | ||||
and processes used to generate, export, collect, and consume that | ||||
data for use by potentially automated management applications. | ||||
Network telemetry extends beyond the conventional network Operations, | ||||
Administration, and Management (OAM) techniques and expects to | ||||
support better flexibility, scalability, accuracy, coverage, and | ||||
performance. | ||||
However, the term network telemetry lacks a solid and unambiguous | However, the term network telemetry lacks a solid and unambiguous | |||
definition. Its scope and coverage cause confusion and | definition. Its scope and coverage cause confusion and | |||
misunderstandings. It is beneficial to clarify the concept and | misunderstandings. It is beneficial to clarify the concept and | |||
provide a clear architectural framework for network telemetry, so we | provide a clear architectural framework for network telemetry, so we | |||
can articulate the technical field, and better align the related | can articulate the technical field, and better align the related | |||
techniques and standard works. | techniques and standard works. | |||
To fulfill such an undertaking, we first discuss some key | To fulfill such an undertaking, we first discuss some key | |||
characteristics of network telemetry which set a clear distinction | characteristics of network telemetry which set a clear distinction | |||
from the conventional network OAM and show that some conventional OAM | from the conventional network OAM and show that some conventional OAM | |||
technologies can be considered a subset of the network telemetry | technologies can be considered a subset of the network telemetry | |||
technologies. We then provide an architectural framework for network | technologies. We then provide an architectural framework for network | |||
telemetry by partitioning a network telemetry system into four | telemetry which includes four modules, each concerned with a | |||
modules each with the same building components and data abstracts. | different category of telemetry data and corresponding procedures. | |||
We show how the network telemetry framework can benefit the current | All the modules are internally structured in the same way, including | |||
and future network operations. Based on the distinction of modules | components that allow configuring data sources with regard to what | |||
and function components, we can map the existing and emerging | data to generate and how to make that available to client | |||
techniques and protocols into the framework. The framework can also | applications, components that instrument the underlying data sources, | |||
simplify the tasks for designing, maintaining, and understanding a | and components that perform the actual rendering, encoding, and | |||
network telemetry system. Finally, we outline the evolution stages | exporting of the generated data. We show how the network telemetry | |||
of the network telemetry system and discuss the potential security | framework can benefit the current and future network operations. | |||
concerns. | Based on the distinction of modules and function components, we can | |||
map the existing and emerging techniques and protocols into the | ||||
framework. The framework can also simplify the tasks for designing, | ||||
maintaining, and understanding a network telemetry system. Finally, | ||||
we outline the evolution stages of the network telemetry system and | ||||
discuss the potential security concerns. | ||||
The purpose of the framework and taxonomy is to set a common ground | The purpose of the framework and taxonomy is to set a common ground | |||
for the collection of related work and provide guidance for future | for the collection of related work and provide guidance for future | |||
technique and standard developments. To the best of our knowledge, | technique and standard developments. To the best of our knowledge, | |||
this document is the first such effort for network telemetry in | this document is the first such effort for network telemetry in | |||
industry standards organizations. | industry standards organizations. | |||
2. Background | 2. Background | |||
The term "big data" is used to describe the extremely large volume of | The term "big data" is used to describe the extremely large volume of | |||
data sets that can be analyzed computationally to reveal patterns, | data sets that can be analyzed computationally to reveal patterns, | |||
trends, and associations. Network is undoubtedly a source of big | trends, and associations. Networks are undoubtedly a source of big | |||
data because of its scale and all the traffic goes through it. It is | data because of their scale and the volume of network traffic they | |||
easy to see that network OAM can benefit from network big data. | forward. It is easy to see that network operations can benefit from | |||
network big data. | ||||
Today one can access advanced big data analytics capability through a | Today one can access advanced big data analytics capability through a | |||
plethora of commercial and open source platforms (e.g., Apache | plethora of commercial and open source platforms (e.g., Apache | |||
Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine | Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine | |||
learning). Thanks to the advance of computing and storage | learning). Thanks to the advance of computing and storage | |||
technologies, network big data analytics gives network operators an | technologies, network big data analytics gives network operators an | |||
opportunity to gain network insights and move towards network | opportunity to gain network insights and move towards network | |||
autonomy. Some operators have started to explore the application of | autonomy. Some operators have started to explore the application of | |||
Artificial Intelligence (AI) to make sense of network data. Software | Artificial Intelligence (AI) to make sense of network data. Software | |||
tools can use the network data to detect and react to network faults, | tools can use the network data to detect and react to network faults, | |||
skipping to change at page 4, line 50 ¶ | skipping to change at page 5, line 19 ¶ | |||
However, while the data processing capability is improved and | However, while the data processing capability is improved and | |||
applications are hungry for more data, the networks lag behind in | applications are hungry for more data, the networks lag behind in | |||
extracting and translating network data into useful and actionable | extracting and translating network data into useful and actionable | |||
information in efficient ways. The system bottleneck is shifting | information in efficient ways. The system bottleneck is shifting | |||
from data consumption to data supply. Both the number of network | from data consumption to data supply. Both the number of network | |||
nodes and the traffic bandwidth keep increasing at a fast pace. The | nodes and the traffic bandwidth keep increasing at a fast pace. The | |||
network configuration and policy change at smaller time slots than | network configuration and policy change at smaller time slots than | |||
before. More subtle events and fine-grained data through all network | before. More subtle events and fine-grained data through all network | |||
planes need to be captured and exported in real time. In a nutshell, | planes need to be captured and exported in real time. In a nutshell, | |||
it is a challenge to get enough high-quality data out of network | it is a challenge to get enough high-quality data out of the network | |||
efficiently, timely, and flexibly. Therefore, we need to examine the | in a manner that is efficient, timely, and flexible. Therefore, we | |||
existing network technologies and protocols, and identify any | need to survey the existing technologies and protocols and identify | |||
potential technique and standard gaps based on the real network and | any potential gaps. | |||
device architectures. | ||||
In the remaining of this section, first we clarify the scope of | In the remainder of this section, first we clarify the scope of | |||
network data (i.e., telemetry data) concerned in the context. Then, | network data (i.e., telemetry data) concerned in the context. Then, | |||
we discuss several key use cases for today's and future network | we discuss several key use cases for today's and future network | |||
operations. Next, we show why the current network OAM techniques and | operations. Next, we show why the current network OAM techniques and | |||
protocols are insufficient for these use cases. The discussion | protocols are insufficient for these use cases. The discussion | |||
underlines the need of new methods, techniques, and protocols which | underlines the need of new methods, techniques, and protocols which | |||
we assign under an umbrella term - network telemetry. | we assign under the umbrella term - Network Telemetry. | |||
2.1. Telemetry Data Coverage | 2.1. Telemetry Data Coverage | |||
Any information that can be extracted from networks (including data | Any information that can be extracted from networks (including data | |||
plane, control plane, and management plane) and used to gain | plane, control plane, and management plane) and used to gain | |||
visibility or as basis for actions is considered telemetry data. It | visibility or as basis for actions is considered telemetry data. It | |||
includes statistics, event records and logs, snapshots of state, | includes statistics, event records and logs, snapshots of state, | |||
configuration data, etc. It also covers the outputs of any active | configuration data, etc. It also covers the outputs of any active | |||
and passive measurements [RFC7799]. In particular, raw data can be | and passive measurements [RFC7799]. In particular, raw data can be | |||
processed in network before sending to a data consumer. Such | processed in-network before being sent to a data consumer. Such | |||
processed data are also telemetry data in the context. A | processed data is also considered telemetry data. A classification | |||
classification of the telemetry data form is provided in Section 4. | of telemetry data is provided in Section 4. | |||
2.2. Use Cases | 2.2. Use Cases | |||
These use cases are essential for network operations. While the list | The following set of use cases is essential for network operations. | |||
is by no means exhaustive, it is enough to highlight the requirements | While the list is by no means exhaustive, it is enough to highlight | |||
for data velocity, variety, volume, and veracity in networks. | the requirements for data velocity, variety, volume, and veracity in | |||
networks. | ||||
Security: Network intrusion detection and prevention need monitor | Security: Network intrusion detection and prevention systems need to | |||
network traffic and activities, and act upon anomalies. Given the | monitor network traffic and activities and act upon anomalies. | |||
more and more sophisticated attack vectors and higher and higher | Given increasingly sophisticated attack vectors coupled with | |||
tolls due to security breach, new tools and techniques need to be | increasingly severe consequences of security breaches, new tools | |||
developed, relying on wider and deeper visibility in networks. | and techniques need to be developed, relying on wider and deeper | |||
visibility in networks. | ||||
Policy and Intent Compliance: Network policies are the rules that | Policy and Intent Compliance: Network policies are the rules that | |||
constrain the services for network access, provide service | constrain the services for network access, provide service | |||
differentiation, or enforce specific treatment on the traffic. | differentiation, or enforce specific treatment on the traffic. | |||
For example, a service function chain is a policy that requires | For example, a service function chain is a policy that requires | |||
the selected flows to pass through a set of ordered network | the selected flows to pass through a set of ordered network | |||
functions. Intent, as defined in | functions. Intent, as defined in | |||
[I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational | [I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational | |||
goals that a network should meet and outcomes that a network is | goals that a network should meet and outcomes that a network is | |||
supposed to deliver, defined in a declarative manner without | supposed to deliver, defined in a declarative manner without | |||
skipping to change at page 6, line 15 ¶ | skipping to change at page 6, line 35 ¶ | |||
needs to be reported immediately. | needs to be reported immediately. | |||
SLA Compliance: A Service-Level Agreement (SLA) defines the level of | SLA Compliance: A Service-Level Agreement (SLA) defines the level of | |||
service a user expects from a network operator, which includes the | service a user expects from a network operator, which includes the | |||
metrics for the service measurement and remedy/penalty procedures | metrics for the service measurement and remedy/penalty procedures | |||
when the service level misses the agreement. Users need to check | when the service level misses the agreement. Users need to check | |||
if they get the service as promised and network operators need to | if they get the service as promised and network operators need to | |||
evaluate how they can deliver the services that can meet the SLA | evaluate how they can deliver the services that can meet the SLA | |||
based on real-time network measurement. | based on real-time network measurement. | |||
Root Cause Analysis: Any network failure can be the cause or effect | Root Cause Analysis: Any network failure can be the effect of a | |||
of a sequence of chained events. Troubleshooting and recovery | sequence of chained events. Troubleshooting and recovery require | |||
require quick identification of the root cause of any observable | quick identification of the root cause of any observable issues. | |||
issues. However, the root cause is not always straightforward to | However, the root cause is not always straightforward to identify, | |||
identify, especially when the failure is sporadic and the related | especially when the failure is sporadic and the number of event | |||
and unrelated events are overwhelming and interleaved. While | messages, both related and unrelated to the same cause, is | |||
machine learning technologies can be used for root cause analysis, | overwhelming. While machine learning technologies can be used for | |||
it is up to the network to sense and provide the relevant data. | root cause analysis, it is up to the network to sense and provide the | |||
relevant data. | ||||
Network Optimization: This covers all short-term and long-term | Network Optimization: This covers all short-term and long-term | |||
network optimization techniques, including load balancing, Traffic | network optimization techniques, including load balancing, Traffic | |||
Engineering (TE), and network planning. Network operators are | Engineering (TE), and network planning. Network operators are | |||
motivated to optimize their network utilization and differentiate | motivated to optimize their network utilization and differentiate | |||
services for better Return On Investment (ROI) or lower Capital | services for better Return On Investment (ROI) or lower Capital | |||
Expenditures (CAPEX). The first step is to know the real-time | Expenditures (CAPEX). The first step is to know the real-time | |||
network conditions before applying policies for traffic | network conditions before applying policies for traffic | |||
manipulation. In some cases, micro-bursts need to be detected in | manipulation. In some cases, micro-bursts need to be detected in | |||
a very short time-frame so that fine-grained traffic control can | a very short time-frame so that fine-grained traffic control can | |||
be applied to avoid network congestion. The long-term network | be applied to avoid network congestion. Long-term planning of | |||
capacity planning and topology augmentation rely on the | network capacity and topology requires analysis of real-world | |||
accumulated data of network operations. | network telemetry data that is obtained over long periods of time. | |||
Event Tracking and Prediction: The visibility of traffic path and | Event Tracking and Prediction: The visibility of traffic path and | |||
performance is critical for services and applications that rely on | performance is critical for services and applications that rely on | |||
healthy network operation. Numerous related network events are of | healthy network operation. Numerous related network events are of | |||
interest to network operators. For example, network operators | interest to network operators. For example, network operators | |||
want to learn where and why packets are dropped for an application | want to learn where and why packets are dropped for an application | |||
flow. They also want to be warned of issues in advance so | flow. They also want to be warned of issues in advance so | |||
proactive actions can be taken to avoid catastrophic consequences. | proactive actions can be taken to avoid catastrophic consequences. | |||
2.3. Challenges | 2.3. Challenges | |||
For a long time, network operators have relied upon SNMP [RFC3416], | For a long time, network operators have relied upon SNMP [RFC3416], | |||
Command-Line Interface (CLI), or Syslog to monitor the network. Some | Command-Line Interface (CLI), or Syslog to monitor the network. Some | |||
other OAM techniques as described in [RFC7276] are also used to | other OAM techniques as described in [RFC7276] are also used to | |||
facilitate network troubleshooting. these conventional techniques | facilitate network troubleshooting. These conventional techniques | |||
are not sufficient to support the above use cases for the following | are not sufficient to support the above use cases for the following | |||
reasons, which explains why new standards and techniques keep | reasons: | |||
emerging and the needs remain high: | ||||
o Most use cases need to continuously monitor the network and | o Most use cases need to continuously monitor the network and | |||
dynamically refine the data collection in real-time. The poll- | dynamically refine the data collection in real-time. The poll- | |||
based low-frequency data collection is ill-suited for these | based low-frequency data collection is ill-suited for these | |||
applications. Subscription-based streaming data directly pushed | applications. Subscription-based streaming data directly pushed | |||
from the data source (e.g., the forwarding chip) is preferred to | from the data source (e.g., the forwarding chip) is preferred to | |||
provide enough data quantity and precision at scale. | provide enough data quantity and precision at scale. | |||
o Comprehensive data is needed from packet processing engine to | o Comprehensive data is needed from packet processing engine to | |||
traffic manager, from line cards to main control board, from user | traffic manager, from line cards to main control board, from user | |||
skipping to change at page 8, line 7 ¶ | skipping to change at page 8, line 25 ¶ | |||
precision which are beyond the capability of the existing | precision which are beyond the capability of the existing | |||
techniques. | techniques. | |||
o The conventional passive measurement techniques can either consume | o The conventional passive measurement techniques can either consume | |||
excessive network resources and render excessive redundant data, | excessive network resources and render excessive redundant data, | |||
or lead to inaccurate results; on the other hand, the conventional | or lead to inaccurate results; on the other hand, the conventional | |||
active measurement techniques can interfere with the user traffic | active measurement techniques can interfere with the user traffic | |||
and their results are indirect. Techniques that can collect | and their results are indirect. Techniques that can collect | |||
direct and on-demand data from user traffic are more favorable. | direct and on-demand data from user traffic are more favorable. | |||
These challenges were addressed by newer standards and techniques | ||||
(e.g., IPFIX/Netflow, PSAMP, IOAM, and YANG-Push) and more are | ||||
emerging. These standards and techniques need to be recognized and | ||||
accommodated in a new framework. | ||||
2.4. Glossary | 2.4. Glossary | |||
Before further discussion, we list some key terminology and acronyms | Before further discussion, we list some key terminology and acronyms | |||
used in this document. We make an intended differentiation between | used in this document. We make an intended differentiation between | |||
network telemetry and network OAM. However, it should be understood | the terms of network telemetry and OAM. However, it should be | |||
that there is not a hard-line distinction between the two concepts. | understood that there is not a hard-line distinction between the two | |||
Rather, some OAM techniques are in the scope of network telemetry. | concepts. Rather, network telemetry is considered as the extension | |||
of OAM. It covers all the existing OAM protocols but puts more | ||||
emphasis on the newer and emerging techniques and protocols | ||||
concerning all aspects of network data from acquisition to | ||||
consumption. | ||||
AI: Artificial Intelligence. In the network domain, AI refers to the | AI: Artificial Intelligence. In the network domain, AI refers to the | |||
machine-learning based technologies for automated network | machine-learning based technologies for automated network | |||
operation and other tasks. | operation and other tasks. | |||
AM: Alternate Marking, a flow performance measurement method, | AM: Alternate Marking, a flow performance measurement method, | |||
specified in [RFC8321]. | specified in [RFC8321]. | |||
BMP: BGP Monitoring Protocol, specified in [RFC7854]. | BMP: BGP Monitoring Protocol, specified in [RFC7854]. | |||
skipping to change at page 8, line 46 ¶ | skipping to change at page 9, line 24 ¶ | |||
IPFIX: IP Flow Information Export Protocol, specified in [RFC7011]. | IPFIX: IP Flow Information Export Protocol, specified in [RFC7011]. | |||
IOAM: In-situ OAM, a dataplane on-path telemetry technique. | IOAM: In-situ OAM, a dataplane on-path telemetry technique. | |||
NETCONF: Network Configuration Protocol, specified in [RFC6241]. | NETCONF: Network Configuration Protocol, specified in [RFC6241]. | |||
NetFlow: A Cisco protocol for flow record collecting, described in | NetFlow: A Cisco protocol for flow record collecting, described in | |||
[RFC3954]. | [RFC3954]. | |||
Network Telemetry: Acquiring and processing network data remotely | Network Telemetry: The process and instrumentation for acquiring and | |||
for network monitoring and operation. A general term for a large | utilizing network data remotely for network monitoring and | |||
set of network visibility techniques and protocols, with the | operation. A general term for a large set of network visibility | |||
characteristics defined in this document. Network telemetry | techniques and protocols, concerning aspects like data generation, | |||
collection, correlation, and consumption. Network telemetry | ||||
addresses the current network operation issues and enables smooth | addresses the current network operation issues and enables smooth | |||
evolution toward future intent-driven autonomous networks. | evolution toward future intent-driven autonomous networks. | |||
NMS: Network Management System, referring to applications that allow | NMS: Network Management System, referring to applications that allow | |||
network administrators to manage a network's software and hardware | network administrators to manage a network. | |||
components. It usually records data from a network's remote | ||||
points to carry out central reporting to a system administrator. | ||||
OAM: Operations, Administration, and Maintenance. A group of | OAM: Operations, Administration, and Maintenance. A group of | |||
network management functions that provide network fault | network management functions that provide network fault | |||
indication, fault localization, performance information, and data | indication, fault localization, performance information, and data | |||
and diagnosis functions. Most conventional network monitoring | and diagnosis functions. Most conventional network monitoring | |||
techniques and protocols belong to network OAM. | techniques and protocols belong to network OAM. | |||
PBT: Postcard-Based Telemetry, a dataplane on-path telemetry | PBT: Postcard-Based Telemetry, a dataplane on-path telemetry | |||
technique. | technique. | |||
skipping to change at page 9, line 30 ¶ | skipping to change at page 10, line 7 ¶ | |||
[RFC2578]. | [RFC2578]. | |||
SNMP: Simple Network Management Protocol. Versions 1 and 2 are | SNMP: Simple Network Management Protocol. Versions 1 and 2 are | |||
specified in [RFC1157] and [RFC3416], respectively. | specified in [RFC1157] and [RFC3416], respectively. | |||
YANG: The abbreviation of "Yet Another Next Generation". YANG is a | YANG: The abbreviation of "Yet Another Next Generation". YANG is a | |||
data modeling language for the definition of data sent over | data modeling language for the definition of data sent over | |||
network management protocols such as the NETCONF and RESTCONF. | network management protocols such as the NETCONF and RESTCONF. | |||
YANG is defined in [RFC6020]. | YANG is defined in [RFC6020]. | |||
YANG ECN A YANG model for Event-Condition-Action policies, defined | YANG ECA A YANG model for Event-Condition-Action policies, defined | |||
in [I-D.wwx-netmod-event-yang]. | in [I-D.wwx-netmod-event-yang]. | |||
YANG FSM: A YANG model that describes events, operations, and finite | YANG FSM: A YANG model that describes events, operations, and finite | |||
state machine of YANG-defined network elements. | state machine of YANG-defined network elements. | |||
YANG PUSH: A method to subscribe to data pushed from remote YANG | YANG PUSH: A method to subscribe to data pushed from remote YANG | |||
datastores on network devices. Details are specified in [RFC8641] | datastores on network devices. Details are specified in [RFC8641] | |||
and [RFC8639]. | and [RFC8639]. | |||
2.5. Network Telemetry | 2.5. Network Telemetry | |||
Network telemetry has emerged as a mainstream technical term to refer | Network telemetry has emerged as a mainstream technical term to refer | |||
to the newer data collection and consumption techniques, | to the network data collection and consumption techniques. Several | |||
distinguishing itself in some notable ways from the convention | network telemetry techniques and protocols (e.g., IPFIX [RFC7011] and | |||
network OAM. Several such techniques have been widely deployed. The | gRPC [grpc]) have been widely deployed. Network telemetry allows | |||
representative techniques and protocols include IPFIX [RFC7011] and | separate entities to acquire data from network devices so that data | |||
gRPC [grpc]. Network telemetry allows separate entities to acquire | can be visualized and analyzed to support network monitoring and | |||
data from network devices so that data can be visualized and analyzed | operation. Network telemetry covers the conventional network OAM and | |||
to support network monitoring and operation. Network telemetry | has a wider scope. It is expected that network telemetry can provide | |||
overlaps with the conventional network OAM and has a wider scope than | the necessary network insight for autonomous networks and address the | |||
it. It is expected that network telemetry can provide the necessary | shortcomings of conventional OAM techniques. | |||
network insight for autonomous networks and address the shortcomings | ||||
of conventional OAM techniques. | ||||
One difference between the network telemetry and the conventional | Network telemetry usually assumes machines as data consumer rather | |||
network OAM is that in general the network telemetry assumes machines | than human operators. Hence, the network telemetry can directly | |||
as data consumer rather than human operators. Hence, the network | trigger the automated network operation, while in contrast some | |||
telemetry can directly trigger the automated network operation, while | conventional OAM tools are designed and used to help human operators | |||
the conventional OAM tools usually help human operators to monitor | to monitor and diagnose the networks and guide manual network | |||
and diagnose the networks and guide manual network operations. The | operations. Such a proposition leads to very different techniques. | |||
difference leads to very different techniques. | ||||
Although the network telemetry techniques are just emerging and | Although new network telemetry techniques are emerging and subject to | |||
subject to continuous evolution, several characteristics of network | continuous evolution, several characteristics of network telemetry | |||
telemetry have been well accepted. Note that network telemetry is | have been well accepted. Note that network telemetry is intended to | |||
intended to be an umbrella term covering a wide spectrum of | be an umbrella term covering a wide spectrum of techniques, so the | |||
techniques, so the following characteristics are not expected to be | following characteristics are not expected to be held by every | |||
held by every specific technique. | specific technique. | |||
o Push and Streaming: Instead of polling data from network devices, | o Push and Streaming: Instead of polling data from network devices, | |||
the telemetry collector subscribes to the streaming data pushed | telemetry collectors subscribe to streaming data pushed from data | |||
from data sources in network devices. | sources in network devices. | |||
o Volume and Velocity: The telemetry data is intended to be consumed | o Volume and Velocity: The telemetry data is intended to be consumed | |||
by machines rather than by human beings. Therefore, the data | by machines rather than by human beings. Therefore, the data | |||
volume is huge and the processing is often in real-time. | volume is huge and the processing is often in real-time. | |||
o Normalization and Unification: Telemetry aims to address the | o Normalization and Unification: Telemetry aims to address the | |||
overall network automation needs. The piecemeal solutions offered | overall network automation needs. Efforts are made to normalize | |||
by the conventional OAM approach are no longer suitable. Efforts | the data representation and unify the protocols, so as to simplify | |||
need to be made to normalize the data representation and unify the | data analysis and tie it in with automation solutions. | |||
protocols. | ||||
o Model-based: The telemetry data is modeled in advance which allows | o Model-based: The telemetry data is modeled in advance which allows | |||
applications to configure and consume data with ease. | applications to configure and consume data with ease. | |||
o Data Fusion: The data for a single application can come from | o Data Fusion: The data for a single application can come from | |||
multiple data sources (e.g., cross-domain, cross-device, and | multiple data sources (e.g., cross-domain, cross-device, and | |||
cross-layer) and needs to be correlated to take effect. | cross-layer) and needs to be correlated to take effect. | |||
o Dynamic and Interactive: Since network telemetry is meant to be | o Dynamic and Interactive: Since network telemetry is meant to be | |||
used in a closed control loop for network automation, it needs to | used in a closed control loop for network automation, it needs to | |||
skipping to change at page 11, line 37 ¶ | skipping to change at page 12, line 8 ¶ | |||
data collection approaches, the new hybrid approach allows to | data collection approaches, the new hybrid approach allows to | |||
directly collect data for any target flow on its entire forwarding | directly collect data for any target flow on its entire forwarding | |||
path [I-D.song-opsawg-ifit-framework]. | path [I-D.song-opsawg-ifit-framework]. | |||
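As an illustration of the Data Fusion characteristic listed above, the following is a minimal sketch of correlating records from two data sources by device. The record layout, the metric names, and the fuse() helper are hypothetical and chosen only for this example; they are not defined by this document.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (device, metric, value).
Record = Tuple[str, str, int]

# Assumed sample data from two different sources (cross-layer).
forwarding_plane = [("r1", "queue_depth", 42), ("r2", "queue_depth", 7)]
control_plane = [("r1", "bgp_peers_up", 3), ("r2", "bgp_peers_up", 4)]


def fuse(*sources: Iterable[Record]) -> Dict[str, Dict[str, int]]:
    """Correlate records from several sources into one view per device."""
    merged: Dict[str, Dict[str, int]] = defaultdict(dict)
    for source in sources:
        for device, metric, value in source:
            merged[device][metric] = value
    return dict(merged)


view = fuse(forwarding_plane, control_plane)
```

Here a single application-level view per device is built from both sources; a real system would also need to align timestamps and handle missing data.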
It is worth noting that a network telemetry system should not be | It is worth noting that a network telemetry system should not be | |||
intrusive to normal network operations, by avoiding the pitfall of | intrusive to normal network operations, by avoiding the pitfall of | |||
the "observer effect". That is, it should not change the network | the "observer effect". That is, it should not change the network | |||
behavior or affect the forwarding performance. Otherwise, the whole | behavior or affect the forwarding performance. Otherwise, the whole | |||
purpose of network telemetry is defeated. | purpose of network telemetry is defeated. | |||
Although in many cases a network telemetry system involves a remote | Although in many cases a system for network telemetry involves a | |||
data collecting, processing, and reacting entity, it is important to | remote data collecting and consuming entity, it is important to | |||
understand that network telemetry does not infer the necessity of | understand that there are no inherent assumptions about how a system | |||
such an entity. Telemetry data producers and consumers can work in | should be architected. Telemetry data producers and consumers can | |||
distributed or peer-to-peer fashions instead. In such cases, a | work in distributed or peer-to-peer fashions rather than assuming a | |||
network node can be the direct consumer of telemetry data from other | centralized data consuming entity. In such cases, a network node can | |||
nodes. | be the direct consumer of telemetry data from other nodes. | |||
3. The Necessity of a Network Telemetry Framework | 3. The Necessity of a Network Telemetry Framework | |||
Network data analytics and machine-learning technologies are applied | Network data analytics and machine-learning technologies are applied | |||
for network operation automation, relying on abundant and coherent | for network operation automation, relying on abundant and coherent | |||
data from networks. The single-sourced and static data acquisition | data from networks. Data acquisition that is limited to a single | |||
cannot meet the data requirements. The scattered standards and | source and static in nature will in many cases not be sufficient to | |||
diverse techniques are hard to be integrated. It is desirable to | meet an application's telemetry data needs. As a result, multiple | |||
have a framework that classifies and organizes different telemetry | data sources, involving a variety of techniques and standards, will | |||
data sources and types, defines different components of a network | need to be integrated. It is desirable to have a framework that | |||
telemetry system and their interactions, and helps coordinate and | classifies and organizes different telemetry data sources and types, | |||
integrate multiple telemetry approaches from different layers. This | defines different components of a network telemetry system and their | |||
allows flexible combinations for different applications, while | interactions, and helps coordinate and integrate multiple telemetry | |||
normalizing and simplifying interfaces. In detail, such a framework | approaches across layers. This allows flexible combinations of data | |||
would benefit application development for the following reasons: | for different applications, while normalizing and simplifying | |||
interfaces. In detail, such a framework would benefit application | ||||
development for the following reasons: | ||||
o The future autonomous networks will require a holistic view on | o Future networks, autonomous or otherwise, depend on holistic and | |||
network visibility. All the use cases and applications need to be | comprehensive network visibility. All the use cases and | |||
supported uniformly and coherently under a single intelligent | applications are better to be supported uniformly and coherently | |||
agent. Therefore, the protocols and mechanisms should be | under a single intelligent agent. Therefore, the protocols and | |||
consolidated into a minimum yet comprehensive set. A telemetry | mechanisms should be consolidated into a minimum yet comprehensive | |||
framework can help to normalize the technique developments. | set. A telemetry framework can help to normalize the technique | |||
developments. | ||||
o Network visibility presents multiple viewpoints. For example, the | o Network visibility presents multiple viewpoints. For example, the | |||
device viewpoint takes the network infrastructure as the | device viewpoint takes the network infrastructure as the | |||
monitoring object from which the network topology and device | monitoring object from which the network topology and device | |||
status can be acquired; the traffic viewpoint takes the flows or | status can be acquired; the traffic viewpoint takes the flows or | |||
packets as the monitoring object from which the traffic quality | packets as the monitoring object from which the traffic quality | |||
and path can be acquired. An application may need to switch its | and path can be acquired. An application may need to switch its | |||
viewpoint during operation. It may also need to correlate a | viewpoint during operation. It may also need to correlate a | |||
service and its impact on network experience to acquire the | service and its impact on network experience to acquire the | |||
comprehensive information. | comprehensive information. | |||
o Applications require network telemetry to be elastic in order to | o Applications require network telemetry to be elastic in order to | |||
efficiently use the network resource and reduce the performance | make efficient use of network resources and reduce the impact of | |||
impact. Routine network monitoring covers the entire network with | processing related to network telemetry on network performance. | |||
low data sampling rate. When issues arise or trends emerge, the | For example, routine network monitoring should cover the entire | |||
telemetry data source can be modified and the data rate can be | network with a low data sampling rate. Only when issues arise or | |||
boosted. | critical trends emerge should telemetry data sources be modified | |||
and telemetry data rates boosted as needed. | ||||
o Efficient data fusion is critical for applications to reduce the | o Efficient data fusion is critical for applications to reduce the | |||
overall quantity of data and improve the accuracy of analysis. | overall quantity of data and improve the accuracy of analysis. | |||
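The elasticity described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the ElasticSampler class, the two sampling rates, and the utilization threshold are hypothetical values picked for the example and are not specified by this document.

```python
from dataclasses import dataclass


@dataclass
class ElasticSampler:
    """Hypothetical sketch: choose a sampling rate from an observed metric."""

    base_rate: float = 0.01     # assumed routine rate (1% of packets)
    boosted_rate: float = 0.5   # assumed rate while an issue is suspected
    threshold: float = 0.8      # assumed link utilization that signals an issue

    def rate_for(self, link_utilization: float) -> float:
        """Return the sampling rate for a link utilization in the range 0..1."""
        if link_utilization >= self.threshold:
            return self.boosted_rate
        return self.base_rate


sampler = ElasticSampler()
routine = sampler.rate_for(0.30)   # normal load: keep the low routine rate
incident = sampler.rate_for(0.95)  # suspected congestion: boost the rate
```

The point of the sketch is only that the data rate is a function of observed conditions rather than a fixed configuration value.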
A telemetry framework collects together all of the telemetry-related | A telemetry framework collects together all of the telemetry-related | |||
work from different sources and working groups within the IETF. This | work from different sources and working groups within the IETF. This | |||
makes it possible to assemble a comprehensive network telemetry | makes it possible to assemble a comprehensive network telemetry | |||
system and to avoid repetitious or redundant work. The framework | system and to avoid repetitious or redundant work. The framework | |||
should cover the concepts and components from the standardization | should cover the concepts and components from the standardization | |||
perspective. This document clarifies the layered modules on which | perspective. This document describes the modules which make up a | |||
the telemetry is exerted and decomposes the telemetry system into a | network telemetry framework and decomposes the telemetry system into | |||
set of distinct components that the existing and future work can | a set of distinct components that existing and future work can easily | |||
easily map to. | map to. | |||
4. Network Telemetry Framework | 4. Network Telemetry Framework | |||
The top level network telemetry framework partitions the network | The top level network telemetry framework partitions the network | |||
telemetry into four modules based on the telemetry data object source | telemetry into four modules based on the telemetry data object source | |||
and represents their relationship. The next level framework reveals | and represents their relationship. At the next level, the framework | |||
that each module replicates the same architecture comprising the same | decomposes each module into separate components. Each of the modules | |||
set of components. Throughout the framework, the same set of | follows the same underlying structure, with one component dedicated | |||
to the configuration of data subscriptions and data sources, a second | ||||
component dedicated to encoding and exporting data, and a third | ||||
component instrumenting the generation of telemetry related to the | ||||
underlying resources. Throughout the framework, the same set of | ||||
abstract data acquiring mechanisms and data types are applied. The | abstract data acquiring mechanisms and data types are applied. The | |||
two-level architecture with the uniform data abstraction helps | two-level architecture with the uniform data abstraction helps | |||
accurately pinpoint a protocol or technique to its position in a | accurately pinpoint a protocol or technique to its position in a | |||
network telemetry system or disaggregate a network telemetry system | network telemetry system or disaggregate a network telemetry system | |||
into manageable parts. | into manageable parts. | |||
4.1. Top Level Modules | 4.1. Top Level Modules | |||
Telemetry can be applied on the forwarding plane, the control plane, | Telemetry can be applied on the forwarding plane, the control plane, | |||
and the management plane in a network, as well as other sources out | and the management plane in a network, as well as other sources out | |||
skipping to change at page 14, line 12 ¶ | skipping to change at page 14, line 37 ¶ | |||
Figure 1: Modules in Layer Category of NTF | Figure 1: Modules in Layer Category of NTF | |||
The rationale of this partition lies in the different telemetry data | The rationale of this partition lies in the different telemetry data | |||
objects which result in different data sources and export locations. | objects which result in different data sources and export locations. | |||
Such differences have profound implications on in-network data | Such differences have profound implications on in-network data | |||
programming and processing capability, data encoding and transport | programming and processing capability, data encoding and transport | |||
protocol, and data bandwidth and latency. | protocol, and data bandwidth and latency. | |||
We summarize the major differences of the four modules in the | We summarize the major differences of the four modules in the | |||
following table. They are compared from six aspects: data object, | following table. They are compared from six aspects: | |||
data export location, data model, data encoding, telemetry protocol, | ||||
and transport method. Data object is the target and source of each | o Data Object | |||
module. Because the data source varies, the data export location | ||||
varies. For example, the forwarding plane data are mainly from the | o Data Export Location | |||
fast path (e.g., forwarding chips) while the control plane data are | | |||
mainly from the slow path (e.g., main control CPU). For convenience | o Data Model | |||
and efficiency, it is preferred to export the data from locations | ||||
near the source. Because each data export location has different | o Data Encoding | |||
capability, the proper data model, encoding, and transport method | ||||
cannot be kept the same. For example, the forwarding chip has high | o Telemetry Protocol | |||
throughput but limited capacity for processing complex data and | ||||
maintaining states, while the main control CPU is capable of complex | o Transport Method | |||
data and state processing, but has limited bandwidth for high | ||||
throughput data. As a result, the suitable telemetry protocol for | Data object is the target and source of each module. Because the | |||
each module can be different. Some representative techniques are | data source varies, the data export location varies. For example, | |||
shown in the corresponding table blocks to highlight the technical | the forwarding plane data are mainly from the fast path (e.g., | |||
diversity of these modules. The key point is that one cannot expect | forwarding chips) while the control plane data are mainly from the | |||
to use a universal protocol to cover all the network telemetry | slow path (e.g., main control CPU). For convenience and efficiency, | |||
requirements. | it is preferred to export the data from locations near the source. | |||
Because each data export location has different capability, the | ||||
proper data model, encoding, and transport method cannot be kept the | ||||
same. For example, the forwarding chip has high throughput but | ||||
limited capacity for processing complex data and maintaining states, | ||||
while the main control CPU is capable of complex data and state | ||||
processing, but has limited bandwidth for high throughput data. As a | ||||
result, the suitable telemetry protocol for each module can be | ||||
different. Some representative techniques are shown in the | ||||
corresponding table blocks to highlight the technical diversity of | ||||
these modules. Note that the selected techniques just reflect the | ||||
de-facto state of the art and are not exhaustive. The key point is | ||||
that one cannot expect to use a universal protocol to cover all the | ||||
network telemetry requirements. | ||||
+---------+--------------+--------------+--------------+-----------+ | +---------+--------------+--------------+--------------+-----------+ | |||
| Module | Control | Management | Forwarding | External | | | Module | Control | Management | Forwarding | External | | |||
| | Plane | Plane | Plane | Data | | | | Plane | Plane | Plane | Data | | |||
+---------+--------------+--------------+--------------+-----------+ | +---------+--------------+--------------+--------------+-----------+ | |||
|Object | control | config. & | flow & packet| terminal, | | |Object | control | config. & | flow & packet| terminal, | | |||
| | protocol & | operation | QoS, traffic | social & | | | | protocol & | operation | QoS, traffic | social & | | |||
| | signaling, | state, MIB | stat., buffer| environ- | | | | signaling, | state, MIB | stat., buffer| environ- | | |||
| | RIB, ACL | | & queue stat.| mental | | | | RIB, ACL | | & queue stat.| mental | | |||
+---------+--------------+--------------+--------------+-----------+ | +---------+--------------+--------------+--------------+-----------+ | |||
skipping to change at page 16, line 10 ¶ | skipping to change at page 17, line 10 ¶ | |||
the control plane telemetry. | the control plane telemetry. | |||
The requirements and challenges for each module are summarized as | The requirements and challenges for each module are summarized as | |||
follows. | follows. | |||
4.1.1. Management Plane Telemetry | 4.1.1. Management Plane Telemetry | |||
The management plane of network elements interacts with the Network | The management plane of network elements interacts with the Network | |||
Management System (NMS), and provides information such as performance | Management System (NMS), and provides information such as performance | |||
data, network logging data, network warning and defects data, and | data, network logging data, network warning and defects data, and | |||
network statistics and state data. Some legacy protocols, such as | network statistics and state data. The management plane includes | |||
SNMP and Syslog, are widely used for the management plane. However, | many protocols, including some that are considered "legacy", such as | |||
these protocols are insufficient to meet the requirements of the | SNMP and syslog. Regardless of the protocol, management plane telemetry | |||
future automated network operation applications. | must address the following requirements: | |||
New management plane telemetry protocols should consider the | ||||
following requirements: | ||||
Convenient Data Subscription: An application should have the freedom | Convenient Data Subscription: An application should have the freedom | |||
to choose the data export means such as the data types and the | to choose the data export means such as the data types and the | |||
export frequency. | export frequency. | |||
Structured Data: For automatic network operation, machines will | Structured Data: For automatic network operation, machines will | |||
replace humans for network data comprehension. The schema | replace humans for network data comprehension. The schema | |||
languages such as YANG can efficiently describe structured data | languages such as YANG can efficiently describe structured data | |||
and normalize data encoding and transformation. | and normalize data encoding and transformation. | |||
High Speed Data Transport: In order to retain the information, a | High Speed Data Transport: In order to keep up with the velocity of | |||
server needs to send a large amount of data at high frequency. | information, a server needs to be able to send large amounts of | |||
Compact encoding formats are needed to compress the data and | data at high frequency. Compact encoding formats are needed to | |||
improve the data transport efficiency. The subscription mode, by | compress the data and improve the data transport efficiency. The | |||
replacing the query mode, reduces the interactions between clients | subscription mode, by replacing the query mode, reduces the | |||
and servers and helps to improve the server's efficiency. | interactions between clients and servers and helps to improve the | |||
server's efficiency. | ||||
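The subscription-related requirements above can be illustrated with a small sketch. It assumes a hypothetical Subscription/Publisher pair and uses compact JSON as a stand-in for a compact encoding; none of these names, paths, or values are defined by this document or by any real protocol.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Subscription:
    """Hypothetical subscription: which data to export and how often."""

    paths: List[str]                    # data items the application asks for
    period_s: float                     # requested export period in seconds
    on_update: Callable[[bytes], None]  # callback receiving encoded updates


@dataclass
class Publisher:
    """Hypothetical device-side publisher that pushes subscribed data."""

    state: Dict[str, int]
    subs: List[Subscription] = field(default_factory=list)

    def subscribe(self, sub: Subscription) -> None:
        self.subs.append(sub)

    def export_once(self) -> None:
        """Push one update per subscription (a timer would drive this)."""
        for sub in self.subs:
            record = {p: self.state[p] for p in sub.paths if p in self.state}
            # Compact JSON stands in for a compact binary encoding here.
            encoded = json.dumps(record, separators=(",", ":")).encode()
            sub.on_update(encoded)


received: List[bytes] = []
device = Publisher(state={"if/eth0/in-octets": 1200, "if/eth0/out-octets": 800})
device.subscribe(Subscription(["if/eth0/in-octets"], 10.0, received.append))
device.export_once()
```

The application chooses the data items and the export period once; afterwards the publisher pushes updates without further queries from the client.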
4.1.2. Control Plane Telemetry | 4.1.2. Control Plane Telemetry | |||
The control plane telemetry refers to the health condition monitoring | The control plane telemetry refers to the health condition monitoring | |||
of different network control protocols covering Layer 2 to Layer 7. | of different network control protocols covering Layer 2 to Layer 7. | |||
Keeping track of the running status of these protocols is beneficial | Keeping track of the running status of these protocols is beneficial | |||
for detecting, localizing, and even predicting various network | for detecting, localizing, and even predicting various network | |||
issues, as well as network optimization, in real-time and in fine | issues, as well as network optimization, in real-time and in fine | |||
granularity. | granularity. | |||
skipping to change at page 17, line 20 ¶ | skipping to change at page 18, line 20 ¶ | |||
and network optimization. | and network optimization. | |||
An example of the control plane telemetry is the BGP monitoring | An example of the control plane telemetry is the BGP monitoring | |||
protocol (BMP), which is currently used to monitor the BGP routes and | protocol (BMP), which is currently used to monitor the BGP routes and | |||
enables rich applications, such as BGP peer analysis, AS analysis, | enables rich applications, such as BGP peer analysis, AS analysis, | |||
prefix analysis, security analysis, and so on. However, the | prefix analysis, security analysis, and so on. However, the | |||
monitoring of other layers, protocols and the cross-layer, cross- | monitoring of other layers, protocols and the cross-layer, cross- | |||
protocol KPI correlations are still in their infancy (e.g., the IGP | protocol KPI correlations are still in their infancy (e.g., the IGP | |||
monitoring is missing), which require further research. | monitoring is missing), which require further research. | |||
4.1.3. Data Plane Telemetry | 4.1.3. Forwarding Plane Telemetry | |||
An effective data plane telemetry system relies on the data that the | An effective forwarding plane telemetry system relies on the data | |||
network device can expose. The data's quality, quantity, and | that the network device can expose. The quality, quantity, and | |||
timeliness must meet some stringent requirements. This raises some | timeliness of data must meet some stringent requirements. This | |||
challenges to the network data plane devices where the first hand | raises some challenges to the network data plane devices where the | |||
data originate. | first hand data originate. | |||
o A data plane device's main function is user traffic processing and | o A data plane device's main function is user traffic processing and | |||
forwarding. While supporting network visibility is important, the | forwarding. While supporting network visibility is important, the | |||
telemetry is just an auxiliary function, and it should not impede | telemetry is just an auxiliary function, and it should not impede | |||
normal traffic processing and forwarding (i.e., the performance is | normal traffic processing and forwarding (i.e., the performance is | |||
not lowered and the behavior is not altered due to the telemetry | not lowered and the behavior is not altered due to the telemetry | |||
functions). | functions). | |||
o The network operation applications requires end-to-end visibility | o Network operation applications require end-to-end visibility | |||
from various sources, which results in a huge volume of data. | across various sources, which can result in a huge volume of data. | |||
However, the sheer data quantity should not stress the network | However, the sheer data quantity should not exhaust the network | |||
bandwidth, regardless of the data delivery approach (i.e., through | bandwidth, regardless of the data delivery approach (i.e., whether | |||
in-band or out-of-band channels). | through in-band or out-of-band channels). | |||
o The data plane devices must provide timely data with the minimum | o The data plane devices must provide timely data with the minimum | |||
possible delay. Long processing, transport, storage, and analysis | possible delay. Long processing, transport, storage, and analysis | |||
delay can impact the effectiveness of the control loop and even | delay can impact the effectiveness of the control loop and even | |||
render the data useless. | render the data useless. | |||
o The data should be structured and labeled, and easy for | o The data should be structured and labeled, and easy for | |||
applications to parse and consume. At the same time, the data | applications to parse and consume. At the same time, the data | |||
types needed by applications can vary significantly. The data | types needed by applications can vary significantly. The data | |||
plane devices need to provide enough flexibility and | plane devices need to provide enough flexibility and | |||
programmability to support the precise data provision for | programmability to support the precise data provision for | |||
applications. | applications. | |||
o The data plane telemetry should support incremental deployment and | o The data plane telemetry should support incremental deployment and | |||
work even though some devices are unaware of the system. This | work even though some devices are unaware of the system. This | |||
challenge is highly relevant to the standards and legacy networks. | challenge is highly relevant to the standards and legacy networks. | |||
The data plane programmability is essential to support network | Although not specific to the forwarding plane, these challenges are | |||
telemetry. Newer data plane forwarding chips are equipped with | more difficult to the forwarding plane because of the limited | |||
advanced telemetry features and provide flexibility to support | resource and flexibility. The data plane programmability is | |||
customized telemetry functions. | essential to support network telemetry. Newer data plane forwarding | |||
chips are equipped with advanced telemetry features and provide | ||||
flexibility to support customized telemetry functions. | ||||
4.1.3.1. Technique Taxonomy | 4.1.3.1. Technique Taxonomy | |||
There can be multiple possible dimensions to classify the data plane | There can be multiple possible dimensions to classify the forwarding | |||
telemetry techniques. | plane telemetry techniques. | |||
Active, Passive, and Hybrid: The active and passive methods (as well | Active, Passive, and Hybrid: Active and passive methods (as well as | |||
as the hybrid types) are well documented in [RFC7799]. The | the hybrid types) are well documented in [RFC7799]. Passive | |||
passive methods include TCPDUMP, IPFIX [RFC7011], sflow, and | methods include TCPDUMP, IPFIX [RFC7011], sflow, and traffic | |||
traffic mirror. These methods usually have low data coverage. | mirroring. These methods usually have low data coverage. The | |||
The bandwidth cost is very high in order to improve the data | bandwidth cost is very high in order to improve the data coverage. | |||
coverage. On the other hand, the active methods include Ping, | On the other hand, active methods include Ping, OWAMP [RFC4656], | |||
Traceroute, OWAMP [RFC4656], TWAMP [RFC5357], and Cisco's SLA | TWAMP [RFC5357], and Cisco's SLA Protocol [RFC6812]. These | |||
Protocol [RFC6812]. These methods are intrusive and only provide | methods are intrusive and only provide indirect network | |||
indirect network measurement results. The hybrid methods, | measurement results. Hybrid methods, including in-situ OAM | |||
including in-situ OAM [I-D.ietf-ippm-ioam-data], IPFPM [RFC8321], | [I-D.ietf-ippm-ioam-data], IPFPM [RFC8321], and Multipoint | |||
and Multipoint Alternate Marking | Alternate Marking [I-D.fioccola-ippm-multipoint-alt-mark], provide | |||
[I-D.fioccola-ippm-multipoint-alt-mark], provide a well-balanced | a well-balanced and more flexible approach. However, these | |||
and more flexible approach. However, these methods are also more | methods are also more complex to implement. | |||
complex to implement. | ||||
In-Band and Out-of-Band: The telemetry data, before being exported | In-Band and Out-of-Band: The telemetry data, before being exported | |||
to some collector, can be carried in user packets. Such methods | to some collector, can be carried in user packets. Such methods | |||
are considered in-band (e.g., in-situ OAM | are considered in-band (e.g., in-situ OAM | |||
[I-D.ietf-ippm-ioam-data]). If the telemetry data is directly | [I-D.ietf-ippm-ioam-data]). If the telemetry data is directly | |||
exported to some collector without modifying the user packets, | exported to some collector without modifying the user packets, | |||
such methods are considered out-of-band (e.g., postcard-based | such methods are considered out-of-band (e.g., postcard-based | |||
INT). It is possible to have hybrid methods. For example, only | INT). It is possible to have hybrid methods. For example, only | |||
the telemetry instruction or partial data is carried by user | the telemetry instruction or partial data is carried by user | |||
packets (e.g., IPFPM [RFC8321]). | packets (e.g., IPFPM [RFC8321]). | |||
E2E and In-Network: Some E2E methods start from and end at the | E2E and In-Network: Some E2E methods start from and end at the | |||
network end hosts (e.g., Ping). The other methods work in | network end hosts (e.g., Ping). The other methods work in | |||
networks and are transparent to end hosts. However, if needed, | networks and are transparent to end hosts. However, if needed, | |||
the in-network methods can be easily extended into end hosts. | in-network methods can be easily extended into end hosts. | |||
Information Type: Depending on the telemetry objective, the methods | Information Type: Depending on the telemetry objective, the methods | |||
can be flow-based (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]), | can be flow-based (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]), | |||
path-based (e.g., Traceroute), and node-based (e.g., IPFIX | path-based (e.g., Traceroute), and node-based (e.g., IPFIX | |||
[RFC7011]). The various data objects can be packet, flow record, | [RFC7011]). The various data objects can be packet, flow record, | |||
measurement, states, and signal. | measurement, states, and signal. | |||
4.1.4. External Data Telemetry | 4.1.4. External Data Telemetry | |||
skipping to change at page 20, line 6 ¶ | skipping to change at page 21, line 6 ¶ | |||
possibilities of current and future network systems, as reflected in | possibilities of current and future network systems, as reflected in | |||
the incorporation of cognitive capabilities to new hardware and | the incorporation of cognitive capabilities to new hardware and | |||
software (virtual) elements. | software (virtual) elements. | |||
4.2. Second Level Function Components | 4.2. Second Level Function Components | |||
Reflecting the best current practice, the telemetry module at each | Reflecting the best current practice, the telemetry module at each | |||
plane is further partitioned into five distinct components: | plane is further partitioned into five distinct components: | |||
Data Query, Analysis, and Storage: This component works at the | Data Query, Analysis, and Storage: This component works at the | |||
application layer. On the one hand, it is responsible for issuing | application layer. It is a part of the network management system | |||
data requirements. The data of interest can be modeled data | at the receiver side. On the one hand, it is responsible for | |||
through configuration or custom data through programming. The | issuing data requirements. The data of interest can be modeled | |||
data requirements can be queries for one-shot data or | data through configuration or custom data through programming. | |||
The data requirements can be queries for one-shot data or | ||||
subscriptions for events or streaming data. On the other hand, it | subscriptions for events or streaming data. On the other hand, it | |||
receives, stores, and processes the returned data from network | receives, stores, and processes the returned data from network | |||
devices. Data analysis can be interactive to initiate further | devices. Data analysis can be interactive to initiate further | |||
data queries. This component can reside in either network devices | data queries. This component can reside in either network devices | |||
or remote controllers. | or remote controllers. It can be centralized and distributed, and | |||
involve one or more instances. | ||||
Data Configuration and Subscription: This component deploys data | Data Configuration and Subscription: This component deploys data | |||
queries on devices. It determines the protocol and channel for | queries on devices. It determines the protocol and channel for | |||
applications to acquire desired data. This component is also | applications to acquire desired data. This component is also | |||
responsible for configuring the desired data that might not be | responsible for configuring the desired data that might not be | |||
directly available from data sources. The subscription data can | directly available from data sources. The subscription data can | |||
be described by models, templates, or programs. | be described by models, templates, or programs. | |||
Data Encoding and Export: This component determines how telemetry | Data Encoding and Export: This component determines how telemetry | |||
data are delivered to the data analysis and storage component. | data are delivered to the data analysis and storage component. | |||
skipping to change at page 21, line 5 ¶ | skipping to change at page 22, line 5 ¶ | |||
data sources. This may involve in-network computing and | data sources. This may involve in-network computing and | |||
processing on either the fast path or the slow path in network | processing on either the fast path or the slow path in network | |||
devices. | devices. | |||
Data Object and Source: This component determines the monitoring | Data Object and Source: This component determines the monitoring | |||
object and original data source. The data source usually just | object and original data source. The data source usually just | |||
provides raw data which needs further processing. A data source | provides raw data which needs further processing. A data source | |||
can be considered a probe. A probe can be statically installed or | can be considered a probe. A probe can be statically installed or | |||
dynamically installed. | dynamically installed. | |||
+----------------------------------------+ | +----------------------------------------+ | |||
| | | +----------------------------------------+ | | |||
| Data Query, Analysis, & Storage | | | | | | |||
| | | | Data Query, Analysis, & Storage | | | |||
| | + | ||||
+-------+++ -----------------------------+ | +-------+++ -----------------------------+ | |||
||| ^^^ | ||| ^^^ | |||
||| ||| | ||| ||| | |||
||V ||| | ||V ||| | |||
+--+V--------------------+++------------+ | +--+V--------------------+++------------+ | |||
+-----V---------------------+------------+ | | +-----V---------------------+------------+ | | |||
+---------------------+-------+----------+ | | | +---------------------+-------+----------+ | | | |||
| Data Configuration | | | | | | Data Configuration | | | | | |||
| & Subscription | Data Encoding | | | | | & Subscription | Data Encoding | | | | |||
| (model, template, | & Export | | | | | (model, template, | & Export | | | | |||
skipping to change at page 22, line 8 ¶ | skipping to change at page 23, line 10 ¶ | |||
In contrast, query is used when a querier expects immediate and one- | In contrast, query is used when a querier expects immediate and one- | |||
off feedback from network devices. The queried data may be directly | off feedback from network devices. The queried data may be directly | |||
extracted from some specific data source, or synthesized and | extracted from some specific data source, or synthesized and | |||
processed from raw data. Query suits interactive network | processed from raw data. Query suits interactive network | |||
telemetry applications. | telemetry applications. | |||
There are four types of data from network devices: | There are four types of data from network devices: | |||
Simple Data: The data that are steadily available from some data | Simple Data: The data that are steadily available from some data | |||
store or static probes in network devices. Such data can be | store or static probes in network devices. Such data can be | |||
specified by YANG model. | specified by YANG model. | |||
Complex Data: The data need to be synthesized or processed in | Complex Data: The data need to be synthesized or processed in | |||
network from raw data from one or more network devices. The data | network from raw data from one or more network devices. The data | |||
processing function can be statically or dynamically loaded into | processing function can be statically or dynamically loaded into | |||
network devices. | network devices. | |||
Event-triggered Data: The data are conditionally acquired based on | Event-triggered Data: The data are conditionally acquired based on | |||
the occurrence of some events. It can be actively pushed through | the occurrence of some events. It can be actively pushed through | |||
subscription or passively polled through query. There are many | subscription or passively polled through query. There are many | |||
ways to model events, including using Finite State Machine (FSM) | ways to model events, including using Finite State Machine (FSM) | |||
or Event Condition Action (ECN) [I-D.wwx-netmod-event-yang]. | or Event Condition Action (ECA) [I-D.wwx-netmod-event-yang]. | |||
Streaming Data: The data are continuously generated. It can be time | Streaming Data: The data are continuously generated. It can be time | |||
series or the dump of databases. The streaming data reflect | series or the dump of databases. The streaming data reflect | |||
realtime network states and metrics and require large bandwidth | realtime network states and metrics and require large bandwidth | |||
and processing power. The streaming data are always actively | and processing power. The streaming data are always actively | |||
pushed to the subscribers. | pushed to the subscribers. | |||
The above data types are not mutually exclusive. Rather, they often | The above data types are not mutually exclusive. Rather, they often | |||
overlap. For example, event-triggered data can be simple or complex, | overlap. For example, event-triggered data can be simple or complex, | |||
and streaming data can be simple, complex, or triggered by events. | and streaming data can be simple, complex, or triggered by events. | |||
skipping to change at page 24, line 10 ¶ | skipping to change at page 25, line 34 ¶ | |||
Figure 5: Existing Work Mapping I | Figure 5: Existing Work Mapping I | |||
The second table is based on the telemetry modules and components. | The second table is based on the telemetry modules and components. | |||
+-------------+-----------------+---------------+--------------+ | +-------------+-----------------+---------------+--------------+ | |||
| | Management | Control | Forwarding | | | | Management | Control | Forwarding | | |||
| | Plane | Plane | Plane | | | | Plane | Plane | Plane | | |||
+-------------+-----------------+---------------+--------------+ | +-------------+-----------------+---------------+--------------+ | |||
| data config.| gRPC, NETCONF, | NETCONF/YANG | NETCONF/YANG,| | | data config.| gRPC, NETCONF, | NETCONF/YANG | NETCONF/YANG,| | |||
| & subscribe | SMIv2,YANG PUSH | | YANG FSM | | | & subscribe | SMIv2,YANG PUSH | YANG PUSH | YANG PUSH | | |||
+-------------+-----------------+---------------+--------------+ | +-------------+-----------------+---------------+--------------+ | |||
| data gen. & | DNP, | DNP, | IOAM, | | | data gen. & | DNP, | DNP, | IOAM, PSAMP | | |||
| process | YANG | YANG | PBT, IPFPM, | | | process | YANG | YANG | PBT, IPFPM, | | |||
| | | | DNP | | | | | | DNP | | |||
+-------------+-----------------+---------------+--------------+ | +-------------+-----------------+---------------+--------------+ | |||
| data | gRPC, NETCONF | BMP, NETCONF | IPFIX | | | data | gRPC, NETCONF | BMP, NETCONF | IPFIX | | |||
| export | YANG PUSH | | | | | export | YANG PUSH | | | | |||
+-------------+-----------------+---------------+--------------+ | +-------------+-----------------+---------------+--------------+ | |||
Figure 6: Existing Work Mapping II | Figure 6: Existing Work Mapping II | |||
5. Evolution of Network Telemetry | 5. Evolution of Network Telemetry | |||
skipping to change at page 24, line 34 ¶ | skipping to change at page 26, line 17 ¶ | |||
Network telemetry is a fast evolving technical area. As the network | Network telemetry is a fast evolving technical area. As the network | |||
moves towards the automated operation, network telemetry undergoes | moves towards the automated operation, network telemetry undergoes | |||
several stages of evolution. Each stage is built upon the techniques | several stages of evolution. Each stage is built upon the techniques | |||
enabled by previous stages. | enabled by previous stages. | |||
Stage 0 - Static Telemetry: The telemetry data source and type are | Stage 0 - Static Telemetry: The telemetry data source and type are | |||
determined at design time. The network operator can only | determined at design time. The network operator can only | |||
configure how to use it with limited flexibility. | configure how to use it with limited flexibility. | |||
Stage 1 - Dynamic Telemetry: The custom telemetry data can be | Stage 1 - Dynamic Telemetry: The custom telemetry data can be | |||
dynamically programmed or configured at runtime, allowing a | dynamically programmed or configured at runtime without | |||
tradeoff among resource, performance, flexibility, and coverage. | interrupting the network operation, allowing a tradeoff among | |||
DNP is an effort towards this direction. | resource, performance, flexibility, and coverage. DNP is an | |||
effort towards this direction. | ||||
Stage 2 - Interactive Telemetry: The network operator can | Stage 2 - Interactive Telemetry: The network operator can | |||
continuously customize the telemetry data in real time to reflect | continuously customize and fine tune the telemetry data in real | |||
the network operation's visibility requirements. At this stage, | time to reflect the network operation's visibility requirements. | |||
some tasks can be automated, although ultimately human operators | Compared with Stage 1, the changes are frequent based on the real- | |||
will still need to sit in the middle to make decisions. | time feedback. At this stage, some tasks can be automated, but | |||
human operators still need to sit in the middle to make decisions. | ||||
Stage 3 - Closed-loop Telemetry: Human operators are completely | Stage 3 - Closed-loop Telemetry: The telemetry is free from the | |||
excluded from the control loop. The intelligent network operation | interference of human operators, except for generating the | |||
engine automatically issues the telemetry data requests, analyzes | reports. The intelligent network operation engine automatically | |||
the data, and updates the network operations in closed control | issues the telemetry data requests, analyzes the data, and updates | |||
loops. | the network operations in closed control loops. | |||
Most of the existing technologies belong to stage 0 and stage 1. | Most of the existing technologies belong to stage 0 and stage 1. | |||
Individual stage 2 and stage 3 applications are also possible now. | Individual stage 2 and stage 3 applications are also possible now. | |||
However, the future autonomic networks may need a comprehensive | However, the future autonomic networks may need a comprehensive | |||
operation management system which relies on stage 2 and stage 3 | operation management system which relies on stage 2 and stage 3 | |||
telemetry to cover all the network operation tasks. A well-defined | telemetry to cover all the network operation tasks. A well-defined | |||
network telemetry framework is the first step towards this direction. | network telemetry framework is the first step towards this direction. | |||
6. Security Considerations | 6. Security Considerations | |||
The complexity of network telemetry raises significant security | The complexity of network telemetry raises significant security | |||
implications. For example, telemetry data can be manipulated to | implications. For example, telemetry data can be manipulated to | |||
exhaust various network resources at each plane as well as the data | exhaust various network resources at each plane as well as the data | |||
consumer; falsified or tampered data can mislead the decision making | consumer; falsified or tampered data can mislead the decision making | |||
and paralyze networks; wrong configuration and programming for | and paralyze networks; wrong configuration and programming for | |||
telemetry is equally harmful. | telemetry is equally harmful. | |||
Given that this document has proposed a framework for network | Given that this document has proposed a framework for network | |||
telemetry and the telemetry mechanisms discussed are distinct (in | telemetry and the telemetry mechanisms discussed are more extensive | |||
both message frequency and traffic amount) from the conventional | (in both message frequency and traffic amount) than the conventional | |||
network OAM concepts, we must also reflect that various new security | network OAM concepts, we must also reflect that various new security | |||
considerations may also arise. A number of techniques already exist | considerations may also arise. A number of techniques already exist | |||
for securing the forwarding plane, the control plane, and the | for securing the forwarding plane, the control plane, and the | |||
management plane in a network, but it is important to consider if any | management plane in a network, but it is important to consider if any | |||
new threat vectors are now being enabled via the use of network | new threat vectors are now being enabled via the use of network | |||
telemetry procedures and mechanisms. | telemetry procedures and mechanisms. | |||
Security considerations for networks that use telemetry methods may | Security considerations for networks that use telemetry methods may | |||
include: | include: | |||
skipping to change at page 25, line 45 ¶ | skipping to change at page 27, line 28 ¶ | |||
telemetry capabilities; | telemetry capabilities; | |||
o Protocol transport used for telemetry data and inherent security | o Protocol transport used for telemetry data and inherent security | |||
capabilities; | capabilities; | |||
o Telemetry data stores, storage encryption and methods of access; | o Telemetry data stores, storage encryption and methods of access; | |||
o Tracking telemetry events and any abnormalities that might | o Tracking telemetry events and any abnormalities that might | |||
identify malicious attacks using telemetry interfaces. | identify malicious attacks using telemetry interfaces. | |||
o Authentication and signing of telemetry data to make data more | ||||
trustworthy. | ||||
Some of the security considerations highlighted above may be | Some of the security considerations highlighted above may be | |||
minimized or negated with policy management of network telemetry. In | minimized or negated with policy management of network telemetry. In | |||
a network telemetry deployment it would be advantageous to separate | a network telemetry deployment it would be advantageous to separate | |||
telemetry capabilities into different classes of policies, i.e., Role | telemetry capabilities into different classes of policies, i.e., Role | |||
Based Access Control and Event-Condition-Action policies. Also, | Based Access Control and Event-Condition-Action policies. Also, | |||
potential conflicts between network telemetry mechanisms must be | potential conflicts between network telemetry mechanisms must be | |||
detected accurately and resolved quickly to avoid unnecessary network | detected accurately and resolved quickly to avoid unnecessary network | |||
telemetry traffic propagation escalating into an unintended or | telemetry traffic propagation escalating into an unintended or | |||
intended denial of service attack. | intended denial of service attack. | |||
Further study of the security issues will be required, and it is | Further study of the security issues will be required, and it is | |||
expected that the secuirty mechanisms and protocols are devloped and | expected that the secuirty mechanisms and protocols are developed and | |||
deployed along with a network telemetry system. | deployed along with a network telemetry system. | |||
7. IANA Considerations | 7. IANA Considerations | |||
This document includes no request to IANA. | This document includes no request to IANA. | |||
8. Contributors | 8. Contributors | |||
The other contributors of this document are listed as follows. | The other contributors of this document are listed as follows. | |||
skipping to change at page 27, line 14 ¶ | skipping to change at page 29, line 8 ¶ | |||
[I-D.ietf-grow-bmp-adj-rib-out] | [I-D.ietf-grow-bmp-adj-rib-out] | |||
Evens, T., Bayraktar, S., Lucente, P., Mi, K., and S. | Evens, T., Bayraktar, S., Lucente, P., Mi, K., and S. | |||
Zhuang, "Support for Adj-RIB-Out in BGP Monitoring | Zhuang, "Support for Adj-RIB-Out in BGP Monitoring | |||
Protocol (BMP)", draft-ietf-grow-bmp-adj-rib-out-07 (work | Protocol (BMP)", draft-ietf-grow-bmp-adj-rib-out-07 (work | |||
in progress), August 2019. | in progress), August 2019. | |||
[I-D.ietf-grow-bmp-local-rib] | [I-D.ietf-grow-bmp-local-rib] | |||
Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, | Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, | |||
"Support for Local RIB in BGP Monitoring Protocol (BMP)", | "Support for Local RIB in BGP Monitoring Protocol (BMP)", | |||
draft-ietf-grow-bmp-local-rib-07 (work in progress), May | draft-ietf-grow-bmp-local-rib-08 (work in progress), | |||
2020. | November 2020. | |||
[I-D.ietf-ippm-ioam-data] | [I-D.ietf-ippm-ioam-data] | |||
Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields | Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields | |||
for In-situ OAM", draft-ietf-ippm-ioam-data-10 (work in | for In-situ OAM", draft-ietf-ippm-ioam-data-11 (work in | |||
progress), July 2020. | progress), November 2020. | |||
[I-D.ietf-netconf-distributed-notif] | [I-D.ietf-netconf-distributed-notif] | |||
Zhou, T., Zheng, G., Voit, E., Graf, T., and P. Francois, | Zhou, T., Zheng, G., Voit, E., Graf, T., and P. Francois, | |||
"Subscription to Distributed Notifications", draft-ietf- | "Subscription to Distributed Notifications", draft-ietf- | |||
netconf-distributed-notif-00 (work in progress), October | netconf-distributed-notif-01 (work in progress), November | |||
2020. | 2020. | |||
[I-D.ietf-netconf-udp-notif] | [I-D.ietf-netconf-udp-notif] | |||
Zheng, G., Zhou, T., Graf, T., Francois, P., and P. | Zheng, G., Zhou, T., Graf, T., Francois, P., and P. | |||
Lucente, "UDP-based Transport for Configured | Lucente, "UDP-based Transport for Configured | |||
Subscriptions", draft-ietf-netconf-udp-notif-00 (work in | Subscriptions", draft-ietf-netconf-udp-notif-01 (work in | |||
progress), October 2020. | progress), November 2020. | |||
[I-D.irtf-nmrg-ibn-concepts-definitions] | [I-D.irtf-nmrg-ibn-concepts-definitions] | |||
Clemm, A., Ciavaglia, L., Granville, L., and J. Tantsura, | Clemm, A., Ciavaglia, L., Granville, L., and J. Tantsura, | |||
"Intent-Based Networking - Concepts and Definitions", | "Intent-Based Networking - Concepts and Definitions", | |||
draft-irtf-nmrg-ibn-concepts-definitions-02 (work in | draft-irtf-nmrg-ibn-concepts-definitions-02 (work in | |||
progress), September 2020. | progress), September 2020. | |||
[I-D.kumar-rtgwg-grpc-protocol] | [I-D.kumar-rtgwg-grpc-protocol] | |||
Kumar, A., Kolhe, J., Ghemawat, S., and L. Ryan, "gRPC | Kumar, A., Kolhe, J., Ghemawat, S., and L. Ryan, "gRPC | |||
Protocol", draft-kumar-rtgwg-grpc-protocol-00 (work in | Protocol", draft-kumar-rtgwg-grpc-protocol-00 (work in | |||
skipping to change at page 28, line 12 ¶ | skipping to change at page 30, line 6 ¶ | |||
(gNMI)", draft-openconfig-rtgwg-gnmi-spec-01 (work in | (gNMI)", draft-openconfig-rtgwg-gnmi-spec-01 (work in | |||
progress), March 2018. | progress), March 2018. | |||
[I-D.pedro-nmrg-anticipated-adaptation] | [I-D.pedro-nmrg-anticipated-adaptation] | |||
Martinez-Julia, P., "Exploiting External Event Detectors | Martinez-Julia, P., "Exploiting External Event Detectors | |||
to Anticipate Resource Requirements for the Elastic | to Anticipate Resource Requirements for the Elastic | |||
Adaptation of SDN/NFV Systems", draft-pedro-nmrg- | Adaptation of SDN/NFV Systems", draft-pedro-nmrg- | |||
anticipated-adaptation-02 (work in progress), June 2018. | anticipated-adaptation-02 (work in progress), June 2018. | |||
[I-D.song-ippm-postcard-based-telemetry] | [I-D.song-ippm-postcard-based-telemetry] | |||
Song, H., Zhou, T., Li, Z., Shin, J., and K. Lee, | Song, H., Zhou, T., Li, Z., Mirsky, G., Shin, J., and K. | |||
"Postcard-based On-Path Flow Data Telemetry", draft-song- | Lee, "Postcard-based On-Path Flow Data Telemetry using | |||
ippm-postcard-based-telemetry-07 (work in progress), April | Packet Marking", draft-song-ippm-postcard-based- | |||
2020. | telemetry-08 (work in progress), October 2020. | |||
[I-D.song-opsawg-dnp4iq] | [I-D.song-opsawg-dnp4iq] | |||
Song, H. and J. Gong, "Requirements for Interactive Query | Song, H. and J. Gong, "Requirements for Interactive Query | |||
with Dynamic Network Probes", draft-song-opsawg-dnp4iq-01 | with Dynamic Network Probes", draft-song-opsawg-dnp4iq-01 | |||
(work in progress), June 2017. | (work in progress), June 2017. | |||
[I-D.song-opsawg-ifit-framework] | [I-D.song-opsawg-ifit-framework] | |||
Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "In- | Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "In- | |||
situ Flow Information Telemetry", draft-song-opsawg-ifit- | situ Flow Information Telemetry", draft-song-opsawg-ifit- | |||
framework-13 (work in progress), October 2020. | framework-13 (work in progress), October 2020. | |||
[I-D.wwx-netmod-event-yang] | [I-D.wwx-netmod-event-yang] | |||
Bierman, A., WU, Q., Bryskin, I., Birkholz, H., Liu, X., | WU, Q., Bryskin, I., Birkholz, H., Liu, X., and B. Claise, | |||
and B. Claise, "A YANG Data model for ECA Policy | "A YANG Data model for ECA Policy Management", draft-wwx- | |||
Management", draft-wwx-netmod-event-yang-09 (work in | netmod-event-yang-10 (work in progress), November 2020. | |||
progress), July 2020. | ||||
[RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin, | [RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin, | |||
"Simple Network Management Protocol (SNMP)", RFC 1157, | "Simple Network Management Protocol (SNMP)", RFC 1157, | |||
DOI 10.17487/RFC1157, May 1990, | DOI 10.17487/RFC1157, May 1990, | |||
<https://www.rfc-editor.org/info/rfc1157>. | <https://www.rfc-editor.org/info/rfc1157>. | |||
[RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J. | [RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J. | |||
Schoenwaelder, Ed., "Structure of Management Information | Schoenwaelder, Ed., "Structure of Management Information | |||
Version 2 (SMIv2)", STD 58, RFC 2578, | Version 2 (SMIv2)", STD 58, RFC 2578, | |||
DOI 10.17487/RFC2578, April 1999, | DOI 10.17487/RFC2578, April 1999, | |||
skipping to change at page 30, line 52 ¶ | skipping to change at page 32, line 47 ¶ | |||
In this non-normative appendix, we provide an overview of some | In this non-normative appendix, we provide an overview of some | |||
existing techniques and standard proposals for each network telemetry | existing techniques and standard proposals for each network telemetry | |||
module. | module. | |||
A.1. Management Plane Telemetry | A.1. Management Plane Telemetry | |||
A.1.1. Push Extensions for NETCONF | A.1.1. Push Extensions for NETCONF | |||
NETCONF [RFC6241] is one popular network management protocol, which | NETCONF [RFC6241] is one popular network management protocol, which | |||
is also recommended by IETF. Although it can be used for data | is also recommended by IETF. Although it can be used for data | |||
collection, NETCONF is good at configurations. YANG Push | collection, NETCONF is good at configurations. YANG Push [RFC8641] | |||
[RFC8639] extends NETCONF and enables subscriber applications to | ||||
request a continuous, customized stream of updates from a YANG | ||||
datastore. Providing such visibility into changes made upon YANG | ||||
configuration and operational objects enables new capabilities based | ||||
on the remote mirroring of configuration and operational state. | ||||
[RFC8641][RFC8639] extends NETCONF and enables subscriber | Moreover, distributed data collection mechanism | |||
applications to request a continuous, customized stream of updates | ||||
from a YANG datastore. Providing such visibility into changes made | ||||
upon YANG configuration and operational objects enables new | ||||
capabilities based on the remote mirroring of configuration and | ||||
operational state. Moreover, distributed data collection mechanism | ||||
[I-D.ietf-netconf-distributed-notif] via UDP based publication | [I-D.ietf-netconf-distributed-notif] via UDP based publication | |||
channel [I-D.ietf-netconf-udp-notif] provides enhanced efficiency for | channel [I-D.ietf-netconf-udp-notif] provides enhanced efficiency for | |||
the NETCONF based telemetry. | the NETCONF based telemetry. | |||
A.1.2. gRPC Network Management Interface | A.1.2. gRPC Network Management Interface | |||
gRPC Network Management Interface (gNMI) | gRPC Network Management Interface (gNMI) | |||
[I-D.openconfig-rtgwg-gnmi-spec] is a network management protocol | [I-D.openconfig-rtgwg-gnmi-spec] is a network management protocol | |||
based on the gRPC [I-D.kumar-rtgwg-grpc-protocol] RPC (Remote | based on the gRPC [I-D.kumar-rtgwg-grpc-protocol] RPC (Remote | |||
Procedure Call) framework. With a single gRPC service definition, | Procedure Call) framework. With a single gRPC service definition, | |||
End of changes. 70 change blocks. | ||||
298 lines changed or deleted | 347 lines changed or added | |||