draft-ietf-opsawg-ntf-02.txt | draft-ietf-opsawg-ntf-03.txt | |||
---|---|---|---|---|
OPSAWG H. Song, Ed. | OPSAWG H. Song | |||
Internet-Draft Futurewei | Internet-Draft Futurewei | |||
Intended status: Informational F. Qin | Intended status: Informational F. Qin | |||
Expires: April 10, 2020 China Mobile | Expires: October 15, 2020 China Mobile | |||
P. Martinez-Julia | P. Martinez-Julia | |||
NICT | NICT | |||
L. Ciavaglia | L. Ciavaglia | |||
Nokia | Nokia | |||
A. Wang | A. Wang | |||
China Telecom | China Telecom | |||
October 8, 2019 | April 13, 2020 | |||
Network Telemetry Framework | Network Telemetry Framework | |||
draft-ietf-opsawg-ntf-02 | draft-ietf-opsawg-ntf-03 | |||
Abstract | Abstract | |||
Network telemetry is the technology for gaining network insight and | Network telemetry is the technology for gaining network insight and | |||
facilitating efficient and automated network management. It engages | facilitating efficient and automated network management. It engages | |||
various techniques for remote data collection, correlation, and | various techniques for remote data collection, correlation, and | |||
consumption. This document provides an architectural framework for | consumption. This document provides an architectural framework for | |||
network telemetry, motivated by the network operation challenges and | network telemetry, motivated by the network operation challenges and | |||
requirements. As evidenced by some key characteristics and industry | requirements. As evidenced by some key characteristics and industry | |||
practices, network telemetry covers technologies and protocols beyond | practices, network telemetry covers technologies and protocols beyond | |||
the conventional network Operations, Administration, and Management | the conventional network Operations, Administration, and Management | |||
(OAM). It promises better flexibility, scalability, accuracy, | (OAM). It promises better flexibility, scalability, accuracy, | |||
coverage, and performance and allows automated control loops to suit | coverage, and performance and allows automated control loops to suit | |||
both today's and tomorrow's network operation. This document | both today's and tomorrow's network operation. This document | |||
clarifies the terminologies and classifies the modules and components | clarifies the terminologies and classifies the modules and components | |||
of a network telemetry system from several different perspectives. | of a network telemetry system from several different perspectives. | |||
To the best of our knowledge, this document is the first such effort | The framework and taxonomy help to set a common ground for the | |||
for network telemetry in industry standards organizations. The | collection of related work and provide guidance for related technique | |||
framework and taxonomy help to set a common ground for the collection | and standard developments. | |||
of related work and provide guidance for future technique and | ||||
standard developments. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on April 10, 2020. | This Internet-Draft will expire on October 15, 2020. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2020 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
skipping to change at page 2, line 39 ¶ | skipping to change at page 2, line 36 ¶ | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2.1. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 5 | 2.1. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
2.2. Challenges . . . . . . . . . . . . . . . . . . . . . . . 6 | 2.2. Challenges . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
2.3. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 7 | 2.3. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
2.4. Network Telemetry . . . . . . . . . . . . . . . . . . . . 8 | 2.4. Network Telemetry . . . . . . . . . . . . . . . . . . . . 8 | |||
3. The Necessity of a Network Telemetry Framework . . . . . . . 10 | 3. The Necessity of a Network Telemetry Framework . . . . . . . 10 | |||
4. Network Telemetry Framework . . . . . . . . . . . . . . . . . 11 | 4. Network Telemetry Framework . . . . . . . . . . . . . . . . . 11 | |||
4.1. Data Acquiring Mechanisms and Data Types . . . . . . . . 12 | 4.1. Data Acquiring Mechanisms and Data Types . . . . . . . . 12 | |||
4.2. Data Object Modules . . . . . . . . . . . . . . . . . . . 13 | 4.2. Data Object Modules . . . . . . . . . . . . . . . . . . . 13 | |||
4.2.1. Requirements and Challenges for each Module . . . . . 15 | 4.2.1. Requirements and Challenges for each Module . . . . . 16 | |||
4.3. Function Components . . . . . . . . . . . . . . . . . . . 19 | 4.3. Function Components . . . . . . . . . . . . . . . . . . . 19 | |||
4.4. Existing Works Mapped in the Framework . . . . . . . . . 21 | 4.4. Existing Works Mapped in the Framework . . . . . . . . . 21 | |||
5. Evolution of Network Telemetry . . . . . . . . . . . . . . . 22 | 5. Evolution of Network Telemetry . . . . . . . . . . . . . . . 22 | |||
6. Security Considerations . . . . . . . . . . . . . . . . . . . 23 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 23 | |||
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 | 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 | |||
8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 24 | 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 24 | 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
10. Informative References . . . . . . . . . . . . . . . . . . . 24 | 10. Informative References . . . . . . . . . . . . . . . . . . . 25 | |||
Appendix A. A Survey on Existing Network Telemetry Techniques . 28 | Appendix A. A Survey on Existing Network Telemetry Techniques . 28 | |||
A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 28 | A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 28 | |||
A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 28 | A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 28 | |||
A.1.2. gRPC Network Management Interface . . . . . . . . . . 28 | A.1.2. gRPC Network Management Interface . . . . . . . . . . 28 | |||
A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 29 | A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 29 | |||
A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . . . 29 | A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . . . 29 | |||
A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 29 | A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 29 | |||
A.3.1. The IPFPM technology . . . . . . . . . . . . . . . . 29 | A.3.1. The IPFPM technology . . . . . . . . . . . . . . . . 29 | |||
A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 30 | A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 30 | |||
A.3.3. IP Flow Information Export (IPFIX) protocol . . . . . 31 | A.3.3. IP Flow Information Export (IPFIX) protocol . . . . . 31 | |||
A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 31 | A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 31 | |||
A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 31 | A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 31 | |||
A.4. External Data and Event Telemetry . . . . . . . . . . . . 31 | A.4. External Data and Event Telemetry . . . . . . . . . . . . 32 | |||
A.4.1. Sources of External Events . . . . . . . . . . . . . 32 | A.4.1. Sources of External Events . . . . . . . . . . . . . 32 | |||
A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 33 | A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 33 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
1. Introduction | 1. Introduction | |||
Network visibility is the ability of management tools to see the | Network visibility is the ability of management tools to see the | |||
state and behavior of a network. It is essential for successful | state and behavior of a network. It is essential for successful | |||
network operation. Network telemetry is the process of measuring, | network operation. Network telemetry is the process of measuring, | |||
correlating, recording, and distributing information about the | correlating, recording, and distributing information about the | |||
behavior of a network. Network telemetry has been considered as an | behavior of a network. Network telemetry has been considered as an | |||
ideal means to gain sufficient network visibility with better | ideal means to gain sufficient network visibility with better | |||
flexibility, scalability, accuracy, coverage, and performance than | flexibility, scalability, accuracy, coverage, and performance than | |||
some conventional network Operations, Administration, and Management | some conventional network Operations, Administration, and Management | |||
(OAM) techniques. | (OAM) techniques. | |||
However, so far the term of network telemetry lacks a solid and | However, the term of network telemetry lacks a solid and unambiguous | |||
unambiguous definition. The scope and coverage of it cause confusion | definition. The scope and coverage of it cause confusion and | |||
and misunderstandings. It is beneficial to clarify the concept and | misunderstandings. It is beneficial to clarify the concept and | |||
provide a clear architectural framework for network telemetry, so we | provide a clear architectural framework for network telemetry, so we | |||
can articulate the technical field, and better align the related | can articulate the technical field, and better align the related | |||
techniques and standard works. | techniques and standard works. | |||
To fulfill such an undertaking, we first discuss some key | To fulfill such an undertaking, we first discuss some key | |||
characteristics of network telemetry which set a clear distinction | characteristics of network telemetry which set a clear distinction | |||
from the conventional network OAM and show that some conventional OAM | from the conventional network OAM and show that some conventional OAM | |||
technologies can be considered a subset of the network telemetry | technologies can be considered a subset of the network telemetry | |||
technologies. We then provide an architectural framework from three | technologies. We then provide an architectural framework for network | |||
different perspectives for network telemetry. We show how network | telemetry from three different perspectives. We show how network | |||
telemetry can meet the current and future network operation | telemetry can meet the current and future network operation | |||
requirements, and the challenges each telemetry module is facing. | requirements, and the challenges each telemetry module is facing. | |||
Based on the distinction of modules and function components, we can | Based on the distinction of modules and function components, we can | |||
easily map the existing and emerging techniques and protocols into | map the existing and emerging techniques and protocols into the | |||
the framework. At last, we outline a road-map for the evolution of | framework. At last, we outline a road-map for the evolution of the | |||
the network telemetry system and discuss the potential security | network telemetry system and discuss the potential security concerns | |||
concerns for network telemetry. | for network telemetry. | |||
The purpose of the framework and taxonomy is to set a common ground | The purpose of the framework and taxonomy is to set a common ground | |||
for the collection of related work and provide guidance for future | for the collection of related work and provide guidance for future | |||
technique and standard developments. To the best of our knowledge, | technique and standard developments. To the best of our knowledge, | |||
this document is the first such effort for network telemetry in | this document is the first such effort for network telemetry in | |||
industry standards organizations. | industry standards organizations. | |||
2. Motivation | 2. Motivation | |||
The term of Big data is used to describe the extremely large volume | The term "big data" is used to describe the extremely large volume of | |||
of data sets that can be analyzed computationally to reveal patterns, | data sets that can be analyzed computationally to reveal patterns, | |||
trends, and associations. The network is undoubtedly a source of big | trends, and associations. The network is undoubtedly a source of big | |||
data because of its scale and all the traffic that goes through it. It is | data because of its scale and all the traffic that goes through it. It is | |||
easy to see that network OAM can benefit from network big data. | easy to see that network OAM can benefit from network big data. | |||
Today one can easily access advanced big data analytics capability | Today one can access advanced big data analytics capability through a | |||
through a plethora of commercial and open source platforms (e.g., | plethora of commercial and open source platforms (e.g., Apache | |||
Apache Hadoop), tools (e.g., Apache Spark), and techniques (e.g., | Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine | |||
machine learning). Thanks to the advance of computing and storage | learning). Thanks to the advance of computing and storage | |||
technologies, network big data analytics gives network operators an | technologies, network big data analytics gives network operators an | |||
unprecedented opportunity to gain network insights and move towards | opportunity to gain network insights and move towards network | |||
network autonomy. Some operators start to explore the application of | autonomy. Some operators start to explore the application of | |||
Artificial Intelligence (AI) to make sense of network data. Software | Artificial Intelligence (AI) to make sense of network data. Software | |||
tools can use the network data to detect and react on network faults, | tools can use the network data to detect and react on network faults, | |||
anomalies, and policy violations, as well as predicting future | anomalies, and policy violations, as well as predicting future | |||
events. In turn, the network policy updates for planning, intrusion | events. In turn, the network policy updates for planning, intrusion | |||
prevention, optimization, and self-healing may be applied. | prevention, optimization, and self-healing may be applied. | |||
It is conceivable that an intent-driven autonomic network [RFC7575] | It is conceivable that an intent-driven autonomic network [RFC7575] | |||
is the logical next step for network evolution following Software | is the logical next step for network evolution following Software | |||
Defined Network (SDN), aiming to reduce (or even eliminate) human | Defined Network (SDN), aiming to reduce (or even eliminate) human | |||
labor, make the most efficient usage of network resources, and | labor, make more efficient use of network resources, and provide | |||
provide better services more aligned with customer requirements. | better services more aligned with customer requirements. Although it | |||
Although it takes time to reach the ultimate goal, the journey has | takes time to reach the ultimate goal, the journey has started | |||
started nevertheless. | nevertheless. | |||
However, while the data processing capability is improved and | However, while the data processing capability is improved and | |||
applications are hungry for more data, the networks lag behind in | applications are hungry for more data, the networks lag behind in | |||
extracting and translating network data into useful and actionable | extracting and translating network data into useful and actionable | |||
information. The system bottleneck is shifting from data consumption | information in efficient ways. The system bottleneck is shifting | |||
to data supply. Both the number of network nodes and the traffic | from data consumption to data supply. Both the number of network | |||
bandwidth keep increasing at a fast pace. The network configuration | nodes and the traffic bandwidth keep increasing at a fast pace. The | |||
and policy change at a much smaller time slot than ever before. More | network configuration and policy change at smaller time slots than | |||
subtle events and fine-grained data through all network planes need | before. More subtle events and fine-grained data through all network | |||
to be captured and exported in real time. In a nutshell, it is a | planes need to be captured and exported in real time. In a nutshell, | |||
challenge to get enough high-quality data out of network efficiently, | it is a challenge to get enough high-quality data out of network | |||
timely, and flexibly. Therefore, we need to examine the existing | efficiently, timely, and flexibly. Therefore, we need to examine the | |||
network technologies and protocols, and identify any potential | existing network technologies and protocols, and identify any | |||
technique and standard gaps based on the real network and device | potential technique and standard gaps based on the real network and | |||
architectures. | device architectures. | |||
In the remainder of this section, first we discuss several key use | In the remainder of this section, first we discuss several key use | |||
cases for today's and future network operations. Next, we show why | cases for today's and future network operations. Next, we show why | |||
the current network OAM techniques and protocols are insufficient for | the current network OAM techniques and protocols are insufficient for | |||
these use cases. The discussion underlines the need of new methods, | these use cases. The discussion underlines the need of new methods, | |||
techniques, and protocols which we may assign under an umbrella term | techniques, and protocols which we assign under an umbrella term - | |||
- network telemetry. | network telemetry. | |||
2.1. Use Cases | 2.1. Use Cases | |||
These use cases are essential for network operations. While the list | These use cases are essential for network operations. While the list | |||
is by no means exhaustive, it is enough to highlight the requirements | is by no means exhaustive, it is enough to highlight the requirements | |||
for data velocity, variety, volume, and veracity in networks. | for data velocity, variety, volume, and veracity in networks. | |||
Policy and Intent Compliance: Network policies are the rules that | Policy and Intent Compliance: Network policies are the rules that | |||
constrain the services for network access, provide service | constrain the services for network access, provide service | |||
differentiation, or enforce specific treatment on the traffic. | differentiation, or enforce specific treatment on the traffic. | |||
For example, a service function chain is a policy that requires | For example, a service function chain is a policy that requires | |||
the selected flows to pass through a set of ordered network | the selected flows to pass through a set of ordered network | |||
functions. An intent is a high-level abstract policy which | functions. An intent is a high-level abstract policy which | |||
requires a complex translation and mapping process before being | requires a complex translation and mapping process before being | |||
applied on networks. While a policy is enforced, the compliance | applied on networks. While a policy is enforced, the compliance | |||
needs to be verified and monitored continuously. | needs to be verified and monitored continuously, and any violation | |||
needs to be reported immediately. | ||||
SLA Compliance: A Service-Level Agreement (SLA) defines the level of | SLA Compliance: A Service-Level Agreement (SLA) defines the level of | |||
service a user expects from a network operator, which includes the | service a user expects from a network operator, which includes the | |||
metrics for the service measurement and remedy/penalty procedures | metrics for the service measurement and remedy/penalty procedures | |||
when the service level misses the agreement. Users need to check | when the service level misses the agreement. Users need to check | |||
if they get the service as promised and network operators need to | if they get the service as promised and network operators need to | |||
evaluate how they can deliver the services that can meet the SLA. | evaluate how they can deliver the services that can meet the SLA | |||
based on realtime network measurement. | ||||
Root Cause Analysis: Any network failure can be the cause or effect | Root Cause Analysis: Any network failure can be the cause or effect | |||
of a sequence of chained events. Troubleshooting and recovery | of a sequence of chained events. Troubleshooting and recovery | |||
require quick identification of the root cause of any observable | require quick identification of the root cause of any observable | |||
issues. However, the root cause is not always straightforward to | issues. However, the root cause is not always straightforward to | |||
identify, especially when the failure is sporadic and the related | identify, especially when the failure is sporadic and the related | |||
and unrelated events are overwhelming. While machine learning | and unrelated events are overwhelming and interleaved. While | |||
technologies can be used for root cause analysis, it is up to the | machine learning technologies can be used for root cause analysis, | |||
network to sense and provide all the relevant data. | it is up to the network to sense and provide the relevant data. | |||
Network Optimization: This covers all short-term and long-term | Network Optimization: This covers all short-term and long-term | |||
network optimization techniques, including load balancing, Traffic | network optimization techniques, including load balancing, Traffic | |||
Engineering (TE), and network planning. Network operators are | Engineering (TE), and network planning. Network operators are | |||
motivated to optimize their network utilization and differentiate | motivated to optimize their network utilization and differentiate | |||
services for better Return On Investment (ROI) or lower Capital | services for better Return On Investment (ROI) or lower Capital | |||
Expenditures (CAPEX). The first step is to know the real-time | Expenditures (CAPEX). The first step is to know the real-time | |||
network conditions before applying policies for traffic | network conditions before applying policies for traffic | |||
manipulation. In some cases, micro-bursts need to be detected in | manipulation. In some cases, micro-bursts need to be detected in | |||
a very short time-frame so that fine-grained traffic control can | a very short time-frame so that fine-grained traffic control can | |||
be applied to avoid network congestion. The long-term network | be applied to avoid network congestion. The long-term network | |||
capacity planning and topology augmentation also rely on the | capacity planning and topology augmentation rely on the | |||
accumulated data of the network operations. | accumulated data of network operations. | |||
Event Tracking and Prediction: The visibility of user traffic path | Event Tracking and Prediction: The visibility of traffic path and | |||
and performance is critical for healthy network operation. | performance is critical for services and applications that rely on | |||
Numerous related network events are of interest to network | healthy network operation. Numerous related network events are of | |||
operators. For example, network operators always want to learn | interest to network operators. For example, network operators | |||
where and why packets are dropped for an application flow. They | want to learn where and why packets are dropped for an application | |||
also want to be warned of issues in advance so proactive actions | flow. They also want to be warned of issues in advance so | |||
can be taken to avoid catastrophic consequences. | proactive actions can be taken to avoid catastrophic consequences. | |||
2.2. Challenges | 2.2. Challenges | |||
For a long time, network operators have relied upon SNMP [RFC3416], | For a long time, network operators have relied upon SNMP [RFC3416], | |||
Command-Line Interface (CLI), or Syslog to monitor the network. Some | Command-Line Interface (CLI), or Syslog to monitor the network. Some | |||
other OAM techniques as described in [RFC7276] are also used to | other OAM techniques as described in [RFC7276] are also used to | |||
facilitate network troubleshooting. These conventional techniques | facilitate network troubleshooting. These conventional techniques | |||
are not sufficient to support the above use cases for the following | are not sufficient to support the above use cases for the following | |||
reasons: | reasons: | |||
o Most use cases need to continuously monitor the network and | o Most use cases need to continuously monitor the network and | |||
dynamically refine the data collection in real-time and | dynamically refine the data collection in real-time. The poll- | |||
interactively. The poll-based low-frequency data collection is | based low-frequency data collection is ill-suited for these | |||
ill-suited for these applications. Subscription-based streaming | applications. Subscription-based streaming data directly pushed | |||
data directly pushed from the data source (e.g., the forwarding | from the data source (e.g., the forwarding chip) is preferred to | |||
chip) is preferred to provide enough data quantity and precision | provide enough data quantity and precision at scale. | |||
at scale. | ||||
o Comprehensive data is needed from packet processing engine to | o Comprehensive data is needed from packet processing engine to | |||
traffic manager, from line cards to main control board, from user | traffic manager, from line cards to main control board, from user | |||
flows to control protocol packets, from device configurations to | flows to control protocol packets, from device configurations to | |||
operations, and from physical layer to application layer. | operations, and from physical layer to application layer. | |||
Conventional OAM only covers a narrow range of data (e.g., SNMP | Conventional OAM only covers a narrow range of data (e.g., SNMP | |||
only handles data from the Management Information Base (MIB)). | only handles data from the Management Information Base (MIB)). | |||
Traditional network devices cannot provide all the necessary | Traditional network devices cannot provide all the necessary | |||
probes. An open and programmable network device is therefore | probes. More open and programmable network devices are therefore | |||
needed. | needed. | |||
o Many application scenarios need to correlate network-wide data | o Many application scenarios need to correlate network-wide data | |||
from multiple sources (i.e., from distributed network devices, | from multiple sources (i.e., from distributed network devices, | |||
different components of a network device, or different network | different components of a network device, or different network | |||
planes). A piecemeal solution is often lacking the capability to | planes). A piecemeal solution is often lacking the capability to | |||
consolidate the data from multiple sources. The composition of a | consolidate the data from multiple sources. The composition of a | |||
complete solution, as partly proposed by Autonomic Resource | complete solution, as partly proposed by Autonomic Resource | |||
Control Architecture (ARCA) | Control Architecture (ARCA) | |||
[I-D.pedro-nmrg-anticipated-adaptation], will be empowered and | [I-D.pedro-nmrg-anticipated-adaptation], will be empowered and | |||
guided by a comprehensive framework. | guided by a comprehensive framework. | |||
o Some of the conventional OAM techniques (e.g., CLI and Syslog) | o Some of the conventional OAM techniques (e.g., CLI and Syslog) | |||
lack a formal data model. The unstructured data hinder the tool | lack a formal data model. The unstructured data hinder the tool | |||
automation and application extensibility. Standardized data | automation and application extensibility. Standardized data | |||
models are essential to support the programmable networks. | models are essential to support the programmable networks. | |||
o Although some conventional OAM techniques support data push (e.g., | o Although some conventional OAM techniques support data push (e.g., | |||
SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow), the pushed data | SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow), the pushed data | |||
are limited to only predefined management plane warnings (e.g., | are limited to only predefined management plane warnings (e.g., | |||
SNMP Trap) or sampled user packets (e.g., sFlow). We require the | SNMP Trap) or sampled user packets (e.g., sFlow). Network | |||
data with arbitrary source, granularity, and precision which are | operators require the data with arbitrary source, granularity, and | |||
beyond the capability of the existing techniques. | precision which are beyond the capability of the existing | |||
techniques. | ||||
o The conventional passive measurement techniques can either consume | o The conventional passive measurement techniques can either consume | |||
too much network resources and render too much redundant data, or | excessive network resources and render excessive redundant data, | |||
lead to inaccurate results; the conventional active measurement | or lead to inaccurate results; on the other hand, the conventional | |||
techniques can interfere with the user traffic and their results | active measurement techniques can interfere with the user traffic | |||
are indirect. We need techniques that can collect direct and on- | and their results are indirect. Techniques that can collect | |||
demand data from user traffic. | direct and on-demand data from user traffic are more favorable. | |||
2.3. Glossary | 2.3. Glossary | |||
Before further discussion, we list some key terminology and acronyms | Before further discussion, we list some key terminology and acronyms | |||
used in this document. We make an intended distinction between | used in this document. We make an intended distinction between | |||
network telemetry and network OAM. | network telemetry and network OAM. | |||
AI: Artificial Intelligence. In network domain, AI refers to the | AI: Artificial Intelligence. In network domain, AI refers to the | |||
machine-learning based technologies for automated network | machine-learning based technologies for automated network | |||
operation and other tasks. | operation and other tasks. | |||
skipping to change at page 8, line 14 ¶ | skipping to change at page 8, line 17 ¶ | |||
IOAM: In-situ OAM, a dataplane on-path telemetry technique. | IOAM: In-situ OAM, a dataplane on-path telemetry technique. | |||
NETCONF: Network Configuration Protocol, specified in [RFC6241]. | NETCONF: Network Configuration Protocol, specified in [RFC6241]. | |||
Network Telemetry: Acquiring and processing network data remotely | Network Telemetry: Acquiring and processing network data remotely | |||
for network monitoring and operation. A general term for a large | for network monitoring and operation. A general term for a large | |||
set of network visibility techniques and protocols, with the | set of network visibility techniques and protocols, with the | |||
characteristics defined in this document. Network telemetry | characteristics defined in this document. Network telemetry | |||
addresses the current network operation issues and enables smooth | addresses the current network operation issues and enables smooth | |||
evolution toward intent-driven autonomous networks. | evolution toward future intent-driven autonomous networks. | |||
NMS: Network Management System, referring to applications that allow | NMS: Network Management System, referring to applications that allow | |||
network administrators to manage a network's software and hardware | network administrators to manage a network's software and hardware | |||
components. It usually records data from a network's remote | components. It usually records data from a network's remote | |||
points to carry out central reporting to a system administrator. | points to carry out central reporting to a system administrator. | |||
OAM: Operations, Administration, and Maintenance. A group of | OAM: Operations, Administration, and Maintenance. A group of | |||
network management functions that provide network fault | network management functions that provide network fault | |||
indication, fault localization, performance information, and data | indication, fault localization, performance information, and data | |||
and diagnosis functions. Most conventional network monitoring | and diagnosis functions. Most conventional network monitoring | |||
skipping to change at page 9, line 10 ¶ | skipping to change at page 9, line 13 ¶ | |||
The representative techniques and protocols include IPFIX [RFC7011] | The representative techniques and protocols include IPFIX [RFC7011] | |||
and gRPC [grpc]. Network telemetry allows separate entities to | and gRPC [grpc]. Network telemetry allows separate entities to | |||
acquire data from network devices so that data can be visualized and | acquire data from network devices so that data can be visualized and | |||
analyzed to support network monitoring and operation. Network | analyzed to support network monitoring and operation. Network | |||
telemetry overlaps with the conventional network OAM and has a wider | telemetry overlaps with the conventional network OAM and has a wider | |||
scope than it. It is expected that network telemetry can provide the | scope than it. It is expected that network telemetry can provide the | |||
necessary network insight for autonomous networks and address the | necessary network insight for autonomous networks and address the | |||
shortcomings of conventional OAM techniques. | shortcomings of conventional OAM techniques. | |||
One difference between the network telemetry and the network OAM is | One difference between the network telemetry and the network OAM is | |||
that the network telemetry assumes machines as data consumer rather | that in general the network telemetry assumes machines as data | |||
than human operators. Hence, the network telemetry can directly | consumer rather than human operators. Hence, the network telemetry | |||
trigger the automated network operation, while the conventional OAM | can directly trigger the automated network operation, while the | |||
tools usually help human operators to monitor and diagnose the | conventional OAM tools usually help human operators to monitor and | |||
networks and guide manual network operations. The difference leads | diagnose the networks and guide manual network operations. The | |||
to very different techniques. | difference leads to very different techniques. | |||
Although the network telemetry techniques are just emerging and | Although the network telemetry techniques are just emerging and | |||
subject to continuous evolution, several characteristics of network | subject to continuous evolution, several characteristics of network | |||
telemetry have been well accepted (Note that network telemetry is | telemetry have been well accepted. Note that network telemetry is | |||
intended to be an umbrella term covering a wide spectrum of | intended to be an umbrella term covering a wide spectrum of | |||
techniques, so the following characteristics are not expected to be | techniques, so the following characteristics are not expected to be | |||
held by every specific technique): | held by every specific technique. | |||
o Push and Streaming: Instead of polling data from network devices, | o Push and Streaming: Instead of polling data from network devices, | |||
the telemetry collector subscribes to the streaming data pushed | the telemetry collector subscribes to the streaming data pushed | |||
from data sources in network devices. | from data sources in network devices. | |||
o Volume and Velocity: The telemetry data is intended to be consumed | o Volume and Velocity: The telemetry data is intended to be consumed | |||
by machines rather than by human beings. Therefore, the data | by machines rather than by human beings. Therefore, the data | |||
volume is huge and the processing is often in realtime. | volume is huge and the processing is often in realtime. | |||
o Normalization and Unification: Telemetry aims to address the | o Normalization and Unification: Telemetry aims to address the | |||
skipping to change at page 10, line 8 ¶ | skipping to change at page 10, line 11 ¶ | |||
used in a closed control loop for network automation, it needs to | used in a closed control loop for network automation, it needs to | |||
run continuously and adapt to the dynamic and interactive queries | run continuously and adapt to the dynamic and interactive queries | |||
from the network operation controller. | from the network operation controller. | |||
In addition, an ideal network telemetry solution may also have the | In addition, an ideal network telemetry solution may also have the | |||
following features or properties: | following features or properties: | |||
o In-Network Customization: The data can be customized in network at | o In-Network Customization: The data can be customized in network at | |||
run-time to cater to the specific need of applications. This | run-time to cater to the specific need of applications. This | |||
needs the support of a programmable data plane which allows probes | needs the support of a programmable data plane which allows probes | |||
to be deployed at flexible locations. | with custom functions to be deployed at flexible locations. | |||
o In-Network Data Aggregation and Correlation: Network devices and | o In-Network Data Aggregation and Correlation: Network devices and | |||
aggregation points can work out which events and what data needs | aggregation points can work out which events and what data needs | |||
to be stored, reported, or discarded thus reducing the load on the | to be stored, reported, or discarded thus reducing the load on the | |||
central collection and processing points while still ensuring that | central collection and processing points while still ensuring that | |||
the right information is ready to be processed in a timely way. | the right information is ready to be processed in a timely way. | |||
o In-Network Processing and Action: Sometimes it is not necessary or | o In-Network Processing and Action: Sometimes it is not necessary or | |||
feasible to gather all information to a central point so that it | feasible to gather all information to a central point to be | |||
can be processed and acted upon. It is possible for the data | processed and acted upon. It is possible for the data processing | |||
processing to be done in the network, and actions taken more | to be done in network, and actions to be taken locally. | |||
locally and more responsively. | ||||
o Direct Data Plane Export: The data originated from data plane can | o Direct Data Plane Export: The data originated from the data plane | |||
be directly exported to the data consumer for efficiency, | forwarding chips can be directly exported to the data consumer for | |||
especially when the data bandwidth is large and the real-time | efficiency, especially when the data bandwidth is large and the | |||
processing is required. | real-time processing is required. | |||
o In-band Data Collection: In addition to the passive and active | o In-band Data Collection: In addition to the passive and active | |||
data collection approaches, the new hybrid approach allows to | data collection approaches, the new hybrid approach allows to | |||
directly collect data for any target flow on its entire forwarding | directly collect data for any target flow on its entire forwarding | |||
path. | path [I-D.song-opsawg-ifit-framework]. | |||
It is worth noting that, no matter how sophisticated a network | It is worth noting that, a network telemetry system should not be | |||
telemetry system is, it should not be intrusive to networks, by | intrusive to normal network operations, by avoiding the pitfall of | |||
avoiding the pitfall of the "observer effect". That is, it should | the "observer effect". That is, it should not change the network | |||
not change the network behavior and affect the forwarding | behavior and affect the forwarding performance. Otherwise, the whole | |||
performance. | purpose of network telemetry is defeated. | |||
Although in many cases a network telemetry system is akin to the SDN | Although in many cases a network telemetry system is akin to the SDN | |||
architecture, it is important to understand that network telemetry | architecture, it is important to understand that network telemetry | |||
does not imply the need for any centralized data processing and | does not imply the need for any centralized data processing and | |||
analytics engine. Telemetry data producers and consumers can | analytics engine. Telemetry data producers and consumers can | |||
perfectly work in distributed or peer-to-peer fashions instead. | perfectly work in distributed or peer-to-peer fashions instead. | |||
3. The Necessity of a Network Telemetry Framework | 3. The Necessity of a Network Telemetry Framework | |||
Big data analytics and machine-learning based AI technologies are | Big data analytics and machine-learning based AI technologies are | |||
skipping to change at page 11, line 21 ¶ | skipping to change at page 11, line 22 ¶ | |||
consolidated into a minimum yet comprehensive set. A telemetry | consolidated into a minimum yet comprehensive set. A telemetry | |||
framework can help to normalize the technique developments. | framework can help to normalize the technique developments. | |||
o Network visibility presents multiple viewpoints. For example, the | o Network visibility presents multiple viewpoints. For example, the | |||
device viewpoint takes the network infrastructure as the | device viewpoint takes the network infrastructure as the | |||
monitoring object from which the network topology and device | monitoring object from which the network topology and device | |||
status can be acquired; the traffic viewpoint takes the flows or | status can be acquired; the traffic viewpoint takes the flows or | |||
packets as the monitoring object from which the traffic quality | packets as the monitoring object from which the traffic quality | |||
and path can be acquired. An application may need to switch its | and path can be acquired. An application may need to switch its | |||
viewpoint during operation. It may also need to correlate a | viewpoint during operation. It may also need to correlate a | |||
service and impact on network experience to acquire the | service and its impact on network experience to acquire the | |||
comprehensive information. | comprehensive information. | |||
o Applications require network telemetry to be elastic in order to | o Applications require network telemetry to be elastic in order to | |||
efficiently use the network resource and reduce the performance | efficiently use the network resource and reduce the performance | |||
impact. Routine network monitoring covers the entire network with | impact. Routine network monitoring covers the entire network with | |||
low data sampling rate. When issues arise or trends emerge, the | low data sampling rate. When issues arise or trends emerge, the | |||
telemetry data source can be modified and the data rate can be | telemetry data source can be modified and the data rate can be | |||
boosted. | boosted. | |||
o Efficient data fusion is critical for applications to reduce the | o Efficient data fusion is critical for applications to reduce the | |||
overall quantity of data and improve the accuracy of analysis. | overall quantity of data and improve the accuracy of analysis. | |||
A telemetry framework collects together all of the telemetry-related | A telemetry framework collects together all of the telemetry-related | |||
work from different sources and working groups within the IETF. This | works from different sources and working groups within IETF. This | |||
makes it possible to assemble a comprehensive network telemetry | makes it possible to assemble a comprehensive network telemetry | |||
system and to avoid repetitious or redundant work. The framework | system and to avoid repetitious or redundant work. The framework | |||
should cover the concepts and components from the standardization | should cover the concepts and components from the standardization | |||
perspective. This document clarifies the layered modules on which | perspective. This document clarifies the layered modules on which | |||
the telemetry is exerted and decomposes the telemetry system into a | the telemetry is exerted and decomposes the telemetry system into a | |||
set of distinct components that the existing and future work can | set of distinct components that the existing and future work can | |||
easily map to. | easily map to. | |||
4. Network Telemetry Framework | 4. Network Telemetry Framework | |||
skipping to change at page 12, line 14 ¶ | skipping to change at page 12, line 14 ¶ | |||
4.1. Data Acquiring Mechanisms and Data Types | 4.1. Data Acquiring Mechanisms and Data Types | |||
Broadly speaking, network data can be acquired through subscription | Broadly speaking, network data can be acquired through subscription | |||
(push) and query (poll). A subscriber may request data when it is | (push) and query (poll). A subscriber may request data when it is | |||
ready. It follows a Publish-Subscription (Pub-Sub) mode or a | ready. It follows a Publish-Subscription (Pub-Sub) mode or a | |||
Subscription-Publish (Sub-Pub) mode. In the Pub-Sub mode, pre- | Subscription-Publish (Sub-Pub) mode. In the Pub-Sub mode, pre- | |||
defined data are published and multiple qualified subscribers can | defined data are published and multiple qualified subscribers can | |||
subscribe to the data. In the Sub-Pub mode, a subscriber designates | subscribe to the data. In the Sub-Pub mode, a subscriber designates | |||
what data are of interest and demands the network devices to deliver | what data are of interest and demands the network devices to deliver | |||
the data when they are available. | the data when available. | |||
In contrast, a querier expects immediate feedback from network | In contrast, query is used when a querier expects immediate feedback | |||
devices. It is usually used in a more interactive environment. The | from network devices. The queried data may be directly extracted | |||
queried data may be directly extracted from some specific data | from some specific data source, or synthesized and processed from raw | |||
source, or synthesized and processed from raw data. | data. Query suits interactive network telemetry applications. | |||
There are four types of data from network devices: | There are four types of data from network devices: | |||
Simple Data: The data that are steadily available from some data | Simple Data: The data that are steadily available from some data | |||
store or static probes in network devices. Such data can be | store or static probes in network devices. Such data can be | |||
specified by YANG model. | specified by YANG model. | |||
Complex Data: The data need to be synthesized or processed from raw | Complex Data: The data need to be synthesized or processed in | |||
data from one or more network devices. The data processing | network from raw data from one or more network devices. The data | |||
function can be statically or dynamically loaded into network | processing function can be statically or dynamically loaded into | |||
devices. | network devices. | |||
Event-triggered Data: The data are conditionally acquired based on | Event-triggered Data: The data are conditionally acquired based on | |||
the occurrence of some event. An event can be modeled as a Finite | the occurrence of some events. An event can be modeled as a | |||
State Machine (FSM). | Finite State Machine (FSM). | |||
Streaming Data: The data are continuously or periodically generated. | Streaming Data: The data are continuously or periodically generated. | |||
It can be time series or the dump of databases. The streaming | It can be time series or the dump of databases. The streaming | |||
data reflect realtime network states and metrics and require large | data reflect realtime network states and metrics and require large | |||
bandwidth and processing power. | bandwidth and processing power. | |||
The above data types are not mutually exclusive. For example, event- | The above data types are not mutually exclusive. For example, event- | |||
triggered data can be simple or complex, and streaming data can be | triggered data can be simple or complex, and streaming data can be | |||
event triggered. The relationships of these data types are | event triggered. The relationships of these data types are | |||
illustrated in Figure 1 | illustrated in Figure 1. | |||
+--------------------------+ | ||||
| +----------------------+ | | +--------------------------+ | |||
| | +-----------------+ | | | | +----------------------+ | | |||
| | | +-------------+ | | | | | | +-----------------+ | | | |||
| | | | Simple Data | | | | | | | | +-------------+ | | | | |||
| | | +-------------+ | | | | | | | | Simple Data | | | | | |||
| | | Complex Data | | | | | | | +-------------+ | | | | |||
| | +-----------------+ | | | | | | Complex Data | | | | |||
| | Event-triggered Data | | | | | +-----------------+ | | | |||
| +----------------------+ | | | | Event-triggered Data | | | |||
| Streaming Data | | | +----------------------+ | | |||
+--------------------------+ | | Streaming Data | | |||
+--------------------------+ | ||||
Figure 1: Data Type Relationship | Figure 1: Data Type Relationship | |||
Subscription usually deals with event-triggered data and streaming | Subscription usually deals with event-triggered data and streaming | |||
data, and query usually deals with simple data and complex data. It | data, and query usually deals with simple data and complex data. The | |||
is easy to see that conventional OAM techniques are mostly about | conventional OAM techniques are mostly about querying simple data. | |||
querying simple data only. While these techniques are still useful, | While these techniques are still useful, more advanced network | |||
advanced network telemetry techniques pay more attention on the other | telemetry techniques are designed mainly for event-triggered or | |||
three data types, and prefer event/streaming data subscription and | streaming data subscription, and complex data query. | |||
complex data query over simple data query. | ||||
4.2. Data Object Modules | 4.2. Data Object Modules | |||
Telemetry can be applied on the forwarding plane, the control plane, | Telemetry can be applied on the forwarding plane, the control plane, | |||
and the management plane in a network, as well as other sources out | and the management plane in a network, as well as other sources out | |||
of the network, as shown in Figure 2. Therefore, we categorize the | of the network, as shown in Figure 2. Therefore, we categorize the | |||
network telemetry into four distinct modules with each having its own | network telemetry into four distinct modules with each having its own | |||
interface to Network Operation Applications. | interface to Network Operation Applications. | |||
+------------------------------+ | +------------------------------+ | |||
skipping to change at page 14, line 37 ¶ | skipping to change at page 14, line 37 ¶ | |||
Figure 2: Modules in Layer Category of NTF | Figure 2: Modules in Layer Category of NTF | |||
The rationale of this partition lies in the different telemetry data | The rationale of this partition lies in the different telemetry data | |||
objects which result in different data source and export locations. | objects which result in different data source and export locations. | |||
Such differences have profound implications on in-network data | Such differences have profound implications on in-network data | |||
programming and processing capability, data encoding and transport | programming and processing capability, data encoding and transport | |||
protocol, and data bandwidth and latency. | protocol, and data bandwidth and latency. | |||
We summarize the major differences of the four modules in the | We summarize the major differences of the four modules in the | |||
following table. They are mainly compared from six aspects: data | following table. They are compared from six aspects: data object, | |||
object, data export location, data model, data encoding, telemetry | data export location, data model, data encoding, telemetry protocol, | |||
protocol, and transport method. Data object is the target and source | and transport method. Data object is the target and source of each | |||
of each module. Because the data source varies, the data export | module. Because the data source varies, the data export location | |||
location varies. Because each data export location has different | varies. Because each data export location has different capability, | |||
capability, the proper data model, encoding, and transport method | the proper data model, encoding, and transport method cannot be kept | |||
cannot be kept the same. As a result, the suitable telemetry | the same. As a result, the suitable telemetry protocol for each | |||
protocol for each module can be different. Some representative | module can be different. Some representative techniques are shown in | |||
techniques are shown in some table blocks to highlight the technical | the corresponding table blocks to highlight the technical diversity | |||
diversity of these modules. One cannot expect to use a universal | of these modules. The key point is that one cannot expect to use a | |||
protocol to cover all the network telemetry requirements. | universal protocol to cover all the network telemetry requirements. | |||
+---------+--------------+--------------+--------------+-----------+ | +---------+--------------+--------------+--------------+-----------+ | |||
| Module | Control | Management | Forwarding | External | | | Module | Control | Management | Forwarding | External | | |||
| | Plane | Plane | Plane | Data | | | | Plane | Plane | Plane | Data | | |||
+---------+--------------+--------------+--------------+-----------+ | +---------+--------------+--------------+--------------+-----------+ | |||
|Object | control | config. & | flow & packet| terminal, | | |Object | control | config. & | flow & packet| terminal, | | |||
| | protocol & | operation | QoS, traffic | social & | | | | protocol & | operation | QoS, traffic | social & | | |||
| | signaling, | state, MIB | stat., buffer| environ- | | | | signaling, | state, MIB | stat., buffer| environ- | | |||
| | RIB, ACL | | & queue stat.| mental | | | | RIB, ACL | | & queue stat.| mental | | |||
+---------+--------------+--------------+--------------+-----------+ | +---------+--------------+--------------+--------------+-----------+ | |||
skipping to change at page 15, line 37 ¶ | skipping to change at page 15, line 37 ¶ | |||
|Protocol | gRPC,NETCONF,| gRPC,NETCONF,| IPFIX, mirror| gRPC | | |Protocol | gRPC,NETCONF,| gRPC,NETCONF,| IPFIX, mirror| gRPC | | |||
| | IPFIX,mirror | | | | | | | IPFIX,mirror | | | | | |||
+---------+--------------+--------------+--------------+-----------+ | +---------+--------------+--------------+--------------+-----------+ | |||
|Transport| HTTP, TCP, | HTTP, TCP | UDP | HTTP,TCP | | |Transport| HTTP, TCP, | HTTP, TCP | UDP | HTTP,TCP | | |||
| | UDP | | | UDP | | | | UDP | | | UDP | | |||
+---------+--------------+--------------+--------------+-----------+ | +---------+--------------+--------------+--------------+-----------+ | |||
Figure 3: Comparison of the Data Object Modules | Figure 3: Comparison of the Data Object Modules | |||
Note that the interaction with the network operation applications can | Note that the interaction with the network operation applications can | |||
be indirect. For example, in the management plane telemetry, the | be indirect. Some in-device data transfer is possible. For example, | |||
management plane may need to acquire data from the data plane. Some | in the management plane telemetry, the management plane may need to | |||
of the operational states can only be derived from the data plane | acquire data from the data plane. Some of the operational states can | |||
such as the interface status and statistics. For another example, | only be derived from the data plane such as the interface status and | |||
the control plane telemetry may need to access the Forwarding | statistics. For another example, the control plane telemetry may | |||
Information Base (FIB) in data plane. On the other hand, an | need to access the Forwarding Information Base (FIB) in data plane. | |||
application may involve more than one plane simultaneously. For | ||||
example, an SLA compliance application may require both the data | On the other hand, an application may involve more than one plane and | |||
plane telemetry and the control plane telemetry. | interact with multiple planes simultaneously. For example, an SLA | |||
compliance application may require both the data plane telemetry and | ||||
the control plane telemetry. | ||||
4.2.1. Requirements and Challenges for each Module | 4.2.1. Requirements and Challenges for each Module | |||
4.2.1.1. Management Plane Telemetry | 4.2.1.1. Management Plane Telemetry | |||
The management plane of network elements interacts with the Network | The management plane of network elements interacts with the Network | |||
Management System (NMS), and provides information such as performance | Management System (NMS), and provides information such as performance | |||
data, network logging data, network warning and defects data, and | data, network logging data, network warning and defects data, and | |||
network statistics and state data. Some legacy protocols, such as | network statistics and state data. Some legacy protocols, such as | |||
SNMP and Syslog, are widely used for the management plane. However, | SNMP and Syslog, are widely used for the management plane. However, | |||
these protocols are insufficient to meet the requirements of the | these protocols are insufficient to meet the requirements of the | |||
future automated network operation applications. | future automated network operation applications. | |||
skipping to change at page 16, line 29 ¶ | skipping to change at page 16, line 32 ¶ | |||
export frequency. | export frequency. | |||
Structured Data: For automatic network operation, machines will | Structured Data: For automatic network operation, machines will | |||
replace humans for network data comprehension. The schema | replace humans for network data comprehension. The schema | |||
languages such as YANG can efficiently describe structured data | languages such as YANG can efficiently describe structured data | |||
and normalize data encoding and transformation. | and normalize data encoding and transformation. | |||
High Speed Data Transport: In order to retain the information, a | High Speed Data Transport: In order to retain the information, a | |||
server needs to send a large amount of data at high frequency. | server needs to send a large amount of data at high frequency. | |||
Compact encoding formats are needed to compress the data and | Compact encoding formats are needed to compress the data and | |||
improve the data transport efficiency. The push mode, by | improve the data transport efficiency. The subscription mode, by | |||
replacing the poll mode, can also reduce the interactions between | replacing the query mode, reduces the interactions between clients | |||
clients and servers, which help to improve the server's | and servers and helps to improve the server's efficiency. | |||
efficiency. | ||||
4.2.1.2. Control Plane Telemetry | 4.2.1.2. Control Plane Telemetry | |||
The control plane telemetry refers to the health condition monitoring | The control plane telemetry refers to the health condition monitoring | |||
of different network protocols, which covers Layer 2 to Layer 7. | of different network control protocols covering Layer 2 to Layer 7. | |||
Keeping track of the running status of these protocols is beneficial | Keeping track of the running status of these protocols is beneficial | |||
for detecting, localizing, and even predicting various network | for detecting, localizing, and even predicting various network | |||
issues, as well as network optimization, in real-time and in fine | issues, as well as network optimization, in real-time and in fine | |||
granularity. | granularity. | |||
One of the most challenging problems for the control plane telemetry | One of the most challenging problems for the control plane telemetry | |||
is how to correlate the E2E Key Performance Indicators (KPI) to a | is how to correlate the End-to-End (E2E) Key Performance Indicators | |||
specific layer's KPIs. For example, an IPTV user may describe his | (KPI) to a specific layer's KPIs. For example, an IPTV user may | |||
User Experience (UE) by the video fluency and definition. Then in | describe his User Experience (UE) by the video fluency and | |||
case of an unusually poor UE KPI or a service disconnection, it is | definition. Then in case of an unusually poor UE KPI or a service | |||
non-trivial work to delimit and localize the issue to the responsible | disconnection, it is non-trivial to delimit and pinpoint the issue in | |||
protocol layer (e.g., the Transport Layer or the Network Layer), the | the responsible protocol layer (e.g., the Transport Layer or the | |||
responsible protocol (e.g., ISIS or BGP at the Network Layer), and | Network Layer), the responsible protocol (e.g., ISIS or BGP at the | |||
finally the responsible device(s) with specific reasons. | Network Layer), and finally the responsible device(s) with specific | |||
reasons. | ||||
Traditional OAM-based approaches for control plane KPI measurement | Traditional OAM-based approaches for control plane KPI measurement | |||
include PING (L3), Tracert (L3), Y.1731 (L2) and so on. One common | include PING (L3), Tracert (L3), Y.1731 (L2), and so on. One common | |||
issue behind these methods is that they only measure the KPIs instead | issue behind these methods is that they only measure the KPIs instead | |||
of reflecting the actual running status of these protocols, making | of reflecting the actual running status of these protocols, making | |||
them less effective or efficient for control plane troubleshooting | them less effective or efficient for control plane troubleshooting | |||
and network optimization. An example of the control plane telemetry | and network optimization. | |||
is the BGP Monitoring Protocol (BMP), which is currently used to | ||||
monitor the BGP routes and enables rich applications, such as BGP | An example of the control plane telemetry is the BGP Monitoring | |||
peer analysis, AS analysis, prefix analysis, security analysis, and | Protocol (BMP), which is currently used to monitor the BGP routes and | |||
so on. However, the monitoring of other layers, protocols and the | enables rich applications, such as BGP peer analysis, AS analysis, | |||
cross-layer, cross-protocol KPI correlations are still in their | prefix analysis, security analysis, and so on. However, the | |||
infancy (e.g., the IGP monitoring is missing), which require | monitoring of other layers, protocols and the cross-layer, cross- | |||
substantial further research. | protocol KPI correlations are still in their infancy (e.g., the IGP | |||
monitoring is missing), which require further research. | ||||
4.2.1.3. Data Plane Telemetry | 4.2.1.3. Data Plane Telemetry | |||
An effective data plane telemetry system relies on the data that the | An effective data plane telemetry system relies on the data that the | |||
network device can expose. The data's quality, quantity, and | network device can expose. The data's quality, quantity, and | |||
timeliness must meet some stringent requirements. This raises some | timeliness must meet some stringent requirements. This raises some | |||
challenges to the network data plane devices where the first hand | challenges to the network data plane devices where the first hand | |||
data originate. | data originate. | |||
o A data plane device's main function is user traffic processing and | o A data plane device's main function is user traffic processing and | |||
skipping to change at page 18, line 9 ¶ | skipping to change at page 18, line 11 ¶ | |||
applications to parse and consume. At the same time, the data | applications to parse and consume. At the same time, the data | |||
types needed by applications can vary significantly. The data | types needed by applications can vary significantly. The data | |||
plane devices need to provide enough flexibility and | plane devices need to provide enough flexibility and | |||
programmability to support the precise data provision for | programmability to support the precise data provision for | |||
applications. | applications. | |||
o The data plane telemetry should support incremental deployment and | o The data plane telemetry should support incremental deployment and | |||
work even though some devices are unaware of the system. This | work even though some devices are unaware of the system. This | |||
challenge is highly relevant to the standards and legacy networks. | challenge is highly relevant to the standards and legacy networks. | |||
The industry has agreed that the data plane programmability is | The data plane programmability is essential to support network | |||
essential to support network telemetry. Newer data plane chips are | telemetry. Newer data plane forwarding chips are equipped with | |||
all equipped with advanced telemetry features and provide flexibility | advanced telemetry features and provide flexibility to support | |||
to support customized telemetry functions. | customized telemetry functions. | |||
4.2.1.3.1. Technique Taxonomy | 4.2.1.3.1. Technique Taxonomy | |||
There can be multiple possible dimensions to classify the data plane | There can be multiple possible dimensions to classify the data plane | |||
telemetry techniques. | telemetry techniques. | |||
Active and Passive: The active and passive methods (as well as the | Active, Passive, and Hybrid: The active and passive methods (as well | |||
hybrid types) are well documented in [RFC7799]. The passive | as the hybrid types) are well documented in [RFC7799]. The | |||
methods include TCPDUMP, IPFIX [RFC7011], sflow, and traffic | passive methods include TCPDUMP, IPFIX [RFC7011], sflow, and | |||
mirror. These methods usually have low data coverage. The | traffic mirror. These methods usually have low data coverage. | |||
bandwidth cost is very high in order to improve the data coverage. | The bandwidth cost is very high in order to improve the data | |||
On the other hand, the active methods include Ping, Traceroute, | coverage. On the other hand, the active methods include Ping, | |||
OWAMP [RFC4656], and TWAMP [RFC5357]. These methods are intrusive | Traceroute, OWAMP [RFC4656], and TWAMP [RFC5357]. These methods | |||
and only provide indirect network measurement results. The hybrid | are intrusive and only provide indirect network measurement | |||
methods, including in-situ OAM | results. The hybrid methods, including in-situ OAM | |||
[I-D.brockners-inband-oam-requirements], IPFPM [RFC8321], and | [I-D.ietf-ippm-ioam-data], IPFPM [RFC8321], and Multipoint | |||
Multipoint Alternate Marking | Alternate Marking [I-D.fioccola-ippm-multipoint-alt-mark], provide | |||
[I-D.fioccola-ippm-multipoint-alt-mark], provide a well-balanced | a well-balanced and more flexible approach. However, these | |||
and more flexible approach. However, these methods are also more | methods are also more complex to implement. | |||
complex to implement. | ||||
In-Band and Out-of-Band: The telemetry data, before being exported | In-Band and Out-of-Band: The telemetry data, before being exported | |||
to some collector, can be carried in user packets. Such methods | to some collector, can be carried in user packets. Such methods | |||
are considered in-band (e.g., in-situ OAM | are considered in-band (e.g., in-situ OAM | |||
[I-D.brockners-inband-oam-requirements]). If the telemetry data | [I-D.ietf-ippm-ioam-data]). If the telemetry data is directly | |||
is directly exported to some collector without modifying the user | exported to some collector without modifying the user packets, | |||
packets, Such methods are considered out-of-band (e.g., postcard- | such methods are considered out-of-band (e.g., postcard-based | |||
based INT). It is possible to have hybrid methods. For example, | INT). It is possible to have hybrid methods. For example, only | |||
only the telemetry instruction or partial data is carried by user | the telemetry instruction or partial data is carried by user | |||
packets (e.g., IPFPM [RFC8321]). | packets (e.g., IPFPM [RFC8321]). | |||
E2E and In-Network: Some E2E methods start from and end at the | E2E and In-Network: Some E2E methods start from and end at the | |||
network end hosts (e.g., Ping). The other methods work in | network end hosts (e.g., Ping). The other methods work in | |||
networks and are transparent to end hosts. However, if needed, | networks and are transparent to end hosts. However, if needed, | |||
the in-network methods can be easily extended into end hosts. | the in-network methods can be easily extended into end hosts. | |||
Flow, Path, and Node: Depending on the telemetry objective, the | Flow, Path, and Node: Depending on the telemetry objective, the | |||
methods can be flow-based (e.g., in-situ OAM | methods can be flow-based (e.g., in-situ OAM | |||
[I-D.brockners-inband-oam-requirements]), path-based (e.g., | ||||
Traceroute), and node-based (e.g., IPFIX [RFC7011]). | [I-D.ietf-ippm-ioam-data]), path-based (e.g., Traceroute), and | |||
node-based (e.g., IPFIX [RFC7011]). | ||||
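The active/passive/hybrid dimension of the taxonomy above can be captured in a small lookup table. This is only an illustrative sketch: the technique names are the ones listed in the text, while the `Method` enum, the `METHOD_OF` table, and the `techniques` helper are hypothetical names introduced here.

```python
from enum import Enum
from typing import Dict, List


class Method(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"
    HYBRID = "hybrid"


# Classification taken from the taxonomy text; the table itself is illustrative.
METHOD_OF: Dict[str, Method] = {
    "TCPDUMP": Method.PASSIVE,
    "IPFIX": Method.PASSIVE,
    "sflow": Method.PASSIVE,
    "traffic mirror": Method.PASSIVE,
    "Ping": Method.ACTIVE,
    "Traceroute": Method.ACTIVE,
    "OWAMP": Method.ACTIVE,
    "TWAMP": Method.ACTIVE,
    "in-situ OAM": Method.HYBRID,
    "IPFPM": Method.HYBRID,
    "Multipoint Alternate Marking": Method.HYBRID,
}


def techniques(method: Method) -> List[str]:
    """Return the techniques classified under the given method type."""
    return [name for name, m in METHOD_OF.items() if m is method]


if __name__ == "__main__":
    for method in Method:
        print(method.value, techniques(method))
```

The other dimensions (in-band versus out-of-band, E2E versus in-network, and flow, path, or node) could be added as further fields in the same way.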
4.2.1.4. External Data Telemetry | 4.2.1.4. External Data Telemetry | |||
Events that occur outside the boundaries of the network system are | Events that occur outside the boundaries of the network system are | |||
another important source of telemetry information. Correlating both | another important source of network telemetry. Correlating both | |||
internal telemetry data and external events with the requirements of | internal telemetry data and external events with the requirements of | |||
network systems, as presented in Exploiting External Event Detectors | network systems, as presented in | |||
to Anticipate Resource Requirements for the Elastic Adaptation of | [I-D.pedro-nmrg-anticipated-adaptation], provides a strategic and | |||
SDN/NFV Systems [I-D.pedro-nmrg-anticipated-adaptation], provides a | functional advantage to management operations. | |||
strategic and functional advantage to management operations. | ||||
As with other sources of telemetry information, the data and events | As with other sources of telemetry information, the data and events | |||
must meet strict requirements, especially in terms of timeliness, | must meet strict requirements, especially in terms of timeliness, | |||
which is essential to properly incorporate external event information | which is essential to properly incorporate external event information | |||
into management cycles. Thus, the specific challenges are described as | into management cycles. The specific challenges are described as | |||
follows: | follows: | |||
o The role of external event detector can be played by multiple | o The role of external event detector can be played by multiple | |||
elements, including hardware (e.g. physical sensors, such as | elements, including hardware (e.g. physical sensors, such as | |||
seismometers) and software (e.g. Big Data sources that analyze | seismometers) and software (e.g. Big Data sources that analyze | |||
streams of information, such as Twitter messages). Thus, the | streams of information, such as Twitter messages). Thus, the | |||
transmitted data must support different shapes but, at the same | transmitted data must support different shapes but, at the same | |||
time, follow a common but extensible ontology. | time, follow a common but extensible schema. | |||
o Since the main function of the external event detectors is to | o Since the main function of the external event detectors is to | |||
perform the notifications, their timeliness is assumed. However, | perform the notifications, their timeliness is assumed. However, | |||
once messages have been dispatched, they must be quickly collected | once messages have been dispatched, they must be quickly collected | |||
and inserted into the control plane with variable priority, which | and inserted into the control plane with variable priority, which | |||
will be high for important sources and/or important events and low | will be high for important sources and/or important events and low | |||
for secondary ones. | for secondary ones. | |||
o The ontology used by external detectors must be easily adopted by | o The schema used by external detectors must be easily adopted by | |||
current and future devices and applications. Therefore, it must | current and future devices and applications. Therefore, it must | |||
be easily mapped to current information models, such as in terms | be easily mapped to current information models, such as in terms | |||
of YANG. | of YANG. | |||
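The three challenges above (a common but extensible schema, priority-based insertion, and easy mapping to existing models) can be sketched as follows. This is a minimal illustration under stated assumptions, not a proposed design: the `ExternalEvent` fields, the `EventQueue` class, and the convention that a lower number means higher priority are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ExternalEvent:
    """Common fields shared by all detectors (assumed for illustration)."""
    source: str      # detector identity, e.g. a seismometer or a stream analyzer
    kind: str        # event type
    timestamp: float
    priority: int    # assumption: lower value means more urgent
    # Detector-specific data lives here, so new shapes need no schema change.
    extensions: Dict[str, Any] = field(default_factory=dict)


class EventQueue:
    """Collects dispatched events and releases the most urgent one first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, ExternalEvent]] = []
        # The counter breaks ties so equal-priority events keep arrival order.
        self._counter = itertools.count()

    def insert(self, event: ExternalEvent) -> None:
        heapq.heappush(self._heap, (event.priority, next(self._counter), event))

    def pop(self) -> ExternalEvent:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = EventQueue()
    queue.insert(ExternalEvent("social-stream", "trend", 1.0, priority=5))
    queue.insert(ExternalEvent("seismometer", "quake", 2.0, priority=0,
                               extensions={"magnitude": 6.1}))
    print(queue.pop().source)
```

Keeping the common fields flat and pushing detector-specific data into `extensions` is one way to stay close to a container-plus-augmentation style of modeling.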
Organizing together both internal and external telemetry information | Organizing together both internal and external telemetry information | |||
will be key for the general exploitation of the management | will be key for the general exploitation of the management | |||
possibilities of current and future network systems, as reflected in | possibilities of current and future network systems, as reflected in | |||
the incorporation of cognitive capabilities to new hardware and | the incorporation of cognitive capabilities to new hardware and | |||
software (virtual) elements. | software (virtual) elements. | |||
4.3. Function Components | 4.3. Function Components | |||
At each plane, the telemetry can be further partitioned into five | The telemetry module at each plane can be further partitioned into | |||
distinct components: | five distinct components: | |||
Data Query, Analysis, and Storage: This component works at the | Data Query, Analysis, and Storage: This component works at the | |||
application layer. On the one hand, it is responsible for issuing | application layer. On the one hand, it is responsible for issuing | |||
data queries. The queries can be for modeled data through | data requirements. The data of interest can be modeled data | |||
configuration or custom data through programming. The queries can | through configuration or custom data through programming. The | |||
be one shot or subscriptions for events or streaming data. On the | data requirements can be queries for one-shot data or | |||
other hand, it receives, stores, and processes the returned data | subscriptions for events or streaming data. On the other hand, it | |||
from network devices. Data analysis can be interactive to | receives, stores, and processes the returned data from network | |||
initiate further data queries. Note that this component can | devices. Data analysis can be interactive to initiate further | |||
reside in either network devices or remote controllers. | data queries. This component can reside in either network devices | |||
or remote controllers. | ||||
Data Configuration and Subscription: This component deploys data | Data Configuration and Subscription: This component deploys data | |||
queries on devices. It determines the protocol and channel for | queries on devices. It determines the protocol and channel for | |||
applications to acquire desired data. This component is also | applications to acquire desired data. This component is also | |||
responsible for configuring the desired data that might not be | responsible for configuring the desired data that might not be | |||
directly available from data sources. The subscription data can | directly available from data sources. The subscription data can | |||
be described by models, templates, or programs. | be described by models, templates, or programs. | |||
Data Encoding and Export: This component determines how telemetry | Data Encoding and Export: This component determines how telemetry | |||
data are delivered to the data analysis and storage component. | data are delivered to the data analysis and storage component. | |||
skipping to change at page 21, line 35 ¶ | skipping to change at page 21, line 35 ¶ | |||
| Data Object and Source | | | Data Object and Source | | |||
| | | | | | |||
+----------------------------------------+ | +----------------------------------------+ | |||
Figure 4: Components in the Network Telemetry Framework | Figure 4: Components in the Network Telemetry Framework | |||
4.4. Existing Works Mapped in the Framework | 4.4. Existing Works Mapped in the Framework | |||
The following two tables provide a non-exhaustive list of existing | The following two tables provide a non-exhaustive list of existing | |||
works (mainly published in IETF and with the emphasis on the latest | works (mainly published in IETF and with the emphasis on the latest | |||
new technologies) and show their positions in the framework. The | new technologies) and show their positions in the framework. More | |||
details about the mentioned work can be found in Appendix A. | details can be found in Appendix A. | |||
The first table is based on the data acquiring mechanisms and data | ||||
types. | ||||
+-----------------+---------------+----------------+ | +-----------------+---------------+----------------+ | |||
| | Query | Subscription | | | | Query | Subscription | | |||
| | | | | | | | | | |||
+-----------------+---------------+----------------+ | +-----------------+---------------+----------------+ | |||
| Simple Data | SNMP, NETCONF,| | | | Simple Data | SNMP, NETCONF,| SNMP, NETCONF | | |||
| | YANG, BMP, | | | | | YANG, BMP, | YANG, gRPC | | |||
| | IOAM, PBT,gRPC| | | | | gRPC | | |||
+-----------------+---------------+----------------+ | +-----------------+---------------+----------------+ | |||
| Complex Data | DNP, YANG FSM | | | | Complex Data | DNP, YANG FSM | DNP, YANG PUSH | | |||
| | gRPC, NETCONF | | | | | gRPC, NETCONF | gRPC, NETCONF | |||
+-----------------+---------------+----------------+ | +-----------------+---------------+----------------+ | |||
| Event-triggered | | gRPC, NETCONF, | | | Event-triggered | | gRPC, NETCONF, | | |||
| Data | | YANG PUSH, DNP | | | Data | N/A | YANG PUSH, DNP | | |||
| | | IOAM, PBT, | | ||||
| | | YANG FSM | | | | | YANG FSM | | |||
+-----------------+---------------+----------------+ | +-----------------+---------------+----------------+ | |||
| Streaming Data | | gRPC, NETCONF, | | | Streaming Data | | gRPC, NETCONF, | | |||
| | | IOAM, PBT, DNP | | | | N/A | IOAM, PBT, DNP | | |||
| | | IPFIX, IPFPM | | | | | IPFIX, IPFPM | | |||
+-----------------+---------------+----------------+ | +-----------------+---------------+----------------+ | |||
Figure 5: Existing Work Mapping I | Figure 5: Existing Work Mapping I | |||
The second table is based on the telemetry modules and components. | ||||
+--------------+---------------+----------------+---------------+ | +--------------+---------------+----------------+---------------+ | |||
| | Management | Control | Forwarding | | | | Management | Control | Forwarding | | |||
| | Plane | Plane | Plane | | | | Plane | Plane | Plane | | |||
+--------------+---------------+----------------+---------------+ | +--------------+---------------+----------------+---------------+ | |||
| data Config. | gRPC, NETCONF,| NETCONF/YANG | NETCONF/YANG, | | | data Config. | gRPC, NETCONF,| NETCONF/YANG | NETCONF/YANG, | | |||
| & subscrib. | YANG PUSH | | YANG FSM | | | & subscrib. | YANG PUSH | | YANG FSM | | |||
+--------------+---------------+----------------+---------------+ | +--------------+---------------+----------------+---------------+ | |||
| data gen. & | DNP, | DNP, | IOAM, | | | data gen. & | DNP, | DNP, | IOAM, | | |||
| processing | YANG | YANG | PBT, IPFPM, | | | processing | YANG | YANG | PBT, IPFPM, | | |||
| | | | DNP | | | | | | DNP | | |||
+--------------+---------------+----------------+---------------+ | +--------------+---------------+----------------+---------------+ | |||
| data | gRPC, NETCONF | BMP, NETCONF | IPFIX | | | data | gRPC, NETCONF | BMP, NETCONF | IPFIX | | |||
| export | YANG PUSH | | | | | export | YANG PUSH | | | | |||
+--------------+---------------+----------------+---------------+ | +--------------+---------------+----------------+---------------+ | |||
Figure 6: Existing Work Mapping II | Figure 6: Existing Work Mapping II | |||
5. Evolution of Network Telemetry | 5. Evolution of Network Telemetry | |||
As the network is evolving towards the automated operation, network | Network telemetry is a fast evolving technical area. As the network | |||
telemetry also undergoes several levels of evolution. | moves towards the automated operation, network telemetry undergoes | |||
several levels of evolution. | ||||
Level 0 - Static Telemetry: The telemetry data source and type are | Level 0 - Static Telemetry: The telemetry data source and type are | |||
determined at design time. The network operator can only | determined at design time. The network operator can only | |||
configure how to use it with limited flexibility. | configure how to use it with limited flexibility. | |||
Level 1 - Dynamic Telemetry: The telemetry data can be dynamically | Level 1 - Dynamic Telemetry: The telemetry data can be dynamically | |||
programmed or configured at runtime, allowing a tradeoff among | programmed or configured at runtime, allowing a tradeoff among | |||
resource, performance, flexibility, and coverage. DNP is an | resource, performance, flexibility, and coverage. DNP is an | |||
effort towards this direction. | effort towards this direction. | |||
Level 2 - Interactive Telemetry: The network operator can | Level 2 - Interactive Telemetry: The network operator can | |||
continuously customize the telemetry data in real time to reflect | continuously customize the telemetry data in real time to reflect | |||
the network operation's visibility requirements. At this level, | the network operation's visibility requirements. At this level, | |||
some tasks can be automated, although ultimately human operators | some tasks can be automated, although ultimately human operators | |||
will still need to sit in the middle to make decisions. | will still need to sit in the middle to make decisions. | |||
Level 3 - Closed-loop Telemetry: Human operators are completely | Level 3 - Closed-loop Telemetry: Human operators are completely | |||
excluded from the control loop. The intelligent network operation | excluded from the control loop. The intelligent network operation | |||
engine automatically issues the telemetry data request, analyzes | engine automatically issues the telemetry data requests, analyzes | |||
the data, and updates the network operations in closed control | the data, and updates the network operations in closed control | |||
loops. | loops. | |||
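The request, analyze, and update cycle described for Level 3 can be sketched as a simple loop. This is a toy illustration only: the utilization metric, the threshold, the "reroute" action, and the function names are assumptions made for the example and do not come from the framework.

```python
from typing import Callable, List


def run_closed_loop(read_utilization: Callable[[], int],
                    apply_action: Callable[[str], None],
                    threshold: int = 80,
                    max_rounds: int = 10) -> List[str]:
    """Run a telemetry-driven control loop with no human in the middle.

    Each round issues a telemetry data request, analyzes the result
    against a threshold (percent utilization), and updates the network
    if needed. Returns the list of actions taken.
    """
    actions: List[str] = []
    for _ in range(max_rounds):
        utilization = read_utilization()   # issue the telemetry data request
        if utilization <= threshold:       # analyze the data
            break
        action = "reroute"                 # hypothetical corrective action
        apply_action(action)               # update the network operation
        actions.append(action)
    return actions


if __name__ == "__main__":
    state = {"utilization": 95}

    def read() -> int:
        return state["utilization"]

    def act(_action: str) -> None:
        # Assumption for the simulation: each action sheds 10 points of load.
        state["utilization"] -= 10

    print(run_closed_loop(read, act))
```

At Level 2 the same loop would pause before `apply_action` and wait for an operator decision; removing that pause is what distinguishes Level 3.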
While most of the existing technologies belong to level 0 and level | While most of the existing technologies belong to level 0 and level | |||
1, with the help of a clearly defined network telemetry framework, we | 1, with the help of a clearly defined network telemetry framework, it | |||
can assemble the technologies to support level 2 and make solid steps | is now possible to assemble the technologies to support level 2 and | |||
towards level 3. | make solid steps towards level 3. | |||
6. Security Considerations | 6. Security Considerations | |||
Given that this document has proposed a framework for network | Given that this document has proposed a framework for network | |||
telemetry and the telemetry mechanisms discussed are distinct (in | telemetry and the telemetry mechanisms discussed are distinct (in | |||
both message frequency and traffic amount) from the conventional | both message frequency and traffic amount) from the conventional | |||
network OAM concepts, we must also reflect that various new security | network OAM concepts, we must also reflect that various new security | |||
considerations may arise. A number of techniques already exist | considerations may arise. A number of techniques already exist | |||
for securing the data plane, control plane, and the management plane | for securing the forwarding plane, the control plane, and the | |||
in a network, but the it is important to consider if any new threat | management plane in a network, but it is important to consider if any | |||
vectors are now being enabled via the use of network telemetry | new threat vectors are now being enabled via the use of network | |||
procedures and mechanisms. | telemetry procedures and mechanisms. | |||
Security considerations for networks that use telemetry methods may | Security considerations for networks that use telemetry methods may | |||
include: | include: | |||
o Telemetry framework trust and policy model; | o Telemetry framework trust and policy model; | |||
o Role management and access control for enabling and disabling | o Role management and access control for enabling and disabling | |||
telemetry capabilities; | telemetry capabilities; | |||
o Protocol transport used for telemetry data and inherent security | o Protocol transport used for telemetry data and inherent security | |||
skipping to change at page 24, line 20 ¶ | skipping to change at page 24, line 20 ¶ | |||
Some of the security considerations highlighted above may be | Some of the security considerations highlighted above may be | |||
minimized or negated with policy management of network telemetry. In | minimized or negated with policy management of network telemetry. In | |||
a network telemetry deployment it would be advantageous to separate | a network telemetry deployment it would be advantageous to separate | |||
telemetry capabilities into different classes of policies, i.e., Role | telemetry capabilities into different classes of policies, i.e., Role | |||
Based Access Control and Event-Condition-Action policies. Also, | Based Access Control and Event-Condition-Action policies. Also, | |||
potential conflicts between network telemetry mechanisms must be | potential conflicts between network telemetry mechanisms must be | |||
detected accurately and resolved quickly to avoid unnecessary network | detected accurately and resolved quickly to avoid unnecessary network | |||
telemetry traffic propagation escalating into an unintended or | telemetry traffic propagation escalating into an unintended or | |||
intended denial of service attack. | intended denial of service attack. | |||
Further discussion and development of this section will be required, | Further study of the security issues will be required, and it is | |||
and it is expected that this security section, and subsequent policy | expected that the security mechanisms and protocols are developed and | |||
section will be developed further. | deployed along with a network telemetry system. | |||
7. IANA Considerations | 7. IANA Considerations | |||
This document includes no request to IANA. | This document includes no request to IANA. | |||
8. Contributors | 8. Contributors | |||
The other contributors of this document are listed as follows. | The other contributors of this document are listed as follows. | |||
o Tianran Zhou | o Tianran Zhou | |||
o Zhenbin Li | o Zhenbin Li | |||
o Zhenqiang Li | ||||
o Daniel King | o Daniel King | |||
o Adrian Farrel | o Adrian Farrel | |||
9. Acknowledgments | 9. Acknowledgments | |||
We would like to thank Randy Presuhn, Joe Clarke, Victor Liu, James | We would like to thank Randy Presuhn, Joe Clarke, Victor Liu, James | |||
Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu, Parviz Yegani, | Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu, Parviz Yegani, | |||
Young Lee, Alexander Clemm, Qin Wu, and many others who have provided | Young Lee, Alexander Clemm, Qin Wu, and many others who have provided | |||
helpful comments and suggestions to improve this document. | helpful comments and suggestions to improve this document. | |||
10. Informative References | 10. Informative References | |||
[gnmi] "gNMI - gRPC Network Management Interface", | [gnmi] "gNMI - gRPC Network Management Interface", | |||
<https://github.com/openconfig/reference/tree/master/rpc/ | <https://github.com/openconfig/reference/tree/master/rpc/ | |||
gnmi>. | gnmi>. | |||
[grpc] "gRPC, A high performance, open-source universal RPC | [grpc] "gRPC, A high performance, open-source universal RPC | |||
framework", <https://grpc.io>. | framework", <https://grpc.io>. | |||
[I-D.brockners-inband-oam-requirements] | ||||
Brockners, F., Bhandari, S., Dara, S., Pignataro, C., | ||||
Gredler, H., Leddy, J., Youell, S., Mozes, D., Mizrahi, | ||||
T., Lapukhov, P., and r. remy@barefootnetworks.com, | ||||
"Requirements for In-situ OAM", draft-brockners-inband- | ||||
oam-requirements-03 (work in progress), March 2017. | ||||
[I-D.fioccola-ippm-multipoint-alt-mark] | [I-D.fioccola-ippm-multipoint-alt-mark] | |||
Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto, | Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto, | |||
"Multipoint Alternate Marking method for passive and | "Multipoint Alternate Marking method for passive and | |||
hybrid performance monitoring", draft-fioccola-ippm- | hybrid performance monitoring", draft-fioccola-ippm- | |||
multipoint-alt-mark-04 (work in progress), June 2018. | multipoint-alt-mark-04 (work in progress), June 2018. | |||
[I-D.ietf-grow-bmp-adj-rib-out] | [I-D.ietf-grow-bmp-adj-rib-out] | |||
Evens, T., Bayraktar, S., Lucente, P., Mi, K., and S. | Evens, T., Bayraktar, S., Lucente, P., Mi, K., and S. | |||
Zhuang, "Support for Adj-RIB-Out in BGP Monitoring | Zhuang, "Support for Adj-RIB-Out in BGP Monitoring | |||
Protocol (BMP)", draft-ietf-grow-bmp-adj-rib-out-07 (work | Protocol (BMP)", draft-ietf-grow-bmp-adj-rib-out-07 (work | |||
in progress), August 2019. | in progress), August 2019. | |||
[I-D.ietf-grow-bmp-local-rib] | [I-D.ietf-grow-bmp-local-rib] | |||
Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, | Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, | |||
"Support for Local RIB in BGP Monitoring Protocol (BMP)", | "Support for Local RIB in BGP Monitoring Protocol (BMP)", | |||
draft-ietf-grow-bmp-local-rib-05 (work in progress), | draft-ietf-grow-bmp-local-rib-06 (work in progress), | |||
August 2019. | November 2019. | |||
[I-D.ietf-ippm-ioam-data] | ||||
Brockners, F., Bhandari, S., Pignataro, C., Gredler, H., | ||||
Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov, | ||||
P., remy@barefootnetworks.com, r., daniel.bernier@bell.ca, | ||||
d., and J. Lemon, "Data Fields for In-situ OAM", draft- | ||||
ietf-ippm-ioam-data-09 (work in progress), March 2020. | ||||
[I-D.ietf-netconf-udp-pub-channel] | [I-D.ietf-netconf-udp-pub-channel] | |||
Zheng, G., Zhou, T., and A. Clemm, "UDP based Publication | Zheng, G., Zhou, T., and A. Clemm, "UDP based Publication | |||
Channel for Streaming Telemetry", draft-ietf-netconf-udp- | Channel for Streaming Telemetry", draft-ietf-netconf-udp- | |||
pub-channel-05 (work in progress), March 2019. | pub-channel-05 (work in progress), March 2019. | |||
[I-D.ietf-netconf-yang-push] | [I-D.ietf-netconf-yang-push] | |||
Clemm, A. and E. Voit, "Subscription to YANG Datastores", | Clemm, A. and E. Voit, "Subscription to YANG Datastores", | |||
draft-ietf-netconf-yang-push-25 (work in progress), May | draft-ietf-netconf-yang-push-25 (work in progress), May | |||
2019. | 2019. | |||
skipping to change at page 26, line 14 ¶ | skipping to change at page 26, line 20 ¶ | |||
[I-D.pedro-nmrg-anticipated-adaptation] | [I-D.pedro-nmrg-anticipated-adaptation] | |||
Martinez-Julia, P., "Exploiting External Event Detectors | Martinez-Julia, P., "Exploiting External Event Detectors | |||
to Anticipate Resource Requirements for the Elastic | to Anticipate Resource Requirements for the Elastic | |||
Adaptation of SDN/NFV Systems", draft-pedro-nmrg- | Adaptation of SDN/NFV Systems", draft-pedro-nmrg- | |||
anticipated-adaptation-02 (work in progress), June 2018. | anticipated-adaptation-02 (work in progress), June 2018. | |||
[I-D.song-ippm-postcard-based-telemetry] | [I-D.song-ippm-postcard-based-telemetry] | |||
Song, H., Zhou, T., Li, Z., Shin, J., and K. Lee, | Song, H., Zhou, T., Li, Z., Shin, J., and K. Lee, | |||
"Postcard-based On-Path Flow Data Telemetry", draft-song- | "Postcard-based On-Path Flow Data Telemetry", draft-song- | |||
ippm-postcard-based-telemetry-05 (work in progress), | ippm-postcard-based-telemetry-06 (work in progress), | |||
September 2019. | October 2019. | |||
[I-D.song-opsawg-dnp4iq] | [I-D.song-opsawg-dnp4iq] | |||
Song, H. and J. Gong, "Requirements for Interactive Query | Song, H. and J. Gong, "Requirements for Interactive Query | |||
with Dynamic Network Probes", draft-song-opsawg-dnp4iq-01 | with Dynamic Network Probes", draft-song-opsawg-dnp4iq-01 | |||
(work in progress), June 2017. | (work in progress), June 2017. | |||
[I-D.song-opsawg-ifit-framework] | ||||
Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "In- | ||||
situ Flow Information Telemetry", draft-song-opsawg-ifit- | ||||
framework-11 (work in progress), March 2020. | ||||
[I-D.zhou-netconf-multi-stream-originators] | [I-D.zhou-netconf-multi-stream-originators] | |||
Zhou, T., Zheng, G., Voit, E., Clemm, A., and A. Bierman, | Zhou, T., Zheng, G., Voit, E., and A. Clemm, "Subscription | |||
"Subscription to Multiple Stream Originators", draft-zhou- | to Multiple Stream Originators", draft-zhou-netconf-multi- | |||
netconf-multi-stream-originators-06 (work in progress), | stream-originators-10 (work in progress), November 2019. | |||
July 2019. | ||||
[RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin, | [RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin, | |||
"Simple Network Management Protocol (SNMP)", RFC 1157, | "Simple Network Management Protocol (SNMP)", RFC 1157, | |||
DOI 10.17487/RFC1157, May 1990, | DOI 10.17487/RFC1157, May 1990, | |||
<https://www.rfc-editor.org/info/rfc1157>. | <https://www.rfc-editor.org/info/rfc1157>. | |||
[RFC2981] Kavasseri, R., Ed., "Event MIB", RFC 2981, | [RFC2981] Kavasseri, R., Ed., "Event MIB", RFC 2981, | |||
DOI 10.17487/RFC2981, October 2000, | DOI 10.17487/RFC2981, October 2000, | |||
<https://www.rfc-editor.org/info/rfc2981>. | <https://www.rfc-editor.org/info/rfc2981>. | |||
skipping to change at page 31, line 28 ¶ | skipping to change at page 31, line 33 ¶ | |||
information about these packets. An Exporter then gathers each of | information about these packets. An Exporter then gathers each of | |||
the Observation Points together into an Observation Domain and sends | the Observation Points together into an Observation Domain and sends | |||
this information via the IPFIX protocol to a Collector. | this information via the IPFIX protocol to a Collector. | |||
A.3.4. In-Situ OAM | A.3.4. In-Situ OAM | |||
Traditional passive and active monitoring and measurement techniques | Traditional passive and active monitoring and measurement techniques | |||
are either inaccurate or resource-consuming. It is preferable to | are either inaccurate or resource-consuming. It is preferable to | |||
directly acquire data associated with a flow's packets when the | directly acquire data associated with a flow's packets when the | |||
packets pass through a network. In-situ OAM (iOAM) | packets pass through a network. In-situ OAM (iOAM) | |||
[I-D.brockners-inband-oam-requirements], a data generation technique, | [I-D.ietf-ippm-ioam-data], a data generation technique, embeds a new | |||
embeds a new instruction header into user packets and the instruction | instruction header into user packets and the instruction directs the | |||
directs the network nodes to add the requested data to the packets. | network nodes to add the requested data to the packets. Thus, at the | |||
Thus, at the path end, the packet's experience gained on the entire | path end, the packet's experience gained on the entire forwarding | |||
forwarding path can be collected. Such firsthand data is invaluable | path can be collected. Such firsthand data is invaluable to many | |||
to many network OAM applications. | network OAM applications. | |||
However, iOAM also faces some challenges. The issues on performance | However, iOAM also faces some challenges. The issues on performance | |||
impact, security, scalability and overhead limits, encapsulation | impact, security, scalability and overhead limits, encapsulation | |||
difficulties in some protocols, and cross-domain deployment need to | difficulties in some protocols, and cross-domain deployment need to | |||
be addressed. | be addressed. | |||
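The in-situ collection described in A.3.4 can be illustrated with a minimal sketch. It is a toy model only, not the encoding defined in [I-D.ietf-ippm-ioam-data]: the Packet and Node classes, the field names node_id and queue_depth, and the helper functions are all hypothetical names chosen for illustration. It shows an instruction header carried in a user packet, each node appending the requested data, and the path end collecting the accumulated records.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: bytes
    # Hypothetical instruction header: names of the data items that
    # every node on the path is asked to add to the packet.
    instructions: tuple = ()
    # Per-hop records accumulated while the packet is forwarded.
    trace: list = field(default_factory=list)


@dataclass
class Node:
    node_id: str
    queue_depth: int

    def forward(self, packet: Packet) -> Packet:
        # Only packets carrying an instruction header are annotated.
        if packet.instructions:
            record = {name: getattr(self, name) for name in packet.instructions}
            packet.trace.append(record)
        return packet


def send_along_path(packet: Packet, path: list) -> Packet:
    # Forward the packet hop by hop through the given nodes.
    for node in path:
        packet = node.forward(packet)
    return packet


def strip_trace(packet: Packet) -> list:
    # At the path end, remove the instruction header and the collected
    # data so that only the original user payload continues onward.
    trace = packet.trace
    packet.instructions = ()
    packet.trace = []
    return trace
```

A packet created as Packet(b"data", ("node_id", "queue_depth")) and sent through three nodes leaves the path with three records, one per hop. This also makes the overhead concern above concrete: the carried data grows with both the path length and the number of requested items.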
A.3.5. Postcard Based Telemetry | A.3.5. Postcard Based Telemetry | |||
PBT [I-D.song-ippm-postcard-based-telemetry] is an alternative to | PBT [I-D.song-ippm-postcard-based-telemetry] is an alternative to | |||
iOAM. PBT directly exports data at each node through an independent | iOAM. PBT directly exports data at each node through an independent | |||
skipping to change at page 33, line 43 ¶ | skipping to change at page 34, line 4 ¶ | |||
In some situations, the interconnection between the external event | In some situations, the interconnection between the external event | |||
detectors and the management system is via the management plane. For | detectors and the management system is via the management plane. For | |||
those situations there will be a special connector that provides the | those situations there will be a special connector that provides the | |||
typical interfaces found in most other elements connected to the | typical interfaces found in most other elements connected to the | |||
management plane. For instance, the interfaces will comply with | management plane. For instance, the interfaces will comply with | |||
a specific information model (YANG) and a specific telemetry protocol, | a specific information model (YANG) and a specific telemetry protocol, | |||
such as NETCONF, SNMP, or gRPC. | such as NETCONF, SNMP, or gRPC. | |||
Authors' Addresses | Authors' Addresses | |||
Haoyu Song | ||||
Haoyu Song (editor) | ||||
Futurewei | Futurewei | |||
2330 Central Expressway | 2330 Central Expressway | |||
Santa Clara | Santa Clara | |||
USA | USA | |||
Email: hsong@futurewei.com | Email: hsong@futurewei.com | |||
Fengwei Qin | Fengwei Qin | |||
China Mobile | China Mobile | |||
No. 32 Xuanwumenxi Ave., Xicheng District | No. 32 Xuanwumenxi Ave., Xicheng District | |||
Beijing, 100032 | Beijing, 100032 | |||
P.R. China | P.R. China | |||
Email: qinfengwei@chinamobile.com | Email: qinfengwei@chinamobile.com | |||
Pedro Martinez-Julia | Pedro Martinez-Julia | |||
NICT | NICT | |||
End of changes. 84 change blocks. | ||||
262 lines changed or deleted | 276 lines changed or added | |||