--- 1/draft-ietf-opsawg-ntf-04.txt 2020-10-09 17:13:12.605926881 -0700 +++ 2/draft-ietf-opsawg-ntf-05.txt 2020-10-09 17:13:12.685928916 -0700 @@ -1,25 +1,25 @@ OPSAWG H. Song Internet-Draft Futurewei Intended status: Informational F. Qin -Expires: March 25, 2021 China Mobile +Expires: April 12, 2021 China Mobile P. Martinez-Julia NICT L. Ciavaglia Nokia A. Wang China Telecom - September 21, 2020 + October 9, 2020 Network Telemetry Framework - draft-ietf-opsawg-ntf-04 + draft-ietf-opsawg-ntf-05 Abstract Network telemetry is the technology for gaining network insight and facilitating efficient and automated network management. It engages various techniques for remote data collection, correlation, and consumption. This document provides an architectural framework for network telemetry, motivated by the network operation challenges and requirements. As evidenced by some key characteristics and industry practices, network telemetry covers technologies and protocols beyond @@ -41,21 +41,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on March 25, 2021. + This Internet-Draft will expire on April 12, 2021. Copyright Notice Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. 
Please review these documents @@ -68,51 +68,51 @@ Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1. Telemetry Data Coverage . . . . . . . . . . . . . . . . . 5 2.2. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 5 2.3. Challenges . . . . . . . . . . . . . . . . . . . . . . . 6 2.4. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 8 2.5. Network Telemetry . . . . . . . . . . . . . . . . . . . . 9 3. The Necessity of a Network Telemetry Framework . . . . . . . 11 - 4. Network Telemetry Framework . . . . . . . . . . . . . . . . . 12 + 4. Network Telemetry Framework . . . . . . . . . . . . . . . . . 13 4.1. Top Level Modules . . . . . . . . . . . . . . . . . . . . 13 - 4.1.1. Management Plane Telemetry . . . . . . . . . . . . . 15 - 4.1.2. Control Plane Telemetry . . . . . . . . . . . . . . . 15 - 4.1.3. Data Plane Telemetry . . . . . . . . . . . . . . . . 16 - 4.1.4. External Data Telemetry . . . . . . . . . . . . . . . 18 + 4.1.1. Management Plane Telemetry . . . . . . . . . . . . . 16 + 4.1.2. Control Plane Telemetry . . . . . . . . . . . . . . . 16 + 4.1.3. Data Plane Telemetry . . . . . . . . . . . . . . . . 17 + 4.1.4. External Data Telemetry . . . . . . . . . . . . . . . 19 4.2. Second Level Function Components . . . . . . . . . . . . 19 - 4.3. Data Acquiring Mechanism and Type Abstraction . . . . . . 20 - 4.4. Existing Works Mapped in the Framework . . . . . . . . . 22 - 5. Evolution of Network Telemetry . . . . . . . . . . . . . . . 23 - 6. Security Considerations . . . . . . . . . . . . . . . . . . . 24 - 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25 - 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 25 - 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 25 - 10. Informative References . . . . . . . . . . . . . . . . . . . 25 - Appendix A. 
A Survey on Existing Network Telemetry Techniques . 29 - A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 29 - A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 29 - A.1.2. gRPC Network Management Interface . . . . . . . . . . 30 - A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 30 - A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . . . 30 - A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 31 - A.3.1. The Alternate Marking technology . . . . . . . . . . 31 - A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 32 - A.3.3. IP Flow Information Export (IPFIX) protocol . . . . . 32 - A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 32 - A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 33 - A.4. External Data and Event Telemetry . . . . . . . . . . . . 33 - A.4.1. Sources of External Events . . . . . . . . . . . . . 33 - A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 34 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 35 + 4.3. Data Acquiring Mechanism and Type Abstraction . . . . . . 21 + 4.4. Existing Works Mapped in the Framework . . . . . . . . . 23 + 5. Evolution of Network Telemetry . . . . . . . . . . . . . . . 24 + 6. Security Considerations . . . . . . . . . . . . . . . . . . . 25 + 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26 + 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26 + 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 26 + 10. Informative References . . . . . . . . . . . . . . . . . . . 26 + Appendix A. A Survey on Existing Network Telemetry Techniques . 30 + A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 30 + A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 30 + A.1.2. gRPC Network Management Interface . . . . . . . . . . 31 + A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 31 + A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . 
. . 31 + A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 32 + A.3.1. The Alternate Marking technology . . . . . . . . . . 32 + A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 33 + A.3.3. IP Flow Information Export (IPFIX) protocol . . . . . 33 + A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 34 + A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 34 + A.4. External Data and Event Telemetry . . . . . . . . . . . . 34 + A.4.1. Sources of External Events . . . . . . . . . . . . . 34 + A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 36 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 36 1. Introduction Network visibility is the ability of management tools to see the state and behavior of a network. It is essential for successful network operation. Network telemetry is the process of measuring, correlating, recording, and distributing information about the behavior of a network. Network telemetry has been considered as an ideal means to gain sufficient network visibility with better flexibility, scalability, accuracy, coverage, and performance than @@ -201,24 +201,24 @@ underlines the need of new methods, techniques, and protocols which we assign under an umbrella term - network telemetry. 2.1. Telemetry Data Coverage Any information that can be extracted from networks (including data plane, control plane, and management plane) and used to gain visibility or as basis for actions is considered telemetry data. It includes statistics, event records and logs, snapshots of state, configuration data, etc. It also covers the outputs of any active - and passive measurements. Specially, raw data can be processed in - network before sending to a data consumer. Such processed data are - also telemetry data in the context. A classification of the - telemetry data form is provided in Section 4. + and passive measurements [RFC7799]. 
Specially, raw data can be + processed in network before sending to a data consumer. Such + processed data are also telemetry data in the context. A + classification of the telemetry data form is provided in Section 4. 2.2. Use Cases These use cases are essential for network operations. While the list is by no means exhaustive, it is enough to highlight the requirements for data velocity, variety, volume, and veracity in networks. Security: Network intrusion detection and prevention need to monitor network traffic and activities, and act upon anomalies. Given the more and more sophisticated attack vector and higher and higher @@ -331,22 +331,24 @@ o The conventional passive measurement techniques can either consume excessive network resources and render excessive redundant data, or lead to inaccurate results; on the other hand, the conventional active measurement techniques can interfere with the user traffic and their results are indirect. Techniques that can collect direct and on-demand data from user traffic are more favorable. 2.4. Glossary Before further discussion, we list some key terminology and acronyms - used in this document. We make an intended distinction between - network telemetry and network OAM. + used in this document. We make an intended differentiation between + network telemetry and network OAM. However, it should be understood + that there is not a hard-line distinction between the two concepts. + Rather, some OAM techniques are in the scope of network telemetry. AI: Artificial Intelligence. In the network domain, AI refers to the machine-learning based technologies for automated network operation and other tasks. AM: Alternate Marking, a flow performance measurement method, specified in [RFC8321]. BMP: BGP Monitoring Protocol, specified in [RFC7854]. @@ -397,48 +399,51 @@ [RFC2578]. SNMP: Simple Network Management Protocol. Versions 1 and 2 are specified in [RFC1157] and [RFC3416], respectively.
YANG: The abbreviation of "Yet Another Next Generation". YANG is a data modeling language for the definition of data sent over network management protocols such as NETCONF and RESTCONF. YANG is defined in [RFC6020]. + YANG ECA: A YANG model for Event-Condition-Action policies, defined + in [I-D.wwx-netmod-event-yang]. + YANG FSM: A YANG model that describes events, operations, and finite state machine of YANG-defined network elements. YANG PUSH: A method to subscribe to pushed data from a remote YANG datastore on network devices. Details are specified in [RFC8641] and [RFC8639]. 2.5. Network Telemetry Network telemetry has emerged as a mainstream technical term to refer to the newer data collection and consumption techniques, - distinguishing itself from the conventional techniques for network OAM. - Many such techniques have been widely deployed. The representative - techniques and protocols include IPFIX [RFC7011] and gRPC [grpc]. - Network telemetry allows separate entities to acquire data from - network devices so that data can be visualized and analyzed to - support network monitoring and operation. Network telemetry overlaps - with the conventional network OAM and has a wider scope than it. It - is expected that network telemetry can provide the necessary network - insight for autonomous networks and address the shortcomings of - conventional OAM techniques. + distinguishing itself in some notable ways from the conventional + network OAM. Several such techniques have been widely deployed. The + representative techniques and protocols include IPFIX [RFC7011] and + gRPC [grpc]. Network telemetry allows separate entities to acquire + data from network devices so that data can be visualized and analyzed + to support network monitoring and operation. Network telemetry + overlaps with the conventional network OAM and has a wider scope than + it. 
It is expected that network telemetry can provide the necessary + network insight for autonomous networks and address the shortcomings + of conventional OAM techniques. - One difference between the network telemetry and the network OAM is - that in general the network telemetry assumes machines as data - consumer rather than human operators. Hence, the network telemetry - can directly trigger the automated network operation, while the - conventional OAM tools usually help human operators to monitor and - diagnose the networks and guide manual network operations. The + One difference between the network telemetry and the conventional + network OAM is that in general the network telemetry assumes machines + as data consumer rather than human operators. Hence, the network + telemetry can directly trigger the automated network operation, while + the conventional OAM tools usually help human operators to monitor + and diagnose the networks and guide manual network operations. The difference leads to very different techniques. Although the network telemetry techniques are just emerging and subject to continuous evolution, several characteristics of network telemetry have been well accepted. Note that network telemetry is intended to be an umbrella term covering a wide spectrum of techniques, so the following characteristics are not expected to be held by every specific technique. o Push and Streaming: Instead of polling data from network devices, @@ -612,27 +617,36 @@ objects which result in different data source and export locations. Such differences have profound implications on in-network data programming and processing capability, data encoding and transport protocol, and data bandwidth and latency. We summarize the major differences of the four modules in the following table. They are compared from six aspects: data object, data export location, data model, data encoding, telemetry protocol, and transport method. Data object is the target and source of each module. 
Because the data source varies, the data export location - varies. Because each data export location has different capabilities, - the proper data model, encoding, and transport method cannot be kept - the same. As a result, the suitable telemetry protocol for each - module can be different. Some representative techniques are shown in - the corresponding table blocks to highlight the technical diversity - of these modules. The key point is that one cannot expect to use a - universal protocol to cover all the network telemetry requirements. + varies. For example, the forwarding plane data are mainly from the + fast path (e.g., forwarding chips) while the control plane data are + mainly from the slow path (e.g., main control CPU). For convenience + and efficiency, it is preferred to export the data from locations + near the source. Because each data export location has different + capabilities, the proper data model, encoding, and transport method + cannot be kept the same. For example, the forwarding chip has high + throughput but limited capacity for processing complex data and + maintaining states, while the main control CPU is capable of complex + data and state processing, but has limited bandwidth for high + throughput data. As a result, the suitable telemetry protocol for + each module can be different. Some representative techniques are + shown in the corresponding table blocks to highlight the technical + diversity of these modules. The key point is that one cannot expect + to use a universal protocol to cover all the network telemetry + requirements. +---------+--------------+--------------+--------------+-----------+ | Module | Control | Management | Forwarding | External | | | Plane | Plane | Plane | Data | +---------+--------------+--------------+--------------+-----------+ |Object | control | config. 
& | flow & packet| terminal, | | | protocol & | operation | QoS, traffic | social & | | | signaling, | state, MIB | stat., buffer| environ- | | | RIB, ACL | | & queue stat.| mental | +---------+--------------+--------------+--------------+-----------+ @@ -954,32 +969,35 @@ Simple Data: The data that are steadily available from some data store or static probes in network devices. Such data can be specified by a YANG model. Complex Data: The data that need to be synthesized or processed in the network from raw data from one or more network devices. The data processing function can be statically or dynamically loaded into network devices. Event-triggered Data: The data are conditionally acquired based on - the occurrence of some events. An event can be modeled as a - Finite State Machine (FSM). + the occurrence of some events. It can be actively pushed through + subscription or passively polled through query. There are many + ways to model events, including using Finite State Machine (FSM) + or Event Condition Action (ECA) [I-D.wwx-netmod-event-yang]. - Streaming Data: The data are continuously or periodically generated. - It can be time series or the dump of databases. The streaming - data reflect realtime network states and metrics and require large - bandwidth and processing power. + Streaming Data: The data are continuously generated. It can be time + series or the dump of databases. The streaming data reflect + realtime network states and metrics and require large bandwidth + and processing power. The streaming data are always actively + pushed to the subscribers. - The above data types are not mutually exclusive. For example, event- - triggered data can be simple or complex, and streaming data can be - simple, complex, or triggered by events. The relationships of these - data types are illustrated in Figure 4. + The above data types are not mutually exclusive. Rather, they often + overlap. 
For example, event-triggered data can be simple or complex, + and streaming data can be simple, complex, or triggered by events. + The relationships of these data types are illustrated in Figure 4. +--------------+ +------>| Simple Data |<------+ | +------------- + | | ^ | | | | | +------+-------+ | | +-->| Complex Data |<--+ | | | +--------------+ | | | | | | @@ -1012,22 +1030,22 @@ | | Query | Subscription | | | | | +-----------------+---------------+----------------+ | Simple Data | SNMP, NETCONF,| SNMP, NETCONF | | | YANG, BMP, | YANG, gRPC | | | SMIv2, gRPC | | +-----------------+---------------+----------------+ | Complex Data | DNP, YANG FSM | DNP, YANG PUSH | | | gRPC, NETCONF | gRPC, NETCONF | +-----------------+---------------+----------------+ - | Event-triggered | | gRPC, NETCONF, | - | Data | N/A | YANG PUSH, DNP | + | Event-triggered | DNP, NETCONF, | gRPC, NETCONF, | + | Data | YANG FSM | YANG PUSH, DNP | | | | YANG FSM | +-----------------+---------------+----------------+ | Streaming Data | | gRPC, NETCONF, | | | N/A | IOAM, PBT, DNP | | | | IPFIX, IPFPM | +-----------------+---------------+----------------+ Figure 5: Existing Work Mapping I The second table is based on the telemetry modules and components. @@ -1149,24 +1167,24 @@ o Zhenqiang Li o Daniel King o Adrian Farrel o Alexander Clemm 9. Acknowledgments - We would like to thank Randy Presuhn, Joe Clarke, Victor Liu, James - Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu, Parviz Yegani, - Young Lee, Qin Wu, and many others who have provided helpful comments - and suggestions to improve this document. + We would like to thank Greg Mirsky, Randy Presuhn, Joe Clarke, Victor + Liu, James Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu, + Parviz Yegani, Young Lee, Qin Wu, and many others who have provided + helpful comments and suggestions to improve this document. 10. Informative References [gnmi] "gNMI - gRPC Network Management Interface", . 
[grpc] "gRPC, A high performance, open-source universal RPC framework", . @@ -1186,24 +1204,31 @@ Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, "Support for Local RIB in BGP Monitoring Protocol (BMP)", draft-ietf-grow-bmp-local-rib-07 (work in progress), May 2020. [I-D.ietf-ippm-ioam-data] Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields for In-situ OAM", draft-ietf-ippm-ioam-data-10 (work in progress), July 2020. - [I-D.ietf-netconf-udp-pub-channel] - Zheng, G., Zhou, T., and A. Clemm, "UDP based Publication - Channel for Streaming Telemetry", draft-ietf-netconf-udp- - pub-channel-05 (work in progress), March 2019. + [I-D.ietf-netconf-distributed-notif] + Zhou, T., Zheng, G., Voit, E., Graf, T., and P. Francois, + "Subscription to Distributed Notifications", draft-ietf- + netconf-distributed-notif-00 (work in progress), October + 2020. + + [I-D.ietf-netconf-udp-notif] + Zheng, G., Zhou, T., Graf, T., Francois, P., and P. + Lucente, "UDP-based Transport for Configured + Subscriptions", draft-ietf-netconf-udp-notif-00 (work in + progress), October 2020. [I-D.irtf-nmrg-ibn-concepts-definitions] Clemm, A., Ciavaglia, L., Granville, L., and J. Tantsura, "Intent-Based Networking - Concepts and Definitions", draft-irtf-nmrg-ibn-concepts-definitions-02 (work in progress), September 2020. [I-D.kumar-rtgwg-grpc-protocol] Kumar, A., Kolhe, J., Ghemawat, S., and L. Ryan, "gRPC Protocol", draft-kumar-rtgwg-grpc-protocol-00 (work in @@ -1228,26 +1253,27 @@ 2020. [I-D.song-opsawg-dnp4iq] Song, H. and J. Gong, "Requirements for Interactive Query with Dynamic Network Probes", draft-song-opsawg-dnp4iq-01 (work in progress), June 2017. [I-D.song-opsawg-ifit-framework] Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "In- situ Flow Information Telemetry", draft-song-opsawg-ifit- - framework-12 (work in progress), April 2020. + framework-13 (work in progress), October 2020. - [I-D.zhou-netconf-multi-stream-originators] - Zhou, T., Zheng, G., Voit, E., and A. 
Clemm, "Subscription - to Multiple Stream Originators", draft-zhou-netconf-multi- - stream-originators-10 (work in progress), November 2019. + [I-D.wwx-netmod-event-yang] + Bierman, A., WU, Q., Bryskin, I., Birkholz, H., Liu, X., + and B. Claise, "A YANG Data model for ECA Policy + Management", draft-wwx-netmod-event-yang-09 (work in + progress), July 2020. [RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin, "Simple Network Management Protocol (SNMP)", RFC 1157, DOI 10.17487/RFC1157, May 1990, . [RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J. Schoenwaelder, Ed., "Structure of Management Information Version 2 (SMIv2)", STD 58, RFC 2578, DOI 10.17487/RFC2578, April 1999, @@ -1349,29 +1375,30 @@ existing techniques and standard proposals for each network telemetry module. A.1. Management Plane Telemetry A.1.1. Push Extensions for NETCONF NETCONF [RFC6241] is one popular network management protocol, which is also recommended by IETF. Although it can be used for data collection, NETCONF is good at configurations. YANG Push + [RFC8641][RFC8639] extends NETCONF and enables subscriber applications to request a continuous, customized stream of updates from a YANG datastore. Providing such visibility into changes made upon YANG configuration and operational objects enables new capabilities based on the remote mirroring of configuration and operational state. Moreover, distributed data collection mechanism - [I-D.zhou-netconf-multi-stream-originators] via UDP based publication - channel [I-D.ietf-netconf-udp-pub-channel] provides enhanced - efficiency for the NETCONF based telemetry. + [I-D.ietf-netconf-distributed-notif] via UDP based publication + channel [I-D.ietf-netconf-udp-notif] provides enhanced efficiency for + the NETCONF based telemetry. A.1.2. 
gRPC Network Management Interface gRPC Network Management Interface (gNMI) [I-D.openconfig-rtgwg-gnmi-spec] is a network management protocol based on the gRPC [I-D.kumar-rtgwg-grpc-protocol] RPC (Remote Procedure Call) framework. With a single gRPC service definition, both configuration and telemetry can be covered. gRPC is an HTTP/2 [RFC7540] based open source micro service communication framework. It provides a number of capabilities which are well-suited for