--- 1/draft-ietf-rmcat-video-traffic-model-05.txt 2018-11-03 21:13:07.954494040 -0700 +++ 2/draft-ietf-rmcat-video-traffic-model-06.txt 2018-11-03 21:13:07.994495000 -0700 @@ -1,20 +1,20 @@ Network Working Group X. Zhu Internet-Draft S. Mena Intended status: Informational Cisco Systems -Expires: January 20, 2019 Z. Sarker +Expires: May 7, 2019 Z. Sarker Ericsson AB - July 19, 2018 + November 3, 2018 Video Traffic Models for RTP Congestion Control Evaluations - draft-ietf-rmcat-video-traffic-model-05 + draft-ietf-rmcat-video-traffic-model-06 Abstract This document describes two reference video traffic models for evaluating RTP congestion control algorithms. The first model statistically characterizes the behavior of a live video encoder in response to changing requests on target video rate. The second model is trace-driven, and emulates the output of actual encoded video frame sizes from a high-resolution test sequence. Both models are designed to strike a balance between simplicity, repeatability, and @@ -31,21 +31,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on January 20, 2019. + This Internet-Draft will expire on May 7, 2019. Copyright Notice Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. 
Please review these documents @@ -57,21 +57,22 @@ Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Desired Behavior of A Synthetic Video Traffic Model . . . . . 3 4. Interactions Between Synthetic Video Traffic Source and Other Components at the Sender . . . . . . . . . . . . . . . 4 5. A Statistical Reference Model . . . . . . . . . . . . . . . . 6 5.1. Time-damped response to target rate update . . . . . . . 7 - 5.2. Temporary burst and oscillation during transient . . . . 8 + 5.2. Temporary burst and oscillation during the transient + period . . . . . . . . . . . . . . . . . . . . . . . . . 8 5.3. Output rate fluctuation at steady state . . . . . . . . . 8 5.4. Rate range limit imposed by video content . . . . . . . . 9 6. A Trace-Driven Model . . . . . . . . . . . . . . . . . . . . 9 6.1. Choosing the video sequence and generating the traces . . 10 6.2. Using the traces in the synthetic codec . . . . . . . . . 11 6.2.1. Main algorithm . . . . . . . . . . . . . . . . . . . 11 6.2.2. Notes to the main algorithm . . . . . . . . . . . . . 13 6.3. Varying frame rate and resolution . . . . . . . . . . . . 13 7. Combining The Two Models . . . . . . . . . . . . . . . . . . 14 8. Implementation Status . . . . . . . . . . . . . . . . . . . . 15 @@ -124,28 +126,28 @@ A live video encoder employs encoder rate control to meet a target rate by varying its encoding parameters, such as quantization step size, frame rate, and picture resolution, based on its estimate of the video content (e.g., motion and scene complexity). In practice, however, several factors prevent the output video rate from perfectly conforming to the input target rate. Due to uncertainties in the captured video scene, the output rate typically deviates from the specified target. 
In the presence of a - significant change in target rate, it sometimes takes several frames - before the encoder output rate converges to the new target. Finally, - while most of the frames in a live session are encoded in predictive - mode, the encoder can occasionally generate a large intra-coded frame - (or a frame partially containing intra-coded blocks) in an attempt to - recover from losses, to re-sync with the receiver, or during the - transient period of responding to target rate or spatial resolution - changes. + significant change in target rate, the encoder output frame sizes + sometimes fluctuate for a short, transient period of time before the + output rate converges to the new target. Finally, while most of the + frames in a live session are encoded in predictive mode, the encoder + can occasionally generate a large intra-coded frame (or a frame + partially containing intra-coded blocks) in an attempt to recover + from losses, to re-sync with the receiver, or during the transient + period of responding to target rate or spatial resolution changes. Hence, a synthetic video source should have the following capabilities: o To change bitrate. This includes the ability to change framerate and/or spatial resolution, or to skip frames when required. o To fluctuate around the target bitrate specified by the congestion control module. @@ -192,21 +194,21 @@ Section 6 --- follow the same set of interactions. The synthetic video source dynamically generates a sequence of dummy video frames with varying size and interval. These dummy frames are processed by other modules in order to transmit the video stream over the network. During the lifetime of a video transmission session, the synthetic video source will typically be required to adapt its encoding bitrate, and sometimes the spatial resolution and frame rate.
- In our model, the synthetic video source module has a group of + In this model, the synthetic video source module has a group of incoming and outgoing interface calls that allow for interaction with other modules. The following are some of the possible incoming interface calls --- marked as (a) in Figure 1 --- that the synthetic video traffic source may accept. The list is not exhaustive and can be complemented by other interface calls if deemed necessary. o Target rate R_v: target rate request, typically calculated by the congestion control module and updated dynamically over time. Depending on the congestion control algorithm in use, the update requests can either be periodic (e.g., once per second), or on- @@ -272,23 +274,25 @@ +===========+====================================+================+ | Notation | Parameter Name | Example Value | +===========+====================================+================+ | R_v | Target rate request | 1 Mbps | +-----------+------------------------------------+----------------+ | FPS | Target frame rate | 30 Hz | +-----------+------------------------------------+----------------+ | tau_v | Encoder reaction latency | 0.2 s | +-----------+------------------------------------+----------------+ - | K_d | Burst duration during transient | 8 frames | + | K_d | Burst duration of the transient | 8 frames | + | | period | | +-----------+------------------------------------+----------------+ - | K_B | Burst frame size during transient | 13.5 KBytes* | + | K_B | Burst frame size during the | 13.5 KBytes* | + | | transient period | | +-----------+------------------------------------+----------------+ | t0 | Reference frame interval 1/FPS | 33 ms | +-----------+------------------------------------+----------------+ | B0 | Reference frame size R_v/8/FPS | 4.17 KBytes | +-----------+------------------------------------+----------------+ | | Scaling parameter of the zero-mean | | | | Laplacian distribution describing | | | SCALE_t | 
deviations in normalized frame | 0.15 | | | interval (t-t0)/t0 | | +-----------+------------------------------------+----------------+ @@ -312,30 +316,30 @@ 5.1. Time-damped response to target rate update While the congestion control module can update its target rate request R_v at any time, the statistical model dictates that the encoder will only react to such changes tau_v seconds after a previous rate transition. In other words, when the encoder has reacted to a rate change request at time t, it will simply ignore all subsequent rate change requests until time t+tau_v. -5.2. Temporary burst and oscillation during transient +5.2. Temporary burst and oscillation during the transient period The output rate R_o during the period [t, t+tau_v] is considered to - be in transient. Based on observations from video encoder output - data, the transient behavior of an encoder upon reacting to a new - target rate request is modelled in the form of high variation in - output frame sizes. It is assumed that the overall average output - rate R_o during this period matches the target rate R_v. - Consequently, the occasional burst of large frames are followed by - smaller-than-average encoded frames. + be in a transient state. Based on observations from video encoder + output data, the encoder reaction to a new target rate request can be + characterized by high variation in output frame sizes. It is assumed + in the model that the overall average output rate R_o during this + transient period matches the target rate R_v. Consequently, the + occasional bursts of large frames are followed by smaller-than-average + encoded frames. This temporary burst is characterized by two parameters: o burst duration K_d: number of frames in the burst event; and o burst frame size K_B: size of the initial burst frame which is typically significantly larger than the average frame size at steady state.
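As an illustration of Sections 5.1 and 5.2, the following is a minimal Python sketch, not a reference implementation, of how a simulator might apply the time-damped response and the transient burst. It uses the example values from the parameter table (FPS = 30 Hz, tau_v = 0.2 s, K_d = 8 frames, K_B = 13.5 KBytes taken as 13500 bytes, SCALE_t = 0.15). The class `StatisticalSourceSketch` and its methods are hypothetical names. Several choices are assumptions not taken from this document: every accepted rate change (including the first) starts a burst; the K_d-1 frames after the burst frame share the remaining budget K_d*B0 - K_B equally, so that the average over the transient matches R_v; steady-state frame sizes are held at B0 (the steady-state size fluctuation of Section 5.3 is omitted); and the normalized frame-interval deviation is floored at -0.9 so intervals stay positive.

```python
import random


class StatisticalSourceSketch:
    """Hypothetical sketch of the time-damped response and transient burst.

    Assumptions not taken from the draft: the K_d - 1 frames after the burst
    share the remaining byte budget equally, steady-state frames are held at
    B0, and normalized frame-interval deviations are floored at -0.9.
    """

    def __init__(self, fps=30.0, tau_v=0.2, k_d=8, k_b=13500.0, scale_t=0.15):
        self.fps = fps                # target frame rate FPS (Hz)
        self.tau_v = tau_v            # encoder reaction latency (s)
        self.k_d = k_d                # burst duration (frames)
        self.k_b = k_b                # burst frame size (bytes)
        self.scale_t = scale_t        # Laplacian scale for (t - t0) / t0
        self.rate = None              # target rate currently in effect (bps)
        self.last_change = None       # time of the last accepted rate change
        self.pending = []             # frame sizes left in the current burst

    def reference_frame_size(self):
        """B0 = R_v / 8 / FPS, in bytes."""
        return self.rate / 8.0 / self.fps

    def request_rate(self, r_v, now):
        """Apply a target rate request; return False if it is ignored."""
        if self.last_change is not None and now < self.last_change + self.tau_v:
            return False              # still within tau_v of the last change
        self.rate = r_v
        self.last_change = now
        b0 = self.reference_frame_size()
        # One burst frame of size K_B, then K_d - 1 smaller frames so that the
        # average over the K_d frames equals B0 (clamped at 0 if K_B is huge).
        rest = max(0.0, (self.k_d * b0 - self.k_b) / (self.k_d - 1))
        self.pending = [self.k_b] + [rest] * (self.k_d - 1)
        return True

    def next_frame(self):
        """Return (frame size in bytes, frame interval in seconds)."""
        t0 = 1.0 / self.fps
        # Zero-mean Laplacian sample: difference of two i.i.d. exponentials
        # with mean scale_t (expovariate takes the rate 1 / scale_t).
        lam = 1.0 / self.scale_t
        deviation = random.expovariate(lam) - random.expovariate(lam)
        interval = t0 * (1.0 + max(-0.9, deviation))
        size = self.pending.pop(0) if self.pending else self.reference_frame_size()
        return size, interval


if __name__ == "__main__":
    src = StatisticalSourceSketch()
    src.request_rate(1_000_000, now=0.0)   # accepted: starts a burst
    src.request_rate(2_000_000, now=0.1)   # ignored: within tau_v = 0.2 s
    for _ in range(10):
        print(src.next_frame())
```

With R_v = 1 Mbps this gives B0 of about 4167 bytes, a first frame of 13500 bytes and seven frames of about 2833 bytes each, after which frames return to B0.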
It can be noted that these burst parameters can also be used to mimic @@ -424,22 +429,23 @@ are representative of the target use cases for the video traffic model. For the example use case of interactive video conferencing, it is recommended to choose a low-motion sequence that resembles a "talking head", e.g. from a news broadcast or recording of an actual video conferencing call. The length of the chosen video sequence is a tradeoff. If it is too long, it will be difficult to manage the data structures containing the traces. If it is too short, there will be an obvious periodic pattern in the output frame sizes, leading to biased results when - evaluating congestion control performance. In our experience, a - sequence with a length between 2 and 4 minutes is a fair tradeoff. + evaluating congestion control performance. It has been empirically + determined that a sequence with a length between 2 and 4 minutes + strikes a fair tradeoff. Given the chosen raw video sequence, denoted S, one can use a live encoder, e.g. some implementation of [H264] or [HEVC], to produce a set of encoded sequences. As discussed in Section 3, the output bitrate of the live encoder can be achieved by tuning three input parameters: quantization step size, frame rate, and picture resolution. In order to simplify the choice of these parameters for a given target rate, one can typically assume a fixed frame rate (e.g. 30 fps) and a fixed resolution (e.g., 720p) when configuring the live encoder. See Section 6.3 for a discussion on how to relax @@ -555,21 +561,21 @@ factor = R_v / R_min framesize = max(1, factor * Traces[R_min][t_current]) c) R_v >= R_max: the output frame size is calculated by scaling with respect to the highest bitrate R_max: factor = R_v / R_max framesize = factor * Traces[R_max][t_current] - In case b), we set the minimum output size to 1 byte, since the value + In case b), the minimum output size is set to 1 byte, since the value of factor can be arbitrarily close to 0. 
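The frame-size selection of the main algorithm can be sketched in Python as follows. This is an illustration under stated assumptions, not the draft's reference code: the function name `synthetic_frame_size` and the `traces` dictionary layout (bitrate in bps mapped to a list of frame sizes in bytes) are hypothetical, and the trace is assumed to loop when `t_current` runs past its end. Cases b) and c) follow the formulas above, including the 1-byte floor in case b). Case a) is not reproduced in this excerpt, so for R_min < R_v < R_max the sketch assumes linear interpolation between the two neighbouring trace bitrates.

```python
import bisect


def synthetic_frame_size(traces, r_v, t_current):
    """Return a frame size (bytes) for target rate r_v at trace position t_current.

    traces: hypothetical dict mapping each encoded bitrate (bps) to the list
    of frame sizes (bytes) recorded at that bitrate.
    """
    rates = sorted(traces)
    r_min, r_max = rates[0], rates[-1]

    def trace(rate):
        # Loop over the trace when t_current runs past its end (assumption).
        frames = traces[rate]
        return frames[t_current % len(frames)]

    if r_v <= r_min:
        # Case b): scale the lowest-rate trace; never return less than 1 byte.
        factor = r_v / r_min
        return max(1, factor * trace(r_min))
    if r_v >= r_max:
        # Case c): scale the highest-rate trace.
        factor = r_v / r_max
        return factor * trace(r_max)
    # Case a) (assumed form): interpolate linearly between the two
    # neighbouring trace bitrates r_lo <= r_v < r_hi.
    i = bisect.bisect_right(rates, r_v)
    r_lo, r_hi = rates[i - 1], rates[i]
    w = (r_v - r_lo) / (r_hi - r_lo)
    return (1 - w) * trace(r_lo) + w * trace(r_hi)
```

For example, with traces recorded at 100, 200 and 400 kbps, a 150 kbps request at a given position returns the average of the 100 and 200 kbps frame sizes at that position, and an 800 kbps request returns twice the 400 kbps frame size.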
6.2.2. Notes to the main algorithm Note that the main algorithm as described above can be further extended to mimic some additional typical behaviors of a live video encoder. Two examples are given below: o I-frames on demand: The synthetic codec can be extended to simulate the sending of I-frames on demand, e.g., as a reaction to @@ -632,45 +638,45 @@ whereas it is straightforward for a trace-driven model to obtain encoded frame size data. On the other hand, once validated, the statistical model is more flexible in mimicking a wide range of encoder/content behaviors by simply varying the corresponding parameters in the model. In this regard, a trace-driven model relies -- by definition -- on additional data collection efforts for accommodating new codecs or video content. In general, the trace-driven model is more realistic for mimicking the ongoing, steady-state behavior of a video traffic source whereas the - statistical model is more versatile for simulating transient events - (e.g., when target rate changes from A to B with temporary bursts - during the transition). It is also possible to combine both models - into a hybrid approach, using traces during steady-state and - statistical model during transients. + statistical model is more versatile for simulating its transient- + state behavior such as a sudden rate change. It is also possible to + combine both methods into a hybrid model, so that the steady-state + behavior is driven by traces and the transient- + state behavior is driven by the statistical model.
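The decision step of such a hybrid model can be sketched in Python as below. It is a minimal illustration, assuming the example criterion described with Figure 3 in this section (the transient state is entered when the requested rate exceeds the previous target by more than 10%) and the example parameter values of Section 5. The function name `hybrid_next_frames` and the `trace_frame_size` callable, which stands in for the trace-driven model of Section 6, are hypothetical. Splitting the remaining byte budget equally across the K_d-1 frames after the burst is an assumption, and the frame-interval modulation from the statistical model is left out for brevity.

```python
def hybrid_next_frames(r_v, prev_r_v, trace_frame_size, t_current,
                       fps=30.0, k_d=8, k_b=13500.0, threshold=0.10):
    """Return (mode, frame sizes in bytes) for a newly requested target rate.

    trace_frame_size: hypothetical callable (rate, t) -> bytes standing in
    for the trace-driven model of Section 6.
    """
    if prev_r_v is not None and r_v - prev_r_v > threshold * prev_r_v:
        # Transient state: K_d frames from the statistical model, one burst
        # frame of K_B bytes followed by K_d - 1 smaller frames. Sharing the
        # remaining budget equally is an assumption of this sketch.
        b0 = r_v / 8.0 / fps
        rest = max(0.0, (k_d * b0 - k_b) / (k_d - 1))
        return "transient", [k_b] + [rest] * (k_d - 1)
    # Steady state: a single frame from the trace-driven model.
    return "steady", [trace_frame_size(r_v, t_current)]
```

A simulator would call this when a new target rate arrives. A rate decrease, or an increase of 10% or less, keeps the model in steady state and draws the next frame from the traces.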
- +---------------+ - transient | Generate next | + transient +---------------+ + state | Generate next | +------>| K_d transient | +-------------+ / | frames | R_v | Compare | / +---------------+ ------->| against |/ | previous | | target rate |\ +-------------+ \ +---------------+ \ | Generate next | +------>| frame from | - steady-state | trace | - +---------------+ + steady | trace | + state +---------------+ - Figure 3: Hybrid approach for modeling video traffic + Figure 3: A hybrid video traffic model As shown in Figure 3, the video traffic model operates in transient state if the requested target rate R_v is substantially higher than - the previous target, or else it operates in steady state. During + the previous target, or else it operates in steady state. During the transient state, a total of K_d frames are generated by the statistical model, resulting in one (1) big burst frame with size K_B followed by K_d-1 smaller frames. When operating at steady-state, the video traffic model simply generates a frame according to the trace-driven model given the target rate, while modulating the frame interval according to the distribution specified by the statistical model. One example criterion for determining whether the traffic model should operate in transient state is whether the rate increase exceeds 10% of previous target rate. Finally, as this model follows transient state behavior dictated by the statistical model, upon a