draft-ietf-rmcat-video-traffic-model-03.txt | draft-ietf-rmcat-video-traffic-model-04.txt | |||
---|---|---|---|---|
Network Working Group X. Zhu | Network Working Group X. Zhu | |||
Internet-Draft S. Mena | Internet-Draft S. Mena | |||
Intended status: Informational Cisco Systems | Intended status: Informational Cisco Systems | |||
Expires: January 18, 2018 Z. Sarker | Expires: July 22, 2018 Z. Sarker | |||
Ericsson AB | Ericsson AB | |||
July 17, 2017 | January 18, 2018 | |||
Modeling Video Traffic Sources for RMCAT Evaluations | Modeling Video Traffic Sources for RMCAT Evaluations | |||
draft-ietf-rmcat-video-traffic-model-03 | draft-ietf-rmcat-video-traffic-model-04 | |||
Abstract | Abstract | |||
This document describes two reference video traffic source models for | This document describes two reference video traffic source models for | |||
evaluating RMCAT candidate algorithms. The first model statistically | evaluating RMCAT candidate algorithms. The first model statistically | |||
characterizes the behavior of a live video encoder in response to | characterizes the behavior of a live video encoder in response to | |||
changing requests on target video rate. The second model is trace- | changing requests on target video rate. The second model is trace- | |||
driven, and emulates the encoder output by scaling the pre-encoded | driven, and emulates the encoder output based on actual encoded video | |||
video frame sizes from a widely used video test sequence. Both | frame sizes from a high-resolution test sequence. Both models are | |||
models are designed to strike a balance between simplicity, | designed to strike a balance between simplicity, repeatability, and | |||
repeatability, and authenticity in modeling the interactions between | authenticity in modeling the interactions between a live video | |||
a video traffic source and the congestion control module. | traffic source and the congestion control module. Finally, the | |||
document describes how both approaches can be combined into a hybrid | ||||
model. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on January 18, 2018. | This Internet-Draft will expire on July 22, 2018. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2017 IETF Trust and the persons identified as the | Copyright (c) 2018 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
skipping to change at page 2, line 25 ¶ | skipping to change at page 2, line 29 ¶ | |||
3. Desired Behavior of A Synthetic Video Traffic Model . . . . . 3 | 3. Desired Behavior of A Synthetic Video Traffic Model . . . . . 3 | |||
4. Interactions Between Synthetic Video Traffic Source and | 4. Interactions Between Synthetic Video Traffic Source and | |||
Other Components at the Sender . . . . . . . . . . . . . . . 4 | Other Components at the Sender . . . . . . . . . . . . . . . 4 | |||
5. A Statistical Reference Model . . . . . . . . . . . . . . . . 6 | 5. A Statistical Reference Model . . . . . . . . . . . . . . . . 6 | |||
5.1. Time-damped response to target rate update . . . . . . . 7 | 5.1. Time-damped response to target rate update . . . . . . . 7 | |||
5.2. Temporary burst and oscillation during transient . . . . 8 | 5.2. Temporary burst and oscillation during transient . . . . 8 | |||
5.3. Output rate fluctuation at steady state . . . . . . . . . 8 | 5.3. Output rate fluctuation at steady state . . . . . . . . . 8 | |||
5.4. Rate range limit imposed by video content . . . . . . . . 9 | 5.4. Rate range limit imposed by video content . . . . . . . . 9 | |||
6. A Trace-Driven Model . . . . . . . . . . . . . . . . . . . . 9 | 6. A Trace-Driven Model . . . . . . . . . . . . . . . . . . . . 9 | |||
6.1. Choosing the video sequence and generating the traces . . 10 | 6.1. Choosing the video sequence and generating the traces . . 10 | |||
6.2. Using the traces in the syntethic codec . . . . . . . . . 11 | 6.2. Using the traces in the synthetic codec . . . . . . . . . 11 | |||
6.2.1. Main algorithm . . . . . . . . . . . . . . . . . . . 11 | 6.2.1. Main algorithm . . . . . . . . . . . . . . . . . . . 11 | |||
6.2.2. Notes to the main algorithm . . . . . . . . . . . . . 13 | 6.2.2. Notes to the main algorithm . . . . . . . . . . . . . 13 | |||
6.3. Varying frame rate and resolution . . . . . . . . . . . . 13 | 6.3. Varying frame rate and resolution . . . . . . . . . . . . 13 | |||
7. Combining The Two Models . . . . . . . . . . . . . . . . . . 14 | 7. Combining The Two Models . . . . . . . . . . . . . . . . . . 14 | |||
8. Implementation Status . . . . . . . . . . . . . . . . . . . . 15 | 8. Implementation Status . . . . . . . . . . . . . . . . . . . . 15 | |||
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 | |||
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 | |||
10.1. Normative References . . . . . . . . . . . . . . . . . . 16 | 10.1. Normative References . . . . . . . . . . . . . . . . . . 16 | |||
10.2. Informative References . . . . . . . . . . . . . . . . . 16 | 10.2. Informative References . . . . . . . . . . . . . . . . . 16 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 17 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
1. Introduction | 1. Introduction | |||
When evaluating candidate congestion control algorithms designed for | When evaluating candidate congestion control algorithms designed for | |||
real-time interactive media, it is important to account for the | real-time interactive media, it is important to account for the | |||
characteristics of traffic patterns generated from a live video | characteristics of traffic patterns generated from a live video | |||
skipping to change at page 3, line 15 ¶ | skipping to change at page 3, line 17 ¶ | |||
On the other hand, evaluation results of a candidate RMCAT algorithm | On the other hand, evaluation results of a candidate RMCAT algorithm | |||
should mostly reflect performance of the congestion control module, | should mostly reflect performance of the congestion control module, | |||
and somewhat decouple from peculiarities of any specific video codec. | and somewhat decouple from peculiarities of any specific video codec. | |||
It is also desirable that evaluation tests are repeatable, and be | It is also desirable that evaluation tests are repeatable, and be | |||
easily duplicated across different candidate algorithms. | easily duplicated across different candidate algorithms. | |||
One way to strike a balance between the above considerations is to | One way to strike a balance between the above considerations is to | |||
evaluate RMCAT algorithms using a synthetic video traffic source | evaluate RMCAT algorithms using a synthetic video traffic source | |||
model that captures key characteristics of the behavior of a live | model that captures key characteristics of the behavior of a live | |||
video encoder. To this end, this draft presents two reference | video encoder. To this end, this draft presents two reference | |||
models. The first is based on statistical modelling; the second is | models. The first is based on statistical modeling; the second is | |||
trace-driven. The draft also discusses the pros and cons of each | trace-driven. The draft also discusses the pros and cons of each | |||
approach, as well as how both approaches can be combined. | approach, as well as how both approaches can be combined into a | |||
hybrid model. | ||||
2. Terminology | 2. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in RFC2119 [RFC2119]. | document are to be interpreted as described in RFC2119 [RFC2119]. | |||
3. Desired Behavior of A Synthetic Video Traffic Model | 3. Desired Behavior of A Synthetic Video Traffic Model | |||
A live video encoder employs encoder rate control to meet a target | A live video encoder employs encoder rate control to meet a target | |||
skipping to change at page 4, line 31 ¶ | skipping to change at page 4, line 37 ¶ | |||
o Statistical resemblance: The synthetic traffic should match the | o Statistical resemblance: The synthetic traffic should match the | |||
outcome of the real video encoder in terms of statistical | outcome of the real video encoder in terms of statistical | |||
characteristics, such as the mean, variance, peak, and | characteristics, such as the mean, variance, peak, and | |||
autocorrelation coefficients of the bitrate. It is also important | autocorrelation coefficients of the bitrate. It is also important | |||
that the statistical resemblance should hold across different time | that the statistical resemblance should hold across different time | |||
scales, ranging from tens of milliseconds to sub-seconds. | scales, ranging from tens of milliseconds to sub-seconds. | |||
o Wide range of coverage: The model should be easily configurable to | o Wide range of coverage: The model should be easily configurable to | |||
cover a wide range of codec behaviors (e.g., with either fast or | cover a wide range of codec behaviors (e.g., with either fast or | |||
slow reaction time in live encoder rate control) and video content | slow reaction time in live encoder rate control) and video content | |||
variations (e.g, ranging from high-motion to low-motion). | variations (e.g., ranging from high-motion to low-motion). | |||
These distinct behavior features can be characterized via simple | These distinct behavior features can be characterized via simple | |||
statistical models, or a trace-driven approach. We present an | statistical modelling, or a trace-driven approach. Section 5 and | |||
example of each in Section 5 and Section 6 | Section 6 provide an example of each approach, respectively. | |||
Section 7 discusses how both models can be combined together. | ||||
4. Interactions Between Synthetic Video Traffic Source and Other | 4. Interactions Between Synthetic Video Traffic Source and Other | |||
Components at the Sender | Components at the Sender | |||
Figure 1 depitcs the interactions of the synthetic video encoder with | Figure 1 depicts the interactions of the synthetic video encoder with | |||
other components at the sender, such as the application, the | other components at the sender, such as the application, the | |||
congestion control module, the media packet transport module, etc. | congestion control module, the media packet transport module, etc. | |||
Both reference models, as described later in Section 5 and Section 6, | Both reference models, as described later in Section 5 and Section 6, | |||
follow the same set of interactions. | follow the same set of interactions. | |||
The synthetic video encoder takes in raw video frames captured by the | The synthetic video encoder takes in raw video frames captured by the | |||
camera and then dynamically generates a sequence of encoded video | camera and then dynamically generates a sequence of encoded video | |||
frames with varying size and interval. These encoded frames are | frames with varying size and interval. These encoded frames are | |||
processed by other modules in order to transmit the video stream over | processed by other modules in order to transmit the video stream over | |||
the network. During the lifetime of a video transmission session, | the network. During the lifetime of a video transmission session, | |||
skipping to change at page 6, line 28 ¶ | skipping to change at page 6, line 33 ¶ | |||
| | | | | | |||
-------------------+ +--------------------> | -------------------+ +--------------------> | |||
interface from interface to | interface from interface to | |||
other modules (a) other modules (b) | other modules (a) other modules (b) | |||
Figure 1: Interaction between synthetic video encoder and other | Figure 1: Interaction between synthetic video encoder and other | |||
modules at the sender | modules at the sender | |||
5. A Statistical Reference Model | 5. A Statistical Reference Model | |||
In this section, we describe one simple statistical model of the live | This section describes one simple statistical model of the live video | |||
video encoder traffic source. Figure 2 summarizes the list of | encoder traffic source. Figure 2 summarizes the list of tunable | |||
tunable parameters in this statistical model. A more comprehensive | parameters in this statistical model. A more comprehensive survey of | |||
survey of popular methods for modelling video traffic source behavior | popular methods for modeling video traffic source behavior can be | |||
can be found in [Tanwir2013]. | found in [Tanwir2013]. | |||
+==============+====================================+================+ | +==============+====================================+================+ | |||
| Notation | Parameter Name | Example Value | | | Notation | Parameter Name | Example Value | | |||
+==============+====================================+================+ | +==============+====================================+================+ | |||
| R_v | Target rate request to encoder | 1 Mbps | | | R_v | Target rate request to encoder | 1 Mbps | | |||
+--------------+------------------------------------+----------------+ | +--------------+------------------------------------+----------------+ | |||
| FPS | Target frame rate of encoder output| 30 Hz | | | FPS | Target frame rate of encoder output| 30 Hz | | |||
+--------------+------------------------------------+----------------+ | +--------------+------------------------------------+----------------+ | |||
| tau_v | Encoder reaction latency | 0.2 s | | | tau_v | Encoder reaction latency | 0.2 s | | |||
+--------------+------------------------------------+----------------+ | +--------------+------------------------------------+----------------+ | |||
skipping to change at page 7, line 48 ¶ | skipping to change at page 7, line 48 ¶ | |||
* Example value of K_B for a video stream encoded at 720p and 30 frames | * Example value of K_B for a video stream encoded at 720p and 30 frames | |||
per second, using H.264/AVC encoder. | per second, using H.264/AVC encoder. | |||
Figure 2: List of tunable parameters in a statistical video traffic | Figure 2: List of tunable parameters in a statistical video traffic | |||
source model. | source model. | |||
5.1. Time-damped response to target rate update | 5.1. Time-damped response to target rate update | |||
While the congestion control module can update its target rate | While the congestion control module can update its target rate | |||
request R_v at any time, our model dictates that the encoder will | request R_v at any time, the statistical model dictates that the | |||
only react to such changes after tau_v seconds from a previous rate | encoder will only react to such changes tau_v seconds after a | |||
transition. In other words, when the encoder has reacted to a rate | previous rate transition. In other words, when the encoder has | |||
change request at time t, it will simply ignore all subsequent rate | reacted to a rate change request at time t, it will simply ignore all | |||
change requests until time t+tau_v. | subsequent rate change requests until time t+tau_v. | |||
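
The gating rule in Section 5.1 can be illustrated with a minimal Python sketch. The class and method names are hypothetical and do not come from the draft. The sketch assumes that ignored requests are dropped rather than queued, and that a request arriving exactly at t+tau_v is accepted.

```python
class DampedRateTarget:
    """Tracks the target rate R_v that the synthetic encoder currently honors."""

    def __init__(self, initial_rate_bps: float, tau_v: float = 0.2):
        self.rate_bps = initial_rate_bps
        self.tau_v = tau_v                      # encoder reaction latency, in seconds
        self.last_transition = float("-inf")    # time of the last accepted change

    def request(self, now: float, new_rate_bps: float) -> bool:
        """Return True if the encoder reacts to the request, False if it is ignored."""
        if new_rate_bps == self.rate_bps:
            return False                        # no rate transition is needed
        if now - self.last_transition < self.tau_v:
            return False                        # still within tau_v of the last transition
        self.rate_bps = new_rate_bps
        self.last_transition = now
        return True
```

A congestion controller that updates R_v more often than once per tau_v therefore sees only some of its requests reflected in the encoder output.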
5.2. Temporary burst and oscillation during transient | 5.2. Temporary burst and oscillation during transient | |||
The output rate R_o during the period [t, t+tau_v] is considered to | The output rate R_o during the period [t, t+tau_v] is considered to | |||
be in transient. Based on observations from video encoder output | be in transient. Based on observations from video encoder output | |||
data, we model the transient behavior of an encoder upon reacting to | data, the transient behavior of an encoder upon reacting to a new | |||
a new target rate request in the form of high variation in output | target rate request is modelled in the form of high variation in | |||
frame sizes. It is assumed that the overall average output rate R_o | output frame sizes. It is assumed that the overall average output | |||
during this period matches the target rate R_v. Consequently, the | rate R_o during this period matches the target rate R_v. | |||
occasional burst of large frames is followed by smaller-than average | Consequently, the occasional burst of large frames is followed by | |||
encoded frames. | smaller-than-average encoded frames. | |||
This temporary burst is characterized by two parameters: | This temporary burst is characterized by two parameters: | |||
o burst duration K_d: number of frames in the burst event; and | o burst duration K_d: number of frames in the burst event; and | |||
o burst frame size K_B: size of the initial burst frame which is | o burst frame size K_B: size of the initial burst frame which is | |||
typically significantly larger than average frame size at steady | typically significantly larger than average frame size at steady | |||
state. | state. | |||
It can be noted that these burst parameters can also be used to mimic | It can be noted that these burst parameters can also be used to mimic | |||
the insertion of a large on-demand I frame in the presence of severe | the insertion of a large on-demand I frame in the presence of severe | |||
packet losses. The values of K_d and K_B typically depend on the | packet losses. The values of K_d and K_B typically depend on the | |||
type of video codec, spatial and temporal resolution of the encoded | type of video codec, spatial and temporal resolution of the encoded | |||
stream, as well as the video content activity level. | stream, as well as the video content activity level. | |||
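
One possible reading of the burst description is sketched below in Python. The draft fixes only the burst frame size K_B, the burst duration K_d, and the requirement that the average rate over the transient match R_v. The even split of the remaining byte budget across the other K_d - 1 frames is an assumption made here for illustration, and the function name is hypothetical.

```python
from typing import List


def burst_frame_sizes(b0: float, k_b: float, k_d: int) -> List[float]:
    """Frame sizes (bytes) for one burst event of k_d frames.

    b0  : steady-state reference frame size, R_v / 8 / FPS
    k_b : size of the initial burst frame
    k_d : number of frames in the burst event
    """
    if k_d < 2:
        raise ValueError("k_d must cover the burst frame plus at least one more frame")
    # Assumption: bytes left after the burst frame are shared equally, so that
    # the k_d frames average b0 whenever k_b does not exceed the whole budget.
    remaining = max(k_d * b0 - k_b, 0.0) / (k_d - 1)
    return [k_b] + [remaining] * (k_d - 1)
```

With illustrative inputs such as b0 = 4000 bytes, k_b = 12000 bytes, and k_d = 8, the first frame is three times the reference size and each of the following seven frames is smaller than b0.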
5.3. Output rate fluctuation at steady state | 5.3. Output rate fluctuation at steady state | |||
We model output rate R_o during steady state as randomly fluctuating | The output rate R_o during steady state is modelled as randomly | |||
around the target rate R_v. The output traffic can be characterized | fluctuating around the target rate R_v. The output traffic can be | |||
as the combination of two random processes denoting the frame | characterized as the combination of two random processes denoting the | |||
interval t and output frame size B over time. These two random | frame interval t and output frame size B over time. These two random | |||
processes capture two sources of variations in the encoder output: | processes capture two sources of variations in the encoder output: | |||
o Fluctuations in frame interval: the intervals between adjacent | o Fluctuations in frame interval: the intervals between adjacent | |||
frames have been observed to fluctuate around the reference | frames have been observed to fluctuate around the reference | |||
interval of t0 = 1/FPS. Deviations in normalized frame interval | interval of t0 = 1/FPS. Deviations in normalized frame interval | |||
DELTA_t = (t-t0)/t0 can be modelled by a zero-mean Laplacian | DELTA_t = (t-t0)/t0 can be modelled by a zero-mean Laplacian | |||
distribution with scaling parameter SCALE_t. The value of SCALE_t | distribution with scaling parameter SCALE_t. The value of SCALE_t | |||
dictates the "width" of the Laplacian distribution and therefore | dictates the "width" of the Laplacian distribution and therefore | |||
the amount of fluctuations in actual frame intervals (t) with | the amount of fluctuations in actual frame intervals (t) with | |||
respect to the reference t0. | respect to the reference frame interval t0. | |||
o Fluctuations in frame size: sizes of the output encoded frames also | o Fluctuations in frame size: sizes of the output encoded frames also | |||
tend to fluctuate around the reference frame size B0=R_v/8/FPS. | tend to fluctuate around the reference frame size B0=R_v/8/FPS. | |||
Likewise, deviations in the normalized frame size DELTA_B = | Likewise, deviations in the normalized frame size DELTA_B = | |||
(B-B0)/B0 can be modelled by a zero-mean Laplacian distribution | (B-B0)/B0 can be modelled by a zero-mean Laplacian distribution | |||
with scaling parameter SCALE_B. The value of SCALE_B dictates the | with scaling parameter SCALE_B. The value of SCALE_B dictates the | |||
"width" of this second Laplacian distribution and correspondingly | "width" of this second Laplacian distribution and correspondingly | |||
the amount of fluctuations in output frame sizes (B) with respect | the amount of fluctuations in output frame sizes (B) with respect | |||
to the reference target B0. | to the reference target B0. | |||
Both values of SCALE_t and SCALE_B can be obtained via parameter | Both values of SCALE_t and SCALE_B can be obtained via parameter | |||
fitting from empirical data captured for a given video encoder. | fitting from empirical data captured for a given video encoder. | |||
Example values are listed in Figure 2 based on empirical data | Example values are listed in Figure 2 based on empirical data | |||
presented in [IETF-Interim]. | presented in [IETF-Interim]. | |||
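
The two random processes of Section 5.3 can be sketched in Python as follows. The function names are hypothetical. The scale values passed in any usage are placeholders rather than the values listed in Figure 2. Clamping negative outcomes to zero is an assumption added here, since a Laplacian deviation below -1 would otherwise produce a negative interval or size.

```python
import random
from typing import Tuple


def laplace(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplacian sample, drawn as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def steady_state_frame(r_v_bps: float, fps: float, scale_t: float,
                       scale_b: float, rng: random.Random) -> Tuple[float, float]:
    """Return (frame interval in seconds, frame size in bytes) for one frame."""
    t0 = 1.0 / fps                  # reference frame interval
    b0 = r_v_bps / 8.0 / fps        # reference frame size in bytes
    t = t0 * (1.0 + laplace(scale_t, rng))   # DELTA_t = (t - t0) / t0
    b = b0 * (1.0 + laplace(scale_b, rng))   # DELTA_B = (B - B0) / B0
    # Assumption (not from the draft): clamp negative outcomes to zero.
    return max(t, 0.0), max(b, 0.0)
```

Passing a seeded random.Random instance keeps the generated sequence repeatable across evaluation runs, in line with the repeatability goal stated in the Introduction.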
5.4. Rate range limit imposed by video content | 5.4. Rate range limit imposed by video content | |||
The output rate R_o is further clipped within the dynamic range | The output rate R_o is further clipped within the dynamic range | |||
[R_min, R_max], which in reality are dictated by scene and motion | [R_min, R_max], which in reality are dictated by scene and motion | |||
complexity of the captured video content. In our model, these | complexity of the captured video content. In the proposed | |||
parameters are specified by the application. | statistical model, these parameters are specified by the application. | |||
6. A Trace-Driven Model | 6. A Trace-Driven Model | |||
We now present the second approach to model a video traffic source. | The second approach for modelling a video traffic source is trace- | |||
This approach is based on running an actual live video encoder on a | driven. This can be achieved by running an actual live video encoder | |||
set of chosen raw video sequences and using the encoder's output | on a set of chosen raw video sequences and using the encoder's output | |||
traces for constructing a synthetic live encoder. With this | traces for constructing a synthetic live encoder. With this | |||
approach, the recorded video traces naturally exhibit temporal | approach, the recorded video traces naturally exhibit temporal | |||
fluctuations around a given target rate request R_v from the | fluctuations around a given target rate request R_v from the | |||
congestion control module. | congestion control module. | |||
The following list summarizes the main steps of this approach: | The following list summarizes the main steps of this approach: | |||
1) Choose one or more representative raw video sequences. | 1. Choose one or more representative raw video sequences. | |||
2) Encode the sequence(s) using an actual live video encoder. Repeat | 2. Encode the sequence(s) using an actual live video encoder. | |||
the process for a number of bitrates. Keep only the sequence of | Repeat the process for a number of bitrates. Keep only the | |||
frame sizes for each bitrate. | sequence of frame sizes for each bitrate. | |||
3) Construct a data structure that contains the output of the | 3. Construct a data structure that contains the output of the | |||
previous step. The data structure should allow for easy bitrate | previous step. The data structure should allow for easy bitrate | |||
lookup. | lookup. | |||
4) Upon a target bitrate request R_v from the controller, look up the | 4. Upon a target bitrate request R_v from the controller, look up | |||
closest bitrates among those previously stored. Use the frame size | the closest bitrates among those previously stored. Use the | |||
sequences stored for those bitrates to approximate the frame sizes to | frame size sequences stored for those bitrates to approximate the | |||
output. | frame sizes to output. | |||
5) The output of the synthetic encoder contains "encoded" frames with | 5. The output of the synthetic encoder contains "encoded" frames | |||
zeros as contents but with realistic sizes. | with zeros as contents but with realistic sizes. | |||
Section 6.1 explains steps 1), 2), and 3), Section 6.2 elaborates on | In the following, Section 6.1 explains the first three steps (1-3), | |||
steps 4) and 5). Finally, Section 6.3 briefly discusses the | Section 6.2 elaborates on the remaining two steps (4-5). Finally, | |||
possibility to extend the model for supporting variable frame rate | Section 6.3 briefly discusses the possibility to extend the trace- | |||
and/or variable frame resolution. | driven model for supporting time-varying frame rate and/or time- | |||
varying frame resolution. | ||||
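
Step 4 of the list above relies on finding the stored bitrates closest to a requested R_v. The Python sketch below shows one way to do that lookup on a sorted list of keys. The function name is hypothetical, and returning the nearest end point for out-of-range requests is an assumption of this sketch, not something the draft specifies at this point.

```python
import bisect
from typing import Sequence, Tuple


def closest_bitrates(sorted_rates: Sequence[int], r_v: int) -> Tuple[int, int]:
    """Return the stored bitrates (lower, upper) that bracket the request r_v.

    sorted_rates must be non-empty and sorted in ascending order.
    """
    i = bisect.bisect_left(sorted_rates, r_v)
    if i == 0:
        return sorted_rates[0], sorted_rates[0]      # at or below the lowest stored rate
    if i == len(sorted_rates):
        return sorted_rates[-1], sorted_rates[-1]    # above the highest stored rate
    if sorted_rates[i] == r_v:
        return r_v, r_v                              # exact match
    return sorted_rates[i - 1], sorted_rates[i]
```

Keeping the keys sorted makes each lookup logarithmic in the number of stored traces, which is one way to satisfy the "easy bitrate lookup" requirement of step 3.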
6.1. Choosing the video sequence and generating the traces | 6.1. Choosing the video sequence and generating the traces | |||
The first step we need to perform is a careful choice of a set of | The first step is a careful choice of a set of video sequences that | |||
video sequences that are representative of the use cases we want to | are representative of the target use cases for the video traffic | |||
model. Our use case here is video conferencing, so we must choose a | model. For the example use case of interactive video conferencing, | |||
low-motion sequence that resembles a "talking head", for instance a | it is recommended to choose a low-motion sequence that resembles a | |||
news broadcast or a video capture of an actual conference call. | "talking head", e.g. from a news broadcast or recording of an actual | |||
video conferencing call. | ||||
The length of the chosen video sequence is a tradeoff. If it is too | The length of the chosen video sequence is a tradeoff. If it is too | |||
long, it will be difficult to manage the data structures containing | long, it will be difficult to manage the data structures containing | |||
the traces. If it is too short, there will be an obvious periodic | the traces. If it is too short, there will be an obvious periodic | |||
pattern in the output frame sizes, leading to biased results when | pattern in the output frame sizes, leading to biased results when | |||
evaluating congestion controller performance. In our experience, a | evaluating congestion control performance. In our experience, a | |||
sequence whose length is between 2 and 4 minutes is a fair tradeoff. | sequence with a length between 2 and 4 minutes is a fair tradeoff. | |||
Once we have chosen the raw video sequence, denoted S, we use a live | Given the chosen raw video sequence, denoted S, one can use a live | |||
encoder, e.g. [H264] or [HEVC] to produce a set of encoded | encoder, e.g. some implementation of [H264] or [HEVC], to produce a | |||
sequences. As discussed in Section 3, a live encoder's output | set of encoded sequences. As discussed in Section 3, the output | |||
bitrate can be tuned by varying three input parameters, namely, | bitrate of the live encoder can be achieved by tuning three input | |||
quantization step size, frame rate, and picture resolution. In order | parameters: quantization step size, frame rate, and picture | |||
to simplify the choice of these parameters for a given target rate, | resolution. In order to simplify the choice of these parameters for | |||
we assume a fixed frame rate (e.g. 30 fps) and a fixed resolution | a given target rate, one can typically assume a fixed frame rate | |||
(e.g., 720p). See section 6.3 for a discussion on how to relax these | (e.g. 30 fps) and a fixed resolution (e.g., 720p) when configuring | |||
assumptions. | the live encoder. See Section 6.3 for a discussion on how to relax | |||
these assumptions. | ||||
Following these simplifications, we run the chosen encoder by setting | Following these simplifications, the chosen encoder can be configured | |||
a constant target bitrate at the beginning, then letting the encoder | to start at a constant target bitrate, then vary the quantization | |||
vary the quantization step size internally while encoding the input | step size (internally via the video encoder rate controller) to meet | |||
video sequence. Besides, we assume that the first frame is encoded | various externally specified target rates. It can be further assumed | |||
as an I-frame and the rest are P-frames. We further assume that the | the first frame is encoded as an I-frame and the rest are P-frames. | |||
encoder algorithm does not use knowledge of frames in the future when | For live encoding, the encoder rate control algorithm typically does | |||
encoding a given frame. | not use knowledge of frames in the future when encoding a given | |||
frame. | ||||
Given R_min and R_max, which are the minimum and maximum bitrates at | Given the minimum and maximum bitrates at which the synthetic codec | |||
which the synthetic codec is to operate (see Section 4), we divide | is to operate (denoted as R_min and R_max, see Section 4), the entire | |||
the bitrate range between R_min and R_max in n_s + 1 bitrate steps of | range of target bitrates can be divided into n_s + 1 bitrate steps of | |||
length l = (R_max - R_min) / n_s. We then use the following simple | length l = (R_max - R_min) / n_s. The following simple algorithm is | |||
algorithm to encode the raw video sequence. | used to encode the raw video sequence. | |||
r = R_min | r = R_min | |||
while r <= R_max do | while r <= R_max do | |||
Traces[r] = encode_sequence(S, r, e) | Traces[r] = encode_sequence(S, r, e) | |||
r = r + l | r = r + l | |||
where function encode_sequence takes as parameters, respectively, a | The function encode_sequence takes as input parameters, respectively, | |||
raw video sequence, a constant target rate, and an encoder algorithm; | a raw video sequence (S), a constant target rate (r), and an encoder | |||
it returns a vector with the sizes of frames in the order they were | rate control algorithm (e); it returns a vector with the sizes of | |||
encoded. The output vector is stored in a map structure called | frames in the order they were encoded. The output vector is stored | |||
Traces, whose keys are bitrates and whose values are vectors of frame | in a map structure called Traces, whose keys are bitrates and whose | |||
sizes. | values are vectors of frame sizes. | |||
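The encoding loop above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names build_traces and toy_encoder are hypothetical, the encoder rate control algorithm e is assumed to be bundled into the encode_sequence callable, and toy_encoder is only a constant-size stand-in for a real encoder.

```python
from typing import Callable, Dict, List, Sequence


def build_traces(
    sequence: Sequence,
    r_min: float,
    r_max: float,
    n_s: int,
    encode_sequence: Callable[[Sequence, float], List[int]],
) -> Dict[float, List[int]]:
    """Encode `sequence` at n_s + 1 evenly spaced target rates."""
    if n_s < 1 or r_max <= r_min:
        raise ValueError("need n_s >= 1 and r_max > r_min")
    step = (r_max - r_min) / n_s  # step length l
    traces: Dict[float, List[int]] = {}
    # Index-based stepping avoids the floating-point drift that
    # repeated "r = r + l" could accumulate.
    for i in range(n_s + 1):
        r = r_min + i * step
        traces[r] = encode_sequence(sequence, r)
    return traces


def toy_encoder(sequence: Sequence, rate_kbps: float, fps: int = 30) -> List[int]:
    """Hypothetical stand-in for a real encoder (assumption):
    every frame gets the average size in bytes implied by the rate."""
    bytes_per_frame = int(rate_kbps * 1000 / 8 / fps)
    return [bytes_per_frame for _ in sequence]


# Example: 400..1000 kbps with n_s = 3, i.e. steps of l = 200 kbps.
traces = build_traces(range(5), 400, 1000, 3, toy_encoder)
```

With these inputs the map holds four frame size vectors, keyed by 400, 600, 800, and 1000 kbps.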
The choice of a value for n_s is important, as it determines the | The choice of a value for n_s is important, as it determines the | |||
number of vectors of frame sizes stored in map Traces. The minimum | number of vectors of frame sizes stored in the map Traces. The | |||
value one can choose for n_s is 1, and its maximum value depends on | minimum value one can choose for n_s is 1, and the maximum value | |||
the amount of memory available for holding the map Traces. A | depends on the amount of memory available for holding the map Traces. | |||
reasonable value for n_s is one that makes the steps' length l = 200 | A reasonable value for n_s is one that results in steps of length l = | |||
kbps. We will further discuss step length l in the next section. | 200 kbps. The next section will discuss further the choice of the | |||
step length l. | ||||
Finally, note that, as mentioned in previous sections, R_min and | Finally, note that, as mentioned in previous sections, R_min and | |||
R_max may be modified after the initial sequences are encoded. | R_max may be modified after the initial sequences are encoded. | |||
Hence, the algorithm described in the next section also covers the | Hence, the algorithm described in the next section also covers the | |||
cases when the current target bitrate is less than R_min, or greater | cases when the current target bitrate is less than R_min, or greater | |||
than R_max. | than R_max. | |||
6.2. Using the traces in the syntethic codec | 6.2. Using the traces in the synthetic codec | |||
The main idea behind the trace-driven synthetic codec is that it | The main idea behind the trace-driven synthetic codec is that it | |||
mimics a real live codec's rate adaptation when the congestion | mimics the rate adaptation behavior of a real live codec upon dynamic | |||
controller updates the target rate R_v dynamically. It does so by | updates of the target rate R_v by the congestion control module. It | |||
switching to a different frame size vector stored in the map Traces | does so by switching to a different frame size vector stored in the | |||
when needed. | map Traces when needed. | |||
6.2.1. Main algorithm | 6.2.1. Main algorithm | |||
We maintain two variables r_current and t_current: | The main algorithm for rate adaptation in the synthetic codec | |||
maintains two variables: r_current and t_current. | ||||
* r_current points to one of the keys of map Traces. Upon a change | o The variable r_current points to one of the keys of map Traces. | |||
in the value of R_v, typically because the congestion controller | Upon a change in the value of R_v, typically because the | |||
detects that the network conditions have changed, r_current is | congestion controller detects that the network conditions have | |||
updated to the greatest key in Traces that is less than or equal to | changed, r_current is updated to the greatest key in Traces that | |||
the new value of R_v. For the moment, we assume the value of R_v to | is less than or equal to the new value of R_v. It is assumed that | |||
be clipped in the range [R_min, R_max]. | the value of R_v is clipped within the range [R_min, R_max]. | |||
r_current = r | r_current = r | |||
such that | such that | |||
( r in keys(Traces) and | ( r in keys(Traces) and | |||
r <= R_v and | r <= R_v and | |||
(not(exists) r' in keys(Traces) such that r < r' <= R_v) ) | (not(exists) r' in keys(Traces) such that r < r' <= R_v) ) | |||
* t_current is an index to the frame size vector stored in | o The variable t_current is an index to the frame size vector stored | |||
Traces[r_current]. It is updated every time a new frame is due. We | in Traces[r_current]. It is updated every time a new frame is | |||
assume all vectors stored in Traces to have the same size, denoted | due. It is assumed that all vectors stored in Traces have the | |||
size_traces. The following equation governs the update of t_current: | same size, denoted as size_traces. The following equation governs | |||
the update of t_current: | ||||
if t_current < SkipFrames then | if t_current < SkipFrames then | |||
t_current = t_current + 1 | t_current = t_current + 1 | |||
else | else | |||
t_current = ((t_current+1-SkipFrames) % (size_traces- SkipFrames)) | t_current = ((t_current+1-SkipFrames) % (size_traces-SkipFrames)) | |||
+ SkipFrames | + SkipFrames | |||
where operator % denotes modulo, and SkipFrames is a predefined | where operator % denotes modulo, and SkipFrames is a predefined | |||
constant that denotes the number of frames to be skipped at the | constant that denotes the number of frames to be skipped at the | |||
beginning of frame size vectors after t_current has wrapped around. | beginning of frame size vectors after t_current has wrapped around. | |||
The point of constant SkipFrames is avoiding the effect of | The point of constant SkipFrames is avoiding the effect of | |||
periodically sending a (big) I-frame followed by several smaller- | periodically sending a large I-frame followed by several smaller- | |||
than-normal P-frames. We typically set SkipFrames to 20, although it | than-average P-frames. A typical value of SkipFrames is 20, although | |||
could be set to 0 if we are interested in studying the effect of | it could be set to 0 if one is interested in studying the effect of | |||
sending I-frames periodically. | sending I-frames periodically. | |||
We initialize r_current to R_min, and t_current to 0. | The initial value of r_current is set to R_min, and the initial value | |||
of t_current is set to 0. | ||||
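The two update rules for r_current and t_current can be sketched as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the keys of Traces are assumed to be available as a sorted list, R_v is assumed to be already clipped to [R_min, R_max], and size_traces is assumed to be greater than SkipFrames.

```python
import bisect
from typing import List


def select_r_current(keys_sorted: List[float], r_v: float) -> float:
    """Return the greatest key in Traces that is <= r_v."""
    idx = bisect.bisect_right(keys_sorted, r_v) - 1
    if idx < 0:
        raise ValueError("r_v is below the smallest key; clip it first")
    return keys_sorted[idx]


def advance_t_current(t_current: int, size_traces: int,
                      skip_frames: int = 20) -> int:
    """Advance the frame index; after the first pass, wrap around to
    skip_frames instead of 0 so the initial I-frame is not replayed."""
    if size_traces <= skip_frames:
        raise ValueError("size_traces must exceed skip_frames")
    if t_current < skip_frames:
        return t_current + 1
    return ((t_current + 1 - skip_frames)
            % (size_traces - skip_frames)) + skip_frames


keys = [400, 600, 800, 1000]
r_current = select_r_current(keys, 750)      # greatest key <= 750
t_next = advance_t_current(99, 100, 20)      # last index wraps around
```

Setting skip_frames to 0 makes the index wrap back to 0, which replays the initial I-frame periodically.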
When a new frame is due, we need to calculate its size. There are | When a new frame is due, its size can be calculated following one of | |||
three cases: | the three cases below: | |||
a) R_min <= R_v < R_max: In this case we use linear interpolation of | a) R_min <= R_v < R_max: the output frame size is calculated via | |||
the frame sizes appearing in Traces[r_current] and | linear interpolation of the frame sizes appearing in | |||
Traces[r_current + l]. The interpolation is done as follows: | Traces[r_current] and Traces[r_current + l]. The interpolation is | |||
done as follows: | ||||
size_lo = Traces[r_current][t_current] | size_lo = Traces[r_current][t_current] | |||
size_hi = Traces[r_current + l][t_current] | size_hi = Traces[r_current + l][t_current] | |||
distance_lo = ( R_v - r_current ) / l | distance_lo = ( R_v - r_current ) / l | |||
framesize = size_hi * distance_lo + size_lo * (1 - distance_lo) | framesize = size_hi * distance_lo + size_lo * (1 - distance_lo) | |||
b) R_v < R_min: In this case, we scale the trace sequence with the | b) R_v < R_min: the output frame size is calculated via scaling with | |||
lowest bitrate, in the following way: | respect to the lowest bitrate R_min, as follows: | |||
factor = R_v / R_min | factor = R_v / R_min | |||
framesize = max(1, factor * Traces[R_min][t_current]) | framesize = max(1, factor * Traces[R_min][t_current]) | |||
c) R_v >= R_max: We also use scaling for this case. We use the | c) R_v >= R_max: the output frame size is calculated by scaling with | |||
trace sequence with the greatest bitrate: | respect to the highest bitrate R_max: | |||
factor = R_v / R_max | factor = R_v / R_max | |||
framesize = factor * Traces[R_max][t_current] | framesize = factor * Traces[R_max][t_current] | |||
In case b), we set the minimum to 1 byte, since the value of factor | In case b), we set the minimum output size to 1 byte, since the value | |||
can be arbitrarily close to 0. | of factor can be arbitrarily close to 0. | |||
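The three cases can be combined into one function, sketched below. The function name and argument layout are assumptions made for illustration; traces is a map from bitrate to frame size vector as in Section 6.1, and step is the fixed step length l.

```python
from typing import Dict, List


def frame_size(traces: Dict[float, List[float]], r_v: float,
               r_current: float, t_current: int,
               r_min: float, r_max: float, step: float) -> float:
    """Size (in bytes) of the next output frame for target rate r_v."""
    if r_v < r_min:
        # Case b): scale the lowest-rate trace, never below 1 byte.
        factor = r_v / r_min
        return max(1.0, factor * traces[r_min][t_current])
    if r_v >= r_max:
        # Case c): scale the highest-rate trace.
        factor = r_v / r_max
        return factor * traces[r_max][t_current]
    # Case a): linear interpolation between the two neighboring traces.
    size_lo = traces[r_current][t_current]
    size_hi = traces[r_current + step][t_current]
    distance_lo = (r_v - r_current) / step
    return size_hi * distance_lo + size_lo * (1 - distance_lo)


# Toy map with a single frame per trace (illustrative values only).
toy = {400: [1000.0], 600: [2000.0]}
midway = frame_size(toy, 500, 400, 0, 400, 600, 200)
```

In this toy map, a target rate halfway between the two keys yields a frame size halfway between the two stored sizes.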
6.2.2. Notes to the main algorithm | 6.2.2. Notes to the main algorithm | |||
* Reacting to changes in target bitrate. Similarly to the | Note that the main algorithm as described above can be further extended | |||
statistical model presented in Section 5, the trace-driven synthetic | to mimic some additional typical behaviors of a live encoder. Two | |||
codec can have a time bound, tau_v, to reacting to target bitrate | examples are given below: | |||
changes. If the codec has reacted to an update in R_v at time t, it | ||||
will delay any further update to R_v to time t + tau_v. Note that, | ||||
in any case, the value of tau_v cannot be chosen shorter than the | ||||
time between frames, i.e. the inverse of the frame rate. | ||||
* I-frames on demand. The synthetic codec could be extended to | o I-frames on demand: The synthetic codec can be extended to | |||
simulate the sending of I-frames on demand, e.g., as a reaction to | simulate the sending of I-frames on demand, e.g., as a reaction to | |||
losses. To implement this extension, the codec's API is augmented | losses. To implement this extension, the codec's incoming | |||
with a new function to request a new I-frame. Upon calling such | interface (see (a) in Figure 1) is augmented with a new function | |||
function, t_current is reset to 0. | to request a new I-frame. Upon calling such function, t_current | |||
is reset to 0. | ||||
* Variable length l of steps defined between R_min and R_max. In the | o Variable step length l between R_min and R_max: In the main | |||
main algorithm's description, the step length l is fixed. However, | algorithm, the step length l is fixed for ease of explanation. | |||
if the range [R_min, R_max] is very wide, it is also possible to | However, if the range [R_min, R_max] is very wide, it is also | |||
define a set of steps with a non-constant length. The idea behind | possible to define a set of intermediate encoding rates with | |||
this modification is that the difference between 400 kbps and 600 | variable step length. The rationale behind this modification is | |||
kbps as bitrate is much more important than the difference between | that the difference between 400 kbps and 600 kbps as target | |||
4400 kbps and 4600 kbps. For example, one could define steps of | bitrate is much more significant than the difference between 4400 | |||
length 200 kbps under 1 Mbps, then steps of length 300 kbps between 1 | kbps and 4600 kbps. For example, one could define steps of length | |||
Mbps and 2 Mbps; 400 kbps between 2 Mbps and 3 Mbps, and so on. | 200 kbps under 1 Mbps, then steps of length 300 kbps between 1 | |||
Mbps and 2 Mbps; 400 kbps between 2 Mbps and 3 Mbps, and so on. | ||||
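One possible way to generate a set of encoding rates with variable step length is sketched below. The function names and the representation of the bands as (upper bound, step) pairs are assumptions made for illustration. Note that with variable steps the upper neighbor used for interpolation is the next key in the sorted list, not r_current + l.

```python
import bisect
from typing import List, Tuple


def variable_step_rates(r_min: float, r_max: float,
                        bands: List[Tuple[float, float]]) -> List[float]:
    """bands: (upper_bound, step) pairs sorted by upper_bound; rates
    at or above the last bound reuse the last step."""
    rates = []
    r = r_min
    while r < r_max:
        rates.append(r)
        # Pick the step of the first band whose upper bound exceeds r.
        step = next((s for ub, s in bands if r < ub), bands[-1][1])
        r += step
    rates.append(r_max)  # always include R_max as the final key
    return rates


def bracketing_keys(rates: List[float], r_v: float) -> Tuple[float, float]:
    """Return the keys just below (or at) and just above r_v."""
    if not rates[0] <= r_v <= rates[-1]:
        raise ValueError("r_v must lie within [R_min, R_max]")
    i = bisect.bisect_right(rates, r_v) - 1
    return rates[i], rates[min(i + 1, len(rates) - 1)]


# Steps of 200 kbps under 1 Mbps, 300 kbps up to 2 Mbps, then 400 kbps.
rates = variable_step_rates(400, 3000,
                            [(1000, 200), (2000, 300), (3000, 400)])
lo, hi = bracketing_keys(rates, 1450)
```

The interpolation weight then becomes (R_v - lo) / (hi - lo) rather than (R_v - r_current) / l.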
6.3. Varying frame rate and resolution | 6.3. Varying frame rate and resolution | |||
The trace-driven synthetic codec model explained in this section is | The trace-driven synthetic codec model explained in this section is | |||
relatively simple because we have fixed the frame rate and the frame | relatively simple due to fixed frame rate and frame resolution. The | |||
resolution. The model could be extended to have variable frame rate, | model can be extended further to accommodate variable frame rate and/or | |||
variable spatial resolution, or both. | variable spatial resolution. | |||
When the encoded picture quality at a given bitrate is low, one can | When the encoded picture quality at a given bitrate is low, one can | |||
potentially decrease the frame rate (if the video sequence is | potentially decrease either the frame rate (if the video sequence is | |||
currently in low motion) or the spatial resolution in order to | currently in low motion) or the spatial resolution in order to | |||
improve quality-of-experience (QoE) in the overall encoded video. On | improve quality-of-experience (QoE) in the overall encoded video. On | |||
the other hand, if target bitrate increases to a point where there is | the other hand, if target bitrate increases to a point where there is | |||
no longer a perceptible improvement in the picture quality of | no longer a perceptible improvement in the picture quality of | |||
individual frames, then one might afford to increase the spatial | individual frames, then one might afford to increase the spatial | |||
resolution or the frame rate (useful if the video is currently in | resolution or the frame rate (useful if the video is currently in | |||
high motion). | high motion). | |||
Many techniques have been proposed to choose over time the best | Many techniques have been proposed to choose over time the best | |||
combination of encoder quantization step size, frame rate, and spatial | combination of encoder quantization step size, frame rate, and spatial | |||
skipping to change at page 14, line 12 ¶ | skipping to change at page 14, line 22 ¶ | |||
[Ozer2011][Hu2010]. Future work may consider extending the trace- | [Ozer2011][Hu2010]. Future work may consider extending the trace- | |||
driven codec to accommodate variable frame rate and/or resolution. | driven codec to accommodate variable frame rate and/or resolution. | |||
From the perspective of congestion control, varying the spatial | From the perspective of congestion control, varying the spatial | |||
resolution typically requires a new intra-coded frame to be | resolution typically requires a new intra-coded frame to be | |||
generated, thereby incurring a temporary burst in the output traffic | generated, thereby incurring a temporary burst in the output traffic | |||
pattern. The impact of frame rate change tends to be more subtle: | pattern. The impact of frame rate change tends to be more subtle: | |||
reducing frame rate from high to low leads to sparsely spaced larger | reducing frame rate from high to low leads to sparsely spaced larger | |||
encoded packets instead of many densely spaced smaller packets. Such | encoded packets instead of many densely spaced smaller packets. Such | |||
difference in traffic profiles may still affect the performance of | difference in traffic profiles may still affect the performance of | |||
congestion control, especially when outgoing packets are not paced at | congestion control, especially when outgoing packets are not paced by | |||
the transport module. We leave the investigation of varying frame | the media transport module. Investigation of varying frame rate and | |||
rate to future work. | resolution is left for future work. | |||
7. Combining The Two Models | 7. Combining The Two Models | |||
It is worthwhile noting that the statistical and trace-driven models | It is worthwhile noting that the statistical and trace-driven models | |||
each have their own advantages and drawbacks. While both models are | each have their own advantages and drawbacks. Both models are fairly | |||
fairly simple to implement, it takes significantly greater effort to | simple to implement. It takes significantly greater effort to fit | |||
fit the parameters of a statistical model to actual encoder output | the parameters of a statistical model to actual encoder output data | |||
data whereas it is straightforward for a trace-driven model to obtain | whereas it is straightforward for a trace-driven model to obtain | |||
encoded frame size data. On the other hand, once validated, the | encoded frame size data. On the other hand, once validated, the | |||
statistical model is more flexible in mimicking a wide range of | statistical model is more flexible in mimicking a wide range of | |||
encoder/content behaviors by simply varying the corresponding | encoder/content behaviors by simply varying the corresponding | |||
parameters in the model. In this regard, a trace-driven model relies | parameters in the model. In this regard, a trace-driven model relies | |||
-- by definition -- on additional data collection efforts for | -- by definition -- on additional data collection efforts for | |||
accommodating new codecs or video contents. | accommodating new codecs or video contents. | |||
In general, the trace-driven model is more realistic for mimicking | In general, the trace-driven model is more realistic for mimicking | |||
ongoing, steady-state behavior of a video traffic source whereas the | ongoing, steady-state behavior of a video traffic source whereas the | |||
statistical model is more versatile for simulating transient events | statistical model is more versatile for simulating transient events | |||
skipping to change at page 15, line 25 ¶ | skipping to change at page 15, line 25 ¶ | |||
+------>| frame from | | +------>| frame from | | |||
steady-state | trace | | steady-state | trace | | |||
+---------------+ | +---------------+ | |||
Figure 3: Hybrid approach for modeling video traffic | Figure 3: Hybrid approach for modeling video traffic | |||
As shown in Figure 3, the video traffic model operates in transient | As shown in Figure 3, the video traffic model operates in transient | |||
state if the requested target rate R_v is substantially higher than | state if the requested target rate R_v is substantially higher than | |||
the previous target, or else it operates in steady state. During | the previous target, or else it operates in steady state. During | |||
transient state, a total of K_d frames are generated by the | transient state, a total of K_d frames are generated by the | |||
statistical model, resulting in 1 big burst frame with size K_B | statistical model, resulting in one (1) big burst frame with size K_B | |||
followed by K_d-1 smaller frames. When operating at steady-state, | followed by K_d-1 smaller frames. When operating at steady-state, | |||
the video traffic model simply generates a frame according to the | the video traffic model simply generates a frame according to the | |||
trace-driven model given the target rate, while modulating the frame | trace-driven model given the target rate, while modulating the frame | |||
interval according to the distribution specified by the statistical | interval according to the distribution specified by the statistical | |||
model. One example criterion for determining whether the traffic | model. One example criterion for determining whether the traffic | |||
model should operate in transient state is whether the rate increase | model should operate in transient state is whether the rate increase | |||
exceeds 10% of previous target rate. | exceeds 10% of previous target rate. Finally, as this model follows | |||
transient state behavior dictated by the statistical model, upon a | ||||
substantial rate change, the model will follow the time-damping | ||||
mechanism defined in Section 5.1, which is governed by parameter | ||||
tau_v. | ||||
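The switching logic of the hybrid approach can be sketched as a small state holder. This is only an illustration under stated assumptions: the class and method names are hypothetical, the 10% threshold is the example criterion mentioned above, and the sketch only decides which model produces the next frame. It does not model the burst frame size K_B, the frame interval distribution, or the time-damping governed by tau_v.

```python
class HybridSource:
    """Decides, per frame, whether the statistical model (transient
    state) or the trace-driven model (steady state) is used."""

    def __init__(self, k_d: int, threshold: float = 0.10):
        self.k_d = k_d                # frames generated per transient
        self.threshold = threshold    # relative rate increase trigger
        self.transient_left = 0       # transient frames still to emit

    def on_rate_update(self, r_prev: float, r_new: float) -> None:
        # Enter transient state only on a substantial rate increase.
        if r_new - r_prev > self.threshold * r_prev:
            self.transient_left = self.k_d

    def next_frame_source(self) -> str:
        if self.transient_left > 0:
            self.transient_left -= 1
            return "statistical"
        return "trace"


src = HybridSource(k_d=3)
src.on_rate_update(1000, 1200)   # +20%: enters transient state
sources = [src.next_frame_source() for _ in range(4)]
```

After the K_d transient frames have been emitted, the source falls back to the trace-driven model until the next substantial rate increase.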
8. Implementation Status | 8. Implementation Status | |||
The statistical model has been implemented as a traffic generator | The statistical model has been implemented as a traffic generator | |||
module within the [ns-2] network simulation platform. | module within the [ns-2] network simulation platform. | |||
More recently, both the statistical and trace-driven models have been | More recently, the statistical, trace-driven, and hybrid models have | |||
implemented as a stand-alone traffic source module. This can be | been implemented as a stand-alone, platform-independent traffic | |||
easily integrated into network simulation platforms such as [ns-2] | source module. This can be easily integrated into network simulation | |||
and [ns-3], as well as testbeds using a real network. The stand- | platforms such as [ns-2] and [ns-3], as well as testbeds using a real | |||
alone traffic source module is available as an open source | network. The stand-alone traffic source module is available as an | |||
implementation at [Syncodecs]. | open source implementation at [Syncodecs]. | |||
9. IANA Considerations | 9. IANA Considerations | |||
There are no IANA impacts in this memo. | There are no IANA impacts in this memo. | |||
10. References | 10. References | |||
10.1. Normative References | 10.1. Normative References | |||
[H264] ITU-T Recommendation H.264, "Advanced video coding for | ||||
generic audiovisual services", 2003, | ||||
<http://www.itu.int/rec/T-REC-H.264-201304-I>. | ||||
[HEVC] ITU-T Recommendation H.265, "High efficiency video | ||||
coding", 2015. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<http://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
10.2. Informative References | 10.2. Informative References | |||
[H264] ITU-T Recommendation H.264, "Advanced video coding for | ||||
generic audiovisual services", May 2003, | ||||
<https://www.itu.int/rec/T-REC-H.264>. | ||||
[HEVC] ITU-T Recommendation H.265, "High efficiency video | ||||
coding", April 2013, | ||||
<https://www.itu.int/rec/T-REC-H.265>. | ||||
[Hu2010] Hu, H., Ma, Z., and Y. Wang, "Optimization of Spatial, | [Hu2010] Hu, H., Ma, Z., and Y. Wang, "Optimization of Spatial, | |||
Temporal and Amplitude Resolution for Rate-Constrained | Temporal and Amplitude Resolution for Rate-Constrained | |||
Video Coding and Scalable Video Adaptation", in Proc. 19th | Video Coding and Scalable Video Adaptation", in Proc. 19th | |||
IEEE International Conference on Image | IEEE International Conference on Image | |||
Processing, (ICIP'12), September 2012. | Processing, (ICIP'12), September 2012. | |||
[IETF-Interim] | [IETF-Interim] | |||
Zhu, X., Mena, S., and Z. Sarker, "Update on RMCAT Video | Zhu, X., Mena, S., and Z. Sarker, "Update on RMCAT Video | |||
Traffic Model: Trace Analysis and Model Update", April | Traffic Model: Trace Analysis and Model Update", April | |||
2017, <https://www.ietf.org/proceedings/interim-2017- | 2017, <https://www.ietf.org/proceedings/ | |||
rmcat-01/slides/slides-interim-2017-rmcat-01-sessa-update- | interim-2017-rmcat-01/slides/slides-interim-2017-rmcat-01- | |||
on-video-traffic-model-draft-00.pdf>. | sessa-update-on-video-traffic-model-draft-00.pdf>. | |||
[ns-2] "The Network Simulator - ns-2", | [ns-2] "The Network Simulator - ns-2", | |||
<http://www.isi.edu/nsnam/ns/>. | <http://www.isi.edu/nsnam/ns/>. | |||
[ns-3] "The Network Simulator - ns-3", <https://www.nsnam.org/>. | [ns-3] "The Network Simulator - ns-3", <https://www.nsnam.org/>. | |||
[Ozer2011] | [Ozer2011] | |||
Ozer, J., "Video Compression for Flash, Apple Devices and | Ozer, J., "Video Compression for Flash, Apple Devices and | |||
HTML5", ISBN 13:978-0976259503, 2011. | HTML5", ISBN 13:978-0976259503, 2011. | |||
End of changes. 68 change blocks. | ||||
204 lines changed or deleted | 220 lines changed or added | |||