draft-ietf-rmcat-video-traffic-model-02.txt | draft-ietf-rmcat-video-traffic-model-03.txt | |||
---|---|---|---|---|
Network Working Group X. Zhu | Network Working Group X. Zhu | |||
Internet-Draft S. Mena | Internet-Draft S. Mena | |||
Intended status: Informational Cisco Systems | Intended status: Informational Cisco Systems | |||
Expires: July 12, 2017 Z. Sarker | Expires: January 18, 2018 Z. Sarker | |||
Ericsson AB | Ericsson AB | |||
January 8, 2017 | July 17, 2017 | |||
Modeling Video Traffic Sources for RMCAT Evaluations | Modeling Video Traffic Sources for RMCAT Evaluations | |||
draft-ietf-rmcat-video-traffic-model-02 | draft-ietf-rmcat-video-traffic-model-03 | |||
Abstract | Abstract | |||
This document describes two reference video traffic source models for | This document describes two reference video traffic source models for | |||
evaluating RMCAT candidate algorithms. The first model statistically | evaluating RMCAT candidate algorithms. The first model statistically | |||
characterizes the behavior of a live video encoder in response to | characterizes the behavior of a live video encoder in response to | |||
changing requests on target video rate. The second model is trace- | changing requests on target video rate. The second model is trace- | |||
driven, and emulates the encoder output by scaling the pre-encoded | driven, and emulates the encoder output by scaling the pre-encoded | |||
video frame sizes from a widely used video test sequence. Both | video frame sizes from a widely used video test sequence. Both | |||
models are designed to strike a balance between simplicity, | models are designed to strike a balance between simplicity, | |||
skipping to change at page 1, line 40 ¶ | skipping to change at page 1, line 40 ¶ | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on July 12, 2017. | This Internet-Draft will expire on January 18, 2018. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2017 IETF Trust and the persons identified as the | Copyright (c) 2017 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 24 ¶ | skipping to change at page 2, line 24 ¶ | |||
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
3. Desired Behavior of A Synthetic Video Traffic Model . . . . . 3 | 3. Desired Behavior of A Synthetic Video Traffic Model . . . . . 3 | |||
4. Interactions Between Synthetic Video Traffic Source and | 4. Interactions Between Synthetic Video Traffic Source and | |||
Other Components at the Sender . . . . . . . . . . . . . . . 4 | Other Components at the Sender . . . . . . . . . . . . . . . 4 | |||
5. A Statistical Reference Model . . . . . . . . . . . . . . . . 6 | 5. A Statistical Reference Model . . . . . . . . . . . . . . . . 6 | |||
5.1. Time-damped response to target rate update . . . . . . . 7 | 5.1. Time-damped response to target rate update . . . . . . . 7 | |||
5.2. Temporary burst and oscillation during transient . . . . 8 | 5.2. Temporary burst and oscillation during transient . . . . 8 | |||
5.3. Output rate fluctuation at steady state . . . . . . . . . 8 | 5.3. Output rate fluctuation at steady state . . . . . . . . . 8 | |||
5.4. Rate range limit imposed by video content . . . . . . . . 9 | 5.4. Rate range limit imposed by video content . . . . . . . . 9 | |||
6. A Trace-Driven Model . . . . . . . . . . . . . . . . . . . . 9 | 6. A Trace-Driven Model . . . . . . . . . . . . . . . . . . . . 9 | |||
6.1. Choosing the video sequence and generating the traces . . 9 | 6.1. Choosing the video sequence and generating the traces . . 10 | |||
6.2. Using the traces in the synthetic codec . . . . . . . . . 11 | 6.2. Using the traces in the synthetic codec . . . . . . . . . 11 | |||
6.2.1. Main algorithm . . . . . . . . . . . . . . . . . . . 11 | 6.2.1. Main algorithm . . . . . . . . . . . . . . . . . . . 11 | |||
6.2.2. Notes to the main algorithm . . . . . . . . . . . . . 12 | 6.2.2. Notes to the main algorithm . . . . . . . . . . . . . 13 | |||
6.3. Varying frame rate and resolution . . . . . . . . . . . . 13 | 6.3. Varying frame rate and resolution . . . . . . . . . . . . 13 | |||
7. Combining The Two Models . . . . . . . . . . . . . . . . . . 14 | 7. Combining The Two Models . . . . . . . . . . . . . . . . . . 14 | |||
8. Implementation Status . . . . . . . . . . . . . . . . . . . . 15 | 8. Implementation Status . . . . . . . . . . . . . . . . . . . . 15 | |||
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 | |||
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 15 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 | |||
10.1. Normative References . . . . . . . . . . . . . . . . . . 15 | 10.1. Normative References . . . . . . . . . . . . . . . . . . 16 | |||
10.2. Informative References . . . . . . . . . . . . . . . . . 15 | 10.2. Informative References . . . . . . . . . . . . . . . . . 16 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 16 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
1. Introduction | 1. Introduction | |||
When evaluating candidate congestion control algorithms designed for | When evaluating candidate congestion control algorithms designed for | |||
real-time interactive media, it is important to account for the | real-time interactive media, it is important to account for the | |||
characteristics of traffic patterns generated from a live video | characteristics of traffic patterns generated from a live video | |||
encoder. Unlike synthetic traffic sources that can conform perfectly | encoder. Unlike synthetic traffic sources that can conform perfectly | |||
to the rate changing requests from the congestion control module, a | to the rate changing requests from the congestion control module, a | |||
live video encoder can be sluggish in reacting to such changes. | live video encoder can be sluggish in reacting to such changes. | |||
Output rate of a live video encoder also typically deviates from the | Output rate of a live video encoder also typically deviates from the | |||
skipping to change at page 3, line 17 ¶ | skipping to change at page 3, line 17 ¶ | |||
and somewhat decouple from peculiarities of any specific video codec. | and somewhat decouple from peculiarities of any specific video codec. | |||
It is also desirable that evaluation tests are repeatable, and can be | It is also desirable that evaluation tests are repeatable, and can be | |||
easily duplicated across different candidate algorithms. | easily duplicated across different candidate algorithms. | |||
One way to strike a balance between the above considerations is to | One way to strike a balance between the above considerations is to | |||
evaluate RMCAT algorithms using a synthetic video traffic source | evaluate RMCAT algorithms using a synthetic video traffic source | |||
model that captures key characteristics of the behavior of a live | model that captures key characteristics of the behavior of a live | |||
video encoder. To this end, this draft presents two reference | video encoder. To this end, this draft presents two reference | |||
models. The first is based on statistical modelling; the second is | models. The first is based on statistical modelling; the second is | |||
trace-driven. The draft also discusses the pros and cons of each | trace-driven. The draft also discusses the pros and cons of each | |||
approach, as well as the how both approaches can be combined. | approach, as well as how both approaches can be combined. | |||
2. Terminology | 2. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in RFC 2119 [RFC2119]. | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||
3. Desired Behavior of A Synthetic Video Traffic Model | 3. Desired Behavior of A Synthetic Video Traffic Model | |||
A live video encoder employs encoder rate control to meet a target | A live video encoder employs encoder rate control to meet a target | |||
skipping to change at page 4, line 5 ¶ | skipping to change at page 4, line 5 ¶ | |||
Hence, a synthetic video source should have the following | Hence, a synthetic video source should have the following | |||
capabilities: | capabilities: | |||
o To change bitrate. This includes the ability to change frame rate and/ | o To change bitrate. This includes the ability to change frame rate and/ | |||
or spatial resolution, or to skip frames when required. | or spatial resolution, or to skip frames when required. | |||
o To fluctuate around the target bitrate specified by the congestion | o To fluctuate around the target bitrate specified by the congestion | |||
control module. | control module. | |||
o To delay in convergence to the target bitrate. | o To show a delay in convergence to the target bitrate. | |||
o To generate intra-coded or repair frames on demand. | o To generate intra-coded or repair frames on demand. | |||
While there exist many different approaches to developing a synthetic | While there exist many different approaches to developing a synthetic | |||
video traffic model, it is desirable that the outcome follows a few | video traffic model, it is desirable that the outcome follows a few | |||
common characteristics, as outlined below. | common characteristics, as outlined below. | |||
o Low computational complexity: The model should be computationally | o Low computational complexity: The model should be computationally | |||
lightweight; otherwise it defeats the whole purpose of serving as | lightweight; otherwise it defeats the whole purpose of serving as | |||
a substitute for a live video encoder. | a substitute for a live video encoder. | |||
skipping to change at page 5, line 14 ¶ | skipping to change at page 5, line 14 ¶ | |||
encoding bitrate, and sometimes the spatial resolution and frame | encoding bitrate, and sometimes the spatial resolution and frame | |||
rate. | rate. | |||
In our model, the synthetic video encoder module has a group of | In our model, the synthetic video encoder module has a group of | |||
incoming and outgoing interface calls that allow for interaction with | incoming and outgoing interface calls that allow for interaction with | |||
other modules. The following are some of the possible incoming | other modules. The following are some of the possible incoming | |||
interface calls --- marked as (a) in Figure 1 --- that the synthetic | interface calls --- marked as (a) in Figure 1 --- that the synthetic | |||
video encoder may accept. The list is not exhaustive and can be | video encoder may accept. The list is not exhaustive and can be | |||
complemented by other interface calls if deemed necessary. | complemented by other interface calls if deemed necessary. | |||
o Target rate R_v(t): requested at time t, typically from the | o Target rate R_v: target rate request to the encoder, typically | |||
congestion control module. Depending on the congestion control | from the congestion control module and updated dynamically over | |||
algorithm in use, the update requests can either be periodic | time. Depending on the congestion control algorithm in use, the | |||
(e.g., once per second), or on-demand (e.g., only when a drastic | update requests can either be periodic (e.g., once per second), or | |||
bandwidth change over the network is observed). | on-demand (e.g., only when a drastic bandwidth change over the | |||
network is observed). | ||||
o Target frame rate FPS(t): the instantaneous frame rate measured in | o Target frame rate FPS: the instantaneous frame rate measured in | |||
frames-per-second at time t. This depends on the native camera | frames-per-second at a given time. This depends on the native | |||
capture frame rate as well as the target/preferred frame rate | camera capture frame rate as well as the target/preferred frame | |||
configured by the application or user. | rate configured by the application or user. | |||
o Frame resolution XY(t): the 2-dimensional vector indicating the | o Frame resolution XY: the 2-dimensional vector indicating the | |||
preferred frame resolution in pixels at time t. Several factors | preferred frame resolution in pixels. Several factors govern the | |||
govern the resolution requested to the synthetic video encoder | resolution requested to the synthetic video encoder over time. | |||
over time. Examples of such factors are the capturing resolution | Examples of such factors are the capturing resolution of the | |||
of the native camera; or the current target rate R_v(t), since | native camera; or the current target rate R_v, since very small | |||
very small resolutions do not make sense with very high bitrates, | resolutions do not make sense with very high bitrates, and vice- | |||
and vice-versa. | versa. | |||
o Instant frame skipping: the request to skip the encoding of one or | o Instant frame skipping: the request to skip the encoding of one or | |||
several captured video frames, for instance when a drastic | several captured video frames, for instance when a drastic | |||
decrease in available network bandwidth is detected. | decrease in available network bandwidth is detected. | |||
o On-demand generation of intra (I) frame: the request to encode | o On-demand generation of intra (I) frame: the request to encode | |||
another I frame to avoid further error propagation at the | another I frame to avoid further error propagation at the | |||
receiver, if severe packet losses are observed. This request | receiver, if severe packet losses are observed. This request | |||
typically comes from the error control module. | typically comes from the error control module. | |||
skipping to change at page 7, line 5 ¶ | skipping to change at page 7, line 5 ¶ | |||
modules at the sender | modules at the sender | |||
5. A Statistical Reference Model | 5. A Statistical Reference Model | |||
In this section, we describe one simple statistical model of the live | In this section, we describe one simple statistical model of the live | |||
video encoder traffic source. Figure 2 summarizes the list of | video encoder traffic source. Figure 2 summarizes the list of | |||
tunable parameters in this statistical model. A more comprehensive | tunable parameters in this statistical model. A more comprehensive | |||
survey of popular methods for modelling video traffic source behavior | survey of popular methods for modelling video traffic source behavior | |||
can be found in [Tanwir2013]. | can be found in [Tanwir2013]. | |||
+==============+===================================+================+ | +==============+====================================+================+ | |||
| Notation | Parameter Name | Example Value | | | Notation | Parameter Name | Example Value | | |||
+==============+===================================+================+ | +==============+====================================+================+ | |||
| R_v(t) | Target rate request at time t | 1 Mbps | | | R_v | Target rate request to encoder | 1 Mbps | | |||
+--------------+-----------------------------------+----------------+ | +--------------+------------------------------------+----------------+ | |||
| R_o(t) | Output rate at time t | 1.2 Mbps | | | FPS | Target frame rate of encoder output| 30 Hz | | |||
+--------------+-----------------------------------+----------------+ | +--------------+------------------------------------+----------------+ | |||
| tau_v | Encoder reaction latency | 0.2 s | | | tau_v | Encoder reaction latency | 0.2 s | | |||
+--------------+-----------------------------------+----------------+ | +--------------+------------------------------------+----------------+ | |||
| K_d | Burst duration during transient | 8 frames | | | K_d | Burst duration during transient | 8 frames | | |||
+--------------+-----------------------------------+----------------+ | +--------------+------------------------------------+----------------+ | |||
| K_B | Burst frame size during transient | 13.5 KBytes* | | | K_B | Burst frame size during transient | 13.5 KBytes* | | |||
+--------------+-----------------------------------+----------------+ | +--------------+------------------------------------+----------------+ | |||
| R_e(t) | Error in output rate at time t | 0.2 Mbps | | | t0 | Reference frame interval 1/FPS | 33 ms | | |||
+--------------+-----------------------------------+----------------+ | +--------------+------------------------------------+----------------+ | |||
| SIGMA_t | standard deviation of normalized | | | | B0 | Reference frame size R_v/8/FPS | 4.17 KBytes | | |||
| | frame interval (t/t0) | 0.25 | | +--------------+------------------------------------+----------------+ | |||
+--------------+-----------------------------------+----------------+ | | | Scaling parameter of the zero-mean | | | |||
| SIGMA_B | standard deviation of normalized | 0.1 | | | | Laplacian distribution describing | | | |||
| | frame size (B/B0) | | | | SCALE_t | deviations in normalized frame | 0.15 | | |||
+--------------+-----------------------------------+----------------+ | | | interval (t-t0)/t0 | | | |||
| R_min | minimum rate supported by video | 150 Kbps | | +--------------+------------------------------------+----------------+ | |||
| | encoder or content activity | | | | | Scaling parameter of the zero-mean | | | |||
+--------------+-----------------------------------+----------------+ | | | Laplacian distribution describing | | | |||
| R_max | maximum rate supported by video | 1.5 Mbps | | | SCALE_B | deviations in normalized frame | 0.15 | | |||
| | encoder or content activity | | | | | size (B-B0)/B0 | | | |||
+==============+===================================+================+ | +--------------+------------------------------------+----------------+ | |||
| R_min | minimum rate supported by video | 150 Kbps | | ||||
| | encoder or content activity | | | ||||
+--------------+------------------------------------+----------------+ | ||||
| R_max | maximum rate supported by video | 1.5 Mbps | | ||||
| | encoder or content activity | | | ||||
+==============+====================================+================+ | ||||
* Example value of K_B for a video stream encoded at 720p and 30 frames | * Example value of K_B for a video stream encoded at 720p and 30 frames | |||
per second, using H.264/AVC encoder. | per second, using H.264/AVC encoder. | |||
Figure 2: List of tunable parameters in a statistical video traffic | Figure 2: List of tunable parameters in a statistical video traffic | |||
source model. | source model. | |||
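As a quick check on the derived entries in Figure 2, the reference frame interval t0 = 1/FPS and the reference frame size B0 = R_v/8/FPS can be computed directly from the example target rate and frame rate. The sketch below assumes 1 KByte = 1000 bytes, which is the convention that reproduces the 4.17 KBytes example value; the variable names simply mirror the notation in the table.

```python
# Derived reference values from the example parameters in Figure 2.
R_v = 1_000_000       # target rate request, bits per second (1 Mbps)
FPS = 30              # target frame rate, Hz

t0 = 1.0 / FPS        # reference frame interval, seconds
B0 = R_v / 8 / FPS    # reference frame size, bytes

print(f"t0 = {t0 * 1000:.0f} ms")       # interval in milliseconds
print(f"B0 = {B0 / 1000:.2f} KBytes")   # assumes 1 KByte = 1000 bytes
```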
5.1. Time-damped response to target rate update | 5.1. Time-damped response to target rate update | |||
While the congestion control module can update its target rate | While the congestion control module can update its target rate | |||
request R_v(t) at any time, our model dictates that the encoder will | request R_v at any time, our model dictates that the encoder will | |||
only react to such changes after tau_v seconds from a previous rate | only react to such changes after tau_v seconds from a previous rate | |||
transition. In other words, when the encoder has reacted to a rate | transition. In other words, when the encoder has reacted to a rate | |||
change request at time t, it will simply ignore all subsequent rate | change request at time t, it will simply ignore all subsequent rate | |||
change requests until time t+tau_v. | change requests until time t+tau_v. | |||
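The time-damped behavior described above could be sketched as follows. This is a minimal illustration, not a normative implementation: the class name DampedRateTracker and its method names are hypothetical, and the sketch assumes that every accepted request counts as a rate transition and that ignored requests are dropped rather than queued.

```python
class DampedRateTracker:
    """Hypothetical helper: reacts to rate requests at most once per tau_v seconds."""

    def __init__(self, initial_rate_bps, tau_v=0.2):
        self.rate_bps = initial_rate_bps
        self.tau_v = tau_v                        # encoder reaction latency, seconds
        self.last_transition = float("-inf")      # time of the previous rate transition

    def request(self, now, new_rate_bps):
        """Return True if the request is applied, False if it is ignored."""
        if now - self.last_transition < self.tau_v:
            return False                          # still within [t, t + tau_v): ignore
        self.rate_bps = new_rate_bps
        self.last_transition = now
        return True


tracker = DampedRateTracker(1_000_000, tau_v=0.2)
tracker.request(0.0, 800_000)    # applied
tracker.request(0.1, 500_000)    # ignored, only 0.1 s since the last transition
tracker.request(0.25, 500_000)   # applied
```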
5.2. Temporary burst and oscillation during transient | 5.2. Temporary burst and oscillation during transient | |||
The output rate R_o during the period [t, t+tau_v] is considered to | The output rate R_o during the period [t, t+tau_v] is considered to | |||
be in transient. Based on observations from video encoder output | be in transient. Based on observations from video encoder output | |||
data, we model the transient behavior of an encoder upon reacting to | data, we model the transient behavior of an encoder upon reacting to | |||
a new target rate request in the form of high variation in output | a new target rate request in the form of high variation in output | |||
frame sizes. It is assumed that the overall average output rate R_o | frame sizes. It is assumed that the overall average output rate R_o | |||
during this period matches the target rate R_v. Consequently, the | during this period matches the target rate R_v. Consequently, the | |||
occasional bursts of large frames are followed by smaller-than-average | occasional bursts of large frames are followed by smaller-than-average | |||
encoded frames. | encoded frames. | |||
This temporary burst is characterized by two parameters: | This temporary burst is characterized by two parameters: | |||
o burst duration K_d: number frames in the burst event; and | o burst duration K_d: number of frames in the burst event; and | |||
o burst frame size K_B: size of the initial burst frame which is | o burst frame size K_B: size of the initial burst frame which is | |||
typically significantly larger than average frame size at steady | typically significantly larger than average frame size at steady | |||
state. | state. | |||
It can be noted that these burst parameters can also be used to mimic | It can be noted that these burst parameters can also be used to mimic | |||
the insertion of a large on-demand I frame in the presence of severe | the insertion of a large on-demand I frame in the presence of severe | |||
packet losses. The values of K_d and K_B typically depend on the | packet losses. The values of K_d and K_B typically depend on the | |||
type of video codec, spatial and temporal resolution of the encoded | type of video codec, spatial and temporal resolution of the encoded | |||
stream, as well as the video content activity level. | stream, as well as the video content activity level. | |||
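One possible way to realize such a burst is sketched below. The text only states that the average rate over the transient matches R_v and that the large frame is followed by smaller-than-average frames; the equal split of the remaining byte budget across the other K_d - 1 frames is an assumption made for this illustration, and the function name burst_frame_sizes is hypothetical.

```python
def burst_frame_sizes(B0, K_B=13500.0, K_d=8):
    """Return K_d frame sizes in bytes: one burst frame, then smaller frames.

    Assumption (not specified in the text): the frames after the initial
    burst frame share the leftover byte budget equally, so that the mean
    frame size over the burst equals the reference size B0.
    """
    if K_d < 2:
        return [K_B]
    # Leftover budget after the burst frame; never negative.
    leftover = max(K_d * B0 - K_B, 0.0)
    return [K_B] + [leftover / (K_d - 1)] * (K_d - 1)


# Example with the Figure 2 values: R_v = 1 Mbps, FPS = 30.
sizes = burst_frame_sizes(1_000_000 / 8 / 30)
```

If K_B exceeds K_d * B0, the budget cannot be met and this sketch emits zero-sized follow-up frames; a real model might instead skip frames or extend the burst.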
5.3. Output rate fluctuation at steady state | 5.3. Output rate fluctuation at steady state | |||
We model output rate R_o as randomly fluctuating around the target | We model output rate R_o during steady state as randomly fluctuating | |||
rate R_v after convergence. There exist two sources of variations in | around the target rate R_v. The output traffic can be characterized | |||
the encoder output: | as the combination of two random processes denoting the frame | |||
interval t and output frame size B over time. These two random | ||||
processes capture two sources of variations in the encoder output: | ||||
o Fluctuations in frame interval: the intervals between adjacent | o Fluctuations in frame interval: the intervals between adjacent | |||
frames have been observed to fluctuate around the reference | frames have been observed to fluctuate around the reference | |||
interval of t0 = 1/FPS. They roughly follow a Gaussian | interval of t0 = 1/FPS. Deviations in normalized frame interval | |||
distribution, and can be modelled with the parameter SIGMA_t, | DELTA_t = (t-t0)/t0 can be modelled by a zero-mean Laplacian | |||
which denotes the standard deviation of the normalized frame | distribution with scaling parameter SCALE_t. The value of SCALE_t | |||
interval (ratio between actual and reference frame interval). | dictates the "width" of the Laplacian distribution and therefore | |||
the amount of fluctuations in actual frame intervals (t) with | ||||
respect to the reference t0. | ||||
o Fluctuations in frame size: sizes of the output encoded frames also | o Fluctuations in frame size: sizes of the output encoded frames also | |||
tend to fluctuate around the reference frame size B0=R_v/8/FPS. | tend to fluctuate around the reference frame size B0=R_v/8/FPS. | |||
They can also be modelled via a Gaussian distribution, with the | Likewise, deviations in the normalized frame size DELTA_B = | |||
SIGMA_B denoting the standard deviation of the normalized frame | (B-B0)/B0 can be modelled by a zero-mean Laplacian distribution | |||
size (ratio between actual and reference frame size). | with scaling parameter SCALE_B. The value of SCALE_B dictates the | |||
"width" of this second Laplacian distribution and correspondingly | ||||
the amount of fluctuations in output frame sizes (B) with respect | ||||
to the reference target B0. | ||||
Both values of SIGMA_t and SIGMA_B can be obtained via parameter | Both values of SCALE_t and SCALE_B can be obtained via parameter | |||
fitting from empirical data captured for a given video encoder. | fitting from empirical data captured for a given video encoder. | |||
Example values are listed in Figure 2 based on empirical data | ||||
presented in [IETF-Interim]. | ||||
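The steady-state model with Laplacian deviations (the -03 wording) could be sketched as below, using t = t0 * (1 + DELTA_t) and B = B0 * (1 + DELTA_B). The function names are hypothetical. A zero-mean Laplacian sample is drawn here as the scaled difference of two unit exponential samples, and negative intervals or sizes are clamped to zero; the clamping is an assumption of this sketch, not something the text specifies.

```python
import random


def laplace(rng, scale):
    """Zero-mean Laplacian sample: scaled difference of two Exp(1) samples."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def steady_state_frames(R_v, FPS, n, scale_t=0.15, scale_B=0.15, seed=0):
    """Generate n (interval_seconds, size_bytes) pairs around t0 and B0."""
    rng = random.Random(seed)      # seeded so that test runs are repeatable
    t0 = 1.0 / FPS                 # reference frame interval
    B0 = R_v / 8 / FPS             # reference frame size
    frames = []
    for _ in range(n):
        # Assumption: clamp at zero so intervals and sizes stay non-negative.
        t = max(t0 * (1 + laplace(rng, scale_t)), 0.0)
        B = max(B0 * (1 + laplace(rng, scale_B)), 0.0)
        frames.append((t, B))
    return frames


frames = steady_state_frames(R_v=1_000_000, FPS=30, n=1000)
```

Seeding the generator keeps evaluation runs repeatable across candidate algorithms, which matches the repeatability goal stated in the Introduction.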
5.4. Rate range limit imposed by video content | 5.4. Rate range limit imposed by video content | |||
The output rate R_o is further clipped within the dynamic range | The output rate R_o is further clipped within the dynamic range | |||
[R_min, R_max], which in reality are dictated by scene and motion | [R_min, R_max], which in reality are dictated by scene and motion | |||
complexity of the captured video content. In our model, these | complexity of the captured video content. In our model, these | |||
parameters are specified by the application. | parameters are specified by the application. | |||
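The clipping step amounts to a simple clamp. A minimal sketch, using the example values of R_min and R_max from Figure 2 as defaults (the function name clip_rate is hypothetical):

```python
def clip_rate(rate_bps, r_min=150_000, r_max=1_500_000):
    """Clamp a rate in bits per second to the range [r_min, r_max]."""
    return max(r_min, min(rate_bps, r_max))


clip_rate(2_000_000)   # above R_max, clipped to 1_500_000
```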
6. A Trace-Driven Model | 6. A Trace-Driven Model | |||
We now present the second approach to model a video traffic source. | We now present the second approach to model a video traffic source. | |||
This approach is based on running an actual live video encoder | This approach is based on running an actual live video encoder on a | |||
offline on a set of chosen raw video sequences and using the | set of chosen raw video sequences and using the encoder's output | |||
encoder's output traces for constructing a synthetic live encoder. | traces for constructing a synthetic live encoder. With this | |||
With this approach, the recorded video traces naturally exhibit | approach, the recorded video traces naturally exhibit temporal | |||
temporal fluctuations around a given target rate request R_v(t) from | fluctuations around a given target rate request R_v from the | |||
the congestion control module. | congestion control module. | |||
The following list summarizes this approach's main steps: | The following list summarizes the main steps of this approach: | |||
1) Choose one or more representative raw video sequences. | 1) Choose one or more representative raw video sequences. | |||
2) Using an actual live video encoder, encode the sequences at | 2) Encode the sequence(s) using an actual live video encoder. Repeat | |||
various bitrates. Keep just the sequences of frame sizes for each | the process for a number of bitrates. Keep only the sequence of | |||
bitrate. | frame sizes for each bitrate. | |||
3) Construct a data structure that contains the output of the | 3) Construct a data structure that contains the output of the | |||
previous step. The data structure should allow for easy bitrate | previous step. The data structure should allow for easy bitrate | |||
lookup. | lookup. | |||
4) Upon a target bitrate request R_v(t) from the controller, look up | 4) Upon a target bitrate request R_v from the controller, look up the | |||
the closest bitrates among those previously stored. Use the frame | closest bitrates among those previously stored. Use the frame size | |||
size sequences stored for those bitrates to approximate the frame | sequences stored for those bitrates to approximate the frame sizes to | |||
sizes to output. | output. | |||
5) The output of the synthetic encoder contains "encoded" frames with | 5) The output of the synthetic encoder contains "encoded" frames with | |||
zeros as contents but with realistic sizes. | zeros as contents but with realistic sizes. | |||
Section 6.1 explains steps 1), 2), and 3), Section 6.2 elaborates on | Section 6.1 explains steps 1), 2), and 3), Section 6.2 elaborates on | |||
steps 4) and 5). Finally, Section 6.3 briefly discusses the | steps 4) and 5). Finally, Section 6.3 briefly discusses the | |||
possibility of extending the model to support variable frame rate | possibility of extending the model to support variable frame rate | |||
and/or variable frame resolution. | and/or variable frame resolution. | |||
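Steps 3) to 5) above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of Section 6.2: the class TraceTable and the function synthetic_frame are hypothetical names, frame sizes are approximated by linear interpolation between the two nearest stored bitrates, targets outside the stored range fall back to the nearest stored trace, and the trace wraps around when the frame index exceeds its length.

```python
import bisect


class TraceTable:
    """Hypothetical lookup structure: bitrate (bps) -> list of frame sizes (bytes)."""

    def __init__(self, traces):
        self.traces = traces
        self.rates = sorted(traces)    # sorted bitrates for binary search

    def neighbors(self, target_bps):
        """Return the stored bitrates just below and just above the target."""
        i = bisect.bisect_left(self.rates, target_bps)
        if i == 0:
            return self.rates[0], self.rates[0]
        if i == len(self.rates):
            return self.rates[-1], self.rates[-1]
        if self.rates[i] == target_bps:
            return target_bps, target_bps
        return self.rates[i - 1], self.rates[i]

    def frame_size(self, target_bps, frame_index):
        """Approximate the size of one output frame for the target rate."""
        lo, hi = self.neighbors(target_bps)
        # Assumption: traces loop when the index runs past the end.
        size_lo = self.traces[lo][frame_index % len(self.traces[lo])]
        if lo == hi:
            return size_lo
        size_hi = self.traces[hi][frame_index % len(self.traces[hi])]
        # Assumption: linear interpolation between the two nearest traces.
        w = (target_bps - lo) / (hi - lo)
        return (1 - w) * size_lo + w * size_hi


def synthetic_frame(size_bytes):
    """Step 5: an "encoded" frame with zeros as contents and a realistic size."""
    return bytes(round(size_bytes))


table = TraceTable({100_000: [400, 420], 200_000: [800, 840]})
frame = synthetic_frame(table.frame_size(150_000, 0))
```

Keeping the bitrates in a sorted list makes each lookup a binary search, which fits the requirement that the data structure allow for easy bitrate lookup.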
6.1. Choosing the video sequence and generating the traces | 6.1. Choosing the video sequence and generating the traces | |||
skipping to change at page 10, line 12 ¶ | skipping to change at page 10, line 23 ¶ | |||
video sequences that are representative of the use cases we want to | video sequences that are representative of the use cases we want to | |||
model. Our use case here is video conferencing, so we must choose a | model. Our use case here is video conferencing, so we must choose a | |||
low-motion sequence that resembles a "talking head", for instance a | low-motion sequence that resembles a "talking head", for instance a | |||
news broadcast or a video capture of an actual conference call. | news broadcast or a video capture of an actual conference call. | |||
The length of the chosen video sequence is a tradeoff. If it is too | The length of the chosen video sequence is a tradeoff. If it is too | |||
long, it will be difficult to manage the data structures containing | long, it will be difficult to manage the data structures containing | |||
the traces. If it is too short, there will be an obvious periodic | the traces. If it is too short, there will be an obvious periodic | |||
pattern in the output frame sizes, leading to biased results when | pattern in the output frame sizes, leading to biased results when | |||
evaluating congestion controller performance. In our experience, a | evaluating congestion controller performance. In our experience, a | |||
one-minute-long sequence is a fair tradeoff. | sequence whose length is between 2 and 4 minutes is a fair tradeoff. | |||
Once we have chosen the raw video sequence, denoted S, we use a live | Once we have chosen the raw video sequence, denoted S, we use a live | |||
encoder, e.g., [H264] or [HEVC], to produce a set of encoded | encoder, e.g., [H264] or [HEVC], to produce a set of encoded | |||
sequences. As discussed in Section 3, a live encoder's output | sequences. As discussed in Section 3, a live encoder's output | |||
bitrate can be tuned by varying three input parameters, namely, | bitrate can be tuned by varying three input parameters, namely, | |||
quantization step size, frame rate, and picture resolution. In order | quantization step size, frame rate, and picture resolution. In order | |||
to simplify the choice of these parameters for a given target rate, | to simplify the choice of these parameters for a given target rate, | |||
we assume a fixed frame rate (e.g., 25 fps) and a fixed resolution | we assume a fixed frame rate (e.g., 30 fps) and a fixed resolution | |||
(e.g., 480p). See Section 6.3 for a discussion on how to relax these | (e.g., 720p). See Section 6.3 for a discussion on how to relax these | |||
assumptions. | assumptions. | |||
Following these simplifications, we run the chosen encoder by setting | Following these simplifications, we run the chosen encoder by setting | |||
a constant target bitrate at the beginning, then letting the encoder | a constant target bitrate at the beginning, then letting the encoder | |||
vary the quantization step size internally while encoding the input | vary the quantization step size internally while encoding the input | |||
video sequence. In addition, we assume that the first frame is encoded | video sequence. In addition, we assume that the first frame is encoded | |||
as an I-frame and the rest are P-frames. We further assume that the | as an I-frame and the rest are P-frames. We further assume that the | |||
encoder algorithm does not use knowledge of frames in the future so | encoder algorithm does not use knowledge of frames in the future when | |||
as to encode a given frame. | encoding a given frame. | |||
We define R_min and R_max as the minimum and maximum bitrate at which | Given R_min and R_max, which are the minimum and maximum bitrates at | |||
the synthetic codec is to operate. We divide the bitrate range | which the synthetic codec is to operate (see Section 4), we divide | |||
between R_min and R_max into n_s steps of length l = (R_max - | the bitrate range between R_min and R_max into n_s steps of length | |||
R_min) / n_s, yielding n_s + 1 target bitrates. We then use the | l = (R_max - R_min) / n_s, yielding n_s + 1 target bitrates. We then | |||
following simple algorithm to encode the raw video sequence. | use the following simple algorithm to encode the raw video sequence. | |||
r = R_min | r = R_min | |||
while r <= R_max do | while r <= R_max do | |||
Traces[r] = encode_sequence(S, r, e) | Traces[r] = encode_sequence(S, r, e) | |||
r = r + l | r = r + l | |||
where function encode_sequence takes as parameters, respectively, a | where function encode_sequence takes as parameters, respectively, a | |||
raw video sequence, a constant target rate, and an encoder algorithm; | raw video sequence, a constant target rate, and an encoder algorithm; | |||
it returns a vector with the sizes of frames in the order they were | it returns a vector with the sizes of frames in the order they were | |||
encoded. The output vector is stored in a map structure called | encoded. The output vector is stored in a map structure called | |||
Traces, whose keys are bitrates and values are frame size vectors. | Traces, whose keys are bitrates and whose values are vectors of frame | |||
sizes. | ||||
The choice of a value for n_s is important, as it determines the | The choice of a value for n_s is important, as it determines the | |||
number of frame size vectors stored in map Traces. The minimum value | number of vectors of frame sizes stored in map Traces. The minimum | |||
one can choose for n_s is 1, and its maximum value depends on the | value one can choose for n_s is 1, and its maximum value depends on | |||
amount of memory available for holding the map Traces. A reasonable | the amount of memory available for holding the map Traces. A | |||
value for n_s is one that makes the steps' length l = 200 kbps. We | reasonable value for n_s is one that makes the steps' length l = 200 | |||
will further discuss step length l in the next section. | kbps. We will further discuss step length l in the next section. | |||
Finally, note that, as mentioned in previous sections, R_min and | ||||
R_max may be modified after the initial sequences are encoded. | ||||
Hence, the algorithm described in the next section also covers the | ||||
cases when the current target bitrate is less than R_min, or greater | ||||
than R_max. | ||||
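As an illustration, the encoding loop of Section 6.1 could be sketched in Python as follows. This is only a sketch under stated assumptions: encode_sequence here is a hypothetical stand-in that does not run a real encoder (it returns constant frame sizes matching the target rate), and the names generate_traces, n_frames and fps are introduced for this example only. Bitrates are assumed to be integers in bps. Indexing the steps by an integer, rather than repeatedly adding l, avoids accumulating rounding error in the map keys.

```python
def encode_sequence(n_frames, rate_bps, fps=30):
    """Hypothetical stand-in for a real encoder run.

    Returns a list of frame sizes in bytes; every frame gets the same
    size, chosen so that the stream averages rate_bps at the given fps.
    """
    per_frame_bytes = int(rate_bps / fps / 8)
    return [per_frame_bytes] * n_frames


def generate_traces(r_min, r_max, n_s, n_frames, encode=encode_sequence):
    """Build the map Traces: one frame size vector per target bitrate.

    The range [r_min, r_max] is split into n_s steps, which yields
    n_s + 1 keys (both endpoints included).
    """
    if n_s < 1:
        raise ValueError("n_s must be at least 1")
    traces = {}
    for i in range(n_s + 1):
        # Integer arithmetic keeps the keys exact.
        r = r_min + i * (r_max - r_min) // n_s
        traces[r] = encode(n_frames, r)
    return traces
```

For example, generate_traces(200000, 1000000, 4, 10) produces five vectors, keyed at 200 kbps intervals.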
6.2. Using the traces in the synthetic codec | 6.2. Using the traces in the synthetic codec | |||
The main idea behind the trace-driven synthetic codec is that it | The main idea behind the trace-driven synthetic codec is that it | |||
mimics a real live codec's rate adaptation when the congestion | mimics a real live codec's rate adaptation when the congestion | |||
controller updates the target rate R_v(t). It does so by switching | controller updates the target rate R_v dynamically. It does so by | |||
to a different frame size vector stored in the map Traces when | switching to a different frame size vector stored in the map Traces | |||
needed. | when needed. | |||
6.2.1. Main algorithm | 6.2.1. Main algorithm | |||
We maintain two variables r_current and t_current: | We maintain two variables r_current and t_current: | |||
* r_current points to one of the keys of the map Traces. Upon a | * r_current points to one of the keys of map Traces. Upon a change | |||
change in the value of R_v(t), typically because the congestion | in the value of R_v, typically because the congestion controller | |||
controller detects that the network conditions have changed, | detects that the network conditions have changed, r_current is | |||
r_current is updated to the greatest key in Traces that is less than | updated to the greatest key in Traces that is less than or equal to | |||
or equal to the new value of R_v(t). For the moment, we assume the | the new value of R_v. For the moment, we assume the value of R_v to | |||
value of R_v(t) to be clipped in the range [R_min, R_max]. | be clipped in the range [R_min, R_max]. | |||
r_current = r | r_current = r | |||
such that | such that | |||
( r in keys(Traces) and | ( r in keys(Traces) and | |||
r <= R_v(t) and | r <= R_v and | |||
(not(exists) r' in keys(Traces) such that r < r' <= R_v(t)) ) | (not(exists) r' in keys(Traces) such that r < r' <= R_v) ) | |||
* t_current is an index to the frame size vector stored in | * t_current is an index to the frame size vector stored in | |||
Traces[r_current]. It is updated every time a new frame is due. We | Traces[r_current]. It is updated every time a new frame is due. We | |||
assume all vectors stored in Traces to have the same size, denoted | assume all vectors stored in Traces to have the same size, denoted | |||
size_traces. The following equation governs the update of t_current: | size_traces. The following equation governs the update of t_current: | |||
if t_current < SkipFrames then | if t_current < SkipFrames then | |||
t_current = t_current + 1 | t_current = t_current + 1 | |||
else | else | |||
t_current = ((t_current+1-SkipFrames) % (size_traces- SkipFrames)) | t_current = ((t_current+1-SkipFrames) % (size_traces- SkipFrames)) | |||
+ SkipFrames | + SkipFrames | |||
where operator % denotes modulo, and SkipFrames is a predefined | where operator % denotes modulo, and SkipFrames is a predefined | |||
constant that denotes the number of frames to be skipped at the | constant that denotes the number of frames to be skipped at the | |||
beginning of frame size vectors after t_current has wrapped around. | beginning of frame size vectors after t_current has wrapped around. | |||
The purpose of the constant SkipFrames is to avoid the effect of | The purpose of the constant SkipFrames is to avoid the effect of | |||
periodically sending a (big) I-frame followed by several smaller- | periodically sending a (big) I-frame followed by several smaller- | |||
than-normal P-frames. We typically set SkipFrames to 20, although it | than-normal P-frames. We typically set SkipFrames to 20, although it | |||
could be set to 0 if we are interested in studying the effect of | could be set to 0 if we are interested in studying the effect of | |||
sending I-frames periodically. | sending I-frames periodically. | |||
We initialize r_current to R_min, and t_current to 0. | We initialize r_current to R_min, and t_current to 0. | |||
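The two update rules above can be sketched in Python as shown below. The function names select_r_current and next_t_current are assumptions made for this example; keys_sorted is assumed to be the sorted list of keys of Traces, and R_v is assumed to be already clipped to [R_min, R_max].

```python
import bisect


def select_r_current(keys_sorted, r_v):
    """Return the greatest key in keys_sorted that is <= r_v."""
    idx = bisect.bisect_right(keys_sorted, r_v) - 1
    if idx < 0:
        raise ValueError("r_v is below the smallest key in Traces")
    return keys_sorted[idx]


def next_t_current(t_current, size_traces, skip_frames=20):
    """Advance the frame index, skipping the first skip_frames entries
    of the vector once the index has wrapped around."""
    if size_traces <= skip_frames:
        raise ValueError("size_traces must exceed skip_frames")
    if t_current < skip_frames:
        return t_current + 1
    return ((t_current + 1 - skip_frames)
            % (size_traces - skip_frames)) + skip_frames
```

With size_traces = 25 and SkipFrames = 20, the index following 24 is 20 rather than 0, so the initial I-frame is not replayed.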
When a new frame is due, we need to calculate its size. There are | When a new frame is due, we need to calculate its size. There are | |||
three cases: | three cases: | |||
a) R_min <= R_v(t) < R_max: In this case we use linear interpolation | a) R_min <= R_v < R_max: In this case we use linear interpolation of | |||
of the frame sizes appearing in Traces[r_current] and | the frame sizes appearing in Traces[r_current] and | |||
Traces[r_current + l]. The interpolation is done as follows: | Traces[r_current + l]. The interpolation is done as follows: | |||
size_lo = Traces[r_current][t_current] | size_lo = Traces[r_current][t_current] | |||
size_hi = Traces[r_current + l][t_current] | size_hi = Traces[r_current + l][t_current] | |||
distance_lo = ( R_v(t) - r_current ) / l | distance_lo = ( R_v - r_current ) / l | |||
framesize = size_hi * distance_lo + size_lo * (1 - distance_lo) | framesize = size_hi * distance_lo + size_lo * (1 - distance_lo) | |||
b) R_v(t) < R_min: In this case, we scale the trace sequence with | b) R_v < R_min: In this case, we scale the trace sequence with the | |||
the lowest bitrate, in the following way: | lowest bitrate, in the following way: | |||
factor = R_v(t) / R_min | factor = R_v / R_min | |||
framesize = max(1, factor * Traces[R_min][t_current]) | framesize = max(1, factor * Traces[R_min][t_current]) | |||
c) R_v(t) >= R_max: We also use scaling for this case. We use the | c) R_v >= R_max: We also use scaling for this case. We use the | |||
trace sequence with the greatest bitrate: | trace sequence with the greatest bitrate: | |||
factor = R_v(t) / R_max | factor = R_v / R_max | |||
framesize = factor * Traces[R_max][t_current] | framesize = factor * Traces[R_max][t_current] | |||
In case b), we set the minimum to 1 byte, since the value of factor | In case b), we set the minimum to 1 byte, since the value of factor | |||
can be arbitrarily close to 0. | can be arbitrarily close to 0. | |||
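The three cases above can be combined into a single function, sketched here in Python. The function name frame_size and its parameter list are assumptions made for this example; traces is assumed to be a dict keyed by bitrate with a fixed step between consecutive keys, and sizes are returned as floating-point byte counts (rounding is left to the caller).

```python
def frame_size(traces, r_v, r_current, t_current, r_min, r_max, step):
    """Compute the size of the next frame for target rate r_v."""
    if r_v < r_min:
        # Case b): scale the lowest-rate trace, with a 1-byte floor.
        factor = r_v / r_min
        return max(1.0, factor * traces[r_min][t_current])
    if r_v >= r_max:
        # Case c): scale the highest-rate trace.
        factor = r_v / r_max
        return factor * traces[r_max][t_current]
    # Case a): interpolate linearly between the two neighboring traces.
    size_lo = traces[r_current][t_current]
    size_hi = traces[r_current + step][t_current]
    distance_lo = (r_v - r_current) / step
    return size_hi * distance_lo + size_lo * (1 - distance_lo)
```

In case a), r_current is strictly less than R_max because R_v is, so the key r_current + l always exists in Traces.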
6.2.2. Notes to the main algorithm | 6.2.2. Notes to the main algorithm | |||
* Reacting to changes in target bitrate. Similarly to the | * Reacting to changes in target bitrate. Similarly to the | |||
statistical model presented in Section 5, the trace-driven synthetic | statistical model presented in Section 5, the trace-driven synthetic | |||
codec can have a time bound, tau_v, on reacting to target bitrate | codec can have a time bound, tau_v, on reacting to target bitrate | |||
changes. If the codec has reacted to an update in R_v(t) at time t, | changes. If the codec has reacted to an update in R_v at time t, it | |||
it will delay any further update to R_v(t) until time t + tau_v. Note | will delay any further update to R_v until time t + tau_v. Note that, | |||
that, in any case, the value of tau_v cannot be chosen shorter than | in any case, the value of tau_v cannot be chosen shorter than the | |||
the time between frames, i.e. the inverse of the frame rate. | time between frames, i.e. the inverse of the frame rate. | |||
* I-frames on demand. The synthetic codec could be extended to | * I-frames on demand. The synthetic codec could be extended to | |||
simulate the sending of I-frames on demand, e.g., as a reaction to | simulate the sending of I-frames on demand, e.g., as a reaction to | |||
losses. To implement this extension, the codec's API is augmented | losses. To implement this extension, the codec's API is augmented | |||
with a new function to request a new I-frame. Upon calling this | with a new function to request a new I-frame. Upon calling this | |||
function, t_current is reset to 0. | function, t_current is reset to 0. | |||
* Variable length l of steps defined between R_min and R_max. In the | * Variable length l of steps defined between R_min and R_max. In the | |||
main algorithm's description, the step length l is fixed. However, | main algorithm's description, the step length l is fixed. However, | |||
if the range [R_min, R_max] is very wide, it is also possible to | if the range [R_min, R_max] is very wide, it is also possible to | |||
define a set of steps with a non-constant length. The idea behind | define a set of steps with a non-constant length. The idea behind | |||
this modification is that the difference between 400 kbps and 600 | this modification is that the difference between 400 kbps and 600 | |||
kbps as bitrate is much more important than the difference between | kbps as bitrate is much more important than the difference between | |||
4400 kbps and 4600 kbps. For example, one could define steps of | 4400 kbps and 4600 kbps. For example, one could define steps of | |||
length 200 kbps under 1 Mbps, then length 300 kbps between 1 Mbps and | length 200 kbps under 1 Mbps, then steps of length 300 kbps between 1 | |||
2 Mbps, 400 kbps between 2 Mbps and 3 Mbps, and so on. | Mbps and 2 Mbps; 400 kbps between 2 Mbps and 3 Mbps, and so on. | |||
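Two of the notes above can be illustrated with a short Python sketch. Everything named here (RateUpdateLimiter, variable_step_keys and their parameters) is hypothetical and introduced only for this example. The first class delays a requested target rate until tau_v has elapsed since the last applied update; the second function produces keys whose spacing grows with the bitrate, in kbps. Note that, with non-constant steps, the interpolation of case a) would have to look up the next key in Traces instead of using r_current + l.

```python
class RateUpdateLimiter:
    """Applies at most one target rate update per tau_v seconds."""

    def __init__(self, tau_v, r_initial):
        self.tau_v = tau_v
        self.r_v = r_initial
        self.pending = None
        self.last_update = None

    def request(self, now, r_new):
        """Record a requested rate and return the rate in effect."""
        self.pending = r_new
        return self.current(now)

    def current(self, now):
        """Apply the pending rate if tau_v has elapsed, then return R_v."""
        elapsed_ok = (self.last_update is None
                      or now - self.last_update >= self.tau_v)
        if self.pending is not None and elapsed_ok:
            self.r_v = self.pending
            self.pending = None
            self.last_update = now
        return self.r_v


def variable_step_keys(r_min, r_max, first_step=200, increment=100,
                       band=1000):
    """Return bitrate keys (kbps) whose step grows by `increment`
    for each `band` kbps reached; r_max is always the last key."""
    keys = []
    r = r_min
    while r < r_max:
        keys.append(r)
        r += first_step + increment * (r // band)
    keys.append(r_max)
    return keys
```

In this sketch a step that starts below a band boundary may cross it, so the step sizes only approximately follow the bands given in the example above.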
6.3. Varying frame rate and resolution | 6.3. Varying frame rate and resolution | |||
The trace-driven synthetic codec model explained in this section is | The trace-driven synthetic codec model explained in this section is | |||
relatively simple because we have fixed the frame rate and the frame | relatively simple because we have fixed the frame rate and the frame | |||
resolution. The model could be extended to have variable frame rate, | resolution. The model could be extended to have variable frame rate, | |||
variable spatial resolution, or both. | variable spatial resolution, or both. | |||
When the encoded picture quality at a given bitrate is low, one can | When the encoded picture quality at a given bitrate is low, one can | |||
potentially decrease the frame rate (if the video sequence is | potentially decrease the frame rate (if the video sequence is | |||
skipping to change at page 14, line 19 ¶ | skipping to change at page 14, line 30 ¶ | |||
fairly simple to implement, it takes significantly greater effort to | fairly simple to implement, it takes significantly greater effort to | |||
fit the parameters of a statistical model to actual encoder output | fit the parameters of a statistical model to actual encoder output | |||
data whereas it is straightforward for a trace-driven model to obtain | data whereas it is straightforward for a trace-driven model to obtain | |||
encoded frame size data. On the other hand, once validated, the | encoded frame size data. On the other hand, once validated, the | |||
statistical model is more flexible in mimicking a wide range of | statistical model is more flexible in mimicking a wide range of | |||
encoder/content behaviors by simply varying the corresponding | encoder/content behaviors by simply varying the corresponding | |||
parameters in the model. In this regard, a trace-driven model relies | parameters in the model. In this regard, a trace-driven model relies | |||
-- by definition -- on additional data collection efforts for | -- by definition -- on additional data collection efforts for | |||
accommodating new codecs or video contents. | accommodating new codecs or video contents. | |||
In general, trace-driven model is more realistic for mimicking | In general, the trace-driven model is more realistic for mimicking | |||
ongoing, steady-state behavior of a video traffic source whereas | ongoing, steady-state behavior of a video traffic source whereas the | |||
statistical model is more versatile for simulating transient events | statistical model is more versatile for simulating transient events | |||
(e.g., when target rate changes from A to B with temporary bursts | (e.g., when target rate changes from A to B with temporary bursts | |||
during the transition). It is also possible to combine both models | during the transition). It is also possible to combine both models | |||
into a hybrid approach, using traces during steady state and the | into a hybrid approach, using traces during steady state and the | |||
statistical model during transients. | statistical model during transients. | |||
+---------------+ | +---------------+ | |||
transient | Generate next | | transient | Generate next | | |||
+------>| K_d transient | | +------>| K_d transient | | |||
+-------------+ / | frames | | +-------------+ / | frames | | |||
R_v(t) | Compare | / +---------------+ | R_v | Compare | / +---------------+ | |||
------->| against |/ | ------->| against |/ | |||
| previous | | | previous | | |||
| target rate |\ | | target rate |\ | |||
+-------------+ \ +---------------+ | +-------------+ \ +---------------+ | |||
\ | Generate next | | \ | Generate next | | |||
+------>| frame from | | +------>| frame from | | |||
steady-state | trace | | steady-state | trace | | |||
+---------------+ | +---------------+ | |||
Figure 3: Hybrid approach for modeling video traffic | Figure 3: Hybrid approach for modeling video traffic | |||
As shown in Figure 3, the video traffic model operates in transient | As shown in Figure 3, the video traffic model operates in transient | |||
state if the requested target rate R_v(t) is substantially higher | state if the requested target rate R_v is substantially higher than | |||
than the previous target, or else it operates in steady state. | the previous target, or else it operates in steady state. During | |||
During transient state, a total of K_d frames are generated by the | transient state, a total of K_d frames are generated by the | |||
statistical model, resulting in 1 big burst frame with size K_B | statistical model, resulting in 1 big burst frame with size K_B | |||
followed by K_d-1 smaller frames. When operating at steady-state, | followed by K_d-1 smaller frames. When operating at steady-state, | |||
the video traffic model simply generates a frame according to the | the video traffic model simply generates a frame according to the | |||
trace-driven model given the target rate, while modulating the frame | trace-driven model given the target rate, while modulating the frame | |||
interval according to the distribution specified by the statistical | interval according to the distribution specified by the statistical | |||
model. One example criteria for determining whether the traffic | model. One example criterion for determining whether the traffic | |||
model should operate in transient state is whether the rate increase | model should operate in transient state is whether the rate increase | |||
exceeds 10% of the previous target rate. | exceeds 10% of the previous target rate. | |||
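The state selection of the hybrid approach could be sketched in Python as follows. The names is_transient and hybrid_next_frames are assumptions made for this example, and the two callables from_trace and from_statistical are hypothetical placeholders for the trace-driven and statistical models; the sketch does not reproduce either model, nor the modulation of the frame interval.

```python
def is_transient(r_new, r_prev, threshold=0.10):
    """True if the rate increase exceeds threshold * previous rate."""
    return (r_new - r_prev) > threshold * r_prev


def hybrid_next_frames(r_new, r_prev, from_trace, from_statistical,
                       threshold=0.10):
    """Return the list of frame sizes to send next.

    from_statistical(r) is expected to return the K_d transient frames
    (one burst frame followed by K_d - 1 smaller frames);
    from_trace(r) is expected to return a single steady-state frame.
    """
    if is_transient(r_new, r_prev, threshold):
        return list(from_statistical(r_new))
    return [from_trace(r_new)]
```

Only rate increases trigger the transient state in this sketch; a decrease of any size is handled by the trace-driven branch.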
8. Implementation Status | 8. Implementation Status | |||
The statistical model has been implemented as a traffic generator | The statistical model has been implemented as a traffic generator | |||
module within the [ns-2] network simulation platform. | module within the [ns-2] network simulation platform. | |||
More recently, both the statistical and trace-driven models have been | More recently, both the statistical and trace-driven models have been | |||
implemented as a stand-alone traffic source module. This can be | implemented as a stand-alone traffic source module. This can be | |||
skipping to change at page 15, line 29 ¶ | skipping to change at page 16, line 9 ¶ | |||
implementation at [Syncodecs]. | implementation at [Syncodecs]. | |||
9. IANA Considerations | 9. IANA Considerations | |||
There are no IANA impacts in this memo. | There are no IANA impacts in this memo. | |||
10. References | 10. References | |||
10.1. Normative References | 10.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | ||||
Requirement Levels", BCP 14, RFC 2119, | ||||
DOI 10.17487/RFC2119, March 1997, | ||||
<http://www.rfc-editor.org/info/rfc2119>. | ||||
[H264] ITU-T Recommendation H.264, "Advanced video coding for | [H264] ITU-T Recommendation H.264, "Advanced video coding for | |||
generic audiovisual services", 2003, | generic audiovisual services", 2003, | |||
<http://www.itu.int/rec/T-REC-H.264-201304-I>. | <http://www.itu.int/rec/T-REC-H.264-201304-I>. | |||
[HEVC] ITU-T Recommendation H.265, "High efficiency video | [HEVC] ITU-T Recommendation H.265, "High efficiency video | |||
coding", 2015. | coding", 2015. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | ||||
Requirement Levels", BCP 14, RFC 2119, | ||||
DOI 10.17487/RFC2119, March 1997, | ||||
<http://www.rfc-editor.org/info/rfc2119>. | ||||
10.2. Informative References | 10.2. Informative References | |||
[Hu2010] Hu, H., Ma, Z., and Y. Wang, "Optimization of Spatial, | [Hu2010] Hu, H., Ma, Z., and Y. Wang, "Optimization of Spatial, | |||
Temporal and Amplitude Resolution for Rate-Constrained | Temporal and Amplitude Resolution for Rate-Constrained | |||
Video Coding and Scalable Video Adaptation", in Proc. 19th | Video Coding and Scalable Video Adaptation", in Proc. 19th | |||
IEEE International Conference on Image | IEEE International Conference on Image | |||
Processing, (ICIP'12), September 2012. | Processing, (ICIP'12), September 2012. | |||
[Ozer2011] | [IETF-Interim] | |||
Ozer, J., "Video Compression for Flash, Apple Devices and | Zhu, X., Mena, S., and Z. Sarker, "Update on RMCAT Video | |||
HTML5", ISBN 13:978-0976259503, 2011. | Traffic Model: Trace Analysis and Model Update", April | |||
2017, <https://www.ietf.org/proceedings/interim-2017- | ||||
[Tanwir2013] | rmcat-01/slides/slides-interim-2017-rmcat-01-sessa-update- | |||
Tanwir, S. and H. Perros, "A Survey of VBR Video Traffic | on-video-traffic-model-draft-00.pdf>. | |||
Models", IEEE Communications Surveys and Tutorials, vol. | ||||
15, no. 5, pp. 1778-1802., October 2013. | ||||
[ns-2] "The Network Simulator - ns-2", | [ns-2] "The Network Simulator - ns-2", | |||
<http://www.isi.edu/nsnam/ns/>. | <http://www.isi.edu/nsnam/ns/>. | |||
[ns-3] "The Network Simulator - ns-3", <https://www.nsnam.org/>. | [ns-3] "The Network Simulator - ns-3", <https://www.nsnam.org/>. | |||
[Ozer2011] | ||||
Ozer, J., "Video Compression for Flash, Apple Devices and | ||||
HTML5", ISBN 13:978-0976259503, 2011. | ||||
[Syncodecs] | [Syncodecs] | |||
Mena, S., D'Aronco, S., and X. Zhu, "Syncodecs: Synthetic | Mena, S., D'Aronco, S., and X. Zhu, "Syncodecs: Synthetic | |||
codecs for evaluation of RMCAT work", | codecs for evaluation of RMCAT work", | |||
<https://github.com/cisco/syncodecs>. | <https://github.com/cisco/syncodecs>. | |||
[Tanwir2013] | ||||
Tanwir, S. and H. Perros, "A Survey of VBR Video Traffic | ||||
Models", IEEE Communications Surveys and Tutorials, vol. | ||||
15, no. 5, pp. 1778-1802., October 2013. | ||||
Authors' Addresses | Authors' Addresses | |||
Xiaoqing Zhu | Xiaoqing Zhu | |||
Cisco Systems | Cisco Systems | |||
12515 Research Blvd., Building 4 | 12515 Research Blvd., Building 4 | |||
Austin, TX 78759 | Austin, TX 78759 | |||
USA | USA | |||
Email: xiaoqzhu@cisco.com | Email: xiaoqzhu@cisco.com | |||
End of changes. 51 change blocks. | ||||
151 lines changed or deleted | 181 lines changed or added | |||