draft-ietf-rmcat-video-traffic-model-07.txt | rfc8593.txt | |||
---|---|---|---|---|
Network Working Group X. Zhu | Internet Engineering Task Force (IETF) X. Zhu | |||
Internet-Draft S. Mena | Request for Comments: 8593 S. Mena | |||
Intended status: Informational Cisco Systems | Category: Informational Cisco Systems | |||
Expires: August 23, 2019 Z. Sarker | ISSN: 2070-1721 Z. Sarker | |||
Ericsson AB | Ericsson AB | |||
February 19, 2019 | May 2019 | |||
Video Traffic Models for RTP Congestion Control Evaluations | Video Traffic Models for RTP Congestion Control Evaluations | |||
draft-ietf-rmcat-video-traffic-model-07 | ||||
Abstract | Abstract | |||
This document describes two reference video traffic models for | This document describes two reference video traffic models for | |||
evaluating RTP congestion control algorithms. The first model | evaluating RTP congestion control algorithms. The first model | |||
statistically characterizes the behavior of a live video encoder in | statistically characterizes the behavior of a live video encoder in | |||
response to changing requests on the target video rate. The second | response to changing requests on the target video rate. The second | |||
model is trace-driven and emulates the output of actual encoded video | model is trace-driven and emulates the output of actual encoded video | |||
frame sizes from a high-resolution test sequence. Both models are | frame sizes from a high-resolution test sequence. Both models are | |||
designed to strike a balance between simplicity, repeatability, and | designed to strike a balance between simplicity, repeatability, and | |||
authenticity in modeling the interactions between a live video | authenticity in modeling the interactions between a live video | |||
traffic source and the congestion control module. Finally, the | traffic source and the congestion control module. Finally, the | |||
document describes how both approaches can be combined into a hybrid | document describes how both approaches can be combined into a hybrid | |||
model. | model. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
provisions of BCP 78 and BCP 79. | published for informational purposes. | |||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Not all documents | |||
approved by the IESG are candidates for any level of Internet | ||||
Standard; see Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on August 23, 2019. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc8593. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
3. Desired Behavior of A Synthetic Video Traffic Model . . . . . 3 | 3. Desired Behavior of a Synthetic Video Traffic Model . . . . . 4 | |||
4. Interactions Between Synthetic Video Traffic Source and | 4. Interactions between Synthetic Video Traffic Source and | |||
Other Components at the Sender . . . . . . . . . . . . . . . 5 | Other Components at the Sender . . . . . . . . . . . . . . . 5 | |||
5. A Statistical Reference Model . . . . . . . . . . . . . . . . 6 | 5. A Statistical Reference Model . . . . . . . . . . . . . . . . 7 | |||
5.1. Time-damped response to target rate update . . . . . . . 7 | 5.1. Time-Damped Response to Target-Rate Update . . . . . . . 9 | |||
5.2. Temporary burst and oscillation during the transient | 5.2. Temporary Burst and Oscillation during the Transient | |||
period . . . . . . . . . . . . . . . . . . . . . . . . . 8 | Period . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
5.3. Output rate fluctuation at steady state . . . . . . . . . 8 | 5.3. Output-Rate Fluctuation at Steady State . . . . . . . . . 9 | |||
5.4. Rate range limit imposed by video content . . . . . . . . 9 | 5.4. Rate Range Limit Imposed by Video Content . . . . . . . . 10 | |||
6. A Trace-Driven Model . . . . . . . . . . . . . . . . . . . . 9 | 6. A Trace-Driven Model . . . . . . . . . . . . . . . . . . . . 10 | |||
6.1. Choosing the video sequence and generating the traces . . 10 | 6.1. Choosing the Video Sequence and Generating the Traces . . 11 | |||
6.2. Using the traces in the synthetic codec . . . . . . . . . 11 | 6.2. Using the Traces in the Synthetic Codec . . . . . . . . . 13 | |||
6.2.1. Main algorithm . . . . . . . . . . . . . . . . . . . 11 | 6.2.1. Main Algorithm . . . . . . . . . . . . . . . . . . . 13 | |||
6.2.2. Notes to the main algorithm . . . . . . . . . . . . . 13 | 6.2.2. Notes to the Main Algorithm . . . . . . . . . . . . . 14 | |||
6.3. Varying frame rate and resolution . . . . . . . . . . . . 14 | 6.3. Varying Frame Rate and Resolution . . . . . . . . . . . . 15 | |||
7. Combining The Two Models . . . . . . . . . . . . . . . . . . 14 | 7. Combining the Two Models . . . . . . . . . . . . . . . . . . 16 | |||
8. Implementation Status . . . . . . . . . . . . . . . . . . . . 16 | 8. Reference Implementation . . . . . . . . . . . . . . . . . . 17 | |||
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 | |||
10. Security Considerations . . . . . . . . . . . . . . . . . . . 16 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 17 | |||
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
11.1. Normative References . . . . . . . . . . . . . . . . . . 16 | 11.1. Normative References . . . . . . . . . . . . . . . . . . 17 | |||
11.2. Informative References . . . . . . . . . . . . . . . . . 16 | 11.2. Informative References . . . . . . . . . . . . . . . . . 18 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 17 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19 | |||
1. Introduction | 1. Introduction | |||
When evaluating candidate congestion control algorithms designed for | When evaluating candidate congestion control algorithms designed for | |||
real-time interactive media, it is important to account for the | real-time interactive media, it is important to account for the | |||
characteristics of traffic patterns generated from a live video | characteristics of traffic patterns generated from a live video | |||
encoder. Unlike synthetic traffic sources that can conform perfectly | encoder. Unlike synthetic traffic sources that can conform perfectly | |||
to the rate changing requests from the congestion control module, a | to the rate-changing requests from the congestion control module, a | |||
live video encoder can be sluggish in reacting to such changes. The | live video encoder can be sluggish in reacting to such changes. The | |||
output rate of a live video encoder also typically deviates from the | output rate of a live video encoder also typically deviates from the | |||
target rate due to uncertainties in the encoder rate control process. | target rate due to uncertainties in the encoder rate-control process. | |||
Consequently, end-to-end delay and loss performance of a real-time | Consequently, end-to-end delay and loss performance of a real-time | |||
media flow can be further impacted by rate variations introduced by | media flow can be further impacted by rate variations introduced by | |||
the live encoder. | the live encoder. | |||
On the other hand, evaluation results of a candidate RTP congestion | On the other hand, evaluation results of a candidate RTP congestion | |||
control algorithm should mostly reflect the performance of the | control algorithm should mostly reflect the performance of the | |||
congestion control module and somewhat decouple from peculiarities of | congestion control module and somewhat decouple from peculiarities of | |||
any specific video codec. It is also desirable that evaluation tests | any specific video codec. It is also desirable that evaluation tests | |||
are repeatable, and be easily duplicated across different candidate | are repeatable and easily duplicated across different candidate | |||
algorithms. | algorithms. | |||
One way to strike a balance between the above considerations is to | One way to strike a balance between the above considerations is to | |||
evaluate congestion control algorithms using a synthetic video | evaluate congestion control algorithms using a synthetic video | |||
traffic source model that captures key characteristics of the | traffic source model that captures key characteristics of the | |||
behavior of a live video encoder. The synthetic traffic model should | behavior of a live video encoder. The synthetic traffic model should | |||
also contain tunable parameters so that it can be flexibly adjusted | also contain tunable parameters so that it can be flexibly adjusted | |||
to reflect the wide variations in real-world live video encoder | to reflect the wide variations in real-world live video encoder | |||
behaviors. To this end, this draft presents two reference models. | behaviors. To this end, this document presents two reference models. | |||
The first is based on statistical modeling. The second is driven by | The first is based on statistical modeling. The second is driven by | |||
frame size and interval traces recorded from a real-world encoder. | frame size and interval traces recorded from a real-world encoder. | |||
The draft also discusses the pros and cons of each approach, as well | This document also discusses the pros and cons of each approach, as | |||
as how both approaches can be combined into a hybrid model. | well as how both approaches can be combined into a hybrid model. | |||
2. Terminology | 2. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
3. Desired Behavior of A Synthetic Video Traffic Model | 3. Desired Behavior of a Synthetic Video Traffic Model | |||
A live video encoder employs encoder rate control to meet a target | A live video encoder employs encoder rate control to meet a target | |||
rate by varying its encoding parameters, such as quantization step | rate by varying its encoding parameters, such as quantization step | |||
size, frame rate, and picture resolution, based on its estimate of | size, frame rate, and picture resolution, based on its estimate of | |||
the video content (e.g., motion and scene complexity). In practice, | the video content (e.g., motion and scene complexity). In practice, | |||
however, several factors prevent the output video rate from perfectly | however, several factors prevent the output video rate from perfectly | |||
conforming to the input target rate. | conforming to the input target rate. | |||
Due to uncertainties in the captured video scene, the output rate | Due to uncertainties in the captured video scene, the output rate | |||
typically deviates from the specified target. In the presence of a | typically deviates from the specified target. In the presence of a | |||
skipping to change at page 4, line 11 | skipping to change at page 4, line 29 | |||
frames in a live session are encoded in predictive mode (i.e., | frames in a live session are encoded in predictive mode (i.e., | |||
P-frames in [H264]), the encoder can occasionally generate a large | P-frames in [H264]), the encoder can occasionally generate a large | |||
intra-coded frame (i.e., I-frame as defined in [H264]) or a frame | intra-coded frame (i.e., I-frame as defined in [H264]) or a frame | |||
partially containing intra-coded blocks in an attempt to recover from | partially containing intra-coded blocks in an attempt to recover from | |||
losses, to re-sync with the receiver, or during the transient period | losses, to re-sync with the receiver, or during the transient period | |||
of responding to target rate or spatial resolution changes. | of responding to target rate or spatial resolution changes. | |||
Hence, a synthetic video source should have the following | Hence, a synthetic video source should have the following | |||
capabilities: | capabilities: | |||
o To change bitrate. This includes the ability to change framerate | o To change bitrate. This includes the ability to change frame rate | |||
and/or spatial resolution or to skip frames upon request. | and/or spatial resolution or to skip frames upon request. | |||
o To fluctuate around the target bitrate specified by the congestion | o To fluctuate around the target bitrate specified by the congestion | |||
control module. | control module. | |||
o To show a delay in convergence to the target bitrate. | o To show a delay in convergence to the target bitrate. | |||
o To generate intra-coded or repair frames on demand. | o To generate intra-coded or repair frames on demand. | |||
While there exist many different approaches in developing a synthetic | While there exist many different approaches in developing a synthetic | |||
video traffic model, it is desirable that the outcome follows a few | video traffic model, it is desirable that the outcome follows a few | |||
common characteristics, as outlined below. | common characteristics, as outlined below. | |||
o Low computational complexity: The model should be computationally | o Low computational complexity: The model should be computationally | |||
lightweight, otherwise it defeats the whole purpose of serving as | lightweight, otherwise, it defeats the whole purpose of serving as | |||
a substitute for a live video encoder. | a substitute for a live video encoder. | |||
o Temporal pattern similarity: The individual traffic trace | o Temporal pattern similarity: The individual traffic trace | |||
instances generated by the model should mimic the temporal pattern | instances generated by the model should mimic the temporal pattern | |||
of those from a real video encoder. | of those from a real video encoder. | |||
o Statistical resemblance: The synthetic traffic source should match | o Statistical resemblance: The synthetic traffic source should match | |||
the outcome of the real video encoder in terms of statistical | the outcome of the real video encoder in terms of statistical | |||
characteristics, such as the mean, variance, peak, and | characteristics, such as the mean, variance, peak, and | |||
autocorrelation coefficients of the bitrate. It is also important | autocorrelation coefficients of the bitrate. It is also important | |||
that the statistical resemblance should hold across different time | that the statistical resemblance should hold across different time | |||
scales, ranging from tens of milliseconds to sub-seconds. | scales ranging from tens of milliseconds to sub-seconds. | |||
o A wide range of coverage: The model should be easily configurable | o A wide range of coverage: The model should be easily configurable | |||
to cover a wide range of codec behaviors (e.g., with either fast | to cover a wide range of codec behaviors (e.g., with either fast | |||
or slow reaction time in live encoder rate control) and video | or slow reaction time in live encoder rate control) and video | |||
content variations (e.g., ranging from high to low motion). | content variations (e.g., ranging from high to low motion). | |||
These distinct behavior features can be characterized via simple | These distinct behavior features can be characterized via simple | |||
statistical modeling or a trace-driven approach. Section 5 and | statistical modeling or a trace-driven approach. Sections 5 and 6 | |||
Section 6 provide an example of each approach, respectively. | provide an example of each approach, respectively. Section 7 | |||
Section 7 discusses how both models can be combined together. | discusses how both models can be combined together. | |||
4. Interactions Between Synthetic Video Traffic Source and Other | 4. Interactions between Synthetic Video Traffic Source and Other | |||
Components at the Sender | Components at the Sender | |||
Figure 1 depicts the interactions of the synthetic video traffic | Figure 1 depicts the interactions of the synthetic video traffic | |||
source with other components at the sender, such as the application, | source with other components at the sender, such as the application, | |||
the congestion control module, the media packet transport module, | the congestion control module, the media packet transport module, | |||
etc. Both reference models --- as described later in Section 5 and | etc. Both reference models, as described later in Sections 5 and 6, | |||
Section 6 --- follow the same set of interactions. | follow the same set of interactions. | |||
The synthetic video source dynamically generates a sequence of dummy | The synthetic video source dynamically generates a sequence of dummy | |||
video frames with varying size and interval. These dummy frames are | video frames with varying size and interval. These dummy frames are | |||
processed by other modules in order to transmit the video stream over | processed by other modules in order to transmit the video stream over | |||
the network. During the lifetime of a video transmission session, | the network. During the lifetime of a video transmission session, | |||
the synthetic video source will typically be required to adapt its | the synthetic video source will typically be required to adapt its | |||
encoding bitrate, and sometimes the spatial resolution and frame | encoding bitrate and sometimes the spatial resolution and frame rate. | |||
rate. | ||||
In this model, the synthetic video source module has a group of | In this model, the synthetic video source module has a group of | |||
incoming and outgoing interface calls that allow for interaction with | incoming and outgoing interface calls that allow for interaction with | |||
other modules. The following are some of the possible incoming | other modules. The following are some of the possible incoming | |||
interface calls --- marked as (a) in Figure 1 --- that the synthetic | interface calls, marked as (a) in Figure 1, that the synthetic video | |||
video traffic source may accept. The list is not exhaustive and can | traffic source may accept. The list is not exhaustive and can be | |||
be complemented by other interface calls if necessary. | complemented by other interface calls if necessary. | |||
o Target bitrate R_v: target bitrate request measured in bits per | o Target bitrate R_v: Target bitrate request measured in bits per | |||
second (bps). Typically, the congestion control module calculates | second (bps). Typically, the congestion control module calculates | |||
the target bitrate and updates it dynamically over time. | the target bitrate and updates it dynamically over time. | |||
Depending on the congestion control algorithm in use, the update | Depending on the congestion control algorithm in use, the update | |||
requests can either be periodic (e.g., once per second), or on- | requests can either be periodic (e.g., once per second), or | |||
demand (e.g., only when a drastic bandwidth change over the | on-demand (e.g., only when a drastic bandwidth change over the | |||
network is observed). | network is observed). | |||
o Target frame rate FPS: the instantaneous frame rate measured in | o Target frame rate FPS: The instantaneous frame rate measured in | |||
frames-per-second at a given time. This depends on the native | frames per second at a given time. This depends on the native | |||
camera capture frame rate as well as the target/preferred frame | camera-capture frame rate as well as the target/preferred frame | |||
rate configured by the application or user. | rate configured by the application or user. | |||
o Target frame resolution XY: the 2-dimensional vector indicating | o Target frame resolution XY: The 2-dimensional vector indicating | |||
the preferred frame resolution in pixels. Several factors govern | the preferred frame resolution in pixels. Several factors govern | |||
the resolution requested to the synthetic video source over time. | the resolution requested to the synthetic video source over time. | |||
Examples of such factors include the capturing resolution of the | Examples of such factors include the capturing resolution of the | |||
native camera and the display size of the destination screen. The | native camera and the display size of the destination screen. The | |||
target frame resolution also depends on the current target bitrate | target frame resolution also depends on the current target bitrate | |||
R_v, since it does not make sense to pair very low spatial | R_v, since it does not make sense to pair very low spatial | |||
resolutions with very high bitrates, and vice-versa. | resolutions with very high bitrates, and vice-versa. | |||
o Instant frame skipping: the request to skip the encoding of one or | o Instant frame skipping: The request to skip the encoding of one or | |||
several captured video frames, for instance when a drastic | several captured video frames, for instance, when a drastic | |||
decrease in available network bandwidth is detected. | decrease in available network bandwidth is detected. | |||
o On-demand generation of intra (I) frame: the request to encode | o On-demand generation of intra (I) frame: The request to encode | |||
another I frame to avoid further error propagation at the receiver | another I-frame to avoid further error propagation at the receiver | |||
when severe packet losses are observed. This request typically | when severe packet losses are observed. This request typically | |||
comes from the error control module. It can be initiated either | comes from the error control module. It can be initiated either | |||
by the sender or by the receiver via Full Intra Request (FIR) | by the sender or by the receiver via Full Intra Request (FIR) | |||
messages as defined in [RFC5104]. | messages as defined in [RFC5104]. | |||
An example of outgoing interface call --- marked as (b) in Figure 1 | An example of an outgoing interface call, marked as (b) in Figure 1, | |||
--- is the rate range [R_min, R_max]. Here, R_min and R_max are | is the rate range [R_min, R_max]. Here, R_min and R_max are meant to | |||
meant to capture the dynamic rate range and actual live video encoder | capture the dynamic rate range the actual live video encoder is | |||
is capable of generating given the input video content. This | capable of generating given the input video content. This typically | |||
typically depends on the video content complexity and/or display type | depends on the video content complexity and/or display type (e.g., | |||
(e.g., higher R_max for video contents with higher motion complexity, | higher R_max for video content with higher motion complexity or for | |||
or for displays of higher resolution). Therefore, these values will | displays of higher resolution). Therefore, these values will not | |||
not change with R_v but may change over time if the content is | change with R_v but may change over time if the content is changing. | |||
changing. | ||||
+-------------+ | +-------------+ | |||
| | dummy encoded | | | dummy encoded | |||
| Synthetic | video frames | | Synthetic | video frames | |||
| Video | --------------> | | Video | --------------> | |||
| Source | | | Source | | |||
| | | | | | |||
+--------+----+ | +--------+----+ | |||
/|\ | | /|\ | | |||
| | | | | | |||
-------------------+ +--------------------> | -------------------+ +--------------------> | |||
interface from interface to | interface from interface to | |||
other modules (a) other modules (b) | other modules (a) other modules (b) | |||
Figure 1: Interaction between synthetic video encoder and other | Figure 1: Interaction between Synthetic Video Encoder | |||
modules at the sender | and Other Modules at the Sender | |||
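The interactions in Section 4 can be pictured as a small interface. The sketch below is illustrative only. Every class, method, and field name (`SyntheticVideoSource`, `set_target_bitrate`, `rate_range`, `next_frame`, `DummyFrame`, and so on) is hypothetical and is not defined by either document. The incoming calls mirror the (a) list: target bitrate R_v, frame rate FPS, resolution XY, frame skipping, and on-demand I-frames. The outgoing call mirrors (b), the rate range [R_min, R_max]. Frame generation is a placeholder that emits constant frames of size R_v/8/FPS. A real model, statistical or trace-driven, would replace that step.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DummyFrame:
    """A dummy encoded frame; only its timing and size matter to the transport."""
    timestamp_s: float
    size_bytes: int
    is_intra: bool = False


class SyntheticVideoSource:
    """Hypothetical interface mirroring the calls marked (a) and (b) in Figure 1."""

    def __init__(self, r_min_bps: float, r_max_bps: float,
                 initial_target_bps: float, fps: float = 30.0) -> None:
        self._r_min = r_min_bps
        self._r_max = r_max_bps
        self.target_bps = initial_target_bps
        self.fps = fps
        self.resolution: Tuple[int, int] = (1280, 720)  # placeholder default
        self._frames_to_skip = 0
        self._intra_pending = False

    # ---- Incoming calls, (a) in Figure 1 ----
    def set_target_bitrate(self, r_v_bps: float) -> None:
        self.target_bps = r_v_bps

    def set_target_frame_rate(self, fps: float) -> None:
        self.fps = fps

    def set_target_resolution(self, width: int, height: int) -> None:
        self.resolution = (width, height)

    def skip_frames(self, count: int) -> None:
        self._frames_to_skip += count

    def request_intra_frame(self) -> None:
        self._intra_pending = True

    # ---- Outgoing call, (b) in Figure 1 ----
    def rate_range(self) -> Tuple[float, float]:
        return (self._r_min, self._r_max)

    # ---- Frame generation (placeholder) ----
    def next_frame(self, now_s: float) -> Optional[DummyFrame]:
        """Return the next dummy frame, or None if this frame is skipped."""
        if self._frames_to_skip > 0:
            self._frames_to_skip -= 1
            return None
        # Placeholder: every frame has the reference size R_v/8/FPS; the
        # statistical and trace-driven models would replace this step.
        size = int(self.target_bps / 8 / self.fps)
        is_intra, self._intra_pending = self._intra_pending, False
        return DummyFrame(now_s, size, is_intra)
```

In this sketch a pending I-frame request survives a skipped frame and is served by the next emitted one. That is a design choice made here for illustration and is not taken from the text.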
5. A Statistical Reference Model | 5. A Statistical Reference Model | |||
This section describes one simple statistical model of the live video | This section describes one simple statistical model of the live video | |||
encoder traffic source. Figure 2 summarizes the list of tunable | traffic source. Figure 2 summarizes the list of tunable parameters | |||
parameters in this statistical model. A more comprehensive survey of | in this statistical model. A more comprehensive survey of popular | |||
popular methods for modeling video traffic source behavior can be | methods for modeling the behavior of video traffic sources can be | |||
found in [Tanwir2013]. | found in [Tanwir2013]. | |||
+===========+====================================+================+ | +===========+====================================+================+ | |||
| Notation | Parameter Name | Example Value | | | Notation | Parameter Name | Example Value | | |||
+===========+====================================+================+ | +===========+====================================+================+ | |||
| R_v | Target bitrate request | 1 Mbps | | | R_v | Target bitrate request | 1 Mbps | | |||
+-----------+------------------------------------+----------------+ | +-----------+------------------------------------+----------------+ | |||
| FPS | Target frame rate | 30 Hz | | | FPS | Target frame rate | 30 Hz | | |||
+-----------+------------------------------------+----------------+ | +-----------+------------------------------------+----------------+ | |||
| tau_v | Encoder reaction latency | 0.2 s | | | tau_v | Encoder reaction latency | 0.2 s | | |||
+-----------+------------------------------------+----------------+ | +-----------+------------------------------------+----------------+ | |||
| K_d | Burst duration of the transient | 8 frames | | | K_d | Burst duration of the transient | 8 frames | | |||
| | period | | | | | period | | | |||
+-----------+------------------------------------+----------------+ | +-----------+------------------------------------+----------------+ | |||
| K_B | Burst frame size during the | 13.5 KBytes* | | | K_B | Burst frame size during the | 13.5 KB* | | |||
| | transient period | | | | | transient period | | | |||
+-----------+------------------------------------+----------------+ | +-----------+------------------------------------+----------------+ | |||
| t0 | Reference frame interval 1/FPS | 33 ms | | | t0 | Reference frame interval 1/FPS | 33 ms | | |||
+-----------+------------------------------------+----------------+ | +-----------+------------------------------------+----------------+ | |||
| B0 | Reference frame size R_v/8/FPS | 4.17 KBytes | | | B0 | Reference frame size R_v/8/FPS | 4.17 KB | | |||
+-----------+------------------------------------+----------------+ | +-----------+------------------------------------+----------------+ | |||
| | Scaling parameter of the zero-mean | | | | | Scaling parameter of the zero-mean | | | |||
| | Laplacian distribution describing | | | | | Laplacian distribution describing | | | |||
| SCALE_t | deviations in normalized frame | 0.15 | | | SCALE_t | deviations in normalized frame | 0.15 | | |||
| | interval (t-t0)/t0 | | | | | interval (t-t0)/t0 | | | |||
+-----------+------------------------------------+----------------+ | +-----------+------------------------------------+----------------+ | |||
| | Scaling parameter of the zero-mean | | | | | Scaling parameter of the zero-mean | | | |||
| | Laplacian distribution describing | | | | | Laplacian distribution describing | | | |||
| SCALE_B | deviations in normalized frame | 0.15 | | | SCALE_B | deviations in normalized frame | 0.15 | | |||
| | size (B-B0)/B0 | | | | | size (B-B0)/B0 | | | |||
+-----------+------------------------------------+----------------+ | +-----------+------------------------------------+----------------+ | |||
| R_min | minimum rate supported by video | 150 Kbps | | | R_min | Minimum rate supported by video | 150 kbps | | |||
| | encoder type or content activity | | | | | encoder type or content activity | | | |||
+-----------+------------------------------------+----------------+ | +-----------+------------------------------------+----------------+ | |||
| R_max | maximum rate supported by video | 1.5 Mbps | | | R_max | Maximum rate supported by video | 1.5 Mbps | | |||
| | encoder type or content activity | | | | | encoder type or content activity | | | |||
+===========+====================================+================+ | +===========+====================================+================+ | |||
* Example value of K_B for a video stream encoded at 720p and | * Example value of K_B for a video stream encoded at 720p and | |||
30 frames per second, using H.264/AVC encoder. | 30 frames per second using H.264/AVC encoder | |||
Figure 2: List of tunable parameters in a statistical video traffic | Figure 2: List of Tunable Parameters in a Statistical Video Traffic | |||
source model. | Source Model | |||
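The reference values in Figure 2 follow directly from R_v and FPS. With R_v = 1 Mbps and FPS = 30, t0 = 1/30 s (about 33 ms) and B0 = 10^6/8/30 (about 4166.7 bytes, or 4.17 KB). The sketch below uses those definitions to draw one frame. It treats the normalized deviations (t-t0)/t0 and (B-B0)/B0 as zero-mean Laplacian samples with scale SCALE_t and SCALE_B. Python's `random` module has no Laplacian sampler, so the code uses a substitute: it takes the difference of two i.i.d. exponential variates with mean equal to the scale, which yields a Laplacian distribution with that scale. The function names are hypothetical. Clamping the result at zero is an added assumption that Figure 2 does not state.

```python
import random
from typing import Optional, Tuple


def reference_values(r_v_bps: float, fps: float) -> Tuple[float, float]:
    """Reference frame interval t0 = 1/FPS (s) and size B0 = R_v/8/FPS (bytes)."""
    return 1.0 / fps, r_v_bps / 8.0 / fps


def laplace(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplacian sample, drawn as the difference of two i.i.d.
    exponential variates whose mean is `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sample_frame(r_v_bps: float, fps: float,
                 scale_t: float = 0.15, scale_b: float = 0.15,
                 rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """Return (frame interval in s, frame size in bytes) for one frame."""
    rng = rng if rng is not None else random.Random()
    t0, b0 = reference_values(r_v_bps, fps)
    dt = laplace(scale_t, rng)  # normalized interval deviation (t - t0)/t0
    db = laplace(scale_b, rng)  # normalized size deviation (B - B0)/B0
    # Assumption: clamp at zero so a rare deviation below -1 cannot yield a
    # negative interval or size; Figure 2 does not specify this.
    return t0 * max(1.0 + dt, 0.0), b0 * max(1.0 + db, 0.0)
```

The deviations are zero-mean and the clamp is rarely hit at a scale of 0.15, so the average frame size over many samples stays close to B0. The long-run output rate therefore stays close to R_v.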
5.1. Time-damped response to target rate update | 5.1. Time-Damped Response to Target-Rate Update | |||
While the congestion control module can update its target bitrate | While the congestion control module can update its target bitrate | |||
request R_v at any time, the statistical model dictates that the | request R_v at any time, the statistical model dictates that the | |||
encoder will only react to such changes tau_v seconds after a | encoder will only react to such changes tau_v seconds after a | |||
previous rate transition. In other words, when the encoder has | previous rate transition. In other words, when the encoder has | |||
reacted to a rate change request at time t, it will simply ignore all | reacted to a rate-change request at time t, it will simply ignore all | |||
subsequent rate change requests until time t+tau_v. | subsequent rate-change requests until time t+tau_v. | |||
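The damping rule of Section 5.1 can be sketched as a small state machine. This is a hypothetical illustration: the class and method names are not from the model. It assumes the very first request is accepted immediately, because there is no earlier transition to measure tau_v from.

```python
class TimeDampedRateControl:
    """Hypothetical sketch of the time-damped reaction of Section 5.1."""

    def __init__(self, initial_rate_bps: float, tau_v: float) -> None:
        self.rate_bps = initial_rate_bps  # rate the encoder is currently using
        self.tau_v = tau_v                # damping interval in seconds
        self.last_transition = None       # time t of the last accepted change

    def request(self, now: float, r_v_bps: float) -> bool:
        """Offer a new target R_v at time `now`; return True if accepted."""
        if r_v_bps == self.rate_bps:
            return False                  # nothing to change
        if self.last_transition is not None and now < self.last_transition + self.tau_v:
            return False                  # still before t + tau_v: ignore
        self.rate_bps = r_v_bps
        self.last_transition = now
        return True


ctl = TimeDampedRateControl(initial_rate_bps=500_000, tau_v=0.5)
ctl.request(0.0, 800_000)   # accepted
ctl.request(0.2, 300_000)   # ignored: only 0.2 s since the last transition
ctl.request(0.6, 300_000)   # accepted
```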
5.2. Temporary burst and oscillation during the transient period | 5.2. Temporary Burst and Oscillation during the Transient Period | |||
The output bitrate R_o during the period [t, t+tau_v] is considered | The output bitrate R_o during the period [t, t+tau_v] is considered | |||
to be in a transient state when reacting to abrupt changes in target | to be in a transient state when reacting to abrupt changes in target | |||
rate. Based on observations from video encoder output data, the | rate. Based on observations from video encoder output, the encoder | |||
encoder reaction to a new target bitrate request can be characterized | reaction to a new target bitrate request can be characterized by high | |||
by high variations in output frame sizes. It is assumed in the model | variations in output frame sizes. It is assumed in the model that | |||
that the overall average output bitrate R_o during this transient | the overall average output bitrate R_o during this transient period | |||
period matches the target bitrate R_v. Consequently, the occasional | matches the target bitrate R_v. Consequently, the occasional burst | |||
burst of large frames is followed by smaller-than-average encoded | of large frames is followed by smaller-than-average encoded frames. | |||
frames. | ||||
This temporary burst is characterized by two parameters: | This temporary burst is characterized by two parameters: | |||
o burst duration K_d: number of frames in the burst event; and | o burst duration K_d: Number of frames in the burst event, and | |||
o burst frame size K_B: size of the initial burst frame which is | o burst frame size K_B: Size of the initial burst frame, which is | |||
typically significantly larger than average frame size at steady | typically significantly larger than the average frame size at | |||
state. | steady state. | |||
It can be noted that these burst parameters can also be used to mimic | It can be noted that these burst parameters can also be used to mimic | |||
the insertion of a large on-demand I frame in the presence of severe | the insertion of a large on-demand I-frame in the presence of severe | |||
packet losses. The values of K_d and K_B typically depend on the | packet losses. The values of K_d and K_B typically depend on the | |||
type of video codec, spatial and temporal resolution of the encoded | type of video codec, spatial and temporal resolution of the encoded | |||
stream, as well as the video content activity level. | stream, as well as the activity level in the video content. | |||
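The text fixes only the size of the initial burst frame (K_B) and requires the transient average to match R_v. The exact shape of the remaining frames is left open. The sketch below assumes one possible shape, which is not prescribed by the model: the first of the K_d frames is K_B bytes, the other K_d - 1 frames share what is left of a K_d * B0 budget equally, and frames after the burst return to B0. The floor `min_frame_bytes` is a hypothetical parameter.

```python
def burst_frame_sizes(r_v_bps: float, fps: float, k_d: int, k_b_bytes: float,
                      min_frame_bytes: float = 10.0) -> list:
    """Hypothetical frame sizes (bytes) for one K_d-frame burst event."""
    if k_d < 1:
        raise ValueError("K_d must be at least 1")
    b0 = r_v_bps / 8 / fps  # reference frame size B0 at the new target R_v
    if k_d == 1:
        return [k_b_bytes]
    # Bytes left for the other K_d - 1 frames if the burst is to average B0.
    leftover = k_d * b0 - k_b_bytes
    # The compensating frames share the leftover equally. Because of the floor,
    # the burst average exceeds B0 (R_o overshoots R_v) when K_B > K_d * B0.
    small = max(leftover / (k_d - 1), min_frame_bytes)
    return [k_b_bytes] + [small] * (k_d - 1)
```

For example, with illustrative values R_v = 1.5 Mbps, 30 fps, K_d = 8, and K_B = 13500 bytes, B0 is 6250 bytes. The seven frames after the burst frame are about 5214 bytes each, so the 8-frame average stays at B0.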
5.3. Output rate fluctuation at steady state | 5.3. Output-Rate Fluctuation at Steady State | |||
The output bitrate R_o during steady state is modeled as randomly | The output bitrate R_o during steady state is modeled as randomly | |||
fluctuating around the target bitrate R_v. The output traffic can be | fluctuating around the target bitrate R_v. The output traffic can be | |||
characterized as the combination of two random processes denoting the | characterized as the combination of two random processes that denote | |||
frame interval t and output frame size B over time, as the two major | the frame interval t and output frame size B over time, which are the | |||
sources of variations in the encoder output. For simplicity, the | two major sources of variations in the encoder output. For | |||
deviations of t and B from their respective reference levels are | simplicity, the deviations of t and B from their respective reference | |||
modeled as independent and identically distributed (i.i.d.) random | levels are modeled as independent and identically distributed (i.i.d.) | |||
variables following the Laplacian distribution [Papoulis]. More | random variables following the Laplacian distribution [Papoulis]. | |||
specifically: | More specifically: | |||
o Fluctuations in frame interval: the intervals between adjacent | o Fluctuations in frame interval: The intervals between adjacent | |||
frames have been observed to fluctuate around the reference | frames have been observed to fluctuate around the reference | |||
interval of t0 = 1/FPS. Deviations in normalized frame interval | interval of t0 = 1/FPS. Deviations in normalized frame interval | |||
DELTA_t = (t-t0)/t0 can be modeled by a zero-mean Laplacian | DELTA_t = (t-t0)/t0 can be modeled by a zero-mean Laplacian | |||
distribution with scaling parameter SCALE_t. The value of SCALE_t | distribution with scaling parameter SCALE_t. The value of SCALE_t | |||
dictates the "width" of the Laplacian distribution and therefore | dictates the "width" of the Laplacian distribution and therefore | |||
the amount of fluctuation in actual frame intervals (t) with | the amount of fluctuation in actual frame intervals (t) with | |||
respect to the reference frame interval t0. | respect to the reference frame interval t0. | |||
o Fluctuations in frame size: the output encoded frame sizes also | o Fluctuations in frame size: The output-encoded frame sizes also | |||
tend to fluctuate around the reference frame size B0=R_v/8/FPS. | tend to fluctuate around the reference frame size B0=R_v/8/FPS. | |||
Likewise, deviations in the normalized frame size DELTA_B = | Likewise, deviations in the normalized frame size DELTA_B = | |||
(B-B0)/B0 can be modeled by a zero-mean Laplacian distribution | (B-B0)/B0 can be modeled by a zero-mean Laplacian distribution | |||
with scaling parameter SCALE_B. The value of SCALE_B dictates the | with scaling parameter SCALE_B. The value of SCALE_B dictates the | |||
"width" of this second Laplacian distribution and correspondingly | "width" of this second Laplacian distribution and correspondingly | |||
the amount of fluctuations in output frame sizes (B) with respect | the amount of fluctuations in output frame sizes (B) with respect | |||
to the reference target B0. | to the reference target B0. | |||
Both values of SCALE_t and SCALE_B can be obtained via parameter | Both values of SCALE_t and SCALE_B can be obtained via parameter | |||
fitting from empirical data captured for a given video encoder. | fitting from empirical data captured for a given video encoder. | |||
Example values are listed in Figure 2 based on empirical data | Example values are listed in Figure 2 based on empirical data | |||
presented in [IETF-Interim]. | presented in [IETF-Interim]. | |||
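A minimal sketch of the steady-state generator follows. Python's standard `random` module has no Laplacian sampler, so this sketch draws one as the difference of two i.i.d. exponential variables, which gives a zero-mean Laplacian with the same scaling parameter. The clip of the deviations at -0.9 is an added assumption. It keeps the rare extreme samples from producing non-positive intervals or sizes.

```python
import random
from typing import Optional, Tuple


def laplace(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplacian sample with the given scaling parameter."""
    if scale <= 0:
        return 0.0
    # The difference of two i.i.d. exponential variables with mean `scale`
    # is Laplacian with location 0 and scaling parameter `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def next_frame(r_v_bps: float, fps: float, scale_t: float = 0.15,
               scale_b: float = 0.15,
               rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """Return (interval in seconds, size in bytes) for one steady-state frame."""
    if rng is None:
        rng = random.Random()
    t0 = 1.0 / fps             # reference frame interval t0 = 1/FPS
    b0 = r_v_bps / 8 / fps     # reference frame size B0 = R_v/8/FPS (bytes)
    # Assumption: clip DELTA_t and DELTA_B at -0.9 so t and B stay positive.
    delta_t = max(laplace(scale_t, rng), -0.9)
    delta_b = max(laplace(scale_b, rng), -0.9)
    return t0 * (1 + delta_t), b0 * (1 + delta_b)
```

The default scales are the example values of SCALE_t and SCALE_B from Figure 2. Over many frames, the mean interval stays close to t0 and the mean size stays close to B0.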
5.4. Rate range limit imposed by video content | 5.4. Rate Range Limit Imposed by Video Content | |||
The output bitrate R_o is further clipped within the dynamic range | The output bitrate R_o is further clipped within the dynamic range | |||
[R_min, R_max], which in reality are dictated by scene and motion | [R_min, R_max], which in reality are dictated by scene and motion | |||
complexity of the captured video content. In the proposed | complexity of the captured video content. In the proposed | |||
statistical model, these parameters are specified by the application. | statistical model, these parameters are specified by the application. | |||
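One simple way to honor the rate range is sketched below. It clips the rate from which the reference frame size B0 is derived, so the generated frames fluctuate around a clipped target. Applying the clip at this point is an assumption. The defaults are the example R_min and R_max values from Figure 2.

```python
def clipped_reference_frame_size(r_v_bps: float, fps: float,
                                 r_min_bps: float = 150_000,
                                 r_max_bps: float = 1_500_000) -> float:
    """B0 in bytes after clipping the rate to [R_min, R_max]."""
    rate = min(r_max_bps, max(r_min_bps, r_v_bps))
    return rate / 8 / fps
```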
6. A Trace-Driven Model | 6. A Trace-Driven Model | |||
The second approach for modeling a video traffic source is trace- | The second approach for modeling a video traffic source is trace- | |||
driven. This can be achieved by running an actual live video encoder | driven. This can be achieved by running an actual live video encoder | |||
skipping to change at page 10, line 4 ¶ | skipping to change at page 11, line 11 ¶ | |||
2. Encode the sequence(s) using an actual live video encoder. | 2. Encode the sequence(s) using an actual live video encoder. | |||
Repeat the process for a number of bitrates. Keep only the | Repeat the process for a number of bitrates. Keep only the | |||
sequence of frame sizes for each bitrate. | sequence of frame sizes for each bitrate. | |||
3. Construct a data structure that contains the output of the | 3. Construct a data structure that contains the output of the | |||
previous step. The data structure should allow for easy bitrate | previous step. The data structure should allow for easy bitrate | |||
lookup. | lookup. | |||
4. Upon a target bitrate request R_v from the controller, look up | 4. Upon a target bitrate request R_v from the controller, look up | |||
the closest bitrates among those previously stored. Use the | the closest bitrates among those previously stored. Use the | |||
frame size sequences stored for those bitrates to approximate the | frame-size sequences stored for those bitrates to approximate the | |||
frame sizes to output. | frame sizes to output. | |||
5. The output of the synthetic video traffic source contains | 5. The output of the synthetic video traffic source contains | |||
"encoded" frames with dummy contents but with realistic sizes. | "encoded" frames with dummy contents but with realistic sizes. | |||
In the following, Section 6.1 explains the first three steps (1-3), | Section 6.1 explains the first three steps (1-3), and Section 6.2 | |||
Section 6.2 elaborates on the remaining two steps (4-5). Finally, | elaborates on the remaining two steps (4-5). Finally, Section 6.3 | |||
Section 6.3 briefly discusses the possibility to extend the trace- | briefly discusses the possibility to extend the trace-driven model | |||
driven model for supporting time-varying frame rate and/or time- | for supporting time-varying frame rate and/or time-varying frame | |||
varying frame resolution. | resolution. | |||
6.1. Choosing the video sequence and generating the traces | 6.1. Choosing the Video Sequence and Generating the Traces | |||
The first step is a careful choice of a set of video sequences that | The first step is a careful choice of a set of video sequences that | |||
are representative of the target use cases for the video traffic | are representative of the target use cases for the video traffic | |||
model. For the example use case of interactive video conferencing, | model. For the example use case of interactive video conferencing, | |||
it is recommended to choose a sequence with content that resembles a | it is recommended to choose a sequence with content that resembles a | |||
"talking head", e.g. from a news broadcast or recording of an actual | "talking head", e.g., from a news broadcast or recording of an actual | |||
video conferencing call. | video conferencing call. | |||
The length of the chosen video sequence is a tradeoff. If it is too | The length of the chosen video sequence is a tradeoff. If it is too | |||
long, it will be difficult to manage the data structures containing | long, it will be difficult to manage the data structures containing | |||
the traces. If it is too short, there will be an obvious periodic | the traces. If it is too short, there will be an obvious periodic | |||
pattern in the output frame sizes, leading to biased results when | pattern in the output frame sizes, leading to biased results when | |||
evaluating congestion control performance. It has been empirically | evaluating congestion control performance. It has been empirically | |||
determined that a sequence 2 to 4 minutes in length sufficiently | determined that a sequence 2 to 4 minutes in length sufficiently | |||
avoids the periodic pattern. | avoids the periodic pattern. | |||
Given the chosen raw video sequence, denoted S, one can use a live | Given the chosen raw video sequence, denoted "S", one can use a live | |||
encoder, e.g. some implementation of [H264] or [HEVC], to produce a | encoder, e.g., some implementation of [H264] or [H265], to produce a | |||
set of encoded sequences. As discussed in Section 3, the output | set of encoded sequences. As discussed in Section 3, the output | |||
bitrate of the live encoder can be controlled by tuning three input | bitrate of the live encoder can be controlled by tuning three input | |||
parameters: quantization step size, frame rate, and picture | parameters: quantization step size, frame rate, and picture | |||
resolution. In order to simplify the choice of these parameters for | resolution. In order to simplify the choice of these parameters for | |||
a given target rate, one can typically assume a fixed frame rate | a given target rate, one can typically assume a fixed frame rate | |||
(e.g. 30 fps) and a fixed resolution (e.g., 720p) when configuring | (e.g., 30 fps) and a fixed resolution (e.g., 720p) when configuring | |||
the live encoder. See Section 6.3 for a discussion on how to relax | the live encoder. See Section 6.3 for a discussion on how to relax | |||
these assumptions. | these assumptions. | |||
Following these simplifications, the chosen encoder can be configured | Following these simplifications, the chosen encoder can be configured | |||
to start at a constant target bitrate, then vary the quantization | to start at a constant target bitrate, then vary the quantization | |||
step size (internally via the video encoder rate controller) to meet | step size (internally via the video encoder rate controller) to meet | |||
various externally specified target rates. It can be further assumed | various externally specified target rates. It can be further assumed | |||
that the first frame is encoded as an I-frame and the rest are P-frames | that the first frame is encoded as an I-frame and the rest are P-frames | |||
(see, e.g., [H264] for definitions of I- and P-frames). For live | (see, e.g., [H264] for definitions of I-frames and P-frames). For | |||
encoding, the encoder rate control algorithm typically does not use | live encoding, the encoder rate-control algorithm typically does not | |||
knowledge of frames in the future when encoding a given frame. | use knowledge of frames in the future when encoding a given frame. | |||
Given the minimum and maximum bitrates at which the synthetic codec | Given the minimum and maximum bitrates at which the synthetic codec | |||
is to operate (denoted as R_min and R_max, see Section 4), the entire | is to operate (denoted as "R_min" and "R_max", see Section 4), the | |||
range of target bitrates can be divided into n_s steps. This leads | entire range of target bitrates can be divided into n_s steps. This | |||
to a encoding bitrate ladder of (n_s + 1) choices equally spaced | leads to an encoding bitrate ladder of (n_s + 1) choices equally | |||
apart by the step length l = (R_max - R_min)/n_s. The following | spaced apart by the step length l = (R_max - R_min)/n_s. The | |||
simple algorithm is used to encode the raw video sequence. | following simple algorithm is used to encode the raw video sequence. | |||
r = R_min | r = R_min | |||
while r <= R_max do | while r <= R_max do | |||
Traces[r] = encode_sequence(S, r, e) | Traces[r] = encode_sequence(S, r, e) | |||
r = r + l | r = r + l | |||
The function encode_sequence takes as input parameters, respectively, | The function encode_sequence takes as input parameters, respectively, | |||
a raw video sequence (S), a constant target rate (r), and an encoder | a raw video sequence (S), a constant target rate (r), and an encoder | |||
rate control algorithm (e); it returns a vector with the sizes of | rate-control algorithm (e); it returns a vector with the sizes of | |||
frames in the order they were encoded. The output vector is stored | frames in the order they were encoded. The output vector is stored | |||
in a map structure called Traces, whose keys are bitrates and whose | in a map structure called "Traces", whose keys are bitrates and whose | |||
values are vectors of frame sizes. | values are vectors of frame sizes. | |||
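The ladder loop above can be made runnable as follows. Rates are kept in integer bits per second so that repeated addition of l does not drift. The rate-control algorithm e is folded into the `encode_sequence` callable. `fake_encode` is a hypothetical stand-in for a real live encoder that simply returns constant frames of r/8/30 bytes.

```python
from typing import Callable, Dict, List, Sequence

# encode_sequence(S, r) -> frame sizes in bytes, in encoding order.
EncodeFn = Callable[[Sequence, int], List[int]]


def build_traces(raw_sequence: Sequence, r_min_bps: int, r_max_bps: int,
                 step_bps: int, encode_sequence: EncodeFn) -> Dict[int, List[int]]:
    """Encode S once per rung of the ladder R_min, R_min + l, ... <= R_max."""
    if step_bps <= 0:
        raise ValueError("step length l must be positive")
    traces: Dict[int, List[int]] = {}
    r = r_min_bps                  # integer bps avoids floating-point drift
    while r <= r_max_bps:
        traces[r] = encode_sequence(raw_sequence, r)
        r += step_bps
    return traces


def fake_encode(sequence: Sequence, r_bps: int) -> List[int]:
    """Hypothetical stand-in encoder: constant frames of r/8/30 bytes."""
    return [r_bps // 240 for _ in sequence]


traces = build_traces(range(60), 150_000, 1_500_000, 200_000, fake_encode)
rf_min, rf_max = min(traces), max(traces)   # 150000 and 1350000
```

With R_min = 150 kbps, R_max = 1.5 Mbps, and l = 200 kbps, the top rung is 1.35 Mbps rather than R_max. This is one reason the trace range is tracked separately as [Rf_min, Rf_max].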
The choice of a value for the number of bitrate steps n_s is | The choice of a value for the number of bitrate steps n_s is | |||
important, since it determines the number of vectors of frame sizes | important, since it determines the number of vectors of frame sizes | |||
stored in the map Traces. The minimum value one can choose for n_s | stored in the map Traces. The minimum value one can choose for n_s | |||
is 1; the maximum value depends on the amount of memory available for | is 1; the maximum value depends on the amount of memory available for | |||
holding the map Traces. A reasonable value for n_s is one that | holding the map Traces. A reasonable value for n_s is one that | |||
results in steps of length l = 200 kbps. The next section will | results in steps of length l = 200 kbps. Section 6.2.2 will discuss | |||
discuss further the choice of step length l. | further the choice of step length l. | |||
Finally, note that, as mentioned in previous sections, R_min and | Finally, note that, as mentioned in previous sections, R_min and | |||
R_max may be modified after the initial sequences are encoded. | R_max may be modified after the initial sequences are encoded. | |||
Henceforth, for notational clarity, we refer to the bitrate range of | Henceforth, for notational clarity, we refer to the bitrate range of | |||
the trace file as [Rf_min, Rf_max]. The algorithm described in the | the trace file as [Rf_min, Rf_max]. The algorithm described in | |||
next section also covers the cases when the current target bitrate is | Section 6.2.1 also covers the cases when the current target bitrate | |||
less than Rf_min, or greater than Rf_max. | is less than Rf_min or greater than Rf_max. | |||
6.2. Using the traces in the synthetic codec | 6.2. Using the Traces in the Synthetic Codec | |||
The main idea behind the trace-driven synthetic codec is that it | The main idea behind the trace-driven synthetic codec is that it | |||
mimics the rate adaptation behavior of a real live codec upon dynamic | mimics the rate-adaptation behavior of a real live codec upon dynamic | |||
updates of the target bitrate request R_v by the congestion control | updates of the target bitrate request R_v by the congestion control | |||
module. It does so by switching to a different frame size vector | module. It does so by switching to a different frame-size vector | |||
stored in the map Traces when needed. | stored in the map Traces when needed. | |||
6.2.1. Main algorithm | 6.2.1. Main Algorithm | |||
The main algorithm for rate adaptation in the synthetic codec | The main algorithm for rate adaptation in the synthetic codec | |||
maintains two variables: r_current and t_current. | maintains two variables: r_current and t_current. | |||
o The variable r_current points to one of the keys of map Traces. | o The variable r_current points to one of the keys of map Traces. | |||
Upon a change in the value of R_v, typically because the | Upon a change in the value of R_v, typically because the | |||
congestion controller detects that the network conditions have | congestion controller detects that the network conditions have | |||
changed, r_current is updated based on R_v as follows: | changed, r_current is updated based on R_v as follows: | |||
R_ref = min(Rf_max, max(Rf_min, R_v)) | R_ref = min(Rf_max, max(Rf_min, R_v)) | |||
r_current = r | r_current = r | |||
such that | such that | |||
(r in keys(Traces) and | (r in keys(Traces) and | |||
r <= R_ref and | r <= R_ref and | |||
(not exists r' in keys(Traces) such that r < r' <= R_ref)) | (not exists r' in keys(Traces) such that r < r' <= R_ref)) | |||
o The variable t_current is an index to the frame size vector stored | o The variable t_current is an index to the frame-size vector stored | |||
in Traces[r_current]. It is updated every time a new frame is | in Traces[r_current]. It is updated every time a new frame is | |||
due. It is assumed that all vectors stored in Traces have the | due. It is assumed that all vectors stored in Traces have the | |||
same size, denoted as size_traces. The following equation governs | same size, denoted as "size_traces". The following equation | |||
the update of t_current: | governs the update of t_current: | |||
if t_current < SkipFrames then | if t_current < SkipFrames then | |||
t_current = t_current + 1 | t_current = t_current + 1 | |||
else | else | |||
t_current = ((t_current + 1 - SkipFrames) | t_current = ((t_current + 1 - SkipFrames) | |||
% (size_traces-SkipFrames)) + SkipFrames | % (size_traces-SkipFrames)) + SkipFrames | |||
where operator % denotes modulo, and SkipFrames is a predefined | where operator "%" denotes modulo, and SkipFrames is a predefined | |||
constant that denotes the number of frames to be skipped at the | constant that denotes the number of frames to be skipped at the | |||
beginning of frame size vectors after t_current has wrapped around. | beginning of frame-size vectors after t_current has wrapped around. | |||
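The r_current update rule selects the largest key of Traces that does not exceed R_ref. Over a sorted list of keys, this is a binary search. The sketch below uses Python's standard `bisect` module. The function name is hypothetical.

```python
import bisect
from typing import Sequence


def select_r_current(keys_sorted: Sequence[int], r_v: float) -> int:
    """Largest trace bitrate not exceeding R_ref = clip(R_v, Rf_min, Rf_max)."""
    rf_min, rf_max = keys_sorted[0], keys_sorted[-1]
    r_ref = min(rf_max, max(rf_min, r_v))
    # bisect_right returns the insertion point after any equal key, so the
    # element just before it is the largest key <= r_ref.
    i = bisect.bisect_right(keys_sorted, r_ref) - 1
    return keys_sorted[i]
```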
The purpose of the constant SkipFrames is to avoid the effect of | The purpose of the constant SkipFrames is to avoid the effect of | |||
periodically sending a large I-frame followed by several smaller- | periodically sending a large I-frame followed by several smaller- | |||
than-average P-frames. A typical value of SkipFrames is 20, although | than-average P-frames. A typical value of SkipFrames is 20, although | |||
it could be set to 0 if one is interested in studying the effect of | it could be set to 0 if one is interested in studying the effect of | |||
sending I-frames periodically. | sending I-frames periodically. | |||
The initial value of r_current is set to R_min, and the initial value | The initial value of r_current is set to R_min, and the initial value | |||
of t_current is set to 0. | of t_current is set to 0. | |||
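The t_current update rule translates directly into the sketch below. The guard against traces shorter than SkipFrames is an added assumption, because the modulo would otherwise be taken over a non-positive length. Starting from t_current = 0, the first pass visits every frame. After wrap-around, the index cycles over [SkipFrames, size_traces), so the initial I-frame is not repeated. An on-demand I-frame, as in Section 6.2.2, amounts to setting t_current back to 0.

```python
def next_t_current(t_current: int, size_traces: int, skip_frames: int = 20) -> int:
    """Advance the frame index, skipping the first frames after wrap-around."""
    if size_traces <= skip_frames:
        raise ValueError("traces must be longer than SkipFrames")
    if t_current < skip_frames:
        return t_current + 1
    return ((t_current + 1 - skip_frames) % (size_traces - skip_frames)) + skip_frames


t = 0                      # initial value of t_current
visited = []
for _ in range(250):
    visited.append(t)
    t = next_t_current(t, size_traces=100)
```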
When a new frame is due, its size can be calculated following one of | When a new frame is due, its size can be calculated following one of | |||
the three cases below: | the three cases below: | |||
a) Rf_min <= R_v < Rf_max: the output frame size is calculated via | a) Rf_min <= R_v < Rf_max: The output frame size is calculated via | |||
linear interpolation of the frame sizes appearing in | linear interpolation of the frame sizes appearing in | |||
Traces[r_current] and Traces[r_current + l]. The interpolation is | Traces[r_current] and Traces[r_current + l]. The interpolation is | |||
done as follows: | done as follows: | |||
size_lo = Traces[r_current][t_current] | size_lo = Traces[r_current][t_current] | |||
size_hi = Traces[r_current + l][t_current] | size_hi = Traces[r_current + l][t_current] | |||
distance_lo = (R_v - r_current) / l | distance_lo = (R_v - r_current) / l | |||
framesize = size_hi*distance_lo + size_lo*(1-distance_lo) | framesize = size_hi*distance_lo + size_lo*(1-distance_lo) | |||
b) R_v < Rf_min: the output frame size is calculated via scaling | b) R_v < Rf_min: The output frame size is calculated via scaling | |||
with respect to the lowest bitrate Rf_min in the trace file, as | with respect to the lowest bitrate Rf_min in the trace file, as | |||
follows: | follows: | |||
w = R_v / Rf_min | w = R_v / Rf_min | |||
framesize = max(fs_min, w * Traces[Rf_min][t_current]) | framesize = max(fs_min, w * Traces[Rf_min][t_current]) | |||
c) R_v >= Rf_max: the output frame size is calculated by scaling | c) R_v >= Rf_max: The output frame size is calculated by scaling | |||
with respect to the highest bitrate Rf_max in the trace file, as | with respect to the highest bitrate Rf_max in the trace file, as | |||
follows: | follows: | |||
w = R_v / Rf_max | w = R_v / Rf_max | |||
framesize = min(fs_max, w * Traces[Rf_max][t_current]) | framesize = min(fs_max, w * Traces[Rf_max][t_current]) | |||
In cases b) and c), floating-point arithmetic is used for computing | In cases b) and c), floating-point arithmetic is used for computing | |||
the scaling factor w. The resulting value of the instantaneous frame | the scaling factor "w". The resulting value of the instantaneous | |||
size (framesize) is further clipped within a reasonable range between | frame size (framesize) is further clipped within a reasonable range | |||
fs_min (e.g., 10 bytes) and fs_max (e.g., 1MB). | between fs_min (e.g., 10 bytes) and fs_max (e.g., 1 MB). | |||
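The three cases can be combined into one function, as sketched below. Both scaling cases use the factor w = R_v / Rf_min or w = R_v / Rf_max, and every case is clipped to [fs_min, fs_max]. The sketch assumes that r_current comes from the lookup in Section 6.2.1 and that the ladder has a fixed step l, so r_current + l is a key whenever R_v < Rf_max. Taking 1 MB as 10**6 bytes is also an assumption.

```python
from typing import Dict, List


def frame_size(traces: Dict[int, List[float]], step: int, r_current: int,
               t_current: int, r_v: float, fs_min: float = 10.0,
               fs_max: float = 1_000_000.0) -> float:
    """Synthetic frame size in bytes for target R_v (cases a, b, and c)."""
    rf_min, rf_max = min(traces), max(traces)
    if r_v < rf_min:                           # case b): scale from Rf_min
        w = r_v / rf_min
        size = w * traces[rf_min][t_current]
    elif r_v >= rf_max:                        # case c): scale from Rf_max
        w = r_v / rf_max
        size = w * traces[rf_max][t_current]
    else:                                      # case a): interpolate two rungs
        size_lo = traces[r_current][t_current]
        size_hi = traces[r_current + step][t_current]
        distance_lo = (r_v - r_current) / step
        size = size_hi * distance_lo + size_lo * (1 - distance_lo)
    # Clip the instantaneous frame size to [fs_min, fs_max] in every case.
    return min(fs_max, max(fs_min, size))
```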
6.2.2. Notes to the main algorithm | 6.2.2. Notes to the Main Algorithm | |||
Note that the main algorithm as described above can be further | Note that the main algorithm as described above can be further | |||
extended to mimic some additional typical behaviors of a live video | extended to mimic some additional typical behaviors of a live video | |||
encoder. Two examples are given below: | encoder. Two examples are given below: | |||
o I-frames on demand: The synthetic codec can be extended to | o I-frames on demand: The synthetic codec can be extended to | |||
simulate the sending of I-frames on demand, e.g., as a reaction to | simulate the sending of I-frames on demand, e.g., as a reaction to | |||
losses. To implement this extension, the codec's incoming | losses. To implement this extension, the codec's incoming | |||
interface (see (a) in Figure 1) is augmented with a new function | interface (see (a) in Figure 1) is augmented with a new function | |||
to request a new I-frame. Upon calling such a function, t_current | to request a new I-frame. Upon calling such a function, t_current | |||
is reset to 0. | is reset to 0. | |||
o Variable step length l between R_min and R_max: In the main | o Variable step length l between R_min and R_max: In the main | |||
algorithm, the step length l is fixed for ease of explanation. | algorithm, the step length l is fixed for ease of explanation. | |||
However, if the range [R_min, R_max] is very wide, it is also | However, if the range [R_min, R_max] is very wide, it is also | |||
possible to define a set of intermediate encoding rates with | possible to define a set of intermediate encoding rates with | |||
variable step length. The rationale behind this modification is | variable step length. The rationale behind this modification is | |||
that the difference between 400 kbps and 600 kbps as target | that the difference between 400 and 600 kbps as target bitrate is | |||
bitrate is much more significant than the difference between 4400 | much more significant than the difference between 4400 kbps and | |||
kbps and 4600 kbps. For example, one could define steps of length | 4600 kbps. For example, one could define steps of length 200 kbps | |||
200 Kbps under 1 Mbps, then steps of length 300 Kbps between 1 | under 1 Mbps, then steps of length 300 kbps between 1 Mbps and 2 | |||
Mbps and 2 Mbps; 400 Kbps between 2 Mbps and 3 Mbps, and so on. | Mbps, then 400 kbps between 2 Mbps and 3 Mbps, and so on. | |||
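A variable-step ladder can be sketched as follows. The rule used here, where the step grows by 100 kbps for each additional Mbps, is an assumption extrapolated from the "and so on" in the example. With uneven rungs, case a) of the main algorithm cannot use r_current + l. It must interpolate between r_current and the next key instead, which the hypothetical `bracket` helper locates.

```python
import bisect
from typing import List, Tuple


def variable_ladder(r_min_bps: int, r_max_bps: int) -> List[int]:
    """Rungs spaced 200 kbps below 1 Mbps, 300 kbps in [1, 2) Mbps, and so on."""
    rates = []
    r = r_min_bps
    while r <= r_max_bps:
        rates.append(r)
        band = r // 1_000_000            # 0 below 1 Mbps, 1 in [1, 2) Mbps, ...
        r += 200_000 + 100_000 * band
    return rates


def bracket(ladder: List[int], r_v: float) -> Tuple[int, int, float]:
    """Neighboring rungs around R_v and the interpolation weight distance_lo."""
    if not ladder[0] <= r_v < ladder[-1]:
        raise ValueError("R_v must lie in [Rf_min, Rf_max)")
    i = bisect.bisect_right(ladder, r_v) - 1
    lo, hi = ladder[i], ladder[i + 1]
    return lo, hi, (r_v - lo) / (hi - lo)
```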
6.3. Varying frame rate and resolution | 6.3. Varying Frame Rate and Resolution | |||
The trace-driven synthetic codec model explained in this section is | The trace-driven synthetic codec model explained in this section is | |||
relatively simple due to the choice of fixed frame rate and frame | relatively simple due to the choice of fixed frame rate and frame | |||
resolution. The model can be extended further to accommodate | resolution. The model can be extended further to accommodate | |||
variable frame rate and/or variable spatial resolution. | variable frame rate and/or variable spatial resolution. | |||
When the encoded picture quality at a given bitrate is low, one can | When the encoded picture quality at a given bitrate is low, one can | |||
potentially decrease either the frame rate (if the video sequence is | potentially decrease either the frame rate (if the video sequence is | |||
currently in low motion) or the spatial resolution in order to | currently in low motion) or the spatial resolution in order to | |||
improve quality-of-experience (QoE) in the overall encoded video. On | improve quality of experience (QoE) in the overall encoded video. On | |||
the other hand, if target bitrate increases to a point where there is | the other hand, if target bitrate increases to a point where there is | |||
no longer a perceptible improvement in the picture quality of | no longer a perceptible improvement in the picture quality of | |||
individual frames, then one might afford to increase the spatial | individual frames, then one might afford to increase the spatial | |||
resolution or the frame rate (useful if the video is currently in | resolution or the frame rate (useful if the video is currently in | |||
high motion). | high motion). | |||
Many techniques have been proposed to choose over time the best | Many techniques have been proposed to choose over time the best | |||
combination of encoder quantization step size, frame rate, and | combination of encoder-quantization step size, frame rate, and | |||
spatial resolution in order to maximize the quality of live video | spatial resolution in order to maximize the quality of live video | |||
codecs [Ozer2011][Hu2010]. Future work may consider extending the | codecs [Ozer2011] [Hu2012]. Future work may consider extending the | |||
trace-driven codec to accommodate variable frame rate and/or | trace-driven codec to accommodate variable frame rate and/or | |||
resolution. | resolution. | |||
From the perspective of congestion control, varying the spatial | From the perspective of congestion control, varying the spatial | |||
resolution typically requires a new intra-coded frame to be | resolution typically requires a new intra-coded frame to be | |||
generated, thereby incurring a temporary burst in the output traffic | generated, thereby incurring a temporary burst in the output traffic | |||
pattern. The impact of frame rate change tends to be more subtle: | pattern. The impact of frame-rate change tends to be more subtle: | |||
reducing frame rate from high to low leads to sparsely spaced larger | reducing frame rate from high to low leads to sparsely spaced larger | |||
encoded packets instead of many densely spaced smaller packets. Such | encoded packets instead of many densely spaced smaller packets. Such | |||
a difference in traffic profiles may still affect the performance of | a difference in traffic profiles may still affect the performance of | |||
congestion control, especially when outgoing packets are not paced by | congestion control, especially when outgoing packets are not paced by | |||
the media transport module. Investigation of varying frame rate and | the media transport module. Investigation of varying frame rate and | |||
resolution is left for future work. | resolution is left for future work. | |||
7. Combining The Two Models | 7. Combining the Two Models | |||
It is worthwhile noting that the statistical and trace-driven models | It is worthwhile noting that the statistical and trace-driven models | |||
each have their own advantages and drawbacks. Both models are fairly | each have their own advantages and drawbacks. Both models are fairly | |||
simple to implement. It takes significantly greater effort to fit | simple to implement. It takes significantly greater effort to fit | |||
the parameters of a statistical model to actual encoder output data. | the parameters of a statistical model to actual encoder output data. | |||
In contrast, it is straightforward for a trace-driven model to obtain | In contrast, it is straightforward for a trace-driven model to obtain | |||
encoded frame size data. Once validated, the statistical model is | encoded frame-size data. Once validated, the statistical model is | |||
more flexible in mimicking a wide range of encoder/content behaviors | more flexible in mimicking a wide range of encoder/content behaviors | |||
by simply varying the corresponding parameters in the model. In this | by simply varying the corresponding parameters in the model. In this | |||
regard, a trace-driven model relies -- by definition -- on additional | regard, a trace-driven model relies, by definition, on additional | |||
data collection efforts for accommodating new codecs or video | data-collection efforts for accommodating new codecs or video | |||
contents. | contents. | |||
In general, the trace-driven model is more realistic for mimicking | In general, the trace-driven model is more realistic for mimicking | |||
the ongoing, steady-state behavior of a video traffic source with | the ongoing steady-state behavior of a video traffic source with | |||
fluctuations around a constant target rate. In contrast, the | fluctuations around a constant target rate. In contrast, the | |||
statistical model is more versatile for simulating the behavior of a | statistical model is more versatile for simulating the behavior of a | |||
video stream in transient, such as when encountering sudden rate | video stream in transient, such as when encountering sudden rate | |||
changes. It is also possible to combine both methods into a hybrid | changes. It is also possible to combine both methods into a hybrid | |||
model. In this case, the steady-state behavior is driven by traces | model. In this case, the steady-state behavior is driven by traces | |||
during steady state and the transient-state behavior is driven by the | during steady state and the transient-state behavior is driven by the | |||
statistical model. | statistical model. | |||
transient +---------------+ | transient +---------------+ | |||
state | Generate next | | state | Generate next | | |||
skipping to change at page 15, line 28 | skipping to change at page 16, line 42 | |||
+-----------------+ / | frames | | +-----------------+ / | frames | | |||
R_v | Compare against | / +---------------+ | R_v | Compare against | / +---------------+ | |||
------>| previous |/ | ------>| previous |/ | |||
| target bitrate |\ | | target bitrate |\ | |||
+-----------------+ \ +---------------+ | +-----------------+ \ +---------------+ | |||
\ | Generate next | | \ | Generate next | | |||
+------>| frame from | | +------>| frame from | | |||
steady | trace | | steady | trace | | |||
state +---------------+ | state +---------------+ | |||
Figure 3: A hybrid video traffic model | Figure 3: A Hybrid Video Traffic Model | |||
As shown in Figure 3, the video traffic model operates in a transient | As shown in Figure 3, the video traffic model operates in a transient | |||
state if the requested target rate R_v is substantially different | state if the requested target rate R_v is substantially different | |||
from the previous target, or else it operates in steady state. | from the previous target; otherwise, it operates in a steady state. | |||
During the transient state, a total of K_d frames are generated by | During the transient state, a total of K_d frames are generated by | |||
the statistical model, resulting in one (1) big burst frame with size | the statistical model, resulting in one (1) big burst frame with size | |||
K_B followed by K_d-1 smaller frames. When operating at steady | K_B followed by K_d-1 smaller frames. When operating at steady | |||
state, the video traffic model simply generates a frame according to | state, the video traffic model simply generates a frame according to | |||
the trace-driven model given the target rate, while modulating the | the trace-driven model given the target rate while modulating the | |||
frame interval according to the distribution specified by the | frame interval according to the distribution specified by the | |||
statistical model. One example criterion for determining whether the | statistical model. One example criterion for determining whether the | |||
traffic model should operate in a transient state is whether the rate | traffic model should operate in a transient state is whether the rate | |||
change exceeds 10% of the previous target rate. Finally, as this | change exceeds 10% of the previous target rate. Finally, as this | |||
model follows transient-state behavior dictated by the statistical | model follows transient-state behavior dictated by the statistical | |||
model, upon a substantial rate change, the model will follow the | model, upon a substantial rate change, the model will follow the | |||
time-damping mechanism as defined in Section 5.1, which is governed | time-damping mechanism as defined in Section 5.1, which is governed | |||
by parameter tau_v. | by parameter tau_v. | |||
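The state selection of the hybrid model in Figure 3 can be sketched in a few lines. This is a minimal, hypothetical illustration under stated assumptions, not the Syncodecs implementation. R_v, K_d, K_B, and the 10% threshold come from the text above. The names `is_transient`, `transient_frame_sizes`, `HybridModel`, `trace_frame_size`, and `fps` are illustrative. The even split of the remaining bit budget across the K_d-1 smaller frames is also an assumption. The sketch omits the frame-interval modulation and the tau_v time damping of Section 5.1.

```python
from typing import Callable, List


def is_transient(r_new: float, r_prev: float, threshold: float = 0.10) -> bool:
    # Example criterion from the text: the rate change exceeds 10% of the
    # previous target rate.
    return abs(r_new - r_prev) > threshold * r_prev


def transient_frame_sizes(r_v: float, fps: float, k_d: int, k_b: float) -> List[float]:
    # One big burst frame of size k_b (bits), followed by k_d - 1 smaller frames.
    # ASSUMPTION: the smaller frames evenly share what remains of the bit
    # budget for k_d frames at target rate r_v (bits per second).
    budget = k_d * r_v / fps
    rest = max(budget - k_b, 0.0) / (k_d - 1) if k_d > 1 else 0.0
    return [k_b] + [rest] * (k_d - 1)


class HybridModel:
    """Hypothetical sketch of the hybrid model's transient/steady switch.

    Frame-interval modulation and tau_v time damping are not modeled.
    """

    def __init__(self, initial_rate: float, fps: float, k_d: int, k_b: float,
                 trace_frame_size: Callable[[float], float]):
        self.prev_target = initial_rate
        self.fps = fps
        self.k_d = k_d
        self.k_b = k_b
        self.trace_frame_size = trace_frame_size  # hypothetical trace lookup by rate
        self.pending: List[float] = []            # transient frames not yet emitted

    def next_frame(self, r_v: float) -> float:
        """Return the size in bits of the next frame for target rate r_v."""
        if is_transient(r_v, self.prev_target):
            # Substantial change: (re)start a burst from the statistical model.
            self.pending = transient_frame_sizes(r_v, self.fps, self.k_d, self.k_b)
        self.prev_target = r_v
        if self.pending:
            return self.pending.pop(0)        # transient state
        return self.trace_frame_size(r_v)     # steady state: trace-driven
```

As a usage example, take a stand-in trace `lambda r: r / 30` at 30 fps, with K_d = 4 and K_B = 100,000 bits. A step from 1 Mbit/s to 1.5 Mbit/s then yields one 100,000-bit burst frame and three frames of about 33,333 bits. After that, the model returns to trace-driven frames of 50,000 bits.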
8. Implementation Status | 8. Reference Implementation | |||
The statistical, trace-driven, and hybrid models as described in this | The statistical, trace-driven, and hybrid models as described in this | |||
draft have been implemented as a stand-alone, platform-independent | document have been implemented as a stand-alone, platform-independent | |||
synthetic traffic source module. It can be easily integrated into | synthetic traffic source module. It can be easily integrated into | |||
network simulation platforms such as [ns-2] and [ns-3], as well as | network simulation platforms such as [ns-2] and [ns-3], as well as | |||
testbeds using a real network. The stand-alone traffic source module | testbeds using a real network. The stand-alone traffic source module | |||
is available as an open source implementation at [Syncodecs]. | is available as an open-source implementation at [Syncodecs]. | |||
9. IANA Considerations | 9. IANA Considerations | |||
There are no IANA impacts in this memo. | This document has no IANA actions. | |||
10. Security Considerations | 10. Security Considerations | |||
The synthetic video traffic models as described in this draft do not | The synthetic video traffic models as described in this document do | |||
impose any security threats. They are designed to mimic realistic | not impose any security threats. They are designed to mimic | |||
traffic patterns for evaluating candidate RTP-based congestion | realistic traffic patterns for evaluating candidate RTP-based | |||
control algorithms, so as to ensure stable operations of the network. | congestion control algorithms so as to ensure stable operations of | |||
It is RECOMMENDED that candidate algorithms be tested using the video | the network. It is RECOMMENDED that candidate algorithms be tested | |||
traffic models presented in this draft before wide deployment over | using the video traffic models presented in this document before wide | |||
the Internet. If the generated synthetic traffic flows are sent over | deployment over the Internet. If the generated synthetic traffic | |||
the Internet, they also need to be congestion controlled. | flows are sent over the Internet, they also need to be congestion | |||
controlled. | ||||
11. References | 11. References | |||
11.1. Normative References | 11.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
11.2. Informative References | 11.2. Informative References | |||
[H264] ITU-T Recommendation H.264, "Advanced video coding for | [H264] ITU-T, "Advanced video coding for generic audiovisual | |||
generic audiovisual services", May 2003, | services", Recommendation H.264, April 2017, | |||
<https://www.itu.int/rec/T-REC-H.264>. | <https://www.itu.int/rec/T-REC-H.264>. | |||
[HEVC] ITU-T Recommendation H.265, "High efficiency video | [H265] ITU-T, "High efficiency video coding", | |||
coding", April 2013, | Recommendation H.265, February 2018, | |||
<https://www.itu.int/rec/T-REC-H.265>. | <https://www.itu.int/rec/T-REC-H.265>. | |||
[Hu2010] Hu, H., Ma, Z., and Y. Wang, "Optimization of Spatial, | [Hu2012] Hu, H., Ma, Z., and Y. Wang, "Optimization of Spatial, | |||
Temporal and Amplitude Resolution for Rate-Constrained | Temporal and Amplitude Resolution for Rate-Constrained | |||
Video Coding and Scalable Video Adaptation", in Proc. 19th | Video Coding and Scalable Video Adaptation", Proc. 19th | |||
IEEE International Conference on Image | IEEE International Conference on Image Processing (ICIP), | |||
Processing, (ICIP'12), September 2012. | DOI 10.1109/ICIP.2012.6466960, September 2012. | |||
[IETF-Interim] | [IETF-Interim] | |||
Zhu, X., Mena, S., and Z. Sarker, "Update on RMCAT Video | Zhu, X., Mena, S., and Z. Sarker, "Update on RMCAT Video | |||
Traffic Model: Trace Analysis and Model Update", April | Traffic Model: Trace Analysis and Model Update", IETF | |||
2017, <https://www.ietf.org/proceedings/ | RMCAT Virtual Interim, April 2017, | |||
interim-2017-rmcat-01/slides/slides-interim-2017-rmcat-01- | <https://www.ietf.org/proceedings/interim-2017-rmcat- | |||
sessa-update-on-video-traffic-model-draft-00.pdf>. | 01/slides/slides-interim-2017-rmcat-01-sessa-update-on- | |||
video-traffic-model-draft-00.pdf>. | ||||
[ns-2] "The Network Simulator - ns-2", | [ns-2] "The Network Simulator - ns-2", December 2015, | |||
<http://www.isi.edu/nsnam/ns/>. | <https://nsnam.sourceforge.net/wiki/index.php/ | |||
User_Information>. | ||||
[ns-3] "The Network Simulator - ns-3", <https://www.nsnam.org/>. | [ns-3] "NS-3 Network Simulator", <https://www.nsnam.org/>. | |||
[Ozer2011] | [Ozer2011] Ozer, J., "Video Compression for Flash, Apple Devices and | |||
Ozer, J., "Video Compression for Flash, Apple Devices and | HTML5", Galax: Doceo Publishing, ISBN-13: 978-0976259503, | |||
HTML5", ISBN 13:978-0976259503, 2011. | 2011. | |||
[Papoulis] | [Papoulis] Papoulis, A. and S. Pillai, "Probability, Random Variables | |||
Papoulis, A., "Probability, Random Variables and | and Stochastic Processes", London: McGraw-Hill Europe, | |||
Stochastic Processes", 2002. | ISBN-13: 978-0071226615, 2002. | |||
[RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, | [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, | |||
"Codec Control Messages in the RTP Audio-Visual Profile | "Codec Control Messages in the RTP Audio-Visual Profile | |||
with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104, | with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104, | |||
February 2008, <https://www.rfc-editor.org/info/rfc5104>. | February 2008, <https://www.rfc-editor.org/info/rfc5104>. | |||
[Syncodecs] | [Syncodecs] | |||
Mena, S., D'Aronco, S., and X. Zhu, "Syncodecs: Synthetic | "Syncodecs: Synthetic codecs for evaluation of RMCAT | |||
codecs for evaluation of RMCAT work", | work", commit a92d6c8, May 2018, | |||
<https://github.com/cisco/syncodecs>. | <https://github.com/cisco/syncodecs>. | |||
[Tanwir2013] | [Tanwir2013] | |||
Tanwir, S. and H. Perros, "A Survey of VBR Video Traffic | Tanwir, S. and H. Perros, "A Survey of VBR Video Traffic | |||
Models", IEEE Communications Surveys and Tutorials, vol. | Models", IEEE Communications Surveys and Tutorials, Volume | |||
15, no. 5, pp. 1778-1802., October 2013. | 15, Issue 4, p. 1778-1802, | |||
DOI 10.1109/SURV.2013.010413.00071, January 2013. | ||||
Authors' Addresses | Authors' Addresses | |||
Xiaoqing Zhu | Xiaoqing Zhu | |||
Cisco Systems | Cisco Systems | |||
12515 Research Blvd., Building 4 | 12515 Research Blvd., Building 4 | |||
Austin, TX 78759 | Austin, TX 78759 | |||
USA | United States of America | |||
Email: xiaoqzhu@cisco.com | Email: xiaoqzhu@cisco.com | |||
Sergio Mena de la Cruz | Sergio Mena | |||
Cisco Systems | Cisco Systems | |||
EPFL, Quartier de l'Innovation, Batiment E | EPFL, Quartier de l'Innovation, Batiment E | |||
Ecublens, Vaud 1015 | Ecublens, Vaud 1015 | |||
Switzerland | Switzerland | |||
Email: semena@cisco.com | Email: semena@cisco.com | |||
Zaheduzzaman Sarker | Zaheduzzaman Sarker | |||
Ericsson AB | Ericsson AB | |||
Luleae, SE 977 53 | Torshamnsgatan 23 | |||
Stockholm, SE 164 83 | ||||
Sweden | Sweden | |||
Phone: +46 10 717 37 43 | Phone: +46 10 717 37 43 | |||
Email: zaheduzzaman.sarker@ericsson.com | Email: zaheduzzaman.sarker@ericsson.com | |||
End of changes. 111 change blocks. | ||||
227 lines changed or deleted | 227 lines changed or added | |||